Get updated with 2019-nCoV (or any) real-time statistics with Google Sheets and Apps Script [1/3]

Claire Chung
5 min read · Jan 24, 2020


As the pneumonia outbreak due to the novel coronavirus 2019-nCoV spreads, many are concerned about its current status, and are hopefully practicing strengthened hygiene already.

Surgical mask and N95 Respirator. You can see most people wearing masks downtown in Hong Kong today as highly suspected cases are announced.

Several open-data efforts have been made to help the public access the latest information, such as the number of confirmed and suspected cases in each area.

What if you want to catch up with the numbers regularly, say, in particular areas? Of course you can keep scrolling on your phone for them, but there are chiller ways to do it.

Procedures

  • Step 1: Import data from an online source with Google Sheets
  • Step 2: Organize new data into new rows upon update
  • Step 3: Set notification

Step 1: Import data

Importing online structured data

Let’s first introduce the key function: IMPORTXML.

According to the documentation, it “imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds.”

The syntax is very simple, just: IMPORTXML(url, xpath_query)
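For instance, this illustrative formula pulls the page title of the DXY report into a cell; any valid XPath over the page's HTML works the same way:

```
=IMPORTXML("https://3g.dxy.cn/newh5/view/pneumonia", "//title")
```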

In the following, I will use DXY.cn, a mainland-China healthcare community website that consolidates official statistics and updates them regularly, as an example: https://3g.dxy.cn/newh5/view/pneumonia

Note that since Jan 27 the website has stored the values in tags that IMPORTXML cannot parse, so the function no longer works there and the data should be fetched with a script instead [to update when I have time]. I will keep this tutorial for reference. An hourly-updated HK version (whose data is usually updated daily) serves as a temporary replacement demo in the shared spreadsheet.

(If IMPORTXML returns the “#N/A: did not receive any data as a result of the XPath query” error, this Stack Overflow thread may help: https://stackoverflow.com/questions/16519367/google-spreadsheet-importxml-error-na-not-received-any-data-as-a-result-of-xpat)

Screenshot for the real-time report of 2019-nCoV statistics in China by DXY.cn

We can see that there is a line of Confirmed (确诊), Suspected (疑似), Healed (治愈), and Died (死亡) cases, each with a live-updating number. We will use this for demonstration. You may also add numbers from specific provinces of interest to your trial.

Inspect the relevant element and copy its XPath on Google Chrome

After getting the relevant XPaths of the numbers, as well as the timestamp, we place them in the IMPORTXML formula in each cell respectively. Notice that Chrome copies the XPath with the ID string in double quotes; since the formula argument is itself wrapped in double quotes, we need to replace the inner ones with single quotes.
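Suppose, hypothetically, Chrome copies an XPath like //*[@id="confirmed"]/span (the id here is made up; yours will come from inspecting the page). Inside the formula, the inner double quotes become single quotes:

```
=IMPORTXML("https://3g.dxy.cn/newh5/view/pneumonia", "//*[@id='confirmed']/span")
```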

Tidy the output

For neater display, we also want to get rid of extra words in the timestamp. One way to do it is with REGEXEXTRACT.

REGEX, sometimes abbreviated as REGEXP, is short for “regular expression”: a way of denoting a text pattern with a set of symbols and fixed logic. It is a useful technique in programming, data analysis, and even many powerful text editors for matching, extracting, replacing, or otherwise manipulating text strings. Here is a nice introduction. The syntax is broadly similar across programming languages and tools, with slight differences in representation. You can experiment at https://regex101.com/

For example, REGEXEXTRACT(“123cat456”, “[a-z]+”) would yield the string “cat”. (Without the +, the pattern [a-z] would match only the single character “c”.)

Here, we observe that the date and time consist only of numbers, hyphens (-), and colons (:), so the regular expression is [\d\-\:]+, i.e. a run of digits, hyphens, and colons. [] marks a set whose members are defined between the brackets; \d means any digit and could be replaced by the range expression 0-9; the backslashes in \-\: ensure the symbols are taken literally rather than carrying special meaning (the opposite of \d, where the backslash adds meaning); and finally, + repeats the previous element, i.e. the set, one or more times to make a series.

So here we go.

We put the original IMPORTXML data retrieval command and the regular expression into the REGEXEXTRACT function. Now we have a neat timestamp!
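Put together, the cell formula looks like this (the XPath id is again a made-up placeholder for the one you copied from the page):

```
=REGEXEXTRACT(IMPORTXML("https://3g.dxy.cn/newh5/view/pneumonia", "//*[@id='timestamp']"), "[\d\-\:]+")
```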

Force per-minute reload [Updated on 2020/01/25]

IMPORTXML() only recalculates periodically, maybe every two hours. We have to force the cells to reload data more frequently.

While we can do it the hard way by setting locks with scripts, there is a simpler way: Keep changing the URL.

As suggested by Martin Franz, we can append an unused parameter, e.g. “?abc=xyz”, to the URL for this purpose. Unfortunately, the functions that are automatically recalculated at time intervals, i.e. NOW(), TODAY(), RAND(), and RANDBETWEEN(), cannot be evaluated inside an import formula. So we need an extra function to copy an automatically refreshed value to another cell.

  1. Generate a random number. For simplicity, I used RANDBETWEEN() to generate integer result.
  2. Go to “File > Spreadsheet settings” to set a per-minute recalculation.
  3. Write a function to transfer the evaluated random value to another cell.
  4. Append the plain random number as a parameter to the URL.
  5. Set up a per-minute trigger to copy and paste the random number.
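In spreadsheet terms, steps 1 and 4 look like the following (cell addresses follow the C3/D3 layout used in the script below; the XPath is a placeholder):

```
C3:  =RANDBETWEEN(1, 1000000)
D3:  (plain value, pasted here by the script every minute)
A1:  =IMPORTXML("https://3g.dxy.cn/newh5/view/pneumonia?v=" & D3, "//title")
```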

Below is a more detailed illustration.

Generating a random integer

Go to “Tools > Script Editor”, and place the following script, which basically opens the spreadsheet, gets the generated random value in cell C3, and then pastes it into D3.

var id = "xxx"; // copy the id from the URL https://docs.google.com/spreadsheets/d/xxx/

function refresh() {
  var ss = SpreadsheetApp.openById(id);
  var updatesheet = ss.getSheetByName('updating'); // the sheet where the IMPORTXML formulas locate
  var r = updatesheet.getRange("C3").getValue(); // the generated random value
  updatesheet.getRange("D3").setValue(r); // plain copy to be appended to the URL
}
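The per-minute trigger can be set up through the script editor's Triggers UI, or, if you prefer, installed programmatically with a one-off helper like the sketch below. It only runs inside the Apps Script environment:

```javascript
// Run once from the script editor to install a time-driven trigger
// that calls refresh() every minute.
function createMinuteTrigger() {
  ScriptApp.newTrigger('refresh')
      .timeBased()
      .everyMinutes(1)
      .create();
}
```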

We can then append the plain number to the URL with the operator &.

Append the plain random number as a parameter to the URL. Notice the delay as the per-minute run of the copy function does not sync with the spreadsheet recalculation, but it does not matter here.

By setting a trigger to run the refresh function every minute, we force the data reload. We will discuss more on how to use the script editor and set function triggers in the next section.

Let’s see how to log the data and get notified for update!

References

[1] Real-time dynamics of the novel pneumonia epidemic nationwide (全国新型肺炎疫情实时动态) — DXY·DXY Doctor https://3g.dxy.cn/newh5/view/pneumonia

[2] Simple Web Scraping using Google Sheets (2020 updated) https://www.octoparse.com/blog/simple-web-scraping-using-google-sheets

[3] ImportXML using Google Spreadsheets and Save Results Over-Time
https://stackoverflow.com/questions/37490561/importxml-using-google-spreadsheets-and-save-results-over-time

[4] Periodically refresh IMPORTXML() spreadsheet function
https://stackoverflow.com/questions/33872967/periodically-refresh-importxml-spreadsheet-function

Please clap here as well if you like my post :)
