Using the WHO Database

by Ed Sawicki - May 26, 2020

Covis-19 and the WHO

Suppose you're a computer geek working in a news agency that wants to publish COVID-19 data from the World Health Organization (WHO) on their website. The WHO makes the data available on their website, so you can do it without having to reenter the data manually. You download their CSV file and store it on your computer. Then you import it into a spreadsheet. You spend time working your spreadsheet magic so you have a display suitable for inclusion on your company website.

Now you have to copy that display to the web server where it gets integrated into a web page using some process that likely requires more human labor.

The result is pleasing to your bosses and they want daily updates. Great. Now you have to do it again tomorrow when the WHO releases new data. We can say one thing about the state of the information technology industry — it often doesn't see manual processes and the consumption of human assets by routine tasks as something to be corrected.

There's a better way. This article shows how to automate the process almost completely. The thing stopping us from full automation is the WHO. Their COVID-19 data file can't be accessed directly. It takes a human clicking a link to download the file.

We'll use common tools found on a Linux system, primarily the SQLite database. Since the WHO doesn't allow the automatic downloading of its data file directly, you'll first need to manually download it. In this example, the file called WHO-COVID-19-global-data.csv, resides in the /home/ed/Downloads directory.

You'll need to create two files. I named them whodata.sh and whodata.sql but you can use whatever names you like. There are links to these files in the Sources section below. The whodata.sh file is the bash script that ties the various processes together into an application:

#!/bin/bash
# If the WHO allowed access to their data file directly,
# this next line is an example of how to download it.
# wget https://covid19.who.int/path/to/WHO-COVID-19-global-data.csv

sqlite3 whodata.db ".read whodata.sql"

# Add the other parts of the web page.
mv whodata.html temp
cat header temp footer > whodata.html

# securely copy the HTML file to the webserver
scp whodata.html user@domain.tld:/srv/www/domain.tld/

The whodata.sql file contains instructions for SQLite for importing the CSV file into a database table and generating your desired output. These instructions are rather straightforward for someone who knows the SQL language:

DROP TABLE IF EXISTS whodata;
DROP VIEW IF EXISTS today;
CREATE TABLE whodata (date TEXT, country_code TEXT, country TEXT,
        region TEXT, new_cases INTEGER, total_cases INTEGER,
        new_deaths INTEGER, total_deaths INTEGER);
.mode csv
.import /home/ed/Downloads/WHO-COVID-19-global-data.csv whodata

# Remove the time portion of the date-time stamp
UPDATE whodata SET date = SUBSTR(date, 1,INSTR(date,'T')-1);
# Remove parenthetical country descriptions, such as Iran (Islamic Republic of)
UPDATE whodata SET country = SUBSTR(country, 1,INSTR(country,'(')-1)
	WHERE INSTR(country,'(') > 0;

CREATE VIEW today AS SELECT * FROM whodata where
	 date=(SELECT date FROM whodata ORDER BY date DESC LIMIT 1);
SELECT country, new_deaths, total_deaths FROM today
	ORDER BY new_deaths DESC LIMIT 10;
SELECT country, new_cases, total_cases FROM today
	ORDER BY new_cases DESC LIMIT 10;

This works but the resulting output is not suitable for insertion into a webpage. Other instructions need to be added to convert the output into HTML and make it look nice. The SELECT statements need to be modified with the printf function so large numbers display with commas as thousands separators (1,128,534 instead of 1128534).

DROP TABLE IF EXISTS whodata;
DROP VIEW IF EXISTS today;
CREATE TABLE whodata (date TEXT, country_code TEXT, country TEXT,
        region TEXT, new_cases INTEGER, total_cases INTEGER,
        new_deaths INTEGER, total_deaths INTEGER);
.mode csv
.import /home/ed/Downloads/WHO-COVID-19-global-data.csv whodata

# Remove the time portion of the date-time stamp
UPDATE whodata SET date = SUBSTR(date, 1,INSTR(date,'T')-1);
# Remove parenthetical country descriptions, such as Iran (Islamic Republic of)
UPDATE whodata SET country = SUBSTR(country, 1,INSTR(country,'(')-1)
	WHERE INSTR(country,'(') > 0;
CREATE VIEW today AS SELECT * FROM whodata where 
         date=(SELECT date FROM whodata ORDER BY date DESC LIMIT 1);

#output HTML
.mode html
.headers off
.output whodata.html
.print "<h1>COVID-19 data for "
SELECT date FROM today LIMIT 1;
.print "</h1>"
.print "<style>td, th {padding-right:1em;text-align:right;}</style>"
.print "<table>"
.print "<tr><td>Top Ten Countries for new deaths today</td><tr>"
.print "<tr><th>Country</th><th>New</th><th>Total</th></tr>"
SELECT country, printf("%,d",new_deaths), printf("%,d",total_deaths)
	FROM today ORDER BY new_deaths DESC LIMIT 10;
.print "<tr><td>Top Ten Countries for new cases today</td></tr>"
.print "<tr><th>Country</th><th>New</th><th>Total</th></tr>"
SELECT country, printf("%,d",new_cases), printf("%,d",total_cases)
	FROM today ORDER BY new_cases DESC LIMIT 10;
.print "</table>"
.print "<p><small>Data from the World Health Organization</small></p>"

You need to set the whodata.sh file to be executable with the command:

chmod +x whodata.sh

Run the script:

./whodata.sh

It will take less than one second to produce the HTML file. Copying it to the server may take a few seconds. Here's the result when you look at it with a web browser.

COVID-19 data for 2020-05-27

Top Ten Countries by new deaths
CountryNewTotal
Spain 2,201 29,035
Brazil 807 23,473
United States of America 620 97,529
Mexico 239 7,633
Peru 173 3,629
India 170 4,337
Russian Federation 161 3,968
The United Kingdom 134 37,048
Canada 113 6,566
France 98 28,477
Top Ten Countries for new cases
CountryNewTotal
United States of America 15,253 1,634,010
Brazil 11,687 374,898
Russian Federation 8,338 370,680
India 6,387 151,767
The United Kingdom 4,043 265,231
Peru 4,020 123,979
Chile 3,964 77,961
Mexico 2,485 71,105
Saudi Arabia 1,931 76,726
Iran (Islamic Republic of) 1,787 139,511

Data from the World Health Organization

You can have this script run automatically with cron. Cron is a scheduler that will run the whodata.sh script daily at times of your choosing. Since WHO doesn't allow direct access to their file, you don't know when the newest version of their file has been downloaded. So, you configure cron to run periodically, say, every 30 minutes. It's too bad. The process would be less wasteful if WHO allowed us access.

There's room for enhancement, especially in styling (CSS3) the display. You could, for example, add a table of countries with their populations, thus allowing you to display the percentage of cases or deaths to the population.

Knowing how to automate tasks saves time and money. It's made easier when you use a versatile and low-cost platform like Linux. You can hone your skills with the tutorials listed in the Sources section below.

Sources

WHO Coronavirus Disease (COVID-19) Dashboard

SQLite Tutorial

How to Use SCP Command to Securely Transfer Files

A Beginners Guide To Cron Jobs

If you're unfamiliar with SQLite, here are some advocacy documents to consider.

Well-Known Users of SQLite — SQLite is probably the database engine that's installed on more computers on Earth than any other.

35% Faster Than The Filesystem

SQLite As An Application File Format