A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution


This database provides the daily time-series of COVID-19 cases, deaths, recovered people, tests, vaccinations, and hospitalizations, for more than 230 countries, 760 regions, and 12,000 lower-level administrative divisions. The geographical entities are associated with identifiers to match with hydrometeorological, geospatial, and mobility data. The database includes policy measures at the national and, when available, sub-national levels. The data acquisition pipeline is open-source and fully automated. As most governments revise the data retrospectively, the database always updates the complete time-series to mirror the original source. Vintage data, immutable snapshots of the data taken each day, are provided to ensure research reproducibility. The latest data are updated on an hourly basis, and the vintage data are available since April 14, 2020. All the data are available in CSV files or SQLite format. By unifying the access to the data, this work makes it possible to study the pandemic on a global scale with high resolution, taking into account within-country variations, nonpharmaceutical interventions, and environmental and exogenous variables.

Scientific Data, vol. 9(1), pag. 1-7. Nature Publishing Group
Emanuele Guidotti
Emanuele Guidotti
PhD Student in Finance

Interdisciplinary research at the intersection of finance, data science, and statistics.