Novel Coronavirus (COVID-19) epidemiological data since 22 January 2020. The data is compiled by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) from various sources including the World Health Organization (WHO), DXY.cn, BNO News, National Health Commission of the People’s Republic of China (NHC), China CDC (CCDC), Hong Kong Department of Health, Macau Government, Taiwan CDC, US CDC, Government of Canada, Australian Government Department of Health, European Centre for Disease Prevention and Control (ECDC), and Ministry of Health Singapore (MOH).
One of the best GitHub repos we found on coronavirus is the data from Johns Hopkins University CSSE. Use the link below to set up your own analysis on COVID-19.
This dataset is an import of data from Johns Hopkins University CSSE. The source of this data is this GitHub Repository. Additionally, there are tables on disease characteristics sourced from the China Center for Disease Control Feb. 11, 2020 report. Lastly, there are individual case details sourced from the Singapore government, the Hong Kong government, the South Korean government, and the Philippines government. This data updates hourly.
On the case_details_virological_dot_org branch, we have imported data from virological.org. This data is of suspect quality, but it is voluminous, with approximately 45,000 individual cases, hence the separate branch.
This blog post, published on Feb 23, 2020, describes the dataset, how it was modeled, how it is imported, and some of the features Dolt provides. It’s the best source of documentation on the dataset.
https://github.com/CSSEGISandData/COVID-19
JHU CSSE maintains the data in the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository on GitHub. Fields available in the data include Province/State, Country/Region, Last Update, Confirmed, Suspected, Recovered, and Deaths.
Kaggle Dataset
https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
It uses the same Johns Hopkins GitHub repository as its source. However, country-level datasets are also provided.
Content
2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people – CDC
This dataset has daily-level information on the number of affected cases, deaths, and recoveries from the 2019 novel coronavirus. Please note that this is time series data, so the number of cases on any given day is the cumulative number.
The data is available from 22 Jan, 2020.
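Because the counts are cumulative, daily new cases have to be derived by differencing consecutive days. A minimal sketch, assuming a plain list of cumulative counts for a single location ordered by date; the function name and the sample numbers are illustrative and not part of the dataset:

```python
def daily_from_cumulative(cumulative: list[int]) -> list[int]:
    """Convert cumulative counts (ordered by date) into per-day increments."""
    daily = []
    previous = 0  # assumption: zero cases before the first observation
    for total in cumulative:
        daily.append(total - previous)
        previous = total
    return daily


# Illustrative numbers only, not taken from the dataset.
new_cases = daily_from_cumulative([1, 3, 3, 7])
print(new_cases)  # [1, 2, 0, 4]
```

A negative increment would simply mean the cumulative total was revised downward on that day; whether to keep or clip such values depends on the analysis.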
Column Description
The main file in this dataset is covid_19_data.csv, and its columns are described below.
covid_19_data.csv
- Sno – Serial number
- ObservationDate – Date of the observation in MM/DD/YYYY
- Province/State – Province or state of the observation (Could be empty when missing)
- Country/Region – Country of observation
- Last Update – Time in UTC at which the row was updated for the given province or country (the format is not standardised, so please clean it before use)
- Confirmed – Cumulative number of confirmed cases till that date
- Deaths – Cumulative number of deaths till that date
- Recovered – Cumulative number of recovered cases till that date
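The columns above can be read with Python's standard library. The following is a rough sketch, not the dataset author's method: it parses ObservationDate as MM/DD/YYYY and sums Confirmed across provinces to get one total per country and day. The inline sample rows are made up for illustration, and parsing counts through `float` is an assumption in case the file stores them with a decimal point.

```python
import csv
import io
from collections import defaultdict
from datetime import date, datetime

# Hypothetical sample rows using the column names listed above (not real data).
SAMPLE_CSV = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,Hubei,Mainland China,1/22/2020 17:00,444.0,17.0,28.0
2,01/22/2020,Beijing,Mainland China,1/22/2020 17:00,14.0,0.0,0.0
3,01/22/2020,,Japan,1/22/2020 17:00,2.0,0.0,0.0
4,01/23/2020,Hubei,Mainland China,1/23/20 17:00,444.0,17.0,28.0
"""


def country_totals(csv_text: str) -> dict[tuple[str, date], int]:
    """Sum cumulative Confirmed counts per (country, observation date)."""
    totals: dict[tuple[str, date], int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # ObservationDate is documented as MM/DD/YYYY.
        day = datetime.strptime(row["ObservationDate"], "%m/%d/%Y").date()
        # Province/State may be empty; aggregation is by country, so it is ignored.
        totals[(row["Country/Region"], day)] += int(float(row["Confirmed"]))
    return dict(totals)


totals = country_totals(SAMPLE_CSV)
print(totals[("Mainland China", date(2020, 1, 22))])  # 458
```

The Last Update column is deliberately left unparsed here, since its format varies from row to row and needs cleaning first.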
2019_ncov_data.csv
This is an older file and is no longer being updated. Please use the covid_19_data.csv file instead.
Two new files with individual-level information have been added:
COVID_open_line_list_data.csv
This file is obtained from this link
COVID19_line_list_data.csv
This file is obtained from this link
Country level datasets
If you are interested in knowing country level data, please refer to the following Kaggle datasets:
India – https://www.kaggle.com/sudalairajkumar/covid19-in-india
South Korea – https://www.kaggle.com/kimjihoo/coronavirusdataset
Italy – https://www.kaggle.com/sudalairajkumar/covid19-in-italy
Brazil – https://www.kaggle.com/unanimad/corona-virus-brazil
USA – https://www.kaggle.com/sudalairajkumar/covid19-in-usa
Switzerland – https://www.kaggle.com/daenuprobst/covid19-cases-switzerland
Indonesia – https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases