First of all, a shout-out to Johns Hopkins University for posting COVID-19 datasets on their GitHub. Their datasets can be found here. These are the best datasets I have found so far. They cover confirmed cases, recoveries, and deaths. The data is quite clean and is available at the province/state level as well as the country level.
For this particular analysis I used the ‘confirmed cases’ dataset. I wanted to look at the top 10 countries with the most reported cases as of now. Furthermore, I wanted to see a time series at the country level and on a global scale. Lastly, I look at the progression in the Netherlands.
What is great about this specific dataset is that a new column is added every day with the previous day's reported cases. It is important to note that the data is cumulative: each daily column holds the total number of confirmed cases per country up to that date.
Furthermore, the date format in this dataset is mm/dd/yy.
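To make the layout described above concrete, here is a minimal sketch of reading such a file with only the Python standard library. It assumes a wide layout with four metadata columns (`Province/State`, `Country/Region`, `Lat`, `Long`) followed by one mm/dd/yy column per day. The sample rows, the country names and the `load_country_totals` helper are invented for illustration and are not taken from the actual dataset or from my notebook.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Toy data in the assumed wide layout; all names and numbers are invented.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,3/18/20,3/19/20,3/20/20
A1,Countryland,0,0,10,15,20
A2,Countryland,0,0,5,5,8
,Otherland,0,0,1,2,4
"""

def load_country_totals(text):
    """Return (dates, {country: [cumulative count per date]})."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Everything after the four metadata columns is a date in mm/dd/yy form.
    dates = [datetime.strptime(label, "%m/%d/%y").date() for label in header[4:]]
    totals = defaultdict(lambda: [0] * len(dates))
    for row in reader:
        country = row[1]
        # Sum province/state rows into one series per country.
        for i, value in enumerate(row[4:]):
            totals[country][i] += int(value)
    return dates, dict(totals)

dates, totals = load_country_totals(SAMPLE_CSV)
print(dates[-1], totals["Countryland"])
```

Because the counts are cumulative, summing the province/state rows per date gives the country-level cumulative series directly.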
Top 10 countries with the most reported cases
This graph shows the cumulative number of confirmed cases in the top 10 countries as of the 20th of March, 2020. China still has the highest number of confirmed cases.
Time series of top 5 countries with most reported cases of COVID-19
This graph displays a time series of the confirmed cases in the top 5 countries. While the numbers in Italy, Spain, Germany, and Iran are still steadily increasing, we see that China’s cases have been stagnating since March. We also see that around the time the stagnation in China takes place, cases start to be reported in Europe and Iran.
Even though South Korea reported cases before Europe and Iran did, it is not part of this graph. The countries in this graph were selected for having the most cases on the 20th of March, 2020.
Time series of top 5 countries with most reported cases of COVID-19 (excluding China and Italy)
Let’s exclude China and Italy for a moment. Here we can see that Spain, Germany and the US have had quite similar trajectories in terms of reported cases over time.
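The selection rule used for these graphs (the countries with the most cases on a cutoff date, optionally leaving some countries out) can be sketched roughly as follows. The function name `top_countries_on`, its parameters and the toy numbers are hypothetical, chosen for illustration only.

```python
from datetime import date

def top_countries_on(dates, totals, cutoff, n, exclude=()):
    """Return the n countries with the highest cumulative count on `cutoff`."""
    idx = dates.index(cutoff)  # position of the cutoff date in every series
    ranked = sorted(
        (country for country in totals if country not in exclude),
        key=lambda country: totals[country][idx],
        reverse=True,
    )
    return ranked[:n]

# Invented cumulative series, one value per date.
toy_dates = [date(2020, 3, 19), date(2020, 3, 20), date(2020, 3, 21)]
toy_totals = {"A": [50, 60, 61], "B": [10, 40, 90], "C": [5, 20, 25], "D": [1, 2, 3]}

top_all = top_countries_on(toy_dates, toy_totals, date(2020, 3, 20), 2)
top_filtered = top_countries_on(toy_dates, toy_totals, date(2020, 3, 20), 2, exclude={"A"})
print(top_all, top_filtered)
```

Ranking on a fixed date rather than on the latest column keeps the selection stable when the dataset gains new columns.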
Time series of top 10 countries with most reported cases of COVID-19 (excluding China and Italy)
South Korea clearly stands out in its COVID-19 trajectory. It managed to respond quickly and flatten the curve to the point where it is almost stagnant.
Time series globally
This graph displays the global trajectory of the confirmed COVID-19 cases. In February and March there appears to be a small dip around the same point in each month. After that second small dip in March we see a major increase in reported cases around the world.
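A global series like this one can be built by adding up the per-country cumulative series date by date. Below is a small sketch under that assumption. `global_series` is a hypothetical helper and the numbers are invented.

```python
def global_series(totals):
    """Sum the per-country cumulative series into one worldwide series."""
    # zip(*...) walks through all series in parallel, one date at a time.
    return [sum(day_values) for day_values in zip(*totals.values())]

# Invented cumulative series, all covering the same three dates.
toy_totals = {"A": [1, 2, 4], "B": [0, 3, 3], "C": [5, 5, 6]}
world = global_series(toy_totals)
print(world)
```

This only works if every country's series covers the same dates in the same order, which holds when they all come from the same wide table.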
Time series of reported COVID-19 cases in the Netherlands
This graph looks at the confirmed cases in the Netherlands. March 12 stands out here as no new cases were reported on this day. However, after this stagnation we see a steeper increase than in the trajectory before March 12.
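Since the dataset is cumulative, a day without newly reported cases shows up as a flat step in the curve. One way to make such days explicit is to difference the cumulative series. The sketch below uses a hypothetical `daily_new_cases` helper and invented numbers, not the actual Dutch figures.

```python
def daily_new_cases(cumulative):
    """Turn a cumulative series into per-day new cases."""
    if not cumulative:
        return []
    # The first day has no previous value, so its count is taken as-is.
    return [cumulative[0]] + [
        today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])
    ]

# Invented cumulative counts; the third day repeats the previous total.
toy_cumulative = [3, 8, 8, 15]
new_cases = daily_new_cases(toy_cumulative)
flat_days = [i for i, count in enumerate(new_cases) if count == 0]
print(new_cases, flat_days)
```

Here the zero at index 2 marks the day on which the cumulative total did not change.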
Want to see the full notebook with the code I used to make these graphs? Go to this post.