Analysis: Video Game Sales

After scavenging Kaggle for new datasets to play around with, I found an older one I have been interested in for a while now: video game sales. It’s a dataset from about three years ago, scraped from a website that tracks video game sales and ratings.


The data is three years old, which is quite unfortunate as the video game market has greatly expanded over the last few years. Multiplayer online games such as Fortnite now have a dominant position on the market. Therefore, I wanted to find a way to get a dataset that contains data from the last three years as well. After scraping for hours, I had an up-to-date dataset. However, I quickly noticed that the dataset I ‘created’ was missing a lot of important values. Thus, I decided to stick to the dataset I found on Kaggle.

If you want to try using the scraping script that I found on Github, download the script here. I would recommend calling time.sleep inside the ‘for loop’ that scrapes the data. If you do not do this, you might get an error (HTTP 429, “Too Many Requests”) because you’re sending too many requests to their server in a short amount of time.

import time

I found that the shortest delay that did not trigger an error was 25 seconds. You will have to let it run for days if you want to have a full dataset. But if you would rather not let it run for days, try changing the number of ‘pages’ at the beginning of the script. This will reduce the amount of data, but you won’t have to wait 4 days for your data to be done. I ended up scraping 1 page. Unfortunately, I did notice that the scraped data had a lot of missing values.
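To make the pause concrete, here is a minimal sketch of a scraping loop that sleeps between requests. It is not the code from the Github script: fetch_page is a hypothetical callable standing in for whatever requests and parses one page, and the sleep argument only exists so the loop can be tried out without actually waiting.

```python
import time


def scrape_pages(fetch_page, pages, delay_seconds=25, sleep=time.sleep):
    """Call fetch_page(page) for every page number, pausing between requests.

    fetch_page is a hypothetical stand-in for the request-and-parse step;
    it should return a list of rows for the given page.
    """
    rows = []
    for page in range(1, pages + 1):
        rows.extend(fetch_page(page))
        if page < pages:
            # Pause before the next request so the server is less likely
            # to answer with HTTP 429.
            sleep(delay_seconds)
    return rows
```

With pages=1 the loop makes a single request and never sleeps, which matches the quick one-page run described above.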

For all of these reasons, I decided to stick to the premade dataset I found on Kaggle.

If you do decide to scrape the whole dataset, my advice would be to slightly change the scraping script. Try moving up the portion of the code that saves the scraped data to a dataframe and CSV file, so that it runs inside the loop instead of once at the very end. If your internet connection drops or you get some kind of error, at least you will still have data saved to disk.
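One way to do this, sketched with the standard csv module rather than the script's own saving code, is to append each page's rows to the file as soon as they are scraped. The function name, file name, and column names below are placeholders I made up for illustration.

```python
import csv
import os


def append_rows(csv_path, rows, fieldnames):
    """Append scraped rows to csv_path, writing the header only for a new file."""
    is_new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if is_new_file:
            writer.writeheader()
        # Each call flushes one page of results to disk when the file closes.
        writer.writerows(rows)
```

Calling append_rows once per page inside the scraping loop means a crash on page 40 still leaves pages 1 to 39 on disk.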


Please keep in mind that the dataset is from about three years ago. Therefore, you will not see games such as Minecraft or PUBG on the list.

For the analysis I wanted to know 4 different things:

  • Top 10 titles in gaming
  • Sales per publisher
  • Sales per year
  • Sales per platform
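As a rough sketch of how these four questions translate to pandas: the column names (Name, Publisher, Year, Platform, Global_Sales) are an assumption about the Kaggle file, and the rows below are made-up toy values, not real sales figures.

```python
import pandas as pd

# Toy rows with made-up numbers; the column names are assumed.
df = pd.DataFrame(
    {
        "Name": ["Game A", "Game B", "Game C", "Game D"],
        "Publisher": ["Pub X", "Pub X", "Pub Y", "Pub Y"],
        "Year": [2005, 2006, 2006, 2007],
        "Platform": ["P1", "P2", "P1", "P1"],
        "Global_Sales": [10.0, 4.0, 7.5, 1.5],
    }
)


def sales_by(frame, column, n=None):
    """Sum Global_Sales per value of column, largest first, optionally top n."""
    totals = frame.groupby(column)["Global_Sales"].sum().sort_values(ascending=False)
    return totals if n is None else totals.head(n)


top_titles = df.nlargest(10, "Global_Sales")[["Name", "Global_Sales"]]
per_publisher = sales_by(df, "Publisher", n=20)
per_platform = sales_by(df, "Platform")
# Sales per year reads best in chronological order rather than sorted by size.
per_year = df.groupby("Year")["Global_Sales"].sum().sort_index()
```

Each result can be handed to a plotting call; for the real file you would replace the toy dataframe with pd.read_csv on the downloaded CSV.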

Top 10 titles in gaming

Here we see the top 10 titles in video games (three years ago). The ranking is based on global sales. As you can see, Wii Sports did quite well. I do have a theory about that first place, though. I remember that Wii Sports came bundled with the Wii console itself. So the global sales for Wii Sports might reflect (some of) the Wii console sales as opposed to people intentionally buying Wii Sports.

Next in line is Super Mario Bros, a classic game that has been around since 1985, according to this dataset.

Sales per publisher

Up next we have the top 20 game publishers based on global sales. Earlier we saw that Wii Sports is the best-selling title. Here we see that Wii Sports’ publisher, Nintendo, also has the most sales out of all publishers. If you scroll back up to look at the top 10 titles, you’ll see that it is completely dominated by Nintendo.

Sales per year

This is my favorite graph for this dataset. It reflects the trend in sales over the last 30+ years. We see a clear upward trend for sales. I would carefully speculate that video games have greatly increased in popularity. We see a bit of a downward trend towards the latter part of the graph. However, I would like to point out that this might be an illusion as adding more contemporary data will change this trend. I would assume that the global video game sales trend is currently still on the increase.

Sales per platform

This one surprised me the most. Based on the other graphs and table, I expected the Wii to do much better. But it appears that, three years ago, the PS2 was the platform with the most game sales.


For this dataset I carried out a simple analysis that shows some basic trends. Unfortunately, the dataset is not up to date. I would assume that you would find even more interesting trends if you were to include data from the last three years.

However, the most important piece of information this dataset provided is that gaming appears to have grown as a market. My assumption would be that video games have grown more diverse in their genres and titles and therefore cater to a wider audience.

Do you want to use this dataset? Download it here on Kaggle. You’ll need to create an account, which is completely free.


Analysis: Weather in Amsterdam (November 9 – 16, 2019)

In a previous post I embedded a Jupyter Notebook that used DarkSky’s API to pull weather data about a random location in Amsterdam. In this post I display some of the graphs plotted from the data.

The data consists of 168 data points. These are hourly predictions of the weather in Amsterdam. The data ranges from November 9th 19:00 to November 16th 19:00.

Unfortunately, I was not able to properly save the plots as images. Therefore, to read the x axis you might have to resort to some eye squinting and zooming in (ctrl + scroll up).

Data cleaning

After I loaded the API data into a dataframe, I looked through DarkSky’s documentation to understand the data. The API call that I used mostly fetched data in imperial units. As I am more familiar with the metric system, I first used formulas to convert the values to metric units. Second, I changed the Unix time column in the dataframe to ‘datetime’. I also added a shortened version of the time to make it more legible on the x axis of the plots.

You should also be able to fetch data in units from the metric system through the API. However, I wanted to play around with the units myself.
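For reference, here is a small sketch of what such a request URL could look like. It assumes the forecast endpoint and the units and extend query parameters as I understand them from DarkSky's documentation; the key and coordinates are placeholders, not the location I actually used.

```python
from urllib.parse import urlencode


def forecast_url(api_key, latitude, longitude, units="si", extend_hourly=True):
    """Build a DarkSky forecast URL; the key and coordinates are placeholders."""
    params = {"units": units}
    if extend_hourly:
        # Asks for the longer hourly forecast instead of the default 48 hours.
        params["extend"] = "hourly"
    base = f"https://api.darksky.net/forecast/{api_key}/{latitude},{longitude}"
    return f"{base}?{urlencode(params)}"


url = forecast_url("YOUR_KEY", 52.37, 4.90)
```

With units="si" the temperatures should already arrive in degrees Celsius, so the manual conversions would not be needed.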

import pandas as pd

# df is the dataframe that holds the hourly API data
# Fahrenheit to Celsius
df['temperature'] = (df['temperature'] - 32) * (5/9)
df['apparentTemperature'] = (df['apparentTemperature'] - 32) * (5/9)
df['dewPoint'] = (df['dewPoint'] - 32) * (5/9)
# Miles to kilometers
df['visibility'] = df['visibility'] * 1.609344
# Unix time (in seconds) to datetime
df['time'] = pd.to_datetime(df['time'], unit='s')
# Changing the datetime format
# I want to see: day, shortened month name & hour and minutes
df['time_short'] = df['time'].dt.strftime('%d %b %H:%M')
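The same formulas can also be written as small standalone functions, which makes them easy to check against values you already know (32 °F is 0 °C, 212 °F is 100 °C). This is only a sketch; the function names are made up, and the inches-to-millimeters helper is an assumption about how imperial precipitation values would be converted.

```python
def fahrenheit_to_celsius(value):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (value - 32) * 5 / 9


def miles_to_kilometers(value):
    """Convert miles to kilometers; also works for mph to km/h."""
    return value * 1.609344


def inches_to_millimeters(value):
    """Convert inches to millimeters (assumed here for precipitation)."""
    return value * 25.4


freezing = fahrenheit_to_celsius(32)
boiling = fahrenheit_to_celsius(212)
```

Because these are plain functions, they can be applied to a whole dataframe column at once, for example fahrenheit_to_celsius(df['temperature']).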


In order to plot these graphs, I used matplotlib. To save the images, I used:


Temperatures in Amsterdam

This line graph displays two different lines. The blue line shows the predicted temperature per hour. The orange line shows the apparent temperature, i.e. what that temperature will feel like.

Precipitation in Amsterdam

This line graph shows the precipitation in millimeters per hour.

Wind speeds in Amsterdam

This line graph shows the wind speeds in Amsterdam in kilometers per hour.

Dew points in Amsterdam

This line graph shows the dew point over the course of time in Amsterdam. The data points are in degrees Celsius.

Want to use this weather API?

Go to DarkSky’s website, make an account and get your own free key!


Jupyter Notebook: Weather in Amsterdam

I found an interesting and free API: DarkSky. DarkSky allows you to pull weather-related data for any location you desire. All you have to do is sign up and you’ll receive a key that will give you access to their API.

I pulled hourly data from their API for a random location in Amsterdam. Below I have embedded a Jupyter Notebook in which I have plotted different weather related factors such as: temperature, wind speed, visibility, and precipitation.


How to embed your Jupyter Notebook into a WordPress post. No plugin needed. [beginner’s guide]

You’ll need:
– Jupyter Notebooks in ‘tree mode’
– Jupyter Notebook terminal or Linux terminal
– A Github account
– A WordPress account
– The ‘Gist’ extension

This guide will first help you to create a ‘gist’ of your notebook, which will then be embedded into your WordPress post.

Installing the gist extension on my Windows machine

To get the gist extension up and running on my Windows machine I ran the following code in the Anaconda terminal:

pip install jupyter_contrib_nbextensions

pip install jupyter_nbextensions_configurator

Installing the gist extension on my Linux machine

The Windows installation method did not work on my Linux machine. To achieve the same thing, I used the following code in my Linux terminal:

conda install -c conda-forge jupyter_contrib_nbextensions

Enabling the Gist extension

Next, make sure that you’re viewing the Jupyter Notebook you want to embed in your WordPress in ‘tree mode’.

Second, we are going to enable the ‘Gist’ extension. To do this, go to Edit > nbextensions config. This will open a new window with a list of extensions. Look for ‘Gist-it’. When you click on ‘Gist-it’, you’ll be able to set the parameters for the extension.

The first parameter we need to set is the ‘Github personal access token‘. To get this token, you’ll need to go to your personal Github account. This token can be generated at: https://github.com/settings/tokens. Click on ‘generate new token‘. Then fill in a name for the token in the ‘Note‘ textbox. After that, select ‘gist‘. Right after creating the token, you’ll see your own personal access token. Copy it. DO NOT share this token with other people.

Set the parameters for the gist extension in your nbextension config window.

Get your personal access token on your Github account. Click “generate new token”.

After copying the personal access token from your Github account, go back to the nbextension config window and paste the token into the first parameter field, ‘Github personal access token’. Next, check the ‘Gists default to public’ box.

Sharing the notebook to gist


Go to the notebook you would like to add to your Github gist (in tree mode) and click on the Github logo that is now present in the Jupyter Notebook task bar.

When you click on the logo, a new window will appear that asks for a gist id. If this is the first notebook you’re adding to your gists, you won’t need to fill in an id, so skip this field. Check the ‘make the gist public’ box and add a description. This will be the title of your gist.

Go to your gists on Github. You can easily access these by clicking on your profile picture in the top right-hand corner. A dropdown menu will appear. Click on ‘your gists’. If it went well, you’ll see your Jupyter notebook on this page.
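If you are curious what creating a gist looks like without the extension, here is a sketch of the request against Github's REST API. This is not the extension's own code. It only builds the request and does not send it; the token, file name, and description are placeholders.

```python
import json
import urllib.request


def build_gist_request(token, filename, notebook_json, description, public=True):
    """Prepare (but do not send) a request that would create a gist."""
    payload = {
        "description": description,
        "public": public,
        # The notebook file's text goes in as the content of one gist file.
        "files": {filename: {"content": notebook_json}},
    }
    return urllib.request.Request(
        "https://api.github.com/gists",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )


request = build_gist_request("YOUR_TOKEN", "weather.ipynb", "{}", "Weather in Amsterdam")
```

Passing the request to urllib.request.urlopen would actually create the gist, which is why the personal access token should stay private.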

Embedding your gist on WordPress

Open the gist by clicking on it. Then copy the URL from your browser. This is the URL to your gist. Make a new WordPress post. Make sure your WordPress ‘block’ is switched to HTML. Paste the URL into the HTML block. And you’re done. Publish your post and look at your site. Your first Jupyter notebook on WordPress.
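If pasting the bare URL does not render on your setup, Github also offers a script-style embed for gists. The helper below is a small sketch that turns a gist URL into that snippet; the function name and the URL in the usage note are made up, and whether your WordPress plan allows script tags is something I have not verified.

```python
def gist_embed_tag(gist_url):
    """Return the <script> embed snippet for a gist page URL."""
    # Github serves the embeddable version of a gist at the same URL plus ".js".
    return f'<script src="{gist_url.rstrip("/")}.js"></script>'
```

For example, gist_embed_tag("https://gist.github.com/user/abc123") gives a script tag pointing at https://gist.github.com/user/abc123.js, which you can paste into the HTML block instead of the URL.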