After scavenging Kaggle for new datasets to play around with, I found an older one I have been interested in for a while now: video game sales. It’s a dataset from about three years ago that is scraped from a website that looks at video games sales and ratings.
The data is three years old, which is quite unfortunate as the video game market has greatly expanded over the last years. Multiplayer online games such as Fortnite now have a dominant position on the market. Therefore, I wanted to find a way to get a dataset that contains data from the last three years as well. After scraping for hours, I have an up-to-date dataset. However, I quickly noticed that the dataset I ‘created’ was missing a lot of important values. Thus, I have decided to stick to the dataset I found on Kaggle.
If you want to try using the scraping script that I found on Github, download the script here. I would recommend using time sleep in the ‘for loop’ that scrapes the data. If you do not do this, you might get an “error” (HTTPS 429) as you’re sending too many requests in a short amount of time to their server.
I found that the shortest time possible to not get an error was 25 seconds. You will have to let it run for days if you want to have a full dataset. But if you would rather not let it run for days, try changing the amount of ‘pages’ at the beginning of the script. This will reduce the amount of data, but you won’t have to wait for 4 days for your data to be done. I ended up scraping 1 page. Unfortunately, I did notice that the scraped data had a lot of missing values.
For all of these reasons, I decided to stick to the premade dataset I found on Kaggle.
If you do decide to scrape the whole dataset, my advice would be to slightly change the scraping script. Try moving up the portion of the code that saves the scraped to dataframe and csv file. If your internet connection drops or you get some kind of error, at least you will still have data saved to disk.
Please keep in mind that the dataset is from about three years ago. Therefore, you will not see games such as Minecraft or PUBG on the list.
For the analysis I wanted to know 4 different things:
- Top 10 titles in gaming
- Sales per publisher
- Sales per year
- Sales per platform
Top 10 titles in gaming
Here we see the top 10 titles in video games (three years ago), the ranking is based on the amount of global sales. As you can see Wii Sports did quite well. I have a speculation for the first place here though. I remember that Wii Sports came with the Wii console itself. So the global sales for Wii Sports might reflect (some of) the Wii console sales as opposed to people intentionally buying Wii Sports.
Next in line is Super Mario Bros, a classic game that has been around since 1985, according to this dataset.
Sales per publisher
Up next we have the top 20 game publishers based on global sales. Earlier we saw that Wii Sports is the most sold title, here we see that Wii Sports’ publisher, Nintendo, also has the most sales out of all publishers. If you scroll back up to look at the top 10 titles, you’ll see that it is completely dominated by Nintendo.
Sales per year
This is my favorite graph for this dataset. It reflects the trend in sales over the last 30+ years. We see a clear upward trend for sales. I would carefully speculate that video games have greatly increased in popularity. We see a bit of a downward trend towards the latter part of the graph. However, I would like to point out that this might be an illusion as adding more contemporary data will change this trend. I would assume that the global video game sales trend is currently still on the increase.
Sales per platform
This one surprised me the most. Based on the other graphs and table, I expected for Wii to do much better. But it appears that PS2 was the most sold console three years ago.
For this dataset I carried out a simple analysis that some basic trends. Unfortunately, the dataset is not up to date. I would assume that you would find even more interesting trends if you were to include data from the last three years.
However, the most important piece of information this dataset provided is that gaming appears to have grown as a market. My assumption would be the fact that video games have grown more diverse in their genres and titles and therefore caters to a wider audience.
Do you want to use this dataset? Download it here on Kaggle. You’ll need to create an account, this is completely free of any cost.