Analysis: Dutch Elections

Using data from Wikipedia, I have created an analysis of the Dutch Elections. Wikipedia often has articles that include tables with interesting data. I simply copy and paste this data into an Excel sheet, clean it, and import it into Tableau.

The full dashboard and analysis are available on my Tableau Public profile. Unfortunately, I am unable to insert any HTML code that includes JavaScript on this WordPress site, so I can’t embed the dashboard in this post.

The non-interactive dashboard can be seen below. The interactive version can be seen on my Tableau Public profile.

Click on the dashboard to be redirected to the interactive version.

Data Analysis: Women in politics and education around the world

I have been using Python for data analysis for about a year now. In the last couple of months, I have started to work with Tableau. Tableau is a visualization tool, which I find to be a very different experience from using Python’s pandas library. The visualizations for today’s analysis can be found below and on Tableau Public.

Datasets and data prep

The UN has interesting (and free) datasets available on their website that look at countries around the world. For today’s data analysis, I first used a dataset that lists the share (%) of women in parliament around the world. Second, I used a dataset that provides the ratios between girls and boys for different levels of education (primary, secondary, and tertiary). Before I created graphs in Tableau, I slightly changed the format of the datasets using Tableau Prep. The country/region variable mixed different geographic levels: both continents (e.g. Africa) and countries (e.g. India) were included in the same variable. I created a variable that only includes entries at the country level. Furthermore, the ‘share of women’ in parliament was in a format that Tableau doesn’t recognize as a percentage. This meant that I had to create a new variable for the share of women as well.
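The prep itself was done in Tableau Prep, but the same two steps can be sketched in pandas. This is only an illustration: the column names, the list of aggregate regions, and the idea that the share is stored as text with a percent sign are all assumptions, and the rows are made up.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumptions, not the UN's.
raw = pd.DataFrame({
    "Region/Country": ["Africa", "India", "Netherlands"],
    "Year": [2019, 2019, 2019],
    "Share of women": ["10.0%", "20.5%", "30.0%"],
})

# Assumed list of aggregate regions; everything else is treated as a country.
aggregate_regions = {"Africa", "Asia", "Europe", "World"}
countries = raw[~raw["Region/Country"].isin(aggregate_regions)].copy()

# Assumed format: text ending in a percent sign. Strip it and convert to a number.
countries["share_pct"] = countries["Share of women"].str.rstrip("%").astype(float)

print(countries[["Region/Country", "Year", "share_pct"]])
```

Keeping the original text column next to the new numeric one makes it easy to check that the conversion did what was intended.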


See the interactive visualization here. The first map displays the share of women in parliament around the world. You can select a specific year using the dropdown menu next to the map. Hover over the map to see the percentage, year, and country name. The higher the percentage, the higher the share of women in that country’s parliament.

The second map displays the ratio (girls:boys) in education around the world. Using the dropdown menus, you can filter for year and education level. A ratio above 1 indicates that there are more girls than boys in that level of education. A ratio below 1 indicates the opposite: more boys than girls.

The third graph displays the datapoints for each country, based on the share of women in parliament and the ratio of girls to boys in education. Countries below the horizontal reference line (at 1) have more men than women enrolled in education. Countries above this reference line have more women enrolled. The countries to the left of the vertical reference line have a share of less than 50% women in parliament. Countries on the right side of the vertical reference line have parliaments that are made up of at least 50% women.

The third graph can be filtered using the ‘select education level’ filter (next to the second map). When you select different education levels, you will see that the share of women in parliament does not necessarily correlate with the ratio of girls to boys in education.

See interactive version here.

Analysis: Streaming services

The fragmentation of streaming services has been said to lead to an increase in torrenting traffic. Subscribing to multiple platforms means more bills to pay. Furthermore, it takes people more steps to get to their favorite shows. Currently there are several big players in the streaming services market, including Netflix, Prime Video, Disney+, and Hulu. Each service might offer exclusive content, but they also overlap in some of their content.


On Kaggle I found a dataset which lists movies hosted by the four aforementioned streaming platforms. The dataset was created by scraping content from Reelgood.com, combined with an IMDb dataset. The IMDb dataset serves to display the IMDb ratings for the movies in the Reelgood.com dataset. The Kaggle dataset was quite clean: it mostly just contained NaNs for unknown values, so not much cleaning was needed. I only needed to shape the data into a usable form for the insights that I wanted to provide. To see this code, take a look at my Jupyter Notebook.


  1. Which genres do streaming services offer?
  2. Do streaming services offer old or new movies?
  3. How are movies rated on the platforms?
  4. Bonus: Do the streaming platforms have movies in common?

1. Which genres do streaming services offer?

What are the top 10 genres for each streaming platform? This shows the number of movies each streaming platform hosts for each genre.

These bar charts show the top 10 genres for each streaming platform. This top 10 is calculated using the number of movies that are present on each platform per genre. We see that for Netflix, Prime Video, and Hulu the number one genre is Drama, followed by Comedy at number two.

Disney+ has the Family genre as its number one. Family is much lower down the list for the other three platforms.
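A minimal sketch of how such a top 10 could be computed with pandas. It assumes that each platform has a 1/0 column and that genres are stored as a comma-separated string in a "Genres" column; those column names and the sample rows are assumptions for illustration, and the helper top_genres is hypothetical.

```python
import pandas as pd

# Made-up rows; column names are assumed, not taken from the notebook.
movies = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Genres": ["Drama,Comedy", "Drama", "Family,Comedy", None],
    "Netflix": [1, 1, 0, 1],
    "Disney+": [0, 0, 1, 0],
})

def top_genres(df, platform, n=10):
    """Count movies per genre for one platform and keep the n largest."""
    on_platform = df[df[platform] == 1]
    # One row per (movie, genre) pair; movies without a genre are skipped.
    genres = on_platform["Genres"].dropna().str.split(",").explode().str.strip()
    return genres.value_counts().head(n)

netflix_top = top_genres(movies, "Netflix")
print(netflix_top)
```

Because a movie can have several genres, it is counted once for every genre it belongs to, so the counts add up to more than the number of movies.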

2. Do streaming services offer old or new movies?

This chart displays the number of movies according to the year of release for Netflix.
This chart displays the number of movies according to the year of release for Prime Video.
This chart displays the number of movies according to the year of release for Disney+.
This chart displays the number of movies according to the year of release for Hulu.
This chart displays the number of movies according to the year of release for all platforms.

The charts above show the number of movies according to the year of release. From these graphs we can infer that most movies hosted on these platforms are from the last 20 years.

Prime and Disney+ seem to have the most varied selection in terms of movie release dates.
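The “last 20 years” observation can also be checked with a number instead of by eye. The sketch below assumes a "Year" column with the release year and uses 2020 as the reference year; both are assumptions, and the rows are made up.

```python
import pandas as pd

# Made-up rows; the "Year" column name is an assumption.
movies = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Year": [1995, 2005, 2015, 2019],
})

# Number of movies per release year, in chronological order (the data behind the charts).
per_year = movies["Year"].value_counts().sort_index()

# Share of movies released in the 20 years before the assumed reference year.
reference_year = 2020
recent_share = (movies["Year"] >= reference_year - 20).mean()

print(per_year)
print(f"Share released since {reference_year - 20}: {recent_share:.0%}")
```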

3. How are movies rated on the platforms?

This histogram shows the number of movies hosted by Netflix according to the IMDb rating.
This histogram shows the number of movies hosted by Prime Video according to the IMDb rating.
This histogram shows the number of movies hosted by Disney+ according to the IMDb rating.
This histogram shows the number of movies hosted by Hulu according to the IMDb rating.

These graphs display the IMDb rating frequency for each streaming platform. IMDb ratings range from 1 to 10 stars. The higher the rating, the better.

4. Bonus: Do the streaming platforms have movies in common?

  | Title                 | Netflix | Hulu | Prime Video | Disney+
0 | Amy                   | Yes     | No   | Yes         | Yes
1 | The Square            | Yes     | Yes  | Yes         | No
2 | The Interview         | Yes     | Yes  | Yes         | No
3 | Blame!                | Yes     | Yes  | Yes         | No
4 | Evolution             | Yes     | Yes  | Yes         | No
5 | No Game No Life: Zero | Yes     | Yes  | Yes         | No
6 | Zapped                | Yes     | Yes  | No          | Yes
7 | Mother                | Yes     | Yes  | Yes         | No
8 | The Kid               | No      | Yes  | Yes         | Yes
9 | Inside Out            | No      | Yes  | Yes         | Yes

Interestingly, there isn’t a single movie that all four platforms host. However, there are 10 different movies that three of the four platforms stream (see table above). The ‘yes / no’ values indicate whether the movie in the Title column is hosted on the platform.
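One way to find such overlaps is to count, per movie, on how many platforms it is available. The sketch below assumes a 1/0 column per platform; the column layout is an assumption and the rows are made up, so this is not necessarily how the table above was produced.

```python
import pandas as pd

platforms = ["Netflix", "Hulu", "Prime Video", "Disney+"]

# Made-up rows; 1 means the movie is hosted on that platform.
movies = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Netflix": [1, 1, 0],
    "Hulu": [1, 0, 0],
    "Prime Video": [1, 1, 1],
    "Disney+": [0, 0, 0],
})

# Number of platforms that host each movie.
movies["n_platforms"] = movies[platforms].sum(axis=1)

on_three = movies[movies["n_platforms"] == 3]
on_all = movies[movies["n_platforms"] == len(platforms)]

# Turn the 1/0 flags into Yes/No for display.
table = on_three[["Title"]].copy()
for platform in platforms:
    table[platform] = on_three[platform].map({1: "Yes", 0: "No"})

print(table)
print("Movies on all platforms:", len(on_all))
```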


Hyperreality: the reality of “Facetuning”

Image by ErikaWittlieb from Pixabay

Facetune is an app which allows its users to retouch their pictures. You can add filters to your pictures or you can alter facial features. This is now commonly referred to as “facetuning”. Edited photos can be uploaded to any social media site desired. The free version of the Facetune app, Facetune2, boldly displays their slogan on their website: “Wow your friends with every selfie”.

Body image?

Facetuning tools are used by people of different backgrounds. They have also been embraced by celebrities and influencers on social media. The use of such tools has been widely debated. Users have come forward and admitted to using the tools and have expressed their positive sentiment towards using them. However, on the other side of the debate are those concerned about the possible consequences of (young) social media users seeing these altered images. These concerns focus on the negative impact on body image, as the edited images showcase an unattainable appearance.

Spotting wobbly door frames

There is also an online fascination with trying to catch users in the act of using these tools. The Reddit community r/instagramreality seeks to spot inconsistencies in pictures that might give alterations away. The hunt to spot wobbly door frames is also carried out by different creators on YouTube. It has become more of a game as these tools have gotten more sophisticated over time. It is now also possible to edit body parts in moving videos. Content creators on TikTok have been found to edit their waist to look smaller in videos of them dancing. It remains a race between those developing Facetune tools and those trying to spot the evidence of these tools being used.

Hyperreality: what is even real

Facetune is the new Disneyland. Disneyland is a common example used to explain “hyperreality”, which is the failure to recognize reality in certain contexts. Disneyland, with its tagline “The Happiest Place On Earth”, attempts to create a new reality using elements from actual reality. Baudrillard explains Disneyland as a hyperreality in Simulacra and Simulation:

But what attracts the crowds the most is without a doubt the social microcosm, the religious, miniaturized pleasure of real America, of its constraints and joys. […] Thus, everywhere in Disneyland the objective profile of America, down to the morphology of individuals and of the crowd, is drawn. All its values are exalted by the miniature and the comic strip. Embalmed and pacified.

Jean Baudrillard in Simulacra and Simulation

The Magic Castle, the tiny houses, the princes and princesses, the bright colors… all of these cater to the hyperreal experience that is Disneyland. As Baudrillard writes in his book, the juxtaposition between hyperreality and reality is felt especially when one stands in Disneyland’s parking lot. Only then do you realize how your perception of reality can be altered.

Living my (Kardashian) fantasy

With tools such as Facetune we are able to create the “Disney experience” of ourselves. You can shape your facial features or body parts exactly how you would like them to look in a particular context. Whether or not people will see the “real” you outside of Instagram no longer matters. The Kardashians, avid for-profit Instagram users, have been “caught” editing their pictures. But it does not matter: most of their followers or fans will never see the Kardashians in the flesh, just as most of us do not get to see Disneyland backstage. Facetune caters to a fantasy, not a reality, as Valentina, a contestant on RuPaul’s Drag Race: All Stars, once said on the show:

When it comes to me and living in my world, in this little coconut head that I got, it’s a lots of fantasies, and when I feel the fantasy it is my reality! And nobody can change that.

Valentina on RuPaul’s Drag Race: All Stars’ fourth season

I want what they’re having

What is the difference between looking at reality or a simulation of reality? The hyperreality that is Disneyland or Facetune will not go away. We will still go out of our way to be able to buy into a fantasy. Augmented and virtual reality are as popular as ever. The whole point of media is to make the experience as real and as authentic as possible. The movie industry seeks to implement sophisticated CGI tools to make movies feel more “real”. The gaming industry continues to explore ways in which gamers can fully immerse themselves in their gaming experience. We might or might not be entirely aware of how the reality around us is intentionally constructed, but we also don’t care that much.

Interesting content related to experiencing (hyper)reality


Analysis: Big 5 Personality Test (openpsychometrics)

On openpsychometrics you can take the famous Big Five personality test. This test will assess how a person scores on five different personality traits. These traits are: openness, conscientiousness, extraversion, agreeableness, and neuroticism. To see what these traits entail, check out this post.

People score differently on these traits. For instance, some people score high on extraversion and actively seek out a lot of social interaction. Those who score high on conscientiousness prefer to keep things organized. Scoring low on neuroticism means experiencing less stress and anxiety. If you would like to see how you score on these traits, go to openpsychometrics. You can also download the dataset on their website.

1. using this questionnaire for analysis

I have looked at the questionnaire itself. It comprises 50 statements, with 10 statements per trait. I am a bit unsure of the questions for the trait “openness”. These questions mainly focus on imagination and abstract thinking. However, I would argue that openness is also about general curiosity and the willingness to try new things.

Furthermore, this test is online and free for anyone to take. If you are looking for a representative sample, this might not be the perfect sample for you. People taking this test have access to a computer and internet, are aware of personality testing, and are interested in assessing their personality. I am sure that this is a mere subset of the whole human population.

Therefore, I am not fully convinced of the validity (“what does the test measure?”) of the test or of the representativeness of the sample. Still, I would like to look at the data and figure out the trends within this sample.

2. what does the questionnaire look like?

As you can see in the screenshot above, participants have to rate whether statements are applicable to them. To do so, they pick a point on a scale that ranges from “disagree” to “agree”. In the dataset this translates to a score of 1, 2, 3, 4, or 5, with 1 being “disagree” and 5 being “agree”. As mentioned, the questionnaire consists of 50 statements, with 10 statements per trait.

3. prepping the data for analysis

As can be seen in the screenshot above, the statements are either positive or negative. For instance, “I am the life of the party” is a positive statement that would indicate a high score on extraversion. But “I don’t talk a lot” is a negative statement, which would indicate a low score on extraversion. To deal with these inconsistencies I have to reverse score some of the statements. See my full Jupyter Notebook.

To reverse score the whole dataframe, I wrote the following code:

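# this piece of code uses pandas
import pandas as pd

# EXT is assumed to be an existing dataframe that holds the extraversion columns
# dictionary to reverse score 'negative' statements (3 is the midpoint and stays 3)
rev = {1: 5, 2: 4, 4: 2, 5: 1}
# list of dataframe columns that need remapping
ext_n = ["EXT2", "EXT4", "EXT6", "EXT8", "EXT10"]
# iterate over these columns and remap their values using the dictionary
for i in ext_n:
    EXT.replace({i: rev}, inplace=True)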

First, I created a dictionary to reverse score the numbers. Second, I created a list of columns that needed to be adjusted. Third, I created a loop that would iterate over the columns that needed to be remapped, using the dictionary.
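On a 1 to 5 scale, the same reversal can also be written as a small formula: the reversed score is (lowest + highest) minus the original score, so 6 minus the score. The sketch below uses a hypothetical helper, reverse_score, and checks that it agrees with the dictionary approach for every possible answer.

```python
# Same mapping as the dictionary used above; 3 is the midpoint and maps to itself.
rev = {1: 5, 2: 4, 4: 2, 5: 1}

def reverse_score(score, low=1, high=5):
    """Mirror a score around the midpoint of the scale."""
    return (low + high) - score

for score in range(1, 6):
    # Fall back to the score itself for values the dictionary leaves untouched.
    assert reverse_score(score) == rev.get(score, score)
    print(score, "->", reverse_score(score))
```

The formula has the advantage that it keeps working if the scale changes, for example to 1 to 7, without having to write out a new dictionary.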


All Big Five personality traits

These 5 graphs are histograms of all five personality traits. Each histogram shows how many participants have a given average score on a trait. For example, for the trait openness we see that most participants score a ‘4’ on average, which is towards the higher end of the spectrum. Each bar therefore represents the number of participants with that average score.
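A minimal sketch of how an average score per participant could be computed for one trait. It assumes the answers are already reverse scored and that the extraversion columns start with "EXT"; only three made-up columns and rows are used here to keep the example short.

```python
import pandas as pd

# Made-up, already reverse-scored answers; one row per participant.
answers = pd.DataFrame({
    "EXT1": [4, 2, 5],
    "EXT2": [3, 1, 4],
    "EXT3": [5, 3, 3],
})

# Average over all extraversion statements, per participant (row-wise).
ext_cols = [col for col in answers.columns if col.startswith("EXT")]
answers["EXT_mean"] = answers[ext_cols].mean(axis=1)

# Number of participants per (rounded) average score: the bars of a histogram.
counts = answers["EXT_mean"].round().value_counts().sort_index()
print(counts)
```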


How all participants on average score on openness
Example of a question to show how participants score on one of the openness questions


How all participants on average score on conscientiousness

Example of a question to show how participants score on one of the conscientiousness questions


How all participants on average score on extraversion
Example of a question to show how participants score on one of the extraversion questions


How all participants on average score on agreeableness
Example of a question to show how participants score on one of the agreeableness questions


How all participants on average score on neuroticism
Example of a question to show how participants score on one of the neuroticism questions

Analysis: COVID-19 confirmed cases around the world

First of all, shout-out to Johns Hopkins University for posting COVID-19 datasets on their GitHub. Their datasets can be found here. These are the best datasets I have found so far. They have datasets that include data on confirmed cases, recoveries, and deaths. The data is quite clean and contains data on a province/state level as well as on a country level.

For this particular analysis I have used the ‘confirmed cases’ dataset. I wanted to look at the top 10 countries with the most reported cases as of now. Furthermore, I wanted to see a time series on a country level and on a global scale. Lastly, we look at the progression in the Netherlands.

What is great about this specific dataset is that a new column is added every day with the reported cases of the previous day. It is important to note that the data is cumulative. This means that every day we see the total number of confirmed cases per country up to that day.

Furthermore, the date format in this dataset is mm/dd/yy.
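Because the columns are dates and the values are cumulative, two small steps make the data easier to work with: parsing the column labels as dates and taking day-to-day differences to get new cases. The sketch below assumes one row per country and a "Country/Region" column; the numbers are made up.

```python
import pandas as pd

# Made-up cumulative counts; one column per day in mm/dd/yy format.
confirmed = pd.DataFrame({
    "Country/Region": ["Country A", "Country B"],
    "3/10/20": [10, 0],
    "3/11/20": [15, 2],
    "3/12/20": [15, 5],
})

wide = confirmed.set_index("Country/Region")
# Parse the column labels as dates using the mm/dd/yy format.
wide.columns = pd.to_datetime(wide.columns, format="%m/%d/%y")

# Transpose so that every row is a day and every column a country.
cumulative = wide.T
# Day-to-day difference turns cumulative totals into newly reported cases.
daily_new = cumulative.diff()

print(daily_new)
```

The first day has no previous day to compare with, so its difference is NaN. A day on which the cumulative total does not change shows up as 0 new cases.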

Top 10 countries with the most reported cases

This graph shows the cumulative number of confirmed cases of the top 10 countries as of the 20th of March, 2020. China still has the highest number of confirmed cases.
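A sketch of how such a top N could be selected. Since the dataset contains rows on a province/state level as well, the rows are first summed per country; after that, the most recent date column is used for the ranking. The column names are assumptions and the numbers are made up.

```python
import pandas as pd

# Made-up cumulative counts; Country A is split into two provinces.
confirmed = pd.DataFrame({
    "Province/State": ["Province 1", "Province 2", None, None],
    "Country/Region": ["Country A", "Country A", "Country B", "Country C"],
    "3/19/20": [5, 7, 20, 1],
    "3/20/20": [6, 9, 25, 1],
})

id_cols = ("Province/State", "Country/Region")
date_cols = [col for col in confirmed.columns if col not in id_cols]

# Sum provinces/states so that there is one row per country.
per_country = confirmed.groupby("Country/Region")[date_cols].sum()

# Rank countries on the last (most recent) date column.
latest = date_cols[-1]
top = per_country[latest].nlargest(2)
print(top)
```

Selecting on the latest date is also what explains why a country with an early outbreak can be missing from a chart: only the most recent totals decide who is included.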

Time series of top 5 countries with most reported cases of COVID-19

This graph displays a time series of the confirmed cases in the top 5 countries. While Italy, Spain, Germany, and Iran are still steadily increasing in numbers, we see that China’s cases have started to stagnate since March. We also see that around the time the stagnation in China takes place, cases start to be reported in Europe and Iran.

Even though South Korea reported cases before Europe and Iran, it is not part of this graph. The countries in this graph were selected on the condition of having the most cases on the 20th of March, 2020.

Time series of top 5 countries with most reported cases of COVID-19 (excluding China and Italy)

Let’s exclude China and Italy for a moment. Here we can see that Spain, Germany and the US have had quite similar trajectories in terms of reported cases over time.

Time series of top 10 countries with most reported cases of COVID-19 (excluding China and Italy)

South Korea clearly stands out in their COVID-19 trajectory. They have managed to quickly respond and flatten the curve to a point where it’s almost stagnant.

Time series globally

This graph displays the global trajectory of the confirmed COVID-19 cases. In February and March there appears to be a small dip around the same time of the month. After that second small dip in March we see a major increase in reported cases around the world.

Time series of reported COVID-19 cases in the Netherlands

This graph looks at the confirmed cases in the Netherlands. March 12 stands out here as there were no newly reported cases on this day. However, after this stagnation we see a larger increase compared to the trajectory before March 12.

Want to see my full notebook of code to see how I made these graphs? Go to this post.


programming made me impatient: from psychology to python

with python this could have been automated…

1. psychology

The human psyche has been a long-time fascination of mine. So much so that I felt I needed to watch any lecture I could find on William James before I even enrolled in any psych study. I was fully immersed in the world of the bystander effect, availability heuristics, and personality disorders.

2. doing the math

Quickly after actually enrolling, I found myself drowning in statistical methods. Math did not necessarily come easy to me. I definitely had to put in extra hours to achieve any type of passing grade. I fully understood the concepts, the math bits however… I don’t recall ever feeling that frustrated before. Though, the catch was, I actually really enjoyed all of it, once I did manage to grasp it. Statistics became a puzzle I needed to solve. I wanted nothing more than to figure out the significance of any piece of research. I suddenly had a goal to dissect any statistics used by researchers in scientific journals. What kind of flaws were they hiding? I felt like those people who always find themselves automatically spellchecking any piece of text they read.

3. first encounter: writing syntax

I then seriously considered studying statistics. In my spare time I would download datasets and perform any kind of analysis on them. My free uni edition of SPSS was an absolute godsend at this point. Most importantly, I fully enjoyed writing SPSS syntax. I was able to trace my thought process and I could quickly replicate tests. Yet, it didn’t take long for this sentiment to dissolve. I was shocked to find out how limiting SPSS really was. I mean, yes, it is nice software, but what if I want to go beyond SPSS’s capabilities? That is when I found out about R.

4. whoever created R…

I downloaded RStudio, and again, I felt as confused as I first felt when I was confronted with just the idea of statistics. R made no sense to me. At this point most of my statistics journey took place outside of university. I decided not to go for a master’s in statistics. I wanted to understand internet culture. So, I was limited to making sense of R on the weekends. My master’s was all about qualitative research, so no statistics in sight. However, to understand internet culture, I needed to use tools to scrape the web. Suddenly, I realized I needed to learn an actual programming language (Sorry, R). In order to pull data using an API, I needed to use python.

5. python

My R weekends were soon replaced with python weekends. This is when my love-hate relationship with programming started. I felt on top of the world whenever my code worked, but unbelievably impatient and frustrated when I couldn’t get it to work. This was also the first time I ever experienced what they call flow. I have pretty good time management skills, but python threw it all out of the window. I worked for hours on writing a script that would sort the characters of a string into alphabetical order. I couldn’t believe it: I seemed to forget the very concept of time. My love-hate relationship turned into a full-on love for python and programming.

6. more data and more pandas please

I still enjoy reading about psychology and groundbreaking experiments. And I frequently try to catch up on developments in internet culture. However, I felt I needed to further develop my technical skills. I no longer wanted to work with a small dataset, I wanted big data. That is why I decided to go for a traineeship in data engineering. I dropped python basics for a python library: pandas. It was like doing statistics on steroids. Never have I experienced statistics like that.

7. think like a computer

Don’t worry, I have not actually abandoned python basics, I just temporarily put those lessons on hold. But now I’m back at it. When I first tried python, I could not get myself to think like a computer. I wrote two lines of simple code and expected the IDE to just “get it”. I read ‘Python for Dummies’ and found an anecdote that finally made me understand computers. If I tell someone who has never toasted bread before to “just put the bread in the toaster”, they will probably try to force a loaf of bread, packaging and all, into a toaster. You can’t just tell a computer to “do something”. It needs a full rundown.

8. impatient

Now that my programming has gotten a bit better and computers and I vibe well, I have grown impatient. Any time I find myself using software such as excel, powerbi or even querying languages such as SQL, I get impatient. “With python this could have been automated”. Or “with python this would have been solved in 3 steps instead of 10”. Programming will make you realize how much you can customize. This even includes the software you use. Imagine if you could tweak everything you use? This thought process led me to using Linux. I loved windows for its user friendliness, however, it does feel like you’re stuck in a box. Switching to Linux as a hardcore windows user has not exactly been smooth sailing. I am still trying to figure out some of the compatibility issues that I am experiencing. Yet, I have close to full control.