Uncategorized

Hyperreality: the reality of “Facetuning”

Image by ErikaWittlieb from Pixabay

Facetune is an app that allows its users to retouch their pictures. You can add filters or alter facial features, a practice now commonly referred to as “facetuning”. Edited photos can then be uploaded to any social media site desired. The free version of the app, Facetune2, boldly displays its slogan on its website: “Wow your friends with every selfie”.

Body image?

Facetuning tools are used by people from all kinds of backgrounds, and they have also been embraced by celebrities and influencers on social media. The use of such tools has been widely debated. Some users have come forward, admitted to using the tools, and expressed positive sentiments about them. On the other side of the debate, however, are those concerned about the possible consequences of (young) social media users seeing these altered images. These concerns focus on the negative impact on body image, as the edited images showcase an unattainable appearance.

Spotting wobbly door frames

There is also an online fascination with trying to catch users in the act of using these tools. The Reddit community r/instagramreality seeks to spot inconsistencies in pictures that might give alterations away. The hunt to spot wobbly door frames is also carried out by various creators on YouTube. It has become more of a game as these tools have gotten more sophisticated over time. It is now even possible to edit body parts in moving videos: content creators on TikTok have been found to edit their waists to look smaller in videos of themselves dancing. It remains a race between those developing facetuning tools and those trying to spot the evidence of these tools being used.

Hyperreality: what is even real

Facetune is the new Disneyland. Disneyland is a common example used to explain “hyperreality”: the inability to distinguish reality from a simulation of reality. Disneyland, with its tagline “The Happiest Place On Earth”, attempts to create a new reality using elements from actual reality. Baudrillard describes Disneyland as a hyperreality in Simulacra and Simulation:

But what attracts the crowds the most is without a doubt the social microcosm, the religious, miniaturized pleasure of real America, of its constraints and joys. […] Thus, everywhere in Disneyland the objective profile of America, down to the morphology of individuals and of the crowd, is drawn. All its values are exalted by the miniature and the comic strip. Embalmed and pacified.

Jean Baudrillard in Simulacra and Simulation

The Magic Castle, the tiny houses, the princes and princesses, the bright colors… all of these cater to the hyperreal experience that is Disneyland. As Baudrillard writes in his book, the juxtaposition between hyperreality and reality is felt especially when one stands in Disneyland’s parking lot. Only then do you realize how your perception of reality can be altered.

Living my (Kardashian) fantasy

With tools such as Facetune, we are able to create a “Disney experience” of ourselves. You can shape your facial features or body parts exactly how you would like them to look in a particular context. Whether or not people see the “real” you outside of Instagram no longer matters. The Kardashians, avid for-profit Instagram users, have been “caught” editing their pictures. But it does not matter: most of their followers or fans will never see the Kardashians in the flesh, just as most of us never get to see Disneyland backstage. Facetune caters to a fantasy, not a reality, as Valentina, a contestant on RuPaul’s Drag Race: All Stars, once said on the show:

When it comes to me and living in my world, in this little coconut head that I got, it’s a lots of fantasies, and when I feel the fantasy it is my reality! And nobody can change that.

Valentina on RuPaul’s Drag Race: All Stars’ fourth season

I want what they’re having

What is the difference between looking at reality and looking at a simulation of it? The hyperreality that is Disneyland, or Facetune, will not go away. We will still go out of our way to buy into a fantasy. Augmented and virtual reality are as popular as ever. Much of media aims to make the experience feel as real and as authentic as possible. The movie industry implements increasingly sophisticated CGI to make movies feel more “real”. The gaming industry continues to explore ways in which gamers can fully immerse themselves in their gaming experience. We might or might not be entirely aware of how the reality around us is intentionally constructed, but we also don’t care that much.

Interesting content related to experiencing (hyper)reality


Analysis: Big 5 Personality Test (openpsychometrics)

On openpsychometrics you can take the famous Big Five personality test. This test will assess how a person scores on five different personality traits. These traits are: openness, conscientiousness, extraversion, agreeableness, and neuroticism. To see what these traits entail, check out this post.

People score differently on these traits. For instance, some people score high on extraversion and actively seek out a lot of social interaction. Those who score high on conscientiousness prefer to keep things organized. Scoring low on neuroticism means experiencing less stress and anxiety. If you would like to see how you score on these traits, go to openpsychometrics. You can also download the dataset on their website.

1. using this questionnaire for analysis

I have looked at the questionnaire itself: it consists of 50 statements, with 10 statements per trait. I am a bit unsure about the statements for the trait “openness”. They mainly focus on imagination and abstract thinking. However, I would argue that openness is also about general curiosity and the willingness to try new things.

Furthermore, this test is online and free for anyone to take. If you are looking for a representative sample, this might not be the right sample for you. People taking this test have access to a computer and the internet, are aware of personality testing, and are interested in assessing their personality. That is surely only a subset of the whole human population.

Therefore, I am not fully convinced of the test’s validity (“does it measure what it claims to measure?”) or the sample’s representativeness. Still, I would like to look at the data and explore the trends within this sample.

2. what does the questionnaire look like?

As you can see in the screenshot above, participants rate whether statements apply to them on a scale that ranges from “disagree” to “agree”. In the dataset this translates to a score of 1, 2, 3, 4, or 5, with 1 being “disagree” and 5 being “agree”. As mentioned before, the questionnaire consists of 50 statements, with 10 statements per trait.

3. prepping the data for analysis

As can be seen in the screenshot above, the statements are keyed either positively or negatively. For instance, “I am the life of the party” is a positive statement that indicates a high score on extraversion, while “I don’t talk a lot” is a negative statement that indicates a low score on extraversion. To deal with this, I have to reverse score the negatively keyed statements. See my full Jupyter Notebook.

To reverse score the whole dataframe, I wrote the following code:

#this piece of code uses pandas
import pandas as pd
#EXT is the dataframe holding the extraversion answers (EXT1..EXT10)
#dictionary to reverse score 'negative' statements (a 3 stays a 3)
rev = {1: 5, 2: 4, 4: 2, 5: 1}
#negatively keyed columns that need remapping
ext_n = ["EXT2", "EXT4", "EXT6", "EXT8", "EXT10"]
#iterate over the columns and remap their values using the dictionary
for col in ext_n:
    EXT.replace({col: rev}, inplace=True)

First, I created a dictionary to reverse score the numbers. Second, I created a list of columns that needed to be adjusted. Third, I created a loop that would iterate over the columns that needed to be remapped, using the dictionary.
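A loop works, but pandas can also remap all of the negatively keyed columns in a single call. A minimal sketch, using a hypothetical two-participant dataframe (the column names only mimic the dataset’s EXT naming):

```python
import pandas as pd

# hypothetical sample of raw extraversion answers (1-5 Likert scale)
EXT = pd.DataFrame({
    "EXT1": [5, 2],   # positively keyed, left as-is
    "EXT2": [1, 4],   # negatively keyed, needs reverse scoring
})

rev = {1: 5, 2: 4, 4: 2, 5: 1}
neg_cols = ["EXT2"]

# replace() accepts the remapping dict for a whole block of columns at once
EXT[neg_cols] = EXT[neg_cols].replace(rev)

print(EXT["EXT2"].tolist())  # → [5, 2]
```

This avoids the column-by-column loop and leaves the positively keyed columns untouched.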

Analysis

All Big Five personality traits

These 5 graphs are histograms, one per personality trait. Each histogram shows how many participants obtained a given average score on that trait. For example, for the trait openness we see that most participants score around a 4, which is towards the higher end of the scale. So each bar represents the number of participants with that average score.
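The per-participant averages that these histograms bin can be computed with a row-wise mean over a trait’s columns. A sketch with hypothetical reverse-scored answers for two participants:

```python
import pandas as pd

# hypothetical reverse-scored extraversion answers (subset of the 10 columns)
EXT = pd.DataFrame({
    "EXT1": [5, 2],
    "EXT2": [5, 1],
})

# mean over axis=1 averages each participant's answers into one trait score;
# these per-participant averages are what each histogram above bins
avg_ext = EXT.mean(axis=1)
print(avg_ext.tolist())  # → [5.0, 1.5]
```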

Openness

How all participants on average score on openness
Example of a question to show how participants score on one of the openness questions

Conscientiousness

How all participants on average score on conscientiousness

Example of a question to show how participants score on one of the conscientiousness questions

Extraversion

How all participants on average score on extraversion
Example of a question to show how participants score on one of the extraversion questions

Agreeableness

How all participants on average score on agreeableness
Example of a question to show how participants score on one of the agreeableness questions

Neuroticism

How all participants on average score on neuroticism
Example of a question to show how participants score on one of the neuroticism questions

Analysis: COVID-19 confirmed cases around the world

First of all, shout-out to Johns Hopkins University for posting COVID-19 datasets on their Github. Their datasets can be found here. These are the best datasets I have found so far. They have datasets that include data of confirmed cases, recoveries, and deaths. The data is quite clean and contains data on a province/state-level as well as country-level.

For this particular analysis I used the ‘confirmed cases’ dataset. I wanted to look at the top 10 countries with the most reported cases so far. Furthermore, I wanted to see a time series at country level and on a global scale. Lastly, we look at the progression in the Netherlands.

What is great about this specific dataset is that a new column is added every day with yesterday’s reported cases. It is important to note that the data is cumulative: every day we see the total number of confirmed cases per country up to that date.

Furthermore, the date format in this dataset is mm/dd/yy.
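A sketch of how such a wide-format table can be reduced to a top-10 ranking. The country names and counts below are hypothetical stand-ins, but the layout mirrors the dataset: one mm/dd/yy column per day, cumulative counts, one row per province or country:

```python
import pandas as pd

# stand-in for the wide format: cumulative counts, one column per date
df = pd.DataFrame({
    "Country/Region": ["China", "China", "Italy"],
    "3/19/20": [80900, 300, 35700],
    "3/20/20": [81000, 350, 41000],
})

# sum the province-level rows up to country level,
# then rank countries by the most recent date column
latest = df.groupby("Country/Region")["3/20/20"].sum()
top10 = latest.sort_values(ascending=False).head(10)
print(top10.index[0])  # → China
```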

Top 10 countries with the most reported cases

This graph shows the cumulative number of confirmed cases for the top 10 countries as of the 20th of March, 2020, with China still having the highest number of confirmed cases.

Time series of top 5 countries with most reported cases of COVID-19

This graph displays a time series of the confirmed cases in the top 5 countries. While Italy, Spain, Germany, and Iran are still steadily increasing in numbers, we see that China’s cases have started to stagnate since March. We also see that around the time the stagnation in China takes place, cases start to be reported in Europe and Iran.

Even though South Korea reported cases before Europe and Iran did, it is not part of this graph. The countries in this graph were selected for having the most cases on the 20th of March, 2020.

Time series of top 5 countries with most reported cases of COVID-19 (excluding China and Italy)

Let’s exclude China and Italy for a moment. Here we can see that Spain, Germany and the US have had quite similar trajectories in terms of reported cases over time.

Time series of top 10 countries with most reported cases of COVID-19 (excluding China and Italy)

South Korea clearly stands out in their COVID-19 trajectory. They have managed to quickly respond and flatten the curve to a point where it’s almost stagnant.

Time series globally

This graph displays the global trajectory of the confirmed COVID-19 cases. In February and March there appears to be a small dip around the same time of the month. After that second small dip in March we see a major increase in reported cases around the world.

Time series of reported COVID-19 cases in the Netherlands

This graph looks at the confirmed cases in the Netherlands. March 12 stands out, as no new cases were reported on this day. However, after this stagnation we see a larger increase compared to the trajectory before March 12.
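Because the dataset is cumulative, a flat day like March 12 corresponds to zero new cases. Daily increases can be recovered with a difference; a sketch with hypothetical numbers around such a flat day:

```python
import pandas as pd

# hypothetical cumulative counts around the flat day
cumulative = pd.Series(
    [503, 503, 614],
    index=["3/11/20", "3/12/20", "3/13/20"],
)

# diff() turns a cumulative series into daily new cases;
# the first day has no predecessor, so it comes out as NaN
daily_new = cumulative.diff()
print(daily_new["3/12/20"])  # → 0.0
```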

Want to see my full notebook of code to see how I made these graphs? Go to this post.


programming made me impatient: from psychology to python

with python this could have been automated…

1. psychology

The human psyche has been a long-time fascination of mine. So much so that I felt I needed to watch any lecture I could find on William James before I even enrolled in a psych program. I was fully immersed in the world of the bystander effect, availability heuristics, and personality disorders.

2. doing the math

Soon after actually enrolling, I found myself drowning in statistical methods. Math did not necessarily come easy to me. I definitely had to put in extra hours to achieve any type of passing grade. I fully understood the concepts; the math bits, however… I don’t recall ever feeling that frustrated before. The catch was, I actually really enjoyed all of it once I did manage to grasp it. Statistics became a puzzle I needed to solve. I wanted nothing more than to figure out the significance of any piece of research. I suddenly had a goal to dissect any statistics used by researchers in scientific journals. What kind of flaws were they hiding? I felt like those people who automatically spellcheck any piece of text they read.

3. first encounter: writing syntax

I then seriously considered studying statistics. In my spare time I would download datasets and perform all kinds of analyses on them. My free university edition of SPSS was an absolute godsend at this point. Most importantly, I fully enjoyed writing SPSS syntax: I was able to trace my thought process and quickly replicate tests. Yet it didn’t take long for this sentiment to dissolve. I was shocked to find out how limiting SPSS really was. I mean, yes, it is nice software, but what if I want to go beyond SPSS’s capabilities? That is when I found out about R.

4. whoever created R…

I downloaded RStudio, and again, I felt as confused as when I was first confronted with the very idea of statistics. R made no sense to me. At this point most of my statistics journey took place outside of university. I decided not to go for a statistics master’s; I wanted to understand internet culture. So I was limited to making sense of R on the weekends. My master’s was all about qualitative research, so no statistics in sight. However, to understand internet culture, I needed tools to scrape the web. Suddenly, I realized I needed to learn an actual programming language (sorry, R). In order to pull data using an API, I needed to use python.

5. python

My R weekends were soon replaced with python weekends. This is when my love-hate relationship with programming started. I felt on top of the world whenever my code worked, but unbelievably impatient and frustrated when I couldn’t get it to work. This was also the first time I ever experienced what they call flow. I have pretty good time management skills, but python threw it all out of the window. I worked for hours on writing a script that would sort a string into alphabetical order. I couldn’t believe it; I seemed to forget the very concept of time. My love-hate relationship turned into a full-on love for python and programming.
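For the curious, that exercise boils down to very little code once you know the standard library (a sketch, not the script I wrote back then):

```python
def sort_string(s: str) -> str:
    # sorted() returns the characters as a list; join them back into a string
    return "".join(sorted(s))

print(sort_string("python"))  # → hnopty
```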

6. more data and more pandas please

I still enjoy reading about psychology and groundbreaking experiments, and I frequently try to catch up on developments in internet culture. However, I felt I needed to further develop my technical skills. I no longer wanted to work with small datasets; I wanted big data. That is why I decided to go for a traineeship in data engineering. I traded python basics for a python library: pandas. It was like doing statistics on steroids. Never have I experienced statistics like that.

7. think like a computer

Don’t worry, I have not actually abandoned python basics, I just temporarily put those lessons on hold. But now I’m back at it. When I first tried python, I could not get myself to think like a computer. I wrote two lines of simple code and expected the IDE to just “get it”. I read ‘Python for Dummies’ and found an anecdote that finally made me understand computers. If I tell someone that has never toasted bread before to “just put the bread in the toaster”, they will probably try to force a loaf of bread, packaging and all, into a toaster. You can’t just tell a computer to “do something”. It needs a full rundown.

8. impatient

Now that my programming has gotten a bit better and computers and I vibe well, I have grown impatient. Any time I find myself using software such as Excel or Power BI, or even query languages such as SQL, I get impatient. “With python this could have been automated.” Or: “with python this would have been solved in 3 steps instead of 10.” Programming will make you realize how much you can customize, even the software you use. Imagine if you could tweak everything you use? This thought process led me to Linux. I loved Windows for its user-friendliness; however, it does feel like you’re stuck in a box. Linux has not exactly been smooth sailing for a hardcore Windows user, and I am still trying to figure out some of the compatibility issues I am experiencing. Yet, I have close to full control.


Analysis: Video Game Sales

After scavenging Kaggle for new datasets to play around with, I found an older one I have been interested in for a while now: video game sales. It’s a dataset from about three years ago, scraped from a website that tracks video game sales and ratings.

Data

The data is three years old, which is quite unfortunate as the video game market has greatly expanded over the last few years. Multiplayer online games such as Fortnite now hold a dominant position in the market. Therefore, I wanted to find a way to get a dataset that also covers the last three years. After scraping for hours, I had an up-to-date dataset. However, I quickly noticed that the dataset I ‘created’ was missing a lot of important values. Thus, I decided to stick to the dataset I found on Kaggle.

If you want to try the scraping script that I found on Github, download it here. I would recommend adding a time.sleep call inside the ‘for loop’ that scrapes the data. If you do not, you might get an HTTP 429 (“Too Many Requests”) error, as you’re sending too many requests to their server in a short amount of time.

import time
#pause between requests so the server does not rate-limit you
time.sleep(25)

I found that the shortest delay that avoided the error was 25 seconds. You will have to let the script run for days if you want the full dataset. If you would rather not, try changing the number of ‘pages’ at the beginning of the script. This reduces the amount of data, but you won’t have to wait 4 days for your data to be done. I ended up scraping 1 page. Unfortunately, I did notice that the scraped data had a lot of missing values.

For all of these reasons, I decided to stick to the premade dataset I found on Kaggle.

If you do decide to scrape the whole dataset, my advice would be to slightly change the scraping script. Try moving the portion of the code that saves the scraped data to a dataframe and csv file up into the loop itself. If your internet connection drops or you hit some kind of error, at least you will still have data saved to disk.
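A sketch of that incremental-save pattern. The page-fetching function is a hypothetical stand-in for the real request-and-parse step; the point is that each page is written to disk as soon as it is scraped:

```python
import csv
import time

def fetch_page(page):
    # stand-in for the real request + parse step of the scraping script
    return [(f"game_{page}", 1.0)]

with open("vgsales_partial.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "global_sales"])
    for page in range(1, 3):            # lower this range to scrape less data
        writer.writerows(fetch_page(page))
        f.flush()                       # data survives a crash or dropped connection
        time.sleep(0)                   # use ~25 seconds against the real server
```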

Analysis

Please keep in mind that the dataset is from about three years ago. Therefore, you will not see games such as Minecraft or PUBG on the list.

For the analysis I wanted to know 4 different things:

  • Top 10 titles in gaming
  • Sales per publisher
  • Sales per year
  • Sales per platform

Top 10 titles in gaming

Here we see the top 10 video game titles (as of three years ago); the ranking is based on global sales. As you can see, Wii Sports did quite well. I have a possible explanation for its first place, though: Wii Sports was bundled with the Wii console itself, so its global sales might reflect (some of) the Wii console sales rather than people intentionally buying Wii Sports.

Next in line is Super Mario Bros, a classic game that has been around since 1985, according to this dataset.

Sales per publisher

Up next we have the top 20 game publishers based on global sales. Earlier we saw that Wii Sports is the most sold title, here we see that Wii Sports’ publisher, Nintendo, also has the most sales out of all publishers. If you scroll back up to look at the top 10 titles, you’ll see that it is completely dominated by Nintendo.

Sales per year

This is my favorite graph for this dataset. It reflects the sales trend over the last 30+ years, and we see a clear upward trend. I would carefully speculate that video games have greatly increased in popularity. We see a bit of a downward trend towards the latter part of the graph. However, I would like to point out that this might be an illusion, as adding more recent data would change this trend. I would assume that global video game sales are currently still on the increase.

Sales per platform

This one surprised me the most. Based on the other graphs and the table, I expected the Wii to do much better. But it appears that the PS2 was the platform with the most game sales three years ago.

Conclusion

For this dataset I carried out a simple analysis that revealed some basic trends. Unfortunately, the dataset is not up to date. I would assume that you would find even more interesting trends if you were to include data from the last three years.

However, the most important piece of information this dataset provides is that gaming appears to have grown as a market. My assumption would be that video games have grown more diverse in their genres and titles and therefore cater to a wider audience.

Do you want to use this dataset? Download it here on Kaggle. You’ll need to create an account, which is completely free.