Doug Laney introduced the first three Vs of big data back in 2001: volume, velocity, and variety. As we have amassed more data over time, the volume of data has grown. Think about sensors on machines: we can now investigate how machines in a factory or a warehouse are doing based on continuous sensor readings. And through our smart devices and social media platforms, even more data is being generated. The emergence of the Internet of Things (IoT) has brought us a goldmine of datasets.
What makes big data even more special is that this data arrives continuously, in real time. We can monitor machines as they drill for oil, likes on social media posts are registered instantly, and rainfall is measured and recorded around the clock. These three examples fall under the second V, velocity. The speed at which data arrives has increased tremendously over the past few decades, facilitated by growing bandwidth and internet speeds.
Moreover, we now deal with a myriad of different data formats. Think about the different types of data an online store can generate: the click paths people follow through the website, the information customers fill in, which items they end up buying, and which payment method they used, to name just a few. All of these actions generate different data formats that need to be stored, processed, and analyzed.
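To make the variety concrete, here is a minimal Python sketch (with hypothetical field names of my own choosing, not from any real store) of two records such a store might emit. A clickstream event and an order have very different shapes, yet both must flow through the same storage and processing pipeline:

```python
import json

# Hypothetical clickstream event: one page view in a browsing session.
clickstream_event = {
    "type": "page_view",
    "session_id": "a1b2c3",               # made-up identifier
    "path": "/products/shoes/42",
    "timestamp": "2024-05-01T12:00:00Z",
}

# Hypothetical order record: what the customer bought and how they paid.
order = {
    "type": "order",
    "customer_id": 1001,
    "items": [{"sku": "SHOE-42", "qty": 1, "price": 59.99}],
    "payment_method": "credit_card",
}

# Despite their different structures, both records can be serialized
# to a common interchange format (here JSON) for storage and analysis.
for record in (clickstream_event, order):
    print(json.dumps(record))
```

In practice each record type often lands in a different system (an event stream, a transactional database), which is exactly why variety complicates big data architectures.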
Other scholars and big data engineers have added more Vs to the mix, such as variability, veracity, and visualization. But in this post I would like to discuss a different V: value. On the surface, big data seems great. We have a lot of information, which pleases those who adhere to the ‘law of large numbers’. In statistics, many principles point toward the idea that bigger is better. Take the central limit theorem: as the sample size grows, the distribution of sample means approaches a normal distribution. And increasing the sample size is all about getting an answer that is closer to ‘reality’. We want a sample that represents the population.
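As a quick illustration (a small simulation of my own, not part of the original argument), we can repeatedly average draws from a skewed population and watch the sample means tighten around the true mean as the sample size grows:

```python
import random
import statistics

random.seed(42)

def sample_mean(n):
    # Mean of n draws from a skewed population: an exponential
    # distribution with true mean 1.0.
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

for n in (5, 50, 500):
    means = [sample_mean(n) for _ in range(2_000)]
    # As n grows, the sample means concentrate around the population
    # mean (law of large numbers) and their distribution approaches a
    # normal curve (central limit theorem); the spread shrinks roughly
    # in proportion to 1/sqrt(n).
    print(f"n={n}: mean of sample means={statistics.fmean(means):.3f}, "
          f"spread={statistics.stdev(means):.3f}")
```

Even though the underlying population is far from normal, the averages behave ever more predictably as n increases, which is exactly why statisticians like large samples.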
But let’s say we’re a company. We have loads of data. Statisticians would be jealous of our datasets. So much data. But what now? We let the data sit in a database or a distributed file system for a couple of weeks before we analyze it. We analyze it, and oops – it’s already too late. The interesting trends we found through our analysis are now irrelevant. That is why we should seek value in our data. It means acting fast, and it means performing the right analyses. It also means knowing what we want to achieve with our data. Do we want to increase sales? Do we want to understand our population better? Do we want to facilitate better decision-making?
We have to know what we’re doing; that is why the value principle is so important. It builds on the other Vs: the volume, variety, and velocity of the data. Value is often overlooked, but it is imperative to any big data solution.