Big Data refers to data sets whose volume exceeds the capabilities of conventional data processing tools. The term also covers the collection of all this data and our ability to use it to our advantage in a wide range of areas, from business to politics.
The Big Data revolution is directly tied to the great advances in computing power and information storage, which continue to accelerate. Social, political and economic reality has always offered an enormous amount of information, and people have always done their best to interpret it. Only now, however, does humanity have the technological infrastructure and knowledge to process large data sets in their entirety rather than just representative samples. This makes it possible to identify correlations and segment the universe of data in ways that were previously impossible.
Every day, more and more data is produced and made available for analysis, because technology now allows virtually any trait of human activity to be recorded, from heartbeats to consumption habits and thinking patterns. When huge amounts of data are studied, previously hidden correlations between phenomena or variables begin to emerge, and these relationships allow us to learn and make smarter decisions.
Using algorithms, analysts build models and run millions of simulations, adjusting the variables until they find a pattern, or a data point, that helps solve the problem at hand.
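The idea of adjusting variables across many simulations can be sketched as a tiny parameter sweep. The observations, the linear model and the candidate ranges below are invented for illustration; real workloads explore far larger spaces on distributed infrastructure.

```python
from itertools import product

# Hypothetical observations: (advertising spend, units sold).
observations = [(1, 5), (2, 7), (3, 9), (4, 11)]

def squared_error(slope, intercept):
    """Total squared error of the model units = slope * spend + intercept."""
    return sum((slope * x + intercept - y) ** 2 for x, y in observations)

# Try every combination of candidate values and keep the best-fitting one.
candidates = product(range(0, 6), range(0, 6))
best_slope, best_intercept = min(candidates, key=lambda p: squared_error(*p))

print(best_slope, best_intercept)
```

Each combination plays the role of one "simulation"; the pattern found is the pair of values that best explains the observations.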
The concept of Big Data is constantly evolving, as it remains the driving force behind many aspects of digital transformation, including artificial intelligence, data science and the Internet of Things.
By one often-cited estimate, as much data is now produced every two days as was produced from the beginning of recorded history to the early 2000s, and this capacity continues to increase rapidly.
Today, almost all the actions we take leave a digital trail. We generate data every time we connect to the internet, when we activate GPS on our smartphones, when we communicate with our friends through social networks or messaging applications, and when we buy anything. You could say that we leave a fingerprint with everything we do that involves a digital action, which is almost everything. In addition to this, the amount of data generated by machines is also growing rapidly. Data is generated and shared when our “smart” home devices communicate with each other or with their parent servers. Industrial machinery in plants and factories around the world is increasingly equipped with sensors that collect and transmit data.
The term “Big Data” refers to the collection of all this data and our ability to use it in a wide range of areas, including business.
The main application areas of Big Data technologies are the following:
Business intelligence is the combination of technology, tools and processes that transforms stored data into information, information into knowledge, and knowledge into a business plan or strategy. It must become an essential part of business strategy, since it allows an organization to optimize its use of resources, monitor the fulfillment of its objectives and make good decisions in order to obtain better results.
Business intelligence can be applied to commercial companies as well as to public or private institutions. Data-driven decision-making provides enormous benefits, such as full, real-time knowledge of all the organization’s processes, a better understanding of clients or the target population, cost control, optimal development of management indicators, and ongoing evaluation of the effectiveness and impact of plans and lines of work.
A second application area is the use of information about social groups and subgroups, populations, audiences and so on, extracted from activity on social networks and channels. The large amounts of data (Big Data) available on digital communication platforms represent a gold mine for companies, brands, media and advertising agencies, as well as for political organizations and even governments. Through proper analysis, you can extract value from this information in different ways:
Identify profiles of groups and individuals
Record and relate behaviors
Segment populations (discover subgroups)
Forecast behavioral changes
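The profiling and segmentation steps listed above can be sketched in a few lines. The users, topics and the "most frequent topic" rule are assumptions chosen for the example; production systems use far richer signals and clustering methods.

```python
from collections import Counter, defaultdict

# Hypothetical (user, topic) pairs extracted from social media posts.
posts = [
    ("ana", "sports"), ("ana", "sports"), ("ana", "politics"),
    ("ben", "music"), ("ben", "music"),
    ("cho", "sports"),
]

# Record behavior: count how often each user posts about each topic.
topic_counts = defaultdict(Counter)
for user, topic in posts:
    topic_counts[user][topic] += 1

# Segment the population: group users by their most frequent topic.
segments = defaultdict(list)
for user, counts in topic_counts.items():
    dominant_topic = counts.most_common(1)[0][0]
    segments[dominant_topic].append(user)

print(dict(segments))
```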
Advances in computing and data storage capacity have allowed better use of social network information. Companies and organizations no longer just “monitor” how their brand behaves on the network or how many mentions it has obtained. Now they focus more on “knowing” the audience better and understanding it organically. Understanding that what was previously called “the public” is actually made up of different groups and individuals with particular interests, tastes, opinions and feelings has been a great contribution to both commercial and political communication. This has been possible thanks to the ability to process truly huge data sets and establish patterns with a high degree of confidence. In other words, it has been possible thanks to Big Data.
In political marketing, the application of Big Data to social networks takes on particular importance. While its use resembles what companies do when studying and segmenting consumer populations (microtargeting, opinion analysis, establishing behavioral patterns, voting prediction), Big Data processing also has an important impact on public policy.
Through the analysis and combination of different data sets (social networks, service consumption data, economic and socioeconomic indexes, etc.), systems for the identification and geolocation of needs may be established, so that governments can be more efficient and promote innovation in public services.
Types of data
Data is classified according to its structure, that is, the way it is presented. There are three broad categories:
Structured Data
This category refers to the information usually found in most databases. It typically takes the form of text files displayed as spreadsheets, with rows and titled columns. Such data can be easily sorted and processed by all data mining tools. Think of it as a perfectly organized filing cabinet where everything is identified, labeled and easily accessible. An example is a standard customer database, which includes each customer’s name, email address, telephone number and so on. Because every field in the database is named, this type of data is easy to enter, analyze and store.
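As a rough illustration of why named fields make structured data easy to work with, the sketch below loads a small customer table with Python's csv module. The customer names, addresses and phone numbers are invented for the example.

```python
import csv
import io

# Hypothetical customer table; every value sits in a named field.
raw = """name,email,phone
Ana Diaz,ana@example.com,555-0101
Ben Okoro,ben@example.com,555-0102
"""

customers = list(csv.DictReader(io.StringIO(raw)))

# Named fields make sorting and lookups straightforward.
by_name = sorted(customers, key=lambda row: row["name"], reverse=True)
print(by_name[0]["email"])
```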
Semi Structured Data
This data usually comes in a format that can be defined but is not easy for a user to read, and it usually requires complex rules that determine how to interpret each piece of information. One example is the tag markup of languages such as HTML, XML and XHTML used to write a website or blog.
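A minimal sketch of reading semi-structured data follows, using Python's standard xml.etree.ElementTree module. The blog fragment and its tag names are made up; the point is that the tags describe the data, yet the two records do not share the same fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical blog fragment: the second post has no <tag> elements at all.
document = """
<blog>
  <post id="1"><title>Hello</title><tag>intro</tag><tag>news</tag></post>
  <post id="2"><title>Big Data</title></post>
</blog>
"""

root = ET.fromstring(document)

# The "reading rules" are the tag names and attributes we choose to follow.
posts = [
    {
        "id": post.get("id"),
        "title": post.findtext("title"),
        "tags": [tag.text for tag in post.findall("tag")],
    }
    for post in root.findall("post")
]
print(posts)
```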
Unstructured Data
It is commonly estimated that about 80% of the information relevant to a business originates in unstructured form, mainly as text. Unstructured data has no identifiable internal structure. It is a massive, disorganized conglomerate of objects that have no value until they are identified and stored in an organized manner. Once they are organized, the elements that make up their content can be searched and categorized for information. Examples: audio, video, photographs, printed documents, emails, tweets, etc.
Data can also be classified by source and by method of collection:
1.- Web and Social Media: Includes web content and information obtained from blogs and social networks such as Facebook, Twitter, LinkedIn, etc.
2.- Machine to Machine (M2M): M2M refers to the technologies that allow devices to connect to one another. M2M uses devices such as sensors or meters that capture a particular event (speed, temperature, pressure, meteorological variables, chemical variables such as salinity, etc.) and transmit it through wired, wireless or hybrid networks to other applications that translate these events into meaningful information.
3.- Transactional Big Data: Includes billing records, telecommunications call detail records (CDR), etc. This transactional data is available in both semi-structured and unstructured formats.
4.- Biometrics: Biometric information includes fingerprints, retinal scans, facial recognition, genetics, etc. In the area of security and intelligence, biometric data has been important information for investigative agencies.
5.- Human Generation: People generate large amounts of data, such as the information a call center records during a phone call, voice notes, emails, electronic documents, medical studies, etc.
Ways to collect data
1- Created Data: This is data that would not exist unless we asked for it, typically by putting questions to people. To obtain it, you need to conduct surveys and establish a mechanism for capturing and analyzing the responses. Examples of created data include everything obtained actively through online forms, market studies, consumer groups, employee surveys, etc. This type of data generally implies that a person voluntarily participates in creating it.
Created data is usually structured or semi-structured, and can be either internal or external to the organization.
2- Provoked data: Provoked data is in some ways also created data, but it is obtained passively. People are given the opportunity to express an opinion about their experience with a product or service without being directly asked to do so. A good example is a rating or “review” system like the one used on Amazon, where you can rate a product with a certain number of stars.
This data is usually structured or semi-structured, and can also be either internal or external.
3- Transactional data: This is data generated each time a customer makes a purchase. This way of collecting data is very popular among consumer and retail companies. It reveals what was bought, when, where and by whom.
This kind of data makes the most sense for companies that handle a large volume of transactions for a large number of customers. Combined with other information, it makes it possible to improve offers and develop specific marketing strategies.
This is internal, fully structured data.
4- Compiled data: This is data previously collected by companies that later sell it to third parties. These companies build large databases of information about people or businesses and then sell that data for others to exploit.
Usually, compiled data is structured and external.
5- Experimental data: This data is a hybrid of created data and transactional data. It involves designing experiments in which consumers receive different marketing treatments (created data) in order to see how they respond to these stimuli (transactions). The best-known example is A/B testing, which is applied to the design of certain elements online or offline, such as the layout of a landing page or the window display of a physical store. It is called experimental data because we are testing and trying to optimize the public’s response to a series of stimuli, as in a laboratory.
6- Captured data: This is data passively collected about the behavior of people and machines, generated through the use of connected devices and applications without our being aware that we are creating it. Examples include smartphone GPS data used to develop traffic applications, Google search data, and data from sensors that measure our behavior, such as smart wristbands.
Captured data is generally unstructured and generated internally or externally to the company.
7- Data generated by users: This is data that both people and companies generate consciously. It includes comments on forums, social networks and blogs, changes to web pages and, in general, everything related to people’s activity on the internet.
This is unstructured data, generally external to the company.
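The A/B testing mentioned under experimental data can be evaluated with a standard two-proportion z-test, sketched below. The visitor and conversion counts are invented, and the function name is a hypothetical helper; this is one common way to compare two variants, not the only one.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: landing page A vs. redesigned landing page B.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two designs is unlikely to be due to chance alone, although the threshold to use is a judgment call for each experiment.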
Types of analysis
There are at least four categories into which we can classify the different types of Big Data analysis:
Descriptive analysis is about knowing all the indicators that reveal the current state of the organization, the market or the area you want to study. Data such as sales, consumption, production, income and expenses, once processed and related, can help detect anomalies and possible threats.
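As a small illustration of spotting anomalies in an indicator, the sketch below summarizes a series of sales figures and flags unusual days. The figures and the two-standard-deviation threshold are assumptions made for the example.

```python
from statistics import mean, stdev

# Hypothetical daily sales figures; the last value is deliberately unusual.
daily_sales = [120, 125, 118, 122, 130, 121, 60]

average = mean(daily_sales)
spread = stdev(daily_sales)

# Flag days that deviate from the average by more than two standard deviations.
anomalies = [
    (day, value)
    for day, value in enumerate(daily_sales, start=1)
    if abs(value - average) > 2 * spread
]
print(anomalies)
```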
Diagnostic analysis digs deeper into the data to isolate the origin of the current situation, the root of a problem.
Predictive analysis applies techniques that estimate the probability of a future event, forecast a quantifiable amount, or estimate the point in time at which something could happen.
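Forecasting a quantifiable amount can be sketched with an ordinary least-squares trend line. The revenue figures are invented, and a straight-line trend is only one of many possible models.

```python
# Hypothetical monthly revenue (in thousands) for months 1 to 5.
months = [1, 2, 3, 4, 5]
revenue = [10.0, 12.0, 14.5, 15.5, 18.0]

# Ordinary least-squares fit of revenue = slope * month + intercept.
n = len(months)
mean_x = sum(months) / n
mean_y = sum(revenue) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, revenue)) / sum(
    (x - mean_x) ** 2 for x in months
)
intercept = mean_y - slope * mean_x

# Forecast the quantity for a future point in time (month 6).
forecast = slope * 6 + intercept
print(round(forecast, 2))
```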
Prescriptive analysis combines an understanding of what has happened and why it has happened with a variety of “what could happen” analyses to help the user determine the best course of action to take.
The 5 Vs of Big Data
The first feature is volume: the huge amount of data to process. That’s why we talk about “Big” data. The amount of data matters. It can be data of unknown value, such as Twitter feeds, clickstreams from a web page or mobile application, or readings from sensor-enabled equipment. What used to count as a lot is no longer so much, and in any case it depends on the organization being analyzed. For some organizations, this could be tens of terabytes of data. For others, it may be hundreds of petabytes.
The second feature is velocity. Some smart products and services available on the internet operate in real time or near real time, so they require evaluation and action in real time as well. In many cases the amount of information grows at a dizzying rate, which makes processing time a fundamental factor if the analysis is to provide advantages that make a difference.
The third feature is variety. There are many types of data available. Traditional data was structured and fit neatly into a relational database. With the rise of Big Data, information often arrives as unstructured data. Unstructured and semi-structured data, such as text, audio and video, require additional preprocessing to yield meaning and value.
The fourth feature is veracity. With a high volume of information that grows at such speed and comes in such variety, doubts arise about its veracity during analysis. It is therefore necessary to clean the data to ensure it is put to the best use.
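A minimal sketch of a cleaning step is shown below. The sensor records, the field names and the plausible temperature range are all hypothetical; real pipelines apply rules specific to each data source.

```python
# Hypothetical raw records with a duplicate, a missing value and an
# implausible reading.
raw_records = [
    {"sensor": "a1", "temp_c": "21.5"},
    {"sensor": "a1", "temp_c": "21.5"},   # exact duplicate
    {"sensor": "b2", "temp_c": ""},       # missing value
    {"sensor": "c3", "temp_c": "999"},    # outside the plausible range
    {"sensor": "d4", "temp_c": "19.0"},
]

def clean(records, low=-50.0, high=60.0):
    """Drop duplicates, missing values and readings outside [low, high]."""
    seen = set()
    cleaned = []
    for record in records:
        key = (record["sensor"], record["temp_c"])
        if key in seen or not record["temp_c"]:
            continue
        seen.add(key)
        value = float(record["temp_c"])
        if low <= value <= high:
            cleaned.append({"sensor": record["sensor"], "temp_c": value})
    return cleaned

print(clean(raw_records))
```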
The fifth feature is value. Recent technological advances have sharply reduced the cost of data storage and computation, making it easier and less expensive than ever to accumulate more data. With larger volumes of data now more economical and accessible, it is easier to make more accurate business decisions. But finding value in Big Data is not just a matter of organizing information. It is a complete discovery process that requires insightful analysts, project leaders who ask the right questions, and specialists able to design and implement models, recognize patterns, build scenarios and make predictions.
The Big Data Revolution
The increasing flow of information from sensors, photographs, text, voice and video means that we can now use data in ways that were not possible a few years ago. This is revolutionizing the business world in almost every industry. Companies can now predict with far greater accuracy which customer segments will want to buy, and when. Big Data also helps companies run their operations much more efficiently.
Data is changing our world and the way we live at an unprecedented rate. The amount of data available to us will not stop increasing, and analysis technologies will become more sophisticated every day. The ability to take advantage of Big Data will become increasingly important for companies in the coming years. Those who see data as a strategic asset are the ones who will survive and prosper. Those who ignore this revolution run the risk of being left behind.