New technical terms slip into everyday language all the time. We use them often and casually, yet rarely stop to consider what they mean. SQL, Big Data, MapReduce, NoSQL and Machine Learning are some of them.
In this article we explain each one from a historical perspective, so that you can take part in the conversations where they come up.
Relational databases and the SQL language
Before the 1960s, computers stored information in files, each holding data about one group of a company's stakeholders: customers, suppliers, and so on. To answer a question such as "which users do we have in Madrid?", you had to open those files one by one, or write a dedicated program that scanned their contents and picked out the answer.
From then on, the first programs appeared that let a company store and query its data in a structured way. They were called database management systems.
At first, information was organized as a network, but in the 1970s tables came into use. Each row represented one customer of the company and each column one attribute (identity document, name, address, etc.). This was the relational model, and the SQL language was used to retrieve information from it.
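To make the idea of rows, columns and an SQL query concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the table, columns and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id_document TEXT PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("001", "Ana", "Madrid"),
        ("002", "Luis", "Sevilla"),
        ("003", "Marta", "Madrid"),
    ],
)

# A single declarative query replaces opening every file by hand.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Madrid",)
).fetchall()
madrid_customers = [name for (name,) in rows]
print(madrid_customers)
```

The query states what is wanted (customers whose city is Madrid) and leaves the database system to work out how to find it.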
The relational model is still in use today because it offers properties that some applications cannot do without. In a bank's database, for example, it is essential that the information contains no inconsistencies (data with different values in different places, transfers recorded in the origin account but not in the destination account, etc.) and that each operation is carried out completely, never left at an intermediate step (atomicity).
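Atomicity can be illustrated with a small sketch, again with Python's sqlite3 module. The accounts table, the balance constraint and the transfer helper are assumptions made for this example, not how any particular bank does it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("origin", 100), ("destination", 0)]
)
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between accounts: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# The debit would make the origin balance negative, so the credit that
# was already applied to the destination account is rolled back too.
failed = transfer(conn, "origin", "destination", 500)
after_failed = balances(conn)

succeeded = transfer(conn, "origin", "destination", 30)
after_success = balances(conn)
print(failed, after_failed, succeeded, after_success)
```

In the failed transfer the destination account is credited first, yet that credit never becomes visible: the transaction is undone as a whole, which is exactly the "no intermediate step" guarantee.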
The origins of Big Data: NoSQL databases
In recent years, many people have moved their daily activities to the digital world. This generates huge amounts of data, which companies and organizations store and process to extract useful information that can give them a competitive advantage. This is what is known as Big Data.
Handling this data has created three new needs that the relational model does not fully meet:
- Volume: more efficient and flexible ways of storing information are needed.
- Variability: the storage must allow table structures to be modified on the fly, while the system is running.
- Speed: queries that may involve terabytes of data have to be answered in a matter of seconds.
These needs have driven the emergence of NoSQL databases.
MapReduce: what does this programming model consist of?
Another factor that has made workable solutions possible is MapReduce, a model for parallel and distributed data processing.
Broadly speaking, it divides a problem into parts, computes a partial result for each part, and combines those partial results into a global solution to the original problem.
MapReduce cannot always be applied, but it fits many Big Data problems.
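The divide, compute and combine idea can be sketched with the classic word-count example. This is a single-process illustration in plain Python: the function names are hypothetical, and in a real cluster each chunk would be mapped on a different machine.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all the values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the partial results for each key into one total."""
    return {key: sum(values) for key, values in groups.items()}


# The input is split into parts that could be processed independently.
chunks = ["big data is big", "data is everywhere"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Because each call to map_phase depends only on its own chunk, the map step can run in parallel; only the grouping and the final sums need to bring the partial results together.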
The turn of the dnRDBS
In the late 2000s, distributed non-relational database systems (dnRDBS) emerged. Some of them still hold a significant market share today:
- MongoDB: a document database. Each element in the database is a document with a free structure of its own, which can differ from that of the documents stored before it.
- Cassandra: a key-value database. Information is stored much as in a dictionary.
- Apache HBase: a columnar database. Information is stored in tables, and the number of columns can vary from record to record.
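The three data models in the list above can be pictured with plain Python structures. This sketch does not use the MongoDB, Cassandra or HBase client libraries; every key, field and value below is a made-up example meant only to show the shape of the data.

```python
# Document model: each document carries its own structure,
# so two documents in the same collection need not share fields.
documents = [
    {"_id": 1, "name": "Ana", "city": "Madrid"},
    {"_id": 2, "name": "Luis", "orders": [{"sku": "A-1", "qty": 2}]},
]

# Key-value model: a lookup by key, much like a dictionary.
key_value = {"user:1": "Ana", "user:2": "Luis"}

# Column-oriented model: rows live in a table, but each row
# may hold a different number of columns.
wide_rows = {
    "row-1": {"info:name": "Ana", "info:city": "Madrid"},
    "row-2": {"info:name": "Luis"},
}

fields_per_document = [sorted(doc) for doc in documents]
columns_per_row = {key: len(cols) for key, cols in wide_rows.items()}
print(fields_per_document, key_value["user:2"], columns_per_row)
```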
Big Data and the importance of scalability
Another important issue in Big Data is scalability. On Black Friday, for example, an e-commerce site can multiply its business volume, so its servers receive far more requests than usual. To cope, the infrastructure behind it must be scalable, that is, able to run more machines during the peak than on days with normal traffic.
The peak may also be impossible to predict, in which case the number of machines providing the service has to grow or shrink dynamically with the demand at any given moment. Cloud infrastructures such as Google Cloud Platform, Microsoft Azure, AWS and IBM Cloud, among others, offer this capability.
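The dynamic adjustment described here can be sketched as a simple sizing rule. This is not the API of any cloud provider: the function, its parameters and the traffic figures are hypothetical, and real autoscalers usually react to metrics such as CPU load as well.

```python
import math


def machines_needed(requests_per_second, capacity_per_machine,
                    min_machines=1, max_machines=20):
    """Return how many machines should run for the current demand.

    All parameters are assumptions for illustration: the capacity of one
    machine and the lower and upper limits would come from the operator.
    """
    if capacity_per_machine <= 0:
        raise ValueError("capacity_per_machine must be positive")
    wanted = math.ceil(requests_per_second / capacity_per_machine)
    # Never go below the minimum or above the maximum fleet size.
    return max(min_machines, min(max_machines, wanted))


normal_day = machines_needed(250, 100)
black_friday = machines_needed(1800, 100)
print(normal_day, black_friday)
```

Calling the function again whenever the measured traffic changes makes the fleet grow during the peak and shrink once demand falls back.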
SQL is still required
Despite the spread of Cloud Computing platforms that make NoSQL databases available for Big Data, relational data models (and the SQL language) are still used in applications and systems, for several reasons:
- In some circumstances relational databases are the better answer.
- Many NoSQL databases are operated through dialects of the SQL language.
- In the first stage of a Big Data project, the input data has to be explored and preprocessed into a format that the chosen Machine Learning algorithm can process easily. SQL is often used in this phase.
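As an illustration of the preprocessing mentioned in the last point, the sketch below uses SQL (through Python's sqlite3 module) to drop incomplete records and build one row of features per customer. The purchases table and its contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical raw input: some records are missing a customer or an amount.
conn.execute("CREATE TABLE purchases (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        ("ana", 20.0),
        ("ana", 30.0),
        ("luis", None),
        ("luis", 10.0),
        (None, 5.0),
    ],
)

# Discard incomplete rows and aggregate the rest into per-customer features.
features = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_purchases, SUM(amount) AS total_spent
    FROM purchases
    WHERE customer IS NOT NULL AND amount IS NOT NULL
    GROUP BY customer
    ORDER BY customer
    """
).fetchall()
print(features)
```

The resulting rows have a fixed set of numeric columns per customer, which is the kind of tabular input many Machine Learning algorithms expect.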
Further proof that the language remains in demand is the number of job openings that ask for it. A search for jobs with this requirement easily turns up around 1,500 offers on Infojobs and, according to the TicJob portal, it is the third most requested technology. These are also well-paid positions, with salaries that easily reach around 30,000 euros per year.