A study by Bain & Company estimates that managers who rely on data analytics make decisions five times faster than those without. This is possible thanks to the ETL processes, which allow to obtain the maximum utility of the data.

Analytics and business intelligence are areas that are gaining strength in the business area. In fact, a Gartner report predicts that the market value of these areas will reach $ 20 billion in 2019, multiplying the demand for technologies related to data processing.

What are ETL processes?

ETL processes ( extract , transform and load ) are those by which information is extracted from one or more data sources. It is transformed to suit the needs of business and then be loaded on a shared site for consultation by all stakeholders.

One of the most important characteristics of the ETL system is that it allows data from heterogeneous sources to be integrated into a homogeneous environment. However, the workflow setup has a certain level of complexity, and a poorly designed ETL process can cause costly operational problems.

Why are ETL processes important?

ETL processes are the main source of information for the business layer of the company. Sometimes the only one. These processes allow managers to have a clear, detailed and in-depth picture of business management . And thanks to this information, in a later phase of analysis, patterns and trends can be detected on which to make high-impact strategic decisions.

However, the data alone do not add any value, since they are collected in a raw state, which does not allow reading or analysis by people. For data to be valuable, it must be processed and transformed from its raw state to a readable and practical state. This is where the ETL process comes into play.

For example, a customer’s purchase history is made up of, among other things, the following information:

  • Customer contact details and other relevant information.
  • List of items purchased.
  • Details about these items: descriptions, prices, quantities, etc.
  • Final amount of each order.
  • Transactional status of orders (in progress, completed, canceled, etc.).

In order for the purchase history to provide all this information in a clear and legible way, the ETL process has to intervene to obtain and process the raw data until it is adapted to the appropriate formats. This is because the data is not stored in the organization’s systems as it is shown in the history.

If a manager wants to consult the history of a single client, the system should limit his search to that only data (among the huge amount handled by the company), perform calculations, select the fields to be displayed in the history template and give them the right format for their presentation.

Phases of an ETL process

1. Extraction (extract)

The first phase of an ETL process is data extraction. In this, the “raw material” is obtained with which the following two phases will be worked, until a practical utility is achieved.

The data comes from different origins and sources : system records, CRM, points of sale and devices used, among others. It is very likely that these sources are heterogeneous, therefore the data must be normalized. Data containing the same type of information, even if it comes from different sources, must also be collated in common structures.

The objective of the extraction phase is to synthesize all the information in one or more common structures in which the normalized data are stored, leaving them completely homogenized, organized and ready for the transformation phase. It is possible that in this first stage some transformations specific to the data source will be executed. The objective is that these are optimized for the transformation stage.

An example of extraction would be a file of sales data aggregated by months and departments, a row for each department and a column for each month, that is, each new month a column would be added. This format is suitable for handling manual files but very inefficient for a database.

On the other hand, in this technical phase most of the I / O (input / output) operations are carried out, which are usually quite expensive.

2. Transformation (transform)

It is in the transformation phase that we add value to the data obtained in the extraction phase. It is where the foundation of the ETL process lies.

Being isolated from the first, in which technical issues are managed, the business layer can be involved in the development of this transformation, with the aim of transforming the data and adding value.

3. Load

The third and final phase is charging. It is when the system receives the duly processed data and stores it in the destination that has been defined. At this stage, the transformation phase is also isolated from possible changes in the destination.

For example, it is possible that initially the result of the ETL is stored in a file for use by the business layer, but later it is decided to store those results in a database.

In certain environments, issues such as defining, validating, or implementing a data model and ingest processes can take months. For this reason, the business layer can obtain great value by having the possibility of generating immediate results in files and, later, storing them in the database by changing only the loading phase.

Flexible and agile development methodology in ETL processes

The way companies run ETL systems is evolving. Today, everything revolves around an agile and efficient perspective. That is why, according to a Gartner study , 90% of large companies will have hired a Chief Data Officer in 2019. This with the aim, among others, of having staff specialized in ETL optimizations.

All three phases can be developed in an integrated way, especially when the ETL processes are not complex. However, developing them flexibly and independently  provides the following advantages:

  • Each phase can be assigned to different teams to shorten work deadlines.
  • The development of each stage can be done with different technologies.
  • Changes that affect one phase (for example, the change in data source) can be isolated from other phases.
  • Tests can be done at each stage separately, since only the stage in which the test is being entered will be affected.
  • Work done for one ETL process can be reused for other ETL processes, especially the extraction and loading phases.
  • The business layer can be involved in the transformation phase, in order to define the logic of the ETL system, without addressing technical issues that are isolated in the extraction phase and would make it difficult to understand the business layer.
  • The results can be generated in the short and long term without affecting the logic of the ETL process, making changes only in the loading phase.

More Relevant Topics:-

It is important to apply a flexible development approach, since its greatest importance is to be able to involve the business layer in the definition of the ETL logic in order to generate short-term results , without the need to get involved in technical issues.