Statistics studies the behavior of data. With it we can draw conclusions from data according to the variables under study. When dealing with population studies, there are two approaches on which to base the analysis: descriptive and inferential statistics. What is the difference between them, and how do they relate to big data analysis?
Descriptive and inferential statistics: describe or analyze
Statistics is a tool used in many fields of study, and it is essential for drawing conclusions on a wide range of topics. The object of study may be:
- the behavior of groups of people (as in the case of sociology studies)
- the behavior of data of a more scientific nature that does not derive from human attitudes
Once the object of study has been identified, we collect data and then decide how to analyze it: with descriptive or inferential statistics? The first describes the data; the second makes inferences, trying to go beyond description.
Features of descriptive statistics and inferential statistics
Neither statistical method is more valid than the other; the choice depends on what you want to study and the type of application being investigated. Let's look at each concept in greater detail.
Consider a census that lists, at a specific point in time, the personal data of the people living at each address on each street of each town. Compiling and summarizing that list is descriptive statistics.
If a portion of that census data is later taken and used to draw conclusions about the whole population, that is inferential statistics.
Descriptive statistics is the traditional form of statistics. Its approach is to analyze the chosen variables and then describe the data. This is why it is said to be based on precision. It aims to organize and classify the data obtained from a population group, for example.
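The census example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the town of 10,000 residents, the `census_ages` list and the sample size of 200 are all hypothetical, and the ages are randomly generated rather than real data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical census: the age of every resident in a town of 10,000 people.
census_ages = [random.randint(0, 90) for _ in range(10_000)]

# Descriptive statistics: summarize the data we actually hold.
census_mean = statistics.mean(census_ages)

# Inferential statistics: look at a portion only and estimate the whole.
sample_ages = random.sample(census_ages, 200)
sample_mean = statistics.mean(sample_ages)

print(f"Census mean age (exact for this census): {census_mean:.2f}")
print(f"Sample mean age (estimate of the census mean): {sample_mean:.2f}")
print(f"Estimation error: {abs(sample_mean - census_mean):.2f}")
```

The first figure describes the census itself; the second is an estimate of it, and the gap between the two is the kind of error that inferential statistics has to account for.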
It relies on a few technical concepts:
- Dispersion: within a given variable, the values lie at some distance from one another. Dispersion measures how spread out they are.
- Mean: the average, a measure of the central tendency of a variable. It is the sum of the values divided by the number of values.
- Kurtosis: a property of the shape of the data curve. It reflects how much of the data lies close to the mean and how much lies far from it, in the tails.
- Graphical representation: the results of an analysis are usually presented as graphs. There is a wide variety: bar charts, pie charts, line charts, polygons...
- Skewness: the values of a variable may be distributed asymmetrically around the mean; skewness measures that asymmetry.
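As a rough illustration of dispersion, the mean, kurtosis and skewness, the sketch below computes each of them with Python's standard library. The `values` list is made up for the example, and the skewness and kurtosis use the population (moment-based) formulas, which is one common convention among several.

```python
import statistics

# Hypothetical values of one variable (for example, household sizes).
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

n = len(values)
mean = statistics.fmean(values)            # sum of the values / number of values
value_range = max(values) - min(values)    # simplest measure of dispersion
std_dev = statistics.pstdev(values)        # population standard deviation

# Standardized third and fourth moments (population formulas).
skewness = sum((x - mean) ** 3 for x in values) / (n * std_dev ** 3)
kurtosis = sum((x - mean) ** 4 for x in values) / (n * std_dev ** 4)
excess_kurtosis = kurtosis - 3             # 0 for a normal distribution

print(f"mean={mean:.2f} range={value_range} std_dev={std_dev:.2f}")
print(f"skewness={skewness:.2f} excess_kurtosis={excess_kurtosis:.2f}")
```

A positive skewness here reflects the single large value (9) pulling the right tail away from the mean.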
Inferential statistics looks at a sample of data and, through inference, draws conclusions that it applies to the whole population. Because this approach rests on probabilistic calculation, it carries a certain margin of error.
The analyses carried out with this type of statistics aim to predict how certain information will behave. This is where probability models, predictive models, and machine learning and artificial intelligence techniques come in.
Inferential statistics can be categorized into two large groups:
- Hypothesis testing
It checks whether the conclusions drawn from the sampled portion of the data are supported by the evidence.
- Confidence intervals
They are ranges of values, computed from the sample, that express the margin of error that may exist. Each is usually a pair of numbers between which a specific value, such as the population mean, is estimated to lie with a stated level of confidence.
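Both ideas can be sketched with Python's standard library. The example below is a simplified illustration, not a definitive procedure: the sample of 40 heights is randomly generated, the hypothesized mean of 165 is arbitrary, and a normal (z) approximation is used for simplicity where a t-distribution would often be preferred for a sample of this size.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical sample: heights in cm of 40 people drawn from a larger population.
sample = [random.gauss(170, 8) for _ in range(40)]

n = len(sample)
sample_mean = statistics.fmean(sample)
std_error = statistics.stdev(sample) / math.sqrt(n)

# Confidence interval: a pair of numbers expected to contain the
# population mean with 95% confidence (normal approximation).
z = NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z * std_error
ci_high = sample_mean + z * std_error

# Hypothesis test: is the sample consistent with a population mean of 165?
hypothesized_mean = 165  # arbitrary value chosen for the example
z_stat = (sample_mean - hypothesized_mean) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"95% confidence interval: ({ci_low:.1f}, {ci_high:.1f})")
print(f"z statistic: {z_stat:.2f}, two-sided p-value: {p_value:.4f}")
```

The two results are linked: under this approximation, the hypothesized mean is rejected at the 5% level exactly when it falls outside the 95% confidence interval.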
Thus, descriptive and inferential statistics are different tools within this science of analysis. The first collects data so that it can be displayed in summary form. The data described may cover the entire population or only a subset of it. Either way, the conclusions are fully valid for the group described, because they rest on all of its data.
Inferential statistics, by contrast, takes a sample and establishes how probable a conclusion is. Its results are probabilistic in nature, so a certain error must be assumed.
Descriptive and inferential statistics are not opposite sides of the same coin but different ways of approaching data. Traditionally, statistics has been identified with the collection of data. However, the field keeps evolving, and today it draws on computing tools and approaches that, in fact, lay the foundations for the development of artificial intelligence and machine learning.