Data tabulation


In statistics, data tabulation is the set of operations used to group data and present them in tables or graphs.

In other words, it is the process by which we group the data and display them in tables or graphs so that they are easier to understand.

Tabulation is an essential step in descriptive analysis and comes before others, such as inference. Once the data have been collected, they must be prepared for later use, and grouping them through tabulation is how we do that.

Origin of data tabulation

At the beginning of the 19th century, statistics was already focused on the collection and classification of data. William Playfair (1759-1823) created the line, bar and pie (sector) charts that we know today, which remain highly useful for analysis.

Tabulation itself arose later, as a way to summarize the data that had been collected and classified. Its automation is due to Herman Hollerith (1860-1929), who created a punched-card tabulating machine.

Over time, the method has improved considerably, especially with the advent of computing. In addition, spreadsheets and specialized software have made it possible to handle large amounts of data.

Data tabulation process

The data tabulation process depends on the type of variable: qualitative, discrete quantitative or continuous quantitative. The example at the end shows a practical application.

Qualitative variable

Qualitative variables express categories, for example, the degree completed. Tabulating data of this type is perhaps the simplest case.

The table would have, on the one hand, the categories. On the other hand, it would include the absolute frequencies (the count of each category) and the relative frequencies (each absolute frequency divided by the total). Two more columns are added with the cumulative absolute and relative frequencies, which are meaningful when the categories follow an order.
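The columns described above can be sketched in a few lines of Python. This is only an illustration: the sample of degrees, the category order and the function name frequency_table are hypothetical and do not come from the article's own data.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample: highest degree completed by ten respondents.
degrees = ["primary", "secondary", "secondary", "university", "primary",
           "secondary", "university", "secondary", "primary", "secondary"]

# Ordinal categories are listed in their natural order so that the
# cumulative columns are meaningful.
order = ["primary", "secondary", "university"]


def frequency_table(values, categories):
    """Return rows of (category, fi, hi, Fi, Hi) for the given categories."""
    counts = Counter(values)
    n = len(values)
    fi = [counts[c] for c in categories]   # absolute frequencies
    hi = [f / n for f in fi]               # relative frequencies
    Fi = list(accumulate(fi))              # cumulative absolute frequencies
    Hi = [F / n for F in Fi]               # cumulative relative frequencies
    return list(zip(categories, fi, hi, Fi, Hi))


table = frequency_table(degrees, order)
for category, f, h, F, H in table:
    print(f"{category:<12}{f:>4}{h:>8.2f}{F:>4}{H:>8.2f}")
```

The cumulative relative frequency is computed as Fi divided by the total rather than by adding the hi values, which avoids small floating-point rounding errors in the last row.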

Discrete quantitative variable

These are variables whose values can be added, so averages, standard deviations and other descriptive statistics of position, dispersion or shape can be calculated. We propose using the same columns as in the previous case.
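As a minimal sketch, assuming a hypothetical sample of ten ages (not the article's data), the same frequency columns can be built for a discrete variable and complemented with a few descriptive statistics from Python's standard library:

```python
import statistics
from collections import Counter
from itertools import accumulate

# Hypothetical sample: ages in years of ten people.
ages = [23, 25, 25, 31, 31, 31, 38, 40, 40, 46]

counts = Counter(ages)
n = len(ages)
values = sorted(counts)                # distinct observed values, ascending
fi = [counts[v] for v in values]       # absolute frequencies
Fi = list(accumulate(fi))              # cumulative absolute frequencies

# Each row holds: value, fi, hi, Fi, Hi.
rows = [(v, f, f / n, F, F / n) for v, f, F in zip(values, fi, Fi)]

# Because the values are numeric, position and dispersion can be computed.
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
stdev_age = statistics.pstdev(ages)    # population standard deviation

for row in rows:
    print(row)
print(mean_age, median_age, round(stdev_age, 2))
```

Here pstdev treats the sample as the whole population; statistics.stdev would be the choice if the data were a sample from a larger population.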

Continuous quantitative variable

These are variables that can take any value within an interval, and therefore infinitely many values. In this case, tabulation is done by grouping the values into intervals. There should be enough intervals to avoid losing too much information, but not too many. Formulas can be used to calculate an appropriate number of intervals.
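One commonly used formula for the number of intervals is Sturges' rule, k = 1 + log2(n), rounded up. The article does not say which formula it has in mind, so the sketch below is only one possible choice. The heights and the function names sturges_bins and group_by_intervals are hypothetical.

```python
import math

# Hypothetical sample: heights in meters of twenty people.
heights = [1.52, 1.58, 1.60, 1.62, 1.63, 1.65, 1.66, 1.68, 1.70, 1.70,
           1.71, 1.72, 1.74, 1.75, 1.75, 1.78, 1.80, 1.82, 1.85, 1.91]


def sturges_bins(n):
    """Number of intervals suggested by Sturges' rule: 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))


def group_by_intervals(values, k):
    """Count values in k equal-width intervals [low, high); the last one is closed.

    Assumes the values are not all identical, so the width is positive.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        # The maximum would land at index k, so clamp it into the last interval.
        index = min(int((v - lo) / width), k - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(k + 1)]
    return edges, counts


k = sturges_bins(len(heights))
edges, counts = group_by_intervals(heights, k)
for i, f in enumerate(counts):
    print(f"{edges[i]:.2f} to {edges[i + 1]:.2f}  fi={f}  hi={f / len(heights):.2f}")
```

Equal-width intervals are the simplest option. In practice the limits are often rounded to convenient values (for example, every 5 cm) so that the table is easier to read.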

Example of data tabulation with spreadsheet

To finish, let's look at an example made with a spreadsheet. We have used the variables number of children, age and height.

In this case, we treat the number of children as a qualitative (ordinal) variable. Although the values could be added together, doing so does not make sense here, since they represent different household sizes. We could also use nominal variables, which do not follow an order, such as sex.

Among the quantitative variables, the discrete one is age in years and the continuous one is height in meters, measured to the centimeter. The following image shows the data and our proposed grouping. We calculate the absolute (fi) and relative (hi) frequencies, as well as the cumulative ones (Fi and Hi).
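As a reference for this notation, the four frequencies can be written as follows. The symbols n (total number of observations) and k (number of classes) are not in the original and are assumed here for illustration.

```latex
h_i = \frac{f_i}{n}, \qquad
F_i = \sum_{j=1}^{i} f_j, \qquad
H_i = \sum_{j=1}^{i} h_j = \frac{F_i}{n}, \qquad
\sum_{i=1}^{k} f_i = n
```

For instance, with n = 20 observations, a class with f_i = 2 has h_i = 2/20 = 0.10, that is, 10% of the total.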

We can say that only two cases had four children, which represents 10% of the total, and that households with fewer than three children (those with one or two) account for 70%. Likewise, 65% of the people were under 40 years old, and four people (20% of the total) measured 1.75 m.

As we can see, data tabulation is important for analyzing statistical information. As a later step, bar, line or pie charts can be used for a clearer, visual representation of the data.
