A sufficient statistic for a parameter Θ is one that captures all the information about Θ that a sample of a random variable X contains.
We know that a statistic is a real-valued function of the sample. That is, it takes the values observed in the sample and returns a real number. From there, as we saw in the article that defines the concept of a statistic, we must ensure that the statistic has certain properties. Why demand such properties of it? To ensure that the statistic is useful for our purposes.
Sufficiency is one of those properties. Put more simply, we say that a statistic is sufficient if it uses all the information contained in the sample.
How do we know if a statistic is sufficient?
Logically, the question that arises is: how can I know whether a statistic T fulfills the sufficiency property? Or: how can I find, if it exists, a statistic that fulfills the sufficiency property? The answer to these two questions is found in two theorems:
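- Fisher-Neyman factorization criterion: This criterion answers the first question. It states that a statistic T is sufficient for Θ if the joint density (or probability function) of the sample can be written as the product of two factors: one that depends on Θ and on the sample only through T, and another that does not depend on Θ.
- Darmois theorem: This theorem answers the second question. That is, it allows us to find a sufficient statistic, typically by reading it off the form of the distribution when X belongs to the exponential family.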
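Written out, the factorization criterion takes roughly the following form. This is a sketch of the standard statement; f denotes the joint density (or probability function) of the sample, and the names g and h are labels chosen here for the two factors, not notation from this article.

```latex
% T is sufficient for \Theta if and only if the joint density factors as
\[
  f(x_1, \dots, x_n; \Theta)
  \;=\;
  g\bigl(T(x_1, \dots, x_n), \Theta\bigr)\,
  h(x_1, \dots, x_n),
\]
% where g depends on the sample only through T(x_1, \dots, x_n)
% and h does not depend on \Theta.
```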
Example of a sufficient statistic
Suppose we want to estimate the average annual income of families residing in Chile. To do this, we proceed as follows:
- Collect information (sample): As we cannot ask each and every one of the families residing in Chile how much they earn annually, we will take a representative sample of, for example, 1,000 families.
- Identify the random variable under study: The random variable under study is family income. Thus: X → Family income
- Choose the right statistic: The quantity we want to estimate is the expectation of X, that is, the population mean income. The natural statistic for estimating it is the sample mean of X.
- How can I know if the sample mean is a sufficient statistic? Since we already have the mathematical expression of the statistic, we can use the Fisher-Neyman factorization criterion or, alternatively, the Darmois theorem. Both results were developed for this purpose.
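As an illustration of what such a calculation can look like, assume (a simplifying assumption made only for this sketch, not something the income data guarantee) that X is normal with unknown mean Θ and known variance σ². Expanding the square in the joint density separates it into a factor that involves Θ and the sample only through the sample mean, and a factor free of Θ, which is exactly what the factorization criterion asks for.

```latex
\[
  f(x_1, \dots, x_n; \Theta)
  = (2\pi\sigma^2)^{-n/2}
    \exp\!\Bigl(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \Theta)^2\Bigr)
\]
% Expand the sum: \sum (x_i - \Theta)^2 = \sum x_i^2 - 2 n \Theta \bar{x} + n \Theta^2
\[
  f(x_1, \dots, x_n; \Theta)
  = \underbrace{\exp\!\Bigl(\frac{n\,\Theta\,\bar{x}}{\sigma^2}
      - \frac{n\,\Theta^2}{2\sigma^2}\Bigr)}_{g(\bar{x},\,\Theta)}
    \;\cdot\;
    \underbrace{(2\pi\sigma^2)^{-n/2}
      \exp\!\Bigl(-\frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2\Bigr)}_{h(x_1, \dots, x_n)}
\]
```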
After carrying out the calculations, we conclude that the sample mean satisfies the sufficiency property. Note that sufficiency is always relative to the distribution assumed for X; the conclusion holds, for example, when X is modeled as normal with known variance. Having verified this, we know that this function (the statistic), which lets us summarize the information (the mean income), uses all the information contained in the sample (the 1,000 families).
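A small numerical check can make this more concrete. The sketch below assumes that X is normal with a known standard deviation; the income figures, the value of `SIGMA`, and all function names are hypothetical and chosen only for illustration. It builds two different samples of 1,000 values with the same sample mean and confirms that their log-likelihoods differ by an amount that does not change with Θ, so once the sample mean is known the remaining details of the sample carry no further information about Θ.

```python
import math
import random

SIGMA = 10.0  # assumed known standard deviation (hypothetical value)


def sample_mean(sample):
    """Sample mean, using fsum to limit floating-point error."""
    return math.fsum(sample) / len(sample)


def normal_log_likelihood(sample, theta, sigma=SIGMA):
    """Log-likelihood of an i.i.d. normal sample with mean theta and known sigma."""
    n = len(sample)
    squared_error = math.fsum((x - theta) ** 2 for x in sample)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - squared_error / (2 * sigma ** 2)


rng = random.Random(0)

# Two different samples of 1,000 hypothetical incomes (arbitrary units).
sample_a = [rng.gauss(50.0, SIGMA) for _ in range(1000)]
raw_b = [rng.gauss(50.0, SIGMA) for _ in range(1000)]

# Shift the second sample so that both share the same sample mean.
shift = sample_mean(sample_a) - sample_mean(raw_b)
sample_b = [x + shift for x in raw_b]

# Compare the log-likelihoods over several candidate values of theta.
candidate_thetas = [30.0, 40.0, 50.0, 60.0, 70.0]
differences = [
    normal_log_likelihood(sample_a, theta) - normal_log_likelihood(sample_b, theta)
    for theta in candidate_thetas
]

# If the sample mean is sufficient, the difference is the same for every theta.
spread = max(differences) - min(differences)
print(f"spread of log-likelihood differences across theta: {spread:.2e}")
```

The spread should be zero up to floating-point rounding. Under a different assumed distribution for X, the same check could fail, which is why the model assumption matters.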
Why is it important that it uses all the information in the sample?
Now that we know that the sample mean is a sufficient statistic, consider the following case. What sense would it make to estimate the average income from those 1,000 families residing in Chile while using the data of only 500 of them?
Of course, that would not make any sense. We want a summary of all the information in the sample. That is precisely what we have defined as a sufficient statistic.