
Descriptive Statistics: A Case Study

In our previous blog, we gave a brief outline of measures of central tendency that form part of descriptive statistics. In essence, we are summarising and describing data from samples to understand the population at large. In this blog, we will use a case study to show you the importance of this step in data analysis and to anticipate the analysis methods you may use downstream.


Let’s imagine running a descriptive statistics analysis for an online retailer using one year of transactional and operational data. When running an online business, it’s not enough to look at sales numbers. You need to know what’s typical, what’s the exception, and what truly drives performance. We will use the measures of central tendency to uncover such insights.



1.    Arithmetic Mean

Business Question: On average, how much are we selling per month/day?


Different datasets revealed how the mean behaves under various distributions:

  • Daily transactions (Normal-like Distribution): The mean here was representative because the data clustered symmetrically around it. However, it was sensitive to holiday peaks.

  • Monthly sales (Right-skewed): A few high-performing months created a long right tail. Consequently, the mean was pulled upward, overestimating the “typical” revenue. The median offered a clearer picture of monthly sales.

  • Refund amounts (Left-skewed): Most refunds clustered toward the higher end of the range, but a few unusually small refunds formed a long left tail and pulled the mean downward. Again, the median better represented the “typical” refund amount.

  • Customer spending (Bimodal): This data showed two groups: budget and premium shoppers. The mean sat between them. It did not represent the two groups well. The mode revealed these two clusters.

  • Product sales (Heavy-tailed, Pareto Pattern): Only a small number of products contributed to the majority of revenue. The mean was unstable, showing how outliers dominated this distribution.

  • Promo participation (Uniform distribution): Participation in a promotional campaign run by the store was evenly spread. Here, the mean (midpoint) was sufficient.
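The comparison between the mean and the median can be sketched in a few lines of Python using only the standard library. The figures below are hypothetical, not the retailer's data, and `skew_hint` is an illustrative helper name. Treat the sign of mean minus median as a rough indicator of skew, not a formal test.

```python
from statistics import mean, median

# Hypothetical monthly sales (in thousands); two strong months form a right tail.
monthly_sales = [72, 75, 78, 80, 81, 82, 83, 85, 88, 90, 140, 160]


def skew_hint(values):
    """Rough hint only: compare the mean with the median."""
    gap = mean(values) - median(values)
    if gap > 0:
        return "mean above median (possible right skew)"
    if gap < 0:
        return "mean below median (possible left skew)"
    return "mean equals median (roughly symmetric)"


avg = mean(monthly_sales)
mid = median(monthly_sales)

print(f"mean = {avg:.1f}, median = {mid:.1f}")
print(skew_hint(monthly_sales))
```

In this made-up series the two strong months lift the mean well above the median, which is the pattern described for the right-skewed monthly sales.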


Visual representation of common statistical distributions: normal, right skewed, left skewed, heavy tail, uniform, and bimodal, each annotated with mean, median, and mode values.

NB: For continuous data, the mode value itself is often not meaningful. Because exact values rarely repeat, the reported mode can be an essentially arbitrary number. The graphs are more useful for interpretation, as they show where data values cluster most frequently.
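One way to see this point is to compare the raw mode of continuous values with the most common bin. This is a minimal sketch with hypothetical spending amounts; the bin width of 10 is an arbitrary assumption chosen for illustration.

```python
from collections import Counter
from statistics import multimode

# Hypothetical customer spend: a budget cluster and a premium cluster.
spend = [21.37, 22.91, 23.48, 24.05, 25.62, 26.19, 24.77,
         81.44, 83.03, 84.71, 85.28, 86.95]

# Every value is unique, so every value ties as "the mode".
raw_modes = multimode(spend)
print(f"{len(raw_modes)} tied raw modes out of {len(spend)} values")

# Group values into bins of width 10 (assumed width) and count each bin.
BIN_WIDTH = 10
bins = Counter(int(x // BIN_WIDTH) * BIN_WIDTH for x in spend)

for start, count in bins.most_common():
    print(f"{start}-{start + BIN_WIDTH}: {count} customers")
```

The binned counts show the two clusters that the raw mode cannot, which is the same information a histogram conveys visually.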

Visualisations: The histograms show how the mean shifted across different distributions.


2.    Median

Business Question: What is a typical sales month like for us?

Boxplots revealed how the median was more resistant to outliers:


  • Monthly sales: The median reflected the “typical” month more clearly than the mean.

  • Refunds: Extreme refunds distorted the mean but not the median.


Insight: The median offered a stable view of performance. The mean reflected overall volume, but it was affected by extreme events. Together, they provided a complete story.
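The numbers behind a boxplot can be sketched as follows. The data are hypothetical, and the 1.5 × IQR fence is the common boxplot convention, assumed here and not stated in the case study.

```python
from statistics import mean, median, quantiles

# Hypothetical monthly sales (in thousands) with two exceptional months.
monthly_sales = [72, 75, 78, 80, 81, 82, 83, 85, 88, 90, 140, 160]

# Quartiles as a boxplot would draw them (statistics.quantiles defaults).
q1, q2, q3 = quantiles(monthly_sales, n=4)
iqr = q3 - q1

# Conventional 1.5 * IQR fences; points outside are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in monthly_sales if x < lower_fence or x > upper_fence]
typical = [x for x in monthly_sales if lower_fence <= x <= upper_fence]

print(f"Q1={q1}, median={q2}, Q3={q3}, outliers={outliers}")
print(f"mean:   all={mean(monthly_sales):.1f}, without outliers={mean(typical):.1f}")
print(f"median: all={median(monthly_sales):.1f}, without outliers={median(typical):.1f}")
```

In this example, dropping the two flagged months moves the mean by more than ten units while the median moves by one, which illustrates why the median is described as resistant to outliers.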


3.    Mode

Business Question: What do our customers buy most often?

For continuous data like revenue, the mode was less helpful. But for categorical data (e.g., product categories, ratings), it was the right measure.

  • Example: The most frequently purchased category was “Household Essentials”.

Insight: The mode identified the best-selling categories and the most frequent ratings, guiding inventory, promotion, and quality checks.
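For categorical data, the mode and the full frequency table are straightforward to compute. In this sketch only "Household Essentials" comes from the case study; the other category names, the order list, and the ratings are hypothetical.

```python
from collections import Counter
from statistics import mode

# Hypothetical order log: one product category per order.
orders = [
    "Household Essentials", "Electronics", "Household Essentials",
    "Beauty", "Household Essentials", "Electronics", "Toys",
]

# Hypothetical star ratings (1 to 5).
ratings = [5, 4, 5, 3, 5, 4, 1, 5]

top_category = mode(orders)
top_rating = mode(ratings)

# The frequency table is the data behind a bar chart.
category_counts = Counter(orders).most_common()

print(f"Most purchased category: {top_category}")
print(f"Most common rating: {top_rating}")
for category, count in category_counts:
    print(f"{category:<22} {'#' * count} ({count})")
```

The loop prints a simple text bar chart, which mirrors how bar charts surface frequency-based insights.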


Visualisation: Bar Charts helped highlight frequency-based insights.


4.    Variants of the Mean

  • Geometric Mean (Growth Multipliers):

    Question: What’s our true growth rate over time?

The arithmetic mean of the period-to-period growth rates overstated performance because it ignores compounding. The geometric mean, on the other hand, gave the true compound growth rate, crucial for forecasting and investment decisions.

  • Harmonic Mean (Cost per unit):

    Question: What’s the real average cost per unit (or efficiency rate)?

With varying supplier order sizes, the harmonic mean gave a more accurate measure of average costs, ensuring that smaller orders did not mask true efficiency.

  • Trimmed Mean:

    Question: What happens if we ignore extreme highs and lows?

By removing the most extreme sales months (top and bottom 5%), the retailer saw a more balanced view of performance. The trimmed mean was less volatile than the ordinary mean while still drawing on more of the data than the median, which made it especially useful for budgeting and planning.
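The three variants above can be sketched with Python's `statistics` module. All figures are hypothetical. The harmonic mean example assumes the same amount is spent with each supplier, which is the case where it equals total cost divided by total units. `trimmed_mean` is an illustrative helper, not a standard-library function, and it is applied to 20 values because a 5% trim of only 12 monthly values would remove nothing.

```python
from statistics import geometric_mean, harmonic_mean, mean, median

# --- Geometric mean: compound growth (hypothetical period growth rates) ---
growth_rates = [0.10, -0.05, 0.20, -0.10]
multipliers = [1 + g for g in growth_rates]
arithmetic_growth = mean(growth_rates)
compound_growth = geometric_mean(multipliers) - 1
print(f"arithmetic: {arithmetic_growth:.2%}, compound: {compound_growth:.2%}")

# --- Harmonic mean: cost per unit, assuming equal spend per supplier ---
unit_costs = [2.0, 4.0, 5.0]       # hypothetical cost per unit at each supplier
spend_per_supplier = 100.0         # assumed identical budget for each supplier
total_units = sum(spend_per_supplier / c for c in unit_costs)
true_average_cost = spend_per_supplier * len(unit_costs) / total_units
print(f"arithmetic: {mean(unit_costs):.2f}, harmonic: {harmonic_mean(unit_costs):.2f}")


# --- Trimmed mean: drop a percentage of values from each end ---
def trimmed_mean(values, percent):
    """Mean after removing `percent` % of the values from each end."""
    data = sorted(values)
    k = len(data) * percent // 100  # number of values to drop per end
    return mean(data[k:len(data) - k])


# Hypothetical sales figures with one very low and one very high value.
sales = [40, 72, 75, 76, 78, 79, 80, 80, 81, 82,
         83, 84, 85, 86, 88, 90, 92, 95, 99, 200]
trimmed = trimmed_mean(sales, 5)
print(f"mean: {mean(sales):.2f}, trimmed: {trimmed:.2f}, median: {median(sales):.2f}")
```

In these examples the compound growth rate comes out below the arithmetic average, the harmonic mean matches the total-cost-over-total-units figure, and the trimmed mean lands between the median and the ordinary mean.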


Summary Table

 

| Measure | Business Question | Insight for the Retailer (with dataset values) |
| --- | --- | --- |
| Arithmetic Mean | On average, how much are we selling per month/day? | Normal-like sales: Mean ≈ 99.8, a good benchmark. Right-skewed sales: Mean ≈ 90.4 vs Median ≈ 81.8 (mean overstated). |
| Median | What is a typical sales month like for us? | In right-skewed sales, Median ≈ 81.8 vs Mean ≈ 90.4. The median better reflects a typical month. |
| Mode | What do our customers buy most often? | Product categories: highlights the most common choices. In bimodal sales: Mode ≈ 59.1, revealing customer groupings. |
| Geometric Mean | What’s our true growth rate over time? | Shows the true compound growth rate, avoiding inflated averages. |
| Harmonic Mean | What’s our real average cost per unit or efficiency? | Revealed the effective cost per unit, lower than the arithmetic mean. |
| Trimmed Mean | If we ignore extreme highs and lows, what’s balanced? | In heavy-tailed data, Mean ≈ 83.6 and Median ≈ 65.9. The trimmed mean would fall between them. |
| Distribution Shape | Do outliers or groups drive our results? | Bimodal sales: Mean ≈ 99.9, Median ≈ 98.1, Mode ≈ 59.1. The mean didn’t capture either group well. |

 

Key Business Questions Answered

  • What is “typical” for us?

  • Are we overestimating performance due to outliers?

  • What products drive the most revenue?

  • What is our true growth rate?

  • How efficient are our suppliers?

  • How do we separate exceptional months from normal performance?


Conclusion

Measures of central tendency may look simple, but when used thoughtfully, they provide powerful insights into business performance. By knowing when to rely on the mean, median, mode, or their variants, retailers can avoid misleading conclusions, focus on what’s truly typical, and make better decisions.


Sometimes, analysis stops at descriptive statistics, and that’s enough. Other times, these measures become the foundation for deeper techniques. In the next blog, we’ll move from central tendency to measures of dispersion, to see how data spread tells the rest of the story.



© 2025 Nova Data Analytics. All rights reserved.
