
Haircut Costs


Alignments to Content Standards: S-ID.A.1, S-ID.A.2, S-ID.A.3

Task

Seventy-five female college students and 24 male college students each reported the cost (in dollars) of their most recent haircut. The resulting data are summarized in the following table.

                      Females   Males
No. of observations   75        24
Minimum               0         0
Maximum               150       35
1st Quartile          20        9.25
Median                31        17
3rd Quartile          75        20
Mean                  52.53     20.13

  1. Using the minimum, maximum, quartiles and median, sketch two side-by-side box plots to compare the haircut costs of the males and females in this survey.
  2. How would you describe the difference in haircut costs between males and females? Be sure you discuss differences/similarities in shape, center and spread.
  3. Why is the mean greater than the median both for males and for females? Explain your reasoning.
  4. Is the median or mean a more appropriate choice for describing the “centers” of these two distributions?

IM Commentary

This problem could be used as an introductory lesson on group comparisons, engaging students with a question they may find amusing and interesting. More generally, the idea of the lesson could be used as a template for a project in which students develop a questionnaire, sample students at their school and report on their findings.

Being able to use data to compare two groups is an important skill. These distributions have similarities (both appear to be skewed); we can also see that haircut cost tends to be greater for females than males and that there is more variability in haircut cost for females.

The data can also be used to start (or continue) a discussion about what we should report as a “typical” haircut cost. The data distributions appear to be skewed (for the females more so than the males). This allows us to see how extreme values in the data “pull” the mean toward the high end of haircut costs. With strongly skewed data, the mean and standard deviation are poor summaries of the typical value and the spread.
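The “pull” on the mean can be illustrated with a minimal sketch. The costs below are hypothetical values invented for illustration; they are not the survey responses, which are not listed in the task. The sketch compares the mean and median of a small set of costs before and after one very expensive haircut is added.

```python
from statistics import mean, median

# Hypothetical haircut costs in dollars, invented for illustration only.
# These are NOT the survey responses, which the task does not list.
typical = [15, 18, 20, 22, 25, 28, 30]
with_extreme = typical + [150]  # add one very expensive haircut

# Report mean and median for each data set, rounded to cents.
for label, costs in [("typical", typical), ("with extreme", with_extreme)]:
    print(f"{label:13s} mean = {mean(costs):.2f}, median = {median(costs):.2f}")

# How far did each measure move when the extreme value was added?
mean_shift = mean(with_extreme) - mean(typical)
median_shift = median(with_extreme) - median(typical)
print(f"mean moved by {mean_shift:.2f}, median moved by {median_shift:.2f}")
```

In this made-up example a single high value moves the mean by many dollars while the median barely changes, which is the same pattern the summary table shows for both groups.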

These data came from a survey given to an introductory statistics class at a college.

Solution

  1. Students can sketch a basic box plot with whiskers extending to the minimum and maximum, a box extending from the first quartile to the third quartile, and a line at the median, as shown below. In order to compare haircut costs of males and females, the two box plots should be plotted side by side on the same scale.

    [Figure: side-by-side box plots of haircut costs for females and males, drawn on the same scale]
  2. Both box plots show distributions that are skewed to the right. It makes sense that most haircuts will not cost too much, but a few students will spend a large amount. Since the cost can never be negative, the minimum cannot be less than 0, and there is a long right tail. The centers and spreads are quite different. The median cost for females is about twice that for males, and there is much more variability in the haircut costs for females. The interquartile range (IQR) for females is $55, while for males it is $10.75.
  3. We should not be surprised that the mean is larger than the median because the distribution appears to be skewed to the right. The mean averages all the values in the data, so it is “pulled” toward the high ones. The median is the 50th percentile and is resistant to the extreme values.
  4. Since the median gives a better description of the center, or a “typical” haircut cost, it is more appropriate. Note that the mean for males is about equal to the 3rd quartile, indicating that about 75% of the males paid less than the mean haircut cost for males. For females, the median is $31, indicating that half of the females spent $31 or less, but the mean haircut cost for females is $52.53. The mean doesn’t give us a good idea of what we could expect for a typical student's haircut cost. It is best to use the mean only when the data distribution is reasonably symmetric.
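
The box plot construction and the IQR arithmetic in the solution can be sketched in a few lines of Python. This is only one possible rendering, under stated assumptions: the five-number summaries are copied from the table in the task, while the helper names (`iqr`, `text_boxplot`), the character layout, and the 0 to 150 dollar scale are illustrative choices, not part of the original task.

```python
# Five-number summaries copied from the table in the task (dollars).
SUMMARIES = {
    "Females": {"min": 0, "q1": 20, "median": 31, "q3": 75, "max": 150},
    "Males": {"min": 0, "q1": 9.25, "median": 17, "q3": 20, "max": 35},
}


def iqr(s):
    """Interquartile range: third quartile minus first quartile."""
    return s["q3"] - s["q1"]


def text_boxplot(s, lo=0, hi=150, width=60):
    """Draw one box plot as a row of characters on the scale [lo, hi].

    The layout is an illustrative choice: '-' for the whiskers,
    '=' for the box, and '|' for the median.
    """
    def col(x):
        # Map a dollar amount to a character column.
        return round((x - lo) / (hi - lo) * width)

    row = [" "] * (width + 1)
    for i in range(col(s["min"]), col(s["max"]) + 1):
        row[i] = "-"  # whiskers extend from the minimum to the maximum
    for i in range(col(s["q1"]), col(s["q3"]) + 1):
        row[i] = "="  # box extends from Q1 to Q3
    row[col(s["median"])] = "|"  # line at the median
    return "".join(row)


for group, s in SUMMARIES.items():
    # Both rows use the same lo/hi, so they are directly comparable.
    print(f"{group:8s}{text_boxplot(s)}  IQR = {iqr(s)}")
```

Because both rows share one scale, the longer right whisker and wider box for females are visible at a glance, and the printed IQR values match the $55 and $10.75 quoted in the solution.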

lhwalker says:

almost 2 years

"It allows us to see how extreme values in the data “pull” the mean toward the high end of hair cut costs. " I am wondering if this is accurate based on this: http://www.amstat.org/publications/jse/v13n2/vonhippel.html

roxypeck says:

over 1 year

Thanks for the comment. The article that you mention is an interesting one and it points out that there are exceptions to the usual relationship between the mean and the median for skewed distributions. But in this case, the boxplots indicate that the distributions are not symmetric and the students are told that the means and the medians are in that "usual" relationship and are just asked to explain why the mean is greater than the median. So I think it is OK here.