# Describing data distributions

- Calculate and interpret the standard deviation as a measure of variability.
- Use the normal distribution as a model for data distributions that are approximately symmetric and bell-shaped.
- Use the least squares regression line to model linear relationships in bivariate numerical data.

This section reviews material from grades 6–8 and previous units and forms a bridge between the previous section on collecting data and later sections on drawing conclusions from data.

The focus of this section is on three main ideas: (1) quantifying variability in a data set, (2) the normal distribution as a model for data distributions, and (3) the least squares regression line as a summary of a bivariate linear relationship. (Note that students are not required to know the term “least squares regression line,” referring to it instead as “line of best fit.”)

In the unit One Variable Statistics, students were introduced to the idea of standard deviation as a measure of variability. The focus in this unit should be on calculating the standard deviation (using technology) and on interpreting the standard deviation in context.
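As one possible illustration of calculating the standard deviation with technology, the sketch below uses Python's built-in `statistics` module. The hand-span values are hypothetical and chosen only so the arithmetic is easy to check by hand; any spreadsheet or graphing calculator would serve equally well.

```python
import statistics

# Hypothetical hand spans in centimeters (illustrative values, not real data).
hand_spans_cm = [17.0, 19.0, 20.0, 20.0, 21.0, 23.0]

mean_span = statistics.mean(hand_spans_cm)

# Sample standard deviation: divides the sum of squared deviations by n - 1.
sd_span = statistics.stdev(hand_spans_cm)

print(f"mean = {mean_span:.2f} cm, standard deviation = {sd_span:.2f} cm")
# prints: mean = 20.00 cm, standard deviation = 2.00 cm
```

In context, students might say that a typical hand span in this group differs from the mean of 20 cm by about 2 cm. Note that `statistics.stdev` computes the sample standard deviation; `statistics.pstdev` divides by n instead and gives a slightly smaller value.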

Students will probably not have seen the normal distribution before this unit, so it will need to be introduced here. Focus on properties of the normal distribution. Students will need to be able to find areas under a normal curve and percentiles for a normal distribution. The use of technology to assist in these calculations is encouraged. Students should be asked to interpret areas under a normal curve and percentiles for a normal distribution in a variety of different contexts.
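The sketch below shows one way technology can find areas under a normal curve and percentiles, using `NormalDist` from Python's standard library (Python 3.8 or later). The height model, with a mean of 165 cm and a standard deviation of 7 cm, is an assumption made up for illustration and is not taken from any of the resources listed here.

```python
from statistics import NormalDist

# Hypothetical model: heights are approximately normal with
# mean 165 cm and standard deviation 7 cm (illustrative values only).
heights = NormalDist(mu=165, sigma=7)

# Area under the curve to the left of 172 cm (one standard deviation
# above the mean): the proportion of heights below 172 cm, about 0.84.
area_below_172 = heights.cdf(172)

# Area between 158 cm and 172 cm (within one standard deviation
# of the mean), about 0.68.
area_within_one_sd = heights.cdf(172) - heights.cdf(158)

# 90th percentile: the height below which 90% of the distribution lies,
# about 174 cm.
percentile_90 = heights.inv_cdf(0.90)

print(f"P(height < 172) = {area_below_172:.3f}")
print(f"P(158 < height < 172) = {area_within_one_sd:.3f}")
print(f"90th percentile = {percentile_90:.1f} cm")
```

An interpretation in context might read: under this model, about 84% of people are shorter than 172 cm, and a person who is about 174 cm tall is taller than roughly 90% of the group.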

In the unit Bivariate Statistics, students used technology to find the line of best fit (least squares regression line). This is revisited here with a focus on interpretation and modeling.
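For teachers who want to see what the technology is doing, here is a minimal sketch of the least squares calculation written out directly. The function name `least_squares_line` and the car age and repair cost data are hypothetical, chosen only to make the result easy to verify.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: car age in years and annual repair cost in dollars.
ages = [1, 2, 3, 4, 5]
costs = [200, 260, 390, 410, 540]

slope, intercept = least_squares_line(ages, costs)
print(f"predicted cost = {intercept:.0f} + {slope:.0f} * age")
# prints: predicted cost = 111 + 83 * age
```

The interpretation is the part to emphasize with students: for these made-up data, each additional year of age is associated with an increase of about $83 in predicted annual repair cost. The intercept of $111 is the predicted cost for a car of age 0, which lies outside the range of the data and should be interpreted with caution.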

## External Resources

1 Exploring Linear Data

#### Description

WHAT: In this lesson, students use a technology-supported environment to model bivariate data in a variety of settings that range from car repair costs to sports to medicine. Students work to construct scatter plots, interpret data points and trends (MP.2), and investigate the notion of line of best fit.

WHY: This is a complete set of resources, including teacher notes and student worksheets. The three contexts include both positive and negative linear relationships. The lesson is a good review of previous work with lines of best fit.

2 Reasoning About the Standard Deviation

#### Description

WHAT: This lesson includes two activities. The first explores variability in data and develops the concept of standard deviation as typical deviation from the mean, using data on hand spans collected by students and plotted with Fathom or other software. The second has students connect standard deviation to data distributions by studying pairs of histograms and identifying which data distribution has the greater standard deviation.

WHY: These activities develop students’ understanding of the standard deviation as a measure of variability and connect this understanding to graphical representations of data.

3 The Normal Distribution as a Model

#### Description

WHAT: This lesson includes two activities. First, students consider a number of physical measurements (such as height, arm span, and head circumference) and make conjectures about the shape of the data distributions. They then construct graphical displays and assess the appropriateness of the normal distribution as a model. They use a web applet to explore properties of the normal distribution. In the second activity, students use the normal distribution to model real data sets. (Note that the link (http://bcs.whfreeman.com/bps3e/) in Student Handout 1 needs to be replaced.)

WHY: These activities introduce the normal distribution as a way to model data distributions that are approximately bell-shaped and symmetric. Students also see the normal distribution used in a variety of contexts.