Data Analysis and Probability

In this age of information and technology, it is increasingly important to understand how information is processed and translated into usable knowledge. The National Council of Teachers of Mathematics recognizes the importance of having all students develop an awareness of the concepts and processes of statistics and probability so that they can become intelligent consumers who make critical and informed decisions. The Council advocates (PSSM, 2000) a middle school mathematics curriculum that emphasizes more than just reading and interpreting graphs. Indeed, it recommends that students formulate key questions, collect and organize data, represent data in a variety of ways, draw inferences from the data, and communicate their findings in a convincing manner. Standards-based curricular materials provide robust opportunities for middle school students to learn sound and significant mathematics using a problem-solving approach to statistical ideas. Accordingly, it is imperative that middle grade mathematics teachers possess a rich, integrated knowledge of concepts and processes in probability and statistics so that they can facilitate mathematical discourse and deliver instruction that promotes students' genuine understanding of the content.

Students in the Data Analysis and Probability course can expect to work regularly in collaborative groups, examining specific standards-based middle grade curricular materials dealing with these concepts. For example, the eighth grade unit Samples and Populations from the Connected Math Project (Dale Seymour Publications, 1998) gives students the opportunity to learn how numerous statistical concepts can be applied in real-world contexts. This curricular unit explores many of the big ideas in data analysis and demonstrates the interrelationship among statistical concepts, visual displays of data, and techniques of data analysis. In particular, this unit, like others in the series, models the National Council of Teachers of Mathematics' vision of a coherent school mathematics curriculum by offering connections between key statistical topics.

A particular lesson from Samples and Populations (Investigation 1, Comparing Quality Ratings, page 23a) focuses on the results of a consumer product study. More specifically, the data consists of information about the quality, sodium content, and price of 37 brands of peanut butter classified by four attributes: natural or regular, creamy or chunky, salted or unsalted, and name brand or store brand. Peanut butter is a common food in many households and, with the increasing attention being paid to healthful diets, such nutritional information is important. A consumer who wants to make informed purchasing decisions will find that numerous questions arise from the data. Is there a lot of salt in peanut butter? Is there much variation in quality ratings among different kinds of peanut butter? What is the best buy if I am most interested in quality? What is the best buy if I am most interested in price? Students will use data from Consumer Reports to determine whether a relationship exists between sodium content and quality rating, and will investigate whether name brands of peanut butter outscore store brands in quality ratings. They will use multiple visual displays of data and numerous data analysis techniques to justify their conclusions and convince their instructor and classmates.

Through their interaction with the curricular materials, college students will learn the valuable connections between topics in data analysis and probability, as well as the importance of statistical processes as a means of solving real-world problems. After collaborative groups justify their inferences in a whole-class setting, students will examine the topics in greater depth and with more mathematical rigor. In particular, investigating whether a relationship exists between name brand/store brand and quality ratings gives rise to the study of non-parametric statistics. Examining the relationship between sodium content and quality ratings will lead to the formal study of correlation and regression models. Linear regression affords opportunities to introduce related statistical topics such as mean squared error, the Pearson correlation coefficient, and the method of least squares for determining a line of best fit. Moreover, the real-world context of problems studied in our course will enable students to find meaning in the coefficients for slope and vertical intercept. Not only will linear regression models serve as a powerful means of making predictions, but they will also provide a solid foundation for the study of proportionality as it relates to the slope coefficient.
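As an illustration of how these regression ideas fit together, the following minimal Python sketch computes a least-squares line, the Pearson correlation coefficient, and the mean squared error for a small data set. The function name `least_squares` and the five (sodium, quality) pairs are hypothetical and invented for this example; they are not the Consumer Reports values used in the unit. The sketch also assumes that mean squared error is the average of the squared residuals (dividing by n), although some texts divide by n - 2 instead.

```python
from math import sqrt


def least_squares(xs, ys):
    """Return (slope, intercept, r, mse) for the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squared deviations and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx                      # change in y per unit change in x
    intercept = mean_y - slope * mean_x    # predicted y when x = 0
    r = sxy / sqrt(sxx * syy)              # Pearson correlation coefficient

    # Assumption: MSE is the mean of the squared residuals (divide by n).
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in residuals) / n
    return slope, intercept, r, mse


# Hypothetical data: sodium content (mg per serving) and quality rating.
sodium = [0, 50, 100, 150, 200]
quality = [40, 50, 55, 65, 70]

slope, intercept, r, mse = least_squares(sodium, quality)
print(f"slope={slope:.3f}, intercept={intercept:.1f}, r={r:.3f}, mse={mse:.2f}")
```

For this made-up data the slope is 0.15, which could be read as a predicted gain of 0.15 rating points for each additional milligram of sodium, and the intercept of 41 is the predicted rating of a brand with no sodium. Interpreting the coefficients in context in this way is the kind of meaning-making the course aims to support.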

Graphing calculators will be used extensively throughout the course, especially in the study of visual displays of data. In particular, data sets from Investigation 1 will be entered as arrays for subsequent analysis. Statistical displays will be created and analyzed using graphing calculators. For example, quality ratings for regular brands can be displayed as a stem plot, rotated 90 degrees, and transformed into a histogram. Quality ratings for natural and regular brands can be displayed as a back-to-back stem-and-leaf plot, and measures of central tendency can subsequently be calculated. In addition, graphing calculators can readily display a variety of statistical plots (e.g., box-and-whisker plots, histograms). Inferences from such visual displays of data, whether valid or invalid, will be drawn and discussed in small-group and whole-class settings.
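The link between a stem plot and a histogram can be sketched in a few lines of Python, used here only as a stand-in for the graphing calculator. The function names `stem_and_leaf` and `render` and the eight ratings are hypothetical and invented for illustration; they are not the quality ratings from Investigation 1. The sketch assumes two-digit whole-number ratings, with the tens digit as the stem and the ones digit as the leaf.

```python
from collections import defaultdict
from statistics import mean, median


def stem_and_leaf(values):
    """Group whole-number values by tens digit (stem), keeping ones digits (leaves) in order."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


def render(stems):
    """Return the plot as text, one row per stem, including stems with no leaves."""
    rows = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return "\n".join(rows)


# Hypothetical quality ratings for a handful of regular brands.
ratings = [34, 40, 45, 49, 54, 54, 60, 76]

plot = stem_and_leaf(ratings)
print(render(plot))

# The number of leaves on each stem is the height of the matching histogram bar
# (bin width 10), which is why a rotated stem plot resembles a histogram.
bar_heights = {stem: len(leaves) for stem, leaves in plot.items()}
print(bar_heights)

# Measures of central tendency for the same data.
print("mean:", mean(ratings), "median:", median(ratings))
```

Because each leaf occupies one character, the length of a row in the printed plot matches the height of the corresponding histogram bar, so rotating the plot 90 degrees gives the histogram's outline while still preserving the individual data values.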