Purpose of Statistics Package Exercises
The Probability & Statistics course focuses on the processes you use to convert data into useful information. This involves
Summarizing data, and
In addition to applying these processes, you can learn how to use statistical software packages to help manage, summarize, and interpret data. The statistics package exercises included throughout the course give you the opportunity to explore a dataset and answer questions based on the output using R, Statcrunch, TI Calculator, Minitab, or Excel. In each exercise, you can view the instructions for whichever of these packages you choose to use.
The statistics package exercises are an extension of activities already embedded in the course and require you to use a statistics package to generate output and answer a different set of questions.
To Download R
To download R, a free software environment for statistical computing and graphics, go to https://www.r-project.org/ and follow the instructions provided.
Throughout the statistics package exercises, you will be given commands to execute in R. You can use the following steps to avoid having to type all of these commands in by hand:
Highlight the command with your mouse.
On the browser menu, click "Edit," then "Copy."
Click on the R command window, then at the top of the R window, click "Edit," then "Paste."
You may have to press Enter to execute the command.
The R instructions are current through version 3.2.5 released on April 14, 2016. Instructions in these statistics package exercises may not work with newer releases of R.
The purpose of this activity is to show you how to solve word problems involving the normal distribution. Most statistical software packages, much like the normal table, are set up to give answers to problems involving "less than." This means that the software does the "finding z-scores and looking up the table" work for us, but we still need to make sure that we pose the question in terms of "less than," and/or, if needed, adjust the answer that the software gives us.
Recall that we have two types of problems that are of interest: finding probabilities given values, and finding values given probabilities. Below are the instructions for both types.
Finding Probabilities (Given Values)
To illustrate this, we'll use an example from a previous activity: finding P(X > 700), where X is the SAT-M score, which has a normal distribution with a mean of 507 and a standard deviation of 111.
One advantage of R is that it will find probabilities without first requiring you to convert values to z-scores. To find the probability P(X < 700), start R and enter the command:
pnorm(700, mean=507, sd=111)
You should see that the probability is equal to 0.9589596. To calculate the probability P(X > 700) = 1 - P(X < 700), enter
1 - pnorm(700, mean=507, sd=111)
Note that when we solved this problem using the table in an earlier activity, we rounded z-scores and therefore our answer was slightly different, but very close (.0409).
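To see where this small discrepancy comes from, the following sketch compares the exact R answer with a table-style calculation. The variable names are illustrative only, and the rounded z-score of 1.74 is an assumption inferred from the table answer of .0409 rather than quoted from the earlier activity.

```r
# SAT-M example: X ~ Normal(mean = 507, sd = 111)
mu <- 507
sigma <- 111

# Exact z-score for x = 700 (about 1.7387)
z <- (700 - mu) / sigma

# Standardizing by hand gives the same probability as passing mean and sd
p_direct <- pnorm(700, mean = mu, sd = sigma)
p_zscore <- pnorm(z)  # standard normal: mean = 0, sd = 1

# Table-style answer: round z to two decimals before "looking it up"
p_table <- 1 - pnorm(round(z, 2))

# lower.tail = FALSE returns P(X > 700) directly, without the subtraction
p_upper <- pnorm(700, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(exact = 1 - p_direct, table = p_table, upper_tail = p_upper))
```

The exact and upper-tail values agree, while the table-style value differs only because z was rounded before the lookup.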
Similarly, if we wanted to find P(400 < X < 600), we would need to do two separate calculations; one for P(X < 600), and one for P(X < 400), and subtract.
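As a minimal sketch of that two-step calculation, using the same SAT-M mean and standard deviation (the variable names are illustrative):

```r
# SAT-M example: X ~ Normal(mean = 507, sd = 111)
mu <- 507
sigma <- 111

# Two "less than" probabilities
p_below_600 <- pnorm(600, mean = mu, sd = sigma)
p_below_400 <- pnorm(400, mean = mu, sd = sigma)

# P(400 < X < 600) = P(X < 600) - P(X < 400)
p_between <- p_below_600 - p_below_400
print(p_between)

# Equivalent one-liner: pnorm is vectorized, and diff() subtracts the pair
p_between_short <- diff(pnorm(c(400, 600), mean = mu, sd = sigma))
```

Always subtract the probability for the smaller value from the probability for the larger value, so that the result is positive.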
Now use R to work through the following exercise:
(a) According to R, roughly 7.66% of males are less than 65 inches tall.
(b) According to R, roughly 1.6% of males are more than 75 inches tall.
(c) According to R, roughly 71.6% of males are between 66 and 72 inches tall.
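The exercise statement and the R output are not reproduced in this section, so the following is only a sketch of how parts (a) through (c) could be computed. It assumes that male heights are normally distributed with a mean of 69 inches and a standard deviation of 2.8 inches. These two values are an assumption, chosen because they reproduce the percentages quoted above; they are not taken from the text.

```r
# ASSUMPTION: heights ~ Normal(mean = 69, sd = 2.8); not stated in this section
mu_height <- 69
sd_height <- 2.8

# (a) P(X < 65): already a "less than" question, so pnorm answers it directly
p_a <- pnorm(65, mean = mu_height, sd = sd_height)

# (b) P(X > 75) = 1 - P(X < 75)
p_b <- 1 - pnorm(75, mean = mu_height, sd = sd_height)

# (c) P(66 < X < 72) = P(X < 72) - P(X < 66)
p_c <- pnorm(72, mean = mu_height, sd = sd_height) -
  pnorm(66, mean = mu_height, sd = sd_height)

# Express the three answers as percentages
print(round(100 * c(a = p_a, b = p_b, c = p_c), 2))
```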
Finding Values (Given Probabilities)
As mentioned before, R is set up to find a value x that satisfies P(X < x) = some given probability. To illustrate this, we'll use an example from a previous activity: finding the value of x that satisfies P(X > x) = .02, where X is the SAT-M score, which has a normal distribution with a mean of 507 and a standard deviation of 111. Before we start, it will be useful to rephrase the problem in terms of "X < x"; we are looking for the value of x that satisfies P(X < x) = .98.
To do this with R, we're going to do the inverse of the procedure we described above.
This time, we're given the population mean and standard deviation, but instead of being given an X and asked to find the probability, we're given a probability and asked to find the corresponding X value.
To find the X value, enter the command:
qnorm(0.98, mean=507, sd=111)
R tells us that the value that we are looking for (the 98th percentile) is 734.966. Our solution in the previous activity using the table gave us the answer 734.55 due to rounding.
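The sketch below checks that qnorm and pnorm are inverses of each other and reproduces the table-style answer. The variable names are illustrative, and the rounded z-score of 2.05 is an assumption inferred from the table answer of 734.55.

```r
# SAT-M example: X ~ Normal(mean = 507, sd = 111)
mu <- 507
sigma <- 111

# 98th percentile: the x that satisfies P(X < x) = 0.98
x98 <- qnorm(0.98, mean = mu, sd = sigma)

# Round trip: feeding x98 back into pnorm recovers the probability 0.98
p_check <- pnorm(x98, mean = mu, sd = sigma)

# Same value posed as the original "greater than" question, P(X > x) = 0.02
x_upper <- qnorm(0.02, mean = mu, sd = sigma, lower.tail = FALSE)

# Table-style answer: round the z-score to two decimals, then unstandardize
z_rounded <- round(qnorm(0.98), 2)
x_table <- mu + z_rounded * sigma

print(c(exact = x98, upper_tail = x_upper, table = x_table))
```

Using lower.tail = FALSE is an alternative to rephrasing the problem by hand; both routes lead to the same x.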
Now use R to work through the following exercise:
(a) Here we need to find the value x that satisfies P(X < x) = 0.005. According to R, in order for a male to be among the shortest 0.5% of males, he needs to be less than 61.8 inches tall.
(b) Here we need to find the value x that satisfies P(X > x) = 0.0025, or P(X < x) = 0.9975. According to R, in order for a male to be among the tallest 0.25% of males, he needs to be more than 76.9 inches tall.
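Because the R output is not reproduced here, the following sketch shows one way these two cutoffs could be computed. It assumes that male heights are normally distributed with a mean of 69 inches and a standard deviation of 2.8 inches. These values are an assumption, chosen because they reproduce the cutoffs quoted above; they are not taken from the text.

```r
# ASSUMPTION: heights ~ Normal(mean = 69, sd = 2.8); not stated in this section
mu_height <- 69
sd_height <- 2.8

# Shortest 0.5%: the x that satisfies P(X < x) = 0.005
x_short <- qnorm(0.005, mean = mu_height, sd = sd_height)

# Tallest 0.25%: P(X > x) = 0.0025 is rephrased as P(X < x) = 0.9975
x_tall <- qnorm(0.9975, mean = mu_height, sd = sd_height)

# Round to one decimal place, as in the answers above
print(round(c(shortest = x_short, tallest = x_tall), 1))
```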