Our intuition tells us that the best estimator for \( \mu \) should be \( \overline{X} \), and the best estimator for p should be \( \hat{p} \).

In the Sampling Distributions module of the Probability unit, we learned about the sampling distributions of \( \overline{X} \) and found that as long as a sample is taken at random,

**the distribution of sample means is exactly centered at the value of the population mean.** \( \overline{X} \) is therefore said to be an unbiased estimator for \( \mu \). Any particular sample mean might turn out to be less than the actual population mean, or it might turn out to be more.

**But in the long run, such sample means are "on target" in that they will not underestimate any more or less often than they overestimate.** Likewise, we learned that the sampling distribution of the sample proportion, \( \hat{p} \), is centered at the population proportion p (as long as the sample is taken at random), thus making \( \hat{p} \) an unbiased estimator for p.

If the sample of U.S. adults in (example 2 on the previous page) was not random, but instead included predominantly college students, then .56 would be a biased estimate for p, the proportion of all U.S. adults who believe marijuana should be legalized. If the survey design were flawed, such as loading the question with a reminder about the dangers of marijuana leading to hard drugs, or a reminder about the benefits of marijuana for cancer patients,

**then .56 would be biased on the low or high side, respectively. Our point estimates are truly unbiased estimates for the population parameter only if the sample is random and the study design is not flawed.** Not only are the sample mean and sample proportion on target as long as the samples are random, but their accuracy improves as sample size increases.

Recall that the sampling distribution of the sample mean \( \overline{X} \) is, as we mentioned before, centered at the population mean \( \mu \) and has a standard deviation of \( \frac{\sigma}{\sqrt{n}} \). As a result, as the sample size n increases, the sampling distribution of \( \overline{X} \) gets less spread out.
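To see this shrinking spread concretely, here is a small simulation sketch (Python standard library only; the IQ-like values μ = 100 and σ = 15 are illustrative assumptions, not from the text). It draws many random samples, computes each sample's mean, and measures how spread out those means are:

```python
import random
import statistics

random.seed(0)

def sample_mean_spread(mu, sigma, n, num_samples=10_000):
    """Simulate many sample means and measure their spread."""
    means = [
        statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)

# Theory says the standard deviation of the sample mean is sigma / sqrt(n),
# so quadrupling n should roughly halve the observed spread.
for n in (25, 100):
    print(n, round(sample_mean_spread(mu=100, sigma=15, n=n), 2))
```

The observed spreads should land near \( \frac{15}{\sqrt{25}} = 3 \) and \( \frac{15}{\sqrt{100}} = 1.5 \).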

**This means that values of \( \overline{X} \) that are based on a larger sample are more likely to be closer to \( \mu \) (as the figure below illustrates):** Similarly, since the sampling distribution of \( \hat{p} \) is centered at p and has a standard deviation of \( \sqrt{\frac{p(1-p)}{n}} \), which decreases

**as the sample size gets larger, values of \( \hat{p} \) are more likely to be closer to p when the sample size is larger.** Another example of a point estimate is using the sample variance, \( s^2 = \frac{(x_1 - \overline{x})^2 + \ldots + (x_n - \overline{x})^2}{n-1} \), to estimate the population variance, \( \sigma^2 \).

**\( s^2 \) is an unbiased estimator for \( \sigma^2 \). Division by n - 1 accomplishes the goal of making this point estimator unbiased.** Making unbiased estimators a top priority is, in fact, the reason that our formula for s, introduced in the Exploratory Data Analysis unit, involves division by n - 1 instead of by n.
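A quick simulation (our own sketch; the normal population with σ = 10 is an arbitrary choice) shows why the n - 1 matters: averaging many variance estimates computed both ways reveals that dividing by n systematically underestimates σ², while dividing by n - 1 does not.

```python
import random

random.seed(1)

mu, sigma, n = 0.0, 10.0, 5        # true population variance is 100
num_samples = 100_000

sum_var_n = 0.0    # estimates that divide by n      (biased)
sum_var_n1 = 0.0   # estimates that divide by n - 1  (unbiased)
for _ in range(num_samples):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    sum_var_n += ss / n
    sum_var_n1 += ss / (n - 1)

print(sum_var_n / num_samples)    # tends toward (n-1)/n * 100 = 80
print(sum_var_n1 / num_samples)   # tends toward the true variance, 100
```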

**Explanation:** It is not an unbiased estimator for μ because the sample was not a random sample of 150 students from the entire student body. In addition, students who leave the university gym following a workout are likely students who exercise on a regular basis and therefore tend to exercise more, on average, than students in general.

**Explanation:** The larger the sample the point estimate is based on, the closer it is likely to be to the parameter it estimates.

**Explanation:** The estimate is based on a random sample (and is therefore unbiased) and is also based on a larger sample, which makes it more accurate.

**Explanation:** The estimate is based on a random sample (and is therefore unbiased) and is also based on a large sample, which makes it more accurate.

Point estimation is simple and intuitive,

**but also a bit problematic. Here is why:** When we estimate, say, μ by the sample mean \( \overline{X} \), we are almost guaranteed to make some kind of error. Even though we know that the values of \( \overline{X} \) fall around μ,

**it is very unlikely that the value of \( \overline{X} \) will fall exactly at μ.** Given that such errors are a fact of life for point estimates (by the mere fact that we are basing our estimate on one sample that is a small fraction of the population), these estimates are in themselves of

**limited usefulness, unless we are able to quantify the extent of the estimation error.** Interval estimation addresses this issue. The idea behind interval estimation is, therefore, to enhance the simple point estimates by supplying information about the size of the error attached.

In this introduction, we'll provide examples that will give you a solid intuition about the basic idea behind interval estimation.

**Example:** Consider the example that we discussed in the point estimation section: Suppose that we are interested in studying the IQ levels of students at a university. In particular (since IQ level is a quantitative variable), we are interested in estimating μ, the mean IQ level of all the students at the university. A random sample of 100 students was chosen, and their (sample) mean IQ level was found to be \( \overline{X} = 115 \).

In point estimation we used \( \overline{X} = 115 \) as the point estimate for μ. However, we had no idea of what the estimation error involved in such an estimation might be.

**Interval estimation takes point estimation a step further and says something like:** "I am 95% confident that by using the point estimate \( \overline{X} = 115 \) to estimate μ, I am off by no more than 3 IQ points. In other words, I am 95% confident that μ is within

**3 of 115, or between 112 (115 - 3) and 118 (115 + 3).**" Yet another way to say the same thing is: "I am 95% confident that μ is somewhere in (or covered by) the interval (112, 118)."

Note that while point estimation provided just one number as an estimate for μ (115), interval estimation provides a

**whole interval of "plausible values" for μ (between 112 and 118), and also attaches the level of our confidence that this interval** indeed includes the value of μ to our estimation (in our example, 95% confidence). The interval (112, 118) is therefore called "a 95% confidence interval for μ." Let's look at another example:

**Example:** Let's consider the second example from the point estimation section. Suppose that we are interested in the opinions of U.S. adults regarding legalizing the use of marijuana. In particular, we are interested in the parameter p, the proportion of U.S. adults who believe marijuana should be legalized.

Suppose a poll of 1,000 U.S. adults finds that 560 of them believe marijuana should be legalized.

If we wanted to estimate p, the population proportion, by a single number based on the sample, it would make intuitive sense to use the corresponding quantity in the sample, the sample proportion \( \hat{p} = \frac{560}{1000} = 0.56 \).

Interval estimation would take this a step further and say something like:

"I am 90% sure that by using 0.56 to estimate the true population proportion, p, I am off by (or, I have an error of) no more than 0.03 (or 3 percentage points). In other words, I am 90% confident that the actual value of p is somewhere between 0.53 (0.56 - 0.03) and 0.59 (0.56 + 0.03)."

Yet another way of saying this is: "I am 90% confident that p is covered by the interval (0.53, 0.59)."

In this example, (0.53, 0.59) is a 90% confidence interval for p.

Suppose that we are interested in studying the IQ levels of students at Smart University (SU). In particular (since IQ level is a quantitative variable), we are interested in estimating μ, the mean IQ level of all the students at SU.

We will assume that from past research on IQ scores in different universities, it is known that the IQ standard deviation in such populations is \( \sigma = 15 \).

**In order to estimate μ, a random sample of 100 SU students was chosen, and their (sample) mean IQ level is calculated (let's not assume, for now, that the value of this sample mean is 115, as before).** We learned in the "Sampling Distributions" module of probability that according to the central limit theorem, the sampling distribution of the sample mean \( \overline{X} \) is approximately normal with a mean of μ and standard deviation of \( \frac{\sigma}{\sqrt{n}} \).

In our example, then (where \( \sigma = 15 \) and n = 100), the distribution of \( \overline{X} \), the sample mean IQ level of 100 randomly chosen students, is approximately normal, with mean μ and standard deviation \( \frac{15}{\sqrt{100}} = 1.5 \).

Next, we recall and apply the

**Standard Deviation Rule for the normal distribution**, and in particular its second part: there is a 95% chance that the sample mean we get in our sample falls within 2 * 1.5 = 3 of μ. Obviously, if there is a certain distance between the sample mean and the population mean, we can describe that distance by starting at either value. So, if the sample mean \( \overline{X} \) falls within a certain distance of the population mean μ, then the population mean μ falls within the same distance of the sample mean.

Therefore, the statement, "There is a 95% chance that the sample mean \( \overline{X} \) falls within 3 units of μ" can be rephrased as: "We are 95% confident that the population mean μ falls within 3 units of \( \overline{X} \)."

So, if we happen to get a sample mean of \( \overline{X} = 115 \), then we are 95% sure that μ falls within 3 of 115, or in other words that μ is covered by the interval (115 - 3, 115 + 3) = (112,118).

**Comment:** Note that the first phrasing is about \( \overline{X} \), which is a random variable; that's why it makes sense to use probability language. But the second phrasing is about μ, which is a parameter, and thus is a "fixed" value that doesn't change, and that's why we shouldn't use probability language to discuss it.

Suppose that we are interested in estimating the unknown population mean (μ) based on a random sample of size n.

**Further, we assume that the population standard deviation (σ) is known.** The values of \( \overline{x} \) follow a normal distribution with (unknown) mean μ and standard deviation \( \frac{\sigma}{\sqrt{n}} \) (known, since both σ and n are known).

**By the (second part of the) Standard Deviation Rule, this means that:** there is a 95% chance that our sample mean \( \overline{x} \) will fall within \( 2 * \frac{\sigma}{\sqrt{n}} \) of μ, which means that we are 95% confident that μ falls within \( 2 * \frac{\sigma}{\sqrt{n}} \) of our sample mean \( \overline{x} \). Or, in other words,

**a 95% confidence interval for the population mean μ is:** \( \left( \overline{x} - 2 * \frac{\sigma}{\sqrt{n}},\ \overline{x} + 2 * \frac{\sigma}{\sqrt{n}} \right) \)

**Here, then, is the general result:** Suppose a random sample of size n is taken from a normal population of values for a quantitative variable whose mean (μ) is unknown, when the standard deviation (σ) is given. A 95% confidence interval (CI) for μ is: \( \overline{x} \pm 2 * \frac{\sigma}{\sqrt{n}} \)
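This general result translates directly into code. Here is a minimal sketch (the function name `ci_95` is ours, not from the text), using the course's multiplier of 2:

```python
from math import sqrt

def ci_95(xbar, sigma, n):
    """95% confidence interval for mu when sigma is known,
    using the multiplier 2 from the Standard Deviation Rule."""
    margin = 2 * sigma / sqrt(n)
    return (xbar - margin, xbar + margin)

# The IQ example from earlier: x-bar = 115, sigma = 15, n = 100
print(ci_95(115, 15, 100))  # -> (112.0, 118.0)
```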

**Comment:** Note that for now we require the population standard deviation (σ) to be known. Practically, σ is rarely known, but in some cases, especially when a lot of research has been done on the quantitative variable whose mean we are estimating (such as IQ, height, weight, or scores on standardized tests), it is reasonable to assume that σ is known. Eventually, we will see how to proceed when σ is unknown and must be estimated with the sample standard deviation (s).

**Example:** An educational researcher was interested in estimating μ, the mean score on the math part of the SAT (SAT-M) of all community college students in his state. To this end, the researcher chose a random sample of 650 community college students from his state, and found that their average SAT-M score is 475. Based on a large body of research that was done on the SAT, it is known that the scores roughly follow a normal distribution with standard deviation \( \sigma = 100 \). Here is a visual representation of this story, which summarizes the information provided:

Based on this information, let's estimate μ with a 95% confidence interval. Using the formula we developed before, \(\overline{x} \pm 2 * \frac{\sigma}{\sqrt{n}}\), a 95% confidence interval for μ is:

\(475 - 2 * \frac{100}{\sqrt{650}}, 475 + 2 * \frac{100}{\sqrt{650}}\), which is (475 - 7.8 , 475 + 7.8) = (467.2, 482.8). In this case,

**it makes sense to round, since SAT scores can be only whole numbers, and say that the 95% confidence interval is (467, 483).** We are 95% confident that the mean SAT-M score of all community college students in the researcher's state is covered by the interval (467, 483). Note that the confidence interval was obtained by taking \( 475 \pm 8 \) (rounded).

**This means that we are 95% confident that by using the sample mean \( \overline{x} = 475 \) to estimate μ, our error is no more than 8.**
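The arithmetic in this example is easy to check (a sketch using the numbers given above):

```python
from math import sqrt

xbar, sigma, n = 475, 100, 650
margin = 2 * sigma / sqrt(n)      # 2 * 100 / sqrt(650)
print(round(margin, 1))           # -> 7.8
print(round(xbar - margin), round(xbar + margin))  # -> 467 483
```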

**Scenario: Smoking During Pregnancy**

A study was done on pregnant women who smoked during their pregnancies. In particular, the researchers wanted to study the effect that smoking has on pregnancy length. A sample of 114 pregnant women who were smokers participated in the study and were followed until the birth of their child. At the end of the study, the collected data were analyzed and it was found that the average pregnancy length of the 114 women was 260 days. From a large body of research, it is known that length of human pregnancy has a standard deviation of 16 days.

Based on this study, find a 95% confidence interval for μ, the mean pregnancy length of women who smoke during their pregnancy, and interpret your interval in context.

The problem provides the following information:

Our sample size is n = 114

This sample gives a sample mean of \( \overline{x} = 260 \)

Human pregnancy is known to have a standard deviation of σ = 16.

The 95% confidence interval for μ is therefore: \( 260 \pm 2*\frac{16}{\sqrt{114}} = 260 \pm 3\) or (257,263).
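The same quick check works here (a sketch using the study's numbers):

```python
from math import sqrt

xbar, sigma, n = 260, 16, 114
margin = 2 * sigma / sqrt(n)      # 2 * 16 / sqrt(114)
print(round(margin))              # -> 3
print(round(xbar - margin), round(xbar + margin))  # -> 257 263
```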

**Interpretation:** We are 95% confident that the mean pregnancy length of women who smoke during pregnancy is covered by the interval (257, 263). Note that the confidence interval was obtained by calculating 260 ± 3.

**Give an interpretation of the number 3. We are 95% confident that by using 260 days as the estimate for μ, our estimation error is no more than 3 days.** Note that the way a confidence interval is used is that we hope it contains the population mean μ.

**This is why we call it an "interval for the population mean."** We just saw that one interpretation of a 95% confidence interval is that we are 95% confident that the population mean (μ) is contained in the interval.

**Another useful interpretation in practice is that, given the data, the confidence interval represents the set of plausible values for the population mean μ.**

**Example**

As an illustration, let’s return to the example of mean SAT-Math score of community college students. Recall that we had constructed the confidence interval (467,483) for the unknown mean SAT-M score for all community college students.

Here is a way that we can use the confidence interval:

Do the results of this study provide evidence that μ, the mean SAT-M score of community college students, is lower than the mean SAT-M score in the general population of college students in that state (which is 480)?

The 95% confidence interval for μ was found to be (467,483). Note that 480, the mean SAT-M score in the general population of college students in that state, falls inside the interval,

**which means that it is one of the plausible values for μ.** This means that μ could be 480 (or even higher, up to 483), and therefore we cannot conclude that the mean SAT-M score among community college students in the state is lower than the mean in the general population of college students in that state.

**(Note that the fact that most of the plausible values for μ fall below 480 is not a consideration here.)**

**Comment**

Recall that in the formula for the 95% confidence interval for μ, \(\overline{x} \pm 2 * \frac{\sigma}{\sqrt{n}}\), the 2 comes from the Standard Deviation Rule, which says that any normal random variable (in our case \(\overline{x}\))

**has a 95% chance (or probability of .95) of taking a value that is within 2 standard deviations of its mean.** As you recall from the discussion about the normal random variable, this is only an approximation; to be more accurate, there is a 95% chance that a normal random variable will take

**a value within 1.96 standard deviations of its mean.** Therefore, a more accurate formula for the 95% confidence interval for μ is \(\overline{x} \pm 1.96 * \frac{\sigma}{\sqrt{n}}\), which you'll find in most introductory statistics books.

**In this course, we'll use 2 (and not 1.96), which is close enough for our purposes.**

**Explanation:** The population mean (μ) won't necessarily be at the exact center of the confidence intervals. On the simulation, the population mean was represented by the green line, and it wasn't always at the exact center of the intervals.

**Explanation:** As you saw on the simulation, each interval was centered at a red dot which represented a sample mean (x̄). In fact, a confidence interval is always made with the sample mean (x̄) at the center.

**Explanation:** The population mean (μ) is a parameter, so it doesn't change. On the simulation, the population mean was indicated by the green line, which didn't change position along the axis.

**Explanation:** The sample mean (x̄) is a statistic that changes from sample to sample. On the simulation, each sample mean (x̄) was represented by a red dot in the middle of each interval, and each red dot was in a different position along the axis.

**Explanation:** In fact, 100% of the intervals will contain their sample mean, not only 95% of the intervals, because each confidence interval is centered at the sample mean (x̄); each x̄ was represented on the simulation by the red dot in the middle of each interval.

**Explanation:** With the confidence level set at 95%, you saw that in the long run (if you selected many thousands of samples) 95% of the intervals would cover the green line, which represented the population mean (μ).

**Explanation:** The 95% confidence interval says that in the long run (if you selected many thousands of samples) 95% of the intervals would cover the population mean (μ).
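This long-run behavior can also be reproduced without the simulation applet. The sketch below (our own; the IQ-like population with μ = 100 and σ = 15 is an arbitrary assumption) builds thousands of intervals with the course's multiplier of 2 and counts how many cover μ:

```python
import random
from math import sqrt

random.seed(2)

mu, sigma, n = 100, 15, 100
margin = 2 * sigma / sqrt(n)      # the 95% margin of error
num_intervals = 5_000

covered = 0
for _ in range(num_intervals):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    if xbar - margin <= mu <= xbar + margin:
        covered += 1

print(covered / num_intervals)    # close to 0.95 in the long run
```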

The most commonly used level of confidence is 95%. However, we may wish to increase our level of confidence and produce an interval that's almost certain to contain μ. Specifically, we may want to report an interval for which we are 99% confident that it contains the unknown population mean, rather than only 95%.

Using the same reasoning as in the last comment, in order to create a 99% confidence interval for μ, we should ask: There is a probability of .99 that any normal random variable takes values within how many standard deviations of its mean? The precise answer is 2.576, and therefore, a 99% confidence interval for μ is \( \overline{x} \pm 2.576 * \frac{\sigma}{\sqrt{n}} \).

Another commonly used level of confidence is a 90% level of confidence. Since there is a probability of 0.90 that any normal random variable takes values within 1.645 standard deviations of its mean, the 90% confidence interval for μ is \( \overline{x} \pm 1.645 * \frac{\sigma}{\sqrt{n}} \).
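The multipliers 1.645, 1.96, and 2.576 are quantiles of the standard normal distribution, and can be recovered with Python's standard-library `statistics.NormalDist` (a sketch, shown here as a quick check rather than part of the course material):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# A confidence level C leaves (1 - C)/2 in each tail, so the
# multiplier z* is the (1 + C)/2 quantile of the standard normal.
for level in (0.90, 0.95, 0.99):
    print(level, round(std_normal.inv_cdf((1 + level) / 2), 3))
# -> 0.9 1.645
# -> 0.95 1.96
# -> 0.99 2.576
```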

**Example:** Let's go back to our first example, the IQ example: The IQ level of students at a particular university has an unknown mean (μ) and known standard deviation \( \sigma = 15 \). A simple random sample of 100 students is found to have a sample mean IQ of \( \overline{x} = 115 \). Estimate μ with a 90%, 95%, and 99% confidence interval.

A 90% confidence interval for μ is \( \overline{x} \pm 1.645 * \frac{\sigma}{\sqrt{n}} = 115 \pm 1.645(\frac{15}{\sqrt{100}}) = 115 \pm 2.5 = (112.5,117.5)\).

A 95% confidence interval for μ is \( \overline{x} \pm 2 * \frac{\sigma}{\sqrt{n}} = 115 \pm 2(\frac{15}{\sqrt{100}}) = 115 \pm 3 = (112,118)\).

A 99% confidence interval for μ is \( \overline{x} \pm 2.576 * \frac{\sigma}{\sqrt{n}} = 115 \pm 2.576(\frac{15}{\sqrt{100}}) = 115 \pm 3.9 \approx (111,119)\).

The Golden Retriever Club of America conducted a study of 64 golden retrievers, and found the average age at death in the sample to be 11.0 years old. Let’s assume the standard deviation of golden retriever lifespan is known to be 1.2 years (this is consistent with studies and with some other dog breeds). Give three confidence interval estimates for the unknown mean age at death for golden retrievers: first using 90% confidence, then 95%, and finally 99%. Please report your intervals in parenthesis notation, and please round your final values to the nearest tenth for simplicity. Be sure to notice the size of the intervals with the different confidence levels.

The 90% confidence estimate for μ is (10.8, 11.2).

The 95% confidence estimate for μ is (10.7, 11.3).

The 99% confidence estimate for μ is (10.6, 11.4).
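These three answers follow from the multipliers discussed above (a sketch using the scenario's numbers):

```python
from math import sqrt

xbar, sigma, n = 11.0, 1.2, 64
for label, z in (("90%", 1.645), ("95%", 2), ("99%", 2.576)):
    margin = z * sigma / sqrt(n)   # sigma / sqrt(n) = 1.2 / 8 = 0.15
    print(label, (round(xbar - margin, 1), round(xbar + margin, 1)))
# -> 90% (10.8, 11.2)
# -> 95% (10.7, 11.3)
# -> 99% (10.6, 11.4)
```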

The wider 99% confidence interval (111, 119) gives us a less precise estimate of μ than the narrower 90% confidence interval (112.5, 117.5), because the narrower interval "narrows in" on the plausible values of μ.

The important practical implication here is that researchers must decide whether

**they prefer to state their results with a higher level of confidence or produce a more precise interval.** In other words, **there is a trade-off between the level of confidence and the precision with which the parameter is estimated.** The price we have to pay for a higher level of confidence is that the unknown population mean will be estimated with less precision (i.e., with a wider confidence interval).

**If we would like to estimate μ with more precision (i.e. a narrower confidence interval), we will need to sacrifice and report an interval with a lower level of confidence.**

**Scenario: Exercise Habits**

In a recent study, 1,115 males 25 to 35 years of age were randomly chosen and asked about their exercise habits. Based on the study results, the researchers estimated the mean time that a male 25 to 35 years of age spends exercising with 90%, 95%, and 99% confidence intervals. These were (not necessarily in the same order):

**Explanation:** The widest interval gives us the most confidence of capturing the population mean, because it covers more of the number line.

**Explanation:** The confidence percentage measures our confidence that the interval captures the population mean, so the widest interval must be the one with the largest confidence.

**Explanation:** Indeed, since this is the narrowest confidence interval, it is the one that provides the most precise estimation of the unknown mean.

**Explanation:** Indeed, since there is a trade-off between the level of confidence and precision, the interval that provides the most precise estimation must be the one with the lowest level of confidence.

The price you pay for a higher level of confidence is a lower level of precision of the interval (i.e., a wider interval).

**Understanding the General Structure of Confidence Intervals**

We explored the confidence interval for μ for different levels of confidence and found that, in general, it has the following form:

\( \overline{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}} \), where z* is a general notation for the multiplier that depends on the level of confidence. As we discussed before:

For a 90% level of confidence, z* = 1.645

For a 95% level of confidence, z* = 2 (or 1.96 if you want to be really precise)

For a 99% level of confidence, z* = 2.576

To start our discussion about the structure of the confidence interval, let's denote \( z^* \cdot \frac{\sigma}{\sqrt{n}} \) by m.

The confidence interval, then, has the form: \( \overline{x} \pm m \):

\( \overline{x} \) is the sample mean, the point estimator for the unknown population mean (μ).

m is called the margin of error, since it represents the maximum estimation error for a given level of confidence.

For example, for a 95% confidence interval, we are 95% sure that our estimate will not depart from the true population mean by more than m, the margin of error.

**m is further made up of the product of two components:** z*, the confidence multiplier, and

\( \frac{\sigma}{\sqrt{n}} \), which is the standard deviation of \( \overline{X} \), the point estimator of μ.

Here is a summary of the different components of the confidence interval and its structure:

This structure: \( \text{estimate} \pm \text{margin of error} \),

where the margin of error is further composed of the product of a confidence multiplier and the standard deviation (or, as we'll see, the standard error) is the

**general structure of all confidence intervals that we will encounter in this course.** Obviously, even though each confidence interval has the same components, what these components actually are differs from confidence interval to confidence interval,

**depending on what unknown parameter the confidence interval aims to estimate.** Since the structure of the confidence interval is such that it has a margin of error on either side of the estimate, it is centered at the estimate (in our case, \( \overline{x} \)), and its width (or length) is exactly twice the margin of error:

**The margin of error, m, is therefore "in charge" of the width (or precision) of the confidence interval**, and the estimate is in charge of its location (and has no effect on the width).
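The estimate-plus-or-minus-margin structure can be mirrored in code (the function name `mean_ci` is ours). The estimate fixes the interval's location, the margin fixes its width, and the width comes out to exactly twice m:

```python
from math import sqrt

def mean_ci(xbar, sigma, n, z_star):
    """Confidence interval in 'estimate +/- margin of error' form."""
    margin = z_star * (sigma / sqrt(n))   # multiplier x sd of x-bar
    return xbar, margin, (xbar - margin, xbar + margin)

# The IQ example: estimate 115, margin 3.0, width 6.0
estimate, m, interval = mean_ci(115, 15, 100, 2)
print(estimate, m, interval[1] - interval[0])  # -> 115 3.0 6.0
```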

**Explanation:** The confidence interval must be centered at the sample mean (7.1 in our case), and this is the only interval of the three that satisfies this requirement.

**Explanation:** Because the width of the confidence interval is 0.4 (6.9 - 6.5), the margin of error must be 0.4 / 2 = 0.2.

**Explanation:** 6.9 is the center of the confidence interval [(6.5 + 7.3)/2] and is therefore the sample mean.

**Explanation:** The 99% confidence interval is wider than the 95% confidence interval, and since both confidence intervals were constructed using the same data, they are both centered at the same sample mean (6.1 in this case).