**Jon Starkweather, PhD**

`jonathan.starkweather@unt.edu`

Consultant

**R**esearch and **S**tatistical **S**upport

`http://www.unt.edu`

`http://researchsupport.unt.edu`

The Research and Statistical Support (RSS) office at the University of North Texas hosts a number of "Short Courses".

A list of them is available at:

1. Sampling Distributions

From a score to a distribution of scores

- Moving toward more realistic research applications and examples.
- From a sample of one to a sample of many...
- Does this individual (Scooby) come from or differ from a particular population?
- We were given the population mean ($\mu$) and standard deviation ($\sigma$), and compared Scooby's Z-score to the Z-critical value.
- Now we ask, does this sample of 3 cartoon dogs (Scooby, Pluto, & Goofy) differ from a particular population?
- Now, we are given $\mu$ and $\sigma$, and must compare a sample mean to the **Distribution of Means**.

Distribution of Means

- The Distribution of Means refers to the distribution of sample means built from an infinite number of random samples (with replacement) taken from the population of interest.
- Also called: the Sampling Distribution of the Mean, as well as the Distribution of Sample Means.

3 rules for the Distribution of Means

- The mean of the distribution of means is the same as the mean of the population.
- The variability of the distribution of means is less than the variability of the population.
- So, we correct for this bias.
- The shape of the distribution of means is approximately normal if (a) each sample contains 30 or more individuals, or (b) the distribution of the population is normal.
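The first two rules can be checked informally by simulation. The following is a minimal sketch, not part of the original course materials: the population (10,000 scores drawn from a normal distribution with mean 100 and standard deviation 15) and the helper name `simulate_distribution_of_means` are assumptions chosen only for illustration, and a finite number of samples stands in for the "infinite" samples in the definition.

```python
import random
import statistics


def simulate_distribution_of_means(population, sample_size, num_samples, seed=0):
    """Draw many random samples (with replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical population of scores (an assumption, not course data).
population_rng = random.Random(42)
population = [population_rng.gauss(100, 15) for _ in range(10_000)]

# Approximate the distribution of means for samples of 3 individuals.
means = simulate_distribution_of_means(population, sample_size=3, num_samples=20_000)

pop_mean = statistics.fmean(population)
pop_var = statistics.pvariance(population)
mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)

print("population mean:        ", round(pop_mean, 2))
print("mean of sample means:   ", round(mean_of_means, 2))   # Rule 1: close to the population mean
print("population variance:    ", round(pop_var, 2))
print("variance of sample means:", round(var_of_means, 2))   # Rule 2: smaller than the population variance
print("population variance / 3:", round(pop_var / 3, 2))     # Rule 2a: close to the variance of the means
```

With a finite number of samples the agreement is only approximate; it tightens as `num_samples` grows.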

More on Rule 1

**Rule 1:** The mean of a distribution of means is the same as the mean of the population of individuals.

- When taking repeated random samples with replacement, the mean of the resulting distribution of means *must* be the population mean.
- Extreme means cannot be as extreme as extreme scores, and they will balance out.
- Central Limit Theorem

Sampling Error

- Sampling Error
- Samples, by their very nature, can never be *completely* representative of their respective populations.
- A sample is defined as a subset of a population; not all the individuals in the population will be included in a sample.
- Each sample contains variability due to chance (the chance associated with which individuals were chosen for the sample).
- Even with large numbers of large samples, we may not have any single sample mean which is exactly equal to the population mean.
- All else being equal, larger samples produce less sampling error than smaller samples.

Central Limit Theorem

- Central Limit Theorem
- A mathematical theorem showing that, no matter the shape of a population distribution (provided its variance is finite), the distribution of sample means built from an infinite number of random samples (with replacement) approaches a normal distribution as the size of each sample grows, centered on a mean which is equal to the population mean.
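A small simulation can make the theorem concrete. This is an illustrative sketch under stated assumptions: the right-skewed exponential population, the sample sizes (2 and 30), and the helper names `skewness` and `means_of_samples` are all hypothetical choices, not taken from the course. Skewness near 0 indicates a symmetric, more normal-looking shape.

```python
import random
import statistics


def skewness(values):
    """Population skewness: the average cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


rng = random.Random(1)

# Hypothetical, strongly right-skewed population (exponential with mean 1).
population = [rng.expovariate(1.0) for _ in range(20_000)]


def means_of_samples(sample_size, num_samples=5_000):
    """Means of repeated random samples drawn with replacement."""
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


skew_population = skewness(population)
skew_n2 = skewness(means_of_samples(2))
skew_n30 = skewness(means_of_samples(30))

print("skewness of population:      ", round(skew_population, 2))
print("skewness of means (n = 2):   ", round(skew_n2, 2))
print("skewness of means (n = 30):  ", round(skew_n30, 2))
```

The skewness of the sample means should shrink toward 0 as the sample size increases, even though the population itself stays skewed.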

More on Rule 2 (Part A)

**Rule 2a:** The variance of a distribution of means is the variance of the population of individuals divided by the number of individuals in each sample.

- This corrects for the known bias of the distribution of means' variability.
- The variability of the distribution of means is always less than the population's variability.
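Rule 2a can be written compactly as a formula. The symbols are an assumption, since the original notation did not survive: $\sigma^2$ is the population variance, $N$ is the number of individuals in each sample, and the subscript $M$ marks a quantity belonging to the distribution of means.

```latex
\sigma^2_M = \frac{\sigma^2}{N}
```

As a purely illustrative case, a population variance of 100 and samples of 4 individuals would give $\sigma^2_M = 100 / 4 = 25$.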

More on Rule 2 (Part B)

**Rule 2b:** The standard deviation of a distribution of means is the square root of the variance of the distribution of means.

- The standard deviation of a distribution of means is called the **Standard Error of the Mean** (*SEM*) or just the **Standard Error** (*SE*).
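Rules 2a and 2b chain together naturally in code. The sketch below is one possible way to express them; the function names `variance_of_means` and `standard_error` are hypothetical, and the input values are illustrative only.

```python
import math


def variance_of_means(population_variance, n):
    """Rule 2a: population variance divided by the size of each sample."""
    return population_variance / n


def standard_error(population_variance, n):
    """Rule 2b: square root of the variance of the distribution of means."""
    return math.sqrt(variance_of_means(population_variance, n))


# Illustrative values only: population variance of 100, samples of 4 individuals.
print(variance_of_means(100, 4))  # 25.0
print(standard_error(100, 4))     # 5.0
```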

More on Rule 3

**Rule 3:** The shape of a distribution of means is approximately normal if (a) each sample contains 30 or more individuals, or (b) the distribution of the population is normal.

- The statement above follows from the Central Limit Theorem.
- Essentially, when either condition holds, we can safely treat the distribution of means as normal.
- Therefore, we can use what we know about the standard normal curve.
- Percentage of scores, probabilities, and values of Z-scores.

Z-score for a mean?

- We know the Z-score formula, but we must now adjust it to identify the Z-score of a sample mean on the distribution of means.
- Essentially, we're doing the same **Z-test** we were doing in the previous module, just modified slightly.
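The adjustment amounts to replacing the individual score with the sample mean and the population standard deviation with the standard error. The notation below is an assumption (the original symbols were lost): $M$ is the sample mean, $\mu_M$ and $\sigma_M$ are the mean and standard deviation of the distribution of means, and $N$ is the number of individuals in the sample.

```latex
Z = \frac{M - \mu_M}{\sigma_M},
\qquad \mu_M = \mu,
\qquad \sigma_M = \sqrt{\frac{\sigma^2}{N}}
```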

2. Z-test with Means Example

Remember Scooby?

In Module 5, we had the example research question: Is Scooby's IQ *greater than* that of dogs not in cartoons?

- Given: the IQ among dogs not in cartoons is normally distributed, with a known population mean ($\mu$) and standard deviation ($\sigma$).

Now, we have a sample of 3 cartoon dogs (Scooby, Pluto, & Goofy). We are expanding our research to include a sample greater than one individual, but we are pursuing the same research question. Do cartoon dogs have significantly higher IQ than other dogs (i.e., those not on cartoons)?

The Sample Data

Table 1: Raw Data

| X | Dog | IQ |
|---|--------|-----|
| 1 | Scooby | 123 |
| 2 | Pluto | 145 |
| 3 | Goofy | 133 |

**Step 1**

Define the populations and restate the research question as null and alternative hypotheses.

Population 1: Dogs on cartoons.

Population 2: Dogs not on cartoons.

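Null Hypothesis: Dogs on cartoons do not have higher IQ than dogs not on cartoons ($\mu_1 \le \mu_2$).

Alternative Hypothesis: Dogs on cartoons have higher IQ than dogs not on cartoons ($\mu_1 > \mu_2$).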

**Step 2**

Determine the characteristics of the comparison distribution.

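- Given: the mean ($\mu$) and standard deviation ($\sigma$) of the population of dogs not on cartoons.
- Given: a sample of $N = 3$ cartoon dogs.

Variance of the distribution of means: $\sigma^2_M = \sigma^2 / N$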
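Step 2 can be sketched numerically as follows. Note the assumption clearly: the population values did not survive in this page, so `mu = 100` and `sigma = 15` are placeholders chosen for illustration, not the course's stated figures. (A value of 15 for $\sigma$ does reproduce, to within rounding, the *SE* of 8.67 quoted in the confidence interval section of this module.)

```python
import math

# ASSUMPTION: placeholder population values; the originals were lost.
mu = 100.0
sigma = 15.0
n = 3  # Scooby, Pluto, and Goofy

mean_of_means = mu                             # Rule 1
variance_of_means = sigma ** 2 / n             # Rule 2a
standard_error = math.sqrt(variance_of_means)  # Rule 2b

print("mean of the distribution of means:    ", mean_of_means)
print("variance of the distribution of means:", variance_of_means)       # 75.0
print("standard error:                       ", round(standard_error, 2))  # 8.66
```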

**Step 3**

Determine the cutoff sample score on the comparison distribution at which the null should be rejected.

- Given: Significance level = .05
- Given: Z-critical value of approximately +1.645 (one-tailed test)
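The cutoff for a one-tailed test at the .05 level can be looked up in a Z table or computed. A minimal sketch using the Python standard library's `statistics.NormalDist` (the variable names are illustrative):

```python
from statistics import NormalDist

alpha = 0.05  # significance level from Step 3

# Upper-tail cutoff: the Z-score with 95% of the standard normal curve below it.
z_critical = NormalDist().inv_cdf(1 - alpha)

print(round(z_critical, 3))  # 1.645
```

Printed Z tables often round this value to 1.64 or 1.65.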

**Step 4**

Compute or calculate your sample statistic.
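The sample mean follows directly from Table 1; the Z-score of that mean also requires the population values. In the sketch below, `mu = 100` and `sigma = 15` are placeholder assumptions (the original figures were lost from this page), so the resulting Z is illustrative rather than the course's reported value.

```python
import math
import statistics

# Raw data from Table 1.
iq_scores = {"Scooby": 123, "Pluto": 145, "Goofy": 133}

# ASSUMPTION: placeholder population values, not the course's stated figures.
mu = 100.0
sigma = 15.0

n = len(iq_scores)
sample_mean = statistics.fmean(list(iq_scores.values()))
standard_error = sigma / math.sqrt(n)

# Z-score of the sample mean on the distribution of means.
z = (sample_mean - mu) / standard_error

print("sample mean:", round(sample_mean, 2))  # 133.67
print("Z for the sample mean (under the assumed mu and sigma):", round(z, 2))
```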

**Step 5**

Compare and make a decision about whether to reject the null hypothesis or fail to reject the null hypothesis.

- Our sample's mean IQ falls beyond the cutoff: it is higher than 95% of the sample means in the comparison distribution (the distribution of means for dogs not in cartoons).
- Reject the null hypothesis.
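The comparison in Step 5 can be expressed as a small decision function. This is a sketch under assumptions: the function name `one_tailed_z_decision` is hypothetical, and the Z values passed in are made-up illustrations, not results from the cartoon-dog example.

```python
from statistics import NormalDist


def one_tailed_z_decision(z, alpha=0.05):
    """Return (reject_null, p_value) for an upper-tailed Z-test."""
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha)
    p_value = 1 - standard_normal.cdf(z)  # area of the curve above the observed Z
    return z > z_critical, p_value


# Hypothetical Z-scores, for illustration only.
for z in (1.00, 2.10):
    reject, p = one_tailed_z_decision(z)
    decision = "reject the null" if reject else "fail to reject the null"
    print(f"Z = {z:.2f}: p = {p:.4f}, {decision}")
```

Comparing Z to the cutoff and comparing the p-value to the significance level always lead to the same decision.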

3. Confidence Intervals

Confidence Intervals

- Traditionally, social science used point estimates when attempting to ascertain a population value from sample data.
- Using a sample statistic to estimate a **specific** population parameter.
- For example, using the sample mean ($M$) to estimate the population mean ($\mu$).
- Interval estimates have gained popularity, in part because they tend to give us a better *idea* of where the population value is located.
- Using a range of possible values that likely include the *unknown* population parameter.

Calculating an interval estimate

- Back to the distribution of means...
- We will be dealing with distributions of means for most of the upcoming modules.
- Standard deviation of a distribution of means is the **Standard Error** (*SE*): $SE = \sqrt{\sigma^2 / N}$
- Confidence Interval (CI).
- Using the *SE*, we can calculate the range above and below the mean on a distribution of means that reflects the uncertainty associated with our sample.

Confidence Limits

- The *limits* of our confidence interval are based on the Standard Error (*SE*), the mean of our sample, and the level of confidence we want to have.
- Typically 95% confidence interval.
- 95% translates to +1.96 and -1.96 Z-scores above and below the mean on the standard normal curve.

Calculating the Limits

- If our sample yields a mean of 133.67 with a *SE* of 8.67, then we can use a simple conversion to find the upper and lower limits of our interval estimate (i.e., confidence interval).
- Upper Limit:

(+Z-cutoff)(*SE*) + $M$ = upper limit: (+1.96)(8.67) + 133.67 = 150.66

- Lower Limit:

(-Z-cutoff)(*SE*) + $M$ = lower limit: (-1.96)(8.67) + 133.67 = 116.68

- So, the 95% confidence interval from our sample is 116.68 to 150.66.
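The limit calculation is simple enough to check in a few lines. A minimal sketch, using the mean (133.67), *SE* (8.67), and Z-cutoff (1.96) given above; the function name `confidence_interval` is a hypothetical helper.

```python
def confidence_interval(sample_mean, standard_error, z_cutoff=1.96):
    """Return (lower, upper) limits: sample mean plus or minus z_cutoff * SE."""
    margin = z_cutoff * standard_error
    return sample_mean - margin, sample_mean + margin


lower, upper = confidence_interval(133.67, 8.67)
print(round(lower, 2), round(upper, 2))  # 116.68 150.66
```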

Interpretation of Confidence Intervals

- Generally speaking, confidence intervals *mean*:
- 95% of the confidence intervals calculated on infinite samples of this population would contain the population mean.

- Interpretation of the confidence interval we calculated:
- Our interval, 116.68 to 150.66, is one interval from a procedure that, if we were to take an infinite number of samples of dogs on cartoons and compute an interval from each, would capture the population mean in 95% of those intervals.

- Remember, the population mean is fixed (but unknown); while each sample has its own mean (sample means fluctuate).
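The "95% of intervals" reading can be illustrated by simulation. The sketch below rests on assumptions: a normal population with placeholder values `mu = 100` and `sigma = 15`, samples of 3, a known population standard deviation, and a hypothetical helper `coverage_rate`. It counts how often the interval around each sample mean captures the fixed population mean.

```python
import math
import random
import statistics


def coverage_rate(mu, sigma, n, num_samples=10_000, z_cutoff=1.96, seed=0):
    """Fraction of sample-based intervals that contain the population mean."""
    rng = random.Random(seed)
    standard_error = sigma / math.sqrt(n)
    margin = z_cutoff * standard_error
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = statistics.fmean(sample)  # each sample has its own mean
        if m - margin <= mu <= m + margin:  # the population mean stays fixed
            hits += 1
    return hits / num_samples


# ASSUMPTION: placeholder population values, for illustration only.
rate = coverage_rate(mu=100.0, sigma=15.0, n=3)
print("proportion of intervals containing the population mean:", rate)
```

The proportion should land close to 0.95; it is the intervals that vary from sample to sample, not the population mean.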

Controversy of Confidence Intervals

- What we really want to know is the range that includes the population mean with 95% probability; that is, we wish we could say there is a 95% chance that the interval includes the population mean.
- Because we are dealing with a sample and a distribution of samples, we do not really know what the population value is and, therefore, cannot know what that interval would be.

4. Summary of Module 6

Summary of Module 6

Module 6 covered the following topics:

- The Distribution of Means
- Z-test with Means
- Confidence Intervals

This concludes Module 6

Next time Module 7.

- Next time we'll begin covering Additions to Statistical Significance.
- Until next time; have a nice day.

This page was last updated on: October 8, 2010

This document was created in LaTeX and converted to HTML using LaTeX2HTML.

Return to the Short Course page by clicking the link below.
