Friday, May 8, 2009

May 9, 2009 - Extra Credit Questions

In your work groups, you will receive extra credit for each question you answer correctly (post answers under your work group):

1. In a school, 50 boys and 50 girls are randomly selected to test the claim that the mean weights for 10-year-old boys and 10-year-old girls are the same. Which of the following tests should be used in this scenario?

a) One-sample test
b) Two-sample test

2. Which of the following is a valid pair of hypotheses?
a) H0: μ1 = μ2; Ha: μ1 ≠ μ2
b) H0: μ1 > μ2; Ha: μ1 ≤ μ2
c) H0: μ1 ≠ μ2; Ha: μ1 = μ2
d) H0: μ1 < μ2; Ha: μ1 ≥ μ2

3. To test the claim that μ1 = μ2 , two samples are randomly selected from each population. If a hypothesis test is performed, how should you interpret a decision that rejects the null hypothesis?

a) There is sufficient evidence to support the claim μ1 = μ2 .
b) There is sufficient evidence to reject the claim μ1 = μ2 .
c) There is not sufficient evidence to support the claim μ1 = μ2 .
d) There is not sufficient evidence to reject the claim μ1 = μ2 .

4. For two-sample t-tests for small independent samples, the following condition or
conditions have to be met:

a) The samples must be randomly selected.
b) The samples must be independent.
c) Each population must have a normal distribution.
d) Each population must have a t-distribution.

5. For a two-sample t-test, if the test statistic is outside the rejection region, or regions, which of the following decisions should be made?

a) Fail to reject the alternative hypothesis.
b) Reject the alternative hypothesis.
c) Fail to reject the null hypothesis.
d) Reject the null hypothesis.

6. Two samples are considered independent if the sample selected from one population is not related to the sample selected from the second population.

a) True
b) False

7. For a two-sample hypothesis test, the alternative hypothesis is a statistical hypothesis that is reasonable when the null hypothesis is not accepted.

a) True
b) False

8. When doing two-sample hypothesis tests, which of the following tests would you conduct if the sample sizes are less than 30, the populations are normal, the standard deviations are unknown, and the population variances are equal?

a) z-Test
b) t-Test

9. Given H0: μ1 ≥ μ2 , critical value t0 = −1.350, and standardized test statistic
t = −4.038 , you will:

a) Reject the null hypothesis.
b) Fail to reject the null hypothesis.

10. A crash test claims that the mean bumper repair cost for small cars is less than that for midsize cars. After conducting a t-test, a decision is made to reject the null hypothesis at the 5% level. The decision implies that:

a) There is enough evidence at the 5% level to support the claim that the mean bumper
repair cost is less for small cars than it is for midsize cars.
b) There is insufficient evidence at the 5% level to support the claim that the mean bumper repair cost is less for small cars than it is for midsize cars.
c) There is insufficient evidence at 5% level to reject the claim that the mean bumper repair cost is less for small cars than it is for midsize cars.
d) There is enough evidence at the 5% level to reject the claim that the mean bumper repair cost is less for small cars than it is for midsize cars.

May 9, 2009 - Content Covered

CONTENT COVERED

Elementary Statistics—Picturing the World:
o Chapter 9, Section 9.1, “Correlation,” pp. 458–473
o Chapter 9, Section 9.2, “Linear Regression,” pp. 474–483

May 9, 2009 - Animation, follow this www link

http://media.pearsoncmg.com/ph/esm/esm_larson_statlet_questions_2e/Leverage_Statlet/leverage.html

The animation at this link illustrates the concept of leverage: not all points have equal influence on a fitted regression line.

May 9, 2009 - Problem One

Problem 1:

Excel has built-in features for graphing scatter plots and calculating the correlation coefficient r.

Follow the instructions listed in Student’s Solution & Technology Manual,
pages 341–344, to construct a scatter plot and calculate the correlation coefficient.

May 9, 2009 - Correlation Notes

A. Correlation

A correlation is a relationship between two variables.

The correlated variables can be represented by an ordered pair (x, y) where x is the
independent, or explanatory, variable, and y is the dependent, or response, variable.

A scatter plot is the graphical representation of the ordered pairs in the form of points in a coordinate plane.

The correlation coefficient is a mathematical measure of the strength and the direction of a linear relationship between two variables. The symbol r represents the sample correlation coefficient. The range of r is –1 to 1.

Types of correlation:

• Negative linear correlation: A negative linear correlation implies that when x
increases, y tends to decrease. When r approaches –1, x and y are said to have a strong negative, or inverse, linear correlation.


• Positive linear correlation: A positive linear correlation implies that when x
increases, y tends to increase. When r approaches 1, x and y are said to have a strong positive, or direct, linear correlation.

• No correlation or weak correlation: No correlation or a weak linear correlation
implies that the magnitude of x has little or no effect on the magnitude of y. When r is near 0, x and y are said to have no correlation or a weak linear correlation.

• Nonlinear correlation: A scatter plot of the data shows a pattern, but it is not that of a line. It might resemble a U, an arc, or some other shape.

Cause and effect relations refer to situations in which two variables, x and y, are related in such a way that changing one variable causes the other to change. Statisticians are careful to avoid claiming that because x and y are correlated, x causes y. In other words, you want to emphasize that correlation does not imply causation.
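
As a rough illustration of how r behaves, here is a minimal Python sketch of the usual computational formula for the sample correlation coefficient. Python is not part of this course (we use Excel), and the function name and the data are made up for illustration only.

```python
import math

def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r for paired data (x, y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up data, not taken from the textbook.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # y increases as x increases
y_down = [10, 8, 6, 4, 2]   # y decreases as x increases

r_up = correlation_coefficient(x, y_up)
r_down = correlation_coefficient(x, y_down)
print(round(r_up, 3), round(r_down, 3))
```

Both data sets lie exactly on a line, so r comes out at the two ends of its range: 1 for the increasing data and –1 for the decreasing data. The sketch does not handle the case where every x (or every y) is the same value, since the denominator would then be zero.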

B. Linear Regression

The technique of fitting a linear equation to real data points gives a line called a regression line. This line is used to predict the value y—the response variable—for a given x, often called the predictor variable.

The equation of a regression line for an independent variable x and a dependent variable y is written as ŷ = mx + b, where m is the slope of the line, b is the y-intercept, and ŷ is the predicted y-value for a given x-value.
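
To make the equation ŷ = mx + b concrete, here is a minimal Python sketch that finds the slope m and the y-intercept b with the standard least-squares formulas and then predicts a y-value. It is only an illustration: the function name and the data are assumptions, and in class the same result comes from Excel.

```python
def regression_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * (sum_x / n)   # b = y-bar - m * x-bar
    return m, b

# Made-up data that lie exactly on the line y = 2x + 1.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

m, b = regression_line(x, y)
y_hat = m * 6 + b   # predicted y-value for x = 6
print(m, b, y_hat)
```

Because the made-up points sit exactly on y = 2x + 1, the sketch recovers m = 2 and b = 1. With real data the points scatter around the line, and predictions are only meaningful for x-values in or near the range of the data.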

May 9, 2009 - Review and Analysis - Quick T/F Quiz

We have learned basic concepts and formulas to decide whether a relationship exists
between two variables. We learned to present paired data in a scatter plot and to calculate and interpret a correlation coefficient. If the data are found to be correlated, we can find a linear equation that best models the relationship and draw a regression line using Excel. We also learned to use the equation of a regression line to predict a y-value for a given x-value.

True or False - Answer in your work groups and post:

1. If there is a strong correlation between two variables, you can conclude that one
variable caused the other.

2. Correlation coefficient r close to –1 implies that there is no correlation between the two variables.

3. A correlation is a relationship between more than two variables.

4. A regression line is the line that maximizes the residuals.

5. The equation of a regression line can be used to predict the independent variable x value for a given y-value.

May 9, 2009 - CORRELATION AND REGRESSION

Overview - Key Concepts

A. Introduction to Correlation
a) Scatter plots
b) Correlation coefficient
B. Linear Regression
a) Defining regression line
b) Graphing regression line

Friday, May 1, 2009

May 2, 2009 - Extra Credit Questions

BONUS CREDIT QUESTIONS - For each question you post in your work groups correctly, you will be given 1 extra point, for a possibility of 10 extra points.


1. A random sample of 200 high school seniors is given the SAT-V test. The mean score for
this sample is 483. Assuming that the population is normally distributed, the mean score
μ for all high school seniors will be:

a) 2.415
b) 200
c) 483

2. Given the same sample statistics, which level of confidence will produce the narrowest
confidence interval?

a) 75%
b) 85%
c) 90%
d) 99%

3. The margin of error is present only when the sample size is less than 30.
a) True
b) False

4. When a t-distribution is used to estimate a population mean, the degrees of freedom are
equal to one less than the sample size.

a) True
b) False

5. A t-distribution is used when the random variable is distributed normally, the sample size
is < 30 and the value of σ (sigma) is unknown.

a) True
b) False

6. The critical value that corresponds to a 95% confidence level is:

a) ±1.645
b) ±1.96
c) ±2.33
d) ±2.575

7. The point estimate is a single value estimate for a sample statistic.

a) True
b) False


8. The most unbiased point estimate of the population mean μ is the sample mean.

a) True
b) False

9. Which of the following statements describe the properties of a t-distribution?

a) The t-distribution is bell-shaped and symmetric about the mean.
b) The total area under a t-curve is 1 or 100%.
c) The mean, median, and mode of the t-distribution are equal to zero.
d) As the degrees of freedom decrease, the t-distribution approaches the normal distribution.

10. The critical value, tc, for c = 0.99 and n = 10 is:

a) 1.833
b) 2.262
c) 2.281
d) 3.250

May 2, 2009 - Homework, Chapter 8

Complete the following 4 questions in your work groups and post as a team. *** I realize these questions are extremely difficult - and to that end, Dustin is in the LRC (Stats help) and you are encouraged to work with him for additional help. I am also distributing PowerPoint notes to help.

Complete the following exercises from your textbook
Elementary Statistics—Picturing the World:

1. Section 8.1, Exercise #4, p. 409
2. Section 8.1, Exercise #18, p. 411
3. Section 8.2, Exercise #14, p. 421
4. Section 8.2, Exercise #20, p. 422

May 2, 2009 - Final Project Outline (optional)

For those of you interested in working on a final project, the following is an example outline. The final project must be in PowerPoint or MS Word format.

Steps in Statistical Study Design - Project Outline (a guide)

I. Identifying the "subject" of your study

  • What is the question? (What are my hypotheses?)- 1 slide
  • Is the data obtainable? (birth weight, socio economic, drugs, alcohol)- 1 slide
  • Is it ethical to obtain such data?
  • If not, is there a reasonable substitute?
  • Are the assumptions reasonable?- 1 slide
II. Designing
  • Identify the population of interest- 1 slide
  • Survey- several slides (how would you design the survey, you do not have to actually do the survey) * * * although, for extra credit (5 points) you can do a survey
    • Obtain a representative sample of that population- 1 slide
      • Simple Random Sampling
      • Stratified Sampling (M-F, Age groups)
      • Systematic Sampling (class roster, census list)
      • Multi-Stage Sampling
    • Sources of Bias- 1-2 slides
      • Voluntary Response
      • Non-response bias (day phone)
      • Response bias (people lie)
      • Undercoverage
  • Observational Studies- 1-3 slides
    • Used when a designed experiment is not ethical
    • Subjects studied over a period of time in natural setting
    • Case/Control – Control must match
    • Record Variables of interest
    • Confounding is a major issue
  • Designing an Experiment- 1-5 slides
    • Researcher has control over the subjects or units in the study
    • An intervention takes place that otherwise would not occur
    • Randomization used to assign treatments
    • Strongest case for causality
  • EDA – Exploratory Data Analysis (trends, relationships, differences) - optional, 1 slide
  • Pilot Study
III. Collecting Data- 1-3 slides
  • Identify variables
  • Identify types of variables
    • Qualitative
    • Quantitative
  • Identify Limits of measurement or observation
IV. Analyze the data- 1-3 slides
  • Use proper procedures and techniques.
  • Check the assumptions behind the procedures and techniques.
V. Make Conclusions and Discuss Limitations- 1-3 slides
  • What are the answers to the original hypotheses?
  • What are the limitations of the study?
  • What conclusions does the study not make?
  • What new questions arise from this study?
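
For anyone who wants to see how the sampling methods listed under "II. Designing" differ in practice, here is a minimal Python sketch of simple random, systematic, and stratified sampling. Everything in it is hypothetical: the roster, the function names, and the sample sizes are made up for illustration, and you are not required to use Python for the project.

```python
import random

def simple_random_sample(population, k, rng):
    """Every unit has the same chance of being chosen."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Pick a random starting point, then take every step-th unit."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(population, stratum_of, k_per_stratum, rng):
    """Split the population into strata, then sample within each one."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, k_per_stratum))
    return sample

rng = random.Random(2009)  # fixed seed so the sketch is repeatable

# Hypothetical class roster of 40 students, each tagged "M" or "F".
roster = [(f"student{i:02d}", "M" if i % 2 else "F") for i in range(40)]

srs = simple_random_sample(roster, 8, rng)
systematic = systematic_sample(roster, 5, rng)
stratified = stratified_sample(roster, lambda s: s[1], 4, rng)
print(len(srs), len(systematic), len(stratified))
```

Each method returns 8 students here, but only the stratified sample is guaranteed to contain exactly 4 from each group. Multi-stage sampling is left out of the sketch to keep it short.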

May 2, 2009 - Homework

Next week - please read Chapter 9.

CORRELATION AND REGRESSION

May 2, 2009 - Summary

Class Summary:

Topics covered continue the previous discussion on hypothesis testing presented in
Unit 8 in which we learn to test a claim about a population mean and the difference of
means between two populations.

We learned to identify when we can conduct a z-test or a t-test for large and
small sample sizes and to make a decision based on the test results.

May 2, 2009 - Class Notes

A. Testing the Difference Between Means (Large Independent Samples)

A null hypothesis, H0, is a statistical hypothesis that usually states that there is no
difference between the parameters of two populations. The null hypothesis always contains
the symbol ≤, =, or ≥.

An alternative hypothesis, Ha, is a statistical hypothesis that is true when H0 is false. The
alternative hypothesis always contains the symbol >, ≠, or <.

The Central Limit Theorem states that the difference of the sample means is normally
distributed when the following conditions are satisfied:
• The samples are randomly selected.
• The samples are independent.
• Each sample size is at least 30, or, if not, each population has a normal distribution
with a known standard deviation σ.
These three conditions are often called the assumptions of the statistical test.
When the difference of sample means is normally distributed, you can use a two-sample z-test to test the difference between the two population means.
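
For reference, the standardized test statistic usually used with a two-sample z-test is shown below. The notation is the standard textbook form and is an assumption about what belongs at this point in the notes: the x-bars are the sample means, n1 and n2 are the sample sizes, and σ1 and σ2 are the population standard deviations.

```latex
z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}},
\qquad
\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
```

Under a null hypothesis that states the two means are equal, the term μ1 − μ2 is 0. When both samples are large (at least 30), the sample standard deviations s1 and s2 are commonly used in place of σ1 and σ2.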

B. Testing the Difference Between Means (Small Independent Samples)
When small samples—n < 30—are used and the population standard deviation is unknown,
the Central Limit Theorem does not apply. In this case, you can use a t-test to test the
difference between two population means μ1 and μ2 if the following conditions are met:

• The samples are randomly selected.
• The samples are independent.
• Each population has a normal distribution.
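
As a rough sketch of the arithmetic behind the t-test, the Python code below computes the standardized test statistic and the degrees of freedom for two small independent samples. It assumes the population variances are equal (the pooled version of the test) and that the null hypothesis states μ1 = μ2. The function name and the data are made up for illustration.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(sample1, sample2):
    """Two-sample t statistic assuming equal population variances.

    Returns (t, degrees_of_freedom) for a null hypothesis mu1 = mu2.
    """
    n1, n2 = len(sample1), len(sample2)
    x_bar1, x_bar2 = mean(sample1), mean(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)  # sample variances
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    std_error = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    t = (x_bar1 - x_bar2) / std_error
    return t, df

# Made-up data, not taken from the textbook.
group_a = [10, 12, 14, 16, 18]
group_b = [11, 13, 15, 17, 19]

t, df = pooled_t_statistic(group_a, group_b)
print(round(t, 3), df)
```

The resulting t is then compared with the critical value from a t-table for the given degrees of freedom and level of significance. If the population variances cannot be assumed equal, a different standard error and a different rule for the degrees of freedom apply, which this sketch does not cover.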

May 2, 2009 - HYPOTHESIS TESTING WITH TWO SAMPLES

Key Objectives for the class:

A. Testing the Difference Between Means (Large Independent Samples)
a) Defining null and alternative hypothesis
b) The Central Limit Theorem
c) Guidelines for a two-sample z-test for the difference between means

B. Testing the Difference Between Means (Small Independent Samples)
a) Guidelines for a two-sample t-test for the difference between means

May 2, 2009 - PowerPoint Notes

I will be distributing a CD-ROM with copies of my PowerPoint notes for final exam or project study purposes. The PowerPoints are an excellent review.