Boise Statistics

Friday, May 8, 2009

May 9, 2009 - Extra Credit Questions

In your work groups, you will receive extra credit for each question you answer correctly (post answers under your work group):

1. In a school, 50 boys and 50 girls are randomly selected to test the claim that the mean weights for 10-year-old boys and 10-year-old girls are the same. Which of the following tests should be used in this scenario?

a) One-sample test
b) Two-sample test

2. Which of the following is a valid pair of hypotheses?
a) H0:μ1=μ2;Ha:μ1≠μ2

b) H0: μ1 > μ2; Ha: μ1 ≤ μ2
c) H0: μ1 ≠ μ2; Ha: μ1 = μ2
d) H0: μ1 < μ2; Ha: μ1 ≥ μ2

3. To test the claim that μ1 = μ2 , two samples are randomly selected from each population. If a hypothesis test is performed, how should you interpret a decision that rejects the null hypothesis?

a) There is sufficient evidence to support the claim μ1 = μ2 .
b) There is sufficient evidence to reject the claim μ1 = μ2 .
c) There is not sufficient evidence to support the claim μ1 = μ2 .
d) There is not sufficient evidence to reject the claim μ1 = μ2 .

4. For two-sample t-tests for small independent samples, the following condition or
conditions have to be met:

a) The samples must be randomly selected.
b) The samples must be independent.
c) Each population must have a normal distribution.
d) Each population must have a t-distribution.

5. For a two-sample t-test, if the test statistic is outside the rejection region, or regions, which of the following decisions should be made?

a) Fail to reject the alternative hypothesis.
b) Reject the alternative hypothesis.
c) Fail to reject the null hypothesis.
d) Reject the null hypothesis.

6. Two samples are considered independent if the sample selected from one population is not related to the sample selected from the second population.

a) True
b) False

7. For a two-sample hypothesis test, the alternative hypothesis is a statistical hypothesis that is reasonable when the null hypothesis is not accepted.

a) True
b) False

8. When doing two-sample hypothesis tests, which of the following tests would you conduct if the sample sizes are less than 30, the populations are normal, the standard deviations are unknown, and the population variances are equal?

a) z-Test
b) t-Test

9. Given H0: μ1 ≥ μ2 , critical value t0 = −1.350, and standardized test statistic
t = −4.038 , you will:

a) Reject the null hypothesis.
b) Fail to reject the null hypothesis.

10. A crash test claims that the mean bumper repair cost for small cars is less than that for midsize cars. After conducting a t-test, a decision is made to reject the null hypothesis at the 5% level. The decision implies that:

a) There is enough evidence at the 5% level to support the claim that the mean bumper
repair cost is less for small cars than it is for midsize cars.
b) There is insufficient evidence at the 5% level to support the claim that the mean bumper repair cost is less for small cars than it is for midsize cars.
c) There is insufficient evidence at the 5% level to reject the claim that the mean bumper repair cost is less for small cars than it is for midsize cars.
d) There is enough evidence at the 5% level to reject the claim that the mean bumper repair cost is less for small cars than it is for midsize cars.

May 9, 2009 - Content Covered

CONTENT COVERED

Elementary Statistics—Picturing the World:
o Chapter 9, Section 9.1, “Correlation,” pp. 458–473
o Chapter 9, Section 9.2, “Linear Regression,” pp. 474–483

May 9, 2009 - Animation, follow this www link

http://media.pearsoncmg.com/ph/esm/esm_larson_statlet_questions_2e/Leverage_Statlet/leverage.html

Included here is an animation that illustrates the concept of leverage: not all points have equal influence on a fitted regression line.

May 9, 2009 - Problem One

Problem 1:

Excel has built-in functionalities that facilitate graphing of scatter plots and calculation of correlation coefficients r.

Follow the instructions listed in Student’s Solution & Technology Manual,
pages 341–344, to construct a scatter plot and calculate the correlation coefficient.

May 9, 2009 - Correlation Notes

A. Correlation

A correlation is a relationship between two variables.

The correlated variables can be represented by an ordered pair (x, y) where x is the
independent, or explanatory, variable, and y is the dependent, or response, variable.
A scatter plot is the graphical representation of the ordered pairs in the form of points in a coordinate plane.

The correlation coefficient is a mathematical measure of the strength and the direction of a linear relationship between two variables. The symbol r represents the sample correlation coefficient. The range of r is –1 to 1.
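
As a rough illustration of how r can be computed by hand instead of with Excel, here is a minimal Python sketch. It uses the usual computational formula for the sample correlation coefficient; the function name and the data are made up for this example and are not from the textbook.

```python
import math

def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r for paired data (x, y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # r = (n*Sxy - Sx*Sy) / (sqrt(n*Sx2 - Sx^2) * sqrt(n*Sy2 - Sy^2))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired data, invented for illustration only
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = correlation_coefficient(hours, scores)
print(round(r, 4))  # a value between -1 and 1; positive here
```

A value of r near 1 or –1 suggests a strong linear relationship, while a value near 0 suggests a weak one or none.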
Types of correlation:

• Negative linear correlation: A negative linear correlation implies that when x
increases, y tends to decrease. When r approaches –1, x and y are said to have a strong negative, or inverse, linear correlation.

• Positive linear correlation: A positive linear correlation implies that when x
increases, y tends to increase. When r approaches 1, x and y are said to have a strong positive, or direct, linear correlation.

• No correlation or weak correlation: No correlation or a weak linear correlation
implies that the magnitude of x has little or no effect on the magnitude of y. When r is near 0, x and y are said to have no correlation or a weak linear correlation.

• Nonlinear correlation: A scatter plot of the data shows a pattern, but it is not that of a line. It might resemble a U, an arc, or some other shape.
Cause and effect relations refer to situations when two variables, x and y, are related in such a way that changing one variable causes the other to change. Statisticians are careful to avoid claiming that because x and y are correlated, x causes y. In other words, you want to emphasize that correlation does not imply causation.

B. Linear Regression

The technique of fitting a linear equation to real data points gives a line called a regression line. This line is used to predict the value y—the response variable—for a given x, often called the predictor variable.

The equation of a regression line for an independent variable x and a dependent variable y is written as ŷ = mx + b, where m is the slope of the line, b is the y-intercept, and ŷ is the predicted y-value for a given x-value.
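
To see where m and b come from, here is a small Python sketch that fits a least-squares regression line and then uses it to predict. It assumes the standard least-squares formulas for the slope and intercept; the function names and data are hypothetical and only meant as an illustration.

```python
def fit_regression_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n  # the line passes through (x-bar, y-bar)
    return m, b

def predict(m, b, x):
    """Predicted y-value for a given x-value."""
    return m * x + b

# Hypothetical paired data, invented for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = fit_regression_line(xs, ys)
print(m, b, predict(m, b, 6))
```

Predictions are generally only meaningful for x-values in or near the range of the original data, and only when the correlation is significant.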

May 9, 2009 - Review and Analysis - Quick T/F Quiz

We have learned basic concepts and formulas to decide whether a relationship exists
between two variables. We learned to present paired data in a scatter plot and to calculate and interpret a correlation coefficient. If the data are found to be correlated, we can find a linear equation that best models the relationship and draw a regression line using Excel. We also learned to use the equation of a regression line to predict a y-value for a given x-value.

True or False - Answer in your work groups and post:

1. If there is a strong correlation between two variables, you can conclude that one
variable caused the other.

2. Correlation coefficient r close to –1 implies that there is no correlation between the two variables.

3. A correlation is a relationship between more than two variables.

4. A regression line is the line that maximizes the residuals.

5. The equation of a regression line can be used to predict the independent variable x value for a given y-value.

May 9, 2009 - CORRELATION AND REGRESSION

Overview - Key Concepts

A. Introduction to Correlation
a) Scatter plots
b) Correlation coefficient
B. Linear Regression
a) Defining regression line
b) Graphing regression line

Friday, May 1, 2009

May 2, 2009 - Extra Credit Questions

BONUS CREDIT QUESTIONS - For each question you post in your work groups correctly, you will be given 1 extra point, for a possibility of 10 extra points.

1. A random sample of 200 high school seniors is given the SAT-V test. The mean score for
this sample is 483. Assuming that the population is normally distributed, the mean score
μ for all high school seniors will be:

a) 2.415
b) 200
c) 483

2. Given the same sample statistics, which level of confidence will produce the narrowest
confidence interval?

a) 75%
b) 85%
c) 90%
d) 99%

3. The margin of error is present only when the sample size is less than 30.
a) True
b) False

4. When a t-distribution is used to estimate a population mean, the degrees of freedom are
equal to one less than the sample size.

a) True
b) False

5. A t-distribution is used when the random variable is distributed normally, the sample size
is < 30 and the value of σ (sigma) is unknown.

a) True
b) False

6. The critical value that corresponds to a 95% confidence level is:

a) ±1.645
b) ±1.96
c) ±2.33
d) ±2.575

7. The point estimate is a single value estimate for a sample statistic.

a) True
b) False

8. The most unbiased point estimate of the population mean μ is the sample mean.

a) True
b) False

9. Which of the following statements describe the properties of a t-distribution?

a) The t-distribution is bell-shaped and symmetric about the mean.
b) The total area under a t-curve is 1 or 100%.
c) The mean, median, and mode of the t-distribution are equal to zero.
d) As the degrees of freedom decrease, the t-distribution approaches the normal distribution.

10. The critical value, tc, for c = 0.99 and n = 10 is:

a) 1.833
b) 2.262
c) 2.281
d) 3.250

May 2, 2009 - Homework, Chapter 8

Complete the following 4 questions in your work groups and post as a team. *** I realize these questions are extremely difficult - and to that end, Dustin is in the LRC (Stats help) and you are encouraged to work with him for additional help. I am also distributing PowerPoint notes to help.

Complete the following exercises from your textbook
Elementary Statistics—Picturing the World:

1. Section 8.1, Exercise #4, p. 409
2. Section 8.1, Exercise #18, p. 411
3. Section 8.2, Exercise #14, p. 421
4. Section 8.2, Exercise #20, p. 422

May 2, 2009 - Final Project Outline (optional)

For those of you interested in working on a final project, the following is an example outline. The final project must be in a "PowerPoint" or MS Word format.

Steps in Statistical Study Design: Project Outline (a guide)

I. Identifying the "subject" of your study

What is the question? (What are my hypotheses?)- 1 slide
Is the data obtainable? (birth weight, socio economic, drugs, alcohol)- 1 slide
Is it ethical to obtain such data?
If not, is there a reasonable substitute?
Are the assumptions reasonable?- 1 slide

II. Designing

Identify the population of interest- 1 slide
Survey- several slides (how would you design the survey, you do not have to actually do the survey) * * * although, for extra credit (5 points) you can do a survey
- Obtain a representative sample of that population- 1 slide
  - Simple Random Sampling
  - Stratified Sampling (M-F, Age groups)
  - Systematic Sampling (class roster, census list)
  - Multi-Stage Sampling
- Sources of Bias- 1-2 slides
  - Voluntary Response
  - Non-response bias (day phone)
  - Response bias (people lie)
  - Undercoverage
Observational Studies- 1-3 slides
- Used when a designed experiment is not ethical
- Subjects studied over a period of time in natural setting
- Case/Control – Control must match
- Record Variables of interest
- Confounding is a major issue
Designing an Experiment- 1-5 slides
- Researcher has control over the subjects or units in the study
- An intervention takes place that otherwise would not occur
- Randomization used to assign treatments
- Strongest case for causality
EDA – Exploratory Data Analysis (trends, relationships, differences) - optional, 1 slide
Pilot Study

III. Collecting Data 1-3 slides

Identify variables
Identify types of variables
- Qualitative
- Quantitative
Identify Limits of measurement or observation

IV. Analyze the data 1-3 slides

Use proper procedures and techniques.
Check the assumptions behind the procedures and techniques.

V. Make Conclusions and Discuss Limitations 1-3 slides

What are the answers to the original hypotheses?
What are the limitations of the study?
What conclusions does the study not make?
What new questions arise from this study?

May 2, 2009 - Homework

Next week - please read Chapter 9.

CORRELATION AND
REGRESSION

May 2, 2009 - Summary

Class Summary:

Topics covered continue the previous discussion on hypothesis testing presented in
Unit 8 in which we learn to test a claim about a population mean and the difference of
means between two populations.

We learned to identify when we can conduct a z-test or a t-test for large and
small sample sizes and make a decision based on testing results.

May 2, 2009 - Class Notes

A. Testing the Difference Between Means (Large Independent Samples)

A null hypothesis, H0, is a statistical hypothesis that usually states that there is no
difference between the parameters of two populations. The null hypothesis always contains
the symbol ≤, =, or ≥.

An alternative hypothesis, Ha, is a statistical hypothesis that is true when H0 is false. The
alternative hypothesis always contains the symbol >, ≠, or <.

The Central Limit Theorem states that the difference of the sample means is normally
distributed when the following conditions are satisfied:
• The samples are randomly selected.
• The samples are independent.
• Each sample size is at least 30, or, if not, each population has a normal distribution
with a known standard deviation σ.
These three conditions are often called the assumptions of the statistical test.
When the difference of sample means is normally distributed:
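
For reference, a standard form of the two-sample z-test statistic in this situation is sketched below. Here σ1 and σ2 are assumed to be the population standard deviations and n1 and n2 the sample sizes. When both samples have at least 30 observations, the sample standard deviations s1 and s2 are commonly used in their place.

```latex
z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}},
\qquad
\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
```

Under a null hypothesis of no difference, μ1 − μ2 is taken to be 0, so the numerator reduces to the difference of the sample means.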

B. Testing the Difference Between Means (Small Independent Samples)
When small samples—n < 30—are used and the population standard deviation is unknown,
the Central Limit Theorem does not apply. In this case, you can use a t-test to test the
difference between two population means μ1 and μ2 if the following conditions are met:

• The samples are randomly selected.
• The samples are independent.
• Each population has a normal distribution.
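
As one possible illustration, the Python sketch below computes a two-sample t statistic for small independent samples. It assumes the population variances are equal, so it uses the pooled standard deviation and n1 + n2 − 2 degrees of freedom. The function name and the data are hypothetical.

```python
import math
import statistics

def pooled_two_sample_t(sample1, sample2):
    """t statistic and degrees of freedom, assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled estimate of the common standard deviation
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    standard_error = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    # Under H0: mu1 = mu2, the hypothesized difference is 0
    t = (mean1 - mean2) / standard_error
    return t, n1 + n2 - 2

# Hypothetical small samples, invented for illustration only
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t, df = pooled_two_sample_t(group_a, group_b)
print(t, df)
```

The resulting t would then be compared with a critical value from the t-distribution table for the given degrees of freedom.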

May 2, 2009 - HYPOTHESIS TESTING WITH TWO SAMPLES

Key Objectives for the class:

A. Testing the Difference Between Means (Large Independent Samples)
a) Defining null and alternative hypothesis
b) The Central Limit Theorem
c) Guidelines for a two-sample z-test for the difference between means

B. Testing the Difference Between Means (Small Independent Samples)
a) Guidelines for a two-sample t-test for the difference between means

May 2, 2009 - PowerPoint Notes

I will be distributing a CD ROM disc with copies of my notes with PowerPoint presentation for final exam/or project study purposes. The PowerPoints are an excellent review.

Friday, April 24, 2009

April 25, 2009 - In-Class Project, "21"

In class project review, and "21".

Think about how you could use statistics in your career, personal life, and so on -- we will discuss how statistics can play an important role in the decision making process regarding your career, financial future (after you graduate from ITT), or in your own business.

April 25, 2009 - Final Exam sample questions to review (do not answer)

Examples of final exam questions to review: (*** please note, you can answer these questions, for extra credit - in your work groups -- if you have extra time)

The salaries of employees in a government agency can be classified as:
a) Quantitative data
b) Qualitative data

"In a recent study of 400 randomly selected adults, the researchers found there is a
relationship between smoking cigarettes and developing emphysema." Which type of
statistics does the statement describe?
a) Inferential statistics
b) Descriptive statistics

What is the level of measurement for the data that can be classified according to color?
a) Ordinal
b) Nominal
c) Interval
d) Ratio

If the null hypothesis is rejected when in fact it is true, which type of error has been
committed?
a) Type I error
b) Type II error

Which of the following values represents a correlation coefficient?
a) –1.45
b) 0.0001
c) 1.05
d) 100

What is the type of correlation for the regression line ŷ = 5.12x − 0.3?
a) Positive linear correlation
b) Negative linear correlation

Assume that the variables, x and y, have a significant correlation. Given that the equation of
a regression line is y = –2x + 10, what is the best predicted value for y if x = 3?
a) –10
b) 4
c) 5
d) 10

April 25, 2009 - Chapter 7 Group Questions, quick multiple choice

1. A probability distribution curve is always bell-shaped and symmetric about the mean.
a) True
b) False

2. A z-score indicates a position of probability value under the normal curve.
a) True
b) False

3. The cumulative area for z = 0 is 0.5000.
a) True
b) False

4. Which of the following describes the properties of a normal distribution?
a) Mean, median, and mode are equal.
b) The normal curve is bell-shaped.
c) The normal curve is symmetric about the mean.
d) The normal curve touches the x-axis.

5. For any population distribution and any sample size n, the mean of the sampling
distribution of sample means is equal to the population mean.
a) True
b) False

6. According to the Central Limit Theorem, as the sample size increases, n ≥ 30, the
sampling distribution of sample means gets closer to a normal distribution.
a) True
b) False

7. The area under the standard normal curve between z = 0 and z = 3 is:
a) 0.0010
b) 0.4987
c) 0.9987
d) 1.0000

8. One of the properties of sampling distributions of sample means states that the mean of
the sample means, μx̄, is equal to the population mean.
a) True
b) False

9. The standard deviation of the sampling distribution of the sample means is called:
a) Standard error of the mean
b) Margin of error
c) Sampling error
d) Standard error of the median

April 25, 2009 - Homework, Reading

Next week, please review Chapter 8, HYPOTHESIS TESTING WITH
TWO SAMPLES.

April 25, 2009 - Summary

In this unit, we were introduced to the basic concepts and techniques for hypothesis testing
and for identifying type I and type II errors.

We learned to conduct hypothesis tests for large samples and certain small samples and to
make a decision on a claim about a population parameter μ.

April 25, 2009 - Homework I

Please complete the following questions, in your groups:

1. Section 7.1, Exercise #10, p. 343
2. Section 7.1, Exercise #38, p. 344
3. Section 7.1, Exercise #48, p. 345
4. Section 7.2, Exercise #30, p. 359
5. Section 7.3, Exercise #20, p. 371

April 25, 2009 - Quick T/F in class quiz (in class work groups)

1. In a hypothesis test, you assume that the alternative hypothesis is true.

2. Statistical hypotheses are statements about the sample.

3. A type I error is committed when you fail to reject a null hypothesis when it is false.

4. If you want to support a claim, write it as a null hypothesis.

5. When using a P-value to make a conclusion in a hypothesis test with α as significance
level and P ≤ α , you should fail to reject H0.

6. When conducting z-test for mean μ, you should reject the null hypothesis if the P-value
falls within the rejection region.

7. The lower the P-value, the more evidence there is in favor of rejecting H0.

8. The degrees of freedom for a t-test, when n < 30, is equal to the sample size.

April 25, 2009 - Class Notes

A. Introduction to Hypothesis Testing

A hypothesis test is a process that uses sample statistics to test a claim about the value of a
population parameter.

A statistical hypothesis is a verbal statement, or claim, about a population parameter.
A null hypothesis H0 is a claim that contains a statement of equality such as ≤, =, or ≥.
An alternative hypothesis Ha is the complement of the null hypothesis. It is a statement that
must be true if H0 is false and the statement contains a statement of inequality such as >, ≠, or <.

A type I error occurs if the null hypothesis is rejected when it is true. For example, the null
hypothesis H0 claims that the new allergy drug lasts for 36 hours. The decision made from a
hypothesis testing is to reject H0, although H0 is true.

A type II error occurs if the null hypothesis is not rejected when it is false. In the given example,
the pharmaceutical company claims that the new allergy drug lasts for 36 hours. If the hypothesis
testing failed to reject the claim, although the actual truth is that the new drug does not last for 36
hours, we are making a type II error.
If the alternative hypothesis contains a less-than inequality symbol (<), the hypothesis test is a
left-tailed test. It can be mathematically expressed as:
H0: μ ≥ k
Ha: μ < k
If the alternative hypothesis contains a greater-than symbol (>), the hypothesis test is a
right-tailed test. It can be mathematically expressed as:
H0: μ ≤ k
Ha: μ > k

If the alternative hypothesis contains the not-equal-to symbol (≠), the hypothesis test is a
two-tailed test. In a two-tailed test, each tail has an area of one-half P. The hypotheses can be
mathematically expressed as:
H0: μ = k
Ha: μ ≠ k
B. Hypothesis Testing for the Mean (Large Samples)
The P-value of a hypothesis test is the probability, assuming the null hypothesis is true, of obtaining a sample statistic with a value as
extreme as or more extreme than the value obtained from the sample data. We reject
the null hypothesis if the P-value is less than or equal to the level of significance.
A rejection region, or critical region, of the sampling distribution is the range of values for
which the null hypothesis is not probable. If a test statistic falls in this region, the null hypothesis
is rejected.
A critical value z0 separates the rejection region from the no-rejection region.
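
The P-value approach for a large-sample z-test can be sketched in Python as follows. This is only an illustration: the function names, the sample figures, and the choice of a left-tailed test are assumptions made for the example, and the decision rule rejects H0 when P ≤ α.

```python
import math
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n, tail):
    """Standardized test statistic z and its P-value for a test about a mean."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    cdf = NormalDist().cdf(z)  # cumulative area to the left of z
    if tail == "left":
        p = cdf
    elif tail == "right":
        p = 1 - cdf
    elif tail == "two":
        p = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p

def decide(p, alpha):
    """Decision based on comparing the P-value with the level of significance."""
    return "reject H0" if p <= alpha else "fail to reject H0"

# Hypothetical numbers: H0: mu >= 36 hours, Ha: mu < 36 hours
z, p = z_test_p_value(sample_mean=35.2, mu0=36, sigma=2.4, n=36, tail="left")
print(z, p, decide(p, alpha=0.05))
```

The rejection-region approach leads to the same decision: z is compared with the critical value z0 instead of comparing P with α.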
C. Hypothesis Testing for the Mean (Small Samples)
When the sample size n is less than 30, the random variable x is normally distributed, and σ is unknown, the standardized test statistic
t = (x̄ − μ) / (s / √n) follows a t-distribution with n – 1 degrees of freedom.
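
A small Python sketch of a t-test using a rejection region is shown below. The critical value is assumed to be looked up in a t-distribution table and passed in by hand; the function names and all numbers are hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Standardized test statistic t and degrees of freedom for a small sample."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

def rejection_region_decision(t, critical_value, tail):
    """Reject H0 only if t falls in the rejection region."""
    if tail == "left":
        reject = t < critical_value
    elif tail == "right":
        reject = t > critical_value
    elif tail == "two":
        reject = abs(t) > abs(critical_value)
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return "reject H0" if reject else "fail to reject H0"

# Hypothetical sample, invented for illustration only
t, df = one_sample_t([1, 2, 3, 4, 5], mu0=2)
print(t, df)

# Hypothetical left-tailed test: t = -2.5, table value -1.833 (d.f. = 9, alpha = 0.05)
print(rejection_region_decision(-2.5, -1.833, "left"))
```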

April 25, 2009 - Key Concepts

KEY CONCEPTS

A. Introduction to Hypothesis Testing
a) Defining hypothesis testing
o Null and alternate hypothesis
o Type I and type II errors
b) Types of alternate hypothesis
o Left-tailed test
o Right-tailed test
o Two-tailed test

B. Hypothesis Testing for the Mean (Large Samples)
a) Hypothesis testing for large samples using the P-value
b) Hypothesis testing for large samples using rejection region
C. Hypothesis Testing for the Mean (Small Samples)
a) Hypothesis testing for small samples using t-test

Tuesday, April 14, 2009

April 18, 2009 - Summary (2nd Half of Class), & Homework Assignment

SUMMARY

In this chapter, we studied inferential statistics, finding a point estimate of
the mean and margin of error. We learned to construct and interpret confidence intervals for
the population mean for both large and small samples.

NEXT WEEK: HYPOTHESIS TESTING WITH ONE SAMPLE

Homework: Read Chapters 7 & 8 in the textbook.

April 18, 2009 - Homework, Part II

Complete the following homework activity and post as a group:

From your textbook
Elementary Statistics—Picturing the World:

1. Section 6.1, Exercise #24, p. 288

2. Section 6.1, Exercise #34, p. 288

3. Section 6.2, Exercise #8, p. 300

4. Section 6.2, Exercise #12, p. 300

April 18, 2009 - Quick T/F Group Activity

In your work groups, complete the following T/F questions and post as a group:

1. The sample mean x̄ is a reliable point estimate of the population mean μ.

2. We can always estimate a population parameter using sample statistics, regardless of the
sample size or the type of population distribution.

3. The sample size is considered large when it reaches 100.

4. The degrees of freedom are equal to the sample size if the sample size is small.

5. A t-distribution is used when random variable is normally distributed and sample size is less
than 30.

6. The larger the sample size (n ≥ 30), the more skewed is the t-distribution.

April 18, 2009 - Class Notes (2nd Half of Class)

“Until now, you have focused on the first branch of statistics—descriptive statistics and probability.
You have learned to describe and graph data, calculate probabilities, and use properties of normal
distributions. In this unit, you will learn about inferential statistics.
This chapter will focus on how to estimate a population parameter and state how confident you are
about your estimate.”

* * * * * * *
A. Confidence Intervals for the Mean (Large Samples)

A point estimate is a single value estimate for a population parameter.
An interval estimate is an interval, or range of values, that is used to estimate the population
parameter.

The level of confidence c is the probability that the interval estimate contains the population
parameter. It states how confident we are that the interval estimate contains the population
parameter.

The difference between the point estimate and the actual population parameter value is called the sampling error and is denoted by x̄ − μ.
Given a level of confidence, the greatest possible sampling error is called the margin of error. It is sometimes also called the maximum error of estimate or error tolerance and is denoted by E.
When the population standard deviation is known, E can be calculated using the formula:

$$E = z_c \sigma_{\bar{x}} = z_c \frac{\sigma}{\sqrt{n}}.$$
A c-confidence interval for the population mean μ is written as:
x̄ − E < μ < x̄ + E

Here, c is the probability that the confidence interval contains μ.
The steps for determining the confidence interval for a population mean, when the sample size
n ≥ 30 or the sample comes from a normally distributed population, can be listed as follows:

Step 1: Find the sample statistics.
Step 2: Calculate standard deviation, s.
Step 3: Find the critical values.
Step 4: Calculate the margin of error.
Step 5: Form the confidence interval x̄ − E < μ < x̄ + E.
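
The steps above can be sketched in Python as follows. This is a minimal illustration that assumes the standard deviation is known (or estimated by s from a large sample) and finds the critical value z_c from the standard normal distribution instead of a table. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, c):
    """Return (left endpoint, right endpoint, margin of error E)."""
    # Critical value: the middle area c leaves (1 - c) / 2 in each tail
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    margin = z_c * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin, margin

# Hypothetical sample statistics, invented for illustration only
left, right, margin = z_confidence_interval(x_bar=483, sigma=100, n=200, c=0.95)
print(round(left, 1), round(right, 1), round(margin, 1))
```

Raising the level of confidence c increases z_c and so widens the interval, while a larger sample size n narrows it.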

B. Confidence Intervals for the Mean (Small Samples)

A t-distribution is used when a sample size is less than 30, and the random variable x is
approximately normally distributed. The properties of t-distribution are as follows:

1. It is bell-shaped and symmetric about the mean.
2. The mean, median, and mode of the t-distribution are equal to zero.
3. The total area under a t-curve is 1, or 100%.
4. The t-distribution is a family of curves, each determined by a parameter called the degrees
of freedom, also referred to as d.f. The degrees of freedom are the number of free choices
left after a sample statistic such as x̄ is calculated. When you use a t-distribution to estimate
a population mean, the degrees of freedom are equal to one less than the sample size.
d.f. = n – 1
5. As the degrees of freedom increase, the t-distribution approaches the normal distribution.
After 30 degrees of freedom, the t-distribution is very close to the standard normal z distribution.

Constructing a confidence interval using the t-distribution involves using a point estimate and a
margin of error. The following steps can be used to construct a confidence interval for the mean using the t-distribution.

Step 1: Assuming that the sample comes from a normally distributed population, identify the
sample statistics n, x̄, and the sample standard deviation s. Use the formulas
$$\bar{x} = \frac{\sum x}{n} \quad \text{and} \quad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}.$$

Step 2: Identify the degrees of freedom, the level of confidence c, and the critical value tc using the t-distribution table. Remember, d.f. = n – 1.

Step 3: Find the margin of error E using the formula
$$E = t_c \frac{s}{\sqrt{n}}.$$

Step 4: Find the left and right endpoints and form the confidence interval. Use the following
formulas:
• Left endpoint: x̄ − E
• Right endpoint: x̄ + E
• Interval: x̄ − E < μ < x̄ + E
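
The four steps can be sketched in Python as follows. The critical value t_c is assumed to come from a t-distribution table and is passed in by hand. The function name and the data are hypothetical.

```python
import math
import statistics

def t_confidence_interval(sample, t_c):
    """Confidence interval for the mean using a critical value from the t table."""
    n = len(sample)
    x_bar = statistics.mean(sample)   # Step 1: sample mean
    s = statistics.stdev(sample)      # Step 1: sample standard deviation
    margin = t_c * s / math.sqrt(n)   # Step 3: margin of error E
    return x_bar - margin, x_bar + margin  # Step 4: endpoints

# Hypothetical sample of n = 10 values, so d.f. = 9
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Step 2: table value for c = 0.95 and d.f. = 9
left, right = t_confidence_interval(data, t_c=2.262)
print(round(left, 4), round(right, 4))
```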

April 18, 2009 - CONFIDENCE INTERVALS (Chapter 6)

KEY CONCEPTS TO REMEMBER:

A. Confidence Intervals for the Mean (Large Samples)

• Point and interval estimate
• Level of confidence
• Sampling error
• Margin of error
• Confidence interval

B. Confidence Intervals for the Mean (Small Samples)
• t-Distribution
• Properties of t-distribution

April 18, 2009 - Unit Summary

SUMMARY

Key Points to remember:

• z-score

• Area under the standard normal curve

• Probabilities for normally distributed variables

• Probability for given data values
In this unit, we learned how to interpret the most fundamental concept in inferential
statistics, the Central Limit Theorem.

April 18, 2009 - Homework Assignment 1

In your class work groups (the groups you have been working in over the last few classes) - complete the following homework assignment and post as a group:
Elementary Statistics⎯Picturing the World:
1. Section 5.1, Exercise #26, p. 225

2. Section 5.1, Exercise #60, p. 228

3. Section 5.2, Exercise #12, p. 232

4. Section 5.3, Exercise #44, p. 244

5. Section 5.4, Exercise #18, p. 255; do not sketch
the graph

April 18, 2009 - Class Notes

Class Lecture Notes for April 18, 2009 (talking points)

A. Introduction to Normal Distributions

A continuous probability distribution is the probability distribution of a continuous random variable.

A normal distribution is a continuous probability distribution describing the behavior of a normal
random variable. A normal probability distribution has a graph that is symmetric and bell-shaped. Its mean, median, and mode are equal and determine the axis of symmetry. The graph of a normal distribution is defined for all numbers on the real number line. As the random variable x moves further and further from the mean, in either direction, the graph of the normal distribution approaches but never touches the x-axis.

Between the points x = μ – σ and x = μ + σ, the graph is curved downward. To the left of x = μ – σ and to the right of x = μ + σ, the graph is curved upward. The points x = μ – σ and x = μ + σ are called inflection points.
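
For those curious why the inflection points fall exactly one standard deviation from the mean, here is a short sketch. It assumes the usual formula for the normal density, which these notes do not state, and it requires a little calculus that is not part of this course.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}
\\[4pt]
f'(x) = -\frac{x-\mu}{\sigma^2}\, f(x)
\\[4pt]
f''(x) = \frac{(x-\mu)^2 - \sigma^2}{\sigma^4}\, f(x)
```

Since f(x) is always positive, f''(x) = 0 exactly when (x − μ)² = σ², that is, at x = μ − σ and x = μ + σ. Between those points f''(x) is negative (curved downward), and outside them it is positive (curved upward).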

The normal curve, or the bell-curve, is the graph of a normal distribution.
Properties of a normal distribution can be listed as follows:
• The mean, median, and mode are equal.
• The normal curve is bell-shaped and symmetric about the mean, μ.
• The total area under the curve is equal to 1.
• The normal curve approaches the x-axis but never touches the axis as it extends further and
further away from the mean.
• At the center of the curve, between (μ − σ) and (μ + σ), the graph curves downward. The graph
curves upward to the left of (μ − σ) and to the right of (μ + σ).

Properties of a standard normal distribution can be listed as follows:
• The standard normal curve is bell shaped and symmetric about 0.
• The total area under the curve is equal to 1.
• The standard normal curve approaches the x-axis but never touches the axis as it extends
further and further away from the mean.
• At the center of the curve, between –1 and 1, the graph curves downward. The graph
curves upward to the left of –1 and to the right of 1.
• The cumulative area is close to 0 for z-scores close to z = −3.49.
• The cumulative area increases as the z-scores increase, but it never exceeds 1.
• The cumulative area for z = 0 is 0.5000.
• The cumulative area is close to 1 for z-scores close to z = 3.49.

B. Normal Distributions: Finding Probabilities
The probability that a normally distributed random variable x falls in a given interval can be calculated using the following
guidelines:

Step 1: Find the x-values of the upper and lower bounds of the given interval.
Step 2: Convert the x-values to z-scores using the formula: z = (x − μ) / σ
Step 3: Sketch the standard normal curve and shade the appropriate area under the curve.
Step 4: Find the area by following the same directions as given in the table for the standard normal probability distribution.
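
The guidelines above can be sketched in Python, with the standard normal table replaced by a cumulative-area function. The function name and the numbers are hypothetical; the z-score conversion is the usual z = (x − μ) / σ.

```python
from statistics import NormalDist

def normal_probability(mu, sigma, lower, upper):
    """P(lower < x < upper) for a normally distributed random variable x."""
    # Step 2: convert the x-values to z-scores
    z_lower = (lower - mu) / sigma
    z_upper = (upper - mu) / sigma
    # Step 4: area between the two z-scores under the standard normal curve
    standard_normal = NormalDist()
    return standard_normal.cdf(z_upper) - standard_normal.cdf(z_lower)

# Hypothetical example: mean 100, standard deviation 15, interval 85 to 115
print(round(normal_probability(mu=100, sigma=15, lower=85, upper=115), 4))
```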

C. Normal Distributions: Finding Values of the Random Variable x, Given the Standard Normal
Random Variable z
Find variable x-values within areas under the standard normal curve:
Step 1: Determine the position of the area corresponding to the given probability.
Step 2: Find the corresponding z-scores for the area using the standard normal distribution table.
Here, you may have two cases:

• Area to the left of z
• Area to the right of z
Step 3: Transform the z-score to an x-value, using the formula: x = μ + zσ
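
Going in the reverse direction, from a given area back to an x-value, might look like this in Python. The inverse cumulative function plays the role of reading the standard normal table backwards. The function names and the numbers are hypothetical.

```python
from statistics import NormalDist

def x_for_area_to_left(mu, sigma, area):
    """x-value with the given cumulative area to its left (0 < area < 1)."""
    z = NormalDist().inv_cdf(area)  # Step 2: z-score for the area
    return mu + z * sigma           # Step 3: x = mu + z*sigma

def x_for_area_to_right(mu, sigma, area):
    """x-value with the given area to its right (0 < area < 1)."""
    return x_for_area_to_left(mu, sigma, 1 - area)

# Hypothetical example: mean 100, standard deviation 15
print(round(x_for_area_to_left(100, 15, 0.90), 2))   # 90th percentile
print(round(x_for_area_to_right(100, 15, 0.90), 2))  # value exceeded 90% of the time
```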
D. Sampling Distributions and the Central Limit Theorem
A sampling distribution is the probability distribution of a sample statistic that is formed when
samples of size n are repeatedly taken from a population.
If the sample statistic is the sample mean, then the distribution is the sampling distribution of sample
means.
The properties of sampling distributions of sample means can be listed as follows:
1. The mean of the sample means, μx̄, is equal to the population mean μ:
μx̄ = μ
2. The standard deviation of the sample means, σx̄, is calculated by dividing the population
standard deviation σ by the square root of the sample size n:
σx̄ = σ / √n
σx̄ is also known as the standard error of the mean.
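
Both properties can be checked by brute force on a tiny population. The sketch below assumes a made-up population of four values and lists every possible sample of size 2 drawn with replacement, so no randomness is involved.

```python
from itertools import product
from math import isclose, sqrt
from statistics import mean, pstdev

# Hypothetical population, chosen small enough to list every sample.
population = [1, 3, 5, 7]
n = 2

# All 16 samples of size n drawn with replacement, and their means.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)              # population mean
sigma = pstdev(population)         # population standard deviation
mu_xbar = mean(sample_means)       # mean of the sample means
sigma_xbar = pstdev(sample_means)  # standard deviation of the sample means
standard_error = sigma / sqrt(n)

print(mu, mu_xbar)                                  # property 1: equal
print(round(sigma_xbar, 4), round(standard_error, 4))  # property 2: equal
```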

The Central Limit Theorem is an important concept in inferential statistics. It enables you to make inferences about a population mean based upon sample statistics.

Saturday, April 11, 2009

April 11, 2009 - Reading Assignment

Please have read through chapters 5 & 6 by April 18, 2009.

April 11, 2009 - WWW Links

Links that may be of interest:

http://www.bea.gov/

http://www.fedstats.gov/

http://www.commerce.gov/

http://www.census.gov/

http://www.dol.gov/

Friday, April 10, 2009

April 11, 2009 - Normal Probability Distributions

NORMAL PROBABILITY DISTRIBUTIONS

KEY CONCEPTS

A. Introduction to Normal Probability Distributions
a) Continuous probability distribution
o Normal distribution
o Properties of a normal distribution
o Standard normal distribution
o Properties of a standard normal distribution

B. Normal Distributions: Finding Probabilities
a) Probability of a random variable

C. Normal Probabilities: Finding Values
a) Find variable x-values within areas under the standard normal curve

D. Sampling Distributions and the Central Limit Theorem
a) Defining sampling distribution
b) Properties of sampling distributions of sample means
c) Central Limit Theorem

April 11, 2009 - Homework, Part II

Complete the following exercises from your textbook
Elementary Statistics—Picturing the World (and post them on the blog):

1. Section 4.1, Exercise #12, p. 179

2. Section 4.1, Exercise #24, p. 181

3. Section 4.2, Exercise #8, p. 194

4. Section 4.2, Exercise #10, p. 194

April 11, 2009 - Assignment, Homework, The Poll

Assume the following facts:

A political polling organization conducted a survey. As a part of the survey, the organization
called 1,012 people and asked, “Do you approve, disapprove, or have no opinion of the way the president is handling his job?”

The random variable x represents the number of people who approve of the way the president is
handling his job.

* * * * * *

1. Is the given experiment a binomial experiment?

2. Assume that the polling question is revised to: “Do you approve or disapprove of the way the president is handling his job?”

The random variable x represents the number of people who approve of the way the president is handling his job.

Now, this question represents a binomial experiment. Determine which of the following outcomes will denote “success” for this experiment.

• No opinion
• Approve
• Disapprove

April 11, 2009 - Quick "T & F" Quiz

True or False

1. The expected value of a discrete random variable is equal to the mean of the random variable.

2. Continuous random variables represent countable data, and discrete random variables represent
uncountable data.

3. It is possible for the sum of all probabilities of a random variable to exceed 1.

4. A binomial experiment is repeated for a fixed number of trials, and each trial is dependent on
the other trials.

5. There are only two possible outcomes—success and failure—from a binomial experiment.

April 11, 2009 - Quick "WWW" exercise

The following link leads to an animation that performs the parking lot simulation described in the text.

Experiment with the animation.
http://media.pearsoncmg.com/ph/esm/esm_larson_statlet_questions_2e/Pick_a_Lane_Statlet/pick.html

In your opinion, which strategy, “Pick a Row” or “Cycling,” saves the most time? After comparing the time to walk and drive, which strategy are you likely to choose?

Explain your answer.

April 11, 2009 - Lecture Notes

Certain applications, such as those used for weather forecasting and space research, require the
collection and analysis of large amounts of data.

For such applications, data is often collected using probability experiments, and the outcomes of these experiments are organized to form probability distributions. The shape, central tendency, and variability of probability distributions help analysts find patterns in the data set and make predictions and decisions.

You will be introduced to discrete probability distributions—a specific type of probability
distribution. Continuous probability distributions will be covered later in the course.

* * * * * * * *

A. Probability Distributions

A random variable, x, represents a numerical value associated with each outcome of a probability experiment.

A discrete random variable is a random variable with countable possible outcomes that can be listed. A continuous random variable is a random variable with an uncountable number of possible outcomes represented by an interval on the number line.

B. Binomial Distributions
Binomial experiments produce only two outcomes per trial, often called Success S and Failure F.
Examples include the possible outcomes when flipping a coin or answering a question with two answer options.

A binomial experiment is a discrete probability experiment that must satisfy the four conditions given below:

Condition 1: The experiment is repeated for a fixed number of trials, n. For example, a coin is
flipped 10 times. Each trial is independent of the other trials.

Condition 2: There are only two possible outcomes of interest for each trial. One of these outcomes is classified as a success (S) and the other as a failure (F). For example, flipping a coin has two possible outcomes—heads or tails. You may consider the occurrence of heads as a success and tails as a failure.

Condition 3: The probability of a success P(S) and the probability of a failure P(F) are the same for
each trial. For example, the probability of getting a six when rolling a fair die is 1/6
and the probability of not getting a six is 5/6, and these probabilities are the same regardless of how many times the die is rolled.

Condition 4: The random variable x counts the number of successful trials in the total number of
trials, n: x = 0, 1, 2, 3, …, n. If six is the result two times in 10 rolls, x = 2.
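
The die example can be written out as a short sketch. The list of rolls below is invented for illustration; each roll is classified as a success (a six) or a failure (anything else), and x counts the successes.

```python
from fractions import Fraction

# Made-up results of n = 10 rolls of a fair die (Condition 1: fixed n).
rolls = [3, 6, 1, 4, 6, 2, 5, 1, 3, 2]
n = len(rolls)

# Condition 3: the same probabilities apply to every trial.
p_success = Fraction(1, 6)
p_failure = 1 - p_success

# Condition 2: each trial has two outcomes of interest, S or F.
outcomes = ["S" if roll == 6 else "F" for roll in rolls]

# Condition 4: x counts the number of successful trials.
x = outcomes.count("S")

print(n, p_success, p_failure, x)  # 10 1/6 5/6 2
```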

April 10, 2009 - DISCRETE PROBABILITY DISTRIBUTIONS

KEY CONCEPTS

A. Probability Distributions

a) Defining random variable
b) Types of random variable
o Discrete random variable
o Continuous random variable
c) Discrete probability distribution

B. Binomial Distributions

a) Defining binomial experiment
b) Conditions for binomial experiment

Saturday, April 4, 2009

April 4, 2009 - Homework Reading

Please review Chapter 4 for next week.

Friday, April 3, 2009

April 4, 2009 - Homework, Probability

Complete the following exercises from your textbook
Elementary Statistics—Picturing the World:

1. Section 3.1, Exercise #14, p. 125

2. Section 3.1, Exercise #20, p. 126

3. Section 3.2, Exercise #16, p. 136

4. Section 3.3, Exercise #18, p. 146

April 4, 2009 - Probability & Cards

A standard deck of cards contains a total of 52 cards. Each of the four suits (Spade, Heart,
Diamond, and Club) contains 13 cards: 9 of them numbered from 2 to 10, and one each of an
A (ace), a J (jack), a Q (queen), and a K (king). The two jokers are excluded.

The probability of selecting a card from the standard deck and drawing a Queen of Hearts is:
• 0.5
• 0.0192
• 1.0

Answer: 0.0192

The probability of drawing a Queen from the deck is:
• 1
• 0
• 0.45
• 0.0769

Answer: 0.0769

What’s the probability of not selecting a Queen from the standard deck of cards?
• 0.9231
• 0.222
• 0.0769
• 0.126

Answer: 0.9231

Tip:
The key here is to know that not selecting a Queen is the complement of selecting a Queen. In addition, from the previous problem, we know that the probability of selecting a Queen is 0.0769.
Therefore, this problem can be solved using the:
1. Probability of selecting a Queen
2. Formula for finding the probability of the complement event, P(E’) = 1 – P(E)
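
The three answers above can be reproduced by building the deck and counting cards. This is a minimal sketch; the helper name probability and the tuple representation of a card are choices made for this example only.

```python
from fractions import Fraction
from itertools import product

suits = ["Spade", "Heart", "Diamond", "Club"]
ranks = ["A"] + [str(value) for value in range(2, 11)] + ["J", "Q", "K"]
deck = list(product(ranks, suits))   # 13 ranks x 4 suits = 52 cards

def probability(event):
    """Classical probability: favorable outcomes / total outcomes."""
    favorable = [card for card in deck if event(card)]
    return Fraction(len(favorable), len(deck))

p_queen_of_hearts = probability(lambda card: card == ("Q", "Heart"))
p_queen = probability(lambda card: card[0] == "Q")
p_not_queen = 1 - p_queen            # complement rule: P(E') = 1 - P(E)

print(round(float(p_queen_of_hearts), 4))  # 0.0192
print(round(float(p_queen), 4))            # 0.0769
print(round(float(p_not_queen), 4))        # 0.9231
```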

April 4, 2009 - True or False Questions

True, False, or Subjective Probability

1. There is a 200% chance of a thunderstorm tonight.
2. Classical probability of an event is the relative frequency of the event.
3. The complement of voting for the Democratic Party is voting for other parties plus not voting
for any party.
4. John expects a very high chance of winning the poker game. What type of probability is it?
5. Conditional probability is the probability of a single event.
6. If events A and B are dependent, then P(A and B) = P(A) · P(B).
7. If two events are mutually exclusive, they have no outcomes in common.
8. If two events are independent, then they are also mutually exclusive.
9. The addition rule is used to find the probability of at least one of the two events occurring.

April 4, 2009 - Probability (www) Interactive

The following link leads to an animation that illustrates the additive and multiplicative laws of
probability. The amount of flow through each pipe represents probability.

http://media.pearsoncmg.com/ph/esm/esm_larson_statlet_questions_2e/Probability_Statlet/probability.html

April 4, 2009 - Class Notes

A. Basic Concepts of Probability
a) Defining probability
b) Probability experiment
c) Types of probability
d) Fundamental concepts related to probability
o The law of large numbers
o Range of probabilities rule
o Subjective probability
o Complement of an event E
B. Conditional Probability and the Multiplication Rule
a) Defining conditional probability
b) Defining multiplication rule
C. The Addition Rule
a) Mutually exclusive events

In units 2 and 3, you learned about collecting and describing data. In this unit, you will
strengthen your foundation in descriptive statistics by learning the concepts of probability,
including the various types of probability and the rules used to calculate
probability.

A. Basic Concepts of Probability

Probability refers to the likelihood of the occurrence of uncertain events.
A probability experiment is a trial through which specific results or outcomes are obtained.
An event consists of one or more outcomes and is a subset of the sample space. A simple event
has a single outcome, such as rolling a die and obtaining a 4.
Classical or theoretical probability refers to the type of probability when each outcome in a
sample space is equally likely to occur.
The classical probability of occurrence of an event E is given by:
P(E) = (Number of outcomes in event E) / (Total number of outcomes in the sample space)
Empirical or statistical probability is based on observations obtained from probability
experiments.
The empirical probability that an event E will occur is given by:
P(E) = f / n
where,
f is the frequency of the event E occurring.
n is the total frequency of the experiment. n is sometimes denoted as Σf.
The law of large numbers
According to the law of large numbers, if an experiment is performed repeatedly, the empirical
probability of an event will be close to its theoretical or actual probability.
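
A short simulation can illustrate the law of large numbers. The sketch below is an assumption-laden example rather than part of the class notes: it simulates rolls of a fair die with Python's random module, uses an arbitrary seed so the run is repeatable, and compares the empirical probability f / n of rolling a 4 with the classical probability 1/6.

```python
import random

def empirical_probability(event, trials, rng):
    """Empirical probability f / n, where f counts how often the event occurs."""
    f = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return f / trials

rng = random.Random(2009)   # arbitrary seed, only to make the run repeatable
theoretical = 1 / 6         # classical probability of rolling a 4

for n in (10, 100, 1000, 100000):
    estimate = empirical_probability(lambda face: face == 4, n, rng)
    # The gap between the two probabilities tends to shrink as n grows.
    print(n, round(estimate, 4), round(abs(estimate - theoretical), 4))
```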
Range of probabilities rule
According to this rule, the probability of an event E is always between 0 and 1. Mathematically, it
is expressed as: 0 ≤ P(E) ≤ 1
Subjective probability
It describes an individual's personal judgment about the likelihood of the occurrence of an event.
It is based on estimates, intuition, and educated guesses.
Complement of an event E
It refers to the set of all outcomes in a sample space that are not included in an event E. It is
denoted as E’—pronounced E prime. The probability of the complement of an event E is
calculated as follows:
P(E’) = 1 – P(E)
B. Conditional Probability and the Multiplication Rule
Conditional probability is the probability of an event B occurring, given that another event A
has already occurred. It is denoted by P(B|A).
Two events are independent if the occurrence of one does not affect the probability of the
occurrence of the other. For example, getting a 2 after rolling a die and getting a 2 on the next roll are independent events.
When two events A and B are independent, then:
P(B|A) = P(B) and P(A|B) = P(A)
In other words, if two events are independent, then
P(A and B) = P(A) · P(B)
Events that are not independent are dependent.
The multiplication rule is used to determine the probability of the occurrence of two events A
and B in sequence.
The formula for multiplication rule is represented as follows:
P(A and B) = P(A) · P(B|A). However, if the events are independent, this formula reduces to
P(A and B) = P(A) · P(B).
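
The multiplication rule can be tried on one dependent and one independent case. Both scenarios below are illustrative choices, not examples from the textbook: drawing two Queens in a row without replacement, and rolling a 2 twice in a row.

```python
from fractions import Fraction

def multiplication_rule(p_a, p_b_given_a):
    """P(A and B) = P(A) * P(B|A)."""
    return p_a * p_b_given_a

# Dependent events: after one Queen is removed, 3 Queens remain in 51 cards.
p_two_queens = multiplication_rule(Fraction(4, 52), Fraction(3, 51))

# Independent events: P(B|A) = P(B), so the rule reduces to P(A) * P(B).
p_two_twos = multiplication_rule(Fraction(1, 6), Fraction(1, 6))

print(p_two_queens, p_two_twos)  # 1/221 1/36
```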
C. Addition Rule
Two events are mutually exclusive if they have no outcomes in common. In other words, when
events are mutually exclusive, they cannot occur at the same time.
The addition rule is used to find the probability of occurrence of event A or B. Mathematically,
the addition rule is represented as follows:
P(A or B) = P(A) + P(B) – P(A and B), where P(A and B) is the probability of events A and B
occurring at the same time. If the events A and B are mutually exclusive P(A and B) = 0, and the
formula reduces to P(A or B) = P(A) + P(B).
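
The addition rule can be sketched the same way. The two card examples are assumptions picked for illustration: "Queen or Heart" overlaps in one card (the Queen of Hearts), while "Queen or King" is mutually exclusive.

```python
from fractions import Fraction

def addition_rule(p_a, p_b, p_a_and_b=Fraction(0)):
    """P(A or B) = P(A) + P(B) - P(A and B).

    For mutually exclusive events, P(A and B) = 0, which is the default.
    """
    return p_a + p_b - p_a_and_b

# Not mutually exclusive: the Queen of Hearts is counted in both events.
p_queen_or_heart = addition_rule(Fraction(4, 52), Fraction(13, 52), Fraction(1, 52))

# Mutually exclusive: a single card cannot be both a Queen and a King.
p_queen_or_king = addition_rule(Fraction(4, 52), Fraction(4, 52))

print(p_queen_or_heart, p_queen_or_king)  # 4/13 2/13
```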

Saturday, March 28, 2009

March 28, 2009 - Class Notes

A. Measures of Central Tendency

Terms such as “the most common” or “average” in everyday language refer to the typical or middle value of a data set. In descriptive statistics, this value is called a measure of central
tendency.

The three measures used most commonly to describe central tendency are mean, median, and
mode.

Mean (also called arithmetic average): The sum of the data entries divided by the number of
entries.

Median: The middle value of an ordered data set.

Outlier: A data entry that is “very different” from the other entries in the data set.

Mode: The data value that occurs most frequently in a data set.

When finding the mode, pay attention to two special cases:
• No entry is repeated: the data set has no mode.
• Two entries occur with the same greatest frequency: each entry is a mode, and the data set is called bimodal.

Weighted mean: The mean of a data set whose entries have varying weights. It is calculated by
taking into consideration the weight assigned to each data entry.
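
The measures above can be computed with Python's statistics module. The data sets, scores, and weights below are invented for illustration. Note one assumption about the library: multimode returns every value tied for the highest frequency, so when no entry repeats it returns all entries, which corresponds to the "no mode" case in these notes.

```python
from statistics import mean, median, multimode

data = [2, 3, 3, 5, 7, 10]          # made-up, already ordered data set

print(mean(data))                   # sum of entries / number of entries
print(median(data))                 # middle of the ordered data (average of 3 and 5)
print(multimode(data))              # entries with the highest frequency

# Special cases for the mode.
no_repeat = multimode([1, 2, 3])        # every entry ties: treated as "no mode"
bimodal = multimode([1, 1, 2, 2, 3])    # two modes

def weighted_mean(entries, weights):
    """Sum of (entry * weight) divided by the sum of the weights."""
    return sum(x * w for x, w in zip(entries, weights)) / sum(weights)

# Hypothetical scores weighted 5 : 3 : 2.
score = weighted_mean([80, 90, 70], [5, 3, 2])
print(no_repeat, bimodal, score)
```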

If in a frequency distribution graph, the mean, median, and mode are equal and located on the
same value of the x-axis, the distribution is symmetric.

A distribution in which the mean, median, and mode are unequal is called a skewed distribution.

A distribution where the graph has a tail stretching to the left is called skewed left. In this
distribution, mean < median < mode. If the graph of the distribution has a tail stretching to the
right, the distribution is called skewed right. In this distribution, mode < median < mean.
Outliers can create a skewed distribution.

B. Measures of Variation

Range: The difference between the largest and the smallest data entries.

Deviation: The difference between a data entry x in a population and the population mean μ, or
the difference between a data entry x in a sample and the sample mean x̄.

Variance: A measure of how far the entries of a population data set or sample data set deviate from the
mean. Population variance is represented using the symbol σ² (pronounced sigma squared).
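
A small worked example ties range, deviation, and variance together. The data set is made up and treated as a whole population. The formula used for the population variance, the mean of the squared deviations, is the standard definition and is stated here as an addition to the notes above.

```python
from statistics import mean, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]     # made-up population data

data_range = max(data) - min(data)  # largest entry minus smallest entry

mu = mean(data)                     # population mean
deviations = [x - mu for x in data] # deviation of each entry: x - mu

# Population variance: mean of the squared deviations.
variance = sum(d ** 2 for d in deviations) / len(data)

print(data_range, mu, sum(deviations), variance)
print(variance == pvariance(data))  # agrees with the library function
```

The deviations always sum to zero, which is why they are squared before being averaged.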

March 28, 2009 - Descriptive Statistics (Part II)

Outline

A. Measures of Central Tendency
o Mean
o Median
o Mode
B. Measures of Variation
o Variance
o Standard deviation
C. Measures of Position
o Percentiles

March 28, 2009 - Take Home Quiz Assignment

You will have 45 minutes in class to complete the following assignment/quiz (11:45 to 12:30).

Title: Frequency Distributions and Their Graphs

Introduction: This set of exercises will help you read and construct
a frequency distribution and organize data using a graph.

Tasks:

Complete the following exercises from your textbook
Elementary Statistics—Picturing the World:

1. Section 2.1, Exercise #10, p. 43

2. Section 2.1, Exercise #24, p. 45

3. Section 2.2, Exercise #20, p. 58

GOOD LUCK!