Saturday, March 28, 2009

March 28, 2009 - Class Notes

A. Measures of Central Tendency

Everyday terms such as “the most common” or “average” refer to the typical or middle value of a data set. In descriptive statistics, such a value is called a measure of central tendency.

The three measures used most commonly to describe central tendency are mean, median, and
mode.

Mean (also called arithmetic average): The sum of the data entries divided by the number of
entries.

Median: The middle value of an ordered data set.
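As a minimal sketch of the two definitions above, the mean and median can be computed by hand as follows. The function names and the scores are made up for illustration, and the even-count rule (average the two middle entries) is the usual convention rather than something stated in these notes.

```python
def arithmetic_mean(data):
    """Sum of the entries divided by the number of entries."""
    return sum(data) / len(data)


def median_value(data):
    """Middle value of the sorted data.

    With an even number of entries, the two middle entries are averaged
    (the usual convention, assumed here).
    """
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical scores, made up for illustration.
scores = [82, 90, 75, 88, 95, 60]

print(arithmetic_mean(scores))  # 490 / 6
print(median_value(scores))     # (82 + 88) / 2
```

Note that the median requires sorting first; the mean does not depend on the order of the entries.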

Outlier: A data entry that lies far from the other entries in the data set. An outlier pulls the mean toward it but has little effect on the median.

Mode: The data value that occurs most frequently in a data set.

Two special cases arise when finding the mode:
• No entry is repeated: the data set has no mode.
• Two entries occur with the same highest frequency: both are modes, and the data set is called bimodal.
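The two special cases above can be handled in one small helper. This is a sketch only: the name `modes` is hypothetical, and returning an empty list when no entry repeats is one possible convention for "no mode".

```python
from collections import Counter


def modes(data):
    """Return every value that occurs with the highest frequency.

    Returns an empty list when the data is empty or no entry repeats.
    """
    if not data:
        return []
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no repeated entry, so no mode
    return [value for value, count in counts.items() if count == top]


# Hypothetical data sets, made up for illustration.
print(modes([4, 4, 5]))        # one mode
print(modes([1, 2, 3]))        # no repeated entry
print(modes([1, 1, 2, 2, 3]))  # two entries tied for the highest frequency
```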

Weighted mean: The mean of a data set whose entries carry different weights, computed as Σ(x · w) / Σw, where w is the weight assigned to each data entry x.
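A weighted mean can be sketched as the sum of each entry times its weight, divided by the sum of the weights. The function name, the scores, and the weights below are all hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Hypothetical course grade: three scores weighted 50%, 15%, and 35%.
scores = [86, 96, 82]
weights = [50, 15, 35]

print(weighted_mean(scores, weights))  # (4300 + 1440 + 2870) / 100
```

When every weight is the same, this reduces to the ordinary mean.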

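A frequency distribution is symmetric when a vertical line through the middle of its graph divides it into two halves that are approximate mirror images. In a symmetric distribution with a single peak, the mean, median, and mode are equal and located at the same value on the x-axis.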

A distribution that is not symmetric, with one tail longer than the other, is called a skewed distribution. Its mean, median, and mode generally differ.

A distribution whose graph has a tail stretching to the left is called skewed left. In this distribution, typically mean < median < mode. If the graph has a tail stretching to the right, the distribution is called skewed right. In this distribution, typically mode < median < mean. Outliers can create a skewed distribution, because the mean is pulled toward the outlier.
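The orderings above can be checked on small data sets where a single outlier creates the tail. Both data sets are made up for illustration; the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical data sets, made up for illustration.
right_tail = [2, 3, 3, 4, 5, 50]     # one large outlier stretches the tail right
left_tail = [1, 46, 47, 48, 48, 50]  # one small outlier stretches the tail left

for name, data in [("skewed right", right_tail), ("skewed left", left_tail)]:
    print(name, "mode:", mode(data), "median:", median(data), "mean:", mean(data))
```

In both cases the mode and median stay near the bulk of the data, while the mean moves toward the outlier.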

B. Measures of Variation

Range: The difference between the largest and the smallest data entries.

Deviation: The difference between a data entry x in a population and the population mean μ, written x − μ, or the difference between a data entry x in a sample and the sample mean x̄, written x − x̄.
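As a short sketch, the range and the deviations of a data set can be computed directly from the definitions. The data set is made up for illustration and is treated as a whole population (an assumption), so the mean is written `mu`.

```python
# Hypothetical data set, treated here as a population.
data = [4, 8, 6, 2, 10]

# Range: largest entry minus smallest entry.
data_range = max(data) - min(data)

# Deviation of each entry: x - mu.
mu = sum(data) / len(data)
deviations = [x - mu for x in data]

print(data_range)
print(deviations)
print(sum(deviations))  # deviations from the mean always sum to zero
```

Because the deviations cancel out, their plain sum says nothing about spread, which is why the variance squares them first.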

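Variance: A measure of how far the entries of a population or sample data set spread from the mean, computed from the squared deviations. Population variance is written σ² (pronounced “sigma squared”) and equals Σ(x − μ)² / N, where N is the number of entries in the population.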
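A minimal sketch of the population variance, taken as the mean of the squared deviations, Σ(x − μ)² / N. The function name and data set are hypothetical; `statistics.pvariance` from the standard library is used only as a cross-check.

```python
from statistics import pvariance


def population_variance(data):
    """Mean of the squared deviations from the population mean."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


# Hypothetical data set, treated here as a population.
data = [4, 8, 6, 2, 10]

print(population_variance(data))  # (4 + 4 + 0 + 16 + 16) / 5
print(pvariance(data))            # standard-library cross-check
```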
