Quartiles, boxplots, percentiles, and z-scores

Quartiles

In order to describe a data set without listing all the data, we have measures of location such as the mean and median, measures of spread such as the range and standard deviation, and descriptions of shape such as symmetric, skewed, unimodal, and bimodal. We can also get a good sense of the distribution of a set of data with five carefully chosen measures of location. We supplement the median, minimum, and maximum with the first and third quartile, which indicate the extent to which the data lies near the median, or near the extremes.

There are several definitions for calculating the first and third quartiles, and they do not all give the same results. Heuristically, one quarter of the data lies below the first quartile (hence three quarters above it). Similarly, three quarters of the data lies below the third quartile (hence one quarter above it). A common approach takes the first and third quartiles to be the medians of the lower half and upper half of the data; whether or not the median itself is included in each half when the number of data values is odd is one reason the definitions vary. The second quartile is by definition the median.
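
To make the effect of the differing definitions concrete, here is a minimal sketch of the median-of-halves rule in Python. The function name quartiles, the flag include_median, and the sample data are all hypothetical choices made for illustration; this is one possible reading of the rule, not a standard library routine.

```python
from statistics import median

def quartiles(data, include_median=False):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    include_median only matters when the number of values is odd:
    it decides whether the overall median belongs to both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 0:
        # Even count: the data split cleanly into two halves.
        lower, upper = xs[:half], xs[half:]
    elif include_median:
        # Odd count: the middle value is counted in both halves.
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # Odd count: the middle value is left out of both halves.
        lower, upper = xs[:half], xs[half + 1:]
    return median(lower), median(xs), median(upper)

sample = [1, 3, 4, 7, 8, 10, 12, 15, 20]  # hypothetical data, 9 values
print(quartiles(sample, include_median=False))  # (3.5, 8, 13.5)
print(quartiles(sample, include_median=True))   # (4, 8, 12)
```

The median is the same under both conventions, but Q1 and Q3 shift, which is why two textbooks or software packages can report different quartiles for the same data.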

Note that a quartile is a number or cutoff, and not a range of values. One may be above or below the first quartile, but not in the first quartile.

The five number summary, i.e., the minimum, Q1, Q2 (median), Q3, and maximum, gives a good indication of where the data lie. For the data set of weights, the five number summary is: 105, 130, 155, 175, 235. One knows immediately that half the data is below 155, half is above 155, and alternatively half is between 130 and 175.
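
As a rough sketch, a five number summary can also be computed with Python's standard library. The data below are hypothetical (the class weights are not reproduced here), and the helper name five_number_summary is made up for this example. Note that statistics.quantiles with method="inclusive" uses an interpolation-based definition, which is one more definition among several, so other tools may report slightly different quartiles.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

sample = [1, 3, 4, 7, 8, 10, 12, 15, 20]  # hypothetical data
summary = five_number_summary(sample)
print(summary)
```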

The five number summary is sometimes represented graphically as a (box-and-)whisker plot. The first and third quartiles are at the ends of the box, the median is indicated with a vertical line in the box, and the maximum and minimum are at the ends of the whiskers. A boxplot for the weights is depicted below.

Exercise: How is a boxplot similar to a histogram? How is it different?

Percentiles

Percentiles are like quartiles, except that they divide the data set into 100 equal parts instead of four equal parts (similarly, there are quintiles and deciles and ...). Percentiles are useful for giving the relative standing of an individual in a population; they are essentially the rank position of an individual. As with quartiles, there are slightly varying definitions specifying how to calculate percentiles. One definition is the fraction of the population which is less than the specified value. Suppose one wants to compare someone who graduated 37th out of a class of 250 with someone who graduated 12th in a class of 60. Percentiles measure position from the bottom, so 37th from the top means that 213 are below it in a population of 250. One can calculate 213/250 = .852, which is rounded down to the 85th percentile; similarly 48/60 = .80, or the 80th percentile. Therefore, being 37th out of 250 puts one at the 85th percentile, which is better than 12th out of 60, which is only at the 80th percentile.
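
The class-rank comparison can be sketched in a few lines of Python. This follows the "fraction of the population below the value, rounded down" definition used above; the function name percentile_from_rank and its parameters are hypothetical.

```python
def percentile_from_rank(rank_from_top, class_size):
    """Percentile of someone ranked rank_from_top in a class of class_size.

    Uses the 'fraction of the population below' definition, rounded down.
    """
    below = class_size - rank_from_top       # how many rank lower
    return (100 * below) // class_size       # integer division rounds down

first = percentile_from_rank(37, 250)   # 213 of 250 are below
second = percentile_from_rank(12, 60)   # 48 of 60 are below
print(first, second)  # 85 80
```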

z-scores

Another way to compare individuals in different populations is with z-scores. If mu is the mean of a population and s is the standard deviation, the z-score of a value x is (x-mu)/s (note that z-scores may be positive or negative). A standard example for demonstrating the utility of z-scores is comparing a score on the ACT tests with a score on the SAT tests. Originally, SAT tests had a mean score of 500 with a standard deviation of 100, while ACT tests had a mean score of 18 with a standard deviation of 6 (these are no longer the means and standard deviations for those tests). Hence one could compare 680 on the SAT with 25 on the ACT. The respective z-scores are (680-500)/100 = 1.8 and (25-18)/6 = 1.17 (rounded). Therefore 680 on the SAT is a better score than 25 on the ACT (assuming equal quality among the students who took the two tests).
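
The SAT and ACT comparison can be checked with a short Python sketch. The function name z_score is a hypothetical helper, and the means and standard deviations are the historical values quoted above, not current ones.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

sat = z_score(680, mean=500, sd=100)  # historical SAT parameters from the text
act = z_score(25, mean=18, sd=6)      # historical ACT parameters from the text

print(round(sat, 2), round(act, 2))  # 1.8 1.17
print("SAT score is more outstanding" if sat > act else "ACT score is more outstanding")
```

A negative result would simply mean the value is below the mean of its population.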

Z-scores measure how outstanding an individual is relative to the standard deviation for that population. Note that percentiles use the median as the average (50th percentile), while z-scores use the mean as the average (z-score of 0).

Competencies: For the data set {2 5 9 4 6 7 6 8 8}, calculate the quartiles and 5-number summary.

For the class weights, find the percentile and z-score of the 168 pound individual.

Reflection: When are z-scores versus percentiles a better measure of relative standing?

Challenge:

May 2002