Box Plots and Some Related Statistical Ideas

A boxplot, sometimes called a box-and-whisker plot, is a quick graphical way to examine one or more sets of data. The image below shows two related boxplots on the same scale. The lower plot, "total", is a simple boxplot with no outliers, that is, no values sufficiently far from the central part of the data. The upper plot, "total adj", shows a set that includes an outlier. The boxplot below was created with the free online software **StatCrunch 3.0**. Try it out here.

A boxplot usually displays at least five important pieces of information about a set of data. The median of the data is represented by the line in the center of the rectangular box. In both sets of data this is the bar at a value of 158, which essentially divides the data into two equal halves. The two ends of the rectangle represent the **upper quartile or Third Quartile or Q3**, located at about 161, and the **lower quartile, or First Quartile or Q1**, at 155. The other two values always shown are the maximum and minimum values of the data set. For more explicit information on how boxplot values are calculated, visit this Hyperstats Link, which uses yet another name, **hinge**, instead of quartile. Hinge was actually the term used by the inventor of the boxplot, Dr. John Tukey (see below). Quartiles existed as independent measures long before the development of the boxplot. The lines reaching from the sides of the rectangle out to the most remote non-outlier value are sometimes called **whiskers**.
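The five values described above can be computed directly. The following is a minimal sketch, not the method used by StatCrunch: the function name `five_number_summary` and the sample data are hypothetical, and the `"inclusive"` rule of Python's `statistics.quantiles` is only one of several quartile conventions.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three cut points
    # Q1, median, Q3. The "inclusive" method is an assumption here;
    # other conventions can give slightly different quartiles.
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]

# Hypothetical data chosen so the quartiles resemble those in the text.
sample = [150, 153, 155, 156, 158, 159, 161, 162, 165]
print(five_number_summary(sample))
```

The box of the plot spans Q1 to Q3, the bar inside it sits at the median, and the remaining two values are the extremes of the data.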

The boxplot, and the name, were invented by the American statistician John Tukey in 1977 in his text, __Exploratory Data Analysis__.

The short answer is that Fathom defines the first (or lower) quartile as the 25th percentile, i.e., the value such that 25% of the data values are below it. The third (or upper) quartile is defined in Fathom as the 75th percentile. Other texts choose to find the hinges/quartiles by taking the median of all values above the median for the upper value, and the median of all values below the median for the lower value.
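The two approaches can give different answers for the same data. The sketch below compares them under stated assumptions: the function names are hypothetical, the percentile version uses the interpolating `"inclusive"` rule of Python's `statistics.quantiles` (which is not necessarily the rule Fathom uses), and the median-of-halves version leaves the median out of both halves when the number of values is odd.

```python
import statistics

def quartiles_median_of_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves (needs >= 2 values)."""
    data = sorted(values)
    half = len(data) // 2      # for odd counts the middle value is left out
    lower = data[:half]        # positions below the median
    upper = data[-half:]       # positions above the median
    return statistics.median(lower), statistics.median(upper)

def quartiles_percentile(values):
    """Q1 and Q3 as interpolated 25th and 75th percentiles (one rule of several)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical data
print(quartiles_median_of_halves(sample))
print(quartiles_percentile(sample))
```

For this sample the median-of-halves rule gives 3 and 8, while the interpolated percentile rule gives 3.25 and 7.75, so two boxplots of the same data can have slightly different boxes depending on the software.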

John Tukey defines a box plot in his classic Exploratory Data Analysis, 1977. Tukey's book is an interesting but difficult read because he invents so much non-standard terminology. Instead of referring to the upper and lower quartiles, he calls them "hinges" and says, "it is natural to find them by counting half-way from each extreme to the median." This implies he would have us keep the median in each half when counting, and indeed his example shows that he does. The third edition of Devore and Peck's Statistics follows this convention.
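One reading of "counting half-way from each extreme to the median" is sketched below. The function name `tukey_hinges` is hypothetical, and the code is an interpretation of the convention described here rather than a transcription of Tukey's text: when the number of values is odd, the median is counted in both halves.

```python
import statistics

def tukey_hinges(values):
    """Lower and upper hinge, keeping the median in both halves for odd counts."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2        # rounds up, so an odd count shares the median
    lower = data[:half]        # lowest value up to and including the median
    upper = data[n - half:]    # median up to and including the highest value
    return statistics.median(lower), statistics.median(upper)

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical data, median is 5
print(tukey_hinges(sample))
```

For these nine values the hinges are 3 and 7. Leaving the median out of both halves would instead give 2.5 and 7.5, which is the kind of disagreement between texts noted above. For an even number of values the two conventions coincide.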

There is still more confusion in the use of the word quartile. Some people speak of the quartiles as the four sets of data created by the median and two hinges, and refer to the values at Q1 and Q3 as the quartile values or quartile boundaries. The Mathworld site defines quartile as, "One of the four divisions of observations which have been grouped into four equal-sized sets based on their statistical rank." But the Oxford English Dictionary gives, "The first and third of the three values of a variate which divide a frequency distribution into four groups, each containing one quarter of the total population (the second value of the three, the mean, is sometimes also included); also, any of the four groups so produced."

From Jeff Miller's website on the first use of math words we find,

The term QUARTILE was introduced by Francis Galton (Hald, p. 604). More simply stated, the Interquartile range, which is often abbreviated as

Higher and lower quartile are found in 1879 in D. McAlister, The Law of the Geometric Mean, Proc. R. Soc. XXIX, p. 374: "As these two measures, with the mean, divide the curve of facility into four equal parts, I propose to call them the 'higher quartile' and the 'lower quartile' respectively. It will be seen that they correspond to the ill-named 'probable errors' of the ordinary theory" (OED2). McAlister defined octiles on a similar principle.

Upper and lower quartile appear in 1882 in F. Galton, "Report of the Anthropometric Committee," Report of the 51st Meeting of the British Association for the Advancement of Science, 1881, p. 245-260 (David, 1995).

[And on another page he gives] INTERQUARTILE RANGE is found in 1882 in Francis Galton, "Report of the Anthropometric Committee," Report of the 51st Meeting of the British Association for the Advancement of Science, 1881, pp. 245-260: "This gave the upper and lower 'quartile' values, and consequently the 'interquartile' range (which is equal to twice the 'probable error')" (OED2).

Quartile was used in astronomy as early as 1585 [OED] to describe two objects which are separated by 90° of arc. The word is built on the Latin *quartilis* for one-fourth.