Monday, 27 September 2010
boxplot data set sizes
there are complications involved in sorting out where the 1st and 3rd quartiles lie for fairly small data sets
most sources simplify matters...
and anyway, for discrete distributions, there is no single agreed convention for selecting the quartile values
when choosing a size for a data set, the easiest is a size of the form 4n + 3: all three quartiles are then actual numbers in the data set (ignore the median value and take the middle of each remaining half to locate the other quartiles)
for a size 4n + 2 data set, the median is half-way between the middle two numbers and (for simplicity) the 1st and 3rd quartiles are in the data set
for a size 4n + 1 data set, the 1st and 3rd quartiles are half-way between numbers and the median is in the data set
for a size 4n data set, all of the quartiles are half-way between data set values
this covers all possible data set sizes
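
a minimal Python sketch of the four cases above, assuming the simplified rule of ignoring the median (when it is an actual data value) and taking the median of each remaining half; the name simple_quartiles is made up for illustration and this is only one of several possible rules

```python
from statistics import median

def simple_quartiles(data):
    """Return (Q1, median, Q3) using the simplified 'split at the median' rule."""
    values = sorted(data)
    size = len(values)
    if size < 2:
        raise ValueError("need at least two values")
    half = size // 2
    lower = values[:half]          # values before the median position
    upper = values[size - half:]   # values after the median position
    return median(lower), median(values), median(upper)

# one example of each size, using the values 1, 2, ..., size
for size in (7, 6, 5, 8):          # 4n + 3, 4n + 2, 4n + 1, 4n with n = 1 or 2
    print(size, simple_quartiles(range(1, size + 1)))
```

for odd sizes the median value is left out of both halves, for even sizes the data set is simply cut in two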
for smallish data set sizes I usually suggest that students draw a set of tick marks so that they can see where the quartiles are located
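
the tick mark idea can also be sketched in Python; this is only an illustration under the same simplified rule, and the name tick_diagram and the labels 1, M and 3 are made up here

```python
def tick_diagram(size):
    """Draw one tick per data value, with 1, M, 3 under the quartile positions."""
    if size < 2:
        raise ValueError("need at least two values")
    half = size // 2
    # ticks sit at even columns, gaps between ticks at odd columns
    ticks = " ".join("|" * size)
    columns = [
        half - 1,                          # middle of the lower half
        size - 1,                          # middle of the whole data set
        2 * (size - half) + half - 1,      # middle of the upper half
    ]
    marks = [" "] * (2 * size - 1)
    for label, column in zip("1M3", columns):
        marks[column] = label
    return ticks + "\n" + "".join(marks).rstrip()

# print(tick_diagram(7)) shows every quartile sitting under a tick:
# | | | | | | |
#   1   M   3
```

a label under a tick means that quartile is an actual data value, a label under a gap means it is half-way between two values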
however, these are simplifications
there are other rules (for example, interpolating between neighbouring data values) that make sense but seem to complicate matters
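
as one hedged illustration of how the rules can disagree, Python's standard statistics.quantiles function offers two interpolation methods, "exclusive" and "inclusive"; these are just two of the rules in use, not necessarily the ones meant above, and the name compare_rules is made up for this sketch

```python
from statistics import quantiles

def compare_rules(data):
    """Return the three quartiles under two interpolation rules from the standard library."""
    values = sorted(data)
    return {
        "exclusive": quantiles(values, n=4, method="exclusive"),
        "inclusive": quantiles(values, n=4, method="inclusive"),
    }

# a size 7 data set (the 'easy' 4n + 3 case) with values 1 to 7
for rule, result in compare_rules(range(1, 8)).items():
    print(rule, result)
```

for the values 1 to 7 the "exclusive" method agrees with the simplified rule (2, 4, 6), while the "inclusive" method puts the 1st and 3rd quartiles at 2.5 and 5.5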
for real data analysis, people do not deal with such small data set sizes...