## Monday, 27 September 2010

### boxplot data set sizes

there are complications in working out where the 1st and 3rd quartiles lie for fairly small data sets
most sources simplify matters...
and anyway, for discrete distributions, there is no single agreed convention for choosing the quartile values
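to see that the conventions really do disagree on a small data set, here is a quick check with Python's built-in `statistics.quantiles` (Python 3.8 or later), which offers an `"exclusive"` and an `"inclusive"` method; the seven data values are made up for illustration, and neither method is necessarily the one a particular textbook uses

```python
import statistics

# a small made-up data set of size 7 (already sorted for readability)
data = [1, 2, 3, 4, 5, 6, 7]

# two interpolation conventions built into the standard library
exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print("exclusive:", exclusive)  # [2.0, 4.0, 6.0]
print("inclusive:", inclusive)  # [2.5, 4.0, 5.5]
```

both methods agree on the median but not on the 1st and 3rd quartiles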

when choosing a size for a data set, it is easiest to use a total of the form 4n + 3, since then all three quartiles are actual numbers in the data set (leave out the median value when splitting the data to locate the other two quartiles...)
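a minimal Python sketch of this convention, assuming the sorted data is split into a lower and an upper half with the median value left out when the size is odd; the function names `median` and `quartiles` are my own, not from any library

```python
def median(sorted_values):
    """Median of a non-empty list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]                              # odd size: the middle value
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2   # even size: half-way between the middle two


def quartiles(values):
    """Return (Q1, median, Q3), leaving the median out of both halves when the size is odd."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2            # values in each half (the middle value is skipped if n is odd)
    lower = s[:half]
    upper = s[n - half:]
    return median(lower), median(s), median(upper)


data = [3, 8, 1, 9, 4, 7, 12]   # 7 = 4*1 + 3 values
print(quartiles(data))          # (3, 7, 9)
```

with 7 values every quartile comes out as one of the original numbers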

a data set size of the form 4n + 2 has the median half-way between the middle two numbers, and then (with this simple convention) the 1st and 3rd quartiles are actual numbers in the data set, since each half has 2n + 1 values

for a size 4n + 1 data set, the median is in the data set and the 1st and 3rd quartiles are half-way between two numbers (once the median is left out, each half has 2n values)

for a size 4n data set, all three quartiles are half-way between two data set values (each half has 2n values)

these four cases cover all possible data set sizes, since every whole number leaves a remainder of 0, 1, 2 or 3 when divided by 4
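a small sketch that lists where the quartiles fall for one size of each form (8, 9, 10 and 11 are 4n, 4n + 1, 4n + 2 and 4n + 3 with n = 2), again assuming the leave-the-median-out convention; ranks count from 1 at the smallest value, a whole-number rank means an actual data value and a rank ending in .5 means half-way between two values, and `quartile_positions` is a made-up helper name

```python
from fractions import Fraction


def quartile_positions(size):
    """1-based ranks of Q1, the median and Q3 when the median is left out of both halves."""
    half = size // 2                 # values in each half
    q1 = Fraction(half + 1, 2)       # middle rank of the lower half
    med = Fraction(size + 1, 2)      # middle rank of the whole data set
    q3 = size + 1 - q1               # Q3 mirrors Q1 from the top end
    return q1, med, q3


for size in range(8, 12):            # remainders 0, 1, 2, 3 when divided by 4
    ranks = quartile_positions(size)
    kinds = ["value" if r.denominator == 1 else "half-way" for r in ranks]
    print(f"size {size} (remainder {size % 4}):", [float(r) for r in ranks], kinds)
```

only the size 11 row (remainder 3) has all three ranks as whole numbers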

for smallish data sets I usually suggest that students draw a row of tick marks, one per data value, so that they can see where the quartiles are located
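the tick-mark idea can also be sketched as text, one `|` per sorted data value with Q1, M (the median) and Q3 written underneath; this assumes the same leave-the-median-out convention, and `tick_diagram` is a hypothetical helper rather than anything standard

```python
def tick_diagram(n):
    """Text sketch: one '|' per sorted data value, with Q1, M and Q3 labelled below.

    A label sitting between two ticks means that quartile is half-way
    between those two data values.
    """
    if n < 2:
        raise ValueError("need at least two values")
    step = 4                                   # columns between neighbouring ticks
    half = n // 2                              # values in each half (median left out)
    # doubled 1-based ranks, so half-way positions are still whole numbers
    doubled = {"Q1": half + 1, "M": n + 1, "Q3": 2 * (n + 1) - (half + 1)}
    width = (n - 1) * step + 2                 # +2 leaves room for the "Q3" label
    ticks = [" "] * width
    for i in range(n):
        ticks[i * step] = "|"
    labels = [" "] * width
    for name, d in doubled.items():
        col = (d - 2) * step // 2              # doubled rank 2 (the first value) -> column 0
        labels[col:col + len(name)] = name
    return "".join(ticks).rstrip() + "\n" + "".join(labels).rstrip()


print(tick_diagram(9))
```

for 9 values (a 4n + 1 size) the M label lands on the 5th tick while Q1 and Q3 land between ticks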

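however, these are simplifications
there are other rules (for example, including the median in both halves when the size is odd, or interpolating between values) that make sense but seem to complicate matters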
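the post doesn't say which rules it has in mind, so as one example only: Tukey's hinges include the median in both halves when the size is odd, and for small data sets this changes the 1st and 3rd quartiles; the sketch compares the two splits, and `quartile_pair` with its `include_median` flag is a hypothetical helper

```python
def median(sorted_values):
    """Median of a non-empty list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartile_pair(values, include_median):
    """(Q1, Q3) as medians of the lower and upper halves.

    include_median=False leaves the middle value out when the size is odd;
    include_median=True puts it in both halves (Tukey's hinges).
    For even sizes the two splits are identical.
    """
    s = sorted(values)
    n = len(s)
    half = (n + 1) // 2 if include_median else n // 2
    return median(s[:half]), median(s[n - half:])


for size in (9, 7):
    data = range(1, size + 1)
    print(size, quartile_pair(data, False), quartile_pair(data, True))
```

with 9 values the two splits give (2.5, 7.5) and (3, 7), with 7 values they give (2, 6) and (2.5, 5.5), so the sizes whose quartiles land on actual data values swap over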

in real data analysis, people rarely deal with such small data sets...