DesertPy Meetup Group
28 May 2014
Sarah Braden
PEP: 450
Title: Adding A Statistics Module To The Standard Library
Last-Modified: 16 Mar 2014
Author: Steven D'Aprano
PEP 450 is deceptively simple
Graphing calculators can do this, Python should too
DIY statistics functions are often incorrect, and buggy at best
Don't have to install numpy just to take the standard deviation
import math

def mean(data):
    return sum(data)/len(data)

def variance(data):
    # Use the Computational Formula for Variance.
    n = len(data)
    ss = sum(x**2 for x in data) - (sum(data)**2)/n
    return ss/(n-1)

def standard_deviation(data):
    return math.sqrt(variance(data))
# The above appears to be correct with a casual test:
>>> data = [1, 2, 4, 5, 8]
>>> variance(data)
7.5
# However, adding a constant to each data point should not change the variance:
>>> data = [x+1e12 for x in data]
>>> variance(data)
0.0
# And variance should *never* be negative:
>>> variance(data*100)
-1239429440.1282566
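For contrast, here is a minimal sketch of a two-pass variance that avoids the cancellation seen above. It is not the implementation proposed in the PEP; the name two_pass_variance is made up for this illustration, and it relies on math.fsum for accurate floating-point summation.

```python
import math

def two_pass_variance(data):
    """Sample variance via the two-pass algorithm (illustrative, not the PEP's code)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    # Pass 1: compute the mean with an accurate floating-point sum.
    xbar = math.fsum(data) / n
    # Pass 2: sum squared deviations from the mean, so large offsets cancel
    # before squaring instead of after.
    ss = math.fsum((x - xbar) ** 2 for x in data)
    return ss / (n - 1)

small = [1, 2, 4, 5, 8]
shifted = [x + 1e12 for x in small]

print(two_pass_variance(small))          # 7.5
print(two_pass_variance(shifted))        # 7.5, unchanged by the constant offset
print(two_pass_variance(shifted * 100))  # non-negative
```

Subtracting the mean before squaring keeps the intermediate values small, which is why the shifted data gives the same answer as the original data.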
The mean, median and mode functions take a single mandatory argument and return the appropriate statistic, e.g.:
>>> mean([1, 2, 3])
2
mean(data) -> arithmetic mean of data.
median(data) -> median (middle value) of data, taking the average of the two middle values when there are an even number of values.
median_high(data) -> high median of data, taking the larger of the two middle values when the number of items is even.
median_low(data) -> low median of data, taking the smaller of the two middle values when the number of items is even.
median_grouped(data, interval=1) -> 50th percentile of grouped data, using interpolation.
mode(data) -> most common data point.
variance(data, xbar=None) -> sample variance of data, optionally using xbar as the sample mean.
stdev(data, xbar=None) -> sample standard deviation of data, optionally using xbar as the sample mean.
pvariance(data, mu=None) -> population variance of data, optionally using mu as the population mean.
pstdev(data, mu=None) -> population standard deviation of data, optionally using mu as the population mean.
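A short usage sketch of the functions listed above, assuming Python 3.4 or later where the module ships as statistics in the standard library. The variable names are only for illustration.

```python
import math
import statistics

data = [1, 2, 4, 5, 8]

# Measures of central location.
center = statistics.mean(data)
middle = statistics.median(data)
low = statistics.median_low([1, 2, 4, 5])    # smaller of the two middle values
high = statistics.median_high([1, 2, 4, 5])  # larger of the two middle values
common = statistics.mode([1, 1, 2, 3])

# Measures of spread: sample versus population.
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)
pop_var = statistics.pvariance(data)

# A precomputed mean can be passed in to avoid recalculating it.
same_var = statistics.variance(data, xbar=center)

# Adding a large constant should leave the variance unchanged.
shifted_var = statistics.variance([x + 1e12 for x in data])

print(center, middle, low, high, common)
print(sample_var, sample_sd, pop_var, same_var, shifted_var)
```

The xbar and mu arguments are optional; if given, they should be the actual mean of the data, otherwise the result is not meaningful.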
Q: Will this module be backported to older versions of Python?
A: The module currently targets 3.3 and is available on PyPI for 3.3 for the foreseeable future.
Backporting to older versions of the 3.x series is likely (but not yet decided).
Backporting to 2.7 is less likely but not ruled out.