PEP 450

DesertPy Meetup Group

28 May 2014

Sarah Braden

What?

PEP: 450

Title: Adding A Statistics Module To The Standard Library

Last-Modified: 16 Mar 2014

Author: Steven D'Aprano

Why?

PEP 450 is deceptively simple

Graphing calculators can do this; Python should too

DIY statistics functions are often subtly incorrect or numerically unstable

No need to install NumPy just to compute a standard deviation

In [2]:

import math

def mean(data):
    return sum(data)/len(data)

def variance(data):
    # Use the Computational Formula for Variance.
    # (Single pass; subtracts two large, nearly equal sums.)
    n = len(data)
    ss = sum(x**2 for x in data) - (sum(data)**2)/n
    return ss/(n-1)

def standard_deviation(data):
    return math.sqrt(variance(data))

In [3]:

# The above appears to be correct with a casual test:
data = [1, 2, 4, 5, 8]
variance(data)

Out[3]:

7.5

In [4]:

# However adding a constant to each data point should not change the variance:
data = [x+1e12 for x in data]
variance(data)

Out[4]:

0.0

In [5]:

# And variance should *never* be negative:
variance(data*100)

Out[5]:

-1239429440.1282566
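The failures above come from subtracting two huge, nearly equal sums. One common remedy is the two-pass method: compute the mean first, then sum the squared deviations from it. The sketch below is only an illustration; the name two_pass_variance is hypothetical and this is not claimed to be how the statistics module is implemented.

```python
import math

def two_pass_variance(data):
    """Sample variance via the two-pass method (hypothetical helper, not from the PEP)."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    # First pass: compute the mean, using math.fsum to limit rounding error.
    xbar = math.fsum(data) / n
    # Second pass: sum the squared deviations; every term is non-negative.
    ss = math.fsum((x - xbar) ** 2 for x in data)
    return ss / (n - 1)

data = [1, 2, 4, 5, 8]
shifted = [x + 1e12 for x in data]

print(two_pass_variance(data))           # 7.5
print(two_pass_variance(shifted))        # 7.5
print(two_pass_variance(shifted * 100))  # small and positive
```

Because each squared deviation is non-negative, the result can never go negative, and shifting the data by a constant leaves the deviations unchanged.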

Calculating mean, median and mode

The mean, median and mode functions take a single mandatory argument and return the appropriate statistic, e.g.:

In [6]:

mean([1, 2, 3])

Out[6]:

Functions

mean(data) -> arithmetic mean of data.

median(data) -> median (middle value) of data, taking the average of the two middle values when there is an even number of values.

median_high(data) -> high median of data, taking the larger of the two middle values when the number of items is even.

median_low(data) -> low median of data, taking the smaller of the two middle values when the number of items is even.

median_grouped(data, interval=1) -> 50th percentile of grouped data, using interpolation.

mode(data) -> most common data point.
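A short usage sketch of the functions listed above, assuming Python 3.4 or later, where the module ships as statistics. The sample data is made up for illustration.

```python
import statistics

evens = [1, 3, 5, 7]  # even number of items, so the median variants differ

avg = statistics.mean([1, 2, 3, 4, 4])
mid = statistics.median(evens)        # average of the two middle values
low = statistics.median_low(evens)    # smaller of the two middle values
high = statistics.median_high(evens)  # larger of the two middle values
grouped = statistics.median_grouped([52, 52, 53, 54], interval=1)
common = statistics.mode([1, 1, 2, 3, 3, 3, 3, 4])

print(avg, mid, low, high, grouped, common)
```

Note that median_low and median_high always return an actual data point, while median may return a value that is not in the data.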

Calculating variance and standard deviation

variance(data, xbar=None) -> sample variance of data, optionally using xbar as the sample mean.

stdev(data, xbar=None) -> sample standard deviation of data, optionally using xbar as the sample mean.

pvariance(data, mu=None) -> population variance of data, optionally using mu as the population mean.

pstdev(data, mu=None) -> population standard deviation of data, optionally using mu as the population mean.
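A brief sketch of the four functions above, again assuming the statistics module from Python 3.4 or later. It reuses the data from the earlier DIY example and passes a precomputed mean via xbar and mu, which is optional.

```python
import math
import statistics

data = [1.0, 2.0, 4.0, 5.0, 8.0]
xbar = statistics.mean(data)  # reuse the mean instead of recomputing it

sample_var = statistics.variance(data, xbar)   # divides by n - 1
pop_var = statistics.pvariance(data, mu=xbar)  # divides by n
sample_sd = statistics.stdev(data, xbar)
pop_sd = statistics.pstdev(data, mu=xbar)

print(sample_var, pop_var)
print(sample_sd, pop_sd)
```

The sample versions divide the sum of squared deviations by n - 1 and the population versions divide by n, so the sample variance is always the larger of the two for the same data.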

Backporting

Q: Will this module be backported to older versions of Python?

A: The module currently targets 3.3, and will be available on PyPI for 3.3 for the foreseeable future.

Backporting to older versions of the 3.x series is likely (but not yet decided).

Backporting to 2.7 is less likely but not ruled out.