Exploring NumPy

Austin Godber
@godber

DesertPy - 8/27/2014

Getting Started

Installation
- Linux - apt-get install python-numpy
- OS X - Anaconda
- Windows - Anaconda

Typically imported as np for brevity.

In [91]:

%matplotlib inline
import numpy as np
import scipy
import matplotlib.pyplot as plt

Features

ndarray - multidimensional array object
Vectorized operations that work on whole ndarrays without explicit loops
IO tools to read/write files and work with memory-mapped files
Linear algebra, Fourier transform and random number utilities
C/C++ and Fortran integration
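To make the "without loops" point concrete, here is a minimal sketch comparing a Python-level loop with the equivalent vectorized expression. The variable names are illustrative only.

```python
import numpy as np

values = np.arange(5)            # array([0, 1, 2, 3, 4])

# Loop version: square each element one at a time in Python.
squares_loop = np.array([v ** 2 for v in values])

# Vectorized version: one expression, the loop runs inside NumPy.
squares_vec = values ** 2

print("loop:       %s" % squares_loop)
print("vectorized: %s" % squares_vec)
```

Both produce the same values; the vectorized form is shorter and is usually much faster on large arrays.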

NDArray

At the core of NumPy we have the ndarray object.

ndarray is a multidimensional container for homogeneous data. It has attributes such as shape and dtype, and it supports many vectorized operations.

In [2]:

data = range(10)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
array = np.array(data)
print "array contains %s" % array

array contains [0 1 2 3 4 5 6 7 8 9]

In [3]:

print "array.shape is %s, array.dtype is %s" % (array.shape, array.dtype)

array.shape is (10,), array.dtype is int64

In [4]:

array**2

Out[4]:

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

Creating

There are many functions that create ndarray objects. Some are shown below.

array, asarray
arange, linspace and meshgrid
ones and ones_like
zeros and zeros_like
empty and empty_like
eye and identity
fromfile, fromfunction and loadtxt

http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html

array - pass an array-like object and, optionally, a dtype

In [5]:

np.array([1, 2, 3])

Out[5]:

array([1, 2, 3])

In [6]:

a1 = np.array([1, 2, 3.0])  # automatic upcast to float64
print a1, a1.dtype

[ 1.  2.  3.] float64

In [7]:

a2 = np.array([1, 2, 3], dtype='float64')
print a2, a2.dtype

[ 1.  2.  3.] float64

Multiple Dimensions

In [8]:

a3 = np.array([[0, 1, 2], [3, 4, 5]])
print a3, a3.shape, a3.size, a3.dtype

[[0 1 2]
 [3 4 5]] (2, 3) 6 int64

asarray - like array, but an input that is already an ndarray is not copied

In [9]:

a1

Out[9]:

array([ 1.,  2.,  3.])

In [10]:

np.array(a1) is a1  # a1 data copied to create new ndarray

Out[10]:

False

In [11]:

np.asarray(a1) is a1  # a1 is referenced rather than copied

Out[11]:

True

arange - returns an ndarray of evenly spaced values within a given interval

In [12]:

np.arange(10)

Out[12]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [13]:

ar1 = np.arange(0, 19, 2)
ar1

Out[13]:

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

ones and ones_like - return an array full of ones

In [14]:

np.ones(5)

Out[14]:

array([ 1.,  1.,  1.,  1.,  1.])

In [15]:

np.ones((2,3))

Out[15]:

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [16]:

np.ones_like(ar1)

Out[16]:

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

zeros and zeros_like - return an array of zeros
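zeros has no cell of its own here, so the following is a small sketch that mirrors the ones examples. The names z1, z2, z3 and template are made up for illustration.

```python
import numpy as np

z1 = np.zeros(3)                       # 1-D, float64 by default
z2 = np.zeros((2, 3), dtype=np.int32)  # shape given as a tuple, explicit dtype
template = np.arange(0, 19, 2)         # same values as ar1 above
z3 = np.zeros_like(template)           # takes shape and dtype from template

print("%s %s" % (z1, z1.dtype))
print("%s %s" % (z2.shape, z2.dtype))
print("%s %s" % (z3.shape, z3.dtype == template.dtype))
```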

empty and empty_like - return an uninitialized array (contents are junk)

In [17]:

np.empty((3,1))

Out[17]:

array([[  6.93416554e-310],
       [  6.93416554e-310],
       [  6.93413712e-310]])

eye and identity - return an identity matrix

In [18]:

np.eye(3)

Out[18]:

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
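The creation list also names linspace, meshgrid and fromfunction, which are not demonstrated above. A rough sketch of each, with illustrative variable names:

```python
import numpy as np

# linspace: 5 evenly spaced points, both endpoints included.
ls = np.linspace(0.0, 1.0, 5)

# meshgrid: coordinate matrices built from two 1-D coordinate vectors.
xx, yy = np.meshgrid(np.arange(3), np.arange(2))

# fromfunction: call a function with the index grids of the given shape.
ff = np.fromfunction(lambda i, j: i + j, (2, 3), dtype=int)

print("linspace: %s" % ls)
print("meshgrid shapes: %s %s" % (xx.shape, yy.shape))
print("fromfunction:\n%s" % ff)
```

Unlike arange, linspace takes the number of points rather than a step size, which avoids surprises with floating point steps.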

Advanced Creation Functions

genfromtxt - generate an array from a text file or a file-like object such as StringIO
- handles delimiters, comments, footers, headers, missing values
- fixed width with white space stripping
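A minimal sketch of genfromtxt reading from a StringIO object. The inline data is invented for illustration; it has one comment line, one header line and one missing value.

```python
import numpy as np
from io import StringIO

# Hypothetical inline data: a comment line, a header line, one missing value.
text = u"""# sample data
a,b,c
1,2,3
4,,6
"""

# skip_header=2 drops the comment and header lines before parsing.
table = np.genfromtxt(StringIO(text), delimiter=",", comments="#", skip_header=2)

print(table)
print("missing value became nan: %s" % np.isnan(table[1, 1]))
```

With the default float dtype, the empty field is filled with nan rather than raising an error.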

Data Types

Boolean: bool_
Signed Integer Types: int_, int8, int16, int32, int64
Unsigned Integer Types: uint8, uint16, uint32, uint64
Floating Point Types: float_, float16, float32, float64
Complex Type represented by a real and imaginary component: complex_, complex64, complex128

In [19]:

dta1 = np.array([1,2,3], dtype=np.float64)
dta2 = np.array([1,2,3], dtype='float64')
dta3 = np.array([1,2,3], dtype='float_')
print dta1, dta2, dta3, '\n', dta1.dtype, dta2.dtype, dta3.dtype

[ 1.  2.  3.] [ 1.  2.  3.] [ 1.  2.  3.] 
float64 float64 float64

Just use the object:

np.float64
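Arrays can also be converted between types after creation. The sketch below uses astype and itemsize; the sample values are arbitrary.

```python
import numpy as np

floats = np.array([1.7, 2.2, 3.9])
ints = floats.astype(np.int32)        # new array; the fractional part is truncated
small = np.arange(3, dtype=np.uint8)

print("%s -> %s: %s" % (floats.dtype, ints.dtype, ints))
print("bytes per element: float64=%d, uint8=%d" % (floats.itemsize, small.itemsize))
```

astype always returns a new array here, so floats itself is left unchanged.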

Operations

There is a huge number of operations available.

http://docs.scipy.org/doc/numpy/reference/routines.html

Changing Shapes, Joining and Splitting
String Operations
Datetime Functions
Math - trig, rounding, basic arithmetic, exponents, logs
Linear Algebra, FFTs
Masking functions

Basic Arithmetic

Addition

In [20]:

# Remember this guy?
ar1

Out[20]:

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [21]:

ar1 + ar1

Out[21]:

array([ 0,  4,  8, 12, 16, 20, 24, 28, 32, 36])

In [22]:

ar1 + 100  # broadcasting

Out[22]:

array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118])
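Broadcasting is not limited to scalars. As a rough illustration (the arrays are made up), a 1-D row or a single column can be combined with a 2-D array as long as the shapes are compatible:

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
col = np.array([[100], [200]])      # shape (2, 1)

grid_plus_row = grid + row          # row is added to every row of grid
grid_plus_col = grid + col          # col is added to every column of grid

print(grid_plus_row)
print(grid_plus_col)
```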

Trigonometric Functions

In [23]:

rad = np.arange(6.) * np.pi / 6
np.degrees(rad)

Out[23]:

array([   0.,   30.,   60.,   90.,  120.,  150.])

In [24]:

np.sin(rad)

Out[24]:

array([ 0.       ,  0.5      ,  0.8660254,  1.       ,  0.8660254,  0.5      ])

In [25]:

# Create x values from -pi to pi, 201 points
x = np.linspace(-np.pi, np.pi, 201)
_ = plt.plot(x, np.sin(x))

Shape shifting

In [26]:

rar1 = ar1.reshape(2,5)
rar1

Out[26]:

array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18]])

Squishers

Functions that flatten the array

In [27]:

np.ravel(rar1)  # a view when possible, a copy only when needed

Out[27]:

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [28]:

rar1.flatten()  # always a copy

Out[28]:

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [29]:

[x for x in rar1.flat]  # .flat is an iterator

Out[29]:

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
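One way to check the copy behaviour of ravel and flatten is np.shares_memory. This is a sketch; rar is rebuilt here so the block runs on its own.

```python
import numpy as np

rar = np.arange(0, 19, 2).reshape(2, 5)

raveled = np.ravel(rar)      # contiguous input, so this can be a view
flattened = rar.flatten()    # always a new array

print("ravel shares memory:   %s" % np.shares_memory(rar, raveled))
print("flatten shares memory: %s" % np.shares_memory(rar, flattened))

# A transposed array is not C-contiguous, so ravel has to copy here.
raveled_t = np.ravel(rar.T)
print("ravel of transpose shares memory: %s" % np.shares_memory(rar, raveled_t))
```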

Extra Credit: There is an order argument to these methods, which can be 'C' (C-like) or 'F' (Fortran-like). By default, NumPy ndarray data is stored in a C-like row-major layout; that is, the values of each row are contiguous in memory.
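A small sketch of the order argument, assuming the same 2x5 array as above (rebuilt here as rar so the block is self-contained):

```python
import numpy as np

rar = np.arange(0, 19, 2).reshape(2, 5)

c_order = rar.ravel(order='C')   # walk along rows first (the default)
f_order = rar.ravel(order='F')   # walk down columns first

print("C order: %s" % c_order)
print("F order: %s" % f_order)
print("C-contiguous: %s" % rar.flags['C_CONTIGUOUS'])
```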

Duplicating and Tiling

In [30]:

rar2 = np.arange(4).reshape(2,2)
rar2

Out[30]:

array([[0, 1],
       [2, 3]])

Concatenate

In [31]:

np.concatenate((rar2, rar2))  # default axis=0

Out[31]:

array([[0, 1],
       [2, 3],
       [0, 1],
       [2, 3]])

In [32]:

np.concatenate((rar2, rar2), axis=1)

Out[32]:

array([[0, 1, 0, 1],
       [2, 3, 2, 3]])

Or use the shortcuts

In [33]:

np.vstack((rar2, rar2))

Out[33]:

array([[0, 1],
       [2, 3],
       [0, 1],
       [2, 3]])

In [34]:

np.hstack((rar2, rar2))

Out[34]:

array([[0, 1, 0, 1],
       [2, 3, 2, 3]])

Stacking in the third dimension

In [35]:

np.dstack((rar2, rar2))

Out[35]:

array([[[0, 0],
        [1, 1]],

       [[2, 2],
        [3, 3]]])

Tiling

In [36]:

np.tile(rar2, (2,2))

Out[36]:

array([[0, 1, 0, 1],
       [2, 3, 2, 3],
       [0, 1, 0, 1],
       [2, 3, 2, 3]])
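For duplicating individual elements rather than the whole array, np.repeat is one option. A minimal sketch next to tile for comparison, using the same 2x2 values as rar2:

```python
import numpy as np

rar = np.arange(4).reshape(2, 2)

rep_flat = np.repeat(rar, 2)           # no axis: flattens, then repeats each element
rep_rows = np.repeat(rar, 2, axis=0)   # each row repeated twice
tiled = np.tile(rar, 2)                # whole array repeated along the last axis

print(rep_flat)
print(rep_rows)
print(tiled)
```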

There's so much more ...

resize, delete, insert
roll, rot90, fliplr, flipud
split SPLIT SPLIT!!!!
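A quick tour of a few of the functions just listed. This is only a sketch with small made-up arrays; each call returns a new array or view and leaves its input as it was.

```python
import numpy as np

seq = np.arange(6)                        # array([0, 1, 2, 3, 4, 5])
rolled = np.roll(seq, 2)                  # elements shift right and wrap around
first, second, third = np.split(seq, 3)   # three equal pieces
inserted = np.insert(seq, 1, 99)          # new array with 99 placed before index 1
deleted = np.delete(seq, [0, 1])          # new array without indices 0 and 1

square = np.arange(4).reshape(2, 2)       # [[0, 1], [2, 3]]
flipped_lr = np.fliplr(square)            # reverse the column order
flipped_ud = np.flipud(square)            # reverse the row order
rotated = np.rot90(square)                # rotate 90 degrees counterclockwise

print("roll:   %s" % rolled)
print("split:  %s %s %s" % (first, second, third))
print("rot90:\n%s" % rotated)
```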

Slicing and Dicing

Simple array indexing is similar to that of Python lists.
However, slices are views, not copies, of the data.

In [37]:

slicey = np.arange(10)
slicey

Out[37]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [38]:

slicey[5]

Out[38]:

5

In [39]:

slicey[5:8]  # note 8's absence

Out[39]:

array([5, 6, 7])

Omit one of the bounds to get the rest of the values in that direction.

In [40]:

slicey[3:]

Out[40]:

array([3, 4, 5, 6, 7, 8, 9])

In [41]:

slicey[:3]

Out[41]:

array([0, 1, 2])

Count by n (the stride) by adding another : followed by n

In [42]:

slicey[::2]

Out[42]:

array([0, 2, 4, 6, 8])

In [43]:

slicey[1::2]

Out[43]:

array([1, 3, 5, 7, 9])

In [44]:

slicey[-2:]

Out[44]:

array([8, 9])

Assigning to a view modifies the original array.

In [45]:

slicey_slice = slicey[2:5]
slicey_slice

Out[45]:

array([2, 3, 4])

In [46]:

slicey_slice[:] = 56
slicey_slice

Out[46]:

array([56, 56, 56])

In [47]:

slicey

Out[47]:

array([ 0,  1, 56, 56, 56,  5,  6,  7,  8,  9])

Moral of the story?

Use .copy() on your slices if you don't want to trash your original array.
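A minimal sketch of that advice, using a fresh array named original (an illustrative name) so the earlier cells are not affected:

```python
import numpy as np

original = np.arange(10)
safe_slice = original[2:5].copy()   # independent copy of the slice
safe_slice[:] = 56                  # only the copy changes

print("copy:     %s" % safe_slice)
print("original: %s" % original)
```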

Boolean Indexing

In [48]:

randy = np.random.randn(3,3)
randy

Out[48]:

array([[ 1.34480772,  2.41880375, -0.15025801],
       [ 0.78994062,  2.14416936, -0.43433278],
       [ 0.66470266, -0.03281608, -0.49504067]])

In [49]:

positively_randy = randy >= 0.0
positively_randy

Out[49]:

array([[ True,  True, False],
       [ True,  True, False],
       [ True, False, False]], dtype=bool)

In [50]:

randy[positively_randy]

Out[50]:

array([ 1.34480772,  2.41880375,  0.78994062,  2.14416936,  0.66470266])

In [51]:

randy[~positively_randy]

Out[51]:

array([-0.15025801, -0.43433278, -0.03281608, -0.49504067])

positively_randy is a boolean index. You can use it for access, as shown above, or for assignment, as shown below.

In [52]:

randy[~positively_randy] = 0.0  # negate the booleans with ~
randy

Out[52]:

array([[ 1.34480772,  2.41880375,  0.        ],
       [ 0.78994062,  2.14416936,  0.        ],
       [ 0.66470266,  0.        ,  0.        ]])

Of course this could have all been done in one step.

In [53]:

andy = np.random.randn(3,3)
andy

Out[53]:

array([[-0.27309172,  1.09020965,  0.54920302],
       [-0.81095048, -0.18885317,  0.6673531 ],
       [-1.4496338 , -0.40677127,  1.87158194]])

In [55]:

andy[andy < 0.0] = 0.0
andy

Out[55]:

array([[ 0.        ,  1.09020965,  0.54920302],
       [ 0.        ,  0.        ,  0.6673531 ],
       [ 0.        ,  0.        ,  1.87158194]])
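Boolean masks can also be combined, and np.where offers a non-destructive alternative to assigning through a mask. The sketch below uses fixed values instead of random ones so the result is predictable.

```python
import numpy as np

vals = np.array([-1.5, -0.2, 0.3, 0.8, 2.4])   # fixed values instead of random ones

# Combine conditions with & (and) and | (or); the parentheses are required.
in_range = (vals > 0.0) & (vals < 1.0)
selected = vals[in_range]

# np.where builds a new array and leaves vals untouched.
clipped = np.where(vals < 0.0, 0.0, vals)

print("selected: %s" % selected)
print("clipped:  %s" % clipped)
```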

Multidimensional
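As a rough sketch of indexing with more than one dimension: one index or slice is given per axis, separated by commas. The array and names below are illustrative.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

element = grid[1, 2]        # row 1, column 2
row = grid[1]               # the whole second row
column = grid[:, 2]         # every row, column 2
block = grid[:2, 1:3]       # first two rows, columns 1 and 2

print("element: %s" % element)
print("row:     %s" % row)
print("column:  %s" % column)
print("block:\n%s" % block)
```

As with 1-D slices, row, column and block are views of grid, not copies.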

Doing something useful

In [63]:

from scipy import misc
wallaby = misc.imread('wallaby_746_600x450.jpg')
print type(wallaby), wallaby.size, wallaby.shape, wallaby.dtype

<type 'numpy.ndarray'> 810000 (600, 450, 3) uint8

In [62]:

plt.imshow(wallaby)

Out[62]:

<matplotlib.image.AxesImage at 0x7fa56c33f3d0>

In [80]:

fig, (ax0, ax1, ax2) = plt.subplots(ncols=3)
fig.set_size_inches(10, 4)
ax0.imshow(wallaby[:, :, 0], cmap='gray')
ax0.get_yaxis().set_ticks([]); ax0.get_xaxis().set_ticks([]); ax0.set_title('Red')
ax1.imshow(wallaby[:, :, 1], cmap='gray')
ax1.get_yaxis().set_ticks([]); ax1.get_xaxis().set_ticks([]); ax1.set_title('Green')
ax2.imshow(wallaby[:, :, 2], cmap='gray')
ax2.get_yaxis().set_ticks([]); ax2.get_xaxis().set_ticks([]); ax2.set_title('Blue')

Out[80]:

<matplotlib.text.Text at 0x7fa5694db590>

In [89]:

h = plt.hist(wallaby[:, :, 2].flatten(), 256,  fc='k', ec='k')
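The same channel slicing works without the image file or a plot. The sketch below uses a small synthetic uint8 array as a stand-in for the photo (an assumption, not the wallaby data) and np.histogram in place of plt.hist.

```python
import numpy as np

# Synthetic stand-in for the photo: 4x6 pixels, 3 channels, uint8 (an assumption).
rng = np.random.RandomState(0)
image = rng.randint(0, 256, size=(4, 6, 3)).astype(np.uint8)

blue = image[:, :, 2]                  # view of the blue channel
counts, edges = np.histogram(blue, bins=256, range=(0, 256))

# Average the three channels into a single grayscale plane.
gray = image.mean(axis=2)

print("blue channel shape: %s" % (blue.shape,))
print("pixels counted: %d" % counts.sum())
print("gray shape: %s, dtype: %s" % (gray.shape, gray.dtype))
```

np.histogram flattens its input, so no explicit flatten() call is needed there.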

Credits

NumPy docs http://docs.scipy.org/doc/numpy/reference/index.html
Python for Data Analysis by Wes McKinney
