Exploring NumPy

Austin Godber
@godber

DesertPy - 8/27/2014

Getting Started

  • Installation
    • Linux - apt-get install python-numpy
    • OS X - Anaconda
    • Windows - Anaconda

Typically imported as np for brevity.

In [91]:
%matplotlib inline
import numpy as np
import scipy
import matplotlib.pyplot as plt

Features

  • ndarray - Multidimensional array object
  • Vectorized operations that act on whole ndarrays without explicit Python loops
  • IO tools to read/write files and work with memory mapped files
  • Linear algebra, Fourier Transform and Random Number utilities
  • C/C++ and Fortran integration
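As a rough illustration of the "without loops" point, the sketch below computes the same result twice, once with a Python loop and once with a single array expression. The names values, looped and vectorized are made up for this example, and print(...) is used so the snippet runs on Python 2 and 3 alike.

```python
import numpy as np

values = np.arange(5)                      # array([0, 1, 2, 3, 4])

# Loop version: handle one element at a time in Python.
looped = np.array([2 * v + 1 for v in values])

# Vectorized version: one expression applied to the whole array.
vectorized = 2 * values + 1

print(vectorized.tolist())                 # [1, 3, 5, 7, 9]
print(np.array_equal(looped, vectorized))  # True
```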

NDArray

At the core of NumPy we have the ndarray object.

ndarray is a multidimensional container for homogeneous data: every element has the same type. It exposes attributes such as shape and dtype, and supports many vectorized operations.

In [2]:
data = range(10)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
array = np.array(data)
print "array contains %s" % array
array contains [0 1 2 3 4 5 6 7 8 9]

In [3]:
print "array.shape is %s, array.dtype is %s" % (array.shape, array.dtype)
array.shape is (10,), array.dtype is int64

In [4]:
array**2
Out[4]:
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
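Beyond shape and dtype, an ndarray carries a few other descriptive attributes. A minimal sketch, assuming a small int32 array named grid that is not part of the original talk:

```python
import numpy as np

grid = np.arange(12, dtype=np.int32).reshape(3, 4)

print(grid.ndim)      # 2   number of axes
print(grid.shape)     # (3, 4)
print(grid.size)      # 12  total number of elements
print(grid.itemsize)  # 4   bytes per int32 element
print(grid.nbytes)    # 48  size * itemsize
```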

Creating

There are many functions that create ndarray objects. Some are shown below:

  • array, asarray
  • arange, linspace and meshgrid
  • ones and ones_like
  • zeros and zeros_like
  • empty and empty_like
  • eye and identity
  • fromfile, fromfunction and loadtxt

http://scipy.org/docs/numpy/reference/routines.array-creation.html
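Three of the functions listed above (linspace, meshgrid and fromfunction) do not get their own cell later on, so here is a small sketch of each. The variable names are illustrative only.

```python
import numpy as np

# linspace: 5 evenly spaced points, both endpoints included.
pts = np.linspace(0.0, 1.0, 5)
print(pts.tolist())     # [0.0, 0.25, 0.5, 0.75, 1.0]

# meshgrid: coordinate matrices built from two 1-D axes.
xs, ys = np.meshgrid(np.arange(3), np.arange(2))
print(xs.tolist())      # [[0, 1, 2], [0, 1, 2]]
print(ys.tolist())      # [[0, 0, 0], [1, 1, 1]]

# fromfunction: call a function on the index grids of the given shape.
table = np.fromfunction(lambda i, j: 10 * i + j, (2, 3), dtype=int)
print(table.tolist())   # [[0, 1, 2], [10, 11, 12]]
```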

array - pass an array-like object (list, tuple, another array) and, optionally, a dtype

In [5]:
np.array([1, 2, 3])
Out[5]:
array([1, 2, 3])
In [6]:
a1 = np.array([1, 2, 3.0])  # automatic upcast to float64
print a1, a1.dtype
[ 1.  2.  3.] float64

In [7]:
a2 = np.array([1, 2, 3], dtype='float64')
print a2, a2.dtype
[ 1.  2.  3.] float64

Multiple Dimensions

In [8]:
a3 = np.array([[0, 1, 2], [3, 4, 5]])
print a3, a3.shape, a3.size, a3.dtype
[[0 1 2]
 [3 4 5]] (2, 3) 6 int64

asarray - like array, but an input that is already an ndarray is not copied

In [9]:
a1
Out[9]:
array([ 1.,  2.,  3.])
In [10]:
np.array(a1) is a1  # a1 data copied to create new ndarray
Out[10]:
False
In [11]:
np.asarray(a1) is a1  # a1 is referenced rather than copied
Out[11]:
True
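The "no copy" behaviour only holds when no conversion is needed. A minimal sketch, assuming an example array a (standing in for a1 above), of when asarray does and does not hand back the same object:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

same = np.asarray(a)                       # already an ndarray, dtype matches
converted = np.asarray(a, dtype=np.int64)  # dtype differs, so new data is required
from_list = np.asarray([1.0, 2.0, 3.0])    # a list always has to be converted

print(same is a)                           # True
print(converted is a)                      # False
print(np.shares_memory(a, converted))      # False
print(type(from_list).__name__)            # ndarray
```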

arange - returns an ndarray of evenly spaced values within a given interval (the stop value is excluded)

In [12]:
np.arange(10)
Out[12]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [13]:
ar1 = np.arange(0, 19, 2)
ar1
Out[13]:
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
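arange follows the same start, stop, step convention as Python's range, so the stop value is never included. The sketch below (names are illustrative) also shows linspace as the count-based alternative, which the NumPy documentation suggests when the step is not an integer.

```python
import numpy as np

evens = np.arange(0, 20, 2)             # stop=20 is excluded
print(int(evens[-1]))                   # 18
print(len(evens))                       # 10

countdown = np.arange(5, 0, -1)         # negative steps work too
print(countdown.tolist())               # [5, 4, 3, 2, 1]

# Same ten values, specified by count instead of by step.
same_evens = np.linspace(0, 18, 10)
print(np.array_equal(evens, same_evens))  # True
```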

ones and ones_like - return an array full of ones

In [14]:
np.ones(5)
Out[14]:
array([ 1.,  1.,  1.,  1.,  1.])
In [15]:
np.ones((2,3))
Out[15]:
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
In [16]:
np.ones_like(ar1)
Out[16]:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
  • zeros and zeros_like return an array of zeros
  • empty and empty_like return an uninitialized array (whatever junk was in memory)
In [17]:
np.empty((3,1))
Out[17]:
array([[  6.93416554e-310],
       [  6.93416554e-310],
       [  6.93413712e-310]])
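A short sketch of the zeros and empty families, assuming a hypothetical template array to copy shape and dtype from. Since empty arrays hold arbitrary values, the example fills the array before reading it.

```python
import numpy as np

template = np.arange(6, dtype=np.int32).reshape(2, 3)

z = np.zeros((2, 3))              # float64 unless a dtype is given
zl = np.zeros_like(template)      # shape and dtype taken from template
el = np.empty_like(template)      # same shape and dtype, contents undefined

print(z.dtype)                    # float64
print(zl.dtype)                   # int32
print(zl.tolist())                # [[0, 0, 0], [0, 0, 0]]
print(el.shape)                   # (2, 3)

el[...] = 7                       # fill every element before using it
print(el.tolist())                # [[7, 7, 7], [7, 7, 7]]
```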

eye and identity - return an identity matrix

In [18]:
np.eye(3)
Out[18]:
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
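The two functions differ in flexibility: identity always returns a square matrix, while eye also accepts a column count and a diagonal offset k. A minimal sketch with illustrative names:

```python
import numpy as np

print(np.array_equal(np.identity(3), np.eye(3)))  # True

rect = np.eye(2, 3)           # 2 rows, 3 columns
print(rect.tolist())          # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

shifted = np.eye(3, k=1)      # ones on the first diagonal above the main one
print(shifted.tolist())       # [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```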

Advanced Creation Functions

  • genfromtxt - build an array from text data (a filename, an open file, or a file-like object such as StringIO)
    • handles delimiters, comments, footers, headers, missing values
    • fixed width with white space stripping
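A minimal sketch of genfromtxt reading from an in-memory StringIO object. The sample text is invented for this example; it contains a comment line and one missing field to show the default handling (missing floats become nan) and the filling_values option.

```python
import io
import numpy as np

# Made-up sample data: a comment line, then two rows with one empty field.
text = u"# x,y,z\n1,2,3\n4,,6\n"

data = np.genfromtxt(io.StringIO(text), delimiter=",")
print(data.shape)                 # (2, 3)
print(data[0].tolist())           # [1.0, 2.0, 3.0]
print(bool(np.isnan(data[1, 1]))) # True: the empty field became nan

# Replace missing entries with a chosen value instead of nan.
filled = np.genfromtxt(io.StringIO(text), delimiter=",", filling_values=-1)
print(filled[1].tolist())         # [4.0, -1.0, 6.0]
```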

Data Types

  • Boolean: bool_
  • Signed Integer Types: int_, int8, int16, int32, int64
  • Unsigned Integer Types: uint8, uint16, uint32, uint64
  • Floating Point Types: float_, float16, float32, float64
  • Complex Type represented by a real and imaginary component: complex_, complex64, complex128
In [19]:
dta1 = np.array([1,2,3], dtype=np.float64)
dta2 = np.array([1,2,3], dtype='float64')
dta3 = np.array([1,2,3], dtype='float_')
print dta1, dta2, dta3, '\n', dta1.dtype, dta2.dtype, dta3.dtype
[ 1.  2.  3.] [ 1.  2.  3.] [ 1.  2.  3.] 
float64 float64 float64

All three spellings give the same dtype; the simplest is to pass the type object itself:

np.float64
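Once an array exists, astype converts it to another dtype and returns a new array. The sketch below uses invented sample values; note that float-to-integer conversion truncates toward zero rather than rounding.

```python
import numpy as np

floats = np.array([1.7, -2.2, 3.9])
ints = floats.astype(np.int64)            # new array, truncated toward zero
print(ints.tolist())                      # [1, -2, 3]
print(floats.tolist())                    # [1.7, -2.2, 3.9]  (unchanged)

small = np.array([1, 2, 3], dtype=np.int8)
print(small.itemsize)                     # 1 byte per element
print(small.astype(np.float64).itemsize)  # 8 bytes per element

print(np.dtype('float64') == np.float64)  # True: string and object name the same type
```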

Operations

NumPy provides a huge number of operations:

http://docs.scipy.org/doc/numpy/reference/routines.html

  • Changing Shapes, Joining and Splitting
  • String Operations
  • Datetime Functions
  • Math - trig, rounding, basic arithmetic, exponents, logs
  • Linear Algebra, FFTs
  • Masking functions

Basic Arithmetic

Addition

In [20]:
# Remember this guy?
ar1
Out[20]:
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
In [21]:
ar1 + ar1
Out[21]:
array([ 0,  4,  8, 12, 16, 20, 24, 28, 32, 36])
In [22]:
ar1 + 100  # broadcasting
Out[22]:
array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118])
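Broadcasting is not limited to scalars: arrays whose shapes are compatible (equal, or 1, along each trailing axis) are stretched to match. A small sketch with made-up arrays grid, row and col:

```python
import numpy as np

grid = np.zeros((2, 3), dtype=int)
row = np.array([1, 2, 3])            # shape (3,)   stretched down the rows
col = np.array([[10], [20]])         # shape (2, 1) stretched across the columns

print((grid + row).tolist())         # [[1, 2, 3], [1, 2, 3]]
print((grid + col).tolist())         # [[10, 10, 10], [20, 20, 20]]
print((row + col).tolist())          # [[11, 12, 13], [21, 22, 23]]
```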

Trigonometric Functions

In [23]:
rad = np.arange(6.) * np.pi / 6
np.degrees(rad)
Out[23]:
array([   0.,   30.,   60.,   90.,  120.,  150.])
In [24]:
np.sin(rad)
Out[24]:
array([ 0.       ,  0.5      ,  0.8660254,  1.       ,  0.8660254,  0.5      ])
In [25]:
# Create x values from -pi to pi, 201 points
x = np.linspace(-np.pi, np.pi, 201)
_ = plt.plot(x, np.sin(x))

Shape shifting

In [26]:
rar1 = ar1.reshape(2,5)
rar1
Out[26]:
array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18]])
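Two details of reshape are worth a quick sketch: one dimension can be given as -1 and NumPy infers it, and for a contiguous array like this one the result is a view onto the same data. The names flat and wide are illustrative.

```python
import numpy as np

flat = np.arange(0, 19, 2)       # same values as ar1 in the text
wide = flat.reshape(2, -1)       # -1 is inferred: 10 elements / 2 rows = 5 columns
print(wide.shape)                # (2, 5)

wide[0, 0] = 99                  # writing through the reshaped view...
print(int(flat[0]))              # 99  ...changes the original too

try:
    flat.reshape(3, 4)           # 12 slots cannot hold 10 values
except ValueError:
    print("cannot reshape 10 elements into (3, 4)")
```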

Squishers

Functions that flatten the array

In [27]:
np.ravel(rar1)  # a view when possible, a copy only when needed
Out[27]:
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
In [28]:
rar1.flatten()  # always a copy
Out[28]:
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
In [29]:
[x for x in rar1.flat]  # .flat is an iterator
Out[29]:
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Extra Credit: ravel and flatten take an order argument, 'C' (C-like) or 'F' (Fortran-like). By default, NumPy stores ndarray data in a C-like, row-major layout; that is, the values within each row are contiguous in memory.
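A minimal sketch of how order changes the flattened result, and of when ravel has to copy. It assumes a small example array m and uses np.shares_memory to check for views.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)              # [[0, 1, 2], [3, 4, 5]]

print(m.ravel(order='C').tolist())          # [0, 1, 2, 3, 4, 5]  row by row
print(m.ravel(order='F').tolist())          # [0, 3, 1, 4, 2, 5]  column by column

print(m.flags['C_CONTIGUOUS'])              # True
print(np.shares_memory(m, m.ravel()))       # True:  view of contiguous data
print(np.shares_memory(m, m.flatten()))     # False: flatten always copies
print(np.shares_memory(m, m.T.ravel()))     # False: transposed data must be copied
```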

Duplicating and Tiling

In [30]:
rar2 = np.arange(4).reshape(2,2)
rar2
Out[30]:
array([[0, 1],
       [2, 3]])

Concatenate

In [31]:
np.concatenate((rar2, rar2))  # default axis=0
Out[31]:
array([[0, 1],
       [2, 3],
       [0, 1],
       [2, 3]])
In [32]:
np.concatenate((rar2, rar2), axis=1)
Out[32]:
array([[0, 1, 0, 1],
       [2, 3, 2, 3]])

Or use the shortcuts

In [33]:
np.vstack((rar2, rar2))
Out[33]:
array([[0, 1],
       [2, 3],
       [0, 1],
       [2, 3]])
In [34]:
np.hstack((rar2, rar2))
Out[34]:
array([[0, 1, 0, 1],
       [2, 3, 2, 3]])

Stacking in the third dimension

In [35]:
np.dstack((rar2, rar2))
Out[35]:
array([[[0, 0],
        [1, 1]],

       [[2, 2],
        [3, 3]]])

Tiling

In [36]:
np.tile(rar2, (2,2))
Out[36]:
array([[0, 1, 0, 1],
       [2, 3, 2, 3],
       [0, 1, 0, 1],
       [2, 3, 2, 3]])
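tile repeats the whole array as a block. For contrast, np.repeat (not shown in the talk) duplicates individual elements along an axis. A small sketch, using an array rar built the same way as rar2:

```python
import numpy as np

rar = np.arange(4).reshape(2, 2)             # [[0, 1], [2, 3]]

print(np.tile(rar, 2).tolist())              # [[0, 1, 0, 1], [2, 3, 2, 3]]
print(np.repeat(rar, 2, axis=0).tolist())    # [[0, 1], [0, 1], [2, 3], [2, 3]]
print(np.repeat(rar, 2, axis=1).tolist())    # [[0, 0, 1, 1], [2, 2, 3, 3]]
```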

There's so much more ...

  • resize, delete, insert
  • roll, rot90, fliplr, flipud
  • split SPLIT SPLIT!!!!
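A quick tour of several of the functions just listed, on two small made-up arrays. Each call returns a new array and leaves its input unchanged.

```python
import numpy as np

seq = np.arange(6)                                   # [0, 1, 2, 3, 4, 5]
print(np.roll(seq, 2).tolist())                      # [4, 5, 0, 1, 2, 3]
print(np.insert(seq, 1, 99).tolist())                # [0, 99, 1, 2, 3, 4, 5]
print(np.delete(seq, [0, 1]).tolist())               # [2, 3, 4, 5]
print([part.tolist() for part in np.split(seq, 3)])  # [[0, 1], [2, 3], [4, 5]]

sq = np.arange(4).reshape(2, 2)                      # [[0, 1], [2, 3]]
print(np.fliplr(sq).tolist())                        # [[1, 0], [3, 2]]
print(np.flipud(sq).tolist())                        # [[2, 3], [0, 1]]
print(np.rot90(sq).tolist())                         # [[1, 3], [0, 2]]  counterclockwise
```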

Slicing and Dicing

  • Simple array indexing is similar to that of Python lists
  • However, slices are views, not copies, of the data.
In [37]:
slicey = np.arange(10)
slicey
Out[37]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [38]:
slicey[5]
Out[38]:
5
In [39]:
slicey[5:8]  # note 8's absence
Out[39]:
array([5, 6, 7])

Omit the start or the stop value to take everything in that direction

In [40]:
slicey[3:]
Out[40]:
array([3, 4, 5, 6, 7, 8, 9])
In [41]:
slicey[:3]
Out[41]:
array([0, 1, 2])

Count by n (the stride) by adding another : followed by the step

In [42]:
slicey[::2]
Out[42]:
array([0, 2, 4, 6, 8])
In [43]:
slicey[1::2]
Out[43]:
array([1, 3, 5, 7, 9])
In [44]:
slicey[-2:]
Out[44]:
array([8, 9])
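The step can also be negative, which walks the array backwards. A brief sketch on a fresh array s (the same values slicey starts with):

```python
import numpy as np

s = np.arange(10)

print(s[::-1].tolist())     # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]  reversed
print(s[8:2:-2].tolist())   # [8, 6, 4]  from index 8 down to, but not including, 2
print(s[-3:-1].tolist())    # [7, 8]     negative indices count from the end
```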

Assigning to a view modifies the original array.

In [45]:
slicey_slice = slicey[2:5]
slicey_slice
Out[45]:
array([2, 3, 4])
In [46]:
slicey_slice[:] = 56
slicey_slice
Out[46]:
array([56, 56, 56])
In [47]:
slicey
Out[47]:
array([ 0,  1, 56, 56, 56,  5,  6,  7,  8,  9])

Moral of the story?

Use .copy() on your slices if you don't want to trash your original array.
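A minimal sketch of that advice, assuming a fresh array named original. The base attribute is one way to tell a view from an array that owns its data.

```python
import numpy as np

original = np.arange(10)

safe = original[2:5].copy()       # independent data
safe[:] = 56
print(original[2:5].tolist())     # [2, 3, 4]  original is untouched

view = original[2:5]              # plain slice: shares data with original
print(view.base is original)      # True
print(safe.base is None)          # True: the copy owns its own data
```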

Boolean Indexing

In [48]:
randy = np.random.randn(3,3)
randy
Out[48]:
array([[ 1.34480772,  2.41880375, -0.15025801],
       [ 0.78994062,  2.14416936, -0.43433278],
       [ 0.66470266, -0.03281608, -0.49504067]])
In [49]:
positively_randy = randy >= 0.0
positively_randy
Out[49]:
array([[ True,  True, False],
       [ True,  True, False],
       [ True, False, False]], dtype=bool)
In [50]:
randy[positively_randy]
Out[50]:
array([ 1.34480772,  2.41880375,  0.78994062,  2.14416936,  0.66470266])
In [51]:
randy[~positively_randy]
Out[51]:
array([-0.15025801, -0.43433278, -0.03281608, -0.49504067])

positively_randy is a boolean index (a mask): you can use it to read values, as shown above, or to assign to them, as shown below.

In [52]:
randy[~positively_randy] = 0.0  # negate the booleans with ~
randy
Out[52]:
array([[ 1.34480772,  2.41880375,  0.        ],
       [ 0.78994062,  2.14416936,  0.        ],
       [ 0.66470266,  0.        ,  0.        ]])

Of course this could have all been done in one step.

In [53]:
andy = np.random.randn(3,3)
andy
Out[53]:
array([[-0.27309172,  1.09020965,  0.54920302],
       [-0.81095048, -0.18885317,  0.6673531 ],
       [-1.4496338 , -0.40677127,  1.87158194]])
In [55]:
andy[andy < 0.0] = 0.0
andy
Out[55]:
array([[ 0.        ,  1.09020965,  0.54920302],
       [ 0.        ,  0.        ,  0.6673531 ],
       [ 0.        ,  0.        ,  1.87158194]])
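Masked assignment changes the array in place. When a new array is wanted instead, np.where, np.maximum or np.clip give the same result without touching the input. The sketch uses fixed sample values rather than random ones so the output is predictable.

```python
import numpy as np

vals = np.array([-1.5, 0.25, -0.5, 2.0])

masked = vals.copy()
masked[masked < 0.0] = 0.0               # in-place, as in the text
print(masked.tolist())                   # [0.0, 0.25, 0.0, 2.0]

# Three non-mutating equivalents.
print(np.array_equal(masked, np.where(vals < 0.0, 0.0, vals)))  # True
print(np.array_equal(masked, np.maximum(vals, 0.0)))            # True
print(np.array_equal(masked, np.clip(vals, 0.0, None)))         # True

# Combine conditions with & and |; the parentheses are required.
print(vals[(vals > -1.0) & (vals < 1.0)].tolist())              # [0.25, -0.5]
```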

Multidimensional
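A minimal sketch of indexing along more than one axis, using a hypothetical 3x4 array named grid (not from the talk). Indices for each axis are separated by commas, and slices and boolean masks can be mixed.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(int(grid[1, 2]))             # 6   row 1, column 2
print(grid[1].tolist())            # [4, 5, 6, 7]       a whole row
print(grid[:, 1].tolist())         # [1, 5, 9]          a whole column
print(grid[:2, 1:3].tolist())      # [[1, 2], [5, 6]]   a sub-block

row_mask = grid.sum(axis=1) > 10   # row sums are 6, 22, 38
print(row_mask.tolist())           # [False, True, True]
print(grid[row_mask, 0].tolist())  # [4, 8]  first column of the selected rows
```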


Doing something useful

In [63]:
from scipy import misc
wallaby = misc.imread('wallaby_746_600x450.jpg')
print type(wallaby), wallaby.size, wallaby.shape, wallaby.dtype
<type 'numpy.ndarray'> 810000 (600, 450, 3) uint8

In [62]:
plt.imshow(wallaby)
Out[62]:
<matplotlib.image.AxesImage at 0x7fa56c33f3d0>
In [80]:
fig, (ax0, ax1, ax2) = plt.subplots(ncols=3)
fig.set_size_inches(10, 4)
ax0.imshow(wallaby[:, :, 0], cmap='gray')
ax0.get_yaxis().set_ticks([]); ax0.get_xaxis().set_ticks([]); ax0.set_title('Red')
ax1.imshow(wallaby[:, :, 1], cmap='gray')
ax1.get_yaxis().set_ticks([]); ax1.get_xaxis().set_ticks([]); ax1.set_title('Green')
ax2.imshow(wallaby[:, :, 2], cmap='gray')
ax2.get_yaxis().set_ticks([]); ax2.get_xaxis().set_ticks([]); ax2.set_title('Blue')
Out[80]:
<matplotlib.text.Text at 0x7fa5694db590>
In [89]:
h = plt.hist(wallaby[:, :, 2].flatten(), 256,  fc='k', ec='k')
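The same channel slicing works without the image file or a plot. The sketch below builds a tiny synthetic RGB array as a hypothetical stand-in for the wallaby photo and uses np.histogram to get the counts that plt.hist would draw.

```python
import numpy as np

# Hypothetical stand-in for the photo: a 2x2 RGB image of uint8 values.
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[:, :, 0] = 200                     # red channel, every pixel
img[0, :, 2] = 50                      # blue channel, top row only

blue = img[:, :, 2]
print(blue.tolist())                   # [[50, 50], [0, 0]]
print(img.mean(axis=(0, 1)).tolist())  # [200.0, 0.0, 25.0]  per-channel mean

# One bin per possible uint8 value, like the 256-bin histogram in the text.
counts, edges = np.histogram(blue.ravel(), bins=256, range=(0, 256))
print(int(counts[50]))                 # 2 pixels with blue == 50
print(int(counts[0]))                  # 2 pixels with blue == 0
```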

Credits
