In this lesson, we'll learn about numerical analysis with the NumPy computing library.
Set up
First we'll import the NumPy package and set a seed for reproducibility so that we get the exact same results every time.
import numpy as np

# Set seed for reproducibility
np.random.seed(seed=1234)
Basics
# Scalar
x = np.array(6)
print("x: ", x)
print("x ndim: ", x.ndim)   # number of dimensions
print("x shape:", x.shape)  # dimensions
print("x size: ", x.size)   # number of elements
print("x dtype: ", x.dtype) # data type
x: 6
x ndim: 0
x shape: ()
x size: 1
x dtype: int64
Keep in mind that when indexing rows and columns, indices start at 0. And just like indexing with lists, we can use negative indices as well (where -1 is the last item).
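For example, here's a minimal indexing sketch (the array values are just for illustration):

# Indexing
x = np.array([1, 2, 3])
print("x[0]: ", x[0])    # first element
print("x[-1]: ", x[-1])  # last element (negative index)

# Slicing
x = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print("x column 1: ", x[:, 1])                   # all rows, column 1
print("x row 0: ", x[0, :])                      # row 0, all columns
print("x rows 0,1 & cols 1,2:\n", x[0:2, 1:3])   # rows 0-1, columns 1-2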
# Basic math
x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[1, 2], [3, 4]], dtype=np.float64)
print("x + y:\n", np.add(x, y))       # or x + y
print("x - y:\n", np.subtract(x, y))  # or x - y
print("x * y:\n", np.multiply(x, y))  # or x * y
x + y:
[[2. 4.]
[6. 8.]]
x - y:
[[0. 0.]
[0. 0.]]
x * y:
[[ 1. 4.]
[ 9. 16.]]
Dot product
One of the most common NumPy operations we'll use in machine learning is matrix multiplication using the dot product. We take the number of rows of the first matrix (2) and the number of columns of the second matrix (2) to determine the output shape, [2 X 2]. The only requirement is that the inside dimensions match; in this case, the first matrix has 3 columns and the second matrix has 3 rows.
# Dot product
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)  # we can specify dtype
b = np.array([[7, 8], [9, 10], [11, 12]], dtype=np.float64)
c = a.dot(b)
print(f"{a.shape} · {b.shape} = {c.shape}")
print(c)
# Sum across a dimension
x = np.array([[1, 2], [3, 4]])
print(x)
print("sum all: ", np.sum(x))             # adds all elements
print("sum axis=0: ", np.sum(x, axis=0))  # sum across rows
print("sum axis=1: ", np.sum(x, axis=1))  # sum across columns
[[1 2]
[3 4]]
sum all: 10
sum axis=0: [4 6]
sum axis=1: [3 7]
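The min/max values printed below come from reductions along the same kinds of axes; here's a sketch reconstructed to match the printed values (assuming the array [[1, 2, 3], [4, 5, 6]]):

# Min/max across dimensions
x = np.array([[1, 2, 3], [4, 5, 6]])
print("min: ", x.min())
print("max: ", x.max())
print("min axis=0: ", x.min(axis=0))  # min of each column
print("min axis=1: ", x.min(axis=1))  # min of each row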
min: 1
max: 6
min axis=0: [1 2 3]
min axis=1: [1 4]
Broadcast
Here, we're adding a vector and a scalar. Their dimensions aren't compatible as is, so how does NumPy still give us the right result? This is where broadcasting comes in. The scalar is broadcast across the vector so that they have compatible shapes.
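A minimal sketch of that (the values are illustrative):

# Broadcasting
x = np.array([1, 2])  # vector
y = np.array(3)       # scalar
z = x + y             # y is broadcast across x
print("z:\n", z)      # [4 5]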
We often need to change the dimensions of our tensors for operations like the dot product. If we need to switch two dimensions, we can transpose the tensor.
# Transposing
x = np.array([[1, 2, 3], [4, 5, 6]])
print("x:\n", x)
print("x.shape: ", x.shape)
y = np.transpose(x, (1, 0))  # flip dimensions at index 0 and 1
print("y:\n", y)
print("y.shape: ", y.shape)
Sometimes, we'll need to alter the dimensions of a matrix. Reshaping allows us to transform a tensor into a different permissible shape; the reshaped tensor has the same total number of values (1 X 6 = 2 X 3). We can also use -1 on a dimension and NumPy will infer that dimension based on our input tensor.
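For instance, a sketch that reshapes a (1, 6) tensor into (2, 3) (illustrative values):

# Reshaping
x = np.array([[1, 2, 3, 4, 5, 6]])
print("x:\n", x)
print("x.shape: ", x.shape)   # (1, 6)
y = np.reshape(x, (2, 3))
print("y:\n", y)
print("y.shape: ", y.shape)   # (2, 3)
z = np.reshape(x, (2, -1))    # -1 tells NumPy to infer this dimension
print("z:\n", z)
print("z.shape: ", z.shape)   # (2, 3)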
The way reshape works is by looking at each dimension of the new tensor and separating the original tensor into that many units. So here the dimension at index 0 of the new tensor is 2, so we divide the original tensor into 2 units, each of which has 3 values.
Though reshaping is very convenient for manipulating tensors, we must be careful of its pitfalls as well. Let's look at the example below. Suppose we have x, which has the shape [2 X 3 X 4].
We want to reshape x so that it has shape [3 X 8], which we'll get by moving the dimension at index 0 to become the dimension at index 1 and then combining the last two dimensions. But when we do this, a naive reshape won't give us what we want: it simply reads the values off in order, so the vectors we intended to combine end up split across different rows.
Instead, if we transpose the tensor and then reshape it, we get our desired tensor. Transposing first places the two vectors we want to combine next to each other, and then reshape joins them together, as shown below.
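A dummy sketch of both approaches (values chosen so each vector is easy to track; the exact numbers are illustrative):

# Naive reshape: reads values off in order and mixes vectors across rows
x = np.array([[[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
              [[10, 10, 10, 10], [20, 20, 20, 20], [30, 30, 30, 30]]])
print("x.shape: ", x.shape)                   # (2, 3, 4)
z_incorrect = np.reshape(x, (x.shape[1], -1))
print("z_incorrect:\n", z_incorrect)          # rows mix values from different vectors

# Transpose first, then reshape: the vectors we want to join stay together
y = np.transpose(x, (1, 0, 2))
print("y.shape: ", y.shape)                   # (3, 2, 4)
z_correct = np.reshape(y, (y.shape[0], -1))
print("z_correct:\n", z_correct)              # each row joins its two matching vectors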
Note
Always create a dummy example like this when you're unsure about reshaping. Blindly going by the tensor shape can lead to lots of issues downstream.
We can also easily add dimensions to and remove dimensions from our tensors, and we'll want to do this to make them compatible for certain operations.
# Adding dimensions
x = np.array([[1, 2, 3], [4, 5, 6]])
print("x:\n", x)
print("x.shape: ", x.shape)
y = np.expand_dims(x, 1)  # expand dim 1
print("y: \n", y)
print("y.shape: ", y.shape)  # notice extra set of brackets are added
# Removing dimensions
x = np.array([[[1, 2, 3]], [[4, 5, 6]]])
print("x:\n", x)
print("x.shape: ", x.shape)
y = np.squeeze(x, 1)  # squeeze dim 1
print("y: \n", y)
print("y.shape: ", y.shape)  # notice extra set of brackets are gone