4  Introduction to NumPy

NumPy (Numerical Python) is a fundamental library for numerical computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

4.1 What is NumPy used for?

  • 4.1.1 Numerical Operations:

    • Performing efficient mathematical and statistical operations on large datasets.
  • 4.1.2 Working with Arrays:

    • Creating, manipulating, and operating on N-dimensional arrays (ndarrays), which are the core data structure in NumPy.

4.2 Why is NumPy important in data science?

  • 4.2.1 Efficiency:

    • NumPy's core operations are implemented in compiled C code and run over tightly packed data of a single type, which typically makes them much faster than equivalent loops over Python lists, especially for large datasets.
    • This efficiency is crucial for performance in data analysis and machine learning tasks.
  • Foundation for Other Libraries:
    • Many other popular data science libraries in Python, such as Pandas, Scikit-learn, and Matplotlib, are built on top of NumPy.
    • Understanding NumPy is essential for effectively using these libraries.
  • Mathematical Functions:
    • NumPy provides a comprehensive set of mathematical functions that can be applied to arrays, simplifying complex calculations.
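The efficiency point above can be checked with a rough timing sketch. The array size and repeat count below are arbitrary choices for illustration, and the measured times depend on your machine, so no particular speedup is claimed here.

```python
import timeit

import numpy as np

n = 200_000  # illustrative size, chosen arbitrarily
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)


def square_sum_list():
    # Pure-Python loop over a list
    return sum(x * x for x in py_list)


def square_sum_numpy():
    # Vectorized: the element-wise loop runs in compiled code
    return int(np.sum(np_array * np_array))


# Both approaches must agree on the result before timing them
assert square_sum_list() == square_sum_numpy()

list_time = timeit.timeit(square_sum_list, number=3)
numpy_time = timeit.timeit(square_sum_numpy, number=3)
print(f"Python list: {list_time:.4f} s")
print(f"NumPy array: {numpy_time:.4f} s")
```

On most machines the NumPy version should finish noticeably sooner, but treat the numbers as a rough indication only.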

4.2.2 Installing NumPy

NumPy can be easily installed using pip, the standard package installer for Python.

Open your terminal or command prompt and run the following command:

pip install numpy

This command downloads and installs the latest NumPy release that is compatible with your Python version.

4.3 Importing NumPy

Before you can use NumPy in your Python code, you need to import it.

The standard and widely accepted way to import NumPy is using the following statement:

import numpy as np

Here, import numpy tells Python to load the NumPy library, and as np assigns the alias np to the library. This alias is a convention and makes it much shorter to refer to NumPy functions and objects throughout your code (e.g., you can type np.array instead of numpy.array).

Let’s see the import statement in a code block:

import numpy as np
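A quick way to confirm that the installation and the import both worked is to print the installed version. This is only a sanity-check sketch; the version string you see depends on your environment.

```python
import numpy
import numpy as np

# Print the installed version to confirm the import worked
print(np.__version__)

# The alias np and the full name numpy refer to the same module object
print(np is numpy)  # True
```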

5 Creating Arrays

NumPy arrays, also known as ndarrays, are the core data structure in NumPy.

You can create them from existing Python lists and tuples using the np.array() function.

5.0.1 Creating a 1-dimensional array from a list:

print("1-dimensional array from list:")
my_list = [1, 2, 3, 4, 5]
list_array = np.array(my_list)
print(list_array)
1-dimensional array from list:
[1 2 3 4 5]

5.0.2 Creating a 2-dimensional array from a list of lists:

print("\n2-dimensional array from list of lists:")
my_list_of_lists = [
    [1, 2, 3],
    [4, 5, 6]]
list_array2 = np.array(my_list_of_lists)
print(list_array2)

2-dimensional array from list of lists:
[[1 2 3]
 [4 5 6]]

5.0.3 Creating a 1-dimensional array from a tuple:

print("\n1-dimensional array from tuple:")
my_tuple = (10, 20, 30, 40, 50)
tuple_array = np.array(my_tuple)
print(tuple_array)

1-dimensional array from tuple:
[10 20 30 40 50]

A tuple of tuples produces a 2-dimensional array in the same way:

print("\n2-dimensional array from tuple of tuples:")
my_tuple_of_tuples = (
    (10, 20), (30, 40), (50, 60)
    )
tuple_array2 = np.array(my_tuple_of_tuples)
print(tuple_array2)

2-dimensional array from tuple of tuples:
[[10 20]
 [30 40]
 [50 60]]

Lists and tuples can also be nested inside each other:

print("Combined arrays")
my_list_of_tuples = [ (1, 2), (3, 4)]
numpy_array1 = np.array(my_list_of_tuples)
print(numpy_array1)

my_tuple_of_lists = ([ 10, 20, 30], [30, 40, 50])
numpy_array2 = np.array(my_tuple_of_lists)
print(numpy_array2)
Combined arrays
[[1 2]
 [3 4]]
[[10 20 30]
 [30 40 50]]

5.1 Array attributes

NumPy arrays have several important attributes that provide information about their structure and the data they contain. Key attributes include:

  • 5.1.1 shape:

    • This attribute returns a tuple of integers representing the dimensions of the array.
    • For a 2D array with 2 rows and 3 columns, the shape would be (2, 3).
  • 5.1.2 dtype:

    • This attribute returns the data type of the elements in the array (e.g., int64, float64).
    • All elements in a NumPy array have the same data type.
  • 5.1.3 size

    • This attribute returns the total number of elements in the array.
    • It is the product of the elements of the shape tuple.

Let’s demonstrate these attributes using the arrays we created previously.

print("Attributes of numpy_array1:")
print("Shape:", numpy_array1.shape)
print("Dtype:", numpy_array1.dtype)
print("Size:", numpy_array1.size)

print("\nAttributes of numpy_array2:")
print("Shape:", numpy_array2.shape)
print("Dtype:", numpy_array2.dtype)
print("Size:", numpy_array2.size)
Attributes of numpy_array1:
Shape: (2, 2)
Dtype: int64
Size: 4

Attributes of numpy_array2:
Shape: (2, 3)
Dtype: int64
Size: 6
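The statements that all elements share one data type and that size is the product of the shape entries can be illustrated with a small sketch. The arrays and variable names here (mixed, as_float, grid) are made up for this example.

```python
import math

import numpy as np

# Mixing integers with a float: NumPy stores every element as a float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# The dtype can also be requested explicitly when creating the array
as_float = np.array([1, 2, 3], dtype=np.float64)
print(as_float.dtype)  # float64

# size equals the product of the entries of the shape tuple
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape, grid.size)               # (2, 3) 6
print(grid.size == math.prod(grid.shape))  # True
```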

5.2 Array indexing and slicing

NumPy arrays support indexing and slicing, similar to Python lists, but with extensions for multi-dimensional arrays.

6 Indexing:

You can access individual elements in a NumPy array using square brackets [] and their index.

  • For 1D arrays, a single index is used.
  • For 2D arrays, you use a comma-separated pair of indices [row_index, column_index].

⚠️ Remember that indexing is zero-based.

print("Example (1D array):")
arr_1d = np.array([11, 22, 33, 44, 55])
print(arr_1d)
print(arr_1d[0])  # Accessing the first element
print(arr_1d[3])  # Accessing the fourth element
Example (1D array):
[11 22 33 44 55]
11
44
print("Example (2D array):")
arr_2d = np.array([[11, 12, 13], [24, 25, 26], [37, 38, 39]])
print(arr_2d)
print(arr_2d[0, 0])  # Accessing the element in the first row and first column
print(arr_2d[1, 2])  # Accessing the element in the second row and third column
Example (2D array):
[[11 12 13]
 [24 25 26]
 [37 38 39]]
11
26
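As with Python lists, negative indices count from the end, and a single index on a 2D array selects a whole row. The sketch below reuses the arrays from the examples above and also shows that assigning through an index changes the array in place.

```python
import numpy as np

arr_1d = np.array([11, 22, 33, 44, 55])
arr_2d = np.array([[11, 12, 13], [24, 25, 26], [37, 38, 39]])

print(arr_1d[-1])      # 55, the last element
print(arr_2d[1])       # [24 25 26], the whole second row
print(arr_2d[-1, -1])  # 39, last row and last column

# Assigning through an index modifies the array in place
arr_2d[0, 0] = 99
print(arr_2d[0])       # [99 12 13]
```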

7 Slicing:

Slicing allows you to extract a subarray from a NumPy array.

The syntax is [start:stop:step], where

  • start is the beginning index (inclusive),
  • stop is the ending index (exclusive),
  • step is the interval between elements.

If start, stop, or step are omitted, they default to the beginning, end, and 1 respectively.

print("Example (1D array slicing):")
arr_1d = np.array([10, 20, 30, 40, 50])
print(arr_1d)
print(arr_1d[1:4])  # Elements from index 1 up to (but not including) index 4
print(arr_1d[:3])   # Elements from the beginning up to (but not including) index 3
print(arr_1d[2:])   # Elements from index 2 to the end
print(arr_1d[::2])  # Every second element
Example (1D array slicing):
[10 20 30 40 50]
[20 30 40]
[10 20 30]
[30 40 50]
[10 30 50]
print("Example (2D array slicing):")

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr_2d)

print(arr_2d[0:2, 1:3])  # Rows from index 0 up to 2, columns from index 1 up to 3
print(arr_2d[:, 0])      # All rows, only the first column
print(arr_2d[1, :])      # Second row, all columns
Example (2D array slicing):
[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[2 3]
 [5 6]]
[1 4 7]
[4 5 6]
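One difference from Python lists is worth knowing when slicing: a basic slice of a NumPy array is a view onto the same data, not a copy. The sketch below uses made-up variable names (original, view, independent) to show the effect and how .copy() avoids it.

```python
import numpy as np

original = np.array([10, 20, 30, 40, 50])

# A basic slice is a view: it shares memory with the original array
view = original[1:4]
view[0] = 999
print(original[1])                       # 999, the original changed too
print(np.shares_memory(original, view))  # True

# .copy() creates independent data
independent = original[1:4].copy()
independent[0] = -1
print(original[1])                       # still 999
```

If you need to modify a slice without touching the source array, calling .copy() first is the usual approach.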

8 Array operations

NumPy’s power lies in its ability to perform operations on entire arrays efficiently.

Two key concepts for understanding these operations are element-wise operations and broadcasting.

8.1 Element-wise Operations

When performing arithmetic operations (+, -, *, /) between two NumPy arrays of the same shape, the operation is applied to each corresponding element in the arrays.

The result is a new array with the same shape as the input arrays.

Similarly, when performing arithmetic operations between a NumPy array and a scalar (a single number), the operation is applied to each element of the array.

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Element-wise addition
addition_result = arr1 + arr2

# Element-wise multiplication
multiplication_result = arr1 * arr2

# Element-wise addition with a scalar
scalar_addition_result = arr1 + 10

print(arr1)
print(arr2)
print(addition_result)
print(multiplication_result)
print(scalar_addition_result)
[1 2 3]
[4 5 6]
[5 7 9]
[ 4 10 18]
[11 12 13]
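The example above covers addition and multiplication. Subtraction and division follow the same element-wise pattern, as this short sketch with the same two arrays shows. Note that division of integer arrays produces floating-point results.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Element-wise subtraction
difference = arr2 - arr1
print(difference)         # [3 3 3]

# Element-wise division returns floats, even for integer inputs
quotient = arr2 / arr1
print(quotient.tolist())  # [4.0, 2.5, 2.0]
print(quotient.dtype)     # float64

# The inputs are unchanged; each operation returns a new array
print(arr1)               # [1 2 3]
```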

9 Broadcasting:

Broadcasting is a mechanism that allows NumPy to perform operations on arrays of different shapes.

When the shapes of two arrays are not the same, NumPy attempts to “broadcast” the smaller array across the larger array so that they have compatible shapes for the operation.

The broadcasting rules are as follows:

  1. If the arrays do not have the same number of dimensions, the shape of the array with fewer dimensions is padded with ones on its left side.
  2. The shapes are then compared dimension by dimension. Where the sizes differ and one of them is 1, the size-1 dimension is stretched to match the other.
  3. If the sizes differ in some dimension and neither of them is 1, an error is raised.

arr_broadcast_1 = np.array([[1, 2, 3], [4, 5, 6]]) # Shape (2, 3)
scalar = 5 # Treated as shape () or effectively (1, 1) for broadcasting

print("Broadcasting scalar to the array")
broadcast_result_scalar = arr_broadcast_1 * scalar
print(broadcast_result_scalar)

arr_broadcast_2 = np.array([10, 20, 30]) # Shape (3,)

print(" Broadcasting a 1D array to a 2D array")
broadcast_result_array = arr_broadcast_1 + arr_broadcast_2
print(broadcast_result_array)
Broadcasting scalar to the array
[[ 5 10 15]
 [20 25 30]]
 Broadcasting a 1D array to a 2D array
[[11 22 33]
 [14 25 36]]
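The rules can also be seen with a column-shaped array and with a pair of shapes that cannot be broadcast. This is a small sketch; the arrays matrix, column, and bad are made up for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
column = np.array([[100], [200]])          # shape (2, 1)

# The size-1 column dimension is stretched to 3
stretched = matrix + column
print(stretched.shape)     # (2, 3)
print(stretched.tolist())  # [[101, 102, 103], [204, 205, 206]]

# Shape (2,) is padded to (1, 2); the last dimensions are 3 and 2,
# they differ and neither is 1, so NumPy raises a ValueError
bad = np.array([10, 20])
try:
    matrix + bad
except ValueError as exc:
    print("Broadcasting failed:", exc)
```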

The following cell repeats the element-wise and broadcasting examples with descriptive labels. Note that the final broadcasting example multiplies the two arrays instead of adding them.

print("--- Demonstrating Element-wise Operations ---")

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Element-wise addition
addition_result = arr1 + arr2
print("\nElement-wise addition of arr1 and arr2:", addition_result)

# Element-wise multiplication
multiplication_result = arr1 * arr2
print("Element-wise multiplication of arr1 and arr2:", multiplication_result)

# Element-wise addition with a scalar
scalar_addition_result = arr1 + 10
print("Element-wise addition of arr1 and scalar 10:", scalar_addition_result)

print("\n--- Demonstrating Broadcasting ---")

arr_broadcast_1 = np.array([[1, 2, 3], [4, 5, 6]]) # Shape (2, 3)
scalar = 5 # Treated as shape () or effectively (1, 1) for broadcasting

# Broadcasting scalar to the array
broadcast_result_scalar = arr_broadcast_1 * scalar
print("\nBroadcasting scalar 5 to arr_broadcast_1:\n", broadcast_result_scalar)

arr_broadcast_2 = np.array([10, 20, 30]) # Shape (3,)

# Broadcasting a 1D array to a 2D array
broadcast_result_array = arr_broadcast_1 * arr_broadcast_2
print("Broadcasting arr_broadcast_2 to arr_broadcast_1:\n", broadcast_result_array)
--- Demonstrating Element-wise Operations ---

Element-wise addition of arr1 and arr2: [5 7 9]
Element-wise multiplication of arr1 and arr2: [ 4 10 18]
Element-wise addition of arr1 and scalar 10: [11 12 13]

--- Demonstrating Broadcasting ---

Broadcasting scalar 5 to arr_broadcast_1:
 [[ 5 10 15]
 [20 25 30]]
Broadcasting arr_broadcast_2 to arr_broadcast_1:
 [[ 10  40  90]
 [ 40 100 180]]

10 Common NumPy Functions

NumPy provides a wide range of mathematical and statistical functions that can be applied to arrays.

Some of the most commonly used functions include:

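  • np.sum(): Calculates the sum of all elements in an array, or the sum along a specific axis.
  • np.mean(): Calculates the arithmetic mean (average) of all elements in an array, or the mean along a specific axis.
  • np.median(): Calculates the median of all elements in an array, or the median along a specific axis.
  • np.max(): Finds the maximum value among all elements in an array, or the maximum along a specific axis.
  • np.min(): Finds the minimum value among all elements in an array, or the minimum along a specific axis.
  • np.std(): Calculates the standard deviation of all elements in an array, or along a specific axis. By default it is the population standard deviation.
  • np.corrcoef(): Calculates the matrix of Pearson correlation coefficients between variables.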

These functions are very useful for quickly getting summary statistics from your data.

The cell below applies np.sum(), np.mean(), np.max(), and np.min() to two sample arrays, together with np.median(), np.std(), and np.corrcoef().

print("--- Demonstrating Common NumPy Functions ---")

numpy_array1 = np.array([10, 20, 30, 40, 55])
print("\nUsing numpy_array1:")
print("Sum:", np.sum(numpy_array1))
print("Mean:", np.mean(numpy_array1))
print("Median:", np.median(numpy_array1))
print("Max:", np.max(numpy_array1))
print("Min:", np.min(numpy_array1))
print("St. Dev:", np.std(numpy_array1))

numpy_array2 = np.array([[1, 2, 3], [4, 5, 16]])
print("\nUsing numpy_array2:")
print("Sum:", np.sum(numpy_array2))
print("Mean:", np.mean(numpy_array2))
print("Median:", np.median(numpy_array2))
print("Max:", np.max(numpy_array2))
print("Min:", np.min(numpy_array2))
print("St. Dev:", np.std(numpy_array2))

# A plain Python list is also accepted as input
numpy_array3 = [12, 23, 34, 45, 56]
print("A1-A3 Correlation:", np.corrcoef(numpy_array1, numpy_array3))


# Demonstrating axis parameter for 2D array
print("\nUsing numpy_array2 with axis:")
print("Sum along axis 0 (columns):", np.sum(numpy_array2, axis=0))
print("Mean along axis 1 (rows):", np.mean(numpy_array2, axis=1))
--- Demonstrating Common NumPy Functions ---

Using numpy_array1:
Sum: 155
Mean: 31.0
Median: 30.0
Max: 55
Min: 10
St. Dev: 15.620499351813308

Using numpy_array2:
Sum: 31
Mean: 5.166666666666667
Median: 3.5
Max: 16
Min: 1
St. Dev: 5.013869652163774
A1-A3 Correlation: [[1.         0.99589321]
 [0.99589321 1.        ]]

Using numpy_array2 with axis:
Sum along axis 0 (columns): [ 5  7 19]
Mean along axis 1 (rows): [2.         8.33333333]
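The axis parameter works the same way for np.max() and np.min(): the chosen axis is the one that gets collapsed. The sketch below also contrasts the default population standard deviation with the sample version obtained through the ddof argument of np.std(). The variable names are made up for this example.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 16]])  # shape (2, 3)

# axis=0 collapses the rows, leaving one value per column
col_max = np.max(data, axis=0)
print(col_max.tolist())  # [4, 5, 16]

# axis=1 collapses the columns, leaving one value per row
row_min = np.min(data, axis=1)
print(row_min.tolist())  # [1, 4]

print(col_max.shape, row_min.shape)  # (3,) (2,)

values = np.array([10, 20, 30, 40, 55])
population_std = np.std(values)      # divides by N (the default)
sample_std = np.std(values, ddof=1)  # divides by N - 1
print(population_std, sample_std)
```

The sample standard deviation is always slightly larger than the population value for the same data, because it divides by a smaller number.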

11 Basic Linear Algebra with NumPy

NumPy provides efficient functions for basic linear algebra operations, which are fundamental in many areas like machine learning, physics, and engineering.

Key operations include dot product and matrix multiplication.

12 Dot Product:

The dot product is a scalar value calculated from two vectors (1D arrays) of the same length. It is the sum of the products of the corresponding elements.

# Example of dot product with two 1D arrays
vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5, 6])

# Calculate the dot product
dot_product_result = np.dot(vec1, vec2)
print(dot_product_result)

# Alternatively, you can use the @ operator for dot product of 1D arrays (Python 3.5+)
dot_product_result = vec1 @ vec2
print(dot_product_result)
32
32
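To connect the result with the definition, the sum of products can be written out by hand and compared with np.dot(). This is a minimal sketch using the same two vectors.

```python
import numpy as np

vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5, 6])

# Sum of the products of corresponding elements: 1*4 + 2*5 + 3*6
manual = 0
for a, b in zip(vec1, vec2):
    manual += int(a) * int(b)
print(manual)      # 32

# The same computation with element-wise multiplication and np.sum
vectorized = int(np.sum(vec1 * vec2))
print(vectorized)  # 32

print(manual == np.dot(vec1, vec2))  # True
```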

13 Matrix Multiplication:

Matrix multiplication is an operation that produces a new matrix from two matrices.

The number of columns in the first matrix must be equal to the number of rows in the second matrix.

# Example of matrix multiplication with two 2D arrays
matrix1 = np.array([
    [1, 2],
    [3, 4]])
matrix2 = np.array([
    [5, 6],
    [7, 8]])

# Calculate matrix multiplication using np.matmul()
matrix_multiplication_result_matmul = np.matmul(matrix1, matrix2)
print(matrix_multiplication_result_matmul)

# Alternatively, you can use the @ operator for matrix multiplication (Python 3.5+)
matrix_multiplication_result_at = matrix1 @ matrix2
print(matrix_multiplication_result_at)
[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
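Two points are easy to mix up here: the * operator is element-wise, not matrix multiplication, and the inner dimensions must match for @ to work. The sketch below illustrates both; the arrays a and b are made up to show the shape rule.

```python
import numpy as np

matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# * multiplies corresponding elements
print((matrix1 * matrix2).tolist())  # [[5, 12], [21, 32]]

# @ performs matrix multiplication
print((matrix1 @ matrix2).tolist())  # [[19, 22], [43, 50]]

# Shape rule: (2, 3) @ (3, 4) gives (2, 4)
a = np.ones((2, 3))
b = np.ones((3, 4))
print((a @ b).shape)                 # (2, 4)

# (2, 3) @ (2, 3) fails because 3 columns do not match 2 rows
try:
    a @ a
except ValueError as exc:
    print("Shape mismatch:", exc)
```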

The cell below runs the same dot product and matrix multiplication examples again and prints the results with descriptive labels.

print("--- Demonstrating Basic Linear Algebra with NumPy ---")

# Dot product example
vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5, 6])
dot_product_result = np.dot(vec1, vec2)
print("\nDot product of vec1 and vec2:", dot_product_result)

# Matrix multiplication example
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])
matrix_multiplication_result_matmul = np.matmul(matrix1, matrix2)
print("\nMatrix multiplication of matrix1 and matrix2 (using np.matmul):\n", matrix_multiplication_result_matmul)

matrix_multiplication_result_at = matrix1 @ matrix2
print("\nMatrix multiplication of matrix1 and matrix2 (using @ operator):\n", matrix_multiplication_result_at)
--- Demonstrating Basic Linear Algebra with NumPy ---

Dot product of vec1 and vec2: 32

Matrix multiplication of matrix1 and matrix2 (using np.matmul):
 [[19 22]
 [43 50]]

Matrix multiplication of matrix1 and matrix2 (using @ operator):
 [[19 22]
 [43 50]]

13.1 Summary:

13.1.1 Key Takeaways

  • NumPy is a fundamental library for numerical computing in Python, providing efficient support for multi-dimensional arrays and mathematical functions.
  • NumPy arrays (ndarrays) can be easily created from Python lists and tuples using np.array().
  • Important array attributes include shape (dimensions), dtype (data type of elements), and size (total number of elements).
  • NumPy supports intuitive indexing for accessing individual elements and powerful slicing ([start:stop:step]) for extracting subarrays, applicable to both 1D and multi-dimensional arrays.
  • NumPy enables efficient element-wise operations on arrays of the same shape and leverages broadcasting to perform operations on arrays with compatible shapes.
  • Common NumPy functions like np.sum(), np.mean(), np.max(), and np.min() are available for calculating summary statistics, including along specific axes.
  • Basic linear algebra operations, such as dot product (np.dot() or @ for 1D arrays) and matrix multiplication (np.matmul() or @ for 2D arrays), are efficiently supported.

13.1.2 Insights or Next Steps

  • Practice the examples from each section on your own arrays: change the shapes and data types, and predict the output before running the code.
  • Explore the official NumPy documentation to go deeper into the topics introduced here, such as broadcasting, indexing, and linear algebra.