[PYTHON] How to use numpy

This article is for people who heard that Python is good for numerical calculation and machine learning and started learning it, but find it a little difficult and suspect C++ is much faster anyway. It is also for people who have heard the name numpy but never used it, and for people who have tried numpy but think it is basically just an advanced version of the math package. For all of you, I'll show you how to use numpy properly!

numpy installation

If you installed Python using Anaconda, you probably already have numpy. Other installation methods may or may not include it, so let's first check whether numpy is present. Start the Python console by typing python in the terminal (the command prompt on Windows). In the Python console, type

> import numpy

If no error appears, numpy is installed. If you get an error like `No module named 'numpy'`, it is not installed and you need to install it. In the terminal (not the Python console), you can install it with

$ pip install numpy
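As a small additional check, the following sketch prints which version of numpy is installed. The version string you see depends entirely on your environment, so no particular value is assumed here.

```python
import numpy

# __version__ is a plain string; its value depends on your installation.
print(numpy.__version__)
```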

How to make an array

From here on, we will look at actual programming with numpy. All the examples assume that numpy has been imported as

> import numpy as np

This imports the numpy module under the name np.

How to make a one-dimensional array

The basics of numpy start with creating an array. An array with the contents 1, 2, 3 can be created with

> arr = np.asarray([1,2,3])
> arr
array([1, 2, 3])

In addition, you can specify the element type of the array with dtype. Frequently used types include np.int32, np.float32, and np.float64. For example, to create an array of type np.int32:

> arr = np.asarray([1,2,3], dtype=np.int32)
> arr
array([1, 2, 3], dtype=int32)

To change the type of an array that already exists, use astype:

> i_arr = np.asarray([1,2,3], dtype=np.int32)
> f_arr = i_arr.astype(np.float32)
> f_arr
array([ 1.,  2.,  3.], dtype=float32)

astype returns a new array, so the original array `i_arr` does not change.

> i_arr
array([1, 2, 3], dtype=int32)
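Converting in the other direction, from float to integer, is worth a quick look as well. The sketch below uses made-up values chosen for illustration; it shows that astype drops the fractional part (it truncates toward zero rather than rounding) and that the dtype attribute tells you the current type.

```python
import numpy as np

# Illustrative values, not taken from the article.
f_arr = np.asarray([1.7, -2.3, 3.9], dtype=np.float64)

# Converting to an integer type drops the fractional part (truncation toward zero).
i_arr = f_arr.astype(np.int32)

print(f_arr.dtype)     # float64
print(i_arr.dtype)     # int32
print(i_arr.tolist())  # [1, -2, 3]
```

If you want rounding instead, apply np.round before converting.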

How to make a multidimensional array

To make a multidimensional array, pass a nested list:

> arr = np.asarray([[1,2,3], [4,5,6]])
> arr
array([[1, 2, 3],
       [4, 5, 6]])

You can specify and change the type just as in the one-dimensional case. The shape attribute holds the shape of the array.

> arr.shape
(2, 3)

This is a tuple. By the way, the shape of a one-dimensional array is

> arr = np.asarray([1,2,3])
> arr.shape
(3,)

This is a tuple with only one element.
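Besides shape, arrays carry a couple of related attributes that are handy when checking what you have. The following is a minimal sketch using the same 2 x 3 array as above; ndim and size are standard ndarray attributes, while the variable names rows and cols are only illustrative.

```python
import numpy as np

arr = np.asarray([[1, 2, 3], [4, 5, 6]])

print(arr.shape)  # (2, 3)
print(arr.ndim)   # 2  (number of dimensions)
print(arr.size)   # 6  (total number of elements)

# shape is an ordinary tuple, so it can be unpacked.
rows, cols = arr.shape
```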

How to make a special array

You can easily create special arrays with numpy.

> # Array with all elements 0
> np.zeros((2, 3))
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
> # Array with all elements 1
> np.ones((2, 3))
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
> # Initialize the elements randomly in the range [0, 1)
> np.random.rand(2, 3)
array([[ 0.24025569,  0.48947483,  0.61541917],
       [ 0.01197138,  0.6885749 ,  0.48316059]])
> # Generate elements from the standard normal distribution
> np.random.randn(2, 3)
array([[ 0.23397941, -1.58230063, -0.46831152],
       [ 1.01000451, -0.21079169,  0.80247674]])

There are many other functions that generate arrays. If you need a particular kind of array, a quick search will probably turn up a function for it.
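To give a flavor of those other functions, here is a short sketch of a few that come up often. All of them are standard numpy functions; the particular arguments are arbitrary choices for illustration. The last lines use np.random.default_rng, the generator interface that the numpy documentation recommends for new code, with a seed of 0 picked here only so the run is repeatable.

```python
import numpy as np

r = np.arange(0, 10, 2)        # evenly spaced integers: 0, 2, 4, 6, 8
l = np.linspace(0.0, 1.0, 5)   # 5 points from 0 to 1 inclusive: 0, 0.25, 0.5, 0.75, 1
f = np.full((2, 3), 7)         # 2 x 3 array filled with 7
e = np.eye(3)                  # 3 x 3 identity matrix

# Generator-based random numbers (the seed 0 is an arbitrary choice).
rng = np.random.default_rng(0)
u = rng.random((2, 3))            # uniform in [0, 1), like np.random.rand(2, 3)
n = rng.standard_normal((2, 3))   # standard normal, like np.random.randn(2, 3)
```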

Array calculation

Basic calculation

The strength of numpy is that calculating with arrays is very easy. Given the array

> a = np.asarray([[1,2,3],[4,5,6]])

multiplying it by a scalar gives

> 3 * a
array([[ 3,  6,  9],
       [12, 15, 18]])

Multiplying an array by a scalar multiplies every element by that constant. Adding a scalar

> 3 + a
array([[4, 5, 6],
       [7, 8, 9]])

adds it to every element. Calculations between arrays work like this:

> b = np.asarray([[2,3,4],[5,6,7]])
> a + b
array([[ 3,  5,  7],
       [ 9, 11, 13]])
> a * b
array([[ 2,  6, 12],
       [20, 30, 42]])

In a calculation between arrays of the same shape, the operation is applied to the elements at the same position, and an array of that shape is returned. Note that a * b is an element-by-element product, not a matrix product. Arrays of different shapes can sometimes be combined too.

> v = np.asarray([2,1,3])
> a * v
array([[ 2,  2,  9],
       [ 8,  5, 18]])
> a + v
array([[3, 3, 6],
       [6, 6, 9]])

When a two-dimensional array is combined with a one-dimensional array whose length equals the number of columns of the two-dimensional array, the operation is applied between each row of the two-dimensional array and the one-dimensional array. The result therefore has the same shape as the two-dimensional array. numpy calls this mechanism broadcasting.
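To make the row-by-row rule concrete, the sketch below checks that the result matches an explicit loop over the rows, and shows what happens when the lengths do not match. The array w and the flag mismatch_raised are illustrative names introduced only for this example.

```python
import numpy as np

a = np.asarray([[1, 2, 3], [4, 5, 6]])
v = np.asarray([2, 1, 3])

# The 1D array is applied to every row of the 2D array.
broadcast = a * v
row_by_row = np.asarray([row * v for row in a])
print(np.array_equal(broadcast, row_by_row))  # True

# A length that matches neither the column count nor 1 cannot be combined.
w = np.asarray([1, 2])
mismatch_raised = False
try:
    a * w
except ValueError as err:
    mismatch_raised = True
    print("shape mismatch:", err)
```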

It is also possible to treat a two-dimensional array as a matrix. To find the matrix product of the two arrays

> M = np.asarray([[1,2,3], [2,3,4]])
> N = np.asarray([[1,2],[3,4], [5,6]])

use dot:

> M.dot(N)
array([[22, 28],
       [31, 40]])

Here a $2 \times 3$ matrix is multiplied by a $3 \times 2$ matrix, so a $2 \times 2$ matrix is returned.
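The same matrix product can be written in a few equivalent ways. The sketch below compares the dot method, the np.dot function, and the @ operator (available since Python 3.5), and also shows that swapping the order changes the shape of the result.

```python
import numpy as np

M = np.asarray([[1, 2, 3], [2, 3, 4]])      # shape (2, 3)
N = np.asarray([[1, 2], [3, 4], [5, 6]])    # shape (3, 2)

p_method = M.dot(N)
p_func = np.dot(M, N)
p_op = M @ N                                 # matrix-multiplication operator

print(p_op.tolist())   # [[22, 28], [31, 40]]
print((N @ M).shape)   # (3, 3): matrix products are not commutative
```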

Function call

With numpy you can pass arrays to various mathematical functions, and the function is applied to each element. For example,

> a = np.asarray([[1,2], [3,1]])
> np.log(a)
array([[ 0.        ,  0.69314718],
       [ 1.09861229,  0.        ]])

The original array does not change. Many other functions are available, such as the trigonometric functions, `exp`, and `sqrt`.
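Here is a short sketch of a few of those other functions in action. The input values are arbitrary and chosen so the square roots come out as whole numbers; each call returns a new array and leaves x untouched.

```python
import numpy as np

x = np.asarray([0.0, 1.0, 4.0])

roots = np.sqrt(x)   # element-wise square root: 0, 1, 2
expo = np.exp(x)     # element-wise e**x
sines = np.sin(x)    # element-wise sine (input in radians)

print(roots.tolist())                 # [0.0, 1.0, 2.0]
print(np.allclose(np.log(expo), x))   # True: log undoes exp
print(x.tolist())                     # [0.0, 1.0, 4.0], unchanged
```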

Taking statistics

numpy is also good at computing statistics of arrays. First, generate 100 random numbers.

> arr = np.random.rand(100)

To take the mean of the array, use np.mean:

> np.mean(arr)
0.52133315138159586

The maximum and minimum values are given by np.max and np.min:

> np.max(arr)
0.98159897843423383
> np.min(arr)
0.031486992721019846

The standard deviation is

> np.std(arr)
0.2918171894076691

To get the sum

> np.sum(arr)
52.133315138159588

For 2D arrays, you can also specify along which axis the statistic is taken. For example,

> arr = np.asarray([[1,2,3], [2,3,4]])
> np.sum(arr, axis=0)
array([3, 5, 7])
> np.sum(arr, axis=1)
array([6, 9])

With axis=0 the sum runs down each column, and with axis=1 it runs across each row.
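The axis argument works the same way for the other statistics. The sketch below reuses the 2 x 3 array from above with np.mean and np.max, and shows that the same operations are also available as methods on the array.

```python
import numpy as np

arr = np.asarray([[1, 2, 3], [2, 3, 4]])

col_mean = np.mean(arr, axis=0)   # one value per column
row_mean = np.mean(arr, axis=1)   # one value per row
col_max = np.max(arr, axis=0)
row_max = arr.max(axis=1)         # method form of the same operation

print(col_mean.tolist())  # [1.5, 2.5, 3.5]
print(row_mean.tolist())  # [2.0, 3.0]
print(col_max.tolist())   # [2, 3, 4]
print(row_max.tolist())   # [3, 4]
```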

Putting it into practice

Putting the above together, let's use numpy to write code that computes the average Euclidean distance from the origin of 100 vectors in 3D space.

First, suppose the data is an array of shape (100, 3), with the $x$ coordinates in the first column, the $y$ coordinates in the second column, and the $z$ coordinates in the third column. Here it is generated as

> data = np.random.randn(100, 3)

The Euclidean distance from the origin is

$$ d(x,y,z) = \sqrt{x^2+y^2+z^2} $$

so, first, square each element.

> squared = data**2

Then sum along each row.

> squared_sum = np.sum(squared, axis=1)

At this point, squared_sum is a one-dimensional array of length 100. Taking its square root gives the Euclidean distance of each point.

> dist = np.sqrt(squared_sum)

Taking the mean of these distances gives

> np.mean(dist)
1.5423905808984208

(Since the data is randomly generated, your result will be slightly different.)

If you wrote this code without numpy, you would need a for loop over the 100 points, plus another for loop over the coordinates of each point once the dimension grows. Not only does that complicate the code, it also slows down execution. The basic idea of numpy is to operate on a large array all at once. As a result, numpy can perform such operations faster than plain Python loops can.
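For comparison, here is a rough sketch of what the loop version might look like next to the numpy version. It only checks that both give the same answer and makes no timing claims. The seeded generator np.random.default_rng(0) and the variable names are choices made for this example so that the run is repeatable.

```python
import math
import numpy as np

# Seeded generator so the example is repeatable (the seed is arbitrary).
rng = np.random.default_rng(0)
data = rng.standard_normal((100, 3))

# Plain Python: one loop over the points, one loop over the coordinates.
total = 0.0
for point in data.tolist():
    sq = 0.0
    for coord in point:
        sq += coord ** 2
    total += math.sqrt(sq)
loop_mean = total / len(data)

# numpy: the whole array is processed at once.
vec_mean = np.mean(np.sqrt(np.sum(data ** 2, axis=1)))

print(math.isclose(loop_mean, vec_mean))  # True
```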

By the way, I computed each step by hand this time for practice, but numpy has a function called np.linalg.norm that makes computing the Euclidean distance easy.
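As a sketch of how that might look for the (100, 3) data used here: passing axis=1 to np.linalg.norm computes one norm per row. Without the axis argument, a 2D array is treated as a matrix and a single number (the Frobenius norm) is returned. The seeded generator is again an assumption made for repeatability.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed for repeatability
data = rng.standard_normal((100, 3))

# Step-by-step version from the walkthrough.
dist_manual = np.sqrt(np.sum(data ** 2, axis=1))

# One call: the Euclidean norm of each row.
dist_norm = np.linalg.norm(data, axis=1)

print(dist_norm.shape)                      # (100,)
print(np.allclose(dist_manual, dist_norm))  # True
```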

Summary

That's all for the basic usage of numpy, but numpy has many more features. One example is the np.where function, which finds the indices of the elements that satisfy a condition.
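A minimal sketch of np.where, using a small made-up array: with only a condition it returns a tuple of index arrays (one per dimension), and with two extra arguments it picks values element by element.

```python
import numpy as np

arr = np.asarray([3, 8, 1, 9, 4])   # illustrative values

# Condition only: a tuple containing one index array per dimension.
idx = np.where(arr > 3)
print(idx[0].tolist())    # [1, 3, 4]
print(arr[idx].tolist())  # [8, 9, 4]

# Three-argument form: keep elements above 3, replace the rest with 0.
kept = np.where(arr > 3, arr, 0)
print(kept.tolist())      # [0, 8, 0, 9, 4]
```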

This is not limited to numpy, but I think the fastest way to improve is to write Python and gain experience while searching online as you go, so keep at it even if you get confused at first!