This article is for people who heard that Python is good for numerical computation and machine learning and started learning it, but found it a bit difficult and feel that C++ is faster anyway; people who have heard the name numpy but have never used it; and people who tried numpy but think it is basically just an advanced version of the math package. For all of you, I'll show you how to use numpy properly!
If you installed Python using Anaconda, you probably already have numpy. With other installation methods, numpy may or may not be included, so let's first check whether it is installed.
Start the Python console by typing python in the terminal (Command Prompt on Windows). In the Python console, type
> import numpy
If no error appears, numpy is installed.
If you get an error like No module named numpy, it is not installed and you need to install it.
In the terminal (not the Python console), you can install it with
$ pip install numpy
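If you would rather check from a script than from the console, a minimal sketch like the following should work. It uses only the standard library's importlib and numpy's `__version__` attribute; the variable name `spec` is just an illustrative choice.

```python
import importlib.util

# find_spec returns None when the module cannot be found
spec = importlib.util.find_spec("numpy")

if spec is None:
    print("numpy is not installed; run: pip install numpy")
else:
    import numpy
    print("numpy version:", numpy.__version__)
```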
From here on, we look at programming with numpy in detail. In every example, numpy is assumed to be imported as
> import numpy as np
This imports the numpy module under the name np.
The basics of numpy start with creating an array. An array with the contents 1, 2, 3 can be created like this:
> arr = np.asarray([1,2,3])
> arr
array([1, 2, 3])
You can also set the element type of the array by specifying dtype. Frequently used types include np.int32, np.float32, and np.float64. To create an array of type np.int32:
> arr = np.asarray([1,2,3], dtype=np.int32)
> arr
array([1, 2, 3], dtype=int32)
To change the type of an array that already exists, use astype:
> i_arr = np.asarray([1,2,3], dtype=np.int32)
> f_arr = i_arr.astype(np.float32)
> f_arr
array([ 1., 2., 3.], dtype=float32)
The original array i_arr is not changed by this:
> i_arr
array([1, 2, 3], dtype=int32)
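One thing to keep in mind when converting in the other direction, from float to integer: astype drops the fractional part instead of rounding. The following is a small sketch of that behavior; the values are made up for illustration.

```python
import numpy as np

f_arr = np.asarray([1.7, -1.7, 2.0])

# Converting float to int truncates toward zero (no rounding)
i_arr = f_arr.astype(np.int32)

print(i_arr)        # [ 1 -1  2]
print(i_arr.dtype)  # int32
print(f_arr)        # the original float array is left as it was
```

If you want rounding, apply np.round before converting.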
To make a multidimensional array, pass a nested list:
> arr = np.asarray([[1,2,3], [4,5,6]])
> arr
array([[1, 2, 3],
[4, 5, 6]])
As with one-dimensional arrays, you can specify and change the type. The shape attribute holds the shape of the array.
> arr.shape
(2, 3)
The shape is a tuple. For a one-dimensional array, the shape looks like this:
> arr = np.asarray([1,2,3])
> arr.shape
(3,)
This is a tuple with only one element.
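Because shape is an ordinary tuple, you can unpack it like any other tuple. The sketch below also prints ndim and size, two related attributes not covered above, which give the number of dimensions and the total number of elements.

```python
import numpy as np

arr2d = np.asarray([[1, 2, 3], [4, 5, 6]])
arr1d = np.asarray([1, 2, 3])

print(arr2d.shape, arr2d.ndim, arr2d.size)  # (2, 3) 2 6
print(arr1d.shape, arr1d.ndim, arr1d.size)  # (3,) 1 3

# shape is a plain tuple, so it can be unpacked
rows, cols = arr2d.shape
print(rows, cols)  # 2 3
```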
You can easily create special arrays with numpy.
> # Array with all elements 0
> np.zeros((2, 3))
array([[ 0., 0., 0.],
[ 0., 0., 0.]])
> # Array with all elements 1
> np.ones((2, 3))
array([[ 1., 1., 1.],
[ 1., 1., 1.]])
> # Elements initialized randomly in the range [0, 1)
> np.random.rand(2, 3)
array([[ 0.24025569, 0.48947483, 0.61541917],
[ 0.01197138, 0.6885749 , 0.48316059]])
> # Elements generated according to the standard normal distribution
> np.random.randn(2, 3)
array([[ 0.23397941, -1.58230063, -0.46831152],
[ 1.01000451, -0.21079169, 0.80247674]])
There are many other functions that generate arrays. If you need a particular kind of array, searching the web will usually turn up a function for it.
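As a taste of those other functions, here is a short sketch using four that I find handy: np.arange, np.linspace, np.eye, and np.full. The variable names are only for illustration.

```python
import numpy as np

# Integers from 0 up to (but not including) 5
seq = np.arange(5)
print(seq)       # [0 1 2 3 4]

# 5 evenly spaced values from 0 to 1, both ends included
grid = np.linspace(0, 1, 5)
print(grid)

# 3x3 identity matrix
identity = np.eye(3)
print(identity)

# Array of shape (2, 3) filled with the value 7
sevens = np.full((2, 3), 7)
print(sevens)
```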
The power of numpy is that calculations on arrays are very easy. Take the array
> a = np.asarray([[1,2,3],[4,5,6]])
and multiply it by 3:
> 3 * a
array([[ 3, 6, 9],
[12, 15, 18]])
Multiplying an array by a scalar multiplies each element by that constant. Adding a scalar works similarly:
> 3 + a
array([[4, 5, 6],
[7, 8, 9]])
The scalar is added to each element. Arithmetic between two arrays works like this:
> b = np.asarray([[2,3,4],[5,6,7]])
> a + b
array([[ 3, 5, 7],
[ 9, 11, 13]])
> a * b
array([[ 2, 6, 12],
[20, 30, 42]])
For arrays of the same shape, the operation is applied to the elements at the same position, and an array of that shape is returned. In some cases, arrays of different shapes can also be combined.
> v = np.asarray([2,1,3])
> a * v
array([[ 2, 2, 9],
[ 8, 5, 18]])
> a + v
array([[3, 3, 6],
[6, 6, 9]])
When a two-dimensional array is combined with a one-dimensional array whose length equals the number of columns of the two-dimensional array, the operation is applied between each row of the two-dimensional array and the one-dimensional array. The result therefore has the same shape as the two-dimensional array. numpy calls this behavior broadcasting.
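To make the row-by-row behavior concrete, here is a small sketch that compares a * v with an explicit loop over the rows, and shows what happens when the lengths do not match. The array w is a made-up example of a mismatched shape.

```python
import numpy as np

a = np.asarray([[1, 2, 3], [4, 5, 6]])
v = np.asarray([2, 1, 3])

# Explicit version: multiply each row of a by v, one row at a time
manual = np.asarray([row * v for row in a])
print(np.array_equal(a * v, manual))  # True

# A one-dimensional array whose length differs from the number of columns
w = np.asarray([1, 2])
try:
    a * w
except ValueError as err:
    print("shapes do not match:", err)
```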
A two-dimensional array can also be treated as a matrix in calculations.
> M = np.asarray([[1,2,3], [2,3,4]])
> N = np.asarray([[1,2],[3,4], [5,6]])
To find the matrix product of these two arrays, use dot:
> M.dot(N)
array([[22, 28],
[31, 40]])
Here a $2 \times 3$ matrix is multiplied by a $3 \times 2$ matrix, so a $2 \times 2$ matrix is returned.
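The same matrix product can be written in a few equivalent ways. The sketch below compares the dot method, the np.dot function, and Python's @ operator; the names p_method, p_func, and p_op are just for illustration.

```python
import numpy as np

M = np.asarray([[1, 2, 3], [2, 3, 4]])
N = np.asarray([[1, 2], [3, 4], [5, 6]])

p_method = M.dot(N)    # method form
p_func = np.dot(M, N)  # function form
p_op = M @ N           # matrix-multiplication operator

print(p_op)
print(p_op.shape)  # (2, 2)
```

Note that M * N is not a matrix product: * is the element-wise operation described earlier, and it fails here because the shapes (2, 3) and (3, 2) do not match.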
With numpy you can pass arrays to various functions. The function is then applied to each element. For example:
> a = np.asarray([[1,2], [3,1]])
> np.log(a)
array([[ 0. , 0.69314718],
[ 1.09861229, 0. ]])
The original array is not changed by this.
Many other functions are available, such as the trigonometric functions, exp, and sqrt.
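A quick sketch of a few of these element-wise functions follows. The input values are chosen only so that the results are easy to check by eye.

```python
import numpy as np

x = np.asarray([0.0, 1.0, 4.0])

roots = np.sqrt(x)  # square root of each element
print(roots)        # [0. 1. 2.]

powers = np.exp(x)  # e raised to each element
print(powers)

angles = np.asarray([0.0, np.pi / 2])
sines = np.sin(angles)  # sine of each element (angles in radians)
print(sines)

print(x)  # the input array is unchanged
```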
numpy is also good at computing statistics of arrays. First, generate 100 random numbers.
> arr = np.random.rand(100)
To take the mean of the array:
> np.mean(arr)
0.52133315138159586
The maximum and minimum values are obtained with
> np.max(arr)
0.98159897843423383
> np.min(arr)
0.031486992721019846
The standard deviation is
> np.std(arr)
0.2918171894076691
To get the sum
> np.sum(arr)
52.133315138159588
For 2D arrays, you can also specify the axis along which the statistic is taken. For example:
> arr = np.asarray([[1,2,3], [2,3,4]])
> np.sum(arr, axis=0)
array([3, 5, 7])
> np.sum(arr, axis=1)
array([6, 9])
With axis=0 the sum is taken down each column, and with axis=1 across each row.
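The axis argument works the same way for the other statistics. Here is a small sketch with np.mean and np.max on the same array; the result names are only illustrative.

```python
import numpy as np

arr = np.asarray([[1, 2, 3], [2, 3, 4]])

col_mean = np.mean(arr, axis=0)  # one value per column
row_mean = np.mean(arr, axis=1)  # one value per row
col_max = np.max(arr, axis=0)
row_max = np.max(arr, axis=1)
all_mean = np.mean(arr)          # no axis: statistic over all elements

print(col_mean)  # [1.5 2.5 3.5]
print(row_mean)  # [2. 3.]
print(col_max)   # [2 3 4]
print(row_max)   # [3 4]
print(all_mean)  # 2.5
```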
Putting the above together, let's use numpy to compute the average Euclidean distance from the origin of 100 vectors in 3D space.
First, suppose the array data has shape (100, 3), with the first column holding the $x$ coordinates, the second column the $y$ coordinates, and the third column the $z$ coordinates. Here it is generated with random values:
> data = np.random.randn(100, 3)
The Euclidean distance from the origin is
$$ d(x,y,z) = \sqrt{x^2+y^2+z^2} $$
so first, square each element.
> squared = data**2
Then sum along each row (axis=1).
> squared_sum = np.sum(squared, axis=1)
At this point, squared_sum is a one-dimensional array. Taking its square root gives the Euclidean distance of each point.
> dist = np.sqrt(squared_sum)
Taking the mean of these distances gives
> np.mean(dist)
1.5423905808984208
(Since the data is randomly generated, your result will differ slightly.)
Without numpy, you would use a for loop over the 100 points, and as the number of dimensions grows, another for loop over the coordinates of each point. This not only complicates the code but also slows down execution. The basic idea of numpy is to operate on a large array all at once. As a result, numpy can perform such operations faster than plain Python loops can.
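For comparison, here is a rough sketch of what the loop version might look like next to the numpy version. It is only an illustration; the names loop_mean and numpy_mean are my own, and the two results should agree up to floating-point rounding.

```python
import math
import numpy as np

data = np.random.randn(100, 3)

# Plain Python: one loop over the points, one loop over the coordinates
total = 0.0
for point in data:
    squared_sum = 0.0
    for coord in point:
        squared_sum += coord ** 2
    total += math.sqrt(squared_sum)
loop_mean = total / len(data)

# numpy: the same calculation on the whole array at once
numpy_mean = np.mean(np.sqrt(np.sum(data ** 2, axis=1)))

print(loop_mean, numpy_mean)
```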
By the way, I computed each step by hand this time for practice, but numpy has a function called np.linalg.norm that computes the Euclidean distance easily.
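As a sketch of how that might look for the data above: passing axis=1 to np.linalg.norm computes the norm of each row, which should match the step-by-step calculation. The names dist_manual and dist_norm are only for illustration.

```python
import numpy as np

data = np.random.randn(100, 3)

# Step-by-step version from the text
dist_manual = np.sqrt(np.sum(data ** 2, axis=1))

# Same distances in one call: Euclidean norm of each row
dist_norm = np.linalg.norm(data, axis=1)

print(np.allclose(dist_manual, dist_norm))  # True
print(np.mean(dist_norm))
```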
That's all for the basic usage of numpy, but numpy has many more features, for example the np.where function, which finds the indices of elements that satisfy a condition.
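A minimal sketch of np.where on a made-up array follows. With only a condition it returns a tuple of index arrays (one per dimension); with three arguments it picks values element by element.

```python
import numpy as np

arr = np.asarray([3, 8, 1, 9, 4])

# Indices of the elements greater than 3
idx = np.where(arr > 3)
print(idx[0])    # [1 3 4]
print(arr[idx])  # [8 9 4]

# Three-argument form: keep the element where the condition holds, else use 0
masked = np.where(arr > 3, arr, 0)
print(masked)    # [0 8 0 9 4]
```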
This is not limited to numpy: I think the quickest way to improve is to write Python and learn by doing, searching the web as you go, so keep at it even if you get confused at first!