[PYTHON] NumPy array manipulation (3)

Continuing from last time, here is yet another installment in the NumPy series.

Why use NumPy in the first place

We call it data analysis, but what is data in the first place? Data is a collection of numbers. Look around: the books on the desk, the plates on the table, and the apples lined up at the greengrocer's all come in finite quantities. For example, there might be 14 books, 22 plates, and 56 apples.

Such small, finite collections can be gathered into one whenever necessary. The collections of things in the world we live in and experience are thus made up of finitely many objects.

A collection of things is a set. When the number of objects under consideration is small, recognizing each object individually and recognizing the set as a whole are not very different. As the number of objects covered by one concept grows, however, recognizing the whole as a single entity becomes a different act. With natural numbers, for example, recognizing each individual number is different from recognizing all natural numbers at once under a single symbol. This way of treating a whole as one object is the basis of set theory.

Now consider set theory and linear algebra together. A vector space in linear algebra is a mathematical structure on a set whose elements are called vectors. Roughly speaking, linear algebra is the mathematics of vectors and matrices, and handling vectors, matrices, and the multidimensional arrays built from them in practice requires a dedicated library. Mastering NumPy therefore also means mastering the structure and manipulation of the actual data to be analyzed.

Saving ndarray objects

The np.save and np.load functions write ndarray objects to files and read them back, using NumPy's binary .npy format. np.savetxt and np.loadtxt do the same in text format.

Also, if you can use pandas, you can use its higher-level functions: read_csv and read_table for reading, and the DataFrame.to_csv method for writing.
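As a rough illustration of the pandas route, the sketch below writes an array out as CSV and reads it back. The column name "value" and the in-memory buffer (used instead of a file on disk) are choices made only for this example.

```python
import io

import numpy as np
import pandas as pd

arr = np.arange(10)

# Wrap the array in a DataFrame; the column name "value" is illustrative.
df = pd.DataFrame({"value": arr})

# Write CSV text into an in-memory buffer instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf, index=False)

# Rewind the buffer and read the CSV text back into a new DataFrame.
buf.seek(0)
df2 = pd.read_csv(buf)

# Convert the column back to a plain ndarray.
restored = df2["value"].to_numpy()
print(restored)
```

Passing a file path instead of the buffer to to_csv and read_csv works the same way.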

import numpy as np

arr = np.arange(10)
#=> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.save('hoge', arr) #Saved as hoge.npy; the .npy extension is appended automatically
arr2 = np.load('hoge.npy') #Load the saved object
arr2
#=> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) #Restored successfully

np.savetxt('fuga.txt', arr) #Save as text
arr3 = np.loadtxt('fuga.txt') #Read from text; values come back as floats
arr3
#=> array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
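Note that the text round trip above returns floats even though the original array held integers. If integers are wanted, one option is to set the output format and the dtype explicitly. The following is a minimal sketch; the temporary directory is an assumption of this example so that no file is left behind.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

# Write into a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "fuga.txt")

    # fmt="%d" writes each value as a plain integer instead of 0.000000e+00.
    np.savetxt(path, arr, fmt="%d")

    # dtype=int asks loadtxt to parse the values as integers.
    arr_int = np.loadtxt(path, dtype=int)

print(arr_int.dtype.kind)  # "i" for a signed integer type
print(np.array_equal(arr, arr_int))
```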

Matrix calculation

As mentioned above, linear algebra occupies an important place in array computation with libraries such as NumPy.

The reshape function is very useful for generating multidimensional arrays.

x = np.array([1,2,3,4,5,6]).reshape([2,3])
#=> array([[1, 2, 3],
#          [4, 5, 6]])

y = np.array([6,23,-1,7,8,9]).reshape([3,2])
#=> array([[ 6, 23],
#          [-1,  7],
#          [ 8,  9]])
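Two details of reshape are worth keeping in mind: one dimension can be given as -1 and is then inferred, and the total number of elements must not change. A short sketch (the array a here is just an example):

```python
import numpy as np

a = np.arange(6)

# -1 lets NumPy infer that dimension from the total number of elements.
b = a.reshape(2, -1)
print(b.shape)  # (2, 3)

# The total number of elements must be preserved, otherwise reshape fails.
try:
    a.reshape(4, 2)
except ValueError as exc:
    print("reshape failed:", exc)
```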

x.dot(y) #Matrix product of x (2x3) and y (3x2)
#=> array([[ 28,  64],
#          [ 67, 181]])
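For 2-D arrays, the same matrix product can also be written with np.dot or the @ operator. The sketch below rebuilds the same x and y and also shows how the result shape follows from the operand shapes.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6]).reshape(2, 3)
y = np.array([6, 23, -1, 7, 8, 9]).reshape(3, 2)

# For 2-D arrays these three spellings compute the same matrix product.
p1 = x.dot(y)
p2 = np.dot(x, y)
p3 = x @ y
print(p3)

# (2, 3) times (3, 2) gives (2, 2); reversing the order gives (3, 3).
print((x @ y).shape, (y @ x).shape)
```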

from numpy.linalg import inv, qr, pinv, eig

x = np.random.randn(5,5)
x.T #Transpose
#=> array([[ 0.1797343 , -1.48685211,  1.89995885, -1.48818535,  0.22707072],
#          [ 0.16362348,  0.73820851,  0.6830228 , -0.28744869,  1.60110706],
#          [-0.25212006, -0.75832623,  1.08510935,  0.36069392, -0.25172285],
#          [-1.23742215, -0.27616976,  1.09778477, -0.79290683,  1.88819678],
#          [ 1.25424329, -0.44571606, -0.37970879,  0.25329534, -0.0571783 ]])

mat = x.T.dot(x)
mat #Product of the transpose of x with x (a symmetric matrix)
#=> array([[ 8.119134  ,  1.02085855,  2.54992915,  3.88270879, -0.22322061],
#          [ 1.02085855,  3.68441518, -0.36661744,  3.59459509, -0.54751549],
#          [ 2.54992915, -0.36661744,  2.00954999,  0.95132328, -0.28449209],
#          [ 3.88270879,  3.59459509,  0.95132328,  7.00660306, -2.15457716],
#          [-0.22322061, -0.54751549, -0.28449209, -2.15457716,  1.98339569]])

inv(mat) #Returns the inverse of a square matrix
#=> array([[ 0.34294894,  0.13024165, -0.30841121, -0.30883099, -0.30517266],
#          [ 0.13024165,  0.87379103,  0.25904943, -0.69902729, -0.46633355],
#          [-0.30841121,  0.25904943,  0.98083558, -0.06094767,  0.11128055],
#          [-0.30883099, -0.69902729, -0.06094767,  0.91304207,  0.75537864],
#          [-0.30517266, -0.46633355,  0.11128055,  0.75537864,  1.17764414]])
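One way to sanity-check an inverse is to multiply it by the original matrix and compare the result with the identity. Since x above is random, this sketch regenerates a matrix of the same kind with a fixed seed; the seed value and the use of default_rng are assumptions made only for reproducibility.

```python
import numpy as np
from numpy.linalg import inv

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.standard_normal((5, 5))
mat = x.T @ x

mat_inv = inv(mat)

# Up to rounding error, mat @ mat_inv should be the 5x5 identity matrix.
is_identity = np.allclose(mat @ mat_inv, np.eye(5), atol=1e-6)
print(is_identity)
```

Comparing with np.allclose rather than == matters here, because floating-point rounding leaves tiny nonzero off-diagonal entries.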

q, r = qr(mat) #QR decomposition
q
#=> array([[-0.86261627,  0.28238894,  0.35807769,  0.08420784, -0.20208678],
#          [-0.10846098, -0.74407004,  0.01429817,  0.58226198, -0.30880827],
#          [-0.27091687,  0.2393467 , -0.87410947,  0.31594231,  0.0736905 ],
#          [-0.41251786, -0.54595751, -0.16208726, -0.50524429,  0.50021529],
#          [ 0.02371604,  0.10611234,  0.28501978,  0.54661567,  0.7798415 ]])

r
#=> array([[-9.41221986, -2.67672118, -3.10345255, -6.93833754,  1.26485136],
#          [ 0.        , -4.56152677,  0.92426978, -5.40443517,  1.66303293],
#          [ 0.        , -0.        , -1.08401917, -1.13963141,  1.07545497],
#          [ 0.        ,  0.        ,  0.        , -1.997258  ,  1.7452655 ],
#          [-0.        , -0.        , -0.        , -0.        ,  0.66220471]])
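The defining properties of the QR decomposition can be checked numerically: q has orthonormal columns, r is upper triangular, and their product gives back the original matrix. This is a sketch on a freshly generated matrix with a fixed seed (an assumption for reproducibility), not on the exact values printed above.

```python
import numpy as np
from numpy.linalg import qr

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.standard_normal((5, 5))
mat = x.T @ x

q, r = qr(mat)

# q times r reconstructs the original matrix.
print(np.allclose(q @ r, mat))
# The columns of q are orthonormal, so q.T @ q is the identity.
print(np.allclose(q.T @ q, np.eye(5)))
# r is upper triangular: zeroing the part below the diagonal changes nothing.
print(np.allclose(r, np.triu(r)))
```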

pinv(mat) #Returns the Moore-Penrose pseudoinverse (equal to inv(mat) here because mat is invertible)
#=> array([[ 0.34294894,  0.13024165, -0.30841121, -0.30883099, -0.30517266],
#          [ 0.13024165,  0.87379103,  0.25904943, -0.69902729, -0.46633355],
#          [-0.30841121,  0.25904943,  0.98083558, -0.06094767,  0.11128055],
#          [-0.30883099, -0.69902729, -0.06094767,  0.91304207,  0.75537864],
#          [-0.30517266, -0.46633355,  0.11128055,  0.75537864,  1.17764414]])
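The pseudoinverse becomes more interesting for matrices that have no ordinary inverse, such as non-square ones. The sketch below uses a small 3x2 matrix and a 2x2 matrix b, both chosen purely as examples.

```python
import numpy as np
from numpy.linalg import inv, pinv

# A 3x2 matrix has no ordinary inverse, but it does have a pseudoinverse.
a = np.array([[6, 23], [-1, 7], [8, 9]], dtype=float)
a_pinv = pinv(a)
print(a_pinv.shape)  # (2, 3)

# Defining property of the pseudoinverse: a @ a_pinv @ a gives back a.
print(np.allclose(a @ a_pinv @ a, a))

# For an invertible square matrix, pinv and inv agree up to rounding.
b = np.array([[4.0, 1.0], [2.0, 3.0]])
print(np.allclose(pinv(b), inv(b)))
```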

np.trace(mat) #Returns the sum of diagonal components
#=> 22.80309791710043

eig(mat) #Returns eigenvalues and eigenvectors for a square matrix
#=> (array([ 13.3600683 ,   5.95602662,   2.24791381,   0.81881059,   0.4202786 ]),
#    array([[-0.64467006, -0.63086541,  0.20659265,  0.31642236, -0.20881982],
#          [-0.31183983,  0.5014887 ,  0.55973159, -0.32086736, -0.48477798],
#          [-0.19333292, -0.33433927, -0.32170008, -0.86424035, -0.02091191],
#          [-0.65225524,  0.42536152, -0.20774674,  0.04440243,  0.59033922],
#          [ 0.15601897, -0.24042203,  0.70524491, -0.21917587,  0.61028427]]))
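eig returns the eigenvalues as a 1-D array and the eigenvectors as the columns of a 2-D array. The following sketch checks that relationship, and also that the trace equals the sum of the eigenvalues. It uses a regenerated matrix with a fixed seed (an assumption for reproducibility) and additionally calls numpy.linalg.eigh, which is not used in the listing above but applies because the matrix is symmetric.

```python
import numpy as np
from numpy.linalg import eig, eigh

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.standard_normal((5, 5))
mat = x.T @ x

w, v = eig(mat)

# Each column v[:, i] satisfies mat @ v[:, i] == w[i] * v[:, i].
print(np.allclose(mat @ v, v * w))

# The trace equals the sum of the eigenvalues.
print(np.isclose(np.trace(mat), w.sum()))

# mat is symmetric, so eigh applies; it returns eigenvalues in ascending order.
w_sym, v_sym = eigh(mat)
print(np.allclose(np.sort(w), w_sym))
```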

Summary

This time I covered the most frequently used linear algebra functions, which are especially important. They are the foundation of scientific computing, so it is worth studying them well.
