Python for Data Analysis Chapter 4

NumPy Basics: Arrays and Vectorized Computation

ndarray: the N-dimensional array object provided by NumPy

Creating ndarrays

import numpy as np

#Create from a list
data1 = [6, 7.5, 8, 9]
arr1 = np.array(data1)

#Nested lists create multidimensional arrays
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)

#Array version of Python's range function
np.arange(10)

#Zero vector
np.zeros(10)
#Zero matrix
np.zeros((3, 6))

#Allocate without initializing (the contents are arbitrary)
np.empty((2, 3, 2))

#Check the number of dimensions
arr2.ndim

#Array shape
arr2.shape

#Check the data type
arr1.dtype

#Generate by specifying the data type
arr1 = np.array([1, 2, 3], dtype=np.float64)

#Create from strings
data3 = ["1.1", "2.2", "3.3"]
arr3 = np.array(data3, dtype=np.float64)
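As a quick check of the creation routines above, here is a minimal sketch that inspects the resulting arrays. The variable names mirror the ones used above; `astype` is shown as an alternative way to convert strings to floats after creation.

```python
import numpy as np

# Build arrays from a flat list and a nested list.
arr1 = np.array([6, 7.5, 8, 9])
arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr1.dtype)   # float64, because the list mixes ints and a float
print(arr2.ndim)    # 2
print(arr2.shape)   # (2, 4)

# Strings can be converted at creation time (dtype=...) or afterwards.
arr3 = np.array(["1.1", "2.2", "3.3"]).astype(np.float64)
print(arr3)
```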

Operations between Arrays and Scalars

#Arithmetic between arrays is applied element-wise (between elements at the same position)
arr = np.array([[1, 2, 3], [4, 5, 6]])
In [32]: arr
array([[1, 2, 3],
       [4, 5, 6]])
arr * arr
In [33]: arr * arr
array([[ 1,  4,  9],
       [16, 25, 36]])

#Arithmetic with a scalar is applied to every element
arr - 1
In [34]: arr - 1
array([[0, 1, 2],
       [3, 4, 5]])
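#The integer result shown here comes from Python 2 integer division; under Python 3, 1 / arr returns floats (1 // arr gives this result)
1 / arr
In [35]: 1 / arr
array([[1, 0, 0],
       [0, 0, 0]])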
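A small sketch of the same operations under Python 3, where `/` is true division and `//` is floor division. The variable names are only for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

squared = arr * arr      # element-wise product, not a matrix product
shifted = arr - 1        # the scalar is applied to every element
true_div = 1 / arr       # Python 3: true division, float result
floor_div = 1 // arr     # floor division keeps the integer dtype

print(true_div)
print(floor_div)
```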

Basic Indexing and Slicing / Fancy Indexing

    | 0    1    2
----+--------------
  0 | 0,0  0,1  0,2
  1 | 1,0  1,1  1,2
  2 | 2,0  2,1  2,2

Elements are addressed as in a mathematical matrix: (row, col).

An array slice is a view on the original array: if you don't copy it, the slice changes when the original array changes (and vice versa). If you want a copy of an array slice, copy it explicitly: arr[5:8].copy()
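The difference between a slice and a copy can be sketched as follows. The array contents and the value 12 are arbitrary choices for illustration.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]          # a view: shares memory with arr
copy = arr[5:8].copy()   # an independent copy

arr[5:8] = 12            # modify the original array

print(view)   # reflects the change: 12, 12, 12
print(copy)   # still holds the old values: 5, 6, 7
```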

Boolean Indexing

Arrays can be masked using a boolean array.

name = np.array(["bob", "martin", "feed", "max", "rosetta", "john"])
In [63]: name == "bob"
Out[63]: array([ True, False, False, False, False, False], dtype=bool)
arr = np.arange(6)
In [68]: arr[name=="rosetta"]
Out[68]: array([4])

Boolean operators: & (and), | (or). The Python keywords and / or do not work element-wise on boolean arrays.


mask = (name=="rosetta") | (name=="martin")
In [72]: mask
Out[72]: array([False,  True, False, False,  True, False], dtype=bool)
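A minimal sketch of combining and negating masks, reusing the names from above. The `~` operator (element-wise not) and the `try` block are additions for illustration.

```python
import numpy as np

name = np.array(["bob", "martin", "feed", "max", "rosetta", "john"])
arr = np.arange(6)

mask = (name == "rosetta") | (name == "martin")
selected = arr[mask]     # elements where the mask is True
rest = arr[~mask]        # ~ inverts the mask
print(selected, rest)

# Python's `or` keyword tries to reduce each array to a single bool,
# which raises ValueError for arrays with more than one element.
try:
    (name == "rosetta") or (name == "martin")
except ValueError as exc:
    print("ValueError:", exc)
```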

Selection by comparison operator

from numpy.random import randn

data = randn(10)
In [78]: data
array([-0.43930899, -0.18084457,  0.50384496,  0.34177923,  0.4786331 ,
        0.0930973 ,  0.95264648,  1.29876589,  0.96616151,  0.69204729])
data[data < 0] = 0
In [80]: data
array([ 0.        ,  0.        ,  0.50384496,  0.34177923,  0.4786331 ,
        0.0930973 ,  0.95264648,  1.29876589,  0.96616151,  0.69204729])

Transposing Arrays and Swapping Axes

This part is difficult! I think it's easier to take only what you want with fancy indexing ...

arr = np.arange(15).reshape((3,5))

#Transpose
arr.T

#Inner product
np.dot(arr.T, arr)

arr = np.arange(45).reshape((3,5,3))

#Transpose by specifying the axis order
arr.transpose((1, 0, 2))

#Swap two axes
arr.swapaxes(1, 2)
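Looking at shapes makes it easier to follow what the axis operations do. A minimal sketch on the same (3, 5, 3) array; the specific indices checked are arbitrary.

```python
import numpy as np

arr = np.arange(45).reshape((3, 5, 3))

t = arr.transpose((1, 0, 2))   # new axis order: old axes 1, 0, 2
s = arr.swapaxes(1, 2)         # exchange axes 1 and 2

print(t.shape)   # (5, 3, 3)
print(s.shape)   # (3, 3, 5)

# Element (i, j, k) of arr is element (j, i, k) of t
# and element (i, k, j) of s.
print(arr[2, 4, 1] == t[4, 2, 1], arr[2, 4, 1] == s[2, 1, 4])
```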

Universal Functions: Fast Element-wise Array Functions

1-argument functions

Functions that operate element-wise. np.func(x) applies the function to each element of x.

Function                 Description
abs                      Absolute value
sqrt                     x ** 0.5
square                   x ** 2
exp                      exp(x)
log, log10, log2         Logarithm of x with base e, 10, 2
log1p                    log(1 + x), accurate even when x is very small
sign                     Return the sign (1, 0, -1)
ceil                     Round up to the nearest integer
floor                    Round down to the nearest integer
rint                     Round to the nearest integer
modf                     Split into a fractional part and an integer part
isnan, isinf, isfinite   Return a bool array indicating whether each element is NaN, infinite, or finite
logical_not              Return the bool value of not x
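A few of the one-argument functions applied to a small array, as a sketch. The input values are arbitrary.

```python
import numpy as np

x = np.array([-1.5, 0.0, 2.25])

absolute = np.abs(x)     # 1.5, 0.0, 2.25
signs = np.sign(x)       # -1, 0, 1
up = np.ceil(x)          # -1, 0, 3
down = np.floor(x)       # -2, 0, 2

# modf returns two arrays: fractional parts and integer parts.
frac, whole = np.modf(x)
print(frac, whole)

# isnan returns a bool array, useful for finding missing values.
print(np.isnan(np.array([1.0, np.nan])))
```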

2-argument functions

Called as np.func(x1, x2).

Function                                                     Description
add, subtract, multiply, divide, power, mod                  x1 (+, -, *, /, **, %) x2
maximum, minimum                                             Element-wise (larger, smaller) value of x1 and x2
copysign                                                     abs(x1) with the sign of x2
greater, greater_equal, less, less_equal, equal, not_equal   x1 (>, >=, <, <=, ==, !=) x2
logical_and, logical_or, logical_xor                         x1 (&, |, ^) x2
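The two-argument functions work position by position on both inputs. A short sketch with arbitrary values:

```python
import numpy as np

x1 = np.array([1.0, -4.0, 3.0])
x2 = np.array([2.0, 2.0, -3.0])

total = np.add(x1, x2)             # same as x1 + x2
larger = np.maximum(x1, x2)        # element-wise larger value
signed = np.copysign(x1, x2)       # magnitude of x1, sign of x2
is_greater = np.greater(x1, x2)    # same as x1 > x2
xor = np.logical_xor(x1 > 0, x2 > 0)

print(total, larger, signed, is_greater, xor)
```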

Data Processing Using Arrays

Visualize 2D data. As an example, evaluate sqrt(x^2 + y^2) on a grid and display the result.

import matplotlib.pyplot as plt

#Create 1000 points
points = np.arange(-5, 5, 0.01)
#Create a 2D mesh
#xs repeats the x values along each row; ys repeats the y values down each column
xs, ys = np.meshgrid(points, points)
z = np.sqrt(xs ** 2 + ys ** 2)
plt.imshow(z); plt.colorbar()
plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")
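What np.meshgrid returns is easier to see on a tiny grid. The following sketch uses three points instead of 1000; the values are arbitrary.

```python
import numpy as np

points = np.array([-1.0, 0.0, 1.0])
xs, ys = np.meshgrid(points, points)

print(xs)   # every row is a copy of points
print(ys)   # every column is a copy of points

# Distance of each grid point from the origin.
z = np.sqrt(xs ** 2 + ys ** 2)
print(z)
```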

Expressing Conditional Logic as Array Operations

np.where returns elements from its second or third argument depending on the value of the first argument. That is, np.where(cond, xarr, yarr) is equivalent to [(x if c else y) for x, y, c in zip(xarr, yarr, cond)].

arr = randn(5, 5)
In [5]: arr
array([[-0.63774199, -0.76558645, -0.46003378,  0.61095653,  0.78277454],
       [ 0.25332127,  0.50226145, -1.45706102,  1.14315867,  0.28015   ],
       [-0.76326506,  0.33218657, -0.18509161, -0.3410194 , -0.29194451],
       [-0.32247669, -0.64285987, -0.61059921, -0.38261289,  0.41530912],
       [-1.7341384 ,  1.39960857,  0.78411537,  0.25922757, -0.22972615]])
arrtf = np.where(arr > 0, True, False)
In [6]: arrtf
array([[False, False, False,  True,  True],
       [ True,  True, False,  True,  True],
       [False,  True, False, False, False],
       [False, False, False, False,  True],
       [False,  True,  True,  True, False]], dtype=bool)
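The equivalence between np.where and the list comprehension can be checked directly. A minimal sketch; the contents of `cond`, `xarr`, and `yarr` are made up for illustration.

```python
import numpy as np

cond = np.array([True, False, True, True, False])
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])

# Pure-Python version: pick x where the condition holds, else y.
slow = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]

# Vectorized version.
fast = np.where(cond, xarr, yarr)

print(fast)
print(np.array_equal(fast, np.array(slow)))
```

The second and third arguments may also be scalars, as in np.where(arr > 0, True, False) above.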

By nesting np.where calls, values can be classified by multiple conditions.

cond1 = np.where(randn(10) > 0, True, False)
cond2 = np.where(randn(10) > 0, True, False)
In [16]: cond1
Out[16]: array([False,  True, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)

In [17]: cond2
Out[17]: array([False, False, False, False, False,  True, False,  True,  True,  True], dtype=bool)
result = np.where(cond1 & cond2, 0, np.where(cond1, 1, np.where(cond2, 2, 3)))
In [19]: result
Out[19]: array([3, 1, 3, 3, 1, 0, 1, 0, 0, 0])

This can also be written with if and else.

result = []
for i in range(len(cond1)):
    if cond1[i] and cond2[i]:
        result.append(0)
    elif cond1[i]:
        result.append(1)
    elif cond2[i]:
        result.append(2)
    else:
        result.append(3)

It can also be done arithmetically. (Note that 0 and 3 are swapped compared with the np.where version: both true gives 3, neither gives 0.)

result = 1*cond1 + 2*cond2
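To see how the labels differ between the nested np.where form and the arithmetic form, here is a sketch with one element for each combination of conditions. The two small condition arrays are chosen by hand for illustration.

```python
import numpy as np

# One element per case: neither, cond1 only, cond2 only, both.
cond1 = np.array([False, True, False, True])
cond2 = np.array([False, False, True, True])

nested = np.where(cond1 & cond2, 0,
                  np.where(cond1, 1, np.where(cond2, 2, 3)))
arithmetic = 1 * cond1 + 2 * cond2

print(nested)       # labels 3, 1, 2, 0
print(arithmetic)   # labels 0, 1, 2, 3
```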

Mathematical and Statistical Methods

Statistical functions are also available.

arr = randn(5, 4)
In [60]: arr.mean()
Out[60]: 0.51585861805229682

#An axis can also be specified (0: mean of each column, 1: mean of each row)
In [62]: arr.mean(0)
Out[62]: array([ 0.65067115, -0.03856606,  1.06405353,  0.38727585])

In [63]: arr.mean(1)
Out[63]: array([ 1.18400902,  0.84203136,  0.50352006,  0.07445734, -0.0247247 ])
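The effect of the axis argument is easier to verify on an array with known values. A sketch using a deterministic 5 x 4 array instead of random data:

```python
import numpy as np

arr = np.arange(20, dtype=float).reshape((5, 4))

overall = arr.mean()          # mean of all 20 elements
per_column = arr.mean(axis=0) # collapses the rows: one value per column
per_row = arr.mean(axis=1)    # collapses the columns: one value per row

print(overall)
print(per_column)
print(per_row)
```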

Methods for Boolean Arrays

Since True counts as 1 and False as 0, sum is often used to count the True values.

arr = randn(100)
sumnum = (arr > 0).sum()
In [75]: sumnum
Out[75]: 43

Other Boolean functions
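Two other methods NumPy provides for boolean arrays are any and all. That these are the functions meant here is an assumption; the sketch below only shows what they do.

```python
import numpy as np

bools = np.array([False, False, True, False])

count = bools.sum()       # number of True values
has_true = bools.any()    # True if at least one element is True
all_true = bools.all()    # True only if every element is True

print(count, has_true, all_true)
```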

Sorting

Arrays can also be sorted in place.

arr.sort()
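Note that the sort method modifies the array itself, while np.sort returns a sorted copy. A minimal sketch with arbitrary values:

```python
import numpy as np

arr = np.array([3, 1, 2])

sorted_copy = np.sort(arr)      # returns a sorted copy; arr is unchanged
before = arr.tolist()

arr.sort()                      # sorts arr in place and returns None
print(before, sorted_copy, arr)

# For multidimensional arrays, an axis can be given.
mat = np.array([[3, 1, 2], [9, 7, 8]])
mat.sort(axis=1)                # sort each row independently
print(mat)
```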

Unique and Other Set Logic

NumPy also provides set operations similar to those of Python's built-in set type.
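A sketch of some of the set routines for one-dimensional arrays. The input arrays are made up for illustration.

```python
import numpy as np

names = np.array(["bob", "max", "bob", "john", "max"])
unique_names = np.unique(names)     # sorted unique values

a = np.array([1, 2, 3, 4])
b = np.array([3, 4, 5])

common = np.intersect1d(a, b)       # values present in both
either = np.union1d(a, b)           # values present in at least one
only_a = np.setdiff1d(a, b)         # values in a but not in b
member = np.isin(a, b)              # per-element membership test

print(unique_names, common, either, only_a, member)
```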

File Input and Output with Arrays

NumPy array objects can be saved to an external file, and saved files can be loaded back.

arr = np.arange(10)

#Save in binary format
np.save("array_name", arr)
#Load binary format file
arr = np.load("array_name.npy")
#Save multiple arrays as zip
np.savez("array_archive.npz", a=arr, b=arr)
#Load multiple array zip
arr_a = np.load("array_archive.npz")["a"]
arr_b = np.load("array_archive.npz")["b"]

#Save in csv format
np.savetxt("array_ex.txt", arr, delimiter=",")
#Read csv format file
arr = np.loadtxt("array_ex.txt", delimiter=",")
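A round trip through each format can be sketched as follows. Writing into a temporary directory is a choice made here so the example leaves no files behind; the file names follow the ones above.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmpdir:
    # np.save appends the .npy extension if it is missing.
    np.save(os.path.join(tmpdir, "array_name"), arr)
    loaded = np.load(os.path.join(tmpdir, "array_name.npy"))

    # np.load on an .npz file returns a dict-like archive object.
    npz_path = os.path.join(tmpdir, "array_archive.npz")
    np.savez(npz_path, a=arr, b=arr * 2)
    with np.load(npz_path) as archive:
        arr_a, arr_b = archive["a"], archive["b"]

    # Text output stores floats by default, so loadtxt returns floats.
    txt_path = os.path.join(tmpdir, "array_ex.txt")
    np.savetxt(txt_path, arr, delimiter=",")
    from_txt = np.loadtxt(txt_path, delimiter=",")

print(np.array_equal(arr, loaded), from_txt.dtype)
```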

Linear Algebra

Linear algebra operations are also available. diag, dot, and trace are in the numpy namespace; the others are in numpy.linalg.

Function   Description
diag       Extract diagonal elements
dot        Matrix multiplication (inner product)
trace      Sum of diagonal elements
det        Determinant
eig        Decompose into eigenvalues and eigenvectors
inv        Inverse matrix
pinv       Moore-Penrose pseudoinverse
qr         QR decomposition
svd        Singular value decomposition (SVD)
solve      Solve Ax = b for x, where A is a square matrix
lstsq      Calculate the least-squares solution
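A short sketch of a few of these functions on a 2 x 2 system. The matrix `A` and vector `b` are arbitrary values chosen so the results are easy to check by hand.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)     # solve Ax = b; here x = (0.8, 1.4)
print(x)

A_inv = np.linalg.inv(A)      # inverse matrix
print(np.allclose(np.dot(A, A_inv), np.eye(2)))

tr = np.trace(A)              # 2 + 3
det = np.linalg.det(A)        # 2*3 - 1*1, up to rounding error
print(tr, det)
```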

Random Number Generation

Random values from various distributions can be generated at high speed with the functions in np.random.

Function      Description
seed          Seed the random number generator
permutation   Return a randomly reordered copy of a sequence
shuffle       Randomly reorder a sequence in place
rand          Generate uniform random values on [0, 1) in an array whose shape is given by the arguments
randint       Generate random integers from a given low-to-high range
binomial      Random sampling from the binomial distribution
normal        Random sampling from the normal distribution
beta          Random sampling from the beta distribution
chisquare     Random sampling from the chi-square distribution
gamma         Random sampling from the gamma distribution
uniform       Random sampling from the uniform distribution on a given range
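A sketch of how seeding and a few of the samplers behave. The seed value 12345 and the sizes are arbitrary.

```python
import numpy as np

np.random.seed(12345)                  # fix the seed for reproducibility
first = np.random.normal(size=(2, 3))

np.random.seed(12345)                  # same seed, same sequence
second = np.random.normal(size=(2, 3))
print(np.array_equal(first, second))

ints = np.random.randint(0, 10, size=5)   # integers in [0, 10)
unif = np.random.uniform(-1, 1, size=5)   # floats in [-1, 1)

seq = np.arange(5)
shuffled_copy = np.random.permutation(seq)  # returns a new array
np.random.shuffle(seq)                      # reorders seq in place
print(ints, unif, shuffled_copy, seq)
```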

Example: Random Walks

Run the following in IPython.

nsteps = 1000
draws = np.random.randint(0, 2, size=nsteps)
steps = np.where(draws > 0, 1, -1)
walk = steps.cumsum()

Simulating Many Random Walks at Once

nwalks = 100
nsteps = 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps))
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(1)
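Once all walks are in one array, statistics over them can be computed without loops. The sketch below is one possible follow-up: the seed and the distance threshold of 30 are assumptions made here for illustration.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, only for reproducibility
nwalks, nsteps = 100, 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps))
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(1)

# Extremes over all walks.
print(walks.max(), walks.min())

# Which walks ever reach a distance of 30 from the origin?
hits30 = (np.abs(walks) >= 30).any(1)

# For those walks, the first step index at which that happens
# (argmax returns the position of the first True in each row).
crossing_times = (np.abs(walks[hits30]) >= 30).argmax(1)
print(hits30.sum(), crossing_times.mean())
```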

Expansion

The values may not look like very high-quality random numbers, but the quality should be quite high because these np.random functions actually use the Mersenne Twister.
