# Python for Data Analysis Chapter 4

NumPy Basics: Arrays and Vectorized Computation

ndarray: the N-dimensional array object provided by NumPy.

Creating ndarrays

import numpy as np

#Create from a list
data1 = [6, 7.5, 8, 9]
arr1 = np.array(data1)

#Nested lists create a multidimensional array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)

#Array version of Python's range function
np.arange(10)

#Zero vector
np.zeros(10)

#Zero matrix
np.zeros((3, 6))

#Generate without initialization
np.empty((2, 3, 2))

#Check the number of dimensions
arr2.ndim

#Check the array shape
arr2.shape

#Check the data type
arr1.dtype

#Generate by specifying the data type
arr1 = np.array([1, 2, 3], dtype=np.float64)

#Create from a list of numeric strings
data3 = ["1.1", "2.2", "3.3"]
arr3 = np.array(data3, dtype=np.float64)


Operations between Arrays and Scalars

#Arithmetic between arrays of the same shape is applied element by element
arr = np.array([[1, 2, 3], [4, 5, 6]])
"""
In : arr
Out:
array([[1, 2, 3],
       [4, 5, 6]])
"""
arr * arr
"""
In : arr * arr
Out:
array([[ 1,  4,  9],
       [16, 25, 36]])
"""

#Arithmetic with a scalar is applied to every element
arr - 1
"""
In : arr - 1
Out:
array([[0, 1, 2],
       [3, 4, 5]])
"""
#Note: the output below is Python 2 integer division;
#Python 3 returns floats for 1 / arr, and 1 // arr reproduces this result
1 / arr
"""
In : 1 / arr
Out:
array([[1, 0, 0],
       [0, 0, 0]])
"""


Basic Indexing and Slicing / Fancy Indexing

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0,0 | 0,1 | 0,2 |
| 1 | 1,0 | 1,1 | 1,2 |
| 2 | 2,0 | 2,1 | 2,2 |

Elements are specified the same way as in a mathematical matrix: (row, col).

A slice is a view of the original array, so if you don't copy it, the slice changes when the original array changes. If you want an independent copy of an array slice, copy it explicitly:

arr[5:8].copy()
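The view-versus-copy behaviour can be checked with a short sketch. The array contents and the names `view` and `snapshot` are illustrative assumptions, not part of the original notes.

```python
import numpy as np

arr = np.arange(10)

# A basic slice is a view: it shares memory with arr
view = arr[5:8]
# .copy() makes an independent array
snapshot = arr[5:8].copy()

# Modify the original array in place
arr[5:8] = 12

print(view)      # the view follows the change
print(snapshot)  # the copy keeps the old values
```

`np.shares_memory(arr, view)` is one way to confirm whether two arrays are backed by the same buffer.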

Boolean Indexing

Arrays can be masked using a bool array.

name = np.array(["bob", "martin" ,"feed","max","rosetta","john"])
"""
In : name == "bob"
Out: array([ True, False, False, False, False, False], dtype=bool)
"""
arr = np.arange(6)
"""
In : arr[name=="rosetta"]
Out: array([4])
"""


Boolean operators: & (and), | (or)


mask = (name=="rosetta") | (name=="martin")
"""
In : mask
Out: array([False,  True, False, False,  True, False], dtype=bool)
"""

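A small sketch of combining masks, reusing the `name` array from above. The parentheses matter because `&` and `|` bind more tightly than `==`, and the Python keywords `and` / `or` raise an error on multi-element arrays. The variables `selected`, `others`, and `both` are illustrative names.

```python
import numpy as np

name = np.array(["bob", "martin", "feed", "max", "rosetta", "john"])
arr = np.arange(6)

# | keeps positions where either condition holds
mask = (name == "rosetta") | (name == "martin")
selected = arr[mask]

# ~ negates a mask, selecting everything else
others = arr[~mask]

# & keeps only positions where both conditions hold
both = arr[(name != "bob") & (arr < 3)]

print(selected, others, both)
```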

Selection by comparison operator

data = np.random.randn(10)
"""
In : data
Out:
array([-0.43930899, -0.18084457,  0.50384496,  0.34177923,  0.4786331 ,
        0.0930973 ,  0.95264648,  1.29876589,  0.96616151,  0.69204729])
"""
data[data < 0] = 0
"""
In : data
Out:
array([ 0.        ,  0.        ,  0.50384496,  0.34177923,  0.4786331 ,
        0.0930973 ,  0.95264648,  1.29876589,  0.96616151,  0.69204729])
"""


Transposing Arrays and Swapping Axes

This part is difficult! I think it's easier to just take what you want with a fancy slice.

arr = np.arange(15).reshape((3,5))

#Transpose
arr.T

#Matrix product (inner product)
np.dot(arr.T, arr)

arr = np.arange(45).reshape((3,5,3))

#Permute the axes by specifying their new order
arr.transpose((1, 0, 2))

#Swap two axes
arr.swapaxes(1, 2)
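Tracking the shapes makes the axis operations above easier to follow. The sketch below reuses the same (3, 5, 3) array; the names `t` and `s` and the sample indices are assumptions chosen for illustration.

```python
import numpy as np

arr = np.arange(45).reshape((3, 5, 3))

# transpose((1, 0, 2)): new axis 0 is old axis 1, new axis 1 is old axis 0
t = arr.transpose((1, 0, 2))
# swapaxes(1, 2) exchanges only the two named axes
s = arr.swapaxes(1, 2)

print(arr.shape, t.shape, s.shape)

# The same element is reachable through the permuted indices
assert t[4, 2, 1] == arr[2, 4, 1]
assert s[2, 1, 4] == arr[2, 4, 1]
```

Both calls return views of the original data rather than copies, so they are cheap even for large arrays.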


Universal Functions: Fast Element-wise Array Functions

### 1-argument functions

Functions that operate element by element. np.func(x) applies the function to each element of x.

| Function | Description |
|---|---|
| abs | Absolute value |
| sqrt | Square root (x ** 0.5) |
| square | x ** 2 |
| exp | e ** x |
| log, log10, log2 | Logarithm with base e, 10, 2 |
| log1p | log(1 + x), accurate even when x is very small |
| sign | Returns the sign (1, 0, -1) |
| ceil | Round up to the nearest integer |
| floor | Round down to the nearest integer |
| rint | Round to the nearest integer |
| modf | Split into fractional part and integer part |
| isnan, isinf, isfinite | Returns a bool array indicating whether each element is NaN, infinite, or finite |
| logical_not | Returns the bool value of not x |

### 2-argument functions

Called as np.func(x1, x2).

| Function | Description |
|---|---|
| add, subtract, multiply, divide, power, mod | x1 (+, -, *, /, **, %) x2 |
| maximum, minimum | The larger / smaller of the elements at the same position in x1 and x2 |
| copysign | Magnitude of x1 with the sign of x2 |
| greater, greater_equal, less, less_equal, equal, not_equal | x1 (>, >=, <, <=, ==, !=) x2 |
| logical_and, logical_or, logical_xor | x1 (&, \|, ^) x2 |
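A few of the functions listed above, applied to small sample arrays. The input values `x` and `y` are assumptions picked so the results are easy to check by hand.

```python
import numpy as np

x = np.array([-1.5, 0.0, 2.25, 4.0])
y = np.array([1.0, -2.0, 3.0, -4.0])

# 1-argument ufuncs act on every element of one array
print(np.abs(x))
print(np.sign(x))
# modf returns two arrays: fractional parts and integer parts
frac, whole = np.modf(x)
print(frac, whole)

# 2-argument ufuncs combine two arrays element by element
print(np.maximum(x, y))
print(np.copysign(x, y))
print(np.greater(x, y))
```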

Data Processing Using Arrays

Visualize 2D data. As an example, display sqrt(x^2 + y^2) evaluated on a grid.

import matplotlib.pyplot as plt

#Create 1000 points
points = np.arange(-5, 5, 0.01)
#Create a 2D mesh
#xs repeats the x values along each row; ys repeats the y values down each column
xs, ys = np.meshgrid(points, points)
#Calculation
z = np.sqrt(xs ** 2 + ys ** 2)
#Display
plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()
plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")

Expressing Conditional Logic as Array Operations

np.where returns, element by element, either its second or its third argument depending on the value of the first. That is, np.where(cond, xarr, yarr) is equivalent to [(x if c else y) for x, y, c in zip(xarr, yarr, cond)].
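The equivalence between np.where and the list comprehension can be checked directly. This is a minimal sketch; the sample values in `xarr`, `yarr`, and `cond` are assumptions made up for the check.

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Vectorized selection
vectorized = np.where(cond, xarr, yarr)

# Pure-Python equivalent, one element at a time
looped = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]

print(vectorized)
print(looped)
```

The vectorized form avoids the Python-level loop, which is the main reason to prefer it on large arrays.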

arr = np.random.randn(5, 5)
"""
In : arr
Out:
array([[-0.63774199, -0.76558645, -0.46003378,  0.61095653,  0.78277454],
       [ 0.25332127,  0.50226145, -1.45706102,  1.14315867,  0.28015   ],
       [-0.76326506,  0.33218657, -0.18509161, -0.3410194 , -0.29194451],
       [-0.32247669, -0.64285987, -0.61059921, -0.38261289,  0.41530912],
       [-1.7341384 ,  1.39960857,  0.78411537,  0.25922757, -0.22972615]])
"""
arrtf = np.where(arr > 0, True, False)
"""
In : arrtf
Out:
array([[False, False, False,  True,  True],
       [ True,  True, False,  True,  True],
       [False,  True, False, False, False],
       [False, False, False, False,  True],
       [False,  True,  True,  True, False]], dtype=bool)
"""


By combining these, it is possible to classify by multiple conditions.

cond1 = np.where(np.random.randn(10) > 0, True, False)
cond2 = np.where(np.random.randn(10) > 0, True, False)
"""
In : cond1
Out: array([False,  True, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)

In : cond2
Out: array([False, False, False, False, False,  True, False,  True,  True,  True], dtype=bool)
"""
result = np.where(cond1 & cond2, 0, np.where(cond1, 1, np.where(cond2, 2, 3)))
"""
In : result
Out: array([3, 1, 3, 3, 1, 0, 1, 0, 0, 0])
"""


The same logic can also be written with if and else.

n = len(cond1)
result = []
for i in range(n):
    if cond1[i] and cond2[i]:
        result.append(0)
    elif cond1[i]:
        result.append(1)
    elif cond2[i]:
        result.append(2)
    else:
        result.append(3)


The same classification can also be computed arithmetically. (Note that 0 and 3 are swapped compared with the versions above: both True gives 3, and both False gives 0.)

result = 1*cond1 + 2*cond2
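To see the swap of 0 and 3 concretely, the sketch below runs both formulations on the four possible combinations of two conditions. The inputs are assumptions chosen to cover every case once.

```python
import numpy as np

cond1 = np.array([False, True, False, True])
cond2 = np.array([False, False, True, True])

# Nested np.where: both -> 0, cond1 only -> 1, cond2 only -> 2, neither -> 3
nested = np.where(cond1 & cond2, 0, np.where(cond1, 1, np.where(cond2, 2, 3)))

# Arithmetic: True counts as 1, so both -> 3 and neither -> 0
arithmetic = 1 * cond1 + 2 * cond2

print(nested)
print(arithmetic)
```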

Mathematical and Statistical Methods

Statistical functions are also available.

arr = np.random.randn(5, 4)
arr.mean()
#Axis can also be specified
arr.mean(0)
arr.mean(1)
"""
In : arr.mean()
Out: 0.51585861805229682

In : arr.mean(0)
Out: array([ 0.65067115, -0.03856606,  1.06405353,  0.38727585])

In : arr.mean(1)
Out: array([ 1.18400902,  0.84203136,  0.50352006,  0.07445734, -0.0247247 ])
"""

• sum
• mean
• std, var
• min, max
• argmin, argmax (return the index of the minimum / maximum)
• cumsum (cumulative sum)
• cumprod (cumulative product)
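A short sketch of the methods listed above on a small integer array, so the results can be verified by hand. The array values are an assumption for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.sum())            # sum of all elements
print(arr.mean(axis=0))     # one mean per column
print(arr.mean(axis=1))     # one mean per row
print(arr.argmax())         # index into the flattened array
print(arr.cumsum())         # running sum over the flattened array
print(arr.cumprod(axis=1))  # running product along each row
```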

Methods for Boolean Arrays

Since the Boolean value True is counted as 1 and False as 0, the sum function is often used for counting.

arr = np.random.randn(100)
sumnum = (arr > 0).sum()
"""
In : sumnum
Out: 43
"""


Other Boolean functions

• any (True if there is even one True)
• all (True if all are True)

Sorting

You can also sort.

arr.sort()
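One detail worth checking is that the `sort` method works in place, while the function `np.sort` returns a sorted copy. A minimal sketch, with illustrative variable names:

```python
import numpy as np

arr = np.array([3, 1, 2])

# np.sort returns a sorted copy and leaves arr untouched
sorted_copy = np.sort(arr)
untouched = arr.copy()

# The method sorts in place and returns None
returned = arr.sort()

print(sorted_copy, untouched, arr, returned)
```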

Unique and Other Set Logic

NumPy also provides set operations for 1-D arrays, similar to Python's built-in set type.

• unique(x)
• intersect1d(x, y) (unique(x) & unique(y))
• union1d(x, y) (unique(x) | unique(y))
• in1d(x, y) (returns a Boolean array indicating whether each element of x is contained in y)
• setdiff1d(x, y) (values of x not in y)
• setxor1d(x, y) (values in exactly one of x and y)
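The set functions above can be tried on two small arrays. The values of `x` and `y` are assumptions for illustration, and `np.isin` is used here in place of `in1d`, which recent NumPy releases mark as deprecated; for 1-D inputs the two give the same result.

```python
import numpy as np

x = np.array([3, 1, 2, 3, 5])
y = np.array([2, 3, 4])

print(np.unique(x))          # sorted unique values
print(np.intersect1d(x, y))  # values in both
print(np.union1d(x, y))      # values in either
print(np.isin(x, y))         # per-element membership of x in y
print(np.setdiff1d(x, y))    # values of x not in y
print(np.setxor1d(x, y))     # values in exactly one of the two
```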

File Input and Output with Arrays

You can save a NumPy array object to an external file. Of course, you can also load and restore saved files.

arr = np.arange(10)

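#Save in binary format (the .npy extension is appended automatically)
np.save("array_name", arr)
#Load binary format file
arr = np.load("array_name.npy")
#Save multiple arrays as zip
np.savez("array_archive.npz", a=arr, b=arr)
#Load multiple array zip (arrays are accessed by keyword, e.g. arch["a"])
arch = np.load("array_archive.npz")

#Save in csv format
np.savetxt("array_ex.txt", arr, delimiter=",")
#Read csv format file
arr = np.loadtxt("array_ex.txt", delimiter=",")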
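A round-trip sketch of the three formats. Writing into a temporary directory is an assumption made here so the example leaves no files behind; the file names follow the ones used above.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmpdir:
    # Single array in binary .npy format
    npy_path = os.path.join(tmpdir, "array_name.npy")
    np.save(npy_path, arr)
    loaded = np.load(npy_path)

    # Several arrays in one .npz archive, retrieved by keyword
    npz_path = os.path.join(tmpdir, "array_archive.npz")
    np.savez(npz_path, a=arr, b=arr * 2)
    with np.load(npz_path) as archive:
        a, b = archive["a"], archive["b"]

    # Text format: values come back as floats
    txt_path = os.path.join(tmpdir, "array_ex.txt")
    np.savetxt(txt_path, arr, delimiter=",")
    from_text = np.loadtxt(txt_path, delimiter=",")

print(loaded.dtype, from_text.dtype)
```

Note that the text round trip does not preserve the integer dtype, whereas the binary formats do.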


Linear Algebra

Linear algebra operations are also available.

| Function | Description |
|---|---|
| diag | Extract the diagonal elements |
| dot | Matrix product |
| trace | Sum of the diagonal elements |
| det | Determinant |
| eig | Decompose into eigenvalues and eigenvectors |
| inv | Inverse matrix |
| pinv | Moore-Penrose pseudo-inverse |
| qr | QR decomposition |
| svd | Singular value decomposition (SVD) |
| solve | Solve Ax = b for x, where A is a square matrix |
| lstsq | Compute the least-squares solution |
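A small worked example of a few of these functions on a 2x2 system. The matrix `A` and vector `b` are assumptions chosen so the answer is easy to verify by hand (x = 0.8, y = 1.4).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Solve Ax = b directly
x = np.linalg.solve(A, b)

# inv gives the inverse matrix; A @ A_inv is the identity up to rounding
A_inv = np.linalg.inv(A)

print(x)
print(np.linalg.det(A))
print(np.trace(A), np.diag(A))
```

In this sketch `diag` and `trace` are called from the top-level `numpy` namespace, while `solve`, `inv`, and `det` come from `numpy.linalg`.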

Random Number Generation

Random values from various distributions can be generated at high speed.

| Function | Description |
|---|---|
| seed | Seed the random number generator |
| permutation | Return a randomly reordered copy of a sequence |
| shuffle | Randomly reorder a sequence in place |
| rand | Generate a uniform random array whose shape is given by the arguments |
| randint | Generate random integers from a given range |
| binomial | Random sampling from a binomial distribution |
| normal | Random sampling from a normal distribution |
| beta | Random sampling from a beta distribution |
| chisquare | Random sampling from a chi-square distribution |
| gamma | Random sampling from a gamma distribution |
| uniform | Random sampling from a uniform distribution over a given range |
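A brief sketch of seeding and of the difference between `permutation` and `shuffle`. The seed value 12345 and the variable names are arbitrary assumptions.

```python
import numpy as np

# Seeding makes the sequence repeatable
np.random.seed(12345)
first = np.random.normal(size=3)
np.random.seed(12345)
second = np.random.normal(size=3)

seq = np.arange(5)
# permutation returns a shuffled copy
shuffled_copy = np.random.permutation(seq)
# shuffle reorders its argument in place and returns None
np.random.shuffle(seq)

# randint draws integers from the half-open range [low, high)
ints = np.random.randint(0, 2, size=10)

print(first, second)
print(shuffled_copy, seq, ints)
```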

Example: Random Walks

Run the following in IPython.

nsteps = 1000
draws = np.random.randint(0, 2, size=nsteps)
steps = np.where(draws > 0, 1, -1)
walk = steps.cumsum()
plt.plot(walk)

Simulating Many Random Walks at Once

nwalks = 100
nsteps = 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps))
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(1)
#Transpose so that each walk is drawn as one line
plt.plot(walks.T)

Expansion

The values may not look like very high-quality random numbers, but the quality should be quite high because the generator actually used is the Mersenne Twister.
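The many-walks simulation can also be summarized numerically rather than by eye. The sketch below repeats it with a fixed seed (an arbitrary choice, only for repeatability) and computes the furthest distance each walk reaches; `max_distance` is an illustrative name.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, chosen only for repeatability

nwalks, nsteps = 100, 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps))
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(axis=1)

# Furthest distance from the origin reached by each walk
max_distance = np.abs(walks).max(axis=1)

# After k steps the position has the same parity as k
assert ((walks[:, -1] - nsteps) % 2 == 0).all()

print(walks.shape, max_distance.mean())
```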