[Python] Various data processing using Numpy arrays

Preparation

Create an array with 11 elements from 0 to 10

import numpy as np

arr = np.arange(11)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

The calculations below use this array

Find the square root of each element of the array
np.sqrt(arr)

array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ,
        3.16227766])
Raise e to the power of each element

np.exp returns e, the base of the natural logarithm, raised to the power of each element

np.exp(arr)

array([  1.00000000e+00,   2.71828183e+00,   7.38905610e+00,
         2.00855369e+01,   5.45981500e+01,   1.48413159e+02,
         4.03428793e+02,   1.09663316e+03,   2.98095799e+03,
         8.10308393e+03,   2.20264658e+04])
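
Both np.sqrt and np.exp work element by element. As a rough check (a sketch that is not part of the original post), the results can be compared with Python's scalar math functions applied in a loop:

```python
import math
import numpy as np

arr = np.arange(11)

# Apply the scalar math functions one element at a time
sqrt_loop = [math.sqrt(x) for x in arr]
exp_loop = [math.exp(x) for x in arr]

# The NumPy versions process the whole array in a single call
sqrt_matches = np.allclose(np.sqrt(arr), sqrt_loop)
exp_matches = np.allclose(np.exp(arr), exp_loop)
print(sqrt_matches, exp_matches)
```

Both comparisons should report True, since the NumPy functions compute the same values without an explicit Python loop.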

Generate random numbers that follow a normal distribution

np.random.randn(10) returns 10 random values drawn from a normal distribution with mean 0 and variance 1

A = np.random.randn(10)
A

array([ 1.58618601,  1.04344798, -1.27389788,  0.96399318, -0.01948978,
        1.74715498, -1.74566889,  0.22554911, -0.89309691,  0.64486646])
Add two arrays of normally distributed values
B = np.random.randn(10)
B

array([ 0.24123105, -1.68669802,  1.89703691,  0.13287126, -1.77419931,
       -1.1523576 , -0.23598222,  0.03143082,  1.86305367,  0.21272997])
np.add(A,B)

array([ 1.82741706, -0.64325005,  0.62313903,  1.09686444, -1.79368909,
        0.59479738, -1.98165111,  0.25697993,  0.96995677,  0.85759643])
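
For reference, np.add(A, B) gives the same result as the + operator on two arrays. The sketch below assumes NumPy 1.17 or later for np.random.default_rng, and the seed value 0 is an arbitrary choice made only so the run is repeatable:

```python
import numpy as np

# Seeded generator (assumption: seed 0, chosen only for reproducibility)
rng = np.random.default_rng(0)
A = rng.standard_normal(10)
B = rng.standard_normal(10)

total = np.add(A, B)

# The + operator performs the same element-wise addition
print(np.array_equal(total, A + B))
```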
Return the larger of each pair of elements

np.maximum compares A and B element by element and returns the larger value at each position

np.maximum(A,B)

array([ 1.58618601,  1.04344798,  1.89703691,  0.96399318, -0.01948978,
        1.74715498, -0.23598222,  0.22554911,  1.86305367,  0.64486646])
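
np.maximum is easy to confuse with np.max. A small sketch with made-up values shows the difference: np.maximum compares two arrays position by position, while np.max reduces a single array to its largest element.

```python
import numpy as np

# Illustrative values, not taken from the random arrays above
A = np.array([1.5, -0.3, 2.0])
B = np.array([0.2, 0.7, 2.5])

pairwise = np.maximum(A, B)   # element-wise: array([1.5, 0.7, 2.5])
largest_in_A = np.max(A)      # reduction over one array: 2.0

print(pairwise)
print(largest_in_A)
```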

Data processing using Numpy arrays

Preparing to draw the graph

import matplotlib.pyplot as plt
%matplotlib inline

Preparation

Create an array from -5 up to (but not including) 5 in 0.01 increments, then build a coordinate grid from it

points = np.arange(-5,5,0.01)
dx, dy = np.meshgrid(points,points)

In dx, each row increases from -5 to 4.99 in 0.01 increments

dx

array([[-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       ..., 
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99]])

dy is the transpose of dx: each column increases from -5 to 4.99

dy

array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
       [-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
       [-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
       ..., 
       [ 4.97,  4.97,  4.97, ...,  4.97,  4.97,  4.97],
       [ 4.98,  4.98,  4.98, ...,  4.98,  4.98,  4.98],
       [ 4.99,  4.99,  4.99, ...,  4.99,  4.99,  4.99]])
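
The structure of the two grids is easier to see on a tiny input. The following sketch uses a three-element array (the names gx and gy are chosen here only for illustration):

```python
import numpy as np

points = np.arange(3)                 # array([0, 1, 2])
gx, gy = np.meshgrid(points, points)

print(gx)
# [[0 1 2]
#  [0 1 2]
#  [0 1 2]]
print(gy)
# [[0 0 0]
#  [1 1 1]
#  [2 2 2]]

# For a square grid built from the same points, gy is the transpose of gx
print(np.array_equal(gy, gx.T))       # True
```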

Try drawing these

plt.imshow(dx)

You can see that the values increase from left to right

[Figure: output of plt.imshow(dx) (Screenshot 2016-10-23 10.52.06.png)]

plt.imshow(dy)

You can see that the values increase from top to bottom

[Figure: output of plt.imshow(dy) (Screenshot 2016-10-23 10.56.33.png)]

Try a more complicated calculation

Apply the trigonometric function sin to dx and to dy, then add the two results element by element

z = (np.sin(dx) + np.sin(dy))
z

array([[  1.91784855e+00,   1.92063718e+00,   1.92332964e+00, ...,
         -8.07710558e-03,  -5.48108704e-03,  -2.78862876e-03],
       [  1.92063718e+00,   1.92342581e+00,   1.92611827e+00, ...,
         -5.28847682e-03,  -2.69245827e-03,  -5.85087534e-14],
       [  1.92332964e+00,   1.92611827e+00,   1.92881072e+00, ...,
         -2.59601854e-03,  -5.63993297e-14,   2.69245827e-03],
       ..., 
       [ -8.07710558e-03,  -5.28847682e-03,  -2.59601854e-03, ...,
         -1.93400276e+00,  -1.93140674e+00,  -1.92871428e+00],
       [ -5.48108704e-03,  -2.69245827e-03,  -5.63993297e-14, ...,
         -1.93140674e+00,  -1.92881072e+00,  -1.92611827e+00],
       [ -2.78862876e-03,  -5.85087534e-14,   2.69245827e-03, ...,
         -1.92871428e+00,  -1.92611827e+00,  -1.92342581e+00]])
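
As an aside, the same grid of values can be computed without meshgrid by relying on NumPy broadcasting. This is an alternative sketch, not the approach used in this post; the names s and z_broadcast are introduced here for illustration:

```python
import numpy as np

points = np.arange(-5, 5, 0.01)
dx, dy = np.meshgrid(points, points)
z = np.sin(dx) + np.sin(dy)

# Broadcasting a row vector against a column vector builds the same 2D grid
s = np.sin(points)
z_broadcast = s[np.newaxis, :] + s[:, np.newaxis]

print(np.allclose(z, z_broadcast))
```

The broadcast version calls np.sin on the 1D array only, so it avoids building the two full 2D coordinate arrays first.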

Plot the result

plt.imshow(z)
plt.colorbar()

[Figure: output of plt.imshow(z) with colorbar (Screenshot 2016-10-23 11.02.04.png)]

Various data processing using arrays

Prepare sample arrays

A = np.array([1,2,3,4])
B = np.array([1000,2000,3000,4000])

Select values according to a condition: take the element from A where the condition is True, and from B where it is False

condition = np.array([True, True, False, False])
answer = [(a if cond else b) for a,b,cond in zip(A,B,condition)]
answer

[1,2,3000,4000]

A list comprehension is slow for large arrays and does not handle multidimensional arrays. np.where is faster and also works on multidimensional arrays.

answer2 = np.where(condition, A, B)
answer2

array([   1,    2, 3000, 4000])
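
To get a feel for the speed difference, the two approaches can be timed on a larger input. This is only a sketch: the array size, the seed, and the helper names with_comprehension and with_where are assumptions made for illustration, and the timings will vary by machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed for repeatability
n = 100_000                      # arbitrary size, large enough to show a gap
A = rng.standard_normal(n)
B = rng.standard_normal(n)
condition = A > B

def with_comprehension():
    # Pure-Python loop over every element
    return [(a if cond else b) for a, b, cond in zip(A, B, condition)]

def with_where():
    # Vectorized selection performed inside NumPy
    return np.where(condition, A, B)

# Both approaches select the same values
print(np.array_equal(with_comprehension(), with_where()))

# Total seconds for 5 runs each; no specific numbers are claimed here
print(timeit.timeit(with_comprehension, number=5))
print(timeit.timeit(with_where, number=5))
```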

np.where can also be used with 2D arrays. Create a 5x5 array of random values drawn from the standard normal distribution.

from numpy.random import randn
arr = randn(5,5)
arr

array([[-1.00937032,  1.23348883,  0.1267633 ,  0.6637059 ,  0.96770594],
       [ 0.29606946, -0.63752513,  0.97016509,  0.42688117, -2.38404912],
       [ 1.0549739 , -0.12309795, -0.22361239,  1.91466958, -0.35711905],
       [ 0.22359192, -1.60330203,  1.23216518, -0.99154743,  0.52558739],
       [-1.11301393,  0.1911824 ,  1.14858049, -0.19331843,  0.42102773]])

Replace each value less than 0 with 0 and keep the other values unchanged

np.where(arr < 0, 0, arr)

array([[ 0.        ,  1.23348883,  0.1267633 ,  0.6637059 ,  0.96770594],
       [ 0.29606946,  0.        ,  0.97016509,  0.42688117,  0.        ],
       [ 1.0549739 ,  0.        ,  0.        ,  1.91466958,  0.        ],
       [ 0.22359192,  0.        ,  1.23216518,  0.        ,  0.52558739],
       [ 0.        ,  0.1911824 ,  1.14858049,  0.        ,  0.42102773]])

All negative numbers will be 0. It's very simple to write.
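
For this particular case (replacing negatives with 0), NumPy offers a couple of equivalent spellings. The sketch below uses a small hand-written array rather than the random one above:

```python
import numpy as np

# Illustrative values
arr = np.array([[-1.0, 1.2], [0.3, -0.6]])

by_where = np.where(arr < 0, 0, arr)
by_maximum = np.maximum(arr, 0)    # larger of each element and 0
by_clip = np.clip(arr, 0, None)    # lower bound 0, no upper bound

print(by_where)
# All three give the same array
print(np.array_equal(by_where, by_maximum), np.array_equal(by_where, by_clip))
```

np.where remains the more general tool, since the replacement values can come from any array or condition.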

Take a look at other numpy functions

Preparation
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
sum function

Compute the sum of all elements

arr.sum()       
45

The sum function accepts an axis argument that specifies the axis along which to sum. With 0, the values are added down the rows, giving one total per column; with 1, the values are added across the columns, giving one total per row.

arr.sum(0)
array([12, 15, 18])
arr.sum(1)
array([ 6, 15, 24])
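
Passing the axis by keyword makes the intent easier to read. The sketch below also shows keepdims, an optional argument that keeps the summed axis as a dimension of length 1:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_sums = arr.sum(axis=0)                      # array([12, 15, 18])
row_sums = arr.sum(axis=1)                      # array([ 6, 15, 24])
row_sums_2d = arr.sum(axis=1, keepdims=True)    # shape (3, 1)

print(col_sums, row_sums, row_sums_2d.shape)
```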
mean function

Average value

arr.mean()
5.0
std function

Standard deviation

arr.std()
2.5819888974716112
var function

Variance

arr.var()
6.666666666666667
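
Note that std and var use the population formula (dividing by N) by default, and the variance is the square of the standard deviation. A short sketch of that relationship, plus the ddof argument for the sample variance:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# The variance equals the standard deviation squared
print(np.isclose(arr.std() ** 2, arr.var()))   # True

# ddof=1 divides by N - 1 instead of N (sample variance)
sample_var = arr.var(ddof=1)
print(sample_var)                              # 7.5
```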

Useful NumPy functions: any and all

Preparation
bool_arr = np.array([True, False, True])
bool_arr

array([ True, False,  True], dtype=bool)
any function

Returns True if at least one element is True

bool_arr.any()
True
all function

Returns True only if every element is True

bool_arr.all()
False
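
In practice, any and all are often applied to the result of a comparison rather than to a hand-written boolean array. A minimal sketch with illustrative values:

```python
import numpy as np

arr = np.array([[1, -2, 3], [4, 5, 6]])

has_negative = (arr < 0).any()               # True: -2 is negative
all_positive = (arr > 0).all()               # False for the same reason
rows_all_positive = (arr > 0).all(axis=1)    # array([False,  True])

print(has_negative, all_positive, rows_all_positive)
```

Like sum, both functions accept an axis argument, so the check can be done per row or per column.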
