[Python] Various data processing using Numpy arrays


Create an array with 11 elements from 0 to 10

import numpy as np
arr = np.arange(11)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

We will use this array in the calculations below

Find the square root of each element

np.sqrt(arr)

array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ,
        3.16227766])

Raise e, the base of the natural logarithm, to the power of each element

np.exp(arr)


array([  1.00000000e+00,   2.71828183e+00,   7.38905610e+00,
         2.00855369e+01,   5.45981500e+01,   1.48413159e+02,
         4.03428793e+02,   1.09663316e+03,   2.98095799e+03,
         8.10308393e+03,   2.20264658e+04])
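
Both calculations above apply one function to every element of the array at once. Here is a minimal sketch of that idea; the names roots and powers are only illustrative and are not part of the original post:

```python
import numpy as np

arr = np.arange(11)          # 0, 1, ..., 10

roots = np.sqrt(arr)         # element-wise square root
powers = np.exp(arr)         # element-wise e ** x

print(roots[4])              # square root of 4
print(powers[0])             # e ** 0
```

No loop is needed: the function is applied to the whole array and returns a new array of the same shape.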

Generate random numbers that follow a normal distribution

Draw 10 random numbers from the standard normal distribution (mean 0, variance 1)

A = np.random.randn(10)

array([ 1.58618601,  1.04344798, -1.27389788,  0.96399318, -0.01948978,
        1.74715498, -1.74566889,  0.22554911, -0.89309691,  0.64486646])

Add the two arrays of normally distributed values element by element

B = np.random.randn(10)

array([ 0.24123105, -1.68669802,  1.89703691,  0.13287126, -1.77419931,
       -1.1523576 , -0.23598222,  0.03143082,  1.86305367,  0.21272997])

np.add(A, B)

array([ 1.82741706, -0.64325005,  0.62313903,  1.09686444, -1.79368909,
        0.59479738, -1.98165111,  0.25697993,  0.96995677,  0.85759643])

Return the larger of each pair of elements of A and B

np.maximum(A, B)

array([ 1.58618601,  1.04344798,  1.89703691,  0.96399318, -0.01948978,
        1.74715498, -0.23598222,  0.22554911,  1.86305367,  0.64486646])
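
Because random values differ on every run, the two-array operations are easier to check with fixed inputs. The sketch below assumes small hand-written arrays instead of np.random.randn; the names total and larger are only illustrative:

```python
import numpy as np

A = np.array([1.5, -0.3, 0.8])
B = np.array([0.2, 1.1, -2.0])

total = np.add(A, B)         # same result as A + B
larger = np.maximum(A, B)    # element-wise maximum of the two arrays

print(total)
print(larger)                # [1.5 1.1 0.8]
```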

Data processing using Numpy arrays

Preparing to draw the graph

import matplotlib.pyplot as plt
%matplotlib inline


Create an array from -5 up to (but not including) 5 in steps of 0.01

points = np.arange(-5,5,0.01)
dx, dy = np.meshgrid(points,points)

dx grows from -5 to 4.99 in steps of 0.01 from left to right; every row is identical

dx

array([[-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       ...,
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99]])

dy is the transpose of dx: it grows from top to bottom, and every column is identical

dy

array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
       [-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
       [-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
       ...,
       [ 4.97,  4.97,  4.97, ...,  4.97,  4.97,  4.97],
       [ 4.98,  4.98,  4.98, ...,  4.98,  4.98,  4.98],
       [ 4.99,  4.99,  4.99, ...,  4.99,  4.99,  4.99]])
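
With a million elements the structure of the two grids is hard to see, so here is a small sketch of the same np.meshgrid call on a 3-element input (the tiny range is an assumption made purely for illustration):

```python
import numpy as np

points = np.arange(-1, 2)              # array([-1,  0,  1])
dx, dy = np.meshgrid(points, points)

print(dx)   # every row is a copy of points
print(dy)   # every column is a copy of points
```

For a square grid like this one, dy is simply dx with rows and columns swapped.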

Try drawing these


In the plot of dx, you can see that the values grow from left to right

[Screenshot 2016-10-23 10.52.06.png]


In the plot of dy, you can see that the values grow from top to bottom

[Screenshot 2016-10-23 10.56.33.png]

Try a more complicated calculation

Add the sine of dx and the sine of dy element by element

z = (np.sin(dx) + np.sin(dy))

array([[  1.91784855e+00,   1.92063718e+00,   1.92332964e+00, ...,
         -8.07710558e-03,  -5.48108704e-03,  -2.78862876e-03],
       [  1.92063718e+00,   1.92342581e+00,   1.92611827e+00, ...,
         -5.28847682e-03,  -2.69245827e-03,  -5.85087534e-14],
       [  1.92332964e+00,   1.92611827e+00,   1.92881072e+00, ...,
         -2.59601854e-03,  -5.63993297e-14,   2.69245827e-03],
       ...,
       [ -8.07710558e-03,  -5.28847682e-03,  -2.59601854e-03, ...,
         -1.93400276e+00,  -1.93140674e+00,  -1.92871428e+00],
       [ -5.48108704e-03,  -2.69245827e-03,  -5.63993297e-14, ...,
         -1.93140674e+00,  -1.92881072e+00,  -1.92611827e+00],
       [ -2.78862876e-03,  -5.85087534e-14,   2.69245827e-03, ...,
         -1.92871428e+00,  -1.92611827e+00,  -1.92342581e+00]])

Try drawing z


[Screenshot 2016-10-23 11.02.04.png]
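
The plotting commands themselves are not shown in this post, so the following is only one possible way to draw the three arrays. It assumes plt.imshow-style image plots with a color bar; the Agg backend line is also an assumption so the sketch runs outside a notebook, and it can be dropped when %matplotlib inline is active:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt

points = np.arange(-5, 5, 0.01)
dx, dy = np.meshgrid(points, points)
z = np.sin(dx) + np.sin(dy)

# One panel per array, each drawn as an image with its own color scale
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, data, title in zip(axes, (dx, dy, z), ("dx", "dy", "z")):
    image = ax.imshow(data)
    ax.set_title(title)
    fig.colorbar(image, ax=ax)
```

In the dx panel the color changes from left to right, in the dy panel from top to bottom, and the z panel shows the repeating pattern produced by the two sine terms.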

Various data processing using arrays

Prepare some sample arrays

A = np.array([1,2,3,4])
B = np.array([1000,2000,3000,4000])

Take each value from A where the condition is True and from B otherwise

condition = np.array([True, True, False, False])
answer = [(a if cond else b) for a,b,cond in zip(A,B,condition)]


List comprehensions have two drawbacks: they are slow, and they do not handle multidimensional arrays. np.where is faster and also works on multidimensional arrays

answer2 = np.where(condition, A, B)

array([   1,    2, 3000, 4000])
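
To confirm that the two approaches really agree, the sketch below runs both on the same inputs and compares the results. The final comparison with np.array_equal is an addition for illustration only:

```python
import numpy as np

A = np.array([1, 2, 3, 4])
B = np.array([1000, 2000, 3000, 4000])
condition = np.array([True, True, False, False])

# Pure-Python version: one element at a time
answer = [a if cond else b for a, b, cond in zip(A, B, condition)]

# Vectorized version: the whole array at once
answer2 = np.where(condition, A, B)

print(answer2)                           # [   1    2 3000 4000]
print(np.array_equal(answer, answer2))   # True
```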

np.where can also be used for 2D arrays

Create a 5x5 array of random values drawn from the standard normal distribution

from numpy.random import randn
arr = randn(5,5)

array([[-1.00937032,  1.23348883,  0.1267633 ,  0.6637059 ,  0.96770594],
       [ 0.29606946, -0.63752513,  0.97016509,  0.42688117, -2.38404912],
       [ 1.0549739 , -0.12309795, -0.22361239,  1.91466958, -0.35711905],
       [ 0.22359192, -1.60330203,  1.23216518, -0.99154743,  0.52558739],
       [-1.11301393,  0.1911824 ,  1.14858049, -0.19331843,  0.42102773]])

Return 0 wherever a value is less than 0, and the original value otherwise

np.where(arr < 0, 0, arr)

array([[ 0.        ,  1.23348883,  0.1267633 ,  0.6637059 ,  0.96770594],
       [ 0.29606946,  0.        ,  0.97016509,  0.42688117,  0.        ],
       [ 1.0549739 ,  0.        ,  0.        ,  1.91466958,  0.        ],
       [ 0.22359192,  0.        ,  1.23216518,  0.        ,  0.52558739],
       [ 0.        ,  0.1911824 ,  1.14858049,  0.        ,  0.42102773]])

All negative numbers become 0. It is very simple to write.
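
The same replacement can be checked on a small fixed array. This is a sketch with made-up values; the comparison with np.maximum is an extra observation that holds for this particular case (replacing negatives with 0), not a general substitute for np.where:

```python
import numpy as np

arr = np.array([[-1.5, 2.0],
                [0.5, -0.25]])

clipped = np.where(arr < 0, 0, arr)   # 0 where negative, original value elsewhere
print(clipped)

# For this particular replacement, np.maximum(arr, 0) gives the same values
print(np.array_equal(clipped, np.maximum(arr, 0)))   # True
```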

Take a look at some other NumPy functions

arr = np.array([[1,2,3],[4,5,6],[7,8,9]])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

sum function

Calculate the total of all elements

arr.sum()

45

The sum function accepts an argument that specifies the axis along which to calculate. With 0, the values are added down the rows, giving one total per column; with 1, they are added across the columns, giving one total per row.

arr.sum(0)

array([12, 15, 18])

arr.sum(1)

array([ 6, 15, 24])

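
The axis argument is easy to mix up, so here is a short sketch that spells it out with the axis keyword on the same 3x3 array:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr.sum())          # 45
print(arr.sum(axis=0))    # [12 15 18]  one total per column
print(arr.sum(axis=1))    # [ 6 15 24]  one total per row
```

A handy way to remember it: the axis you pass is the one that disappears from the result.
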
mean function

Average value

arr.mean()

std function

Standard deviation

arr.std()

var function

Variance

arr.var()
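
As a rough sketch, the three statistics can be computed on the same 3x3 array as follows. Note that NumPy's std and var default to the population versions (ddof=0); the axis example on the last line is an addition for illustration:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr.mean())         # 5.0
print(arr.std())          # population standard deviation (ddof=0 by default)
print(arr.var())          # population variance, the square of std
print(arr.mean(axis=0))   # [4. 5. 6.]  mean of each column
```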



Useful NumPy functions: any and all

bool_arr = np.array([True, False, True])

array([ True, False,  True], dtype=bool)

any function

Returns True if at least one element is True

bool_arr.any()

True

all function

Returns True only if every element is True

bool_arr.all()

False
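
In practice, any and all are often applied to the result of a comparison rather than to a hand-written boolean array. A small sketch, where the values array is a made-up example:

```python
import numpy as np

bool_arr = np.array([True, False, True])

print(bool_arr.any())      # True
print(bool_arr.all())      # False

# any and all combine naturally with comparisons
values = np.array([3, -1, 7])
print((values > 0).all())  # False, because -1 is not positive
print((values > 0).any())  # True
```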

