Python application: data visualization part 1: basic

This time we move closer to data analysis proper. This post covers the basics of visualizing data for presentation to clients.

Graph

In Japanese, a graph is also called a statistical chart.

A graph presents information that is hard to read from numerical values alone as a chart or figure, so that it can be understood visually.

Line graph

A graph that plots data points on a plane and connects them with straight lines.

Use: Suitable for visualizing a quantity that changes with time or position (distance).
Example: Putting time on the horizontal axis (x-axis) and product sales volume on the vertical axis (y-axis) shows how sales volume changes over time.

Bar graph

A graph in which items are arranged along the horizontal axis and the value of each item is represented by the height of a vertical bar.

Use: A good way to compare the values of two or more items.
Example: Useful for visualizing the number of votes cast for each agenda item.

Histogram

Also called a frequency distribution chart.

A graph in which the data is divided into classes, the frequency of each class (the number of data points that fall into that class) is counted, and that frequency is expressed as the height of a bar.

Use: The most suitable method for visualizing the distribution of one-dimensional data (such as the length of a product measured many times).
Example: Census population by age group.
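
The counting step behind a histogram can be sketched with numpy.histogram(). This is only an illustration: the ages and the class boundaries below are made-up values, and the text bars stand in for a real plot.

```python
import numpy as np

# Hypothetical ages of 12 people (illustrative data, not from the article).
ages = [3, 15, 22, 27, 31, 38, 41, 45, 52, 67, 70, 88]

# Class boundaries: 0-19, 20-39, 40-59, 60-79, 80-100.
bins = [0, 20, 40, 60, 80, 100]

# counts[i] is the number of values that fall into the i-th class.
counts, edges = np.histogram(ages, bins=bins)

# Print one text bar per class; the bar length is the class frequency.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{int(left):>2}-{int(right):<3} {'#' * int(count)}")
```

Each class includes its left boundary and excludes its right boundary, except the last class, which includes both.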

Scatter plot

A graph that places one dot per data point, using one value of the data point as the x-coordinate and another as the y-coordinate.

By also using the color and size of the dots, you can visualize a third item on the same plane.

Use: Check whether the data for two items is concentrated in some region or sparsely scattered.
Example: Daily maximum temperature and the number of ice creams sold.

Pie chart

A graph that divides a circle into sectors, giving each item an angle at the center in proportion to its share of the whole.

Use: The best method when you want to compare each item's percentage of the whole.
Example: The percentage of all customers in each age group.
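
How a share of the whole turns into an angle can be worked through in a few lines. The customer counts below are hypothetical values chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical number of customers per age group (illustrative data).
customers = {"10s": 50, "20s": 150, "30s": 200, "40s and over": 100}

total = sum(customers.values())

# Each sector gets an angle proportional to its share of the whole (360 degrees).
angles = {group: 360 * n / total for group, n in customers.items()}

for group, angle in angles.items():
    share = 100 * customers[group] / total
    print(f"{group}: {share:.0f}% of customers, {angle:.0f} degrees")
```

The angles always add up to 360 degrees, which is why a pie chart only makes sense when the items together make up the whole.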

Random number generation

seed

The computer generates (pseudo-)random numbers starting from a value called the "seed".

numpy.random.seed(seed) # Passing the same seed value (an integer) each time produces the same random number sequence on every run.

With the same seed, the same result is obtained even when the calculation uses random numbers. Seeds are therefore used where reproducibility is required, such as when debugging.

If you do not set a seed, NumPy initializes the generator with a value that differs from run to run (entropy supplied by the operating system, or the system clock if that is unavailable), so a different random number sequence is generated on every run.
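
A minimal sketch of the reproducibility described above. The seed value 0 is an arbitrary choice, and numpy.random.randn() is used only as a convenient way to draw a few numbers.

```python
import numpy as np

# Draw three numbers after setting the seed.
np.random.seed(0)
first = np.random.randn(3)

# Reset the same seed and draw again: the sequence starts over.
np.random.seed(0)
second = np.random.randn(3)

# Without resetting, the generator simply continues its sequence.
third = np.random.randn(3)

print(np.array_equal(first, second))  # same seed, same numbers
print(np.array_equal(second, third))  # continuing the sequence gives new numbers
```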

Generate random numbers that follow a normal distribution

numpy.random.randn() 

A histogram of the values generated by numpy.random.randn() has a shape close to the graph of the formula known as the normal distribution.

The graph of the normal distribution is a symmetric bell curve: it is highest in the center and falls off toward both sides. The mean sits at the peak in the center.

If you pass an integer to numpy.random.randn(), it returns that many random numbers drawn from the standard normal distribution (mean 0, standard deviation 1).
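
As a rough check of the description above, the following sketch draws 10,000 values and looks at their mean and standard deviation. The sample size and the seed are arbitrary choices made for the illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed so that the run is reproducible

# 10,000 random numbers drawn from the standard normal distribution.
samples = np.random.randn(10000)

print(samples.shape)   # one value per requested sample
print(samples.mean())  # close to 0
print(samples.std())   # close to 1
```

The sample mean and standard deviation are only approximately 0 and 1; they get closer as the number of samples grows.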

Generate random numbers that follow a binomial distribution

numpy.random.binomial()

numpy.random.binomial() simulates trials that each end in either success or failure. For example, a coin toss always comes up either heads or tails, and each outcome has a probability of 0.5. A trial of this kind is called a Bernoulli trial.

The probability distribution of the number of times an event occurs in n independent Bernoulli trials is called the binomial distribution.

If you pass an integer n and a real number p between 0 and 1 to numpy.random.binomial(), it performs n trials with success probability p and returns the number of successes.

In other words, it draws a value from the binomial distribution with n trials and probability p.

If you pass an integer as the third argument, the set of n trials is repeated that many times and an array containing that many success counts is returned.

# To get the number of heads in 100 coin tosses, repeated 10,000 times,
# write the following.

import numpy

result = numpy.random.binomial(100, 0.5, 10000)
# (number of trials, probability, number of sets of trials)
print(result)

# Example output (the values differ on every run)
[52 51 61 ..., 57 53 52]
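
The link between Bernoulli trials and the binomial distribution can be checked with a short sketch. The seed and the variable names are arbitrary choices for the illustration. With n = 1 each returned value is a single Bernoulli trial, and the average number of heads over many sets of 100 tosses should land near n * p = 50.

```python
import numpy as np

np.random.seed(0)  # fixed seed so that the run is reproducible

# n=1: every element is one Bernoulli trial, 0 (failure) or 1 (success).
tosses = np.random.binomial(1, 0.5, 100)
heads_in_100 = int(tosses.sum())  # number of heads in these 100 tosses

# 10,000 sets of 100 tosses; each element is the number of heads in one set.
counts = np.random.binomial(100, 0.5, 10000)

print(heads_in_100)
print(counts.mean())  # close to n * p = 50
```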

Randomly select from a list

numpy.random.choice(x, n)

If you pass list data x and an integer n to numpy.random.choice(), it returns n elements selected at random from x.
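
A short sketch of random selection. The list contents and the seed are made up for the illustration. By default the selection is made with replacement, so the same element can be picked more than once and n may exceed the length of the list; passing replace=False picks distinct elements instead.

```python
import numpy as np

np.random.seed(0)  # fixed seed so that the run is reproducible

x = ["red", "green", "blue", "yellow"]  # hypothetical list of choices

# Pick 5 elements; duplicates are possible because replacement is allowed.
picked = np.random.choice(x, 5)

# Pick 3 distinct elements.
distinct = np.random.choice(x, 3, replace=False)

print(picked)
print(distinct)
```

The result is a NumPy array, not a Python list.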

Time series data

datetime type (date and time)

When dealing with time series data, we need a way to represent time.

datetime # Module that provides data types for handling dates and times
datetime.datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0)
# Returns a datetime object for the specified date and time.
# year, month, and day are required. The other arguments are optional and default to 0.
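
A minimal example of building datetime objects with and without the optional arguments. The dates are arbitrary values chosen for illustration.

```python
import datetime

# Only year, month, and day: the time part defaults to 00:00:00.
date_only = datetime.datetime(2017, 12, 20)

# Hour, minute, and second given as well.
with_time = datetime.datetime(2017, 12, 20, 10, 30, 15)

print(date_only)   # 2017-12-20 00:00:00
print(with_time)   # 2017-12-20 10:30:15

# The individual parts are available as attributes.
print(with_time.year, with_time.month, with_time.day, with_time.hour)
```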

timedelta type (elapsed time)

datetime.timedelta # A data type that represents elapsed time or the difference between two times.
datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)
# Returns a timedelta object for the specified duration.
# All arguments are optional and default to 0.

import datetime

td = datetime.timedelta(days=1, seconds=2, microseconds=3, milliseconds=4, minutes=5, hours=6, weeks=7)
print(td)
# Output
50 days, 6:05:02.004003

# You can also specify negative values.
td = datetime.timedelta(days=-1, hours=-10)
print(td)
# Output
-2 days, 14:00:00

datetime and timedelta type operations

Subtracting one datetime object from another gives the difference between the two dates and times, so you can compare them.

The result is a timedelta object. Arithmetic between timedelta objects is also possible, and it again yields a timedelta object.

By adding a timedelta object to, or subtracting it from, a datetime object, you can easily work with the number of days and hours until a given date and time.

import datetime

d1 = datetime.datetime.now()
d2 = datetime.datetime(2019, 9, 20, 19, 45, 0)
td = d2 - d1
print(td)
print(type(td))
# Example output (depends on when the code is run)
243 days, 5:38:45.159115
<class 'datetime.timedelta'>
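
The operations described above can also be tried with fixed dates, so that the result does not depend on the current time. The start date below is an arbitrary value picked for the illustration.

```python
import datetime

start = datetime.datetime(2019, 1, 20, 14, 0, 0)     # hypothetical starting point
deadline = datetime.datetime(2019, 9, 20, 19, 45, 0)

# datetime - datetime gives a timedelta.
remaining = deadline - start
print(remaining)  # 243 days, 5:45:00

# datetime + timedelta gives a new datetime.
one_week_later = start + datetime.timedelta(weeks=1)
print(one_week_later)  # 2019-01-27 14:00:00

# timedelta + timedelta gives another timedelta.
total = datetime.timedelta(hours=20) + datetime.timedelta(hours=10)
print(total)  # 1 day, 6:00:00
```

A timedelta stores its value as days, seconds, and microseconds, which is why 30 hours is printed as 1 day and 6 hours.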

Create a datetime object from a string representing the time

datetime.datetime.strptime()
# Generates and returns a datetime object from a string.
# You must specify the format codes that match the layout of the original string.

For details on the format codes, see the official Python documentation for the datetime module ("Basic date and time types"), section "strftime() and strptime() Behavior".

import datetime

s = "2017-12-20 10:00:00"
str_dt = datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
print(str_dt)
print(type(str_dt))
# Output
2017-12-20 10:00:00
<class 'datetime.datetime'>

import datetime as dt

# Assign a string representing October 22, 1992, in "year-month-day" form, to the variable s
# (the parts are separated by hyphens)
s = "1992-10-22"

# Use strptime to convert s into a datetime object representing October 22, 1992, and assign it to the variable x
x = dt.datetime.strptime(s, "%Y-%m-%d")

# Output
print(x)

1992-10-22 00:00:00
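
The format string has to match the layout of the input exactly. The sketch below uses a made-up day/month/year string to show a matching format, and then what happens when the format does not match: strptime() raises a ValueError.

```python
import datetime

s = "20/12/2017 10:00"  # hypothetical input in day/month/year order

# The format codes follow the order and separators used in the string.
parsed = datetime.datetime.strptime(s, "%d/%m/%Y %H:%M")
print(parsed)  # 2017-12-20 10:00:00

# A format that does not match the string raises ValueError.
try:
    datetime.datetime.strptime(s, "%Y-%m-%d")
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)  # ValueError
```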

Manipulating data

Type conversion from string type to numeric type

To do calculations with numbers read from a file or a similar source, the data must be of type int or float. A string that contains only a number can be converted to a numeric type with int() or float().
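
A small example of why the conversion matters. The strings below stand in for values read from a file and are made up for the illustration.

```python
quantity_text = "12"   # hypothetical values as they might come from a file
price_text = "3.5"

# Without conversion, * repeats the string instead of multiplying numbers.
repeated = quantity_text * 2
print(repeated)  # 1212

# Convert first, then calculate.
quantity = int(quantity_text)
price = float(price_text)
total = quantity * price
print(total)  # 42.0
```

Note that int() cannot parse a string with a decimal point such as "3.5"; use float() for those.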

Generate an evenly spaced sequence 1

numpy.arange()
# Use when you want to number the elements of a list in order, or get a sequence such as the even numbers (0, 2, 4, ...)
numpy.arange(start, stop, step)
# Returns the numbers from start up to, but not including, stop, at intervals of step.

import numpy as np

np.arange(0, 5, 2) # Even numbers from 0 to 4
np.arange(0, 4, 2) # Note that this gives the even numbers from 0 to 2 only, because the end value 4 is not included.

Generate an evenly spaced sequence 2

numpy.linspace() # Use when you want to divide a specified range at equal intervals
numpy.linspace(start, stop, num)
# Returns num evenly spaced points from start to stop, including both ends.

import numpy as np

np.linspace(0, 15, 4)
# Gets the 4 points 0, 5, 10, 15 that divide the range from 0 to 15 at equal intervals
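
The difference between the two functions is easiest to see side by side: numpy.arange() takes the interval and leaves out the end value, while numpy.linspace() takes the number of points and includes the end value. The variable names are arbitrary.

```python
import numpy as np

# Interval of 2, end value 5 is not included.
evens_to_4 = np.arange(0, 5, 2)
print(evens_to_4)  # [0 2 4]

# Interval of 2, end value 4 is not included either.
evens_to_2 = np.arange(0, 4, 2)
print(evens_to_2)  # [0 2]

# 4 points from 0 to 15; both ends are included, so the interval is 15 / 3 = 5.
points = np.linspace(0, 15, 4)
print(points)  # [ 0.  5. 10. 15.]
```

numpy.linspace() returns floating point numbers even when the arguments are integers.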
