Detailed Python techniques required for data shaping (2)

Last week I explained some of the detailed Python techniques needed for data shaping, but there are a few more points worth keeping in mind.

In particular, when visualizing data with D3.js as I did yesterday, you have to generate a JSON dataset, and you must have a solid understanding of what data structure that dataset has.

It would be nice to work the way you do in a compiled language, where the IDE flags errors, you fix bugs until the build succeeds, and only then run the program. With JavaScript, however, when something is not displayed as expected in the browser, it tends to take a long time to get from the warning message to the real cause. The cause might be the execution environment, the type or format of the data, or a small mistake in the code, and isolating which one it is takes time.

That is why understanding the dataset format correctly, and writing unit tests to guarantee it, saves time in the end. It is plain, tedious work, but it is a very important step that should not be skipped.
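As a minimal sketch of such a test, assume the dataset is a list of [timestamp_ms, value] pairs. That layout, the helper name validate_dataset, and the sample numbers are assumptions made for illustration, not the format of any particular chart. The structure can then be checked with the standard unittest module before the JSON is handed to the browser:

```python
import json
import unittest


def validate_dataset(rows):
    """Return True if rows is a list of [timestamp_ms, value] pairs.

    The [timestamp_ms, value] layout is an assumed example format.
    """
    if not isinstance(rows, list):
        return False
    for row in rows:
        if not (isinstance(row, list) and len(row) == 2):
            return False
        for item in row:
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(item, bool) or not isinstance(item, (int, float)):
                return False
    return True


class DatasetFormatTest(unittest.TestCase):
    def test_round_trip_keeps_structure(self):
        rows = [[1406214000000, 12], [1406300400000, 7]]  # placeholder values
        # Serialize and parse again, as the browser would receive it.
        decoded = json.loads(json.dumps(rows))
        self.assertTrue(validate_dataset(decoded))

    def test_rejects_wrong_shape(self):
        self.assertFalse(validate_dataset({"a": 1}))
        self.assertFalse(validate_dataset([[1406214000000]]))
        self.assertFalse(validate_dataset([[1406214000000, "12"]]))
```

Saved in a file, the tests can be run with python -m unittest. A failure here points directly at the data format, which is much quicker than tracing a blank chart back from the browser.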

Scale level conversion and timestamp conversion

For example, consider visualizing time-series log data. Time is inherently continuous, with no natural breaks. If you want to move through it with a slider and show how the data changes over time, a common technique is to change the scale level and convert time to an interval scale.

Suppose you want to show the number of customers over the four days from 7/22 to 7/25. Converting the human-readable dates to UNIX time ensures that the points on the interval scale are evenly spaced.

Human date and time    UNIX time (00:00 JST)
20140722               1405954800
20140723               1406041200
20140724               1406127600
20140725               1406214000

To visualize the daily transition in this way, I prepared a function that converts the human-readable date on the left to UNIX time. In Python, time.mktime interprets its argument as local time and returns UNIX time as a floating-point number.

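import time

def to_unixtime(d):
    # d is 'YYYYMMDD', e.g. '20140722' -> time.mktime((2014, 7, 22, 0, 0, 0, 0, 0, -1))
    # The time of day is fixed to 00:00:00 local time; -1 lets mktime decide DST.
    return int(time.mktime((int(d[0:4]), int(d[4:6]), int(d[6:8]), 0, 0, 0, 0, 0, -1)))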

In the example above we only need a daily interval scale, so the time-of-day fields are fixed to 0. The function would need a little more work to support intervals of one hour, ten minutes, and so on; that is left as an exercise for the reader.
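One possible approach, offered as a sketch rather than a definitive solution, is to parse the string with datetime.strptime, convert it to UNIX time, and round down to the interval width. The function name to_unixtime_interval, the input format YYYYMMDDhhmm, and the fixed UTC+9 offset are all assumptions made here for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed for this sketch: log times are recorded in Japan Standard Time.
JST = timezone(timedelta(hours=9))


def to_unixtime_interval(d, interval_sec=3600, tz=JST):
    """Convert 'YYYYMMDDhhmm' to UNIX time, rounded down to interval_sec."""
    local = datetime.strptime(d, "%Y%m%d%H%M").replace(tzinfo=tz)
    ts = int(local.timestamp())
    # Flooring the raw timestamp aligns the buckets to the UNIX epoch.
    return ts - ts % interval_sec


print(to_unixtime_interval("201407251130"))       # 1406253600 (11:00 JST)
print(to_unixtime_interval("201407251137", 600))  # 1406255400 (11:30 JST)
```

Because the buckets are aligned to the epoch, this works for widths such as one hour or ten minutes. A daily width in a zone other than UTC would need the local datetime floored instead.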

Also, JavaScript commonly handles time in milliseconds (1/1000 of a second). Do not forget to multiply the value by 1000 when converting to JSON.
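A small sketch of that conversion follows. The timestamps and customer counts are placeholder values, and the variable names are chosen only for this example.

```python
import json

# Placeholder data: UNIX time in seconds -> number of customers.
daily_counts = {1406127600: 12, 1406214000: 7}

# JavaScript works in milliseconds, so scale each timestamp by 1000.
rows = [[ts * 1000, count] for ts, count in sorted(daily_counts.items())]

print(json.dumps(rows))  # [[1406127600000, 12], [1406214000000, 7]]
```

Keeping the timestamps as integers (for example by wrapping a time.mktime result in int) avoids values such as 1406127600000.0 appearing in the JSON.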

Converting a UNIX timestamp back to a human-readable date and time is done as follows.

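import datetime

now = 1406255406992 / 1000  # JavaScript milliseconds -> seconds
datetime.datetime.fromtimestamp(now)  # interpreted in the local time zone
#=> datetime.datetime(2014, 7, 25, 11, 30, 6, 992000)  (on a machine set to JST)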
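Because fromtimestamp without a tz argument uses the local time zone of the machine, the same timestamp prints differently on servers in different zones. A hedged sketch that pins the zone explicitly, assuming the log times are meant to be read in Japan Standard Time (UTC+9):

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # assumed zone for this example

ms = 1406255406992  # millisecond timestamp, as JavaScript would produce it
dt = datetime.fromtimestamp(ms / 1000, tz=JST)

print(dt.isoformat())
```

The resulting datetime carries its UTC offset, so it can be compared with or converted to other zones without guessing where it was produced.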

List comprehension

For example, consider the following data format: a nested dictionary, the type generally known as an associative array.

v = {1404399600.0: {'a': 1, 'b': 2}, 1404486000.0: {'c': 3, 'd': 4}}

If you want to display this in the stacked area chart introduced yesterday, reading the chart's source code (master/src/models/stackedAreaChart.js) shows that the JSON data is handled as a multidimensional array. Converting from a dictionary looks troublesome at first glance, but it can be written concisely with Python's list comprehension syntax.

[[ts * 1000, counts] for ts, counts in v.items()]
#=> [[1404399600000.0, {'a': 1, 'b': 2}], [1404486000000.0, {'c': 3, 'd': 4}]]
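The inner values here are still dictionaries. If the chart needs one series per key, each holding [timestamp_ms, value] pairs, the same notation can be nested. The target layout below (a list of {"key": ..., "values": ...} objects, with missing values filled with 0) is an assumption for illustration and is not taken from stackedAreaChart.js.

```python
v = {1404399600.0: {'a': 1, 'b': 2}, 1404486000.0: {'c': 3, 'd': 4}}

# Collect every series name that appears at any timestamp.
keys = sorted({k for counts in v.values() for k in counts})

# One entry per series; a key missing at a timestamp is filled with 0.
series = [
    {
        "key": k,
        "values": [[int(ts * 1000), counts.get(k, 0)]
                   for ts, counts in sorted(v.items())],
    }
    for k in keys
]

print(series[0])
# {'key': 'a', 'values': [[1404399600000, 1], [1404486000000, 0]]}
```

Filling gaps with 0 keeps every series the same length, which stacked charts generally require. Whether 0 is the right fill value depends on the data.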

In mathematics, collections such as the real numbers or the integers are treated as sets. A list comprehension walks over the elements of the data with for and produces a mapped value for each one, so it closely resembles the set-builder notation used in mathematics.

In Python 3, list comprehensions were reworked to behave more like generator expressions; for instance, the loop variable no longer leaks into the surrounding scope. For example, take the following syntax:

[f(x) for x in S if P(x)]

It is equivalent to applying the list() function to a generator expression, as shown below:

list(f(x) for x in S if P(x))
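A quick check of that equivalence, with f, P and S filled in by arbitrary stand-ins chosen for this sketch:

```python
S = range(10)


def f(x):
    return x * x


def P(x):
    return x % 2 == 0


as_comprehension = [f(x) for x in S if P(x)]
as_generator = list(f(x) for x in S if P(x))

print(as_comprehension)                  # [0, 4, 16, 36, 64]
print(as_comprehension == as_generator)  # True

# In Python 3 the loop variable stays local to the comprehension.
x = "outer"
_ = [x for x in S]
print(x)                                 # outer
```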

Summary

Today I introduced some detailed techniques that come up often when performing simple data conversions.
