I explained a little about Detailed Python techniques required around data formatting last week, but there are some other points to keep in mind.
Especially when trying to visualize data with D3.js like yesterday, it is necessary to generate a JSON dataset, but you need to understand well what kind of data structure the dataset has. not.
It would be nice if you could take a method such as Language to compile and execute that you can check for errors on the IDE, fix bugs until it is completed, and then run it. However, in the case of JavaScript, it tends to take time to get from the warning message to the true cause, which is not displayed as expected when executed in the browser. I think it's because of the execution environment, the type and format of the data, or a small mistake in the code, or because it takes time to isolate.
That's why understanding the format of the dataset correctly, writing unit tests to ensure accuracy can result in time savings, and it's sober and tedious, but it's very important that you shouldn't omit it. It is a procedure.
For example, consider visualizing time-series log data. Time is endlessly continuous information without any breaks in the first place. If you want to move this with the slide bar and show the change in the data due to the transition of time, change the Scale level and use the interval scale. A common technique is to convert to.
Suppose you want to show the number of customers for 4 days from 7/22 to 7/25. Converts human date and time to UNIX time to ensure the distance scale is evenly spaced. ..
Human date and time | UNIX time |
---|---|
20140722 | 1404226800 |
20140723 | 1404313200 |
20140724 | 1404399600 |
20140725 | 1404486000 |
In order to visualize the daily transition in this way, we have prepared a function to convert the human date and time on the left side to UNIX time. In Python, time.mktime returns UNIX time as a floating point type.
def to_unixtime(self, d):
# time.mktime(2014,07,22,0,0,0,0,0,0)To do
return int(time.mktime((int(d[0:4]),int(d[5:6]),int(d[7:8]),0,0,0,0,0,0)))
In the above example, we only need to find the daily distance scale, so we fixed 0 to the time part. It would be better to improve it a little more for setting the distance to 1 hour, 10 minutes, and so on. Readers should think about what to do.
Also, in the JavaScript world, it is common to handle the time in 1/1000 second units. Don't forget to multiply the number by 1000 when converting to JSON.
The conversion from UNIX time stamps to human date and time is performed as follows.
now = 1406255406992 / 1000
datetime.datetime.fromtimestamp(now)
#=> datetime.datetime(2014, 7, 25, 11, 30, 6, 992000)
For example, consider the following data format. Nested dictionary type, generally associative array.
{1404399600.0: {'a': 1, 'b': 2}, 1404486000.0: {'c': 3, 'd': 4}}
If you want to display this in the Stacked Area Chart introduced Yesterday, you can use the Code of the main unit. master / src / models / stackedAreaChart.js) can treat the JSON data format as a multidimensional array I understand. Converting from a dictionary type seems to be troublesome at first glance, but it can be described concisely by using the list content notation that Python has.
[[a*1000,b] for a,b in v.items()]
#=> [[1404399600000.0, {'a': 1, 'b': 2}], [1404486000000.0, {'c': 3, 'd': 4}]]
In the world of mathematics, we interpret all real numbers, integers, etc. as universal sets. You can understand it clearly by referring to the elements of the data with for and getting a projection for each. You can see that it resembles the notation of mathematics.
In Python 3, the behavior of list comprehensions has been improved to generator expressions, for example the following syntax
[f(x) for x in S if P(x)]
It is equivalent to applying the list () function to a generator expression as shown below.
list(f(x) for x in S if P(x))
Today, I introduced the detailed techniques that often appear in performing simple data conversion.
Recommended Posts