Hello. I'm a software engineer who makes all sorts of things: embedded devices, games, web applications, iOS applications, and signage control systems. About two and a half years ago I decided that anything too hard to do with the shell alone should be written in Python, as a general-purpose scripting language, and I struggled to learn it. I only use it occasionally, so I forget how to write if statements and for loops every time.
However, about a year and a half ago, I learned from a machine learning / data science book that there is a REPL called Jupyter Notebook. When I gave it a try, I found that it is convenient not just for machine learning and data science but for any task that involves daily trial and error, and I am writing this article to spread the word.
I think Jupyter Notebook has a strong association with data analysis. However, it is also very well suited to miscellaneous jobs that don't need real-time performance: sending a few packets, serving as a calculator that can do complicated things, converting between file formats, copying files in intricate ways, and so on. From the second run onward it feels like step execution: you can try the script one statement at a time, checking whether each result is as expected or wrong, so you can write code quickly and with high confidence.
Also, dynamic languages have the drawback that you tend to write code with autocompletion barely working, but in Jupyter Notebook you can complete methods and fields while directly inspecting live, concrete instances, so completion works much like in a statically typed language and you save a great deal of time otherwise spent reading references.
This time, I would like to cover four genres of use cases:

- Cases where I used it for analysis
- Cases where I used it as a calculator
- Cases where I used it to build binary data and send packets
- Cases of format conversion
When making software, there are quite a few situations where you have to analyze behavior at a scale a debugger cannot follow, or do investigations that existing tools cannot handle: a device that should be controllable over TCP or UDP sometimes cannot be, and I'm asked to find out why; I want to casually check what values a sensor returns without poring over the spec; some characters render in a particular font and some don't, and I'm asked to determine the range of characters the system can safely use.
Basically, you end up analyzing logs or reading metadata and working something out, and Jupyter makes it comfortable to proceed while checking things ad hoc.
I received a request to investigate a problem where a device controlled over TCP/UDP sometimes could not be controlled, and I had to determine whether the cause was in the control software, the network, or the device itself. Figuring that watching the traffic would get me most of the way, I had a switching hub capable of port mirroring prepared and collected several gigabytes of packets a day with tshark, Wireshark's command-line tool. The captured .pcap file is converted to a Pandas DataFrame with the following code.
import socket

import dpkt
import pandas as pd

def makeDfFromPcap(filename):
    p = dpkt.pcapng.Reader(open(filename, 'rb'))
    dDict = {'t':[], 'src':[], 'dst':[], 'type':[], 'srcport':[], 'dstport':[], 'length':[]}
    count = 0
    for ti, buf in p:
        try:
            eth = dpkt.ethernet.Ethernet(buf)
        except dpkt.NeedData:
            continue  # skip truncated frames instead of reusing the previous one
        if type(eth.data) == dpkt.ip.IP:
            ip = eth.data
            src_a = socket.inet_ntoa(ip.src)
            dst_a = socket.inet_ntoa(ip.dst)
            t = 'IP'
            srcPort = 0
            dstPort = 0
            if type(ip.data) == dpkt.udp.UDP:
                srcPort = ip.data.sport
                dstPort = ip.data.dport
                t = 'UDP'
            elif type(ip.data) == dpkt.tcp.TCP:
                srcPort = ip.data.sport
                dstPort = ip.data.dport
                t = 'TCP'
            dDict['t'].append(ti)
            dDict['src'].append(src_a)
            dDict['dst'].append(dst_a)
            dDict['type'].append(t)
            dDict['srcport'].append(srcPort)
            dDict['dstport'].append(dstPort)
            dDict['length'].append(ip.len)
            count += 1
            if count % 10000 == 0:
                print(count)  # progress indicator for large captures
    df = pd.DataFrame(dDict, columns=['t','src','dst','type','srcport','dstport','length'])
    return df

df = makeDfFromPcap('cap_00001_20191216.pcap')
df
After that, you can see how much traffic is flowing between each source and destination IP address by grouping the DataFrame and aggregating.
# Bandwidth used (unit: bps)
pd.DataFrame(df.groupby(['src','dst'])['length'].sum().sort_values(ascending=False) * 8 / (df.t.max() - df.t.min()))

# Number of packets per unit time (unit: packets/sec)
pd.DataFrame(df.groupby(['src','dst']).size().sort_values(ascending=False) / (df.t.max() - df.t.min()))
In this case, communication between 10.100.45.26 and 10.100.45.39 turned out to be frequent, so just in case I had a separate hub, cascaded off the main one, connect only those two hosts, and watched how the problem evolved.
Of course, there are plenty of tools that can do this kind of packet aggregation, but the big merit here is that if you already know Pandas, you don't have to learn how to use yet another special-purpose tool.
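For example, narrowing the capture down to the chatty pair identified above is just ordinary Pandas filtering:

# Only the traffic from 10.100.45.26 to 10.100.45.39.
df[(df.src == '10.100.45.26') & (df.dst == '10.100.45.39')]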
There are many cases where you want to get a feel for what values a device's sensors return before moving on to full-scale implementation. I needed to find out quickly how the values of the 3-axis geomagnetic sensor attached to an IoT device track the device's rotation, and what processing would extract a heading from them. Fortunately there was sample code for the sensor that dumped its readings over serial, so I saved a text log with TeraTerm and set out to visualize it.
The log is simple text, with each line carrying a short label followed by comma-separated values, so
def parse_mag_str(line):
    try:
        # Drop the 5-character label prefix, strip spaces, split on commas, and convert to floats
        splitted = line[5:].replace(' ', '').split(',')
        return [float(n) for n in splitted]
    except (ValueError, IndexError):
        return None  # ignore malformed lines
a rough parse function like the one above does the job.
The behavior looks fine, so the following
mags = [parse_mag_str(l) for l in log[2:]][0:-1]
gives a list of 3-element float vectors (the first two lines and the trailing entry are dropped because they may be incomplete).
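(For reference, `log` above is simply the list of lines read from the TeraTerm text log; a minimal loading sketch, with a hypothetical filename:)

# Assumption: the TeraTerm log was saved as a plain text file named 'mag_log.txt'.
with open('mag_log.txt', 'r') as f:
    log = f.readlines()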
After that, if you feed the data into an interactive 3D visualization library such as ipyvolume, you can check it as a 3D graph that is easy to explore with the mouse.
import numpy as np
import ipyvolume as ipv

start = 250
end = 8000
t = np.array(range(start, end))
x = np.array([m[0] for m in mags[start:end]])
y = np.array([m[1] for m in mags[start:end]])
z = np.array([m[2] for m in mags[start:end]])
# Samples 10 steps ahead; the differences give direction arrows along the trajectory
x_n = np.array([m[0] for m in mags[start+10:end+10]])
y_n = np.array([m[1] for m in mags[start+10:end+10]])
z_n = np.array([m[2] for m in mags[start+10:end+10]])
u = x_n - x
v = y_n - y
w = z_n - z
ipv.figure()
quiver = ipv.quiver(x, y, z, u, v, w, size=3, size_selected=20)
ipv.show()
In this case, it turned out that simply taking atan2 of the xy-plane coordinates gives a usable bearing. The direction of the cones also reveals the time ordering of the data, so the processing code can be tuned accordingly.
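As a rough sketch of that last step (the sign and offset depend on how the sensor is mounted, so treat this as illustrative only):

# Bearing from the xy-plane components of the magnetometer, in degrees.
headings = np.degrees(np.arctan2(y, x))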
Many packages whose names start with ipy enhance Jupyter's visualization and interactivity: ipyleaflet displays data on an interactive map, ipywidgets builds UIs with buttons, and so on.
Until now, when I had to check logs exhaustively or visualize (or sonify) the output of some data processing, writing a dedicated tool felt like too much trouble (I only did it, using openFrameworks, when truly necessary), and I fooled myself with vague, somehow-good-enough verification. But this is exactly Jupyter Notebook's home ground of data analysis and visualization, so most of it goes smoothly and you can take a step toward solving the problem without straining. And because the trial-and-error results remain in the notebook without any extra effort, re-verifying later with different data costs very little. I have come to actually like investigations that involve digging through logs and metadata: just by writing code casually, I can produce a survey report that beats the manual verification I used to dread in both accuracy and volume, and that feels like an accomplishment.
Have you ever implemented a UI and done the same multiplications and additions over and over to adapt a design drawing to the screen size? Or, while doing image processing, wanted a little matrix math (an inverse, a diagonalization), wanted to build a rotation matrix in your preferred rotation order, or wanted to solve a slightly complicated inverse matrix algebraically?
I once had to reproduce the color vision of a person with color vision deficiency: convert from RGB color space to LMS color space (a linear transformation), apply a matrix operation in LMS space, return to RGB space, and then work out what single linear transformation in RGB space the whole pipeline amounts to. The formula is as follows.
\begin{eqnarray}
\mathbf{c}'_\mathrm{RGB} & = & \mathbf{M}_\mathrm{LMStoRGB} \cdot \mathbf{M}_\mathrm{protanopia} \cdot \mathbf{M}_\mathrm{RGBtoLMS} \cdot \mathbf{c}_\mathrm{RGB} \\
& = & \left( \mathbf{M}^{-1}_\mathrm{RGBtoLMS} \cdot \mathbf{M}_\mathrm{protanopia} \cdot \mathbf{M}_\mathrm{RGBtoLMS} \right) \cdot \mathbf{c}_\mathrm{RGB}
\end{eqnarray}
You only need to compute the parenthesized matrix product on the right-hand side, so you can put it into Numpy almost verbatim, and it's done with the following code.
import numpy as np

rgb2lms = np.array([0.31399022, 0.63951294, 0.04649755,
                    0.15537241, 0.75789446, 0.08670142,
                    0.01775239, 0.10944209, 0.87256922]).reshape((3,3))
lms2rgb = np.linalg.inv(rgb2lms)
I wasn't sure whether Numpy treats matrices as column-major or row-major, so I verified that rgb2lms and lms2rgb were built correctly in Jupyter as I went.
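That check is a one-liner, by the way: NumPy uses C (row-major) order by default, so reshape fills rows first.

# reshape fills row by row, so the 9 coefficients land as 3 rows of 3.
np.arange(9).reshape((3, 3))
# -> array([[0, 1, 2],
#           [3, 4, 5],
#           [6, 7, 8]])

With that settled, the desired combined matrix can be computed directly.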
protanopia = np.array([0.0, 1.05118294, -0.05116099,
                       0.0, 1.0,         0.0,
                       0.0, 0.0,         1.0]).reshape((3,3))
lms2rgb.dot(protanopia).dot(rgb2lms)
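As a quick sanity check of the result (my own illustration, not from the original), you could apply the combined matrix to a pure color:

# Simulated protanopia appearance of pure red (illustrative).
M = lms2rgb.dot(protanopia).dot(rgb2lms)
M.dot(np.array([1.0, 0.0, 0.0]))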
Rotation matrices in 3D space come up often when making games, handling 3DCG coordinates, or reading a smartphone's motion sensors. There are 6 possible rotation matrices depending on the order in which you rotate about the X, Y, and Z axes. Until now I wasted a lot of time looking them up on Wikipedia one at a time, only to find that the one I picked was wrong and didn't work. In fact, the most reliable and quickest approach is to multiply the axis rotations one by one from the left while keeping a firm grasp of how a vector is transformed from the local coordinate system to the global one. So now I compute the rotation matrix myself. Conveniently, Python has a library for algebraic computation called Sympy.
from sympy import *

# Rotation about the x-axis
theta = Symbol('theta')
Rx = Matrix([[1, 0, 0], [0, cos(theta), -sin(theta)], [0, sin(theta), cos(theta)]])
# Rotation about the y-axis
yaw = Symbol('yaw')
Ry = Matrix([[cos(yaw), 0, sin(yaw)], [0, 1, 0], [-sin(yaw), 0, cos(yaw)]])
# Rotation about the z-axis
phi = Symbol('phi')
Rz = Matrix([[cos(phi), -sin(phi), 0], [sin(phi), cos(phi), 0], [0, 0, 1]])

R = Ry * Rx * Rz
R
The result is rendered beautifully in TeX, so it's easy to check, and since I built the rotation matrix myself, it's also easy to change the order or drop one of the rotations when something looks wrong.
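And if you eventually need numbers rather than symbols, Sympy can substitute concrete angles; a small usage sketch (the angle values here are arbitrary):

import math

# Evaluate the symbolic rotation matrix at concrete angles.
R.subs({theta: math.pi / 6, yaw: math.pi / 4, phi: 0}).evalf()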
I also received a request to display images on an IoT device via projective transformation, without using any library. The derivation is not the main subject, so I'll omit it (the formulation differs slightly, but it is essentially the same computation as in homography — Shogo Computing Laboratory). The upshot is that if you treat the nine elements of the 3x3 projective transformation matrix as a 9-dimensional vector X, you can obtain X by inverting the following 9-by-9 matrix and multiplying it by a vector.
from sympy import *

# Create the symbols x0..x3 and y0..y3 (equivalent to x0 = Symbol('x0'), etc.)
for i in range(4):
    for c in "xy":
        globals().update({c + str(i): Symbol(c + str(i))})

# Destination corner coordinates (a 640x480 screen)
u0 = 640
v0 = 0
u1 = 640
v1 = 480
u2 = 0
v2 = 480
u3 = 0
v3 = 0

A = Matrix([
    [ x0, y0, 1,  0,  0, 0, -x0*u0, -y0*u0, 0 ],
    [  0,  0, 0, x0, y0, 1, -x0*v0, -y0*v0, 0 ],
    [ x1, y1, 1,  0,  0, 0, -x1*u1, -y1*u1, 0 ],
    [  0,  0, 0, x1, y1, 1, -x1*v1, -y1*v1, 0 ],
    [ x2, y2, 1,  0,  0, 0, -x2*u2, -y2*u2, 0 ],
    [  0,  0, 0, x2, y2, 1, -x2*v2, -y2*v2, 0 ],
    [ x3, y3, 1,  0,  0, 0, -x3*u3, -y3*u3, 0 ],
    [  0,  0, 0, x3, y3, 1, -x3*v3, -y3*v3, 0 ],
    [  0,  0, 0,  0,  0, 0,      0,      0, 1 ],
])
A
B = Matrix([u0, v0, u1, v1, u2, v2, u3, v3, 1])
B
X = A.inv() * B
It computed in about 5 seconds. However, each element of X contains a large number of terms, and trying to display it makes the TeX compilation take forever, so I extract the elements as strings instead.
X_strs = [str(simplify(X[i])) for i in range(9)]
It took almost 2 minutes, probably because simplify is slow. I want to turn this directly into C code, so I rename exponentiation terms such as x0**2 to x0p2.
list(zip([v + str(i) + '**2' for v in 'xy' for i in range(4)],[v + str(i) + 'p2' for v in 'xy' for i in range(4)]))
The rename pairs can be built like this, so we apply all the replacements at once with reduce.
import functools

X_strs2 = [functools.reduce(lambda s, t: s.replace(*t),
                            list(zip([v + str(j) + '**2' for v in 'xy' for j in range(4)],
                                     [v + str(j) + 'p2' for v in 'xy' for j in range(4)])),
                            X_strs[i]) for i in range(9)]
The final conversion to C code is done with code like this:
# Emit the squared-term temporaries first, then the nine matrix elements
for v in 'xy':
    for i in range(4):
        print('float %s%dp2 = %s%d * %s%d;\n' % (v, i, v, i, v, i))
for i in range(9):
    print('float X%d = ' % i + X_strs2[i] + ';\n')
Now the embedded device has a projective transformation function that takes arbitrary parameters and depends on no library. There was one conversion omission, but once corrected it worked as expected.
I also derived the inverse of this projective transformation matrix in the same way. It's easy: write the formula almost as-is and it's done.
I also use it to recompute the sizes of UI elements designed against a different screen size. It's nothing profound, but it's nice that the work is easy to see and easy to fix.
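A trivial sketch of the kind of arithmetic I mean (all the numbers here are hypothetical):

# Scale values read off a hypothetical 750px-wide design sheet to a 1125px-wide screen.
design_w, screen_w = 750, 1125
scale = screen_w / design_w
x, y, w, h = 32, 180, 686, 120  # position and size from the design drawing
[round(v * scale) for v in (x, y, w, h)]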
I am building a signage control system in which a certain power control device is supposed to respond to binary UDP packets, so I assembled packets according to the spec and sent them over UDP.
First, make a packet.
switches = [True, False, True, True, False]
model_id = 3  # LA-5R
unit_id = 0b00001111 ^ 0

def make_send_bytes(model_id, unit_id, switches):
    # First byte: 0xf0 | unit ID; second byte: one bit per switch channel
    return bytes([0xf0 | unit_id,
                  switches[4] << 4 | switches[3] << 3 | switches[2] << 2 | switches[1] << 1 | switches[0]])

send_bytes = make_send_bytes(model_id, unit_id, switches)
[format(send_bytes[i], '02x') for i in [0, 1]]  # hex dump to eyeball the packet
The packet looks right, so let's actually send it.
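(The socket `s` and the constants `HOST` and `PORT` don't appear above; a minimal sketch of that setup, with a made-up address, would be:)

import random
import socket
import time

HOST, PORT = '192.168.0.10', 10000  # hypothetical device address and port
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)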
switches = [random.random() < 0.5 for i in range(0, 5)]
send_bytes = make_send_bytes(model_id, unit_id, switches)
[format(send_bytes[i], '02x') for i in [0, 1]]
s.sendto(send_bytes, (HOST, PORT))
I can't include screenshots, but I confirmed that the device responds when the packet is sent.
Next, check whether all 32 ON/OFF patterns of the 5 channels can be controlled.
sl = 0.08       # interval between packets, in seconds
l = 0.08 / sl   # number of passes over all patterns (tweaked during testing)
for j in range(0, int(l)):
    for i in range(0, 32):
        switches = [c == '1' for c in format(i, '05b')]
        send_bytes = make_send_bytes(model_id, unit_id, switches)
        [format(send_bytes[i], '02x') for i in [0, 1]]  # hex dump for monitoring
        s.sendto(send_bytes, (HOST, PORT))
        time.sleep(sl)
Looking only at the finished code, it's almost the same as it would be in C. But when you're not used to bit manipulation, the fast loop of trial and error — is the packet assembled correctly? does the device actually respond when I send it? — matters enormously for implementation efficiency.
At the same time I also verified control over a TCP connection. If you add a table of contents to the notebook, it doesn't descend into a mess of mysterious commented-out code: you can cleanly separate the parts shared between TCP and UDP from the parts that differ, and the code stays easy to revisit later.
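For reference, a minimal sketch of what the TCP side might look like, assuming the device accepts the same byte layout over TCP (the port number here is made up):

import socket

TCP_PORT = 10001  # hypothetical; the real port comes from the device spec
with socket.create_connection((HOST, TCP_PORT)) as conn:
    conn.sendall(send_bytes)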
This may be the most typical case. As a software engineer, you often want to write a simple script for some format conversion, turning raw material data into its final form. Here too, Jupyter Notebook's style of trial and error while watching the data is a great strength.
During game development, I once needed to port images laid out with After Effects' 3D layer feature over to Unity. You could of course create a Unity Quad object per layer by hand, reading off the numbers one layer at a time, but I had no desire to do such menial work for roughly 50 layers. So I first exported the After Effects layer information to JSON with ESTK, then processed it in Jupyter Notebook.
First, check what values are included:
import json
import codecs
from operator import itemgetter

with codecs.open('transforms.json', 'r', encoding='shift-jis') as f:
    root = json.load(f)
root
Take out one layer,
layer = root['9']
layer
Position, anchor point, width/height, and scale are lined up per X and Y component:
list(zip(layer['position'],layer['Anchor point'],[layer['width'], layer['height']], [layer['scale'][0], layer['scale'][1]]))
Then we convert the anchor-point-based position and scaling into center-based values:
[ps[0] - ps[1] * ps[3] + ps[2] * ps[3] / 2 for ps in zip(
layer['position'],
layer['Anchor point'],
[layer['width'], layer['height'], 0],
[layer['scale'][0] / 100, layer['scale'][1] / 100, 1])]
Now that one layer can be handled, apply it to all layers. The reverse is because the stacking order is opposite between After Effects and Unity.
t_list = [([ps[0] - ps[1] * ps[3] + ps[2] * ps[3] / 2 for ps in zip(layer['position'],
                                                                    layer['Anchor point'],
                                                                    [layer['width'], layer['height'], 0],
                                                                    [layer['scale'][0], layer['scale'][1], 1])],
           [layer['width'] * layer['scale'][0] / 100, layer['height'] * layer['scale'][1] / 100],
           layer['name']) for layer in root.values()]
t_list.reverse()
t_list
Now that we have the coordinates, width, and height of every layer, we generate the C# code that creates the objects in the scene, to be typed into the Unity Editor's Immediate pane.
cs_str = '''{GameObject q = GameObject.CreatePrimitive(PrimitiveType.Quad);
q.name = "%s";
q.transform.position = new Vector3(%ff, %ff, %ff);
q.transform.localScale = new Vector3(%ff, %ff, 1.0f);
var m = (q.GetComponent<Renderer>().material = UnityEditor.AssetDatabase.LoadAssetAtPath<Material>("Assets/Textures/Background/_Materials/%s.mat"));
if (m != null) m.renderQueue = %d;
}'''
# start and lll (the batch offset and size) are defined elsewhere in the notebook
sss = '\n'.join([cs_str % (t[2].replace('.psd', ''),
                           t[0][0] / 1000, -t[0][1] / 1000, t[0][2] / 1000,
                           t[1][0] / 1000, t[1][1] / 1000, t[2].replace('.psd', ''),
                           3000 + i + start) for i, t in
                 enumerate(sorted(t_list, key=lambda t: int(t[0][2]), reverse=True)[start:start + lll])])
print(sss)
Typing the generated code into the Immediate pane did indeed recreate the After Effects scene. The first time it took about as long as entering the 50 layers by hand would have, but since any changes the designer makes to the After Effects scene can now be handled mechanically, it feels far easier from the second time onward. Unfortunately, the designer's verdict was: "I never want to touch such a complicated, hard-to-operate AE file again."
Occasionally I also write HTML, and sometimes I have to hand-code something like a blog's article list, where the title strings and such repeat almost identically. You could bring in a CMS or a static site generator, but in most cases publishing takes priority, and when the repeated part is only a small fraction of the whole, writing directly in an HTML editor such as Brackets is faster (my limited HTML/CSS experience is also a big reason). Even so, copying the same strings into several places is tedious and nerve-wracking, and my programmer instincts say to skip it. So once a version written in Brackets displays correctly, I turn it into a template and substitute into it.
news_template=''' <article class="news-box">
<div class="news-photo-box"><img src="assets/image/news/###image_name###"></div>
<div class="news-date">###date_str###</div>
<div class="news-title">###title_str###</div>
<button class="news-more" alt="MORE"></button>
<div class="news-description news-description-hidden">
<button class="close-button"></button>
<button class="news-prev-button"></button>
<button class="news-next-button"></button>
<div class="news-description-inner">
<div class="news-description-pdt-box">
<div class="news-description-photo-box"><img src="assets/image/news/###image_name###"></div>
<div class="news-description-dt-box">
<div class="news-description-date">###date_str###</div>
<div class="news-description-title">###title_str###</div>
</div>
</div>
<div class="news-description-body">
###body_str###
</div>
</div>
</div>
</article>
'''
def make_news(date_str, title_str, body_str, image_name='default.png'):
    return news_template \
        .replace('###date_str###', date_str) \
        .replace('###title_str###', title_str) \
        .replace('###body_str###', body_str) \
        .replace('###image_name###', image_name)
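A hypothetical usage example (the strings are made up):

print(make_news('2020.01.01', 'Website renewed', '<p>We have renewed our website.</p>'))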
Now the HTML can be generated reliably just by filling in the required fields. Being able to do this at zero learning cost is great. And when I'm asked to add a news item later, opening this notebook brings everything back, so even without proper documentation it leaves no debt for the future.
Besides the above, some other examples:

- Instantly converting CSV metadata prepared by a customer into JSON for web clients with pandas and json
- Splitting a photo-data delivery directory that doesn't fit on one DVD, checking total file sizes so each chunk fits
- Reading GPS data and converting it to KML
- Reading CSV data and rendering it onto a map image
- Converting the items of bug-report emails into a work Excel file so comments are easy to add
- Converting PGM to BMP (BMP has many variants and I couldn't figure out the right ImageMagick options, so I ported just the header from another BMP and spliced in the pixel data from the PGM)
- Deduplicating frames by md5 hash when building numbered animation tile images for Unity, to save texture space
- Concatenating CSV files so large data is easier to read into Houdini
- Muxing the same audio source, in 3 audio codecs, with 6 kinds of video, recreating all 18 video files over and over
In short, unless it can be done with a single command, most of my read-and-convert work is done in Jupyter Notebook.
Many of these things can surely be done with traditional command-line tools, but finding out how each tool works and how to pass data between them is a hassle, whereas with Jupyter the record remains and I almost never have to start over from scratch. That's why I keep reaching for Jupyter Notebook instead of the shell.
Give Jupyter Notebook a try, even if you feel you're being taken for a ride. I think it suits people whose projects involve a lot of non-routine work especially well.