This article is a contribution to Data Visualization Advent Calendar 2015.
There are many tools to choose from for data visualization. Besides applications such as Excel and Tableau, open source toolkits such as D3 let individuals build custom visualizations. There is no single correct answer when selecting a tool; basically, you should pick one that is easy to use given your skill set and the type of data. Since this is a site for programmers, however, this article focuses on visualization done by writing code. It introduces tools that are useful for programmatic visualization, such as Jupyter Notebook and Beaker Notebook. __In particular, it focuses on how to use such environments when you write your own JavaScript code to create a custom visualization.__
For relatively small data, visualization work generally breaks down into data collection, cleansing into a machine-readable state, analysis, and visualization.
You typically repeat these steps in a loop. The following table reviews them from a tool perspective.
Task | Tool |
---|---|
Data collection | Analog methods such as paper and digital cameras, experimental equipment, programs such as crawlers (for data on the web) |
Cleansing (processing into a machine-readable state) | Data processing scripts in Python, R, Perl, Node.js, awk, etc. |
Analysis | Statistical analysis packages for Python/R |
Visualization | Custom drawing code in JavaScript, visualization libraries for Python/R |
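As a rough illustration of how the stages in the table connect, here is a minimal sketch in Python using only the standard library. The raw data, the function names (`cleanse`, `analyze`, `to_elements`) and the choice of node degree as the "analysis" are all assumptions made up for this example; the output shape mimics the `{"data": {...}}` element objects that Cytoscape.js accepts.

```python
import csv
import io
from collections import Counter

# Hypothetical raw data: an edge list with stray whitespace, a blank row,
# and inconsistent capitalization, as it might come out of a crawler.
RAW = """source,target
 Alice ,bob
bob,Carol

carol,alice
"""


def cleanse(text):
    """Turn raw CSV text into a list of normalized (source, target) pairs."""
    edges = []
    for row in csv.DictReader(io.StringIO(text)):
        source = (row.get("source") or "").strip().lower()
        target = (row.get("target") or "").strip().lower()
        if source and target:  # skip incomplete rows
            edges.append((source, target))
    return edges


def analyze(edges):
    """Count how many edges touch each node (its degree)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def to_elements(edges, degree):
    """Shape the result as node and edge lists for a drawing layer."""
    nodes = [
        {"data": {"id": name, "degree": count}}
        for name, count in sorted(degree.items())
    ]
    links = [{"data": {"source": s, "target": t}} for s, t in edges]
    return nodes, links


edges = cleanse(RAW)
degree = analyze(edges)
nodes, links = to_elements(edges, degree)
print(nodes)
print(links)
```

Each stage only depends on the output of the previous one, which is what makes it practical to rerun a single step while exploring.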
Of course it is possible to do everything in one programming language, but I think the cleansing and analysis parts and the actual drawing part often end up in different languages, especially when creating a custom visualization. When you need multiple languages and tools like this, a text editor, a terminal, and a browser for checking results are technically enough. In exploratory visualization work, however, you repeat each step many times, and with that setup it becomes difficult to keep a complete picture of the work.
A notebook-type application is very useful in this case. The concept originated in software for professionals such as Mathematica, as something like a digital version of the [lab notebook](https://ja.wikipedia.org/wiki/%E5%AE%9F%E9%A8%93%E3%83%8E%E3%83%BC%E3%83%88). Because it lets you mix code with human-readable documents, visualization results, and so on, it is very convenient for data analysts, and it is now widely used not only in science but also in the field of data analysis.
Jupyter Notebook
This is probably the best-known open source notebook. It started as IPython Notebook, an application for Python, but some time ago the project changed direction and split into the notebook application and the kernels that execute the actual code. It now supports more than 40 programming languages, including Python, R, and Julia.
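To make the notebook/kernel split a little more concrete: each kernel is described to the notebook application by a small JSON file, usually called a kernel spec (`kernel.json`). The sketch below builds one for a Python kernel; the display name is just an example value, and the launch command shown is the one used by recent `ipykernel` releases, so treat the exact arguments as an assumption for your installation.

```python
import json

# Illustrative kernel spec for a Python kernel. "argv", "display_name" and
# "language" are the keys a kernel.json file uses; the values are examples.
kernel_spec = {
    # Command the notebook application runs to start the kernel process.
    # {connection_file} is filled in by Jupyter at launch time.
    "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    "display_name": "Python 3",
    "language": "python",
}

kernel_json = json.dumps(kernel_spec, indent=2)
print(kernel_json)
```

A kernel for another language differs mainly in the command listed under `argv`, which is why the same notebook front end can serve many languages.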
It has a very high affinity with Python, the language it originally supported, and visualizations from well-known libraries such as matplotlib work without any special setup. But what if you want to develop your own visualization in JavaScript, for example with D3.js?
This screenshot shows a network diagram rendered in a notebook with an embedded visualization module that uses [Cytoscape.js](http://js.cytoscape.org/). In this way, it is possible to embed a third-party visualization library in a notebook cell. However, the method is not very sophisticated...
Here, let's look at the case of a pip-installable [Python package](https://pypi.python.org/pypi/py2cytoscape) put together based on @domitory's prototype.
First, prepare HTML that can be embedded. This is not a complete HTML document but a fragment that is used as a jinja2 template. In this case, the actual visualization is inserted into the following tag.
<div id="{{uuid}}"></div>
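The `{{uuid}}` placeholder exists so that every rendered cell gets its own container element. The following is a minimal sketch of that idea using only the standard library: plain string replacement stands in for jinja2 rendering here, and the helper names `new_container_id` and `fill_container` are hypothetical.

```python
import uuid

# The template fragment from the article; {{uuid}} is the placeholder.
TEMPLATE = '<div id="{{uuid}}"></div>'


def new_container_id(prefix="cy"):
    """Return an id that is unique per call, so cells never share a container."""
    # The letter prefix keeps the id from starting with a digit, which would
    # be awkward to use in a CSS selector.
    return prefix + str(uuid.uuid4())


def fill_container(template, container_id):
    """Plain string substitution standing in for jinja2 rendering."""
    return template.replace("{{uuid}}", container_id)


first = fill_container(TEMPLATE, new_container_id())
second = fill_container(TEMPLATE, new_container_id())
print(first)
print(second)
```

Because the two ids differ, rendering the same visualization in two cells of one notebook does not make them draw into the same element.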
JavaScript itself is part of the problem here: ES5 has no clean mechanism for handling external modules, so IPython Notebook uses RequireJS to support loading external JavaScript.
if (window['cytoscape'] === undefined) {
    // Location of the external JS library to load
    var paths = {
        cytoscape: 'http://cytoscape.github.io/cytoscape.js/api/cytoscape.js-latest/cytoscape.min'
    };
    require.config({
        paths: paths
    });
    require(['cytoscape'], function (cytoscape) {
        console.log('Loading Cytoscape.js Module...');
        window['cytoscape'] = cytoscape;
        var event = document.createEvent("HTMLEvents");
        event.initEvent("load_cytoscape", true, false);
        window.dispatchEvent(event);
    });
}
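If the loader is generated on the Python side, the library location does not have to be hard-coded in the JavaScript. The sketch below is one possible way to do that, not the package's actual implementation: `build_loader` and `requirejs_path` are hypothetical helpers, the URL is a placeholder, and the custom load event from the snippet above is omitted for brevity. Note that RequireJS `paths` entries are written without the trailing `.js`, which is why the helper strips it.

```python
import json


def requirejs_path(url):
    """RequireJS path entries omit the trailing .js, so strip it if present."""
    return url[:-3] if url.endswith(".js") else url


def build_loader(module_name, url):
    """Return a JS snippet that loads module_name through RequireJS once."""
    # json.dumps produces correctly quoted JavaScript string/object literals.
    paths = json.dumps({module_name: requirejs_path(url)})
    name = json.dumps(module_name)
    return (
        "if (window[%(name)s] === undefined) {\n"
        "  require.config({paths: %(paths)s});\n"
        "  require([%(name)s], function (lib) {\n"
        "    window[%(name)s] = lib;\n"
        "  });\n"
        "}\n"
    ) % {"name": name, "paths": paths}


# Hypothetical URL used only for illustration.
loader = build_loader("cytoscape", "https://example.org/js/cytoscape.min.js")
print(loader)
```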
Finally, write the code that passes data from the Python side to the prepared JavaScript and HTML template. The Python code must first convert the data into a form the JavaScript code can interpret, then render the template.
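import json
import uuid

from IPython.display import HTML, display

# `template` is the jinja2 template prepared above; nodes, edges, style and
# the layout/size values are assumed to be defined earlier in the notebook.
cyjs_widget = template.render(
    nodes=json.dumps(nodes),
    edges=json.dumps(edges),
    background=background,
    uuid="cy" + str(uuid.uuid4()),
    widget_width=str(width),
    widget_height=str(height),
    layout=layout_algorithm,
    style_json=json.dumps(style)
)
display(HTML(cyjs_widget))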
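One detail worth keeping in mind when JSON produced by `json.dumps` is pasted into a template: if the data ends up inside a `<script>` element, a string containing `</script>` would close that element early. The following is a small sketch of one common precaution; `to_script_safe_json` is a hypothetical helper, not part of the package above, and whether it is needed depends on where the template places the data.

```python
import json


def to_script_safe_json(value):
    """Serialize value as JSON that is safe to place inside a <script> element."""
    # "</" inside a string literal could end the script element early;
    # "<\/" denotes the same string to a JavaScript or JSON parser.
    return json.dumps(value).replace("</", "<\\/")


# Example node whose label contains a closing script tag.
nodes = [{"data": {"id": "a", "label": "</script> in a label"}}]
payload = to_script_safe_json(nodes)

snippet = "<script>var nodes = %s;</script>" % payload
print(snippet)
```

The escaped payload still parses back to the original data, so the JavaScript side sees exactly what Python passed in.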
As this shows, Jupyter Notebook was not originally designed for mixing multiple languages or for building custom visualizations on the spot. Rather than loading an external JS library and developing a visualization by trial and error in a cell, I think it is better suited to cases where you already have a visualization module and want to use it in a cell.
The Jupyter project is currently growing with large grants from various sponsors, so the extension mechanisms in this area will likely improve in the future.
Jupyter / IPython Notebook is a very powerful tool, but at present it offers no way to mix multiple languages in one notebook or to exchange data between languages. Also, because there is no mechanism for easily running JS against arbitrary HTML, the work described above is required whenever you use your own visualization module rather than one of the supported ones (matplotlib, Bokeh, etc.). Beaker is a notebook-type application with mechanisms that solve these problems.
The biggest difference from Jupyter is that __Jupyter ties each notebook to a single kernel, so it is one language per notebook, whereas Beaker manages this cell by cell.__ Therefore, you can do the following in the same notebook:
Specifically, there is a standard mechanism for exchanging data between cells through a shared object called beaker. For example, a value assigned in Python,
beaker.mydata = "My sample data"
can be accessed from R,
beaker::get('mydata')
and can just as easily be used from JavaScript.
var myJsData = beaker.mydata + " updated by JS";
With this, using only standard features, you can read a CSV file with pandas in Python, convert it to a dictionary object, pass it as-is to a JavaScript cell through the beaker object, and use it for drawing.
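The Python half of that workflow can be sketched as follows. To keep the example runnable anywhere, the standard `csv` module stands in for pandas (with pandas, `pandas.read_csv(...).to_dict("records")` would give a comparable list of dictionaries), and the CSV content and the `csv_to_records` name are made up for illustration. In a Beaker Python cell, the resulting list would then be assigned to an attribute of the beaker object, in the same way as `beaker.mydata` above.

```python
import csv
import io
import json

# Hypothetical CSV content; in practice this would come from a file.
CSV_TEXT = """name,score
alice,3
bob,5
"""


def csv_to_records(text):
    """Parse CSV text into a list of dicts with numeric scores."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({"name": row["name"], "score": int(row["score"])})
    return records


records = csv_to_records(CSV_TEXT)

# Structures built only from dicts, lists, strings, and numbers map
# directly onto JavaScript objects and arrays.
print(json.dumps(records))
```

Keeping the data in this plain form avoids surprises when it crosses the language boundary.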
The following is an example of preparing data in Python and drawing with JavaScript code using Cytoscape.js in the embedded HTML cell:
Thus, this application is recommended when you want to __use Python for data processing and R for statistical computation, but draw the data mainly with D3.js__, because the whole process can be done in one notebook.
This time I introduced custom module embedding in Jupyter and Beaker Notebook, but in the second part, I will look at the actual work with Beaker.