[PYTHON] Visualize the characteristic vocabulary of a document with D3.js

This time, let's use D3.js to visualize the vocabulary extracted from Aozora Bunko works in the previous article.

The completed demo application can be viewed here. (If it doesn't render properly, try reloading your browser.)

Visualize text data

So far, focusing on the handling of text data, I have explained how to use feeds, how to extract topics of interest from a large number of documents by Bayesian classification, and how to extract characteristic vocabulary from documents using TF-IDF as an index.

As mentioned at the end of the previous article, results extracted this way come across better through a visualization library than when they are shown as raw data such as character strings.

Create data for display in D3.js

In the past, I made an interactive visualization demo using D3.js. Let's implement this application in the same way and run it on [Heroku](https://www.heroku.com/).

First, each vocabulary word is used as a key, and its weight is expressed as a number.

import codecs
import json
import os

# output_dir (the destination directory) is assumed to be defined elsewhere.

def write_json_data(dic):
    """Write each key's (word, score) pairs to a JSON file named after the key."""
    for k, v in dic.items():
        # JSON will hold a two-dimensional array, so prepare a list first.
        # It is created per key so that each file contains only its own words.
        arr = []
        for w, s in v:
            # Add to the list while rescaling the score to a convenient range
            arr.append([w, str(round(s * 10000 + 100, 2))])

        # When converting a dictionary containing Japanese into JSON in Python,
        # set ensure_ascii=False like this so that the text is not garbled
        json_text = json.dumps({'values': arr},
                               sort_keys=True,
                               ensure_ascii=False,
                               indent=2,
                               # Specify the separators explicitly for clean JSON
                               separators=(',', ': '))

        # Write the file as UTF-8 with codecs.open; "with" closes it properly
        with codecs.open(os.path.join(output_dir, k), "w", "utf-8") as f:
            f.write(json_text)

The beginning of the generated JSON looks like this:

{
  "values": [
    [
      "Back view",
      "199.26"
    ],
    [
      "Peculiar",
      "299.26"
    ],

In this way, the result is a two-dimensional array: an outer array whose elements are small arrays, each holding a key (the word) and its value (the score).
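As a quick check of this layout, here is a minimal sketch (not part of the original application) that serializes a small made-up sample with and without ensure_ascii=False and then reads it back. The sample words and the helper name load_word_scores are assumptions for illustration only. Because the scores are written as strings, the helper converts them back to numbers.

```python
import json

# Made-up sample in the same [word, score-string] layout (not real output).
values = [["後ろ姿", "199.26"], ["peculiar", "299.26"]]

# ensure_ascii=False writes Japanese characters as-is;
# the default (True) escapes them as \uXXXX sequences.
as_utf8 = json.dumps({"values": values}, ensure_ascii=False, indent=2)
as_ascii = json.dumps({"values": values}, ensure_ascii=True, indent=2)


def load_word_scores(json_text):
    """Parse the {'values': [[word, score], ...]} layout into (word, float) pairs."""
    data = json.loads(json_text)
    # Scores are stored as strings, so convert them to floats here.
    return [(word, float(score)) for word, score in data["values"]]


pairs = load_word_scores(as_utf8)
for word, score in pairs:
    print(word, score)
```

Both serialized forms parse back to the same data; the difference is only in how readable the file is when opened in an editor.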

Visualize with D3.js

To be honest, I'm not very good at JavaScript, so I would welcome guidance from experts. For now, I will write it with the goal of simply getting something displayed.

// Assumes margin, width, height, and force (presumably a D3 v3 force layout)
// are defined earlier in the script.

// Add the SVG element and a group offset by the margins
var svg = d3.select("body")
  .append("svg")
  .attr("width", width + margin.left + margin.right)
  .attr("height", height + margin.top + margin.bottom)
  .append("g")
  .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

// Load the JSON data and bind it
d3.json('../json/novel_name.json', function(error, data) {
  if (error) throw error;

  data.values.forEach(function(d) {
    d.word = String(d[0]); // key
    d.score = +d[1];       // value (stored as a string, so convert to a number)
  });

  force
    .nodes(data.values)
    .start();

  var node = svg.selectAll("g.node")
    .data(data.values)
    .enter()
    .append("g")
    .attr("class", "node")
    .call(force.drag);

  // Determine the size of the circle based on the value.
  // The color also changes according to the value.
  node.append("circle")
    .attr("r", function(d) { return d.score * 0.1; })
    .attr("opacity", 0.67)
    .attr("fill", function(d) {
      if (d.score <= 300) {
        return "#449944";
      } else if (d.score <= 500) {
        return "#33AA33";
      } else if (d.score <= 750) {
        return "#22CC22";
      } else if (d.score <= 1000) {
        return "#11DD11";
      }
      // Scores above 1000 fall through and keep the default fill
    });

  // Add the word and its value as labels
  node.append("text")
    .text(function(d) { return d.word; })
    .attr('fill', '#fff')
    .attr('font-size', 24)
    .attr('dx', -16)
    .attr('dy', -5);
  node.append("text")
    .text(function(d) { return d.score; })
    .attr('fill', '#fff')
    .attr('dx', -25)
    .attr('dy', 15);

  // Animation: on every tick, move each node while keeping it inside the drawing area
  force.on("tick", function() {
    node.attr('transform', function(d) {
      return 'translate(' + Math.max(20, Math.min(width - 20, d.x)) + ','
        + Math.max(20, Math.min(height - 20, d.y)) + ')';
    });
  });
});
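Since the data is prepared in Python, it can be handy to check how scores will map to circle sizes and colors before opening the browser. The following is only a sketch that mirrors the rules in the D3.js code above (radius = score × 0.1, four color bands, positions clamped 20 pixels inside the drawing area). The function names and the 960-pixel width in the usage lines are assumptions for illustration, not values from the application.

```python
import bisect

# Upper bound of each color band and the matching color, copied from the D3.js code.
BOUNDS = [300, 500, 750, 1000]
COLORS = ["#449944", "#33AA33", "#22CC22", "#11DD11"]


def radius(score):
    """Circle radius used for a given score."""
    return score * 0.1


def fill(score):
    """Band color for a score, or None above 1000 (the JavaScript sets no fill there)."""
    # bisect_left keeps the upper bounds inclusive, matching the "<=" comparisons.
    i = bisect.bisect_left(BOUNDS, score)
    return COLORS[i] if i < len(COLORS) else None


def clamp(value, limit, pad=20):
    """Keep a coordinate between pad and limit - pad, as the tick handler does."""
    return max(pad, min(limit - pad, value))


print(radius(199.26), fill(199.26))
print(fill(1200.0))
print(clamp(5, 960), clamp(1000, 960))  # 960 is a hypothetical width
```

Running scores from the generated JSON through fill is one way to notice whether any of them exceed 1000 and would therefore be drawn without one of the four band colors.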

Completion of demo application

After that, just git push to Heroku and you're done.

heroku create myapp
git push heroku master
heroku open

D3.js demo application http://d3js-data-clips.herokuapp.com/

Summary

This time, I visualized the extracted features using D3.js and ran the result on Heroku.

At this point we have a list of the words that characterize each document together with their numerical values, so I think it can be applied to matching against other data sources or to investigating the relationships between multiple documents.
