[PYTHON] Create a Word Cloud from an academic program

Introduction

I wanted to see which topics were trending at the Japanese Society for Artificial Intelligence conference (JSAI2020), which I attended the other day, so I created a word cloud from the presentation titles in the program.

What is a Word Cloud?

A word cloud counts how often each word appears in a text and draws each word at a size that reflects its frequency. You may have seen word clouds built from tweets to show which words people are posting most often.
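
The idea can be sketched with the standard library alone. This is only an illustration of "size follows frequency", not how the wordcloud package works internally. The function name font_sizes, the sample sentence, and the size range are all made up for this example.

```python
from collections import Counter


def font_sizes(text, min_size=10, max_size=60):
    """Map each whitespace-separated word to a font size that grows with its count."""
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    top = max(counts.values())
    # Linear scaling: the most frequent word gets max_size,
    # and rarer words get proportionally smaller sizes above min_size.
    return {
        word: min_size + (max_size - min_size) * n / top
        for word, n in counts.items()
    }


# Hypothetical sample text, not taken from the JSAI2020 program.
sample = "learning deep learning model learning model data"
sizes = font_sizes(sample)
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {size:.0f}")
```

Here "learning" appears most often and therefore gets the largest size. A real word cloud additionally has to lay the words out without overlap, which the wordcloud package used later in this post handles.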

Preparation of text

I downloaded the JSAI2020 conference proceedings from here, opened the file index.html, and copied and pasted the session list into Notepad.

If the program is published as a PDF, you can extract the text as follows.


from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage

input_path = './Program.pdf'
output_path = 'Program.txt'

manager = PDFResourceManager()

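# Open the output in binary mode because TextConverter writes encoded bytes
with open(output_path, "wb") as text_file:
    with open(input_path, 'rb') as pdf_file:
        with TextConverter(manager, text_file, codec='utf-8', laparams=LAParams()) as conv:
            interpreter = PDFPageInterpreter(manager, conv)
            # Process the PDF page by page, appending the text to the output file
            for page in PDFPage.get_pages(pdf_file):
                interpreter.process_page(page)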

Delete unnecessary information

Remove everything that is not part of a presentation title, such as dates and presenter names. Presenter names were enclosed in (), so everything inside () is deleted. Regular expressions are used to match the text in () and the times and dates.

import re

with open('Program.txt', mode='rt', encoding='utf-8') as fo:
    Program = fo.read()

# Convert [] to ()
Program = Program.replace("[", "(")
Program = Program.replace("]", ")")
# Delete the text enclosed in () (the conversion above is needed so that text in [] is deleted too)
Program = re.sub(r'\([^)]*\)', '', Program)
# Delete times and dates
Program = re.sub(r'((0?|1)[0-9]|2[0-3])[:][0-5][0-9]?', '', Program)
Program = re.sub(r'2020([0-1]?[0-9])Month([0-3]?[0-9])Day?', '', Program)
# Delete labels that recur throughout the program
Program = re.sub('Time / venue', '', Program)
Program = re.sub('session', '', Program)
Program = re.sub('Announcement list', '', Program)
Program = re.sub('Venue', '', Program)

# Write with the same encoding that the next step reads
with open('Program_new.txt', 'w', encoding='utf-8') as f:
    print(Program, file=f)
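
To see what these substitutions do, here is a small worked example on made-up lines. It reuses the bracket conversion, the () pattern, and the time pattern from the code above. The sample text and the final whitespace cleanup are my own additions for illustration and are not part of the original script.

```python
import re

# Hypothetical sample lines, made up to resemble a program listing.
sample = (
    "[1A1] 9:00 Deep learning for dialogue systems (Taro Yamada)\n"
    "10:20 Graph neural networks [invited] (Hanako Suzuki)"
)

# Treat square brackets like parentheses so that one pattern handles both.
cleaned = sample.replace("[", "(").replace("]", ")")
# Remove every parenthesized span, e.g. presenter names and session codes.
cleaned = re.sub(r"\([^)]*\)", "", cleaned)
# Remove times such as 9:00 or 10:20.
cleaned = re.sub(r"((0?|1)[0-9]|2[0-3])[:][0-5][0-9]?", "", cleaned)
# Collapse the leftover runs of spaces on each line (extra step for readability).
cleaned = "\n".join(" ".join(line.split()) for line in cleaned.splitlines())

print(cleaned)
```

Only the titles remain after the substitutions. Note that the () pattern does not handle nested parentheses, so a title that itself contains () would lose that part as well.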

Creating a Word Cloud

from matplotlib import pyplot as plt
from wordcloud import WordCloud

with open('Program_new.txt', mode='rt', encoding='utf-8') as fo:
    cloud_text = fo.read()

# font_path must point to a Japanese font installed on your machine
word_cloud = WordCloud(width=640, height=480, font_path="/System/Library/AssetsV2/com_apple_MobileAsset_Font6/c7c8e5cb889b80fff0175bf138a7b66c6f027f21.asset/AssetData/ToppanBunkyuMidashiGothicStdN-ExtraBold.otf").generate(cloud_text)
word_cloud.to_file('wordcloud4.png')

plt.imshow(word_cloud)
plt.axis('off')
plt.show()
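
Because the font path above is specific to one machine, it may help to look the font up rather than hard-code it. The following is a minimal sketch using only pathlib. The helper name first_existing_font and the candidate paths are assumptions for illustration; the listed fonts may not exist on your system, so replace them with ones you actually have.

```python
from pathlib import Path


def first_existing_font(candidates):
    """Return the first candidate path that exists as a file, or None."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file():
            return str(path)
    return None


# Hypothetical candidate locations; adjust them to the fonts on your machine.
CANDIDATE_FONTS = [
    "/System/Library/Fonts/ヒラギノ角ゴシック W4.ttc",
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",
    "C:/Windows/Fonts/meiryo.ttc",
]

font_path = first_existing_font(CANDIDATE_FONTS)
print(font_path or "No candidate font found; set font_path manually.")
```

The resulting string can then be passed as the font_path argument of WordCloud in place of the long literal path.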
