Generate a Word Cloud from case law data in Python 3

- Download case data (PDF): http://www.courts.go.jp/app/hanrei_jp/search1
- Convert the PDF to text using Automator: http://qiita.com/yuki_bg/items/2e6efe93992d83752312
- After that, install MeCab, wordcloud, etc.
  - You may need to install anaconda
- Clone the enhanced MeCab dictionary (neologd)

(zsh)


brew install mecab mecab-ipadic
pip3.5 install mecab-python3

pip3.5 install wordcloud
pip3.5 install numpy Pillow matplotlib #Libraries required to use wordcloud
# The following attempts did not work:
#brew install numpy # error
#brew install homebrew/python/numpy # something's wrong...
#sudo xcode-select --install # doesn't work...

### Get "mecab-ipadic-neologd", a MeCab dictionary with new words added
cd /usr/local/lib/mecab/dic
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
cd mecab-ipadic-neologd
./bin/install-mecab-ipadic-neologd -n

wordcloud.py


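# Note: Python searches the script's own directory first, so a script saved as
# wordcloud.py can shadow the installed wordcloud package on this import.
import MeCab
from wordcloud import WordCloud
import matplotlib.pyplot as plt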

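# Part-of-speech IDs to keep, as numbered in the IPA dictionary's pos-id.def:
# adjectives (10, 11), verbs (31, 32), adverbs (34), and nouns (36-49, 59, 60, 62, 67)
pos_list = [10, 11, 31, 32, 34]
pos_list.extend(list(range(36,50)))
pos_list.extend([59, 60, 62, 67])

def create_mecab_list(text):
	"""Return the words in text that are longer than one character and whose part of speech is in pos_list."""
	mecab_list = []
	mecab = MeCab.Tagger("-Ochasen -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd")
	# Parse an empty string first: a workaround for node.surface coming back broken in some mecab-python3 versions
	mecab.parse("")
	node = mecab.parseToNode(text)
	while node:
		# Skip one-character tokens, then keep only the selected parts of speech
		if len(node.surface) > 1 and node.posid in pos_list:
			mecab_list.append(node.surface)
		node = node.next
	return mecab_list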

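# Read the case text (already converted to UTF-8)
with open("./086064_hanrei_utf8.txt", "r", encoding="utf-8") as file:
	hanrei = file.read()

# WordCloud splits its input on whitespace, so join the extracted words with spaces
string = " ".join(create_mecab_list(hanrei))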


# A font that contains Japanese glyphs; adjust the path for your system
fpath = "/Library/Fonts/Hiragino Maru Go ProN W4.ttc"
wordcloud = WordCloud(
	# background_color="white",
	max_font_size=40,
	relative_scaling=.5,
	# width=900,
	# height=500,
	font_path=fpath
	).generate(string)

# Show the generated image without axes
plt.figure()
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
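
The filter inside create_mecab_list can be tried without MeCab installed. The sketch below is only an illustration: FakeNode, POS_IDS, and keep_surfaces are hypothetical names that are not part of the script, and the posid values in the sample are assigned by hand rather than produced by MeCab.

```python
from collections import namedtuple

# Hypothetical stand-in for a MeCab node: only the two fields the filter reads.
FakeNode = namedtuple("FakeNode", ["surface", "posid"])

# Same part-of-speech IDs as pos_list in the script.
POS_IDS = set([10, 11, 31, 32, 34] + list(range(36, 50)) + [59, 60, 62, 67])


def keep_surfaces(nodes, pos_ids=POS_IDS, min_length=2):
    """Return surfaces with at least min_length characters whose posid is in pos_ids."""
    return [n.surface for n in nodes
            if len(n.surface) >= min_length and n.posid in pos_ids]


# Illustrative tokens; the posid values are hand-assigned assumptions.
sample = [
    FakeNode("裁判所", 38),    # kept
    FakeNode("は", 16),        # dropped: one character, posid not listed
    FakeNode("判決", 36),      # kept
    FakeNode("を", 13),        # dropped
    FakeNode("言い渡す", 31),  # kept
    FakeNode("。", 7),         # dropped
]
kept = keep_surfaces(sample)
print(" ".join(kept))
```

Using a set for the IDs makes the membership check cheap on long texts, while the condition itself matches the two checks in the script.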

(zsh)


python3 wordcloud.py

Screen Shot 2016-10-01 at 1.51.19 AM.png
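
Words that appear more often are drawn larger, so it can help to look at the counts behind the image. The following is a minimal sketch using only the standard library; count_words and the sample text are hypothetical, and the counts may differ from what WordCloud computes internally because generate() applies its own tokenization and stop words.

```python
from collections import Counter


def count_words(joined_text):
    """Count occurrences of each whitespace-separated word."""
    return Counter(joined_text.split())


# Illustrative input in the same shape as `string` in the script:
# words already extracted and joined with spaces.
joined = "判決 被告 判決 原告 被告 判決"
frequencies = count_words(joined)

# Print the most frequent words first.
for word, count in frequencies.most_common(3):
    print(word, count)
```

A mapping like this can also be handed to WordCloud.generate_from_frequencies instead of calling generate() on the joined string, which skips the library's own word splitting.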
