[PYTHON] Set up a development environment for natural language processing

Install the following packages. The OS is Ubuntu 16.04.

- python (3.5.0): a language with many natural language processing libraries
- pyenv: a python version management tool
- MeCab (0.996): a morphological analysis engine
- CaboCha (0.69): a dependency analysis engine
- gensim (0.12.4): a library that provides popular models such as LDA and word2vec

python3, pyenv

First, install python.

$ sudo apt-get install python

This probably installs only python2.7, so next fetch pyenv, which manages python versions.

$ git clone https://github.com/yyuu/pyenv.git ~/.pyenv

To use pyenv, add the following lines to a shell config file such as .zshenv.

export PYENV_ROOT="$HOME/.pyenv"
export PATH=$PATH:$PYENV_ROOT/bin
eval "$(pyenv init -)"

** Addendum (2017-12-11) ** I swapped the order of the export lines. PYENV_ROOT has to be defined first, because the PATH definition refers to it.

I use zsh, and when I called python from a shell script saved as a file, it ran python2.7. I had written all of these settings in .zshrc, but it turns out that .zshrc is applied only to interactive shells (when a person types commands), not to shell scripts. .zshenv, on the other hand, is read every time zsh starts, so write all environment variables in .zshenv.

Now let's use pyenv. Check the list of python versions that can be installed.

$ pyenv install -l

After confirming that 3.5.0 is in the list, install python 3.5.0, switch the version in use, and refresh pyenv's shims with rehash. If the final version check shows 3.5.0, the setup succeeded.

$ pyenv install 3.5.0
$ pyenv global 3.5.0
$ pyenv rehash
$ python --version
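
To double-check which interpreter is actually picked up, a small script along the following lines can help. It is only a sketch: the function names `is_under` and `describe_interpreter` are made up for illustration, and it assumes pyenv lives in `~/.pyenv` unless `PYENV_ROOT` is set. Saving it as a file and running it both from the prompt and from a shell script shows whether the two resolve to the same python.

```python
import os
import sys


def is_under(path, root):
    """Return True if `path` is located inside the directory `root`."""
    path = os.path.abspath(path)
    root = os.path.abspath(root)
    return os.path.commonpath([path, root]) == root


def describe_interpreter(pyenv_root=None):
    """Summarize the running interpreter and whether it lives under pyenv."""
    if pyenv_root is None:
        # Assumption: pyenv is installed in ~/.pyenv unless PYENV_ROOT says otherwise.
        pyenv_root = os.environ.get("PYENV_ROOT", os.path.expanduser("~/.pyenv"))
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "under_pyenv": is_under(sys.executable, pyenv_root),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("{}: {}".format(key, value))
```

If `under_pyenv` comes out as False when the script is run from a shell script, the pyenv settings are probably not being read in that context.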

Next, install pip, python's package management tool. It is used several times in the steps that follow.

$ sudo apt-get install python-pip

Reference URL
- Super fast setup guide for Zsh beginners http://qiita.com/uasi/items/c4288dd835a65eb9d709
- Minimum memo when using Python on Mac (pyenv edition) http://qiita.com/zaburo/items/dd1a8323633035614efc
- pyenv + virtualenv (CentOS7) http://qiita.com/saitou1978/items/e82421e29e118bd397cc
- If you want to use easy_install or pip with Python on Ubuntu http://tech.g.hatena.ne.jp/rx7/20101129/p1

MeCab

Install MeCab and the other required packages.

$ sudo apt-get install mecab mecab-ipadic libmecab-dev

If you install mecab-ipadic, the character encoding will be utf-8. libmecab-dev is needed because it provides mecab-config; without it, later steps fail with an error saying mecab-config cannot be found. Dictionaries that can be used with MeCab include ipadic and juman, but this time we will use mecab-ipadic-neologd. This dictionary is notable for containing many proper nouns, symbols, emoticons, and so on. Let's install it with the following commands.

$ git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git <save location>/mecab-ipadic-neologd
$ cd <save location>/mecab-ipadic-neologd
$ ./bin/install-mecab-ipadic-neologd -h
$ ./bin/install-mecab-ipadic-neologd -n

The -h option only prints the installer's help; the -n option runs the actual installation. I think the save location should be the same as that of the existing dictionary. You can find the location of the dictionary currently in use with mecab -D. To use the new dictionary, run the following command.

$ mecab -d <save location>/mecab-ipadic-neologd/

Next, install the binding so that MeCab can be used from python. Use the following command.

$ pip install mecab-python3

If there is no error with the following command, it is successful.

$ python
>>> import MeCab
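
Once the import works, a call from python might look roughly like the sketch below. `MeCab.Tagger` and its `parse` method come from mecab-python3; the helper names `tagger_options`, `parse_mecab_output`, and `tokenize` are my own, and the code assumes MeCab's default output format (surface form, a tab, then comma-separated features, with `EOS` at the end). A dictionary path containing spaces would need extra quoting, which this sketch does not handle.

```python
def tagger_options(dic_dir=None):
    """Build the option string passed to MeCab.Tagger ("-d <dir>" or empty)."""
    return "-d {}".format(dic_dir) if dic_dir else ""


def parse_mecab_output(output):
    """Turn MeCab's default text output into (surface, features) pairs."""
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, features = line.partition("\t")
        tokens.append((surface, features.split(",")))
    return tokens


def tokenize(text, dic_dir=None):
    """Run MeCab on `text` and return (surface, features) pairs."""
    import MeCab  # provided by mecab-python3; imported only when needed

    tagger = MeCab.Tagger(tagger_options(dic_dir))
    return parse_mecab_output(tagger.parse(text))
```

For example, `tokenize("すもももももももものうち", "<save location>/mecab-ipadic-neologd/")` would analyze the sentence with the dictionary specified earlier, where the path is the same placeholder used above.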

Reference URL
- mecab-ipadic-NEologd : Neologism dictionary for MeCab https://github.com/neologd/mecab-ipadic-neologd/blob/master/README.ja.md

CaboCha

I tried to install it with the following command as I did before.

$ sudo apt-get install subversion
$ pip install 'svn+http://cabocha.googlecode.com/svn/trunk/python@r99'

This failed with an error saying the package could not be found. I tried various other methods, but in the end I decided to install it by the method described on the official website. First comes CRF++, a library that cabocha requires. I could not fetch it with wget, so I downloaded it from the link below.

CRF++ https://drive.google.com/folderview?id=0B4y35FiV1wh7fngteFhHQUN2Y1B5eUJBNHZUemJYQV9VWlBUb3JlX0xBdWVZTWtSbVBneU0&usp=drive_web#list

I downloaded cabocha itself with wget. The linked page shows version 0.67, but let's use the latest version, 0.69.

$ tar zvxf CRF++-0.58.tar.gz
$ cd CRF++-0.58
$ ./configure
$ make
$ sudo make install
$ sudo ldconfig
$ wget http://cabocha.googlecode.com/files/cabocha-0.69.tar.bz2
$ tar xjvf cabocha-0.69.tar.bz2
$ cd cabocha-0.69
$ ./configure --with-charset=UTF8 --with-posset=IPA
$ make
$ sudo make install
$ sudo ldconfig
$ cabocha

Next, build the binding for python3. The original source does not support python3, so modify setup.py slightly. setup.py is under cabocha-0.69/python.

setup.py


# (omitted)
def cmd2(str):
    # return string.split (cmd1(str))  # delete this line
    return cmd1(str).split()  # insert this line
# (omitted)
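
The change is needed because Python 3 removed the module-level `string.split` function and kept only the `str.split` method. The sketch below illustrates the difference; `split_flags` is a hypothetical stand-in for `cmd2` that takes the command output directly instead of running a command, and the flags in the example are invented.

```python
import string


def split_flags(output):
    """Split whitespace-separated flags, as the patched cmd2 does with cmd1's output."""
    return output.split()


# Python 2 offered string.split(s); Python 3 only has the str.split method.
print(hasattr(string, "split"))
print(split_flags("-I/usr/local/include -L/usr/local/lib -lcabocha"))
```

On python3 the first line prints False, which is why the original setup.py fails before the fix.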

After fixing it, install it with the following command.

$ cd cabocha-0.69/python
$ sudo python setup.py build_ext
$ sudo python setup.py install
$ sudo ldconfig

When using cabocha, specify the dictionary as shown in the following command.

$ cabocha -d <save location>/mecab-ipadic-neologd/

If there is no error with the following command, it is successful.

$ python
>>> import CaboCha
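
After the import succeeds, dependency analysis from python could be sketched as follows. `CaboCha.Parser`, `parse`, `toString`, and `CaboCha.FORMAT_LATTICE` belong to the binding; `parse_lattice` and `dependencies` are names I made up. The sketch assumes the usual lattice layout, in which each chunk starts with a header such as `* 0 1D 0/1 0.000000` (chunk index, then the index of the chunk it depends on followed by one letter) and -1 marks the root.

```python
def parse_lattice(lattice):
    """Extract (chunk_text, head_index) pairs from lattice-format output.

    A head_index of -1 marks the root chunk.
    """
    chunks = []
    for line in lattice.splitlines():
        if line.startswith("* "):
            # Chunk header: the third field is the head index plus one letter.
            head = int(line.split()[2][:-1])
            chunks.append(["", head])
        elif line and line != "EOS" and chunks:
            # Token line: the surface form is everything before the first tab.
            chunks[-1][0] += line.split("\t")[0]
    return [(text, head) for text, head in chunks]


def dependencies(text, dic_dir=None):
    """Parse `text` with CaboCha and return (chunk_text, head_index) pairs."""
    import CaboCha  # the binding built from cabocha-0.69/python

    # Pass "-d <dir>" only when a dictionary directory is given.
    parser = CaboCha.Parser("-d " + dic_dir) if dic_dir else CaboCha.Parser()
    tree = parser.parse(text)
    return parse_lattice(tree.toString(CaboCha.FORMAT_LATTICE))
```

Each pair gives the text of a chunk and the position of the chunk it modifies, which is usually enough for a first look at the parse.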

Reference URL
- CaboCha official website https://taku910.github.io/cabocha/
- Cabocha installation notes http://qiita.com/ShingoOikawa/items/ef4ac2929ec19599a3cf
- I wrote a patch to use CaboCha with python3 http://nosada.hatenablog.com/entry/2014/03/14/002954
- Specify dictionary with CaboCha (python) http://studylog.hateblo.jp/entry/2016/01/25/134507

gensim

You can easily install it with the following command. numpy and scipy are libraries required to use gensim.

$ pip install numpy
$ pip install scipy
$ pip install gensim

As before, check that the installation succeeded by importing the packages.

$ python
>>> import numpy
>>> import scipy
>>> import gensim
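
The same check can be scripted so that it is easy to rerun after changing python versions. This is just a sketch; `check_imports` is a name I made up, and it relies on each package exposing a `__version__` attribute, falling back to "unknown" when it does not.

```python
import importlib


def check_imports(names):
    """Map each module name to its version string, or None if the import fails."""
    result = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            result[name] = None
        else:
            result[name] = getattr(module, "__version__", "unknown")
    return result


if __name__ == "__main__":
    found = check_imports(["numpy", "scipy", "gensim"])
    for name, version in sorted(found.items()):
        print("{}: {}".format(name, version or "NOT INSTALLED"))
```

Adding "MeCab" and "CaboCha" to the list covers the bindings installed in the earlier sections as well.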

Reference URL
- gensim:installation https://radimrehurek.com/gensim/install.html

This completes the environment setup. Good work.

In closing

Most of this article is based on posts I wrote earlier on my own blog.

Upgrade from python2.7 to 3.5 (NLP flavor) http://woody-kawagoe.hatenablog.com/entry/2016/04/18/222535

I got stuck on the same setup again and wanted to write things up on qiita, so I rewrote the material and posted it there.
