Output word-segmented (wakati-gaki) Excel text data using Python3 + xlrd + mecab

Purpose

Text data stored in an Excel file is split into words (word segmentation, "wakati-gaki") and output in tab-delimited format (a TSV file).

Operation check environment

Mac OS 10.12.3
Python 3.6.0
mecab 0.996
mecab-python3==0.7

Preparation

Install mecab and dictionary

Installation with Homebrew (Mac): Use MeCab from Python3
Or build and install it yourself: Morphological analysis engine MeCab can be used with Python3 (March 2016 version)

Install python bindings

Using mecab with Python3

Install xlrd

[Python] Read Excel with pandas
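
Once the preparation steps are done, it can help to confirm that both packages are visible to the interpreter before running the script. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module names `xlrd` and `MeCab` are taken from the imports in the source code of this article.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # "xlrd" and "MeCab" are the modules imported by the script in this article.
    missing = missing_modules(["xlrd", "MeCab"])
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("xlrd and MeCab are both importable")
```

This only checks that the modules can be located; it does not verify that a MeCab dictionary is installed.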

Source code

mecab.py


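#!/usr/bin/env python3

import sys

import MeCab
import xlrd

args = sys.argv

# Open the Excel file given as the first argument
# (note: xlrd 2.0 and later can read only .xls files)
book = xlrd.open_workbook(args[1])
sh = book.sheet_by_index(0)

# Header row of the TSV output
print("\t".join(('text', 'price')))

# Create a tagger with the word-segmentation (wakati) output option
t = MeCab.Tagger("-Owakati")

# Process each row, skipping the first row (assumed to be a header)
for rx in range(1, sh.nrows):

    # Pick up the columns you need (column indexes are zero-based).
    # xlrd returns numeric cells as float, so convert to str before joining.
    text = str(sh.cell_value(rowx=rx, colx=1))
    price = str(sh.cell_value(rowx=rx, colx=2))

    # Delete line breaks inside the cell
    text = text.replace('\n', '').replace('\r', '')

    try:
        # Parse, then strip trailing whitespace, including the line break
        m = t.parse(text).strip()

        # Output one tab-delimited row
        print("\t".join((m, price)))

    except RuntimeError as e:
        # Report errors on stderr so they do not end up in the TSV output
        print("RuntimeError: " + str(e), file=sys.stderr)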
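
The row-by-row logic of the script can also be written so that it runs without Excel or MeCab, which makes it easier to try out. The following is a rough sketch, not the script itself: `format_cell`, `write_tsv`, and `whitespace_tokenize` are hypothetical names, the sample rows are made up, and a whitespace tokenizer stands in for MeCab. It also writes whole-number floats such as `1200.0` as `1200`, since xlrd returns numeric cells as float.

```python
import csv
import io


def format_cell(value):
    """Convert a cell value to text; whole-number floats lose the '.0'."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def write_tsv(rows, tokenize, out):
    """Write (text, price) rows to `out` as TSV, segmenting text with `tokenize`."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["text", "price"])
    for text, price in rows:
        # Remove line breaks inside the cell before segmentation
        text = format_cell(text).replace("\n", "").replace("\r", "")
        writer.writerow([tokenize(text), format_cell(price)])


def whitespace_tokenize(text):
    """Stand-in tokenizer for the demo: re-join whitespace-separated words."""
    return " ".join(text.split())


# Sample data (made up for illustration)
sample_rows = [("red  apple\n", 1200.0), ("green tea", 350.5)]
buffer = io.StringIO()
write_tsv(sample_rows, whitespace_tokenize, buffer)
result = buffer.getvalue()
print(result, end="")
```

With MeCab installed, the stand-in could be replaced by a function that calls `parse` on a `MeCab.Tagger("-Owakati")` instance and strips the result. Unlike a plain `"\t".join`, `csv.writer` quotes any field that itself contains a tab or a double quote.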

Run

$ ./mecab.py [excel file name]
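
The script prints to standard output, so the result can be saved with shell redirection (for example, appending `> out.tsv` to the command). To check such a file afterwards, it can be read back with the standard `csv` module. The following is a small sketch under assumptions: `read_segmented_tsv` is a hypothetical helper, and the sample line is made up for illustration rather than actual MeCab output.

```python
import csv
import io


def read_segmented_tsv(stream):
    """Yield (words, price) pairs from TSV text with 'text' and 'price' columns."""
    reader = csv.DictReader(stream, delimiter="\t")
    for record in reader:
        # The segmented text is space-separated, so split() recovers the words
        yield record["text"].split(), record["price"]


# Made-up sample standing in for the script's output
sample = "text\tprice\n今日 は 晴れ です\t1200.0\n"
records = list(read_segmented_tsv(io.StringIO(sample)))
print(records)
```

For a real file, an object returned by `open("out.tsv", encoding="utf-8", newline="")` can be passed in place of the `io.StringIO` object; the file name here is only an example.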




