Read and analyze an ARFF-format dataset with Python's scipy.io

Introduction

I needed to analyze a Weka dataset in ARFF format and had some trouble loading it into Python in a usable form, so here is a summary of what worked.

Read

You can load the file with loadarff() from scipy.io.arff. (See the scipy.io reference) https://docs.scipy.org/doc/scipy-0.18.1/reference/tutorial/io.html

readarff.py


from scipy.io import arff
import numpy as np
dataset, meta = arff.loadarff("DARPA99Week3-46.arff")
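
It can help to look at what loadarff() actually returns before converting anything. The following is a minimal sketch, assuming a hypothetical three-attribute ARFF file that is fed in from a string instead of the real DARPA file (the attribute names and values are made up for illustration). The first return value is a numpy structured array with one field per attribute, and the second holds the attribute names and types.

```python
import io
from scipy.io import arff

# Hypothetical miniature ARFF content, standing in for the real dataset.
ARFF_TEXT = """@relation demo
@attribute duration numeric
@attribute src_bytes numeric
@attribute label {normal,attack}
@data
0.5,100,normal
1.5,2500,attack
0.2,40,normal
"""

# loadarff accepts a file name or an open file-like object.
dataset, meta = arff.loadarff(io.StringIO(ARFF_TEXT))

print(meta.names())         # attribute names in file order
print(meta.types())         # 'numeric' or 'nominal' for each attribute
print(dataset.dtype)        # structured dtype: one field per attribute
print(dataset["duration"])  # a single column, selected by field name
```

Nominal attributes such as label come back as byte strings, which is why the array cannot always be cast to float directly.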

Array conversion

To analyze the data with scipy or scikit-learn, it is easier to work with an ordinary numpy array, so convert the loaded result with the following script. (See "Prepare scipy.io loadarff result for scikit-learn" on Stack Overflow)

arff1.py


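# Convert the structured array to a plain 2-D float array (every value must be convertible to float).
ds = np.asarray(dataset.tolist(), dtype=np.float32)
# Column 22 is used as the target.
target = ds[:, 22].astype(np.int8)
# Columns 0 to 20 are used as the features.
train = ds[:, :21]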

Or, selecting the feature columns by name:

arff2.py


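# structured_to_unstructured replaces the old .view(np.float) trick, which fails on current NumPy.
from numpy.lib import recfunctions as rfn
train_data = dataset[meta.names()[:-1]]
train_array = rfn.structured_to_unstructured(train_data, dtype=np.float64)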
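
The conversions above work on numeric values. When the file also contains a nominal class attribute, one option is to convert only the numeric attributes to a float array and turn the class values into integer codes separately. This is a minimal sketch under that assumption, using a hypothetical three-attribute file (the names duration, src_bytes and label are made up), not the layout of the DARPA dataset.

```python
import io
import numpy as np
from numpy.lib import recfunctions as rfn
from scipy.io import arff

# Hypothetical miniature ARFF content with one nominal class attribute.
ARFF_TEXT = """@relation demo
@attribute duration numeric
@attribute src_bytes numeric
@attribute label {normal,attack}
@data
0.5,100,normal
1.5,2500,attack
0.2,40,normal
"""

dataset, meta = arff.loadarff(io.StringIO(ARFF_TEXT))

# Keep only the numeric attributes as features.
numeric_names = [n for n, t in zip(meta.names(), meta.types()) if t == "numeric"]
features = rfn.structured_to_unstructured(dataset[numeric_names], dtype=np.float32)

# Nominal values are byte strings; convert to str, then to integer codes.
labels = dataset["label"].astype(str)
classes, target = np.unique(labels, return_inverse=True)

print(features.shape)  # (rows, number of numeric attributes)
print(classes)         # sorted class names
print(target)          # index into classes for each row
```

The features and target arrays can then be passed to scikit-learn in the usual way, and classes maps the integer codes back to the original names.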

Graph display

Once the data is in a numpy array, you can plot and analyze it with matplotlib and similar tools. The following example draws a histogram of column 16 (duration) in a Jupyter notebook.

hist.py


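# Jupyter notebook magic; omit this line when running as a plain script.
%matplotlib inline
import matplotlib.pyplot as plt

# ds is the float array built in arff1.py; column 16 is the duration attribute.
duration = ds[:, 16]
plt.hist(duration, bins=50)
plt.show()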
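
If you want the numbers behind the plot rather than the picture, np.histogram computes the same kind of binning without drawing anything. This is a minimal sketch that assumes a randomly generated stand-in for the duration column, since the real dataset is not included here.

```python
import numpy as np

# Hypothetical stand-in for one column of the converted array, e.g. ds[:, 16].
rng = np.random.default_rng(0)
duration = rng.exponential(scale=2.0, size=1000).astype(np.float32)

# 50 equal-width bins between the minimum and maximum value.
counts, edges = np.histogram(duration, bins=50)

print("min / mean / max:", duration.min(), duration.mean(), duration.max())
print("fullest bin starts at:", edges[counts.argmax()])
```

counts holds the number of samples per bin and edges holds the 51 bin boundaries, so the two can be inspected or saved alongside the plot.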

References

scipy.io reference https://docs.scipy.org/doc/scipy-0.18.1/reference/tutorial/io.html

Prepare scipy.io loadarff result for scikit-learn (Stack Overflow) http://stackoverflow.com/questions/22873434/prepare-scipy-io-loadarff-result-for-scikit-learn
