I needed to analyze a Weka dataset in ARFF format and had some trouble reading it into Python, so here is a summary of what worked.
You can load the file with loadarff() from scipy.io.arff. (See the scipy.io reference) https://docs.scipy.org/doc/scipy-0.18.1/reference/tutorial/io.html
readarff.py
from scipy.io import arff
import numpy as np
dataset, meta = arff.loadarff("DARPA99Week3-46.arff")
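To see what loadarff() returns without needing the original file, here is a minimal sketch that parses a small ARFF document from memory. The relation, attribute names, and values are made up for illustration and are not taken from the DARPA data.

```python
from io import StringIO

from scipy.io import arff

# Hypothetical ARFF content, used only for illustration.
ARFF_TEXT = """@relation demo
@attribute duration numeric
@attribute bytes numeric
@attribute label {normal,attack}
@data
0.5,100,normal
2.0,250,attack
1.5,80,normal
"""

# loadarff accepts a file name or a file-like object.
dataset, meta = arff.loadarff(StringIO(ARFF_TEXT))

print(meta.names())         # attribute names in file order
print(meta.types())         # 'numeric' or 'nominal' for each attribute
print(dataset.dtype)        # structured dtype with one field per attribute
print(dataset["duration"])  # a single column, selected by name
```

The result is a structured array with one record per data row, which is why it has to be converted before scikit-learn can use it. Nominal values come back as byte strings.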
loadarff() returns a NumPy structured array, but to analyze the data with scipy or scikit-learn I want an ordinary 2-D numpy array, so I convert it with the following script. (See "Prepare scipy.io loadarff result for scikit-learn" on Stack Overflow)
arff1.py
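# Convert the structured array to a plain 2-D float array (every attribute must be numeric).
ds = np.asarray(dataset.tolist(), dtype=np.float32)
# Column 22 is used as the class label; columns 0-20 are used as features.
target = ds[:, 22].astype(np.int8)
train = ds[:, :21]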
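The conversion above relies on every attribute being numeric. If the class attribute is nominal instead, one possible approach is to stack only the numeric columns and map the labels to integer codes with np.unique. This is a sketch under that assumption. The ARFF text and the attribute name `label` are hypothetical.

```python
from io import StringIO

import numpy as np
from scipy.io import arff

# Hypothetical ARFF content with a nominal class attribute.
ARFF_TEXT = """@relation demo
@attribute duration numeric
@attribute bytes numeric
@attribute label {normal,attack}
@data
0.5,100,normal
2.0,250,attack
1.5,80,normal
"""

dataset, meta = arff.loadarff(StringIO(ARFF_TEXT))

# Keep only the attributes that loadarff reports as numeric.
feature_names = [n for n, t in zip(meta.names(), meta.types()) if t == "numeric"]

# Stack the numeric columns side by side into a 2-D float array.
features = np.column_stack([dataset[n] for n in feature_names]).astype(np.float32)

# Nominal values are byte strings; map each distinct value to an integer code.
classes, target = np.unique(dataset["label"], return_inverse=True)

print(features.shape)
print(classes, target)
```

`classes` keeps the original label values, so an integer code can be translated back with `classes[code]`.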
Or, selecting the feature columns by attribute name:
arff2.py
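from numpy.lib import recfunctions as rfn
# Select every attribute except the last one; repack so the fields are contiguous in memory.
train_data = rfn.repack_fields(dataset[meta.names()[:-1]])
train_array = train_data.view(np.float64).reshape(train_data.shape + (-1,))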
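Another way to get the same kind of 2-D array is `structured_to_unstructured` from `numpy.lib.recfunctions` (available in NumPy 1.16 and later), which avoids the manual view and reshape. The sketch below uses a small hand-built structured array as a stand-in for the loadarff() output. Its field names and values are hypothetical.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical structured array standing in for the result of loadarff().
dataset = np.array(
    [(0.5, 100.0, b"normal"), (2.0, 250.0, b"attack")],
    dtype=[("duration", "<f8"), ("bytes", "<f8"), ("label", "S6")],
)

# Every field except the last one (the class label).
feature_names = list(dataset.dtype.names[:-1])

# Convert the selected fields into an ordinary 2-D float array.
train_array = rfn.structured_to_unstructured(dataset[feature_names], dtype=np.float64)

print(train_array.shape)
```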
Once you have a numpy array, you can use matplotlib etc. to display graphs and perform analysis.
hist.py
# Jupyter/IPython magic; remove this line when running as a plain Python script.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
duration=ds[:,16]
plt.hist(duration, bins=50)
plt.show()
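Hard-coded column numbers such as 16 are easy to get wrong, so it may help to look the column up by attribute name instead. The following sketch assumes a list of names like the one `meta.names()` returns and uses np.histogram to compute bin counts without drawing anything. The names and values are hypothetical.

```python
import numpy as np

# Hypothetical stand-ins for meta.names() and the converted array ds.
names = ["duration", "bytes"]
ds = np.array(
    [[0.5, 100.0], [2.0, 250.0], [1.5, 80.0], [0.7, 120.0]],
    dtype=np.float32,
)

# Find the column index from the attribute name instead of hard-coding it.
col = names.index("duration")
duration = ds[:, col]

# Bin counts and bin edges for the selected column.
counts, edges = np.histogram(duration, bins=3)
print(counts, edges)
```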
scipy.io reference https://docs.scipy.org/doc/scipy-0.18.1/reference/tutorial/io.html
Prepare scipy.io loadarff result for scikit-learn (Stack Overflow) http://stackoverflow.com/questions/22873434/prepare-scipy-io-loadarff-result-for-scikit-learn