[PYTHON] How to use shogun

A memorandum on using SHOGUN, a machine learning library available here. The installation method is described here.

1. Label


An example using BinaryLabels, which holds binary labels represented by -1 or +1. Labels can be created from an array or from a CSV file.

from modshogun import BinaryLabels

#Allocate 5 labels (the values are uninitialized, not random, as the output below shows)
label = BinaryLabels(5)

label.get_num_labels() 
→ 5

label.get_values()
→ array([  2.00000000e+000,   2.00000000e+000,   1.38338381e-322,0.00000000e+000,   0.00000000e+000])

from modshogun import CSVFile

#Can be created from a CSV file prepared in advance
label_from_csv = BinaryLabels(CSVFile(file_path))
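BinaryLabels expects every value to be -1 or +1, so labels stored as 0/1 (or False/True) have to be mapped first. A minimal plain-Python sketch; `to_binary_label_values` is a hypothetical helper for illustration, not part of shogun.

```python
def to_binary_label_values(raw_labels):
    """Map 0/1 or False/True labels to the -1/+1 values used by BinaryLabels.

    Hypothetical helper, not part of shogun. Values that are already
    -1 or +1 are kept; anything else raises ValueError.
    """
    mapping = {0: -1.0, 1: 1.0, -1: -1.0}
    values = []
    for label in raw_labels:
        # True/False and 1.0/0.0 compare equal to 1/0, so they map too
        if label not in mapping:
            raise ValueError("label %r is not a binary label" % (label,))
        values.append(mapping[label])
    return values


print(to_binary_label_values([0, 1, 1, 0]))   # [-1.0, 1.0, 1.0, -1.0]
print(to_binary_label_values([True, False]))  # [1.0, -1.0]
```

The resulting list can then be turned into a NumPy array with numpy.array(...) and passed to BinaryLabels.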

2. Features


Features can be created from a NumPy matrix or a CSV file. Note that each feature vector (one sample) is stored as one column, not one row; the rows correspond to the feature dimensions.

from modshogun import RealFeatures
import numpy as np

#3x3 random matrix
feat_arr = np.random.rand(3, 3)
feat_arr
→ array([[ 0.02818103,  0.72093824,  0.92727711],
       [ 0.66853622,  0.14594737,  0.90522684],
       [ 0.97941639,  0.14188234,  0.80854797]])

#Initialization of Real Features
features = RealFeatures(feat_arr)

#Display of features
features.get_feature_matrix()
→ array([[ 0.02818103,  0.72093824,  0.92727711],
       [ 0.66853622,  0.14594737,  0.90522684],
       [ 0.97941639,  0.14188234,  0.80854797]])

#Get the feature vector stored in a particular column (here, column 1)
features.get_feature_vector(1)
→array([ 0.72093824,  0.14594737,  0.14188234])

#Number of features, i.e. dimensions (number of rows)
features.get_num_features()
→3

#Number of feature vectors, i.e. samples (number of columns)
features.get_num_vectors()
→3

from modshogun import CSVFile

#Of course, this can also be read from a CSV file.
feats_from_csv = RealFeatures(CSVFile(file_path))
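Many other libraries (scikit-learn, for example) store one sample per row, so such data has to be transposed before it is given to RealFeatures. A NumPy sketch of the column-per-vector layout, with comments mapping each expression to the RealFeatures method it corresponds to; the data is made up for illustration.

```python
import numpy as np

# Made-up data in the samples-as-rows layout: 4 samples, 3 features each
samples_as_rows = np.array([[1.0, 2.0, 3.0],
                            [4.0, 5.0, 6.0],
                            [7.0, 8.0, 9.0],
                            [10.0, 11.0, 12.0]])

# Transpose so that each column is one sample; .copy() gives an independent array
feat_matrix = samples_as_rows.T.copy()

num_features = feat_matrix.shape[0]  # corresponds to get_num_features() (rows)
num_vectors = feat_matrix.shape[1]   # corresponds to get_num_vectors() (columns)
second_vector = feat_matrix[:, 1]    # corresponds to get_feature_vector(1)

print(num_features, num_vectors)  # 3 4
print(second_vector)              # [4. 5. 6.]
```

RealFeatures(feat_matrix) would then hold 4 feature vectors of dimension 3.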

3. Kernel


An example with a chi-square kernel.

from modshogun import Chi2Kernel, RealFeatures, CSVFile

#Training data
feats_train = RealFeatures(CSVFile(file_path))

#Test data
feats_test = RealFeatures(CSVFile(file_path))

#Kernel width
width = 1.4

#Kernel cache size (in MB)
size_cache = 10

#Kernel generation
kernel = Chi2Kernel(feats_train, feats_train, width, size_cache)

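#Initialize the kernel with the training data (left-hand side) and test data (right-hand side); nothing is trained here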
kernel.init(feats_train, feats_test)
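To see what the kernel computes, here is a NumPy sketch of a chi-square kernel matrix, assuming the form given in the shogun documentation for Chi2Kernel, k(x, y) = exp(-(1/width) * sum_i (x_i - y_i)^2 / (x_i + y_i)), with dimensions where x_i + y_i = 0 skipped. As with RealFeatures, each column is one vector, so entry (i, j) pairs training vector i (left-hand side) with test vector j (right-hand side). The data is made up and this is not shogun's implementation.

```python
import numpy as np


def chi2_kernel_matrix(lhs, rhs, width):
    """Chi-square kernel between the columns of lhs and the columns of rhs.

    k(x, y) = exp(-(1 / width) * sum_i (x_i - y_i)**2 / (x_i + y_i)),
    skipping dimensions where x_i + y_i == 0. Inputs are assumed to be
    non-negative (e.g. histograms), one vector per column.
    """
    n_lhs, n_rhs = lhs.shape[1], rhs.shape[1]
    k = np.empty((n_lhs, n_rhs))
    for i in range(n_lhs):
        for j in range(n_rhs):
            x, y = lhs[:, i], rhs[:, j]
            num = (x - y) ** 2
            den = x + y
            mask = den != 0  # avoid dividing by zero
            k[i, j] = np.exp(-np.sum(num[mask] / den[mask]) / width)
    return k


train = np.array([[0.2, 0.5, 0.0],
                  [0.8, 0.5, 1.0]])  # 3 training vectors (columns)
test = np.array([[0.2, 0.4],
                 [0.8, 0.6]])        # 2 test vectors (columns)
K = chi2_kernel_matrix(train, test, width=1.4)
print(K.shape)  # (3, 2)
```

Identical vectors give k = 1 (training vector 0 and test vector 0 here), and the value decays toward 0 as the vectors differ; a larger width makes that decay slower.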

4. SVMLight


Classification with a support vector machine using SVMLight.

from modshogun import SVMLight, CSVFile, BinaryLabels, RealFeatures, Chi2Kernel

feats_train = RealFeatures(CSVFile(train_data_file_path))
feats_test = RealFeatures(CSVFile(test_data_file_path))

kernel = Chi2Kernel(feats_train, feats_train, 1.4, 10)

labels = BinaryLabels(CSVFile(label_traindat_path))
 
C = 1.2
epsilon = 1e-5
num_threads = 1
svm = SVMLight(C, kernel, labels)
svm.set_epsilon(epsilon)
svm.parallel.set_num_threads(num_threads)
svm.train()

kernel.init(feats_train, feats_test)
res = svm.apply().get_labels()

res
→array([...])  (the predicted label, -1 or +1, for each test vector)
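Applying a trained kernel SVM to a test vector x evaluates a decision function of the form f(x) = sum_i alpha_i k(x_i, x) + b over the support vectors x_i, and the predicted label is the sign of f(x). This is why the kernel is initialized with the test features on its right-hand side before apply(). A plain-Python sketch with a made-up model; in this sketch alpha_i is taken to already include the sign of the training label, and nothing here reproduces SVMLight's training.

```python
import math


def chi2(x, y, width=1.4):
    """Chi-square kernel in the documented Chi2Kernel form (zero denominators skipped)."""
    s = sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b != 0)
    return math.exp(-s / width)


def decision_value(x, support_vectors, alphas, bias, kernel):
    """f(x) = sum_i alphas[i] * kernel(sv_i, x) + bias."""
    return sum(a * kernel(sv, x) for sv, a in zip(support_vectors, alphas)) + bias


def predict_label(x, support_vectors, alphas, bias, kernel):
    """+1 if f(x) >= 0, else -1 (ties go to +1 in this sketch)."""
    return 1 if decision_value(x, support_vectors, alphas, bias, kernel) >= 0 else -1


# Made-up model: one support vector per class
support_vectors = [[0.9, 0.1], [0.1, 0.9]]
alphas = [1.0, -1.0]  # alpha_i * y_i
bias = 0.0

print(predict_label([0.8, 0.2], support_vectors, alphas, bias, chi2))  # 1
print(predict_label([0.2, 0.8], support_vectors, alphas, bias, chi2))  # -1
```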

5. Cross-validation


Cross-validation uses the CrossValidation class, which is initialized with a classifier, features, labels, a splitting strategy and an evaluation criterion.

from modshogun import LibLinear, L2R_L2LOSS_SVC, BinaryLabels, RealFeatures, CSVFile
from modshogun import CrossValidation, CrossValidationSplitting, ContingencyTableEvaluation, ACCURACY

#Classifier
classifier = LibLinear(L2R_L2LOSS_SVC)
#Features
features = RealFeatures(CSVFile(feature_file_path))
#Labels
labels = BinaryLabels(CSVFile(label_file_path))


#The splitting strategy specifies how the data is divided into folds. CrossValidationSplitting splits it randomly into the given number of folds; here, 5.
splitting_strategy = CrossValidationSplitting(labels, 5)

#Evaluation criterion class. ACCURACY is just a constant declared in EContingencyTableMeasureType.
evaluation_criterium = ContingencyTableEvaluation(ACCURACY)

#Cross-validation class.
cross_validation = CrossValidation(classifier, features, labels, splitting_strategy, evaluation_criterium)
cross_validation.set_autolock(False)

#Setting the number of repetitions
cross_validation.set_num_runs(10)

#Compute a confidence interval with alpha = 0.05, i.e. a 95% confidence interval
cross_validation.set_conf_int_alpha(0.05)

#The return value is a CEvaluationResult object
result = cross_validation.evaluate()

#Average of the cross-validation results
print(result.mean)

#Print all fields of the result (print_result() prints by itself and returns nothing)
result.print_result()
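Conceptually, CrossValidationSplitting(labels, 5) partitions the sample indices into 5 folds; each fold is held out once for evaluation while the classifier is trained on the remaining folds, and set_num_runs(10) repeats the whole procedure with new random partitions before averaging. A plain-Python sketch of that loop, not shogun's exact splitting scheme; `majority_vote_accuracy` is a made-up stand-in for training the classifier and computing accuracy, and the labels are invented.

```python
import random
import statistics


def kfold_indices(n_samples, n_folds, rng):
    """Randomly partition range(n_samples) into n_folds folds of nearly equal size."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def cross_validate(n_samples, n_folds, n_runs, train_and_score, seed=0):
    """Mean score over n_runs repetitions of n_folds-fold cross-validation."""
    rng = random.Random(seed)
    run_means = []
    for _ in range(n_runs):
        folds = kfold_indices(n_samples, n_folds, rng)
        fold_scores = []
        for k, test_idx in enumerate(folds):
            # Train on every fold except the held-out one
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            fold_scores.append(train_and_score(train_idx, test_idx))
        run_means.append(statistics.mean(fold_scores))
    return statistics.mean(run_means)


labels = [1, 1, 1, -1, 1, -1, 1, 1, -1, 1]  # made-up labels


def majority_vote_accuracy(train_idx, test_idx):
    """Toy 'classifier': predict the majority training label, return test accuracy."""
    train_labels = [labels[i] for i in train_idx]
    majority = 1 if train_labels.count(1) >= train_labels.count(-1) else -1
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


print(cross_validate(len(labels), 5, 10, majority_vote_accuracy))  # 0.7
```

Averaging over several runs reduces the dependence of the estimate on one particular random split, at the cost of training the classifier n_folds × n_runs times.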

6. Grid search


Once cross-validation works, grid search is easy: pass the CrossValidation object and a parameter tree to GridSearchModelSelection (a subclass of CModelSelection), and the grid can be searched right away.

---Code up to initializing the CrossValidation object with LibLinear is omitted (same as section 5)---

from modshogun import ModelSelectionParameters, R_EXP
from modshogun import GridSearchModelSelection

#Root of the tree holding the parameters to search over
param_tree_root = ModelSelectionParameters()

#Parameter C1
c1 = ModelSelectionParameters("C1")
param_tree_root.append_child(c1)

#build_values(min, max, range_type[, step]) sets the candidate values from min to max in increments of step (1 by default).
#There are three range types: R_EXP (exponential), R_LOG (logarithmic) and R_LINEAR (linear). With R_EXP each value is 2 raised to that exponent, so -1.0 to 0.0 gives [0.5, 1].
c1.build_values(-1.0, 0.0, R_EXP)

c2 = ModelSelectionParameters("C2")
param_tree_root.append_child(c2)
c2.build_values(-1.0, 0.0, R_EXP)

#Running print_tree() shows that param_tree_root has a tree structure.
param_tree_root.print_tree()
→root with
	 with values: vector=[0.5,1]
	 with values: vector=[0.5,1]

#Generate grid search class
model_selection = GridSearchModelSelection(cross_validation, param_tree_root)

#This will automatically determine the best parameters and return an object of class CParameterCombination. Also, if you pass True as an argument, the combination of each parameter and the result will be output.
best_parameters = model_selection.select_model()

#The returned best parameters can be applied to the classifier, which can then be evaluated again.
best_parameters.apply_to_machine(classifier)
result = cross_validation.evaluate()
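Conceptually, the grid search above is an exhaustive search: the candidate values of every parameter in the tree (here C1 and C2) are combined, each combination is applied to the machine and cross-validated, and the best-scoring combination is returned. A plain-Python sketch of that idea, not shogun's implementation. `build_range` only mirrors the assumed R_LINEAR/R_EXP behavior (base 2 and step 1 by default), which reproduces the [0.5, 1] shown by print_tree(); `fake_accuracy` is a made-up stand-in for cross_validation.evaluate().

```python
import itertools
import math


def build_range(min_value, max_value, range_type, step=1.0, base=2.0):
    """Candidate values for x = min, min+step, ..., max (assumed R_LINEAR/R_EXP behavior).

    "linear" returns x itself; "exp" returns base**x.
    """
    n = int(math.floor((max_value - min_value) / step + 1e-9)) + 1
    xs = [min_value + i * step for i in range(n)]
    if range_type == "linear":
        return xs
    if range_type == "exp":
        return [base ** x for x in xs]
    raise ValueError("unknown range type: %r" % (range_type,))


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; return (best_params, best_score).

    evaluate(params) must return a score where higher is better (e.g. accuracy).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def fake_accuracy(params):
    """Made-up score that happens to prefer C1 = 1.0 and C2 = 0.5."""
    return 0.9 - abs(params["C1"] - 1.0) - abs(params["C2"] - 0.5)


grid = {"C1": build_range(-1.0, 0.0, "exp"), "C2": build_range(-1.0, 0.0, "exp")}
print(grid["C1"])                        # [0.5, 1.0]
print(grid_search(grid, fake_accuracy))  # ({'C1': 1.0, 'C2': 0.5}, 0.9)
```

The cost grows multiplicatively: two values each for C1 and C2 already means four combinations, and with 5 folds and set_num_runs(10) each combination trains the classifier 50 times.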

7. Save and load the created model


Objects can be saved and loaded using the save_serializable() and load_serializable() functions of CSGObject, the base class of almost all classes.

from modshogun import SerializableAsciiFile
from modshogun import MulticlassLabels
from numpy import array

save_labels = MulticlassLabels(array([1.0, 2, 3]))

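#Open the file for writing ("w"); csv and asc file names are supported
save_file = SerializableAsciiFile("foo.csv", "w")
#Save the labels to the file
save_labels.save_serializable(save_file)
#Close the file so the data is written out before it is read back
save_file.close()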

load_file = SerializableAsciiFile("foo.csv", "r")
load_labels = MulticlassLabels()
load_labels.load_serializable(load_file)
load_labels.get_labels()
→[ 1.  2.  3.]

8. Output logs


The log level can be set for each object individually. Pass MSG_DEBUG to get debug logs, or MSG_ERROR to get error messages only. These constants are declared in EMessageType.

from modshogun import MSG_DEBUG, MSG_ERROR
from modshogun import Chi2Kernel
from modshogun import LibSVM

kernel = Chi2Kernel()
svm = LibSVM()

kernel.io.set_loglevel(MSG_DEBUG)
svm.io.set_loglevel(MSG_ERROR)

In conclusion


It's kind of messy, so if you have any requests, please comment.
