SIGNATE practice: predicting wine varieties.
RandomForest
Of the three algorithms I tried, random forest was the most accurate, so I adopted it as the final classifier.
wine-learning.py
import pandas as pd

wine_data = pd.read_csv('train.tsv', sep='\t')
wine_test = pd.read_csv('test.tsv', sep='\t')
Last time I used read_table, but this time I tried read_csv as well, since it was a good opportunity. I find read_table a little easier. Either way, both calls produce the same result here, so neither is more correct than the other.
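A minimal check of that equivalence, using a tiny in-memory TSV as a stand-in for train.tsv (the sample data here is made up for illustration):

```python
import io
import pandas as pd

tsv_text = "A\tB\n1\t2\n3\t4\n"  # a tiny tab-separated sample standing in for train.tsv

df_csv = pd.read_csv(io.StringIO(tsv_text), sep='\t')  # read_csv with an explicit separator
df_table = pd.read_table(io.StringIO(tsv_text))        # read_table defaults to sep='\t'

print(df_csv.equals(df_table))  # → True
```

read_table is simply read_csv with a tab as the default separator, which is why the two DataFrames come out identical.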
wine-learning.py
X = wine_data.loc[:,['Alcohol','Malic acid','Ash','Alcalinity of ash','Magnesium','Total phenols','Flavanoids','Nonflavanoid phenols','Proanthocyanins','Color intensity','Hue','OD280/OD315 of diluted wines','Proline']].values
y = wine_data.loc[:,'Y'].values
This line gets unwieldy when there are many variables, so I want to find a better way; I will look into it in the next task. By the way, the test data finally makes its appearance here.
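One possible shortening (a sketch, not from the original post): since every column except the target 'Y' is a feature, the long column list can be replaced by dropping 'Y'. The toy DataFrame below stands in for the real train.tsv:

```python
import pandas as pd

# Stand-in for the real training data, with two feature columns and the target 'Y'.
wine_data = pd.DataFrame({'Alcohol': [13.2, 12.8],
                          'Hue': [1.05, 0.98],
                          'Y': [1, 2]})

X = wine_data.drop(columns='Y').values  # all feature columns at once
y = wine_data['Y'].values

print(X.shape)  # → (2, 2)
```

This stays correct even if feature columns are later added or renamed, as long as the target column is still 'Y'.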
wine-learning.py
Xt = wine_test.loc[:,['Alcohol','Malic acid','Ash','Alcalinity of ash','Magnesium','Total phenols','Flavanoids','Nonflavanoid phenols','Proanthocyanins','Color intensity','Hue','OD280/OD315 of diluted wines','Proline']].values
wine-learning.py
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
This time as well, the data was split 80/20 into training and test sets.
wine-learning.py
import numpy as np

X_train = X_train[:, ~np.isnan(X_train).any(axis=0)]
X_test = X_test[:, ~np.isnan(X_test).any(axis=0)]
Xt = Xt[:, ~np.isnan(Xt).any(axis=0)]
Missing values that were not present before the split suddenly appeared. I could not work out the cause, so I will investigate it later; for now I simply dropped any column containing a missing value.
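Note that the column-wise deletion above is computed separately for each array, so train, test, and submission data can end up with different (misaligned) columns. A safer sketch, filling missing values with the per-column means learned from the training data instead of dropping columns (toy arrays below are made up for illustration):

```python
import numpy as np

X_train = np.array([[1.0, np.nan], [3.0, 4.0]])  # toy training data with a missing value
X_test = np.array([[np.nan, 6.0]])               # toy test data with a missing value

col_means = np.nanmean(X_train, axis=0)  # per-column means, ignoring NaNs

# Replace NaNs in both splits with means learned from the training data only,
# so every array keeps the same columns.
X_train = np.where(np.isnan(X_train), col_means, X_train)
X_test = np.where(np.isnan(X_test), col_means, X_test)

print(X_train)  # → [[1. 4.] [3. 4.]]
print(X_test)   # → [[2. 6.]]
```

Computing the fill values from the training split alone avoids leaking information from the test data into the model.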
SVC
wine-learning.py
from sklearn import svm

clf = svm.SVC()
clf.fit(X_train, y_train)
LogisticRegression
wine-learning.py
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(X_train, y_train)
RandomForest
wine-learning.py
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)
`random_state` was set to 0, and `n_estimators` (the number of decision trees) was set to 500.
wine-learning.py
from sklearn.metrics import accuracy_score

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Correct answer rate= ', accuracy)
As before, the accuracy was calculated with the `accuracy_score` function.
Correct answer rate= 0.6111111111111112 (SVC)
Correct answer rate= 0.8888888888888888 (LogisticRegression)
Correct answer rate= 1.0 (RandomForestClassifier)
wine-learning.py
X_pred = np.array(Xt)  # Xt is already a NumPy array, so this copy is optional
y_pred = clf.predict(X_pred)
print(y_pred)
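To turn these predictions into a submission file, something like the following sketch could work. The exact file layout SIGNATE expects is an assumption here (row index plus predicted class, no header); adjust it to the competition's rules:

```python
import io
import pandas as pd

y_pred = [1, 3, 2]  # stand-in for clf.predict(X_pred)

submission = pd.DataFrame({'prediction': y_pred})

# Write index + prediction with no header row; in practice this would go to
# a file, e.g. submission.to_csv('submission.csv', header=False).
buf = io.StringIO()
submission.to_csv(buf, header=False)

print(buf.getvalue())  # → "0,1\n1,3\n2,2\n"
```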
It worked! (clap clap)
- Choose the learning algorithm according to the number of variables.
- Reduce the number of variables (dimensionality reduction) before applying the learning algorithm.
- Understand the characteristics of each algorithm in the first place.
- Investigate the cause of the missing values that appeared when the data was split.
- Try using a confusion matrix.
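To follow up on the last item above, a minimal confusion-matrix sketch with scikit-learn, using toy labels in place of the wine predictions:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 2, 3, 1, 2, 3]
y_hat = [1, 2, 3, 1, 3, 3]  # one class-2 sample misclassified as class 3

# Rows are true classes, columns are predicted classes (sorted label order).
cm = confusion_matrix(y_true, y_hat)
print(cm)
```

Unlike a single accuracy number, the matrix shows which classes get confused with which, so it would reveal, for example, if one wine variety absorbs all the errors.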