Machine learning models tend to overfit the training data, so it is common to split the available data into training data and test data for performance evaluation (validation). The scikit-learn documentation on cross-validation gives an easy-to-follow explanation of the various ways to make this split: https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation

1. scikit-learn's StratifiedKFold

For classification, I want to split the data so that the distribution of the ground-truth labels is the same in every split. I think scikit-learn's StratifiedKFold is what is most often used for this.

`python`


import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])
skf = StratifiedKFold(n_splits=2)
for train_index, test_index in skf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

`Result`


TRAIN: [1 3] TEST: [0 2]
TRAIN: [0 2] TEST: [1 3]
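
To see what "the same label distribution" means here, the following minimal sketch counts the classes in each fold. It does not call scikit-learn; the fold indices are simply copied from the result printed above, and `class_counts` is a hypothetical helper introduced for illustration.

```python
from collections import Counter

# Labels and fold indices copied from the example and its printed result above.
y = [0, 0, 1, 1]
folds = [
    {"train": [1, 3], "test": [0, 2]},
    {"train": [0, 2], "test": [1, 3]},
]


def class_counts(labels, indices):
    """Count how often each class occurs among the selected samples."""
    return dict(Counter(labels[i] for i in indices))


for k, fold in enumerate(folds):
    train_counts = class_counts(y, fold["train"])
    test_counts = class_counts(y, fold["test"])
    print(f"fold {k}: train {train_counts}, test {test_counts}")
```

Every training and test set contains one sample of class 0 and one of class 1, which matches the 50/50 ratio of the full data set.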

This is sufficient when each sample has a single label, but StratifiedKFold does not support multi-label targets.

`python`


import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.array([[1,2], [3,4], [1,2], [3,4], [1,2], [3,4], [1,2], [3,4]])
y = np.array([[0,0], [0,0], [0,1], [0,1], [1,1], [1,1], [1,0], [1,0]])
skf = StratifiedKFold(n_splits=2)

for train_index, test_index in skf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

`Result`


ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.
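
One workaround that is sometimes used, and that is not part of the original article, is to collapse every distinct label combination into a single class id (a "label powerset") so that the target becomes a 1-D multiclass array. The sketch below shows only that conversion; `label_powerset` is a hypothetical helper name.

```python
# Multi-label targets from the example above: one row per sample, one column per label.
y = [[0, 0], [0, 0], [0, 1], [0, 1], [1, 1], [1, 1], [1, 0], [1, 0]]


def label_powerset(rows):
    """Map each distinct label combination to one integer class id."""
    class_ids = {}
    # setdefault assigns the next free id the first time a combination is seen.
    return [class_ids.setdefault(tuple(row), len(class_ids)) for row in rows]


y_single = label_powerset(y)
print(y_single)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

A 1-D array like `y_single` is a multiclass target, so it could be passed to StratifiedKFold. Note that this stratifies on whole label combinations rather than on each label, and with many labels the combinations tend to become too rare to split evenly. That limitation is the motivation for the multi-label-aware splitters in the next section.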

2. iterstrat

iterstrat supports multi-label targets: https://github.com/trent-b/iterative-stratification

Installation

`terminal`


pip install iterative-stratification
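
Note that the package is installed as `iterative-stratification` but imported as `iterstrat`. A small sketch for checking whether the import name is available in the current environment, using only the standard library (`is_installed` is a hypothetical helper):

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# The import name differs from the name used with pip.
print("iterstrat importable:", is_installed("iterstrat"))
```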

Usage (K-fold)

`python`


from iterstrat.ml_stratifiers import MultilabelStratifiedKFold
import numpy as np

X = np.array([[1,2], [3,4], [1,2], [3,4], [1,2], [3,4], [1,2], [3,4]])
y = np.array([[0,0], [0,0], [0,1], [0,1], [1,1], [1,1], [1,0], [1,0]])

mskf = MultilabelStratifiedKFold(n_splits=2, shuffle=True, random_state=0)

for train_index, test_index in mskf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

`Result`


TRAIN: [0 3 4 6] TEST: [1 2 5 7]
TRAIN: [1 2 5 7] TEST: [0 3 4 6]
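
To check that the split is balanced for every label and not only for one, the sketch below sums each label column over the folds. It does not call iterstrat; the indices are copied from the result printed above, and `positives_per_label` is a hypothetical helper.

```python
# Targets from the example above and the (train, test) indices from its result.
y = [[0, 0], [0, 0], [0, 1], [0, 1], [1, 1], [1, 1], [1, 0], [1, 0]]
folds = [
    ([0, 3, 4, 6], [1, 2, 5, 7]),
    ([1, 2, 5, 7], [0, 3, 4, 6]),
]


def positives_per_label(rows, indices):
    """Sum each label column over the selected samples."""
    n_labels = len(rows[0])
    return [sum(rows[i][j] for i in indices) for j in range(n_labels)]


for train, test in folds:
    print("train:", positives_per_label(y, train), "test:", positives_per_label(y, test))
```

Each label has four positives in the full data set, and every training and test set receives two of them.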

Usage (shuffle split)

`python`


from iterstrat.ml_stratifiers import MultilabelStratifiedShuffleSplit
import numpy as np

X = np.array([[1,2], [3,4], [1,2], [3,4], [1,2], [3,4], [1,2], [3,4]])
y = np.array([[0,0], [0,0], [0,1], [0,1], [1,1], [1,1], [1,0], [1,0]])

msss = MultilabelStratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0)

for train_index, test_index in msss.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

`Result`


TRAIN: [1 2 5 7] TEST: [0 3 4 6]
TRAIN: [2 3 6 7] TEST: [0 1 4 5]
TRAIN: [1 2 5 6] TEST: [0 3 4 7]
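
Unlike K-fold, a shuffle split draws each train/test partition independently, so the test sets can overlap. The sketch below counts how often each sample lands in a test set, using the test indices copied from the result above (it does not call iterstrat).

```python
from collections import Counter

# Test indices of the three splits, copied from the result above.
test_sets = [[0, 3, 4, 6], [0, 1, 4, 5], [0, 3, 4, 7]]
n_samples = 8

# How many times each sample index was used for testing.
appearances = Counter(i for test in test_sets for i in test)
never_tested = sorted(set(range(n_samples)) - set(appearances))

print("test appearances:", dict(sorted(appearances.items())))
print("never tested:", never_tested)
```

In this run, samples 0 and 4 are in all three test sets while sample 2 is never tested, so this kind of split does not guarantee that every sample is evaluated.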

Usage (repeated K-fold)

`python`


from iterstrat.ml_stratifiers import RepeatedMultilabelStratifiedKFold
import numpy as np

X = np.array([[1,2], [3,4], [1,2], [3,4], [1,2], [3,4], [1,2], [3,4]])
y = np.array([[0,0], [0,0], [0,1], [0,1], [1,1], [1,1], [1,0], [1,0]])

rmskf = RepeatedMultilabelStratifiedKFold(n_splits=2, n_repeats=2, random_state=0)

for train_index, test_index in rmskf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

`Result`


TRAIN: [0 3 4 6] TEST: [1 2 5 7]
TRAIN: [1 2 5 7] TEST: [0 3 4 6]
TRAIN: [0 1 4 5] TEST: [2 3 6 7]
TRAIN: [2 3 6 7] TEST: [0 1 4 5]
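
With `n_splits=2` and `n_repeats=2`, four splits are produced. The sketch below groups the test indices from the result above into repetitions and checks that each repetition is a complete K-fold pass. It assumes the splits are yielded one repetition after another, which is consistent with the printed output, and it does not call iterstrat.

```python
n_splits = 2
n_samples = 8

# Test indices of the four splits, copied from the result above.
test_sets = [[1, 2, 5, 7], [0, 3, 4, 6], [2, 3, 6, 7], [0, 1, 4, 5]]

# Consecutive groups of n_splits test sets form one repetition.
repeats = [test_sets[i:i + n_splits] for i in range(0, len(test_sets), n_splits)]

for r, folds in enumerate(repeats):
    covered = sorted(i for fold in folds for i in fold)
    # A full K-fold pass tests every sample exactly once.
    print(f"repeat {r}: complete pass = {covered == list(range(n_samples))}")
```

Both repetitions cover every sample exactly once, but they partition the data differently, which is the point of repeating the K-fold procedure.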

[PYTHON] Use iterstrat for multi-label stratified cross-validation
