Underfitting and overfitting
Underfitting: the model is too simple to keep up with the data and fails to capture its underlying pattern. Overfitting: the model adapts too closely to the training data and drifts away from the underlying pattern.
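The difference between the two can be made visible by comparing the error on training data with the error on held-out test data. The following is only a minimal sketch under assumptions that are not in the original post: the noisy sine data, the polynomial degrees and the helper name fit_errors are all made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: a sine curve with some noise added
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# Hold out 25% of the points so the model never sees them while fitting
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

def fit_errors(degree):
    """Fit a polynomial on the training data and return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# Degree 1 is too simple for a sine curve; degree 9 is very flexible
for degree in (1, 3, 9):
    train_mse, test_mse = fit_errors(degree)
    print(f"degree={degree}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

A model that underfits tends to show a high error on both sets. A model that overfits tends to show a low training error together with a clearly higher test error, which is why a separate test set is needed in the first place.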
Library import
import numpy as np
from sklearn.model_selection import train_test_split
/* train_test_split: as the name suggests, a function that splits data into training and test sets */
Creating data to divide
a = np.arange(1,101)
a
Output of the data to be split
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78,
79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
92, 93, 94, 95, 96, 97, 98, 99, 100])
You can see that the values from 1 to 100 are stored as an array. /* The point here is that the data is an array, which makes it easier to split it in two */
b = np.arange(501,601)
b
array([501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513,
514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526,
527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539,
540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552,
553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565,
566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578,
579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591,
592, 593, 594, 595, 596, 597, 598, 599, 600])
Both a and b are now ready to be split
Splitting the data
train_test_split(a) # try passing b as well and check that it can be split in the same way
[array([87, 32, 90, 1, 2, 8, 51, 73, 22, 95, 4, 57, 27, 58, 48, 99, 96,
74, 72, 29, 76, 64, 3, 12, 53, 6, 18, 16, 65, 66, 63, 46, 39, 17,
91, 25, 15, 78, 83, 19, 45, 68, 33, 98, 97, 14, 44, 86, 80, 34, 70,
47, 54, 93, 94, 85, 42, 60, 92, 41, 61, 71, 89, 23, 21, 11, 84, 13,
82, 59, 49, 79, 36, 55, 5]),
array([ 24, 56, 40, 9, 69, 75, 10, 28, 38, 30, 62, 67, 100,
88, 37, 20, 7, 31, 77, 43, 35, 26, 81, 52, 50])]
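The output above holds 75 and 25 elements because no size was given. A minimal sketch of that default, and of how to make the split repeatable: when neither test_size nor train_size is passed, train_test_split keeps 25% of the data for testing, and passing the same random_state gives the same split every time. The seed value 0 and the variable names are arbitrary choices for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

a = np.arange(1, 101)

# No test_size given: 25% of the 100 values go to the test set
first_train, first_test = train_test_split(a, random_state=0)
print(len(first_train), len(first_test))  # 75 25

# The same random_state reproduces exactly the same shuffle
again_train, again_test = train_test_split(a, random_state=0)
print(np.array_equal(first_train, again_train))  # True
print(np.array_equal(first_test, again_test))    # True
```

Without random_state the split changes on every run, so the numbers you get will differ from the output shown above.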
Split the two arrays (objects) into four
a_train, a_test, b_train, b_test = train_test_split(a, b, test_size=0.2, random_state=365)
Check the result
Confirmation of shape
a_train.shape, a_test.shape
((80,), (20,))
Confirmation of data contents
a_train
array([ 25, 32, 99, 73, 91, 66, 3, 59, 94, 1, 8, 15, 90,
54, 31, 20, 77, 82, 30, 35, 95, 42, 38, 7, 11, 50,
21, 48, 2, 17, 10, 58, 68, 43, 41, 16, 88, 72, 79,
100, 80, 39, 24, 86, 22, 23, 62, 76, 18, 47, 55, 26,
60, 19, 71, 64, 51, 63, 65, 28, 12, 78, 13, 44, 75,
87, 40, 4, 29, 49, 37, 57, 27, 74, 6, 45, 92, 34,
53, 83])
/* Printed as is, you can see that the data has been shuffled. train_test_split shuffles the data by default (shuffle=True) */
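The shuffled order seen above can be switched off. As a small sketch, passing shuffle=False to train_test_split keeps the original order, so the first part of the array becomes the training data and the rest becomes the test data. The variable names here are chosen only for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

a = np.arange(1, 101)

# shuffle=False: split in order, no randomness involved
ordered_train, ordered_test = train_test_split(a, test_size=0.2, shuffle=False)

print(ordered_train[:5])  # [1 2 3 4 5]
print(ordered_test[:5])   # [81 82 83 84 85]
```

This is mainly useful for ordered data such as time series, where shuffling would mix future values into the training set.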
Confirmation of data contents
a_test
array([ 9, 69, 81, 56, 33, 93, 84, 61, 46, 89, 85, 67, 97, 5, 70, 36, 98,
96, 14, 52])
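When two arrays are passed together, both are split at the same positions, so each element of a stays paired with its partner in b. The sketch below checks this using the fact that in this post every value of b is the matching value of a plus 500. It also confirms that the training and test parts do not overlap.

```python
import numpy as np
from sklearn.model_selection import train_test_split

a = np.arange(1, 101)
b = np.arange(501, 601)

a_train, a_test, b_train, b_test = train_test_split(
    a, b, test_size=0.2, random_state=365
)

# Each b value is still a + 500, so the pairs were kept together
print(np.array_equal(b_train, a_train + 500))  # True
print(np.array_equal(b_test, a_test + 500))    # True

# No value appears in both the training and the test part
print(np.intersect1d(a_train, a_test).size)  # 0
```

In practice a would be the features and b the labels, and this pairing is what keeps every sample attached to its correct label after the split.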
/* Benefit of train_test_split: arrays or matrices can be split into random training and test subsets */