When I split a dataset with the scikit-learn function train_test_split, as shown below,
dataset_train, dataset_test = train_test_split(dataset, train_size=0.8)
the data is divided into training and test sets. However, when there are many classes (for example, 100), the training and test sets may end up containing different sets of classes.
For example, the training data may contain all 100 classes while the test data contains only 98.
train_test_split simply shuffles the data randomly and splits it, so this is especially likely when the number of samples per class is unbalanced.
How can I split the data so that every class appears in both the training and test sets?
Thank you in advance.
python scikit-learn
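Just a guess, but is this what you are looking for? The stratify parameter expects the array of class labels, so pass the labels to it (labels below is a hypothetical name for that array; if dataset is itself the label array, stratify=dataset works as well):
dataset_train, dataset_test = train_test_split(dataset, stratify=labels, train_size=0.8)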
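To make the suggestion above concrete, here is a minimal sketch of a stratified split on made-up data. The names features, labels, and class_sizes, and the toy data itself, are assumptions for illustration and do not come from the question. It only shows that passing the label array to stratify keeps each class's proportion roughly equal in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 classes with unbalanced sizes (5, 10, 20 or 50 samples).
class_sizes = np.tile([5, 10, 20, 50], 25)
labels = np.repeat(np.arange(100), class_sizes)

# Hypothetical features: 4 random values per sample.
rng = np.random.default_rng(0)
features = rng.normal(size=(labels.size, 4))

# stratify=labels asks scikit-learn to preserve class proportions in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, train_size=0.8, stratify=labels, random_state=0
)

# Count how many distinct classes landed in each split.
n_train_classes = np.unique(y_train).size
n_test_classes = np.unique(y_test).size
print(n_train_classes, n_test_classes)
```

With this toy data both counts should come out as 100, because even the smallest class (5 samples) contributes one sample to the test set. Note that stratify raises a ValueError if any class has only one sample, since such a class cannot be placed in both splits.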