When I use scikit-learn's train_test_split to divide a dataset into training and test data,
the two sets can end up containing different classes if there are many classes (for example, 100).
For instance, the training data may contain all 100 classes while the test data contains only 98.
Because train_test_split simply shuffles the data at random and then splits it,
this is likely to happen when the number of samples per class is imbalanced.
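A minimal reproduction, assuming synthetic labels (the post's actual data is not shown): classes 0-49 get 20 samples each and classes 50-99 get a single sample each. Every single-sample class lands entirely in either the training set or the test set, so a plain random split cannot keep all 100 classes in both.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: classes 0-49 have 20 samples each,
# classes 50-99 have only 1 sample each (100 classes in total).
y = np.concatenate([np.repeat(np.arange(50), 20), np.arange(50, 100)])
X = np.arange(len(y)).reshape(-1, 1)  # dummy one-column features

# Plain random split: shuffles and cuts, ignoring the labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

n_train_classes = len(np.unique(y_train))
n_test_classes = len(np.unique(y_test))
print("classes in full data:", len(np.unique(y)))
print("classes in train:", n_train_classes)
print("classes in test:", n_test_classes)
```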
What should I do to split the data so that both sets keep all of the classes?
Thank you in advance.
Use StratifiedShuffleSplit to split the data while keeping the class proportions.
I'm only guessing, but is this what you mean?
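A minimal sketch of the StratifiedShuffleSplit approach, assuming synthetic imbalanced labels (100 classes with 4 to 49 samples each). With n_splits=1 it produces a single train/test split, and each class is divided between the two sets roughly in proportion to test_size. Note that StratifiedShuffleSplit raises a ValueError if any class has only one sample, and a class needs enough samples (roughly 1/test_size or more) for at least one of them to land in the test set.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical imbalanced labels: class k has 4 + 5 * (k % 10) samples,
# i.e. between 4 and 49, so every class has at least 4 members.
counts = 4 + 5 * (np.arange(100) % 10)
y = np.repeat(np.arange(100), counts)
X = np.arange(len(y)).reshape(-1, 1)  # dummy one-column features

# n_splits=1 yields one train/test split, like train_test_split does.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(sss.split(X, y))

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print("classes in train:", len(np.unique(y_train)))
print("classes in test:", len(np.unique(y_test)))
```

For a single split, train_test_split(X, y, test_size=0.25, stratify=y) performs the same kind of stratified split without creating the splitter object explicitly.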
© 2022 OneMinuteCode. All rights reserved.