IBM Planning Analytics V2.0 Developer — Question 20
What is the main criterion for separating training and test data when training a machine learning system?
Answer options
- A. Test data should be as random as possible, so that it tests the boundaries of the system.
- B. Training data should be random, but the test data should be created by a subject matter expert.
- C. Training data should be as random as possible, in order to create a robust model.
- D. The data set should be representative and randomly split into a training set and a test set so that they do not overlap.
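Correct answer: D
Explanation
The correct answer is D. Training and test data should come from the same representative data set, randomly split so that no example appears in both. The model learns only from the training set and is then evaluated on the held-out test set, so the test score estimates how well the model generalizes to data it has never seen. If the sets overlapped, the model would be scored partly on examples it had already memorized, and the result would be overly optimistic. Option C is incomplete: randomness alone does not make the training data representative, and it says nothing about keeping the test set separate. Option A is wrong because the test set should reflect the data the system will encounter in practice, not deliberately probe its boundaries. Option B is wrong because a hand-crafted test set introduces bias; both sets should be drawn from the same distribution.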
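The random, non-overlapping split described in option D can be sketched in a few lines of Python. This is a minimal illustration, assuming a small labelled data set held in memory; the function name `stratified_split` and the toy labels are hypothetical. Shuffling within each label group (stratification) is one common way to keep both sets representative, not something the question itself prescribes.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=None):
    """Randomly split example indices into disjoint train and test lists.

    Hypothetical helper: indices are shuffled within each label group so both
    sets keep roughly the overall label proportions, and every index lands in
    exactly one set, so the two sets never overlap.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)  # a fixed seed makes the split reproducible

    # Group example indices by their label.
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    train_idx, test_idx = [], []
    for indices in groups.values():
        rng.shuffle(indices)  # random order removes bias from how the data was collected
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])   # first slice goes to the test set
        train_idx.extend(indices[n_test:])  # remainder goes to the training set
    return sorted(train_idx), sorted(test_idx)


# Hypothetical data set: 100 examples, 25 of them labelled "spam".
labels = ["spam" if i % 4 == 0 else "ham" for i in range(100)]
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)

print(len(train_idx), len(test_idx))                # 80 20
print(sum(labels[i] == "spam" for i in test_idx))  # 5
```

With 25 "spam" and 75 "ham" labels and `test_fraction=0.2`, the test set receives 5 and 15 examples respectively, mirroring the 25% spam rate of the full data set. In practice a library routine such as scikit-learn's `train_test_split` (with its `stratify` argument) does the same job.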