CompTIA DataX (DY0-001) — Question 1
Which of the following issues should a data scientist be most concerned about when generating a synthetic data set?
Answer options
- A. The data set consuming too many resources
- B. The data set having insufficient features
- C. The data set having insufficient row observations
- D. The data set not being representative of the population
Correct answer: D
Explanation
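The correct answer is D. A synthetic data set is only useful if it preserves the statistical characteristics of the population it stands in for, such as feature distributions, relationships between features, and class balance. If it does not, analyses run or models trained on it can produce results that do not hold for real data. Options A, B, and C are legitimate concerns, but they are secondary: resource use, feature count, and row count can usually be adjusted when the data is generated, whereas a non-representative data set undermines the validity of every result derived from it.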
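One way to make "representative" concrete is to compare the distribution of a feature in the synthetic data against the same feature in real data. The sketch below is an illustration only, not part of the exam material: it computes a two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) using only the Python standard library. The function name `ks_statistic`, the sample sizes, and the Gaussian "real" and "synthetic" samples are all assumptions made up for the example.

```python
import random
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    n, m = len(real_sorted), len(synth_sorted)
    max_gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is always reached at one of the observed values.
    for x in real_sorted + synth_sorted:
        cdf_real = bisect_right(real_sorted, x) / n
        cdf_synth = bisect_right(synth_sorted, x) / m
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: a stand-in for real observations of one feature.
real = [rng.gauss(50, 10) for _ in range(2000)]
# A generator that matches the population, and one with a shifted mean.
good_synth = [rng.gauss(50, 10) for _ in range(2000)]
bad_synth = [rng.gauss(60, 10) for _ in range(2000)]

ks_good = ks_statistic(real, good_synth)
ks_bad = ks_statistic(real, bad_synth)
print(f"matching generator: KS = {ks_good:.3f}")
print(f"shifted generator:  KS = {ks_bad:.3f}")
```

A statistic near 0 suggests the two samples have similar distributions for that feature, while a larger value flags a mismatch. This check looks at one feature at a time, so it says nothing about correlations between features; a fuller assessment would also compare joint behavior.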