Databricks Certified Machine Learning Professional — Question 35
A machine learning engineer is monitoring categorical input variables for a production machine learning application. The engineer believes that, in more recent data, missing values are becoming more prevalent for a particular value of one of the categorical input variables.
Which of the following tools can the machine learning engineer use to assess their theory?
Answer options
- A. Kolmogorov-Smirnov (KS) test
- B. One-way Chi-squared Test
- C. Two-way Chi-squared Test
- D. Jensen-Shannon distance
- E. None of these
Correct answer: B
Explanation
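The One-way Chi-squared Test is a goodness-of-fit test for a single categorical variable. It compares the observed category counts in recent data (for example, missing versus non-missing) with the counts expected under a reference distribution, such as the proportions seen in earlier data, and indicates whether the difference is statistically significant. The Kolmogorov-Smirnov test is designed for continuous numeric distributions rather than categorical ones, and the Two-way Chi-squared Test assesses the association between two categorical variables rather than the distribution of a single one, so neither fits this scenario. The Jensen-Shannon distance measures how different two probability distributions are, but it is a distance score rather than a significance test, so on its own it does not tell the engineer whether the increase in missing values is statistically meaningful.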
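The goodness-of-fit idea above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not Databricks tooling. It assumes the engineer has a reference missing rate from an earlier window and splits the recent rows into two categories, missing and not missing. The function names and counts are hypothetical. The p-value helper covers only the 1-degree-of-freedom (two-category) case.

```python
import math


def one_way_chi_squared(observed, expected_proportions):
    """One-way (goodness-of-fit) chi-squared statistic.

    observed: counts per category in the recent window.
    expected_proportions: reference proportions per category (summing to 1),
    e.g. taken from training data or an earlier production window.
    Returns (statistic, degrees_of_freedom).
    """
    if len(observed) != len(expected_proportions):
        raise ValueError("observed and expected_proportions must have the same length")
    total = sum(observed)
    # Expected counts if the recent data followed the reference proportions.
    expected = [p * total for p in expected_proportions]
    # Sum of (O - E)^2 / E over categories; the chi-squared approximation
    # is usually trusted only when every expected count is at least about 5.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


def p_value_df1(statistic):
    """Upper-tail p-value of a chi-squared distribution with 1 degree of freedom."""
    # For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical example: one categorical feature, rows split into
# [missing, not missing]. Historically, 2% of rows were missing.
reference_missing_rate = 0.02
recent_counts = [45, 955]  # illustrative: 45 missing out of 1,000 recent rows

stat, df = one_way_chi_squared(
    recent_counts, [reference_missing_rate, 1 - reference_missing_rate]
)
p = p_value_df1(stat)
print(f"chi2={stat:.2f}, df={df}, p={p:.2e}")
```

A small p-value here suggests that the recent missing rate departs from the reference rate by more than chance would explain. If SciPy is available, `scipy.stats.chisquare(f_obs, f_exp)` computes the same statistic and handles any number of categories.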