Databricks Certified Machine Learning Professional — Question 35

A machine learning engineer is monitoring categorical input variables for a production machine learning application. The engineer believes that missing values are becoming more prevalent in more recent data for a particular value in one of the categorical input variables.
Which of the following tools can the machine learning engineer use to assess their theory?

Answer options

Correct answer: B

Explanation

The One-way Chi-squared Test (a goodness-of-fit test) compares the observed counts for a single categorical variable against expected counts, so it can show whether the share of missing values in recent data differs significantly from the share seen in earlier data. The Kolmogorov-Smirnov test is designed for continuous numeric distributions, and the Two-way Chi-squared Test examines the relationship between two categorical variables, so neither targets a shift within a single categorical variable. The Jensen-Shannon distance measures how similar two probability distributions are overall; it yields a distance rather than a significance test, which makes it a poor fit for checking this specific theory.
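
As a rough illustration of how a one-way chi-squared check on missingness might look, the sketch below compares missing and non-missing counts in recent data against the counts expected from a historical missing rate. The function name, parameter names, and example numbers are hypothetical and do not come from the question. The p-value is computed with the standard library using the identity that holds for one degree of freedom; in practice `scipy.stats.chisquare` with `f_obs` and `f_exp` is the more common route.

```python
import math


def missing_rate_chi_squared(recent_missing, recent_total, baseline_missing_rate):
    """One-way (goodness-of-fit) chi-squared test on missing vs. present counts.

    All names here are illustrative assumptions, not part of any Databricks API.
    Returns the chi-squared statistic and its p-value (1 degree of freedom).
    """
    # Observed counts in the recent window: [missing, present].
    observed = [recent_missing, recent_total - recent_missing]

    # Expected counts if the recent data followed the historical missing rate.
    expected_missing = recent_total * baseline_missing_rate
    expected = [expected_missing, recent_total - expected_missing]

    # Pearson chi-squared statistic: sum of (O - E)^2 / E over the categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # With two categories there is 1 degree of freedom, and the chi-squared
    # survival function reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical numbers: 5% missing historically, 90 of 1000 missing recently.
statistic, p_value = missing_rate_chi_squared(
    recent_missing=90, recent_total=1000, baseline_missing_rate=0.05
)
print(f"chi-squared = {statistic:.3f}, p-value = {p_value:.3g}")
```

A small p-value here would suggest that the recent missing rate differs from the historical one, which is the kind of evidence the engineer's theory calls for. The two-category shortcut for the p-value does not generalize; with more categories, a library routine for the chi-squared distribution would be needed.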