Databricks Certified Machine Learning Professional — Question 19

A machine learning engineering team wants to build a continuous pipeline that prepares data for a machine learning application. The team would like the data to be fully processed and made ready for inference in a series of equal-sized batches.
Which of the following tools can be used to provide this type of continuous processing?

Answer options

Correct answer: B

Explanation

Structured Streaming is Apache Spark's engine for continuous data processing. By default it handles incoming data incrementally as a series of small micro-batches, so each batch can be fully processed and made ready for inference as it arrives. Batch size can be bounded with source rate-limit options such as maxFilesPerTrigger. The other options, while useful in other contexts, do not provide the continuous processing the task requires. Spark UDFs, for instance, apply custom logic to rows or columns, but they do not drive a continuous stream on their own.