Databricks Certified Data Engineer Professional — Question 150
Which statement characterizes the general programming model used by Spark Structured Streaming?
Answer options
- A. Structured Streaming leverages the parallel processing of GPUs to achieve highly parallel data throughput.
- B. Structured Streaming is implemented as a messaging bus and is derived from Apache Kafka.
- C. Structured Streaming relies on a distributed network of nodes that hold incremental state values for cached stages.
- D. Structured Streaming models new data arriving in a data stream as new rows appended to an unbounded table.
Correct answer: D
Explanation
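The correct answer is D. Structured Streaming treats a live data stream as a table that is continuously appended to: each new record arriving on the stream becomes a new row in an unbounded input table, and a query is written the same way as a batch query over a static table, with Spark running it incrementally as rows arrive. Option A is incorrect because Structured Streaming does not depend on GPUs; it is built on the Spark SQL engine and gets its parallelism by distributing work across the cluster's executors. Option B is incorrect because Structured Streaming is a stream processing engine, not a messaging bus, and it is not derived from Apache Kafka; Kafka is only one of the sources and sinks it can read from and write to. Option C is incorrect because it describes an implementation detail rather than the programming model. Structured Streaming does keep state for stateful operations such as aggregations, but that state supports the incremental execution of a query; it is not how the model represents streaming data, and "cached stages" are not part of that model.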
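
To make the unbounded-table idea concrete, here is a minimal sketch in plain Python. It is a toy model and not the Spark API: the names `UnboundedTable`, `append_batch`, and `batch_count` are hypothetical and exist only for this illustration. It shows new data being appended as rows and a running count being updated from the new rows alone, with the result matching the same query run as a batch over every row seen so far.

```python
from collections import Counter


class UnboundedTable:
    """Toy stand-in for a stream's input table: rows are only ever appended."""

    def __init__(self):
        self.rows = []            # every row that has arrived so far
        self.counts = Counter()   # incrementally maintained result table

    def append_batch(self, new_rows):
        # New data arriving on the stream becomes new rows in the table.
        self.rows.extend(new_rows)
        # Update the result using only the new rows, not the full table.
        self.counts.update(new_rows)
        return dict(self.counts)


def batch_count(rows):
    # The same query written as an ordinary batch query over a static table.
    return dict(Counter(rows))


table = UnboundedTable()
first = table.append_batch(["cat", "dog"])
second = table.append_batch(["cat", "owl"])

# The incremental result matches the batch query over all rows so far.
assert second == batch_count(table.rows)
print(first)
print(second)
```

In Structured Streaming itself the engine performs this incremental bookkeeping, so the query author writes only the batch-style query.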