Databricks Certified Data Engineer Professional — Question 12
Which statement characterizes the general programming model used by Spark Structured Streaming?
Answer options
- A. Structured Streaming leverages the parallel processing of GPUs to achieve highly parallel data throughput.
- B. Structured Streaming is implemented as a messaging bus and is derived from Apache Kafka.
- C. Structured Streaming uses specialized hardware and I/O streams to achieve sub-second latency for data transfer.
- D. Structured Streaming models new data arriving in a data stream as new rows appended to an unbounded table.
- E. Structured Streaming relies on a distributed network of nodes that hold incremental state values for cached stages.
Correct answer: D
Explanation
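The correct answer, D, describes the core abstraction of Structured Streaming. Data arriving on a stream is treated as new rows appended to an unbounded input table. A streaming query is written like a batch query against that table, and Spark updates the query's result incrementally as new rows arrive. Options A, B, and C misrepresent Structured Streaming in three ways. It is a stream processing engine built on the Spark SQL engine and does not depend on GPUs. It is not a messaging bus and is not derived from Apache Kafka; Kafka is simply one of the sources and sinks it can read from and write to. It runs on ordinary cluster hardware rather than on specialized hardware or I/O streams. Option E is also incorrect. Stateful queries do maintain intermediate state across micro-batches, but that is an implementation detail rather than the programming model, and "nodes that hold incremental state values for cached stages" does not describe how Structured Streaming works.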
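The unbounded-table model in option D can be illustrated with a small pure-Python simulation. This is a conceptual sketch, not Spark code. The names `UnboundedTable`, `batch_query`, and `IncrementalCount` are hypothetical and only mimic the idea that each trigger appends rows to an input table while the engine updates the result of a query written as if it ran against a static table. In PySpark, the same word count would be written with `spark.readStream`, an ordinary `groupBy(...).count()`, and `writeStream.outputMode("complete")`.

```python
from collections import Counter
from typing import Iterable, List


class UnboundedTable:
    """Hypothetical toy model of an input table that only grows by appended rows."""

    def __init__(self) -> None:
        # Kept in memory here only so results can be compared against a full
        # re-scan; Spark itself does not materialize the whole input table.
        self.rows: List[str] = []

    def append(self, batch: Iterable[str]) -> None:
        # New data arriving on the stream becomes new rows at the end of the table.
        self.rows.extend(batch)


def batch_query(rows: Iterable[str]) -> Counter:
    # The query the user writes: an ordinary aggregation over a static table.
    return Counter(rows)


class IncrementalCount:
    """Hypothetical engine that keeps running state between triggers."""

    def __init__(self) -> None:
        self.state: Counter = Counter()

    def process(self, new_rows: Iterable[str]) -> Counter:
        # Only the newly appended rows are processed on each trigger.
        self.state.update(new_rows)
        # Return a copy of the full result table, similar to "complete" output mode.
        return Counter(self.state)


table = UnboundedTable()
engine = IncrementalCount()
micro_batches = [["cat", "dog"], ["dog"], ["cat", "cat", "bird"]]

for batch in micro_batches:
    table.append(batch)
    result = engine.process(batch)
    # The incremental result matches re-running the batch query on the whole table.
    assert result == batch_query(table.rows)
    print(dict(sorted(result.items())))
```

The design point is that the query logic is written once, as if for a batch job; the engine is responsible for keeping just enough state that it does not need to rescan every previously received row on each trigger.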