
Question

A Generative AI Engineer has written scalable PySpark code to ingest unstructured PDF documents and chunk them in preparation for storing in a Databricks Vector Search index. Currently, the two columns of their dataframe include the original filename as a string and an array of text chunks from that document. What set of steps should the Generative AI Engineer perform to store the chunks in a ready-to-ingest manner for Databricks Vector Search?

Accepted Answer

Correct answer: B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and enable change feed on the output Delta table.

Option B covers everything a Delta Sync index in Databricks Vector Search needs from its source table: one row per chunk, a unique identifier that can serve as the primary key, and Change Data Feed enabled on the Delta table so the index can pick up inserts, updates, and deletes incrementally. Options A and C do not adequately prepare the data for ingestion. Option D comes close but does not mention enabling change feed, which the index relies on to track changes in the output Delta table.
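The three steps in option B can be sketched in PySpark roughly as follows. This is a minimal illustration, not a reference implementation: the column names `filename` and `chunks`, the output columns `chunk_index`, `chunk_text`, and `chunk_id`, and the helper function names are all assumptions, since the question does not name them. Hashing the filename and chunk position is just one way to get a stable unique key.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; PySpark is imported when the functions run
    from pyspark.sql import DataFrame, SparkSession


def flatten_chunks(df: DataFrame) -> DataFrame:
    """Return one row per chunk with a deterministic string key.

    Assumes (hypothetically) that df has columns `filename` (string)
    and `chunks` (array of strings).
    """
    from pyspark.sql import functions as F

    # posexplode emits one (position, value) row per array element.
    flat = df.select(
        "filename",
        F.posexplode("chunks").alias("chunk_index", "chunk_text"),
    )
    # Hash filename + position so re-running the job yields the same IDs.
    key = F.concat_ws(":", F.col("filename"), F.col("chunk_index").cast("string"))
    return flat.withColumn("chunk_id", F.sha2(key, 256))


def enable_cdf_sql(table_name: str) -> str:
    """Build the statement that turns on Change Data Feed for a Delta table."""
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )


def write_chunks(spark: SparkSession, df: DataFrame, table_name: str) -> None:
    """Save the flattened chunks as a Delta table, then enable Change Data Feed."""
    flatten_chunks(df).write.format("delta").mode("overwrite").saveAsTable(table_name)
    spark.sql(enable_cdf_sql(table_name))
```

In a Databricks notebook this might be called as `write_chunks(spark, df, "main.rag.pdf_chunks")`, where the table name is a placeholder. The resulting table would then have `chunk_id` available as the primary key and `chunk_text` as the column to embed when the index is created.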

Databricks Certified Generative AI Engineer Associate — Question 46
