APICS Certified Supply Chain Professional (CSCP) — Question 188

A data engineer is designing a data pipeline. The source system writes files to a shared directory that other processes also use, so the files must be left unchanged and will accumulate in the directory. The data engineer needs to identify which files are new since the previous pipeline run and configure the pipeline to ingest only those new files on each run.
Which of the following tools can the data engineer use to solve this problem?

Answer options

Correct answer: E

Explanation

Auto Loader incrementally ingests new files as they arrive in a directory. It records which files it has already processed in its checkpoint, so each run picks up only the files added since the previous run, and the source files never need to be moved, renamed, or deleted. That matches this scenario, where the files must stay in place and keep accumulating. The other tools do not track newly arrived files: Unity Catalog handles data governance, Delta Lake is a storage layer, Databricks SQL is for running queries, and Data Explorer is for browsing data objects.
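
The idea behind this answer can be illustrated with a small sketch. This is not Auto Loader and not how Databricks implements it; it is a plain-Python stand-in that keeps a record of already-ingested file names in a separate checkpoint file and leaves the source directory untouched. The names `ingest_new_files`, `demo`, `landing`, and `checkpoint.json` are hypothetical and chosen only for illustration. In Databricks itself, Auto Loader is used as a Structured Streaming source with the `cloudFiles` format, and its progress is stored in the stream's checkpoint location.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def ingest_new_files(source_dir: Path, checkpoint: Path) -> list[str]:
    """Return the files not seen in earlier runs and record them as seen.

    Conceptual stand-in only: the checkpoint here is a JSON list of file
    names (an assumption for this sketch), stored outside the source
    directory so the shared directory is never modified.
    """
    seen = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()

    # Files present now that no earlier run has recorded.
    new_files = sorted(
        p.name for p in source_dir.iterdir() if p.is_file() and p.name not in seen
    )

    # A real pipeline would read and load each new file here.

    # Persist the updated set so the next run skips these files.
    checkpoint.write_text(json.dumps(sorted(seen | set(new_files))))
    return new_files


def demo() -> tuple[list[str], list[str], list[str]]:
    """Run two ingestion passes over a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "landing"
        source.mkdir()
        checkpoint = Path(tmp) / "checkpoint.json"

        (source / "a.csv").write_text("1\n")
        (source / "b.csv").write_text("2\n")
        first_run = ingest_new_files(source, checkpoint)

        (source / "c.csv").write_text("3\n")
        second_run = ingest_new_files(source, checkpoint)

        # The source files are still all in place after both runs.
        remaining = sorted(p.name for p in source.iterdir())
        return first_run, second_run, remaining


if __name__ == "__main__":
    print(demo())
```

The second pass returns only the file added after the first pass, while all three files remain in the directory, which is the behavior the question asks for.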