Databricks Certified Data Engineer Associate — Question 162
A data engineer needs to parse only png files in a directory that contains files with different suffixes.
Which code should the data engineer use to achieve this task?
Answer options
- A. df = spark.readStream.format("cloudFiles").option("cloudFiles.format", "binaryFile").append("/*.png")
- B. df = spark.readStream.format("cloudFiles").option("cloudFiles.format", "binaryFile").option("pathGlobfilter", "*.png").load()
- C. df = spark.readStream.format("cloudFiles").option("cloudFiles.format", "binaryFile").option("pathGlobfilter", "*.png").append()
- D. df = spark.readStream.format("cloudFiles").option("cloudFiles.format", "binaryFile").load("/*.png")
Correct answer: B
Explanation
Option B is correct because it sets .option("pathGlobfilter", "*.png"), which restricts the read to files whose names match *.png, and then calls .load() to create the streaming DataFrame. Options A and C end with .append(), which is not a method of the stream reader, so neither one reads any files; C sets the right filter but never loads. Option D does call .load(), but it puts the wildcard in the load path instead of using pathGlobfilter, which is the option Databricks documents for matching file suffixes.
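The pattern in option B can be sketched as a small helper. This is a minimal illustration rather than the exam's reference code: the function name read_png_stream and the base_path argument are hypothetical, and the spark session is assumed to be supplied by a Databricks runtime, since the cloudFiles source is not part of open-source Spark.

```python
def read_png_stream(spark, base_path):
    """Return a streaming DataFrame limited to the .png files under base_path.

    `spark` is assumed to be an active SparkSession on Databricks;
    `base_path` is a hypothetical directory that holds mixed file types.
    """
    return (
        spark.readStream.format("cloudFiles")        # Auto Loader source
        .option("cloudFiles.format", "binaryFile")   # read each file as raw bytes
        .option("pathGlobfilter", "*.png")           # keep only names matching *.png
        .load(base_path)                             # directory to monitor
    )
```

In a notebook this would be called as df = read_png_stream(spark, base_path), with base_path pointing at the directory that mixes png files with other suffixes.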