Databricks Certified Data Engineer Professional — Question 181
An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to load data with the following code:
df = spark.read.format("parquet").load(f"/mnt/source/{date}")
Which code block should be used to create the date Python variable used in the above code block?
Answer options
- A. date = spark.conf.get("date")
- B. import sys; date = sys.argv[1]
- C. date = dbutils.notebooks.getParam("date")
- D. dbutils.widgets.text("date", "null"); date = dbutils.widgets.get("date")
Correct answer: D
Explanation
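The correct answer is D. Parameters passed to a notebook task through the Databricks Jobs API are exposed to the notebook as widgets, so dbutils.widgets.get("date") returns the value sent by the upstream system. The preceding dbutils.widgets.text("date", "null") call declares the widget and gives it a default of "null", which also lets the notebook run interactively when no parameter is supplied. Option A is incorrect because spark.conf.get reads Spark configuration properties, not job parameters. Option C is incorrect because dbutils.notebooks.getParam is not a documented dbutils method. Option B is incorrect because sys.argv holds command-line arguments, which apply to Python script tasks rather than to notebooks.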
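The widget pattern in option D can be tried outside Databricks with a small stand-in. The sketch below is only an illustration: FakeWidgets is a hypothetical local substitute for dbutils.widgets (it is not the Databricks implementation), resolve_source_path is a helper name invented here, and the date value "2024-01-15" is made up. It assumes that a parameter supplied by the job takes precedence over the widget's default.

```python
class FakeWidgets:
    """Hypothetical local stand-in for dbutils.widgets (assumption, not the real API)."""

    def __init__(self, job_params=None):
        # Values a job run would have supplied, keyed by widget name.
        self._values = dict(job_params or {})

    def text(self, name, default_value, label=None):
        # Declare a text widget; the default applies only if no value was supplied.
        self._values.setdefault(name, default_value)

    def get(self, name):
        # Return the current value of the named widget.
        return self._values[name]


def resolve_source_path(widgets, base="/mnt/source"):
    # Same two calls as option D, followed by the path built in the question.
    widgets.text("date", "null")
    date = widgets.get("date")
    return f"{base}/{date}"


# Scheduled run: the upstream system passed a date parameter.
scheduled = resolve_source_path(FakeWidgets({"date": "2024-01-15"}))
# Interactive run: no parameter was passed, so the default is used.
interactive = resolve_source_path(FakeWidgets())

print(scheduled)
print(interactive)
```

In a real notebook the two widget calls would be made on dbutils.widgets directly, and the resulting path would be passed to spark.read.format("parquet").load(...).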