Databricks Certified Data Engineer Professional — Question 3
An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to load data with the following code: df = spark.read.format("parquet").load(f"/mnt/source/{date}")
Which code block should be used to create the date Python variable used in the above code block?
Answer options
- A. date = spark.conf.get("date")
- B. input_dict = input(); date = input_dict["date"]
- C. import sys; date = sys.argv[1]
- D. date = dbutils.notebooks.getParam("date")
- E. dbutils.widgets.text("date", "null"); date = dbutils.widgets.get("date")
Correct answer: E
Explanation
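The correct answer is E. Parameters passed to a notebook task through the Jobs API are exposed to the notebook as widgets, so dbutils.widgets.get("date") returns the value the upstream system supplied. Declaring the widget first with dbutils.widgets.text("date", "null") gives it a default value, which lets the notebook also run interactively when no parameter is passed. The other options read from the wrong place. Option A reads a Spark configuration property, and job parameters are not stored there. Option B calls input(), which expects interactive standard input and returns a string rather than a dictionary. Option C uses sys.argv, which is how parameters reach a Python script, not a notebook. Option D calls dbutils.notebooks.getParam, which is not a documented dbutils function; the notebook utility is named dbutils.notebook and is documented for running and exiting notebooks.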
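The widget behavior described in the explanation can be sketched outside Databricks. The block below is a minimal illustration, not Databricks code: FakeWidgets is a hypothetical local stand-in that assumes dbutils.widgets keeps a passed-in job parameter and only falls back to the declared default when none is supplied, and source_path is a hypothetical helper that mirrors option E followed by the path construction from the question.

```python
class FakeWidgets:
    """Hypothetical local stand-in for dbutils.widgets (not a Databricks API)."""

    def __init__(self, job_params=None):
        # Values a job run would pass in as notebook parameters.
        self._values = dict(job_params or {})

    def text(self, name, default_value, label=None):
        # Assumed behavior: the default applies only when no value was passed.
        self._values.setdefault(name, default_value)

    def get(self, name):
        return self._values[name]


def source_path(widgets, base="/mnt/source"):
    # Same two calls as option E, made against the stand-in object.
    widgets.text("date", "null")
    date = widgets.get("date")
    return f"{base}/{date}"


# Run triggered with a "date" parameter (the date value is made up).
job_run_path = source_path(FakeWidgets({"date": "2024-01-15"}))

# Interactive run with no parameter: the widget default is used.
interactive_path = source_path(FakeWidgets())

print(job_run_path)
print(interactive_path)
```

In an actual notebook the same two calls would be made on dbutils.widgets directly, and the resulting date variable would feed the spark.read call from the question.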