Databricks Certified Data Engineer Associate — Question 154
A data engineer is debugging a Python notebook in Databricks that processes a dataset using PySpark. The notebook fails with an error during a DataFrame transformation. The engineer wants to inspect the state of variables, such as the input DataFrame and intermediate results, to identify where the error occurs.
Which tool should the engineer use to debug the notebook and inspect the values of variables like DataFrames?
Answer options
- A. Use the Databricks CLI to download and analyze driver logs for detailed error messages
- B. Use the Python Notebook Interactive Debugger to set breakpoints and inspect variable values in real time
- C. Use the Ganglia UI to monitor cluster resource usage and identify hardware issues
- D. Use the Spark UI to analyze the execution plan and identify stages where the job failed
Correct answer: B
Explanation
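The correct answer is B. The Python Notebook Interactive Debugger lets the engineer set breakpoints, step through the code, and examine variable values directly within the notebook, which makes it possible to pinpoint the line where the transformation fails. The other options serve different purposes: driver logs (A) show error messages after the fact, the Ganglia UI (C) shows cluster resource usage, and the Spark UI (D) shows execution plans and the stages where a job failed. None of them provides the real-time inspection of variable state that the engineer needs here.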
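The debugger itself is interactive, so it cannot be shown in code. As a rough illustration of what inspecting intermediate state buys you, the sketch below uses plain Python dictionaries as a stand-in for DataFrame rows (an assumption made so the example runs without a Spark cluster). The names `rows`, `parse_amounts`, and `find_failing_record` are hypothetical and do not come from the question.

```python
# Plain-Python stand-in for a notebook cell; each dict plays the role of a DataFrame row.
rows = [
    {"order_id": 1, "amount": "19.99"},
    {"order_id": 2, "amount": "n/a"},   # malformed value that makes the transformation fail
    {"order_id": 3, "amount": "5.00"},
]


def parse_amounts(records):
    """Convert the string `amount` field of every record to a float."""
    parsed = []
    for record in records:
        # In an interactive debugger, a breakpoint on the next line would pause
        # execution here so that `record` and `parsed` can be inspected.
        value = float(record["amount"])
        parsed.append({**record, "amount": value})
    return parsed


def find_failing_record(records):
    """Non-interactive fallback: return the first record the transformation rejects."""
    for record in records:
        try:
            parse_amounts([record])
        except ValueError:
            return record
    return None


bad = find_failing_record(rows)
print(bad)
```

With a breakpoint inside the loop, the same malformed record would be visible the moment execution pauses on it, without writing a helper like `find_failing_record` at all.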