Databricks Certified Data Engineer Professional — Question 162
The data engineering team is configuring environments for development, testing, and production before beginning migration on a new data pipeline. The team requires extensive testing on both the code and data resulting from code execution, and the team wants to develop and test against data as similar to production data as possible.
A junior data engineer suggests that production data can be mounted to the development and testing environments, allowing pre-production code to execute against production data. Because all users have admin privileges in the development environment, the junior data engineer has offered to configure permissions and mount this data for the team.
Which statement captures best practices for this situation?
Answer options
- A. All development, testing, and production code and data should exist in a single, unified workspace; creating separate environments for testing and development complicates administrative overhead.
- B. In environments where interactive code will be executed, production data should only be accessible with read permissions; creating isolated databases for each environment further reduces risks.
- C. Because access to production data will always be verified using passthrough credentials, it is safe to mount data to any Databricks development environment.
- D. Because Delta Lake versions all data and supports time travel, it is not possible for user error or malicious actors to permanently delete production data; as such, it is generally safe to mount production data anywhere.
Correct answer: B
Explanation
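Option B is correct because it limits production data to read-only access in environments where interactive, not-yet-validated code runs, so pre-production code cannot modify or delete production tables. Giving each environment its own isolated database adds a further safeguard, because writes from development and testing land outside production. Option A is incorrect: placing all code and data in a single workspace removes the isolation between environments, so a mistake made during development can affect production directly. Option C is misleading: credential passthrough only applies the permissions a user's identity already holds, so any user with write access could still modify mounted production data. Option D is incorrect: Delta Lake time travel depends on retained data files and transaction log entries, and running VACUUM or deleting the underlying files directly can remove older versions permanently.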
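The approach in option B can be sketched as a set of permission statements. The example below is illustrative only and assumes a workspace governed by Unity Catalog, where a database corresponds to a schema. The catalog names (`prod`, `dev`), the schema name (`sales`), and the group name (`dev_engineers`) are hypothetical and do not come from the question. The code only builds the SQL strings; it does not execute them.

```python
# Sketch only: builds SQL statements as strings and prints them.
# Assumes Unity Catalog privileges; all object and group names are hypothetical.

def read_only_grants(catalog, schema, principal):
    """Return GRANT statements giving `principal` read-only access to one schema."""
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
        f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO `{principal}`",
    ]


def isolated_schema_statements(env_catalog, schema, principal):
    """Return statements creating a per-environment schema the team may write to."""
    target = f"{env_catalog}.{schema}"
    return [
        f"CREATE SCHEMA IF NOT EXISTS {target}",
        f"GRANT USE CATALOG ON CATALOG {env_catalog} TO `{principal}`",
        f"GRANT ALL PRIVILEGES ON SCHEMA {target} TO `{principal}`",
    ]


# Read-only on production, full access only inside the development schema.
statements = (
    read_only_grants("prod", "sales", "dev_engineers")
    + isolated_schema_statements("dev", "sales", "dev_engineers")
)

for stmt in statements:
    # On a Databricks cluster, each statement could be passed to spark.sql(stmt).
    print(stmt)
```

The point of the split is that the development group never receives a write privilege on anything under `prod`, while all of its writes are directed to a separate schema in its own environment.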