Databricks Certified Data Engineer Professional — Question 105
The data engineering team is configuring environments for development, testing, and production before beginning migration on a new data pipeline. The team requires extensive testing on both the code and data resulting from code execution, and the team wants to develop and test against data as similar to production data as possible.
A junior data engineer suggests that production data can be mounted to the development and testing environments, allowing pre-production code to execute against production data. Because all users have admin privileges in the development environment, the junior data engineer has offered to configure permissions and mount this data for the team.
Which statement captures best practices for this situation?
Answer options
- A. All development, testing, and production code and data should exist in a single, unified workspace; creating separate environments for testing and development complicates administrative overhead.
- B. In environments where interactive code will be executed, production data should only be accessible with read permissions; creating isolated databases for each environment further reduces risks.
- C. As long as code in the development environment declares USE dev_db at the top of each notebook, there is no possibility of inadvertently committing changes back to production data sources.
- D. Because Delta Lake versions all data and supports time travel, it is not possible for user error or malicious actors to permanently delete production data; as such, it is generally safe to mount production data anywhere.
- E. Because access to production data will always be verified using passthrough credentials, it is safe to mount data to any Databricks development environment.
Correct answer: B
Explanation
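Option B is correct because it limits the damage pre-production code can do: with read-only access, interactive or untested code can read production data but cannot modify or delete it, and a separate database per environment keeps development and test writes away from production tables. Option A is incorrect because a single shared workspace removes the isolation that separate environments provide; the extra administration is the price of that isolation. Option C is incorrect because USE dev_db only sets the default database for unqualified table names; a fully qualified table name or a direct write to a storage path can still reach production data. Option D is incorrect because time travel depends on older data files being retained: VACUUM permanently removes files past the retention threshold, and anyone with write access to the underlying storage can delete files directly. Option E is incorrect because credential passthrough cannot be assumed to apply everywhere, and in this scenario every user has admin privileges in the development environment, so mounted production data would be broadly exposed.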
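The reasoning behind options B and C can be sketched in plain Python. This is a simplified model, not Databricks code: `resolve_table`, `check_write`, and `READ_ONLY_DATABASES` are hypothetical names introduced for illustration, and the sketch assumes two-level `database.table` names. It shows that a default database only affects unqualified names, so a read-only permission on production data is the actual safeguard.

```python
# Hypothetical model for illustration; none of these names are a Databricks API.
# Assumption: production data lives in a database called prod_db that is
# granted to development users with read-only permissions.
READ_ONLY_DATABASES = {"prod_db"}


def resolve_table(name: str, current_db: str) -> str:
    """Mimic how a default database (set by USE) applies to table names.

    Only unqualified names pick up the default database; a qualified
    name such as prod_db.sales is left exactly as written.
    """
    if "." in name:
        return name
    return f"{current_db}.{name}"


def check_write(name: str, current_db: str) -> str:
    """Return the resolved write target, or raise if it is read-only."""
    target = resolve_table(name, current_db)
    database = target.split(".", 1)[0]
    if database in READ_ONLY_DATABASES:
        raise PermissionError(
            f"write to {target} denied: {database} is read-only"
        )
    return target


if __name__ == "__main__":
    # The notebook "declared USE dev_db", so dev_db is the default database.
    print(check_write("sales", "dev_db"))

    # A qualified name bypasses the default; only the permission check stops it.
    try:
        check_write("prod_db.sales", "dev_db")
    except PermissionError as err:
        print(err)
```

In this model the second write is blocked by the permission check alone; the default database plays no part in stopping it, which is why option C offers no real protection and option B does.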