Google Cloud Professional Data Engineer — Question 37
You receive data files in CSV format monthly from a third party. You need to cleanse this data, but every third month the schema of the files changes. Your requirements for implementing these transformations include:
✑ Executing the transformations on a schedule
✑ Enabling non-developer analysts to modify transformations
✑ Providing a graphical tool for designing transformations
What should you do?
Answer options
- A. Use Dataprep by Trifacta to build and maintain the transformation recipes, and execute them on a scheduled basis
- B. Load each month's CSV data into BigQuery, and write a SQL query to transform the data to a standard schema. Merge the transformed tables together with a SQL query
- C. Help the analysts write a Dataflow pipeline in Python to perform the transformation. The Python code should be stored in a revision control system and modified as the incoming data's schema changes
- D. Use Apache Spark on Dataproc to infer the schema of the CSV file before creating a DataFrame. Then implement the transformations in Spark SQL before writing the data out to Cloud Storage and loading into BigQuery
Correct answer: A
Explanation
The correct answer is A because Dataprep by Trifacta provides a graphical interface for building transformation recipes, lets analysts modify those recipes without writing code, and can run them on a schedule, which covers all three requirements. Option B requires hand-written SQL that must be revised each time the schema changes, and it offers no graphical design tool. Option C relies on Python coding, which conflicts with the requirement that non-developer analysts be able to modify the transformations. Option D requires Spark knowledge and likewise provides no graphical tool.
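The elimination reasoning can be sketched as a small requirements check. This is an illustrative sketch only: the requirement names (`scheduled`, `analyst_editable`, `graphical`) and the True/False values are one reading of the answer options, not an official scoring. Scheduling is generously assumed to be achievable for every option with additional tooling, so the deciding factors are the other two requirements.

```python
# Requirements paraphrased from the question. The labels are hypothetical
# names chosen for this sketch.
REQUIREMENTS = ("scheduled", "analyst_editable", "graphical")

# Assumption: each True/False below is this sketch's interpretation of the
# option text, not a statement from Google documentation.
OPTIONS = {
    "A": {"scheduled": True, "analyst_editable": True, "graphical": True},    # Dataprep recipes
    "B": {"scheduled": True, "analyst_editable": False, "graphical": False},  # hand-written SQL in BigQuery
    "C": {"scheduled": True, "analyst_editable": False, "graphical": False},  # Dataflow pipeline in Python
    "D": {"scheduled": True, "analyst_editable": False, "graphical": False},  # Spark SQL on Dataproc
}


def satisfying_options(options, requirements):
    """Return the option labels that meet every listed requirement."""
    return [
        label
        for label, traits in options.items()
        if all(traits[req] for req in requirements)
    ]


print(satisfying_options(OPTIONS, REQUIREMENTS))
```

Under these assumptions only option A passes all three checks. Even if `analyst_editable` were set to True for option B, on the view that analysts may know SQL, the option would still fail the `graphical` requirement.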