AWS Certified Solutions Architect – Associate (SAA-C03) — Question 768

A company hosts a data lake on Amazon S3. The data lake ingests data in Apache Parquet format from various data sources. The company uses multiple transformation steps to prepare the ingested data. The steps include filtering of anomalies, normalizing of data to standard date and time values, and generation of aggregates for analyses.

The company must store the transformed data in S3 buckets that data analysts access. The company needs a prebuilt solution for data transformation that does not require code. The solution must provide data lineage and data profiling. The company needs to share the data transformation steps with employees throughout the company.

Which solution will meet these requirements?

Answer options

Correct answer: C

Explanation

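AWS Glue DataBrew is a visual, no-code data preparation tool with more than 250 prebuilt transformations. These cover the required steps: filtering anomalies, normalizing date and time values, and generating aggregates. DataBrew reads from and writes to Amazon S3, and it provides built-in data profiling and a visual data lineage view, which satisfies the profiling and lineage requirements. Its transformation steps are saved as reusable 'recipes' that can be published and shared with other employees. The other services in the options, such as AWS Glue Studio, Amazon EMR, and Amazon Athena, either require writing code or SQL or do not offer built-in, no-code data profiling and lineage.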
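To illustrate why a recipe is easy to share, the following plain-Python sketch models a recipe as an ordered list of declarative steps that cover the three transformations named in the question. It is a conceptual sketch under stated assumptions: the step names (`filter_range`, `normalize_date`, `aggregate_sum`), the `apply_recipe` helper, the sample columns, and the anomaly threshold are all hypothetical. This is not DataBrew's actual recipe format or API. In DataBrew itself, the same steps are built in the visual editor without writing code.

```python
import json
from datetime import datetime

# Hypothetical recipe: an ordered list of declarative steps (data, not code).
# Illustrates the recipe concept only; NOT DataBrew's real recipe format.
RECIPE = [
    # Filter anomalies: keep rows whose amount falls in an assumed valid range.
    {"op": "filter_range", "column": "amount", "min": 0, "max": 10_000},
    # Normalize timestamps from an assumed source format to ISO 8601.
    {"op": "normalize_date", "column": "ts", "input_format": "%d/%m/%Y %H:%M"},
    # Aggregate: total amount per region.
    {"op": "aggregate_sum", "group_by": "region", "column": "amount"},
]


def apply_recipe(rows, recipe):
    """Replay each step of the recipe, in order, over a list of dict rows."""
    for step in recipe:
        op = step["op"]
        if op == "filter_range":
            col = step["column"]
            rows = [r for r in rows if step["min"] <= r[col] <= step["max"]]
        elif op == "normalize_date":
            col = step["column"]
            rows = [
                {**r, col: datetime.strptime(r[col], step["input_format"]).isoformat()}
                for r in rows
            ]
        elif op == "aggregate_sum":
            totals = {}
            for r in rows:
                key = r[step["group_by"]]
                totals[key] = totals.get(key, 0) + r[step["column"]]
            rows = [{step["group_by"]: k, "total": v} for k, v in sorted(totals.items())]
        else:
            raise ValueError(f"unknown step: {op}")
    return rows


# Sharing: the recipe serializes to JSON, so another team can load and replay it.
shared = json.dumps(RECIPE)

rows = [
    {"region": "east", "amount": 120, "ts": "01/02/2024 09:30"},
    {"region": "west", "amount": 80, "ts": "02/02/2024 10:00"},
    {"region": "east", "amount": 50_000, "ts": "03/02/2024 11:15"},  # anomaly
    {"region": "east", "amount": 30, "ts": "04/02/2024 12:45"},
]
result = apply_recipe(rows, json.loads(shared))
print(result)  # [{'region': 'east', 'total': 150}, {'region': 'west', 'total': 80}]
```

Because the recipe is data rather than code, it can be versioned and handed to another team, which replays it against its own data. DataBrew applies a similar idea when a recipe is published as a version that other users can reuse.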