AWS Certified Solutions Architect – Professional — Question 333

A financial services company receives a regular data feed from its credit card servicing partner. Approximately 5,000 records are sent every 15 minutes in plaintext, delivered over HTTPS directly into an Amazon S3 bucket with server-side encryption. This feed contains sensitive credit card primary account number (PAN) data. The company needs to automatically mask the PAN before sending the data to another S3 bucket for additional internal processing. The company also needs to remove and merge specific fields, and then transform the record into JSON format. Additionally, extra feeds are likely to be added in the future, so any design needs to be easily expandable.
Which solution will meet these requirements?

Answer options

Correct answer: C

Explanation

AWS Glue is a fully managed, serverless ETL service built for schema discovery, data transformation, and format conversion, so additional feeds can be handled by adding crawlers and jobs rather than provisioning new infrastructure. An AWS Glue crawler defines the schema, and an AWS Glue ETL job, triggered by an AWS Lambda function when a file is uploaded, masks the PAN, removes and merges the required fields, and writes the result in JSON format to the target S3 bucket. Options A and B introduce unnecessary orchestration complexity with SQS queues and container scaling. Option D is incorrect because Amazon Athena queries cannot natively trigger Amazon EMR ETL jobs.
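
The per-record work described above (mask the PAN, drop and merge fields, emit JSON) can be illustrated with a minimal sketch. This is plain Python rather than an actual AWS Glue job script, and the pipe-delimited layout, the field names, and the helper names `mask_pan` and `transform_record` are assumptions made for illustration; the question does not specify the feed format.

```python
import json


def mask_pan(pan: str) -> str:
    """Replace every character except the last four with 'X'."""
    return "X" * (len(pan) - 4) + pan[-4:]


def transform_record(line: str) -> str:
    """Convert one plaintext record into a masked JSON string.

    Hypothetical input layout (not from the question):
        first_name|last_name|pan|amount|internal_code
    """
    first_name, last_name, pan, amount, _internal_code = line.strip().split("|")
    record = {
        # Merge two source fields into one output field.
        "name": f"{first_name} {last_name}",
        # Mask the PAN before the record leaves the landing bucket.
        "pan": mask_pan(pan),
        "amount": float(amount),
        # internal_code is intentionally dropped from the output.
    }
    return json.dumps(record)


if __name__ == "__main__":
    print(transform_record("Jane|Doe|4111111111111111|25.50|Z9"))
```

In a Glue-based design, logic of this kind would typically live inside the ETL job's transformation step, with the crawler-defined schema supplying the field names instead of a hard-coded layout.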