AWS Certified Solutions Architect – Associate (SAA-C03) — Question 405
A payment processing company records all voice communication with its customers and stores the audio files in an Amazon S3 bucket. The company needs to capture the text from the audio files. The company must remove from the text any personally identifiable information (PII) that belongs to customers.
What should a solutions architect do to meet these requirements?
Answer options
- A. Process the audio files by using Amazon Kinesis Video Streams. Use an AWS Lambda function to scan for known PII patterns.
- B. When an audio file is uploaded to the S3 bucket, invoke an AWS Lambda function to start an Amazon Textract task to analyze the call recordings.
- C. Configure an Amazon Transcribe transcription job with PII redaction turned on. When an audio file is uploaded to the S3 bucket, invoke an AWS Lambda function to start the transcription job. Store the output in a separate S3 bucket.
- D. Create an Amazon Connect contact flow that ingests the audio files with transcription turned on. Embed an AWS Lambda function to scan for known PII patterns. Use Amazon EventBridge to start the contact flow when an audio file is uploaded to the S3 bucket.
Correct answer: C
Explanation
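Amazon Transcribe is the AWS speech-to-text service, and its batch transcription jobs support built-in PII redaction that identifies sensitive customer data and replaces it with a [PII] tag in the transcript. Invoking an AWS Lambda function on each S3 upload to start the transcription job gives a serverless, event-driven pipeline, and writing the output to a separate S3 bucket keeps the redacted transcripts apart from the raw recordings. Option A is incorrect because Amazon Kinesis Video Streams ingests and stores streaming media and does not transcribe audio. Option B is incorrect because Amazon Textract extracts text from documents and images, not from audio. Option D is incorrect because Amazon Connect is a contact center service whose contact flows handle live customer interactions and do not ingest audio files from S3. Options A and D also rely on custom Lambda code to match known PII patterns, which is harder to build and maintain and less reliable than the native redaction in Amazon Transcribe.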
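The pipeline in option C could be sketched as the Lambda function below. This is a minimal illustration under stated assumptions, not a reference implementation: the helper `build_job_request`, the `OUTPUT_BUCKET` environment variable, its default bucket name, and the `en-US` language code are all hypothetical choices. The request fields follow the boto3 `start_transcription_job` call, where `ContentRedaction` turns on PII redaction.

```python
import os
import re
import urllib.parse

# Hypothetical destination bucket for redacted transcripts (assumption).
OUTPUT_BUCKET = os.environ.get("OUTPUT_BUCKET", "example-redacted-transcripts")


def build_job_request(bucket, key, output_bucket, language_code="en-US"):
    """Build the parameters for a Transcribe job with PII redaction enabled."""
    # Job names may only contain letters, digits, '.', '_' and '-', and are
    # limited to 200 characters. They must also be unique per account, so a
    # real deployment would likely append a timestamp or request ID.
    job_name = re.sub(r"[^0-9a-zA-Z._-]", "-", key)[:200]
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "LanguageCode": language_code,
        "OutputBucketName": output_bucket,
        "ContentRedaction": {
            "RedactionType": "PII",
            # Write only the redacted transcript, not the unredacted one.
            "RedactionOutput": "redacted",
        },
    }


def lambda_handler(event, context):
    """Start one transcription job per object in the S3 event notification."""
    import boto3  # imported here so the request builder is usable without boto3

    transcribe = boto3.client("transcribe")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        transcribe.start_transcription_job(
            **build_job_request(bucket, key, OUTPUT_BUCKET)
        )
```

Keeping the request construction in a separate function means the redaction settings can be checked without calling AWS. The function's execution role would need permission to start transcription jobs, read the source bucket, and write to the output bucket.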