AWS Certified Data Engineer – Associate (DEA-C01) — Question 182
A company has AWS resources in multiple AWS Regions. The company has an Amazon EFS file system in each Region where the company operates. The company’s data science team operates within only a single Region. The data that the data science team works with must remain within the team's Region.
A data engineer needs to create a single dataset by processing files that are in each of the company's Regional EFS file systems. The data engineer wants to use an AWS Step Functions state machine to orchestrate AWS Lambda functions to process the data.
Which solution will meet these requirements with the LEAST effort?
Answer options
- A. Peer the VPCs that host the EFS file systems in each Region with the VPC that is in the data science team’s Region. Enable EFS file locking. Configure the Lambda functions in the data science team's Region to mount each of the Region-specific file systems. Use the Lambda functions to process the data.
- B. Configure each of the Regional EFS file systems to replicate data to the data science team's Region. In the data science team’s Region, configure the Lambda functions to mount the replica file systems. Use the Lambda functions to process the data.
- C. Deploy the Lambda functions to each Region. Mount the Regional EFS file systems to the Lambda functions. Use the Lambda functions to process the data. Store the output in an Amazon S3 bucket in the data science team’s Region.
- D. Use AWS DataSync to transfer files from each of the Regional EFS file systems to the file system that is in the data science team's Region. Configure the Lambda functions in the data science team's Region to mount the file system that is in the same Region. Use the Lambda functions to process the data.
Correct answer: D
Explanation
Option D is the intended answer: AWS DataSync copies the files from each Regional EFS file system into the file system in the data science team's Region, so the Lambda functions mount a single same-Region file system and the combined dataset stays in that Region. Option A requires inter-Region VPC peering and a separate mount for every Regional file system, which adds networking setup and cross-Region access at processing time. Option B requires a replication configuration and a separate replica file system for each source Region, so the functions would still have to work with several file systems. Option C deploys Lambda functions in every Region, which increases management overhead and processes the data outside the team's Region.
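To make the orchestration in option D more concrete, here is a minimal sketch of a Step Functions definition (Amazon States Language) built as a Python dictionary. It assumes DataSync has already copied each Region's files into a per-Region subdirectory of the single file system, which the question does not specify. The Region list, the function name `process-regional-files`, and the mount path `/mnt/data` are hypothetical placeholders, and the sketch only builds and prints the definition without calling AWS.

```python
import json

# Hypothetical values for illustration only; none of these come from the question.
SOURCE_REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-2"]
PROCESS_FUNCTION = "process-regional-files"
MOUNT_ROOT = "/mnt/data"  # assumed local mount path of the EFS access point


def build_state_machine(regions, function_name, mount_root):
    """Return an Amazon States Language definition as a plain dict.

    A Pass state lists one directory per source Region, and a Map state
    invokes the same Lambda function once per directory. The function is
    assumed to mount the single EFS file system in the team's Region.
    """
    return {
        "Comment": "Process files consolidated into one Regional EFS file system",
        "StartAt": "ListSourceDirectories",
        "States": {
            "ListSourceDirectories": {
                "Type": "Pass",
                "Result": {
                    "directories": [f"{mount_root}/{region}" for region in regions]
                },
                "Next": "ProcessEachDirectory",
            },
            "ProcessEachDirectory": {
                "Type": "Map",
                "ItemsPath": "$.directories",
                "MaxConcurrency": 2,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "ProcessDirectory",
                    "States": {
                        "ProcessDirectory": {
                            "Type": "Task",
                            # Optimized Lambda integration for Step Functions.
                            "Resource": "arn:aws:states:::lambda:invoke",
                            "Parameters": {
                                "FunctionName": function_name,
                                # Each Map item is one directory path string.
                                "Payload": {"directory.$": "$"},
                            },
                            "End": True,
                        }
                    },
                },
                "End": True,
            },
        },
    }


definition = build_state_machine(SOURCE_REGIONS, PROCESS_FUNCTION, MOUNT_ROOT)
print(json.dumps(definition, indent=2))
```

Because every Lambda invocation reads from the same same-Region file system, the state machine needs no cross-Region networking, which is the main reason this layout stays simple.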