AWS Certified Data Engineer – Associate (DEA-C01) — Question 9
A data engineer needs to securely transfer 5 TB of data from an on-premises data center to an Amazon S3 bucket. Approximately 5% of the data changes every day. Updates to the data need to be regularly propagated to the S3 bucket. The data includes files that are in multiple formats. The data engineer needs to automate the transfer process and must schedule the process to run periodically.
Which AWS service should the data engineer use to transfer the data in the MOST operationally efficient way?
Answer options
- A. AWS DataSync
- B. AWS Glue
- C. AWS Direct Connect
- D. Amazon S3 Transfer Acceleration
Correct answer: A
Explanation
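AWS DataSync is a managed service for moving large amounts of data between on-premises storage and AWS storage services such as Amazon S3, so it fits this scenario directly. A DataSync task can run on a schedule, and after the initial 5 TB copy each run transfers only the files that have changed (roughly 5% of the data, or about 250 GB per day here). DataSync copies files as they are, so the mix of file formats does not matter, and it encrypts data in transit. The other options each miss a requirement. AWS Glue is an ETL service for transforming and cataloging data, not for bulk file transfer from on-premises storage. AWS Direct Connect provides a dedicated network connection, but it is only a network link and does not copy, schedule, or synchronize anything by itself. Amazon S3 Transfer Acceleration speeds up uploads to S3 over long distances, but it has no scheduling or change detection, so the engineer would have to build and maintain the synchronization logic separately.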
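As an illustration of the scheduled, incremental transfer described in the explanation, the following is a minimal sketch using the boto3 DataSync client's `create_task` call. It assumes a DataSync agent is already deployed on premises and that the source and destination locations have already been created; the task name, the daily `rate(1 day)` schedule, and the helper function names are hypothetical choices for this example, not part of the question.

```python
from typing import Any, Dict


def build_task_request(
    source_location_arn: str,
    destination_location_arn: str,
    schedule_expression: str = "rate(1 day)",  # assumed schedule: once per day
) -> Dict[str, Any]:
    """Build keyword arguments for the DataSync create_task API call."""
    return {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Name": "onprem-to-s3-daily-sync",  # hypothetical task name
        "Options": {
            # Copy only data that differs between source and destination.
            "TransferMode": "CHANGED",
            # Verify only the files transferred in each run.
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
        },
        # DataSync starts a task execution automatically on this schedule.
        "Schedule": {"ScheduleExpression": schedule_expression},
    }


def create_scheduled_task(
    datasync_client: Any,
    source_location_arn: str,
    destination_location_arn: str,
) -> str:
    """Create the scheduled task and return its ARN."""
    request = build_task_request(source_location_arn, destination_location_arn)
    response = datasync_client.create_task(**request)
    return response["TaskArn"]


# Example usage (requires boto3 and AWS credentials; the ARNs are placeholders):
#   import boto3
#   client = boto3.client("datasync")
#   task_arn = create_scheduled_task(client, "<source-location-arn>", "<s3-location-arn>")
```

Separating the request construction from the API call keeps the sketch easy to inspect without AWS access. In a real setup the location ARNs would come from earlier calls that register the on-premises share and the S3 bucket with DataSync.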