AWS Certified Data Analytics – Specialty — Question 112
A company receives data from its vendor in JSON format with a timestamp in the file name. The vendor uploads the data to an Amazon S3 bucket, and the data is registered into the company's data lake for analysis and reporting. The company has configured an S3 Lifecycle policy to archive all files to S3 Glacier after 5 days.
The company wants to ensure that its AWS Glue crawler catalogs data only from S3 Standard storage and ignores the archived files. A data analytics specialist must implement a solution to achieve this goal without changing the current S3 bucket configuration.
Which solution meets these requirements?
Answer options
- A. Use the exclude patterns feature of AWS Glue to identify the S3 Glacier files for the crawler to exclude.
- B. Schedule an automation job that uses AWS Lambda to move files from the original S3 bucket to a new S3 bucket for S3 Glacier storage.
- C. Use the excludeStorageClasses property in the AWS Glue Data Catalog table to exclude files on S3 Glacier storage.
- D. Use the include patterns feature of AWS Glue to identify the S3 Standard files for the crawler to include.
Correct answer: C
Explanation
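The correct answer is C. Setting the excludeStorageClasses property on the AWS Glue Data Catalog table tells AWS Glue to skip objects whose storage class is listed (for example, GLACIER). The archived files are therefore ignored, and only data in S3 Standard is cataloged. No change to the bucket or its Lifecycle policy is needed.
Option A is incorrect because exclude patterns are glob patterns matched against object keys, not storage classes. An S3 Lifecycle transition changes an object's storage class but not its key, so a fixed pattern cannot tell archived files from current ones. Even with the timestamp in the file name, the pattern would have to be rewritten every day as more files pass the 5-day threshold. Option B adds a second bucket and a scheduled Lambda job to build and maintain, and it changes where the vendor data lives, which complicates the setup unnecessarily. Option D has the same limitation as A: include patterns also match keys only, so they cannot select files by storage class.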
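To make the contrast between options A and C concrete, here is a small local model in Python. It is not AWS Glue code. It assumes a simplified Lifecycle rule that moves an object to GLACIER once it is five days old, and every name in it (S3Object, keep_by_storage_class, keep_by_exclude_pattern, vendor_files) is hypothetical. AWS documents excludeStorageClasses as a list of storage class names such as GLACIER and DEEP_ARCHIVE, and the model uses the same idea: the decision is made per object from its storage class. A key pattern, by contrast, sees only the file name.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from fnmatch import fnmatch

# Hypothetical model for illustration only; this is not AWS Glue code.
ARCHIVE_AFTER_DAYS = 5  # the Lifecycle rule described in the question


@dataclass(frozen=True)
class S3Object:
    key: str        # the key does not change when Lifecycle archives the object
    uploaded: date


def storage_class(obj: S3Object, today: date) -> str:
    """Simplified Lifecycle model; real S3 transitions run asynchronously."""
    age_days = (today - obj.uploaded).days
    return "GLACIER" if age_days >= ARCHIVE_AFTER_DAYS else "STANDARD"


def keep_by_storage_class(objs, today, exclude_classes):
    """Mimics excludeStorageClasses: each object is judged by its storage class."""
    return [o.key for o in objs if storage_class(o, today) not in exclude_classes]


def keep_by_exclude_pattern(objs, pattern):
    """Mimics an exclude pattern: only the key is visible to the match.

    fnmatch approximates Glue's glob syntax (fnmatch's * also matches '/').
    """
    return [o.key for o in objs if not fnmatch(o.key, pattern)]


def vendor_files(last_upload: date, count: int):
    """One JSON file per day, with the upload date in the file name."""
    days = (last_upload - timedelta(days=n) for n in range(count))
    return [S3Object(f"vendor/data_{d:%Y%m%d}.json", d) for d in days]


if __name__ == "__main__":
    files = vendor_files(date(2024, 1, 10), 8)  # Jan 10 back to Jan 3
    excluded = {"GLACIER", "DEEP_ARCHIVE"}
    # A static pattern written on Jan 10 to cover the files archived by then.
    pattern = "vendor/data_2024010[3-5].json"
    for today in (date(2024, 1, 10), date(2024, 1, 11)):
        by_class = keep_by_storage_class(files, today, excluded)
        by_pattern = keep_by_exclude_pattern(files, pattern)
        print(today, "results agree:", by_class == by_pattern)
```

Running the script prints True for January 10 and False for January 11. On January 11 the January 6 file reaches five days of age and moves to GLACIER, so the storage-class filter drops it automatically, while the pattern written the day before still lets it through.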