AWS Certified Machine Learning – Specialty — Question 212
A data scientist has 20 TB of data in CSV format in an Amazon S3 bucket. The data scientist needs to convert the data to Apache Parquet format.
How can the data scientist convert the file format with the LEAST amount of effort?
Answer options
- A. Use an AWS Glue crawler to convert the file format.
- B. Write a script to convert the file format. Run the script as an AWS Glue job.
- C. Write a script to convert the file format. Run the script on an Amazon EMR cluster.
- D. Write a script to convert the file format. Run the script in an Amazon SageMaker notebook.
Correct answer: B
Explanation
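The correct answer is B. AWS Glue is a serverless ETL service: a Glue job runs a Spark script without any cluster to provision, scales to a 20 TB dataset, and can read CSV from Amazon S3 and write Parquet back with only a few lines of code. Option A does not work because a Glue crawler only infers schemas and populates the AWS Glue Data Catalog; it does not transform or rewrite data. Options C and D can perform the conversion but take more effort: Amazon EMR requires creating, sizing, and managing a cluster, and an Amazon SageMaker notebook runs on a single instance that the data scientist must provision and that is poorly suited to processing 20 TB.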
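To illustrate how little code option B needs, here is a minimal sketch of a Glue PySpark job script. It is not an official template. The job parameters `source_path` and `target_path`, the helper `build_io_options`, and the assumption that the CSV files are comma-separated with a header row are all hypothetical choices made for this example. The `awsglue` modules are available only inside the Glue runtime, so they are imported lazily inside `run_job`.

```python
import sys


def build_io_options(source_path: str, target_path: str) -> dict:
    """Return reader and writer settings for a CSV-to-Parquet conversion.

    Hypothetical helper: it only assembles keyword arguments for the
    DynamicFrame reader and writer used in run_job below.
    """
    return {
        "read": {
            "connection_type": "s3",
            # recurse=True picks up CSV files in nested prefixes as well.
            "connection_options": {"paths": [source_path], "recurse": True},
            "format": "csv",
            # Assumption: comma-separated files with a header row.
            "format_options": {"withHeader": True, "separator": ","},
        },
        "write": {
            "connection_type": "s3",
            "connection_options": {"path": target_path},
            "format": "parquet",
        },
    }


def run_job(argv: list) -> None:
    # Imported lazily: these modules exist only inside the Glue runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # source_path and target_path are hypothetical job parameters,
    # passed to the job as --source_path and --target_path.
    args = getResolvedOptions(argv, ["JOB_NAME", "source_path", "target_path"])

    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    opts = build_io_options(args["source_path"], args["target_path"])

    # Read the CSV files from S3 into a DynamicFrame.
    frame = glue_context.create_dynamic_frame.from_options(**opts["read"])

    # Write the same records back to S3 as Parquet.
    glue_context.write_dynamic_frame.from_options(frame=frame, **opts["write"])

    job.commit()


# Glue supplies --JOB_NAME when it launches the script, so the job body
# runs only in that case and the file can be imported safely elsewhere.
if any(arg.startswith("--JOB_NAME") for arg in sys.argv):
    run_job(sys.argv)
```

When the job is created, the two paths would be supplied as job parameters, for example `--source_path s3://example-bucket/csv/` and `--target_path s3://example-bucket/parquet/` (placeholder bucket names). No cluster or notebook instance has to be created or managed, which is what makes this the lowest-effort option.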