AWS Certified Solutions Architect – Professional (SAP-C02) — Question 451
A company is running a web-crawling process on a list of target URLs to obtain training documents for machine learning training algorithms. A fleet of Amazon EC2 t2.micro instances pulls the target URLs from an Amazon Simple Queue Service (Amazon SQS) queue. The instances then write the result of the crawling algorithm as a .csv file to an Amazon Elastic File System (Amazon EFS) volume. The EFS volume is mounted on all instances of the fleet.
A separate system adds the URLs to the SQS queue at infrequent rates. The instances crawl each URL in 10 seconds or less.
Metrics indicate that some instances are idle when no URLs are in the SQS queue. A solutions architect needs to redesign the architecture to optimize costs.
Which combination of steps will meet these requirements MOST cost-effectively? (Choose two.)
Answer options
- A. Use m5.8xlarge instances instead of t2.micro instances for the web-crawling process. Reduce the number of instances in the fleet by 50%.
- B. Convert the web-crawling process into an AWS Lambda function. Configure the Lambda function to pull URLs from the SQS queue.
- C. Modify the web-crawling process to store results in Amazon Neptune.
- D. Modify the web-crawling process to store results in an Amazon Aurora Serverless MySQL instance.
- E. Modify the web-crawling process to store results in Amazon S3.
Correct answer: B, E
Explanation
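AWS Lambda fits short (10 seconds or less per URL), infrequent work: with the SQS queue as an event source, the function runs only when messages arrive and incurs no compute charge while the queue is empty, which eliminates the idle EC2 cost. Amazon S3 is highly durable and considerably cheaper than Amazon EFS for storing static .csv files, and a Lambda function can write to it without mounting a file system. Option A increases cost, because m5.8xlarge instances are far more expensive than t2.micro instances and would still sit idle whenever the queue is empty. Amazon Neptune (a graph database) and Amazon Aurora Serverless (a relational database) are more complex and more expensive than necessary for storing simple flat files.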
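
As a rough illustration of options B and E together, the sketch below shows a Lambda handler that receives a batch of SQS messages and writes one .csv object per URL to S3. It is a minimal sketch under stated assumptions, not the company's actual crawler: `crawl`, `rows_to_csv`, `object_key`, and `process_event` are hypothetical helper names, the `RESULTS_BUCKET` environment variable and the `results/` key prefix are made up for the example, and each SQS message body is assumed to contain exactly one URL.

```python
import csv
import hashlib
import io


def crawl(url):
    """Hypothetical stand-in for the crawling algorithm; returns rows for the .csv."""
    return [["url", "status"], [url, "crawled"]]


def rows_to_csv(rows):
    """Serialize rows to CSV text in memory, so no shared file system is needed."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def object_key(url):
    """Derive a deterministic key per URL so a redelivered message overwrites
    the earlier object instead of creating a duplicate."""
    return "results/" + hashlib.sha256(url.encode("utf-8")).hexdigest() + ".csv"


def process_event(event, s3_client, bucket):
    """Handle one SQS batch; each record body is assumed to be a single URL."""
    keys = []
    for record in event.get("Records", []):
        url = record["body"].strip()
        key = object_key(url)
        s3_client.put_object(
            Bucket=bucket,
            Key=key,
            Body=rows_to_csv(crawl(url)).encode("utf-8"),
        )
        keys.append(key)
    return keys


def handler(event, context):
    """Lambda entry point. boto3 is imported here so the helpers above can be
    exercised locally with a stub client."""
    import os

    import boto3  # bundled with the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    return {"written": process_event(event, s3, os.environ["RESULTS_BUCKET"])}
```

Passing the S3 client into `process_event` keeps the batch logic testable without AWS credentials. The deterministic key is one possible design choice: standard SQS queues deliver messages at least once, so a handler that tolerates repeats avoids duplicate output files.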