AWS Certified Machine Learning – Specialty — Question 226
An online retail company wants to develop a natural language processing (NLP) model to improve customer service. A machine learning (ML) specialist is setting up distributed training of a Bidirectional Encoder Representations from Transformers (BERT) model on Amazon SageMaker. SageMaker will use eight compute instances for the distributed training.
The ML specialist wants to ensure the security of the data during the distributed training. The data is stored in an Amazon S3 bucket.
Which combination of steps should the ML specialist take to protect the data during the distributed training? (Choose three.)
Answer options
- A. Run distributed training jobs in a private VPC. Enable inter-container traffic encryption.
- B. Run distributed training jobs across multiple VPCs. Enable VPC peering.
- C. Create an S3 VPC endpoint. Then configure network routes, endpoint policies, and S3 bucket policies.
- D. Grant read-only access to SageMaker resources by using an IAM role.
- E. Create a NAT gateway. Assign an Elastic IP address for the NAT gateway.
- F. Configure an inbound rule to allow traffic from a security group that is associated with the training instances.
Correct answer: A, C, D
Explanation
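Option A is correct because running the training job in a private VPC keeps traffic off the public internet, and inter-container traffic encryption protects the data that the eight instances exchange with one another during distributed training. Option C is needed because an S3 VPC endpoint lets the training instances reach the bucket without leaving the AWS network, while the network routes, endpoint policy, and S3 bucket policy restrict which resources can use that path. Option D applies least privilege: a read-only IAM role lets the job read the training data but prevents unintended modification. Options B, E, and F do not protect the data: spreading the job across multiple peered VPCs (B) adds network complexity without adding protection, a NAT gateway with an Elastic IP address (E) opens a path to the internet that training does not require, and an inbound security group rule (F) governs connectivity between instances but does not by itself encrypt the data or limit access to the bucket.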
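As an illustration only, the three correct options could be expressed roughly as follows. This is a minimal sketch that builds plain Python dictionaries shaped like a SageMaker CreateTrainingJob request, a read-only IAM policy, and an S3 bucket policy; it makes no AWS calls. Every identifier in it (bucket name, account ID, role, image URI, subnet, security group, VPC endpoint ID, instance type) is a hypothetical placeholder and does not come from the question.

```python
import json

# Hypothetical placeholders; none of these values come from the question.
BUCKET = "example-training-data"
VPC_ENDPOINT_ID = "vpce-0123456789abcdef0"
ACCOUNT_ID = "111122223333"


def training_job_request(job_name: str) -> dict:
    """Keyword arguments shaped like a CreateTrainingJob request (option A)."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": f"arn:aws:iam::{ACCOUNT_ID}:role/ExampleTrainingRole",
        "AlgorithmSpecification": {
            "TrainingImage": f"{ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/example-bert:latest",
            "TrainingInputMode": "File",
        },
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{BUCKET}/train/",
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        # The output location needs its own write permissions (not shown here).
        "OutputDataConfig": {"S3OutputPath": "s3://example-training-output/"},
        "ResourceConfig": {
            "InstanceType": "ml.p3.16xlarge",  # assumed instance type
            "InstanceCount": 8,                # eight instances, as in the question
            "VolumeSizeInGB": 100,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
        # Private subnets and a security group place the job inside the VPC.
        "VpcConfig": {
            "Subnets": ["subnet-0aaaabbbbccccdddd"],
            "SecurityGroupIds": ["sg-0aaaabbbbccccdddd"],
        },
        # Encrypts traffic exchanged between the training instances.
        "EnableInterContainerTrafficEncryption": True,
    }


def read_only_data_policy() -> dict:
    """IAM policy for the training role: read the input data, nothing more (option D)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
        }],
    }


def bucket_policy_vpce_only() -> dict:
    """Bucket policy denying requests that do not arrive via the VPC endpoint (option C).

    Caution: a deny like this also blocks access from outside the VPC,
    including administrative access, so it needs care before being applied.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            "Condition": {"StringNotEquals": {"aws:SourceVpce": VPC_ENDPOINT_ID}},
        }],
    }


if __name__ == "__main__":
    print(json.dumps(training_job_request("bert-demo"), indent=2))
    print(json.dumps(read_only_data_policy(), indent=2))
    print(json.dumps(bucket_policy_vpce_only(), indent=2))
```

With boto3 installed and credentials configured, the first dictionary could be passed as `boto3.client("sagemaker").create_training_job(**training_job_request("bert-demo"))`, and the bucket policy applied with `put_bucket_policy` after serializing it with `json.dumps`. Creating the S3 VPC endpoint itself, with its route table entries and endpoint policy, is a separate step that this sketch does not cover.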