Google Cloud Professional Machine Learning Engineer — Question 228

You need to use TensorFlow to train an image classification model. Your dataset is located in a Cloud Storage directory and contains millions of labeled images. Before training the model, you need to prepare the data. You want the data preprocessing and model training workflow to be as efficient, scalable, and low maintenance as possible. What should you do?

Answer options

Correct answer: A

Explanation

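Option A is correct because a Dataflow job that converts the images into sharded TFRecord files suits a dataset of this size. Dataflow is a managed, autoscaling service, so the preprocessing runs in parallel with little operational overhead, and TensorFlow can read the resulting shards in parallel through tf.data instead of opening millions of small image files one at a time. Options B, C, and D add unnecessary steps or organize the data less efficiently, which complicates the workflow and increases maintenance effort.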
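
The approach described for option A can be sketched roughly as follows. This is a minimal illustration, not a reference solution. It assumes the images are stored as gs://<bucket>/<dir>/<label>/<file>, so that the parent directory name is the label; the question does not state the layout. The function names, feature keys ("image", "label"), and parameters are hypothetical.

```python
import posixpath


def label_from_path(path):
    """Return the parent directory name, assumed here to be the class label."""
    return posixpath.basename(posixpath.dirname(path))


def to_serialized_example(path, image_bytes):
    """Pack one image and its label into a serialized tf.train.Example."""
    import tensorflow as tf  # imported inside the function so workers load it

    def _bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    features = {
        "image": _bytes_feature(image_bytes),
        "label": _bytes_feature(label_from_path(path).encode("utf-8")),
    }
    example = tf.train.Example(features=tf.train.Features(feature=features))
    return example.SerializeToString()


def run(input_pattern, output_prefix, num_shards, pipeline_args=None):
    """Read every file matching input_pattern and write sharded TFRecords."""
    import apache_beam as beam
    from apache_beam.io import fileio
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(pipeline_args)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "MatchImages" >> fileio.MatchFiles(input_pattern)
            | "OpenImages" >> fileio.ReadMatches()
            | "ToExample" >> beam.Map(
                lambda f: to_serialized_example(f.metadata.path, f.read())
            )
            | "WriteShards" >> beam.io.WriteToTFRecord(
                output_prefix,
                file_name_suffix=".tfrecord",
                num_shards=num_shards,
            )
        )
```

To run this on Dataflow rather than locally, pipeline_args would typically carry flags such as --runner=DataflowRunner along with the project, region, and temp location. The training job could then list the output shards and read them with tf.data.TFRecordDataset.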