Google Cloud Professional Machine Learning Engineer — Question 48
You are training a TensorFlow model on a structured dataset with 100 billion records stored in several CSV files. You need to improve the input/output execution performance. What should you do?
Answer options
- A. Load the data into BigQuery, and read the data from BigQuery.
- B. Load the data into Cloud Bigtable, and read the data from Bigtable.
- C. Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage.
- D. Convert the CSV files into shards of TFRecords, and store the data in the Hadoop Distributed File System (HDFS).
Correct answer: C
Explanation
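The correct answer is C. TFRecord is TensorFlow's native binary record format. Each record holds a serialized `tf.train.Example`, so the input pipeline reads compact binary records and does not have to re-parse and type-convert CSV text on every epoch. Splitting the data into many shards lets `tf.data` read several files in parallel. Cloud Storage provides the high-throughput object storage that Google generally recommends for training data on Google Cloud.

The other options are weaker for this workload. Option A (BigQuery) is an analytics warehouse, and streaming 100 billion rows out of it into a training loop is typically slower than reading pre-serialized files. Option B (Bigtable) is optimized for low-latency reads and writes of individual rows, not for the large sequential scans that training performs. Option D uses the right file format, but HDFS means running and managing a Hadoop cluster, which adds operational overhead without a clear I/O benefit over Cloud Storage for a TensorFlow job on Google Cloud.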
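Both halves of option C can be sketched in a few lines of Python. This is a minimal sketch, assuming TensorFlow 2.x and a hypothetical two-column CSV schema (`x`, `label`). The shard count, the `-00000-of-00004` file naming, and the helper names (`row_to_example`, `csv_to_tfrecord_shards`, `make_dataset`, `demo`) are illustrative choices, not something the question prescribes. The sketch writes CSV rows round-robin into TFRecord shards, then reads them back with `tf.data`, interleaving shard reads in parallel, parsing whole batches at once, and prefetching.

```python
import csv
import os
import tempfile

import tensorflow as tf

# Hypothetical two-column schema used only for this sketch.
FEATURE_SPEC = {
    "x": tf.io.FixedLenFeature([], tf.float32),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def row_to_example(row):
    """Convert one CSV row (dict of strings) into a serialized tf.train.Example."""
    feature = {
        "x": tf.train.Feature(float_list=tf.train.FloatList(value=[float(row["x"])])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[int(row["label"])])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()


def csv_to_tfrecord_shards(csv_path, output_prefix, num_shards):
    """Distribute CSV rows round-robin across num_shards TFRecord files.

    output_prefix may be a local path or, in standard TensorFlow builds,
    a gs:// URI (for example a hypothetical gs://my-bucket/train/data).
    """
    paths = [f"{output_prefix}-{i:05d}-of-{num_shards:05d}.tfrecord"
             for i in range(num_shards)]
    writers = [tf.io.TFRecordWriter(p) for p in paths]
    try:
        with tf.io.gfile.GFile(csv_path, "r") as f:
            for i, row in enumerate(csv.DictReader(f)):
                writers[i % num_shards].write(row_to_example(row))
    finally:
        for w in writers:
            w.close()
    return paths


def make_dataset(file_pattern, batch_size):
    """Read the shards concurrently and parse whole batches at once."""
    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)
    ds = files.interleave(
        tf.data.TFRecordDataset,              # one record reader per shard file
        num_parallel_calls=tf.data.AUTOTUNE,  # let tf.data pick the read parallelism
    )
    ds = ds.batch(batch_size)
    # Parsing a batch of serialized protos in one call is generally cheaper
    # than parsing records one at a time.
    ds = ds.map(lambda batch: tf.io.parse_example(batch, FEATURE_SPEC),
                num_parallel_calls=tf.data.AUTOTUNE)
    return ds.prefetch(tf.data.AUTOTUNE)      # overlap input I/O with training steps


def demo(num_rows=10, num_shards=4, batch_size=3):
    """Write a tiny CSV locally, shard it, and read every record back."""
    tmp_dir = tempfile.mkdtemp()
    csv_path = os.path.join(tmp_dir, "data.csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["x", "label"])
        for i in range(num_rows):
            writer.writerow([i * 0.5, i % 2])
    paths = csv_to_tfrecord_shards(csv_path, os.path.join(tmp_dir, "data"), num_shards)
    ds = make_dataset(os.path.join(tmp_dir, "data-*.tfrecord"), batch_size)
    records = []
    for batch in ds:
        for x, label in zip(batch["x"].numpy(), batch["label"].numpy()):
            records.append((float(x), int(label)))
    return paths, sorted(records)
```

The demo uses a local temporary directory so it can run anywhere. In a real pipeline, the output prefix and file pattern would point at a Cloud Storage bucket. With 100 billion records, the conversion step would typically need to run as a distributed job rather than in a single Python process.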