Google Cloud Professional Machine Learning Engineer — Question 77

You are profiling the training performance of your TensorFlow model and notice a bottleneck caused by an inefficient input data pipeline. The dataset is a single 5-terabyte CSV file stored on Cloud Storage. You need to optimize the input pipeline. Which action should you try first to make the pipeline more efficient?

Answer options

Correct answer: C

Explanation

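The best first action is to split the CSV into multiple files and read them with a parallel interleave transformation (Option C). A single large file is typically read as one sequential stream, so reads cannot overlap and the training step ends up waiting on I/O. With many smaller files, tf.data can open several of them at once and interleave their records, which parallelizes reads from Cloud Storage and relieves the data-loading bottleneck. The other options may also improve performance, but they do not address the single-file read bottleneck as directly.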
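
As a rough illustration of Option C, the sketch below first shards a CSV into smaller files and then reads the shards with a parallel interleave. Everything here is an assumption made for illustration: the helper names `shard_csv` and `build_dataset`, the shard naming scheme, and the batch size are hypothetical, and the sharding helper is a local toy. For a 5-terabyte file, the split would more realistically be done by a distributed job that writes shards back to Cloud Storage.

```python
import csv
import itertools
from pathlib import Path


def shard_csv(src, out_dir, rows_per_shard):
    """Split one CSV into smaller shards, repeating the header in each shard.

    Hypothetical helper for illustration; reads the source sequentially.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    shard_paths = []
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        for index in itertools.count():
            # Pull the next chunk of rows; an empty chunk means the file is done.
            rows = list(itertools.islice(reader, rows_per_shard))
            if not rows:
                break
            path = out_dir / f"shard-{index:05d}.csv"
            with open(path, "w", newline="") as out:
                writer = csv.writer(out, lineterminator="\n")
                writer.writerow(header)
                writer.writerows(rows)
            shard_paths.append(str(path))
    return shard_paths


def build_dataset(file_pattern, batch_size=1024):
    """Read CSV shards concurrently with a parallel interleave.

    Returns batches of raw CSV lines; parsing them into features
    (for example with tf.io.decode_csv) is left out of this sketch.
    """
    # Imported here so shard_csv can be used without TensorFlow installed.
    import tensorflow as tf

    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)
    lines = files.interleave(
        # One line reader per shard; skip(1) drops that shard's header row.
        lambda path: tf.data.TextLineDataset(path).skip(1),
        cycle_length=tf.data.AUTOTUNE,
        num_parallel_calls=tf.data.AUTOTUNE,
        deterministic=False,  # allow out-of-order records for throughput
    )
    return lines.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

With shards in place, a call such as `build_dataset("gs://my-bucket/shards/shard-*.csv")` (the bucket path is a placeholder) would let tf.data open several files at once instead of streaming one file end to end. Setting `deterministic=False` trades reproducible record order for throughput, which is usually acceptable when the training data is shuffled anyway.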