Google Cloud Professional Machine Learning Engineer — Question 270
You work at an organization that maintains a cloud-based communication platform that integrates conventional chat, voice, and video conferencing into one platform. The audio recordings are stored in Cloud Storage. All recordings have an 8 kHz sample rate and are more than one minute long. You need to implement a new feature in the platform that will automatically transcribe voice call recordings into text for future applications, such as call summarization and sentiment analysis. How should you implement the voice call transcription feature following Google-recommended best practices?
Answer options
- A. Use the original audio sampling rate, and transcribe the audio by using the Speech-to-Text API with synchronous recognition.
- B. Use the original audio sampling rate, and transcribe the audio by using the Speech-to-Text API with asynchronous recognition.
- C. Upsample the audio recordings to 16 kHz, and transcribe the audio by using the Speech-to-Text API with synchronous recognition.
- D. Upsample the audio recordings to 16 kHz, and transcribe the audio by using the Speech-to-Text API with asynchronous recognition.
Correct answer: B
Explanation
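The correct answer is B. Two facts in the question decide it: every recording is longer than one minute, and every recording has a native sample rate of 8 kHz. Synchronous recognition in the Speech-to-Text API is limited to roughly one minute of audio, so options A and C cannot handle these recordings. Asynchronous (long-running) recognition accepts audio of up to 480 minutes and reads it from a Cloud Storage URI, which is where the recordings are already stored. Options C and D are incorrect because Google's best practices advise against resampling: upsampling 8 kHz audio to 16 kHz adds no new information and can reduce recognition accuracy, so telephony audio should be sent at its native 8 kHz rate.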
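The reasoning behind option B can be sketched as a small helper that picks the recognition method from the recording length and builds a request body that keeps the native sample rate. This is a minimal sketch, not a definitive implementation: it only constructs the JSON body for the v1 REST methods `speech:recognize` and `speech:longrunningrecognize` and does not call the API. The function names, the bucket path, the `LINEAR16` encoding, and the `en-US` language code are assumptions for illustration; the question does not state the audio encoding or language.

```python
import json

# Synchronous recognition is intended for audio of about one minute or less.
SYNC_LIMIT_SECONDS = 60


def choose_method(duration_seconds: float) -> str:
    """Return the v1 REST method suited to a recording of this length."""
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return "speech:recognize"
    return "speech:longrunningrecognize"


def build_request(
    gcs_uri: str,
    sample_rate_hertz: int = 8000,   # native telephony rate, no upsampling
    language_code: str = "en-US",    # assumption: not given in the question
    encoding: str = "LINEAR16",      # assumption: not given in the question
) -> dict:
    """Build the JSON body for a recognition request on a Cloud Storage file."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("audio for long recordings must be a gs:// URI")
    return {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": {"uri": gcs_uri},
    }


if __name__ == "__main__":
    # Hypothetical bucket and object name, for illustration only.
    body = build_request("gs://example-bucket/calls/call-0001.wav")
    print(choose_method(duration_seconds=185))
    print(json.dumps(body, indent=2))
```

In a real pipeline the body would be sent to the chosen method with authenticated credentials, and the long-running operation would be polled until the transcript is ready. The point of the sketch is that `sampleRateHertz` stays at 8000 and any recording over one minute is routed to the asynchronous method.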