Google Cloud Professional Machine Learning Engineer — Question 313
You work at an organization that maintains a cloud-based communication platform that integrates conventional chat, voice, and video conferencing into one platform. The audio recordings are stored in Cloud Storage. All recordings have a 16 kHz sample rate and are more than one minute long. You need to implement a new feature in the platform that will automatically transcribe voice call recordings into text for future applications, such as call summarization and sentiment analysis. How should you implement the voice call transcription feature while following Google-recommended practices?
Answer options
- A. Use the original audio sampling rate, and transcribe the audio by using the Speech-to-Text API with synchronous recognition.
- B. Use the original audio sampling rate, and transcribe the audio by using the Speech-to-Text API with asynchronous recognition.
- C. Downsample the audio recordings to 8 kHz, and transcribe the audio by using the Speech-to-Text API with synchronous recognition.
- D. Downsample the audio recordings to 8 kHz, and transcribe the audio by using the Speech-to-Text API with asynchronous recognition.
Correct answer: B
Explanation
The correct answer is B. Synchronous recognition is intended for audio of about one minute or less, so recordings longer than one minute need asynchronous (long-running) recognition, which takes the audio as a Cloud Storage URI. That rules out A and C. Google's Speech-to-Text best practices recommend a sample rate of 16 kHz or higher and advise against resampling, so the recordings should be sent at their original 16 kHz. Downsampling to 8 kHz discards audio information and can lower transcription accuracy, which rules out C and D.
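The reasoning for option B can be sketched as a request builder for the Speech-to-Text v1 REST API. This is a minimal illustration and makes no network call. The helper names `choose_method` and `build_request`, the bucket path, the `LINEAR16` encoding, and the `en-US` language code are assumptions for the example; the question specifies only the 16 kHz sample rate and the Cloud Storage location.

```python
import json

# Synchronous recognition is meant for audio of about one minute or less.
SYNC_LIMIT_SECONDS = 60


def choose_method(duration_seconds: float) -> str:
    """Pick the v1 REST method based on audio length (hypothetical helper)."""
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return "speech:recognize"
    return "speech:longrunningrecognize"


def build_request(gcs_uri: str,
                  sample_rate_hertz: int = 16000,
                  language_code: str = "en-US",   # assumed language
                  encoding: str = "LINEAR16") -> dict:  # assumed encoding
    """Build the JSON body for a recognition request (hypothetical helper).

    The sample rate is passed through unchanged: the audio is not
    downsampled before transcription.
    """
    return {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": {"uri": gcs_uri},
    }


if __name__ == "__main__":
    # Hypothetical recording: 5 minutes long, stored in Cloud Storage.
    method = choose_method(duration_seconds=300)
    body = build_request("gs://example-bucket/calls/call-001.wav")
    print(f"POST https://speech.googleapis.com/v1/{method}")
    print(json.dumps(body, indent=2))
```

The body would be sent with an authenticated POST. The long-running method returns an operation that is polled until the transcript is ready, which suits batch processing of stored call recordings.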