Microsoft Azure AI Fundamentals — Question 192
You have 100 instructional videos that do NOT contain any audio. Each instructional video has a script.
You need to generate a narration audio file for each video based on the script.
Which type of workload should you use?
Answer options
- A. language modeling
- B. speech recognition
- C. speech synthesis
- D. translation
Correct answer: C
Explanation
The correct answer is C, speech synthesis. Speech synthesis (text to speech) generates spoken audio from written text, which is exactly what is needed to turn each script into a narration file. Option A, language modeling, predicts or generates text, such as the next word in a sentence; it does not produce audio. Option B, speech recognition, works in the opposite direction: it converts spoken audio into text, and these videos contain no audio to transcribe. Option D, translation, converts content from one language to another; the task requires no change of language, only a change from text to audio.
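To make the workload concrete, here is a minimal Python sketch of batch narration with speech synthesis. It is an illustration under stated assumptions, not part of the exam material: the helper names (build_ssml, plan_jobs, narrate_all), the .txt/.wav file layout, and the en-US-JennyNeural voice are all hypothetical choices, and narrate_all assumes the Azure Speech SDK for Python (the azure-cognitiveservices-speech package) is installed and that you supply your own key and region.

```python
from pathlib import Path
from xml.sax.saxutils import escape


def build_ssml(script_text: str, voice: str = "en-US-JennyNeural") -> str:
    """Wrap a plain-text script in a minimal SSML document.

    The default voice name is an assumption; substitute any voice
    available to your Speech resource.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f'<voice name="{voice}">{escape(script_text)}</voice>'
        "</speak>"
    )


def plan_jobs(script_paths, out_dir):
    """Pair each script file with the audio file its narration is written to."""
    out_dir = Path(out_dir)
    return [(Path(p), out_dir / (Path(p).stem + ".wav")) for p in script_paths]


def narrate_all(jobs, key: str, region: str) -> None:
    """Synthesize one narration file per (script, audio) pair.

    Requires the Azure Speech SDK; it is imported here so the helpers
    above remain usable without it.
    """
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    for script_path, audio_path in jobs:
        # Send the synthesized audio to a file instead of the default speaker.
        audio_config = speechsdk.audio.AudioOutputConfig(filename=str(audio_path))
        synthesizer = speechsdk.SpeechSynthesizer(
            speech_config=speech_config, audio_config=audio_config
        )
        ssml = build_ssml(script_path.read_text(encoding="utf-8"))
        result = synthesizer.speak_ssml_async(ssml).get()
        if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
            raise RuntimeError(f"Synthesis failed for {script_path}: {result.reason}")
```

With 100 scripts saved as text files, plan_jobs would produce 100 (script, audio) pairs and narrate_all would write one audio file per pair. The input is text and the output is audio, which is what distinguishes speech synthesis from the other three options.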