AWS Certified Machine Learning – Specialty — Question 254

A pharmaceutical company performs periodic audits of clinical trial sites to quickly resolve critical findings. The company stores audit documents in text format. Auditors have requested help from a data science team to quickly analyze the documents. The auditors need to discover the 10 main topics within the documents to prioritize and distribute the review work among the auditing team members. Documents that describe adverse events must receive the highest priority.

A data scientist will use statistical modeling to discover abstract topics and to provide a list of the top words for each category to help the auditors assess the relevance of the topic.
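As a hedged illustration of the "statistical modeling" described above, LDA (one of the two correct options) treats each document as a mixture of K abstract topics, with K = 10 in this scenario. In the standard statement of its generative process, each topic k is a distribution \phi_k over the vocabulary, and each document d has topic proportions \theta_d. The symmetric priors \alpha and \beta are modeling assumptions, not values given in the question. The "top words" for a topic are the vocabulary words with the largest estimated probability under that topic:

```latex
\begin{aligned}
\phi_k &\sim \operatorname{Dirichlet}(\beta), && k = 1,\dots,K \\
\theta_d &\sim \operatorname{Dirichlet}(\alpha), && \text{for each document } d \\
z_{d,n} &\sim \operatorname{Categorical}(\theta_d), && \text{topic of token } n \text{ in document } d \\
w_{d,n} &\sim \operatorname{Categorical}(\phi_{z_{d,n}}), && \text{observed word} \\[4pt]
\operatorname{top}_m(k) &= \text{the } m \text{ words } v \text{ with the largest } \hat{\phi}_{k,v}
\end{aligned}
```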

Which algorithms are best suited to this scenario? (Choose two.)

Answer options

Correct answer: A, C

Explanation

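Latent Dirichlet allocation (LDA) and neural topic modeling (NTM) are correct because both are unsupervised topic models. Each learns a chosen number of abstract topics (here, 10) from unlabeled text and associates every topic with weighted words, so the highest-weighted words per topic can be listed for the auditors. Both are available as built-in algorithms in Amazon SageMaker. The other options are supervised methods. A random forest classifier and a linear support vector machine are classifiers, and linear regression is a regression technique. Each needs labeled training data to predict a known target, so none of them can discover topics in unlabeled, unstructured text.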
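To show concretely how a topic model finds topics and their top words without any labels, here is a minimal, dependency-free sketch of LDA fitted by collapsed Gibbs sampling. It is an illustrative stand-in, not the training procedure of any managed service. The function names `lda_gibbs` and `top_words`, the toy corpus, and the hyperparameter defaults (`alpha`, `beta`, `iterations`, `seed`) are hypothetical choices for this example. For the audit scenario, `num_topics` would be 10.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of documents, each a list of word tokens (no labels needed).
    Returns (vocab, topic_word counts, doc_topic counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    num_words = len(vocab)

    # Count tables: tokens per (doc, topic), per (topic, word), per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * num_words for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Randomly assign an initial topic to every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tv + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + num_words * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Record the resampled topic and restore the counts.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, topic_word, doc_topic


def top_words(vocab, topic_word, n=3):
    """Return the n highest-count (equivalently, highest-probability) words per topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


if __name__ == "__main__":
    # Hypothetical toy audit snippets: no labels are supplied.
    corpus = [
        "adverse event patient reaction hospitalized".split(),
        "serious adverse reaction patient event".split(),
        "consent form signature missing site".split(),
        "site consent signature form records".split(),
    ]
    vocab, topic_word, _ = lda_gibbs(corpus, num_topics=2)
    for k, words in enumerate(top_words(vocab, topic_word)):
        print(f"topic {k}: {words}")
```

The supervised options could not be dropped into this loop: a random forest, linear SVM, or linear regression needs a target column to fit, whereas the sampler above only ever sees the words.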