CompTIA DataX (DY0-001) — Question 56
A data scientist is standardizing a large data set that contains website addresses. A specific string inside some of the web addresses needs to be extracted. Which of the following is the best method for extracting the desired string from the text data?
Answer options
- A. Regular expressions
- B. Named-entity recognition
- C. Large language model
- D. Find and replace
Correct answer: A
Explanation
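Regular expressions are designed for pattern matching: they describe the shape of the target string and can capture just that substring from each address, which makes them the best fit for extracting a specific string from a large set of URLs. Named-entity recognition identifies entities such as people, organizations, and locations in natural-language text; it is not built to pull a fixed pattern out of a web address. A large language model could be prompted to do the extraction, but it is slower, more costly, and non-deterministic, so it is a poor choice for precise string extraction at scale. Find and replace matches text and substitutes something else in its place; it modifies the data rather than returning the matched string, and plain literal matching cannot handle substrings that vary from one address to the next.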
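As an illustration of the regular-expression approach, here is a minimal Python sketch. The question does not say what the target string looks like, so the URL layout (a numeric ID following `/products/`), the sample addresses, and the helper name `extract_product_id` are all assumptions made up for this example.

```python
import re
from typing import Optional

# Assumed pattern: a run of digits directly after "/products/".
# The parentheses form a capture group, so only the digits are returned.
PRODUCT_ID = re.compile(r"/products/(\d+)")

def extract_product_id(url: str) -> Optional[str]:
    """Return the captured ID, or None when the URL has no match."""
    match = PRODUCT_ID.search(url)
    return match.group(1) if match else None

# Hypothetical sample data.
urls = [
    "https://example.com/products/12345?ref=home",
    "https://example.com/about",
    "http://shop.example.com/products/987/reviews",
]

extracted = [extract_product_id(u) for u in urls]
print(extracted)
```

Addresses without the pattern yield `None` instead of raising an error, which is convenient when only some rows in the data set contain the target string.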