AWS Certified Data Engineer – Associate (DEA-C01) — Question 37
A data engineer is using Amazon Athena to analyze sales data that is in Amazon S3. The data engineer writes a query to retrieve sales amounts for 2023 for several products from a table named sales_data. However, the query does not return results for all of the products that are in the sales_data table. The data engineer needs to troubleshoot the query to resolve the issue.
The data engineer's original query is as follows:
SELECT product_name, sum(sales_amount)
FROM sales_data
WHERE year = 2023
GROUP BY product_name
How should the data engineer modify the Athena query to meet these requirements?
Answer options
- A. Replace sum(sales_amount) with count(*) for the aggregation.
- B. Change WHERE year = 2023 to WHERE extract(year FROM sales_data) = 2023.
- C. Add HAVING sum(sales_amount) > 0 after the GROUP BY clause.
- D. Remove the GROUP BY clause.
Correct answer: B
Explanation
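The correct answer is B. Some products are missing from the result, which points to the WHERE clause rather than the aggregation: filtering on year = 2023 can miss rows when the year is not available as a reliable standalone column, whereas extract derives the year directly from each record's date value. Note that extract operates on a date or timestamp expression, so in practice its argument should be the column that holds the sale date. The other options leave the filter unchanged. A changes what is computed for each product (a row count instead of a total), not which rows match. C can only remove groups from the result, so it cannot bring back missing products. D makes the query invalid, because product_name would be selected alongside an aggregate without being grouped.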
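The difference between the two filters can be reproduced locally. The sketch below is only an illustration under stated assumptions: the rows, the sale_date column, and the idea that the year column is sometimes empty or wrong are all hypothetical and not part of the question. It runs on SQLite rather than Athena, and because SQLite has no extract(), strftime('%Y', ...) stands in for Athena's extract(year FROM sale_date).

```python
import sqlite3

# Hypothetical rows: (product_name, sale_date, year, sales_amount).
# sale_date and the unreliable year values are assumptions for illustration.
rows = [
    ("widget", "2023-03-14", 2023, 100.0),
    ("gadget", "2023-07-01", None, 250.0),  # year column never populated
    ("gizmo", "2023-11-20", 2022, 75.0),    # year column populated incorrectly
    ("widget", "2022-12-31", 2022, 40.0),   # genuinely outside 2023
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data ("
    "product_name TEXT, sale_date TEXT, year INTEGER, sales_amount REAL)"
)
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?)", rows)

# Original filter: trusts the stored year column.
by_column = conn.execute(
    "SELECT product_name, sum(sales_amount) FROM sales_data "
    "WHERE year = 2023 "
    "GROUP BY product_name ORDER BY product_name"
).fetchall()

# Date-derived filter: strftime('%Y', ...) is the SQLite stand-in for
# Athena's extract(year FROM sale_date).
by_date = conn.execute(
    "SELECT product_name, sum(sales_amount) FROM sales_data "
    "WHERE CAST(strftime('%Y', sale_date) AS INTEGER) = 2023 "
    "GROUP BY product_name ORDER BY product_name"
).fetchall()

print(by_column)  # only products whose year column happens to be 2023
print(by_date)    # every product with a sale dated in 2023
```

With this sample data, the first query returns only widget, while the second also returns gadget and gizmo. The 2022 widget sale is excluded by both filters.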