APICS Certified Supply Chain Professional (CSCP) — Question 335
A data analyst has developed a query that runs against Delta table. They want help from the data engineering team to implement a series of tests to ensure the data returned by the query is clean. However, the data engineering team uses Python for its tests rather than SQL.
Which of the following operations could the data engineering team use to run the query and operate with the results in PySpark?
Answer options
- A. SELECT * FROM sales
- B. spark.delta.table
- C. spark.sql
- D. There is no way to share data between PySpark and SQL.
- E. spark.table
Correct answer: C
Explanation
The correct answer is C, spark.sql, as it allows the data engineering team to execute SQL queries within a PySpark environment. Options A and E are valid SQL commands but do not provide a method to run SQL queries directly in PySpark. Option B, spark.delta.table, is used for accessing Delta tables but does not execute SQL. Option D is incorrect because PySpark can interoperate with SQL through spark.sql.