Databricks Certified Associate Developer for Apache Spark — Question 185
A data analyst is working with the DataFrame sensor_df, which contains two columns: record_datetime (timestamp) and record (array).
Which code fragment returns a DataFrame that splits the record column into separate columns and has one array item per row?
Answer options
- A. exploded_df = sensor_df.withColumn("record_exploded", explode("record")); exploded_df = exploded_df.select("record_datetime", "sensor_id", "status", "health"); exploded_df = sensor_df.withColumn("record_exploded", col("record"))
- B. exploded_df = sensor_df.withColumn("record_exploded", explode("record")); exploded_df = exploded_df.select("record_datetime", "record_exploded.sensor_id", "record_exploded.status", "record_exploded.health")
- C. exploded_df = exploded_df.select("record_datetime", "record_exploded.sensor_id", "record_exploded.status", "record_exploded.health"); exploded_df = sensor_df.withColumn("record_exploded", explode("record"))
- D. exploded_df = exploded_df.select("record_datetime", "record_exploded")
Correct answer: B
Explanation
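The correct answer is B. Assuming record is an array of structs with the fields sensor_id, status, and health (as the options imply), explode("record") first produces one row per array element in a new column, record_exploded. The select then reaches into that struct with dotted paths such as record_exploded.sensor_id, which turns each field into its own top-level column. Option A selects sensor_id, status, and health as if they were already top-level columns, but after the explode they exist only as fields inside record_exploded. Option D selects record_exploded as a single struct column, so the fields are never split into separate columns. Option C is similar to B but does not apply the two operations in the required order: the array must be exploded before its fields are selected.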