Databricks Certified Associate Developer for Apache Spark — Question 67
Which of the following code blocks returns a 10 percent sample of rows from DataFrame storesDF with replacement?
Answer options
- A. storesDF.sample(true)
- B. storesDF.sample(true, fraction = 0.1)
- C. storesDF.sample(true, fraction = 0.15)
- D. storesDF.sampleBy(fraction = 0.1)
- E. storesDF.sample(false, fraction = 0.1)
Correct answer: B
Explanation
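The correct answer is B. It passes `true` as the `withReplacement` argument and sets `fraction` to 0.1. The result therefore contains approximately 10 percent of the rows; Spark does not guarantee an exact row count. Option A does not specify a fraction. `sample` has no overload that takes only a Boolean, so the call is invalid rather than falling back to some default sample size. Option C uses a fraction of 0.15, which samples about 15 percent of the rows instead of 10 percent. Option D is also invalid. `sampleBy` performs stratified sampling without replacement and requires a column plus a map of per-stratum fractions (in Scala it is reached through `storesDF.stat.sampleBy`), so `sampleBy(fraction = 0.1)` cannot express this request. Option E passes `false`, which means sampling without replacement.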
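The difference between sampling with and without replacement is easier to see in a small simulation. The sketch below is plain Scala, not Spark's source code. It assumes Spark's documented semantics. With replacement, `fraction` is the expected number of times each row is chosen, and the sketch draws each row's count from a Poisson distribution with that mean. Without replacement, each row is kept independently with probability `fraction`. The names `SampleSketch`, `poisson`, `sampleWithReplacement` and `sampleWithoutReplacement` are hypothetical, and a `Seq` of ids stands in for the DataFrame.

```scala
import scala.util.Random

// Hypothetical simulation of Spark-style sampling; a Seq stands in for storesDF.
object SampleSketch {

  // Draw k ~ Poisson(mean) with Knuth's multiplication method
  // (adequate for small means such as 0.1).
  def poisson(mean: Double, rng: Random): Int = {
    val limit = math.exp(-mean)
    var k = 0
    var p = rng.nextDouble()
    while (p > limit) {
      k += 1
      p *= rng.nextDouble()
    }
    k
  }

  // With replacement: each row is emitted k times, k ~ Poisson(fraction),
  // so the same row may appear more than once in the sample.
  def sampleWithReplacement[A](rows: Seq[A], fraction: Double, seed: Long): Seq[A] = {
    val rng = new Random(seed)
    rows.flatMap(row => Seq.fill(poisson(fraction, rng))(row))
  }

  // Without replacement: each row is kept with probability `fraction`
  // (a Bernoulli trial per row), so no row can appear twice.
  def sampleWithoutReplacement[A](rows: Seq[A], fraction: Double, seed: Long): Seq[A] = {
    val rng = new Random(seed)
    rows.filter(_ => rng.nextDouble() < fraction)
  }
}
```

Both functions return about 10 percent of the input for `fraction = 0.1`, but only the with-replacement version can repeat a row. In Spark itself, answer B corresponds to `storesDF.sample(withReplacement = true, fraction = 0.1)`. An overload with a third `seed: Long` argument makes the sample reproducible.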