Google Cloud Professional Data Engineer (PDE): Complete Study Guide
Learn to design data pipelines and ML systems on Google Cloud. This guide covers the PDE exam format, the skill areas, the core BigQuery, Dataflow and Pub/Sub services, and a focused study plan.
Official exam page: https://cloud.google.com/learn/certification/data-engineer
The Google Cloud Professional Data Engineer (PDE) certification validates your ability to design and build data processing systems, operationalize machine learning models, and ensure data quality, security and reliability on Google Cloud.
Exam at a glance
- Format: 50–60 questions (multiple choice and multiple select)
- Duration: 2 hours
- Cost: 200 USD
- Result: pass / fail
- Validity: 2 years
Skill areas
- Designing data processing systems. Choosing storage and processing services, designing for reliability, security and compliance.
- Ingesting and processing the data. Batch and streaming pipelines, Dataflow, Pub/Sub, Dataproc, Data Fusion.
- Storing the data. Selecting between Cloud Storage, BigQuery, Bigtable, Spanner and Firestore.
- Preparing and using data for analysis. BigQuery modeling, performance and cost optimization, visualization.
- Maintaining and automating data workloads. Orchestration (Cloud Composer), monitoring, and CI/CD for pipelines.
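As a study aid for the storage-selection skill area, the rule of thumb below sketches one way to reason about a first-guess service. It is a simplification, not an official decision tree: the `Workload` class, its field names, and the `suggest_storage` function are hypothetical, and real exam scenarios add constraints such as latency targets, scale and cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload description; the field names are illustrative only."""
    data_shape: str                   # "objects", "relational", "wide_column" or "document"
    analytical: bool = False          # large scans and aggregations rather than point lookups
    global_consistency: bool = False  # strongly consistent transactions across regions


def suggest_storage(w: Workload) -> str:
    """Return a first-guess service name based on a few coarse rules of thumb."""
    if w.analytical:
        return "BigQuery"        # analytics warehouse
    if w.data_shape == "objects":
        return "Cloud Storage"   # files and blobs
    if w.data_shape == "wide_column":
        return "Bigtable"        # wide-column, low-latency lookups
    if w.data_shape == "relational" and w.global_consistency:
        return "Spanner"         # global relational database
    if w.data_shape == "document":
        return "Firestore"       # document database
    return "no clear match: revisit the requirements"


print(suggest_storage(Workload("wide_column")))
print(suggest_storage(Workload("relational", global_consistency=True)))
print(suggest_storage(Workload("relational", analytical=True)))
```

The value of writing the rules down is in finding where they break: try describing a practice question as a `Workload` and check whether the suggested service matches the expected answer.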
Core services to master
- BigQuery: partitioning, clustering, slot-based capacity pricing vs on-demand pricing, cost control, nested/repeated fields
- Pub/Sub + Dataflow: streaming pipelines, windowing, exactly-once processing, Apache Beam concepts
- Dataproc: managed Spark/Hadoop, when to choose it over Dataflow
- Storage choices: Bigtable (wide-column, low latency) vs Spanner (global relational) vs BigQuery (analytics)
- ML on GCP: Vertex AI basics, BigQuery ML, pre-trained APIs
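To make the windowing concept from the Pub/Sub + Dataflow item concrete, here is a minimal pure-Python sketch of fixed (tumbling) windows. It only illustrates how event timestamps map to windows; it is not Beam code, and it ignores watermarks, triggers and late data. The function names and sample events are hypothetical. In an Apache Beam pipeline the equivalent assignment would typically be expressed with `beam.WindowInto(beam.window.FixedWindows(60))`.

```python
from collections import defaultdict


def fixed_window_start(event_time_s: float, size_s: int) -> int:
    """Return the start (in seconds) of the fixed window containing the event."""
    return int(event_time_s // size_s) * size_s


def count_per_window(events, size_s=60):
    """Count events per (window_start, key).

    events: iterable of (event_time_seconds, key) pairs, in any order.
    """
    counts = defaultdict(int)
    for ts, key in events:
        counts[(fixed_window_start(ts, size_s), key)] += 1
    return dict(counts)


# Hypothetical events: (event time in seconds, event type)
events = [(5, "click"), (59, "click"), (60, "view"), (61, "click"), (125, "click")]
result = count_per_window(events, size_s=60)
print(result)
```

Because windows are keyed by event time rather than arrival time, the counts are the same whatever order the events arrive in, which is the property that makes windowed aggregation meaningful on an out-of-order stream.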
A study plan
- Weeks 1–2: BigQuery in depth — modeling, performance and cost.
- Week 3: Streaming with Pub/Sub and Dataflow; batch with Dataproc.
- Week 4: Storage selection and ML operationalization.
- Week 5: Orchestration, monitoring, then practice exams.
Exam-day tips
- Many questions are "choose the right storage/processing service" — know the trade-offs cold.
- Understand BigQuery cost and performance levers (partitioning, clustering, slots).
- Know when Dataflow (streaming/Beam) beats Dataproc (Spark) and vice versa.
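The partitioning lever in the tips above can be illustrated with some quick arithmetic. The sketch below assumes a hypothetical date-partitioned table and estimates how many bytes an on-demand query would scan with and without a partition filter. The partition sizes and the rate passed to `on_demand_cost` are made-up illustrative values, so check current BigQuery pricing for real figures. Column pruning and clustering, which reduce scanned bytes further, are ignored here.

```python
from datetime import date, timedelta

TIB = 1024 ** 4  # bytes in one tebibyte


def bytes_scanned(partition_sizes, start, end):
    """Sum the sizes of daily partitions whose date falls in [start, end]."""
    return sum(size for day, size in partition_sizes.items() if start <= day <= end)


def on_demand_cost(scanned_bytes, usd_per_tib):
    """Estimate on-demand query cost; usd_per_tib is supplied by the caller."""
    return scanned_bytes / TIB * usd_per_tib


# Hypothetical table: 30 daily partitions of 0.5 TiB each.
first_day = date(2024, 1, 1)
partitions = {first_day + timedelta(days=i): TIB // 2 for i in range(30)}

# No partition filter: every partition is read.
full_scan = bytes_scanned(partitions, date.min, date.max)
# Filter on the last 7 days: only those partitions are read.
pruned = bytes_scanned(partitions, date(2024, 1, 24), date(2024, 1, 30))

RATE = 5.0  # illustrative USD per TiB, not an official price
print(full_scan / TIB, on_demand_cost(full_scan, RATE))
print(pruned / TIB, on_demand_cost(pruned, RATE))
```

In this made-up case the date filter cuts the scan from 15 TiB to 3.5 TiB, which is the kind of reasoning exam questions on BigQuery cost control tend to expect.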
Practice now
Reinforce each area with the free PDE questions below, prioritizing BigQuery and streaming pipelines.