Tips and tricks from my personal experience, followed by a list of study materials.
Hi! If you are reading this, you are probably interested in passing the Google Cloud Professional Data Engineer Exam. Google recommends that you have 3+ years of experience before attempting the exam. However, I think that if you have some experience with other cloud providers, databases and SQL, you can still do it, mainly because GCP is much more intuitive than its competitors (in my humble opinion).
Unlike other certifications, there isn’t a regimented coursebook or training manual. That is because Google expects you to be a practitioner and know most things from experience. But, realistically speaking, it’s very hard to gain exposure to all of the products and services. So I decided to write this article and share some of the lessons that helped me pass the exam. If there is anything you think I missed or got wrong, please leave a comment and I will try to fix it.
I took the exam in December 2020 and passed it on my first try after about 30 hours of preparation.
Two of the answers you can discard right away
For example, look at the question below. Two of the answers specify BigQuery and two specify Cloud Storage. Within each pair, one answer mentions Dataflow and the other mentions Dataproc.
In questions like this it should be easy to discard the less viable option (Cloud Storage, because the requirement asks for SQL queries), and then focus on choosing between Dataflow and Dataproc.
Even if you don’t know the exact answer, it is possible to bring your chances to 50/50.
Read the questions VERY carefully.
Once you remove the improbable answers, you are often left with two options that seem equally plausible. In the example above, you need to pick between Dataflow and Dataproc. If you paid close attention, you would have noticed that the correct answer is Dataproc, because the question mentions custom Spark jobs.
The correct answer often hinges on a single word or phrase. So read VERY carefully.
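If it helps to see the two-step elimination written down, here is a minimal sketch of it in Python. The option list and the helper names (`eliminate`, `needs_sql`, `needs_spark`) are my own illustration, modelled on the example above; they are not taken from the actual exam question.

```python
# Hypothetical answer options, modelled on the example discussed above:
# two storage choices crossed with two processing choices.
options = [
    {"storage": "BigQuery", "processing": "Dataflow"},
    {"storage": "BigQuery", "processing": "Dataproc"},
    {"storage": "Cloud Storage", "processing": "Dataflow"},
    {"storage": "Cloud Storage", "processing": "Dataproc"},
]


def eliminate(candidates, requirements):
    """Keep only the candidates that satisfy every requirement."""
    return [c for c in candidates if all(req(c) for req in requirements)]


def needs_sql(option):
    # Assumption for this sketch: the question asks for SQL queries,
    # which points to BigQuery rather than Cloud Storage.
    return option["storage"] == "BigQuery"


def needs_spark(option):
    # Assumption for this sketch: the question mentions custom Spark jobs,
    # which points to Dataproc rather than Dataflow.
    return option["processing"] == "Dataproc"


# Step 1: discard the obviously weaker storage option (a 50/50 guess remains).
shortlist = eliminate(options, [needs_sql])

# Step 2: use the single key phrase to decide between the last two.
answer = eliminate(shortlist, [needs_spark])

print(shortlist)
print(answer)
```

The first pass gets you from four options to two even when you are unsure, and the second pass only works if you caught the key phrase in the question.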
Google products over open source
You probably already know this, but correct answers in this exam are almost always the ones that imply deeper integration with GCP. Look at the question below. It is asking you to choose between Pub/Sub and Kafka. It should come as no surprise that the correct option is the former.
Practice questions are the key to passing
In hindsight, the most efficient method (for me) was to go over example questions and then double down on the incorrect answers. If you are not a complete GCP beginner, this will save you lots of time and help you fish out the areas that you need to improve on.
There are a few paid courses out there, and you can find the list at the bottom. However, I found none of them worth the money or time. They felt too basic and targeted at beginners, people who have next to no experience with cloud platforms, databases or ML models. I had to resort to listening at 2x speed or just skimming through the transcripts. Not to mention that some of them were created with the old (pre-April 2019) exam in mind.
I hold a similar sentiment towards Qwiklabs (you would get to experience those if you enrol for any of the Coursera courses, or you can even sign up for them directly). If you are new to GCP and cloud environments in general they can be a great stepping stone. But in my case, I don’t think they taught me anything that was useful for the exam. Most of the labs felt like a fancy copy-paste exercise. And in production projects, we almost never use the Cloud Console; we use Terraform and Cloud Deployment Manager instead (no worries, those are not covered in the exam).
In my opinion, this is the only course you should take. It is led by the guy who actually makes the GCP exams. The course will not teach you what the answers are, but it will give you a more practical idea of what the questions will be like. There is a fair number of example questions, followed by detailed explanations (something that was quite hard to come by).
I enrolled in the 7-day trial and finished the course in 2–3 afternoons, and then cancelled my subscription.
The above is a great cheat-sheet, available for free. I came across it when I did the Linux Academy course (see ‘Paid Resources’ below). Highly recommended.
Another cheat-sheet available on GitHub. A bit outdated, but still quite useful.
The above is a link to a curated YouTube playlist (not by Google). It has 9 videos from the Cloud Next 19 conference. It adds up to 6.5 hours of content of varying complexity. They are by no means targeted at exam takers, but I found them quite useful and informative.
Good luck!