Integrate Kafka with Google Cloud Pub/Sub
Did you know that Kafka was developed at LinkedIn and donated to the Apache Software Foundation? Kafka is a powerful tool for building real-time data pipelines and streaming applications. But what if you want to orchestrate Dataflow jobs or use topics to trigger Cloud Functions? You can keep Pub/Sub as your GCP event notifier and exchange messages between Kafka and Pub/Sub.
So how do you exchange messages between Kafka and Pub/Sub? There's a lab for that! The Streaming IoT Kafka to Google Cloud Pub/Sub lab will walk you through integrating Kafka with Google Cloud.
Here is a glimpse of what you will be doing in this lab:
The setup of this lab is just like the other labs: you will use Google Cloud Shell. After opening the Google Cloud Console, activate your Cloud Shell by following the instructions given.
Read through the intro section (I know, I know); it will help you understand how Kafka integrates with the Google Cloud Platform.
Setting up the environment
The setup instructions can be a little confusing. Here is a tip: when you search for "Kafka" in the Marketplace, you will see multiple similar results, and you have to pick the right one to move forward. You will see something like this:
Confused? Choose the option highlighted in red above.
Once you make your selection, you will see a Launch on Compute Engine button. Click it! Then you will see the details of the Kafka deployment which you are about to launch. Keep all the default settings as they are.
Check the terms and conditions of the GCP Marketplace and then click Deploy.
Deploying the VM instance will take a couple of minutes. While you’re patiently waiting for the Kafka VM to be fully deployed, you will see the status as pending.
Once your VM is successfully deployed, you will see this on the right side of your screen:
Great, you now have a VM running Kafka! Next you will have to stop the Kafka instance you just created, because you need to Allow full access to all Cloud APIs. To do this, click the VM instance to reveal its control panel, then click the Stop button. This may take a minute; you cannot edit the Cloud API access scopes until the Kafka instance has completely stopped.
Configure the Kafka VM instance
This section is all about running and understanding commands. You will configure your Kafka VM over SSH. Make sure you enter your project ID wherever instructed; you can find it back in the lab manual.
Use the given commands to copy the connector file from the Cloud Storage bucket to the Kafka VM instance. Then move the jar file into a subdirectory of the Kafka application.
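The copy-and-move step can be sketched roughly like this; the bucket name, jar filename, and Kafka install path below are all placeholders of mine, not the lab's exact values, so substitute what the lab instructions give you:

```shell
# Placeholder bucket and jar names -- use the values from the lab manual.
gsutil cp gs://<your-lab-bucket>/cps-kafka-connector.jar .

# Move the jar into the Kafka installation so the Connect runtime can load
# it (the /opt/kafka path is an assumption; check where Kafka lives on the VM).
sudo mv cps-kafka-connector.jar /opt/kafka/libs/
```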
Change to the config directory, then use nano to create the cps-sink-connector.properties file. Make sure you add your project ID according to the given instructions.
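For orientation, here is a sketch of what that properties file typically looks like, assuming the open-source Pub/Sub Kafka sink connector; the connector name, Kafka topic, and project ID below are placeholders, so copy the exact content from the lab instructions:

```shell
# Write a sketch of cps-sink-connector.properties (placeholder values).
cat > cps-sink-connector.properties <<'EOF'
name=CPSSinkConnector
connector.class=com.google.pubsub.kafka.sink.CloudPubSubSinkConnector
tasks.max=10
topics=to-pubsub
cps.topic=from-kafka
cps.project=my-project-id
EOF
```

The key line is `cps.project`: if the project ID is wrong, the connector cannot publish to your Pub/Sub topic.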
Congrats, you have successfully configured the Kafka VM Instance to use the connector!
Pub/Sub Topic and Subscription setup
In this section we will create two Pub/Sub topics, to-kafka and from-kafka: to-kafka carries messages into Kafka, and from-kafka carries messages coming out of it. In the next step we will create a subscription for each topic. This step ensures that communication happens between Kafka and GCP; we can say that Pub/Sub acts as the mediator between Kafka and GCP.
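In Cloud Shell, the topic and subscription setup boils down to a few gcloud commands. The topic names come from the lab; the subscription name for to-kafka is my assumption, while the from-kafka subscription name must match the one pulled later in the lab:

```shell
# Create the two Pub/Sub topics used by the lab.
gcloud pubsub topics create to-kafka
gcloud pubsub topics create from-kafka

# Create a subscription on each topic so messages can be pulled.
# (to-kafka-sub is a placeholder name; from-kafka matches the later pull.)
gcloud pubsub subscriptions create to-kafka-sub --topic=to-kafka
gcloud pubsub subscriptions create from-kafka --topic=from-kafka
```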
Start the Kafka VM application instance
This is a crucial step in the lab: starting the Kafka application! Follow the steps as given and you will be good to proceed. Please note that you have to perform these steps in the SSH session.
First you will create a Kafka topic for sending information to Pub/Sub, then another topic for receiving messages from Pub/Sub. Next you have to make certain edits using nano. Here is a screenshot of the changes you will make:
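Creating those two Kafka topics looks roughly like this in the Kafka VM's SSH session. The topic names and install path are placeholders (use the ones the lab specifies), and older Kafka releases take `--zookeeper localhost:2181` instead of `--bootstrap-server`:

```shell
cd /opt/kafka

# Kafka topic whose messages the sink connector forwards to Pub/Sub:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
    --replication-factor 1 --partitions 1 --topic to-pubsub

# Kafka topic that receives messages coming from Pub/Sub:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
    --replication-factor 1 --partitions 1 --topic from-pubsub
```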
After this step, move to the home directory. Create run-connector.sh and add the content given in the lab instructions, using the command nano run-connector.sh.
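As a rough sketch of what run-connector.sh usually contains, assuming Kafka lives in /opt/kafka and the lab uses the standalone Connect runtime (paste the lab's exact content rather than mine):

```shell
# Write a minimal run-connector.sh and make it executable.
cat > run-connector.sh <<'EOF'
#!/bin/bash
# Start Kafka Connect in standalone mode with the Pub/Sub sink config.
cd /opt/kafka
bin/connect-standalone.sh config/connect-standalone.properties \
    config/cps-sink-connector.properties
EOF
chmod +x run-connector.sh
```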
Data exchange between Kafka and Pub/Sub
In this section you will work in a new SSH session on the Kafka instance. You will enter the Kafka console and add some elements there. Once you have added them, you need to check whether they are reflected in Cloud Shell. For this you will run the command gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10. It will take some time to sync with the Kafka console, so you will get results after running this command a couple of times. Once you see the output given in the lab instructions, you are good to proceed further with testing Pub/Sub to Kafka.
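The Kafka-to-Pub/Sub check above can be sketched as follows; the producer topic name is a placeholder of mine, so use the one from the lab:

```shell
# In the Kafka VM's SSH session: type a few test lines into the topic the
# sink connector watches, then press Ctrl+C to exit the producer.
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic to-pubsub

# Back in Cloud Shell: pull what the connector forwarded. Rerun the command
# if the first pull comes back empty.
gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10
```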
Pub/Sub to Kafka Testing
This is a very interesting task in which you verify the data exchange from Pub/Sub to Kafka. You will run the commands in Cloud Shell and see the output in the Kafka VM SSH session. Your output will look like this:
Now you will verify the exact opposite procedure, running the command in the Kafka VM and seeing the output in Cloud Shell. It will take some time for the output to appear, and you may have to run the command gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10 a couple of times to see it. Your output will look like this:
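The Pub/Sub-to-Kafka direction tested in this section can be sketched like this; the Kafka topic name is a placeholder, so use the one from the lab:

```shell
# In Cloud Shell: publish a test message to the topic Kafka listens on.
gcloud pubsub topics publish to-kafka --message="hello from Pub/Sub"

# In the Kafka VM's SSH session: watch the Kafka topic that receives the
# Pub/Sub messages.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic from-pubsub --from-beginning
```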
IoT Simulator and IoT Core
In this section you will see how the IoT simulator works with Kafka. In real life you can also use this approach to connect other devices. First you will get the setup ready to create a device registry: you will use git to clone a repository that gives you access to some lab-specific tools, and you will create a cryptographic key pair that allows IoT devices to connect to Cloud Pub/Sub.
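The key-generation step typically follows the pattern from the Cloud IoT Core docs: the device signs its connection tokens with the private key, and the public certificate is uploaded to the device registry. The repository URL below is a placeholder, so clone the one named in the lab manual:

```shell
# Clone the lab's tooling repository (placeholder URL -- use the lab's):
#   git clone <lab-tools-repo-url>

# Generate an RSA key pair plus self-signed certificate for the simulated
# device (the pattern used by Cloud IoT Core's RS256 device credentials).
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout rsa_private.pem -out rsa_cert.pem -subj "/CN=unused"
```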
Simply follow the steps in the lab manual and you will be able to understand and complete this task. You know you have completed it successfully when you see the list of temperatures displayed in the SSH session of the iot-device-simulator while the SSH window for Kafka receives the temperatures. The final output will look like this:
iot-device-simulator SSH window
Kafka VM SSH
Hope you have enjoyed this lab. If you want more interesting labs, we have good news: you still have a chance to enroll in our 30 Day Challenge! Earn your Data Engineering Badge by 31st August, and along with the badge you will also get a second month free plus an exclusive invitation to play a Data Engineering game, open only to those who complete the challenge. You'll compete for swag and glory.
Use code 1q-thirty-14 and enroll in the challenge today; this offer is valid for just 24 hours!