Managing Streaming Dataflow Job using Kubernetes Config Connector

Kubernetes Config Connector (KCC) is an open source add-on that lets us manage Google Cloud resources through Kubernetes, preserving them as infrastructure as code. Whenever a Config Connector Custom Resource (CR) is applied to the cluster, Config Connector creates and manages the corresponding Google Cloud resource. Config Connector supports a wide range of Google Cloud resources, such as Dataflow, Pub/Sub, BigQuery, and so on.

Suppose we want to create a streaming Dataflow job from Pub/Sub to BigQuery. This requires a set of resources: a Pub/Sub topic, a BigQuery dataset and table, a Cloud Storage bucket to act as the job's temporary path, and the Dataflow job itself.

Here's a sample from Google Cloud showing how to create a streaming job using KCC.

  • Define a Pub/Sub Topic

apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubTopic
metadata:
  name: dataflowjob-dep-streaming
  • Define a BigQuery Dataset and Table

apiVersion: bigquery.cnrm.cloud.google.com/v1beta1
kind: BigQueryDataset
metadata:
  name: dataflowjobdepstreaming
---
apiVersion: bigquery.cnrm.cloud.google.com/v1beta1
kind: BigQueryTable
metadata:
  name: dataflowjobdepstreaming
spec:
  datasetRef:
    name: dataflowjobdepstreaming
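
Depending on the template and the messages being written, the table usually also needs a schema. As a hypothetical illustration (the field names below are assumptions for this sketch, not part of the original sample), a schema can be supplied inline on the BigQueryTable spec:

apiVersion: bigquery.cnrm.cloud.google.com/v1beta1
kind: BigQueryTable
metadata:
  name: dataflowjobdepstreaming
spec:
  datasetRef:
    name: dataflowjobdepstreaming
  # Hypothetical schema: adjust the fields to match what your
  # pipeline actually writes to the table.
  schema: |
    [
      {"name": "message", "type": "STRING", "mode": "NULLABLE"},
      {"name": "publish_time", "type": "TIMESTAMP", "mode": "NULLABLE"}
    ]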
  • Create a Storage Bucket

Since in this example we want to make sure the bucket is destroyed when we delete the resources, we set the force-destroy annotation to true.

apiVersion: storage.cnrm.cloud.google.com/v1beta1
kind: StorageBucket
metadata:
  annotations:
    cnrm.cloud.google.com/force-destroy: "true"
  name: ${PROJECT_ID?}-dataflowjob-dep-streaming
  • Create a Dataflow Job specification

The following sample creates a Dataflow job from a public template provided by Google, using all of the resources we defined above. As this is a sample, we set the on-delete annotation to cancel; for production usage, where we want to make sure all in-flight data is processed first, there is another value called drain (a variant is sketched after the specification below). All available fields can be seen in the DataflowJob YAML specification. Generally speaking, once we have created our own template, we can push it to a Storage bucket and point templateGcsPath at it.

apiVersion: dataflow.cnrm.cloud.google.com/v1beta1
kind: DataflowJob
metadata:
  annotations:
    cnrm.cloud.google.com/on-delete: "cancel"
  labels:
    label-one: "value-one"
  name: dataflowjob-sample-streaming
spec:
  tempGcsLocation: gs://${PROJECT_ID?}-dataflowjob-dep-streaming/tmp
  templateGcsPath: gs://dataflow-templates/2020-02-03-01_RC00/PubSub_to_BigQuery
  parameters:
    inputTopic: projects/${PROJECT_ID?}/topics/dataflowjob-dep-streaming
    outputTableSpec: ${PROJECT_ID?}:dataflowjobdepstreaming.dataflowjobdepstreaming
  zone: us-central1-a
  machineType: "n1-standard-1"
  maxWorkers: 3
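
For a production setup, only the annotation needs to change. A minimal sketch of the drain variant (metadata only; the rest of the spec stays the same as above):

apiVersion: dataflow.cnrm.cloud.google.com/v1beta1
kind: DataflowJob
metadata:
  annotations:
    # "drain" lets in-flight data finish processing before the job
    # stops when the resource is deleted.
    cnrm.cloud.google.com/on-delete: "drain"
  name: dataflowjob-sample-streaming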

Suppose we want to create a new streaming job; we can then define the following CI/CD steps to create it.

CI/CD definition for creating a new Streaming Job.
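
One possible shape for such a pipeline, as a rough sketch: assuming the manifests are committed under a manifests/ directory and the KCC-enabled cluster is named kcc-cluster (both names are hypothetical), a Cloud Build configuration could validate and apply them on every commit:

# cloudbuild.yaml: hypothetical pipeline that applies the KCC manifests.
steps:
  # Validate the manifests without changing anything.
  - name: gcr.io/cloud-builders/kubectl
    args: ["apply", "--dry-run=client", "-f", "manifests/"]
    env:
      - CLOUDSDK_COMPUTE_ZONE=us-central1-a
      - CLOUDSDK_CONTAINER_CLUSTER=kcc-cluster
  # Apply the manifests; Config Connector then reconciles the
  # corresponding Google Cloud resources.
  - name: gcr.io/cloud-builders/kubectl
    args: ["apply", "-f", "manifests/"]
    env:
      - CLOUDSDK_COMPUTE_ZONE=us-central1-a
      - CLOUDSDK_CONTAINER_CLUSTER=kcc-cluster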
