Migration path for Python-based applications on GCP

Data migration projects have two distinct pillars: the data itself and the applications that run on top of it. These applications vary in nature: data processing, web, request-response, micro-services and so on.
In this blog, we focus on the different migration patterns for Python-based jobs and where each should be executed.

In a typical lift-and-shift migration, the focus is on getting things running with as few code changes as possible, in order to reduce the time required for the migration.

This is a valid reason, since migration time affects the overall cost of migration. It is still important to understand that choosing the correct approach can bring cost savings and additional benefits from adopting cloud-native services.

The approach should be based on factors such as long-term cost savings, scalability, availability and ease of maintenance.

Migration Paths for Python applications

Micro-services pattern & request/response model

Generally, these applications perform business-specific functions such as generating recommendations on demand, generating insights or running a data correction feedback loop.

These applications generally run as a service.

The first step for these applications is to check whether they can be containerised.

Google Kubernetes Engine (GKE) or Cloud Run can be leveraged to deploy such applications. For Cloud Run, refer to the container runtime contract to understand the prerequisites for Cloud Run services.
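As an illustration of the container runtime contract, the sketch below shows the part that most often needs a code change during migration: the service should listen on 0.0.0.0 on the port given in the PORT environment variable (8080 by default). The handler name and the JSON response are hypothetical stand-ins for the real service logic, and only the Python standard library is used.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class RecommendationHandler(BaseHTTPRequestHandler):
    """Hypothetical handler standing in for the migrated service logic."""

    def do_GET(self):
        body = b'{"status": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to keep the sketch quiet.
        pass


def resolve_port(environ=os.environ):
    """Read the port from PORT, falling back to 8080 when it is not set."""
    return int(environ.get("PORT", "8080"))


def make_server(port=None):
    """Create the HTTP server bound to all interfaces."""
    chosen_port = resolve_port() if port is None else port
    # Bind to 0.0.0.0 rather than 127.0.0.1 so the platform can reach the container.
    return HTTPServer(("0.0.0.0", chosen_port), RecommendationHandler)


def main():
    """Container entrypoint: serve requests until the instance is stopped."""
    make_server().serve_forever()
```

The container image would call main() as its entrypoint. The same image can generally be deployed on GKE as well, which keeps the choice between the two services open.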

GKE provides
1. More control over infrastructure and deployments
2. No timeout bounds
3. An Autopilot mode, which reduces maintenance overhead
4. Advanced configuration options for scalability and resilience

Cloud Run can be leveraged when
1. Infrastructure maintenance overhead is not desired
2. Quicker, serverless deployments are needed
3. The advanced features of GKE are not needed
4. The application does not receive traffic 24/7
5. The application services are simple

It is possible to use a combination of both services to achieve the desired results.

Operational Utilities

These are lightweight, reusable pieces of application code that perform operational tasks such as housekeeping, sending email notifications, event-based functions, report delivery and FTP transfers.

These applications can be deployed on Cloud Functions.

Cloud Functions (2nd gen) runs on Cloud Run underneath and provides
1. Integration with Eventarc triggers
2. Larger instance support
3. Longer timeout duration
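A minimal sketch of an event-based utility is shown below, assuming the function is triggered when a report file lands in a Cloud Storage bucket. The function name describe_upload and the message format are hypothetical. The payload keys ("bucket", "name", "size") follow the Cloud Storage object resource, where size is delivered as a string. The core logic is kept as a plain function so that it can be tested without any cloud dependency.

```python
def describe_upload(event_data):
    """Build a notification line from a Cloud Storage object event payload."""
    bucket = event_data["bucket"]
    name = event_data["name"]
    # "size" arrives as a string containing the number of bytes.
    size_kib = int(event_data.get("size", "0")) / 1024
    return f"New report gs://{bucket}/{name} ({size_kib:.1f} KiB)"


# Possible wiring with the Functions Framework (needs the
# functions-framework package, so it is left as a comment here):
#
#   import functions_framework
#
#   @functions_framework.cloud_event
#   def on_report_uploaded(cloud_event):
#       print(describe_upload(cloud_event.data))
```

Keeping the trigger wiring thin and the logic in an ordinary function also makes it easier to reuse the same code from Cloud Run or a batch job later.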

PySpark code

Spark Ecosystem

Spark: Data Integration and Transformation
Spark-based applications can be batch applications or micro-batch applications that use Spark Streaming.
The PySpark code can be migrated and executed on Dataproc (managed Hadoop and Spark clusters) or on Dataproc Serverless with minimal code changes.
The code can be orchestrated using Dataproc templates or Cloud Composer DAGs.
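To make the orchestration step more concrete, here is a small sketch that builds the job definition for a PySpark script on an existing Dataproc cluster. The dictionary follows the shape of the Dataproc Job resource. The helper name build_pyspark_job and every project, cluster and bucket value are hypothetical.

```python
def build_pyspark_job(project_id, cluster_name, main_file_uri, args=None):
    """Return a Dataproc Job resource (as a dict) for a PySpark script."""
    return {
        "reference": {"project_id": project_id},
        "placement": {"cluster_name": cluster_name},
        "pyspark_job": {
            # Location of the migrated PySpark script in Cloud Storage.
            "main_python_file_uri": main_file_uri,
            "args": list(args or []),
        },
    }


# Hypothetical values, for illustration only.
job = build_pyspark_job(
    project_id="my-project",
    cluster_name="etl-cluster",
    main_file_uri="gs://my-bucket/jobs/daily_transform.py",
    args=["--run-date", "2024-01-01"],
)

# In a Composer DAG the dict could be handed to the Google provider's
# operator (needs apache-airflow-providers-google, so shown as a comment):
#
#   from airflow.providers.google.cloud.operators.dataproc import (
#       DataprocSubmitJobOperator,
#   )
#
#   DataprocSubmitJobOperator(
#       task_id="daily_transform",
#       job=job,
#       region="europe-west1",
#       project_id="my-project",
#   )
```

The PySpark script itself stays unchanged in this setup. Only the submission and scheduling move to GCP services.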

Spark: SQL Analytics
Spark SQL scripts are leveraged for performing analytics on Hive.
Although Spark SQL code can easily be executed on Dataproc, it is worth considering moving it to BigQuery SQL if the target data warehouse in GCP is BigQuery.
This has the added advantage of keeping the transformation logic in BigQuery and performing ELT instead of ETL.

Spark Machine Learning
Spark ML is used for implementing machine learning models.
Dataproc supports the Spark-based machine learning libraries.

Jupyter Notebooks
There are also use cases where Jupyter notebooks containing Spark exploration code are used and need to be hosted on GCP.
Vertex AI provides managed notebooks that can be leveraged for hosting the on-premises Jupyter notebooks.

Native Python-based functionality

Part of the application code base is native Python code (pandas, numpy, sklearn), typically used for scenarios such as data transformation, feature engineering, data visualisation and Excel report generation.

This code can be executed on an ad hoc or batch basis.

These scenarios generally do not fit the use case for GKE or Cloud Run.

There are 3 options that can be leveraged for such code deployments.

Option 1: If Cloud Composer is part of the architecture and is used for orchestration, the Python operator or the Python virtualenv operator can be leveraged to execute batch Python workloads.

This option comes with a downside
1. Program execution puts load on Composer resources

Use this option only if the resource requirement of the code is low and it does not impact the scheduling requirements of Composer.
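A rough sketch of a callable written for the Python virtualenv operator follows. The function name summarise_amounts and the sample values are hypothetical. The point it illustrates is that the imports sit inside the function body, because the operator runs the function in a separate interpreter where the DAG file's module-level imports are not available.

```python
def summarise_amounts(amounts):
    """Small batch step intended to run in an isolated virtualenv."""
    # Import inside the function: the virtualenv operator executes this
    # function in a separate Python process, not inside the DAG module.
    import statistics

    return {
        "count": len(amounts),
        "mean": statistics.mean(amounts),
        "max": max(amounts),
    }


# Possible wiring in a Composer DAG (needs apache-airflow, so shown as a
# comment; the task id and arguments are made up for illustration):
#
#   from airflow.operators.python import PythonVirtualenvOperator
#
#   PythonVirtualenvOperator(
#       task_id="summarise_amounts",
#       python_callable=summarise_amounts,
#       op_kwargs={"amounts": [10, 20, 30]},
#       requirements=[],  # pin pandas, numpy etc. here if the callable needs them
#       system_site_packages=False,
#   )
```

Because the function executes on the Composer workers, heavy pandas or sklearn workloads written this way compete with scheduling for the same resources, which is the downside noted above.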

Option 2: Google Cloud Batch (recently launched) provides a mechanism for executing such ad hoc or scheduled batch code.

The underlying instance for a Batch job can be created from an instance template with all dependencies installed, or the code can run as a container.

The product was still in preview at the time of writing.

Google Cloud Batch provides
1. A fully managed batch service
2. Auto-scaling capacity
3. No dedicated infrastructure for execution
4. Spot instance usage
5. Container execution and instance template support
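As a sketch of how existing batch code might be adapted to run as parallel tasks, the helper below splits a list of work items across tasks. It assumes the BATCH_TASK_INDEX and BATCH_TASK_COUNT environment variables that the service sets for each task. The function name select_shard and the file names are hypothetical.

```python
import os


def select_shard(items, environ=os.environ):
    """Return the subset of items that this task should process."""
    # Defaults let the same script run unchanged on a single machine.
    index = int(environ.get("BATCH_TASK_INDEX", "0"))
    count = int(environ.get("BATCH_TASK_COUNT", "1"))
    # Round-robin assignment: task i takes items i, i + count, i + 2 * count, ...
    return [item for position, item in enumerate(items) if position % count == index]


# Hypothetical input files for a report generation job.
files = ["jan.csv", "feb.csv", "mar.csv", "apr.csv", "may.csv"]
my_files = select_shard(files, {"BATCH_TASK_INDEX": "1", "BATCH_TASK_COUNT": "2"})
# my_files is ["feb.csv", "apr.csv"] for the second of two tasks
```

The pandas or sklearn logic that processes each file does not need to change. Only the selection of inputs becomes aware of the task index.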

Option 3: Jupyter notebooks on Vertex AI can be used for data exploration and visualisation code. This code is generally developed by data analysts and data scientists performing data exploration.

References
https://cloud.google.com/blog/products/containers-kubernetes/when-to-use-google-kubernetes-engine-vs-cloud-run-for-containers
https://cloud.google.com/blog/products/serverless/cloud-functions-2nd-generation-now-generally-available
https://kubernetes.io/case-studies/
https://cloud.google.com/blog/products/data-analytics/broadcom-adopts-cloud-based-data-lake-for-security-analytics
https://cloud.google.com/blog/products/data-analytics/wayfair-uses-google-cloud-for-data-analytics
https://cloud.google.com/blog/products/compute/new-batch-service-processes-batch-jobs-on-google-cloud

LinkedIn handle: https://www.linkedin.com/in/murli-krishnan-a1319842/


Migration path for python based applications in GCP platform was originally published in Google Cloud - Community on Medium.
