Thursday, August 11, 2022

Synchronize your AWS Glue Studio Visual Jobs to different environments


AWS Glue has become a popular option for integrating data from disparate data sources because it can integrate large volumes of data using distributed data processing frameworks. Many customers use AWS Glue to build data lakes and data warehouses. Data engineers who prefer to develop data processing pipelines visually can use AWS Glue Studio to create data integration jobs. This post introduces the AWS Glue Job Visual API, which lets you author Glue Studio Visual Jobs programmatically, and the Glue Job Sync utility, which uses the API to easily synchronize Glue jobs to different environments without losing their visual representation.

Glue Job Visual API

AWS Glue Studio has a graphical interface called the Visual Editor that makes it easy to author extract, transform, and load (ETL) jobs in AWS Glue. Glue jobs created in the Visual Editor contain a visual representation that describes how the data is transformed. In this post, we call these jobs Glue Studio Visual Jobs.

For example, it’s common to develop and test AWS Glue jobs in a dev account, and then promote the jobs to a prod account. Previously, when you copied AWS Glue Studio Visual jobs to a different environment, there was no mechanism to copy the visual representation along with them. This meant that the visual representation of the job was lost and you could only copy the code generated by Glue Studio. Copying the code or recreating the job by hand can be time-consuming and tedious.

The AWS Glue Job Visual API lets you programmatically create and update Glue Studio Visual Jobs by providing a JSON object that describes the visual representation, and also retrieve the visual representation of existing Glue Studio Visual Jobs. A Glue Studio Visual Job consists of data source nodes for reading the data, transform nodes for modifying the data, and data target nodes for writing the data.
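As a rough illustration of the retrieval side, the sketch below reads the visual representation of an existing job with boto3's `get_job` and groups its nodes into sources, transforms, and targets. It assumes the graph is returned under the job's `CodeGenConfigurationNodes` field as a mapping from node ID to a single-key dict that names the node type (for example `S3CsvSource`). Classifying nodes by the `Source`/`Target` suffix of that type name is our own heuristic, not part of the API, and the helper names are hypothetical.

```python
from typing import Any, Dict, List


def classify_nodes(nodes: Dict[str, Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group visual-job node IDs by role using the node type's name suffix.

    Heuristic (an assumption): type names ending in "Source" read data,
    names ending in "Target" write data, and everything else transforms it.
    """
    groups: Dict[str, List[str]] = {"sources": [], "transforms": [], "targets": []}
    for node_id, node in sorted(nodes.items()):
        # Each node is expected to hold exactly one key: its node type.
        node_type = next(iter(node))
        if node_type.endswith("Source"):
            groups["sources"].append(node_id)
        elif node_type.endswith("Target"):
            groups["targets"].append(node_id)
        else:
            groups["transforms"].append(node_id)
    return groups


def fetch_visual_nodes(glue_client: Any, job_name: str) -> Dict[str, Dict[str, Any]]:
    """Return a job's visual graph, or an empty dict for script-only jobs."""
    job = glue_client.get_job(JobName=job_name)["Job"]
    return job.get("CodeGenConfigurationNodes", {})
```

With a real client, such as `boto3.Session(profile_name="dev", region_name="eu-west-3").client("glue")`, calling `classify_nodes(fetch_visual_nodes(glue, "test1"))` would summarize the shape of the job test1. Passing the client in as a parameter keeps the logic testable without AWS credentials.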

Typical use cases for the Glue Job Visual API include:

  • Automate the creation of Glue Visual Jobs.
  • Migrate your ETL jobs from third-party or on-premises ETL tools to AWS Glue. Many AWS partners, such as Bitwise, Bladebridge, and others, have built converters from third-party ETL tools to AWS Glue.
  • Synchronize AWS Glue Studio Visual jobs from one environment to another without losing the visual representation.
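For the first use case, a job could be created from a node graph by passing it to `create_job` along with the usual job settings. The sketch below only assembles and sanity-checks the request: it rejects graphs whose `Inputs` point at node IDs that do not exist. The helper names, the `glueetl` command settings, and the `GlueVersion` value are illustrative assumptions, and real node payloads need more fields than the abbreviated ones used here.

```python
from typing import Any, Dict, List


def dangling_inputs(nodes: Dict[str, Dict[str, Any]]) -> List[str]:
    """List 'node -> input' edges whose input ID is not a node in the graph."""
    problems = []
    for node_id, node in nodes.items():
        body = next(iter(node.values()))  # the single node-type payload
        for upstream in body.get("Inputs", []):
            if upstream not in nodes:
                problems.append("{} -> {}".format(node_id, upstream))
    return problems


def build_create_job_request(name: str, role_arn: str, script_location: str,
                             nodes: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble keyword arguments for glue_client.create_job(**request)."""
    problems = dangling_inputs(nodes)
    if problems:
        raise ValueError("graph references unknown nodes: {}".format(problems))
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {"Name": "glueetl", "ScriptLocation": script_location,
                    "PythonVersion": "3"},
        "GlueVersion": "3.0",  # illustrative; use the version your jobs run on
        "CodeGenConfigurationNodes": nodes,
    }
```

Validating the edges before calling the service gives a local, readable error for a broken graph instead of a failed API call.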

In this post, we focus on a utility that uses the Glue Job Visual API to synchronize your Glue Studio Visual Jobs in bulk without losing the visual representation.

Glue Job Sync Utility

Common reasons to synchronize Glue Visual Jobs between environments include:

  • Promote Glue Visual Jobs from a dev account to a prod account.
  • Transfer ownership of Glue Visual Jobs between AWS accounts.
  • Replicate Glue Visual Job configurations from one Region to another for disaster recovery.

The Glue Job Sync Utility is a Python utility built on top of the Glue Job Visual API that lets you synchronize your AWS Glue Studio Visual jobs to different accounts and environments without losing their visual representation. The utility requires you to provide source and target AWS environment profiles. Optionally, you can provide a list of the jobs you want to synchronize, and use a mapping file to specify how the utility should replace environment-specific values. For example, the Amazon Simple Storage Service (Amazon S3) locations and IAM roles in your development environment may differ from those in your production environment; the mapping file tells the utility what to replace them with.
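The mapping-file idea can be illustrated with a small recursive substitution over a job definition. This is a minimal sketch assuming plain substring replacement on every string value (dict keys such as field names and node IDs are left alone); the `apply_mapping` name is hypothetical, and the actual utility may apply its mappings differently. All keys are matched in a single pass, longest first, so a value that was already replaced is never rewritten again by another entry.

```python
import re
from typing import Any, Dict, Pattern


def apply_mapping(value: Any, mapping: Dict[str, str]) -> Any:
    """Return value with every mapping key replaced inside its string values."""
    if not mapping:
        return value
    # Longest keys first, so the most specific entry wins at each position.
    pattern = re.compile("|".join(
        re.escape(key) for key in sorted(mapping, key=len, reverse=True)))
    return _substitute(value, pattern, mapping)


def _substitute(value: Any, pattern: Pattern, mapping: Dict[str, str]) -> Any:
    if isinstance(value, str):
        return pattern.sub(lambda match: mapping[match.group(0)], value)
    if isinstance(value, dict):
        return {k: _substitute(v, pattern, mapping) for k, v in value.items()}
    if isinstance(value, list):
        return [_substitute(v, pattern, mapping) for v in value]
    return value  # numbers, booleans, and None pass through unchanged
```

In practice, `mapping` would be loaded from the mapping file with `json.load`, and `value` would be the job definition returned by the source account.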

How to use the Glue Job Sync Utility

In this example, we synchronize two AWS Glue Studio Visual jobs, test1 and test2, from the development environment to the production environment in a different account.

  • Source environment (dev environment)
    • AWS account ID: 123456789012
    • AWS Region: eu-west-3 (Paris)
    • AWS Glue Studio Visual jobs: test1, test2
    • AWS Identity and Access Management (IAM) role ARN for the Glue job execution role: arn:aws:iam::123456789012:role/GlueServiceRole
    • Amazon S3 bucket for the Glue job script and other assets: s3://aws-glue-assets-123456789012-eu-west-3/
    • Amazon S3 bucket for data: s3://dev-environment/
  • Destination environment (prod environment)
    • AWS account ID: 234567890123
    • AWS Region: eu-west-3 (Paris)
    • IAM role ARN for the Glue job execution role: arn:aws:iam::234567890123:role/GlueServiceRole
    • Amazon S3 bucket for the Glue job script and other assets: s3://aws-glue-assets-234567890123-eu-west-3/
    • Amazon S3 bucket for data: s3://prod-environment/
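Because the two environments above differ only in account ID and data bucket, the mapping entries can be derived from their descriptions instead of being typed by hand. The sketch below assumes the asset-bucket naming pattern shown in the list (`aws-glue-assets-<account>-<region>`) and an execution role with the same name in both accounts; `environment_values` and `build_mapping` are hypothetical helpers.

```python
import json
from typing import Dict


def environment_values(account_id: str, region: str, data_bucket: str,
                       role_name: str = "GlueServiceRole") -> Dict[str, str]:
    """Describe the environment-specific values of one account and Region."""
    return {
        # Asset bucket naming follows the pattern in the list above.
        "assets": "s3://aws-glue-assets-{}-{}".format(account_id, region),
        "role": "arn:aws:iam::{}:role/{}".format(account_id, role_name),
        "data": data_bucket,
    }


def build_mapping(src: Dict[str, str], dst: Dict[str, str]) -> Dict[str, str]:
    """Map each source value to its destination counterpart."""
    return {src[key]: dst[key] for key in ("assets", "role", "data")}


dev = environment_values("123456789012", "eu-west-3", "s3://dev-environment")
prod = environment_values("234567890123", "eu-west-3", "s3://prod-environment")
print(json.dumps(build_mapping(dev, prod), indent=4))
```

Redirecting the printed output to a file gives a mapping file with the same entries as the hand-written example used later in this walkthrough.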

Set up the utility in your local environment

You need the following prerequisites for this utility:

  • Python 3.6 or later.
  • The latest version of boto3.
  • Two AWS named profiles, dev and prod, configured with the corresponding credentials in your environment, as described in the AWS CLI documentation on named profiles.
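A quick pre-flight check for these prerequisites might look like the following sketch. It relies on boto3's `Session.available_profiles`, which lists the profiles found in your local AWS config and credentials files; the helper names and the `REQUIRED_PROFILES` constant are hypothetical, and boto3 is imported lazily so that a missing install is reported rather than raised.

```python
import sys
from typing import Iterable, List

REQUIRED_PROFILES = ("dev", "prod")  # profile names used in this walkthrough


def missing_profiles(required: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the required profile names that are not configured locally."""
    have = set(available)
    return [name for name in required if name not in have]


def check_prerequisites() -> List[str]:
    """Collect human-readable problems; an empty list means ready to go."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or later is required")
    try:
        import boto3  # imported lazily so the check reports, not crashes
    except ImportError:
        return problems + ["boto3 is not installed"]
    available = boto3.Session().available_profiles
    for name in missing_profiles(REQUIRED_PROFILES, available):
        problems.append("AWS named profile '{}' is not configured".format(name))
    return problems
```

Running `print(check_prerequisites() or "ready")` before the sync command surfaces a missing profile early instead of as a credentials error mid-run.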

Download the Glue Job Sync Utility

Download the sync utility from the GitHub repository to your local machine.

Create AWS Glue Studio Visual Jobs

  1. Create two AWS Glue Studio Visual jobs, test1 and test2, in the source account.
    • If you don’t have any AWS Glue Studio Visual jobs, follow the AWS Glue Studio documentation to create them.

  2. Open AWS Glue Studio in the destination account and verify that the test1 and test2 jobs aren’t present.
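Step 2 can also be checked from a script. The sketch below assumes Glue's `BatchGetJobs` operation (boto3 `batch_get_jobs`), whose response lists unknown job names under `JobsNotFound`; `jobs_already_present` is a hypothetical helper, and the client is passed in so it can point at the destination profile.

```python
from typing import Any, List, Sequence


def jobs_already_present(glue_client: Any, job_names: Sequence[str]) -> List[str]:
    """Return which of job_names already exist for this client's account and Region."""
    response = glue_client.batch_get_jobs(JobNames=list(job_names))
    not_found = set(response.get("JobsNotFound", []))
    return [name for name in job_names if name not in not_found]
```

For example, with `prod = boto3.Session(profile_name="prod", region_name="eu-west-3").client("glue")`, an empty result from `jobs_already_present(prod, ["test1", "test2"])` confirms the destination is clean before syncing.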

Run the Job Sync Utility

  1. Create a new file named mapping.json and enter the following JSON code. With the first entry, the sync utility replaces every Amazon S3 reference within the job to the source asset bucket (in this case s3://aws-glue-assets-123456789012-eu-west-3) with the mapped location (in this case s3://aws-glue-assets-234567890123-eu-west-3) before creating the job in the destination environment. In the same way, the second and third entries trigger the corresponding substitutions for the IAM role and the data bucket. Note that these are example values; replace them with values that match your environments.
    {
        "s3://aws-glue-assets-123456789012-eu-west-3": "s3://aws-glue-assets-234567890123-eu-west-3",
        "arn:aws:iam::123456789012:role/GlueServiceRole": "arn:aws:iam::234567890123:role/GlueServiceRole",
        "s3://dev-environment": "s3://prod-environment"
    }

  2. Run the utility with the following command:
    $ python3 sync.py --src-profile dev --src-region eu-west-3 --dst-profile prod --dst-region eu-west-3 --src-job-names test1,test2 --config-path mapping.json

  3. Verify that the synchronization succeeded by opening AWS Glue Studio in the destination account.
  4. Open the Glue Studio Visual jobs test1 and test2, and verify the visual representation of the DAG.
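Conceptually, the command in step 2 reads each named job from the source profile, rewrites its environment-specific values, and creates it in the destination. The following is a simplified, hypothetical sketch of that flow, not the actual code of sync.py. It assumes `create_job` rejects the read-only or deprecated fields `CreatedOn`, `LastModifiedOn`, and `AllocatedCapacity`, and does not accept `MaxCapacity` together with `WorkerType`; it also assumes the mapped values contain no characters that JSON would escape, and it does not update jobs that already exist.

```python
import json
from typing import Any, Dict, List, Sequence

# Fields returned by get_job that we assume create_job will not accept.
DROP_FIELDS = ("CreatedOn", "LastModifiedOn", "AllocatedCapacity")


def to_create_request(job: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Turn a get_job 'Job' dict into create_job keyword arguments."""
    request = {k: v for k, v in job.items() if k not in DROP_FIELDS}
    if "WorkerType" in request:
        # Assumed: MaxCapacity conflicts with WorkerType/NumberOfWorkers.
        request.pop("MaxCapacity", None)
    text = json.dumps(request)
    for old, new in mapping.items():
        text = text.replace(old, new)  # entries applied in mapping-file order
    return json.loads(text)


def sync_jobs(src: Any, dst: Any, job_names: Sequence[str],
              mapping: Dict[str, str]) -> List[str]:
    """Copy each job from the src Glue client to dst; return the created names."""
    created = []
    for name in job_names:
        job = src.get_job(JobName=name)["Job"]
        dst.create_job(**to_create_request(job, mapping))
        created.append(name)
    return created
```

Here `src` and `dst` would be Glue clients created from the dev and prod profiles, and `mapping` the parsed contents of mapping.json. Because `CodeGenConfigurationNodes` travels inside the job definition, the visual DAG is copied along with the rest of the configuration.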

The screenshot above shows that the jobs test1 and test2 were copied into the destination account with their DAGs preserved.

Conclusion

The AWS Glue Job Visual API and the AWS Glue Job Sync Utility simplify synchronizing your jobs to different environments. They are designed to integrate easily into your continuous integration pipelines while retaining the visual representation that makes ETL pipelines easier to read.


About the Authors

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is responsible for designing AWS features, implementing software artifacts, and helping customers with their architectures. In his spare time, he enjoys watching anime on Prime Video.

Aaron Meltzer is a Software Engineer on the AWS Glue Studio team. He leads the design and implementation of features that simplify the management of AWS Glue jobs. Outside of work, Aaron likes to read and learn new recipes.

Mohamed Kiswani is the Software Development Manager on the AWS Glue team.

Shiv Narayanan is a Senior Technical Product Manager on the AWS Glue team.
