Data Masking with GitHub Actions Part 2 - Masking Algorithm
Bytebase is a database DevSecOps platform designed for developers, security, DBA, and platform engineering teams. While it offers an intuitive GUI for managing database schema changes and access control, some teams may want to integrate Bytebase into their existing DevOps platforms using the Bytebase API.
In the previous tutorial, you learned how to set up a GitHub Action that utilizes the Bytebase API to define data masking policies. In this tutorial, we will explore how to customize both the masking algorithm and semantic types.
This is Part 2 of our tutorial series on implementing automated database masking using GitHub Actions:
- Part 1: Column masking
- Part 2: Masking Algorithm (this one)
- Part 3: Data Classification and Global Masking
- Part 4: Data export with masking (TBD)
Overview
In this tutorial, you'll learn how to automate database masking algorithms and semantic types using GitHub Actions and the Bytebase API. This integration allows you to:
- Manage data masking rules as code
- Automatically apply masking policies when PRs are merged
Here is a merged pull request as an example.
This tutorial skips the setup. If you haven't set up Bytebase and the GitHub Action yet, follow the Setup Instructions section in the previous tutorial.
Masking Algorithm
You may customize your own data masking algorithm with the help of a predefined masking type, such as Full mask, Range mask, MD5 mask and Inner/Outer mask.
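To make the four predefined masking types more concrete, here is a minimal Python sketch of what each one does to a string. These functions are illustrative only and are not Bytebase's implementation; the function names, parameters, default substitution characters, and the 0-based, end-exclusive range convention are all assumptions.

```python
import hashlib


def full_mask(value: str, substitution: str = "******") -> str:
    """Full mask: replace the whole value with a fixed substitution string."""
    return substitution


def range_mask(value: str, start: int, end: int, substitution: str = "*") -> str:
    """Range mask: hide characters at 0-based positions [start, end) (assumed convention)."""
    start = max(start, 0)
    end = min(end, len(value))
    if start >= end:
        return value
    return value[:start] + substitution * (end - start) + value[end:]


def md5_mask(value: str, salt: str = "") -> str:
    """MD5 mask: replace the value with the hex MD5 digest of salt + value."""
    return hashlib.md5((salt + value).encode("utf-8")).hexdigest()


def inner_mask(value: str, prefix_len: int, suffix_len: int, substitution: str = "*") -> str:
    """Inner mask: keep the first prefix_len and last suffix_len characters, hide the middle."""
    if prefix_len + suffix_len >= len(value):
        return value
    hidden = len(value) - prefix_len - suffix_len
    return value[:prefix_len] + substitution * hidden + value[len(value) - suffix_len:]


def outer_mask(value: str, prefix_len: int, suffix_len: int, substitution: str = "*") -> str:
    """Outer mask: hide the first prefix_len and last suffix_len characters, keep the middle."""
    if prefix_len + suffix_len >= len(value):
        return substitution * len(value)
    middle = value[prefix_len:len(value) - suffix_len]
    return substitution * prefix_len + middle + substitution * suffix_len
```

For example, under these assumptions `inner_mask("13812345678", 3, 4)` keeps the first three and last four digits and hides the rest, while `outer_mask` with the same arguments does the opposite.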
In Bytebase Console
Go to Data Access > Data Masking, click Masking Algorithm and click Add. You can create a new masking algorithm with a name and description, and later it can be used in the definition of semantic types.
In GitHub Workflow
In the GitHub workflow bb-masking-2.yml, find the step Apply masking algorithm, which applies the masking algorithm to the database via the API. All masking algorithms should be defined in a single file, masking/masking-algorithm.json, relative to the repository root. The code that calls the Bytebase API is as follows:
By changing the file masking/masking-algorithm.json, you can apply the masking algorithm to the database. To verify, go to the Bytebase console, click Data Access > Data Masking, and open the Masking Algorithm page. The masking algorithm you applied should be listed there.
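As a rough sketch of what a step like Apply masking algorithm does, the Python below reads a JSON file and builds an authenticated PATCH request for one workspace setting. It is not the code from bb-masking-2.yml. The endpoint path `/settings/<name>`, the `allow_missing` query parameter, the helper names, and any setting name you pass in are assumptions for illustration, so check them against the Bytebase API reference before relying on them.

```python
import json
import urllib.request


def build_setting_request(api_url: str, token: str, setting: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that updates one workspace setting."""
    # ASSUMPTION: the path layout and allow_missing parameter are illustrative only.
    url = f"{api_url.rstrip('/')}/settings/{setting}?allow_missing=true"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


def apply_setting_file(api_url: str, token: str, setting: str, path: str) -> int:
    """Load a JSON file from the repository and send it; returns the HTTP status code."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    request = build_setting_request(api_url, token, setting, payload)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers in a dry run before the workflow touches a real Bytebase instance.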
Semantic Type
You may define semantic types and apply them to columns across different tables. Columns with the same semantic type are masked with the same masking algorithm. For example, you may define a semantic type mobile and apply it to every column that stores a phone number. Then you can assign the masking algorithm range 4-10 as the partial-level masking for the semantic type mobile.
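The relationship between columns, semantic types, and masking algorithms can be sketched in a few lines of Python. This is a conceptual model, not how Bytebase stores or evaluates its policies. The table and column names are hypothetical, and reading range 4-10 as 0-based positions 4 through 9 is an assumption.

```python
from typing import Callable, Dict, Tuple


def mask_range(value: str, start: int, end: int) -> str:
    """Hide characters at 0-based positions [start, end) with '*' (assumed convention)."""
    end = min(end, len(value))
    if start >= end:
        return value
    return value[:start] + "*" * (end - start) + value[end:]


# Hypothetical registry: semantic type ID -> masking algorithm.
SEMANTIC_TYPES: Dict[str, Callable[[str], str]] = {
    "mobile": lambda v: mask_range(v, 4, 10),
}

# Hypothetical assignments: (table, column) -> semantic type ID.
COLUMN_SEMANTIC_TYPE: Dict[Tuple[str, str], str] = {
    ("employee", "phone"): "mobile",
    ("customer", "contact_number"): "mobile",
}


def mask_cell(table: str, column: str, value: str) -> str:
    """Mask a value according to the column's semantic type; unclassified columns pass through."""
    semantic_type = COLUMN_SEMANTIC_TYPE.get((table, column))
    if semantic_type is None:
        return value
    return SEMANTIC_TYPES[semantic_type](value)
```

Because both phone columns point at the same semantic type, changing the algorithm behind mobile in one place changes how every such column is masked.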
In Bytebase Console
Go to Data Access > Data Masking, click Semantic Types and click Add. You can create a new semantic type with a name and description, and select the masking algorithm.
In GitHub Workflow
Find the step Apply semantic type, which applies the semantic type to the database via the API. All semantic types should be defined in a single file, masking/semantic-type.json, relative to the repository root. The code that calls the Bytebase API is as follows:
By changing the file masking/semantic-type.json, you can apply the semantic type to the database. To verify, go to the Bytebase console, click Data Access > Data Masking, and open the Semantic Types page. The semantic type you applied should be listed there.
Next Steps
You have now applied a data masking algorithm and a semantic type using GitHub Actions and the Bytebase API. In the next part of this tutorial, you'll learn how to use data classification and global masking with GitHub Actions. Stay tuned!