Applying Data Masking with GitHub Actions - Part 1

Estimated: 30 mins

Bytebase is a database DevSecOps platform designed for developers, security, DBA, and platform engineering teams. While it offers an intuitive GUI for managing database schema changes and access control, some teams may want to integrate Bytebase into their existing DevOps platforms using the Bytebase API.

Bytebase provides dynamic data masking in the Enterprise Plan, which masks sensitive data in SQL Editor query results on the fly based on context. It helps organizations protect sensitive data from exposure to unauthorized users.

By using GitHub Actions with the Bytebase API, you can implement policy-as-code and apply database masking policies through a GitOps workflow. This tutorial will guide you through the process.


This is Part 1 of our tutorial series on implementing automated database masking using GitHub Actions:

  • Part 1: Column masking and masking exception with GitHub Actions (this one)
  • Part 2: Masking Algorithm with GitHub Actions
  • Part 3: Data Classification and Global Masking with GitHub Actions

Overview

In this tutorial, you'll learn how to automate database masking policies using GitHub Actions and the Bytebase API. This integration allows you to:

  • Manage data masking rules as code
  • Automatically apply masking policies when PRs are merged

Here is a merged pull request as an example.

The complete code for this tutorial is available at: database-security-github-actions-example

Prerequisites

Before you begin, make sure you have:

  • Docker installed
  • A GitHub account
  • An ngrok account
  • Bytebase Enterprise Plan subscription (you can request a free trial)

Setup Instructions

Step 1 - Start Bytebase in Docker and set the External URL generated by ngrok

ngrok is a reverse proxy tunnel. We need it here to give Bytebase a public network address so that it can receive webhooks from the VCS. ngrok is used here for demonstration purposes only; for production use, we recommend Caddy.

ngrok-reverse-proxy

  1. Run Bytebase in Docker with the following command:

    docker run --rm --init \
      --name bytebase \
      --publish 8080:8080 --pull always \
      --volume ~/.bytebase/data:/var/opt/bytebase \
      bytebase/bytebase:3.1.0

  2. Once Bytebase is running in Docker, visit it at localhost:8080. Register an admin account; it is automatically granted the workspace admin role.

  3. Log in to the ngrok Dashboard and complete the Getting Started steps to install and configure ngrok. If you want to use the same domain each time you launch ngrok, go to Cloud Edge > Domains, where you'll find the domain <<YOURS>>.ngrok-free.app linked to your account.

  4. Run the ngrok command ngrok http --domain=<<YOURS>>.ngrok-free.app 8080 to start ngrok with your specific domain, and you will see the output displayed below:

    terminal-ngrok

  5. Log in to Bytebase and click the gear icon (Settings) in the top right. Click General under Workspace. Paste <<YOURS>>.ngrok-free.app as the External URL under the Network section and click Update.

    external-url

  6. Now you can access Bytebase via <<YOURS>>.ngrok-free.app.

Step 2 - Create Service Account

  1. Log in as the admin user and go to Security & Policy > Users & Groups. Click + Add User, enter api-example as the name, choose the DBA role, which is sufficient for this tutorial, and click Confirm.

    service-account-create

  2. Find the newly created service account and click Copy Service Key. We will use this key to authenticate the API calls.

    service-account-key

Step 3 - Prepare Test Data

  1. By default, Bytebase provides a project named Sample Project with two databases, hr_test and hr_prod.
  2. Click IAM & Admin > Users & Groups on the left sidebar. Add users: dev@example.com, dev2@example.com and dev3@example.com with no roles.
  3. Add a group contractor@example.com with dev3@example.com as a member.
  4. Go to project Sample Project, click Manage > Members on the left sidebar.
  5. Click Grant Access, then grant the Developer role to users dev@example.com and dev2@example.com, and the Querier role to group contractor@example.com.

Step 4 - Configure GitHub Actions

  1. Fork Database Security GitHub Actions Example.

  2. Click Settings and then click Secrets and variables > Actions. Add the following secrets:

    • BYTEBASE_URL: the ngrok external URL
    • BYTEBASE_SERVICE_KEY: the service account email, api-example@service.bytebase.com
    • BYTEBASE_SERVICE_SECRET: the service key copied in Step 2
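
To make the role of these three secrets concrete, here is a minimal sketch of a login request made by hand rather than through the workflow's login step. It assumes the login endpoint is /v1/auth/login under BYTEBASE_URL and that it accepts email and password fields; check the Bytebase API reference for your version before relying on it. The helper names login_payload and bytebase_login are hypothetical and are not part of the example repository.

```shell
# Hypothetical helper: build the JSON body for a login request.
# $1 = service account email, $2 = service key.
# Note: values are inserted verbatim, so they must not contain quotes or backslashes.
login_payload() {
  printf '{"email":"%s","password":"%s"}' "$1" "$2"
}

# Hypothetical helper: send the login request.
# Assumes BYTEBASE_URL, BYTEBASE_SERVICE_KEY and BYTEBASE_SERVICE_SECRET are set
# in the environment, and that /v1/auth/login is the login endpoint (an assumption).
bytebase_login() {
  curl -s --request POST "${BYTEBASE_URL}/v1/auth/login" \
    --header "Content-Type: application/json" \
    --data "$(login_payload "$BYTEBASE_SERVICE_KEY" "$BYTEBASE_SERVICE_SECRET")"
}
```

Calling bytebase_login with the three values exported should return a JSON response from which an access token can be extracted for the Authorization: Bearer header used later in this tutorial.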

Step 5 - Understanding the GitHub Workflow

Let's dig into the GitHub Actions workflow code:

  1. Trigger: Workflow runs when PRs are merged to main.

  2. Authentication: The Login Bytebase step logs in to Bytebase using the official bytebase-login action. The secrets you configured under Secrets and variables are mapped to the action's inputs.

  3. File Detection: The Get changed files step detects which files changed in the pull request. This workflow only handles column masking and masking exceptions, so it selects only files matching masking/databases/**/**/column-masking.json and masking/projects/**/masking-exception.json.

  4. PR Feedback: The Comment on PR step comments on the merged pull request to report the result.
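
The file detection step can be pictured as a small classifier over changed paths. The sketch below is not the workflow's actual implementation; it only illustrates the two path patterns named above using a shell case statement. The function name classify_changed_file and the sample instance and project names are hypothetical.

```shell
# Hypothetical helper: decide which kind of masking file a changed path is.
# In a case pattern, * also matches "/", so these patterns are slightly
# looser than the ** globs used in the workflow.
classify_changed_file() {
  case "$1" in
    masking/databases/*/*/column-masking.json) echo "column-masking" ;;
    masking/projects/*/masking-exception.json) echo "masking-exception" ;;
    *) echo "ignored" ;;
  esac
}

# Example: classify a few changed paths (sample names are made up).
for f in \
  "masking/databases/sample-instance/hr_test/column-masking.json" \
  "masking/projects/sample-project/masking-exception.json" \
  "README.md"; do
  printf '%s -> %s\n' "$f" "$(classify_changed_file "$f")"
done
```

Only paths classified as column-masking or masking-exception would be passed on to the API calls described in the following sections.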

Column Masking

Column Masking lets you assign a Masking Level to individual table columns to mask their data.

In the Bytebase console, go to a database page and pick a table. On the table detail page, click the pen icon to set a column's masking level.

In the GitHub workflow, find the Apply column masking step, which applies column masking to the database via the API. It first parses all the column masking files, then loops over them and applies each one to the database. The code that calls the Bytebase API is as follows:

response=$(curl -s -w "\n%{http_code}" --request PATCH "${BYTEBASE_API_URL}/instances/${INSTANCE_NAME}/databases/${DATABASE_NAME}/policies/masking?allow_missing=true&update_mask=payload" \
   --header "Authorization: Bearer ${BYTEBASE_TOKEN}" \
   --header "Content-Type: application/json" \
   --data @"$CHANGED_FILE")

When you change a masking/databases/**/**/column-masking.json file, create a PR, and merge it, the change is applied to the database.
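
The INSTANCE_NAME and DATABASE_NAME values in the request URL have to come from somewhere. One plausible approach, shown here as a sketch, is to read them from the file path. This assumes the two wildcard segments in masking/databases/**/**/column-masking.json are the instance and the database, in that order; the helper names and the sample values are hypothetical.

```shell
# Hypothetical helpers: assume the layout
#   masking/databases/<instance>/<database>/column-masking.json
instance_from_path() { printf '%s\n' "$1" | cut -d/ -f3; }
database_from_path() { printf '%s\n' "$1" | cut -d/ -f4; }

# Build the policy URL used by the PATCH request.
# $1 = API base URL (BYTEBASE_API_URL in the workflow), $2 = changed file path.
masking_policy_url() {
  printf '%s/instances/%s/databases/%s/policies/masking?allow_missing=true&update_mask=payload\n' \
    "$1" "$(instance_from_path "$2")" "$(database_from_path "$2")"
}

# Example with made-up values:
masking_policy_url "https://example.test/v1" \
  "masking/databases/sample-instance/hr_test/column-masking.json"
```

Keeping the instance and database in the path means a reviewer can tell from the PR diff alone which database a masking change targets.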

Log in to the Bytebase console. At the workspace level, click Data Access > Data Masking, then click Explicit Masked Columns to see the column masking applied to the database.

bb-column-masking

Access Unmasked Data

Access Unmasked Data lets you relax the masking level for specific users, for example from full masking to partial, or from partial masking to none.

In the GitHub workflow, find the Apply masking exception step, which applies the masking exception to the project. The process is similar to column masking; the code that calls the Bytebase API is as follows:

response=$(curl -s -w "\n%{http_code}" --request PATCH "${BYTEBASE_API_URL}/projects/${PROJECT_NAME}/policies/masking_exception?allow_missing=true&update_mask=payload" \
   --header "Authorization: Bearer ${BYTEBASE_TOKEN}" \
   --header "Content-Type: application/json" \
   --data @"$CHANGED_FILE")
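
Both curl calls use -s -w "\n%{http_code}", so the captured response holds the response body followed by the HTTP status code on its own final line. A minimal sketch of splitting the two and checking for success follows; the helper names http_status, http_body, and check_response are hypothetical, and the workflow in the example repository may handle this differently.

```shell
# Hypothetical helpers for a response captured with: curl -s -w "\n%{http_code}"
http_status() { printf '%s\n' "$1" | tail -n 1; }   # last line = status code
http_body()   { printf '%s\n' "$1" | sed '$d'; }    # everything before it

# Report success for any 2xx status, failure otherwise.
check_response() {
  status=$(http_status "$1")
  if [ "$status" -ge 200 ] && [ "$status" -lt 300 ]; then
    echo "ok"
  else
    echo "failed ($status)"
    return 1
  fi
}

# Example with a made-up response instead of a live API call:
sample=$(printf '{"name":"example"}\n200')
check_response "$sample"
http_body "$sample"
```

In a workflow step, a non-zero return from check_response could be used to fail the job and to include the response body in the PR comment.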

When you change a masking/projects/**/masking-exception.json file, create a PR, and merge it, the change is applied to the project.

Log in to the Bytebase console, go to the project Sample Project, and click Database > Masking Access to see the masking exception applied.

bb-masking-exception

Next Steps

You have now applied data masking policies using GitHub Actions and the Bytebase API. In the next part of this tutorial, you'll learn how to customize the masking algorithm. Stay tuned!
