GCS-Triggered Google Cloud Functions

Google Cloud Functions is an event-driven, serverless computing platform. It provides an easy, auto-scaling way to stand up functions in the cloud that run in response to events, with the triggering payload passed to them. One of its use cases is the real-time processing of files. Since Cloud Functions is a Google product, it provides an especially easy way to respond to change notifications emerging from Google Cloud Storage (GCS). These notifications can be configured to trigger in response to various events inside a bucket: object finalization, deletion, archiving, and metadata updates (learn more about those triggers: https://cloud.google.com/functions/docs/calling/storage). This guide shows an example of how to configure a solution that will update a BigQuery (BQ) table every time a new data file is loaded into a given GCS bucket. Along the way there will be notes about realistic configuration options, such as how to organize/name the necessary files, how to load dependencies, how to customize deployments, how to pass environment variables, and how to update deployments. Limitations of the solution will also be listed.

Architecture

Cloud Storage events used by Cloud Functions are based on Cloud Pub/Sub Notifications for Google Cloud Storage (https://cloud.google.com/storage/docs/pubsub-notifications). When a new item is uploaded to GCS, a notification is sent through Pub/Sub to Cloud Functions, which will run our function, updating the chosen BQ table.
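
The payload Cloud Functions hands to our function is the Cloud Storage object resource for the file that changed. Here is a minimal sketch of the fields this guide relies on, shown as the Python dict the function receives; the values are placeholders and the real payload carries additional object metadata:

event = {
    "bucket": "your-bucket-name",      # bucket that received the new object
    "name": "path/to/new_file.csv",    # the object's path within the bucket
    "contentType": "text/csv",
    "size": "1024",
    "timeCreated": "2020-01-01T00:00:00.000Z",
    # ...plus additional object metadata...
}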

Limitations:

  • Cloud Functions can only be triggered by Cloud Storage buckets in the same Google Cloud Platform project.
  • While Git repos on GitHub and Bitbucket can be mirrored in Google Cloud Source Repositories and used as sources of the code for deployments, Bitbucket self-hosted servers are still not supported (https://stackoverflow.com/questions/54667858/is-it-possible-to-mirror-a-bitbucket-repository-hosted-in-a-private-domain-in-go). The only alternative to manual local deployment that works for folks with a self-hosted server is to have a process within the self-hosted solution that zips the latest version of the code and pushes it to GCS.
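
As an illustration of that workaround, a job on the self-hosted server could zip the source, copy it to a GCS bucket, and point the deployment at the archive. The bucket, file, and function names below are placeholders, and the remaining deploy flags are covered in the steps further down:

zip -r /tmp/function_source.zip main.py requirements.txt optional_custom_dependency
gsutil cp /tmp/function_source.zip gs://your-deploy-bucket/function_source.zip
gcloud functions deploy your_function_name --source gs://your-deploy-bucket/function_source.zip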

Steps

  1. Install and initialize the Cloud SDK (https://cloud.google.com/sdk/docs/).
    • Important Note: Several of the available deploy flags we’ll use later require an updated version, so make sure to upgrade before starting.
  2. Create a GCS bucket if you don’t already have one you’d like to use.
  3. Create a repo with the following files and file structure:

your_repo_name/
├── main.py
├── requirements.txt
└── optional_custom_dependency/  # Could be a local or private remote dependency
    ├── __init__.py  # don’t forget to initialize the folder
    └── actual_module.py

  • You must have a main.py file.
  • Dependencies:
    • If you have Python requirements, they must be in a requirements.txt file in the same directory as main.py (see the example after these bullets).
    • Dependencies are installed in a Cloud Build environment that does not provide access to SSH keys. Packages hosted in repositories that require SSH-based authentication (e.g. our Bitbucket) must be vendored and uploaded alongside your project’s code. In other words, you’d need to copy the modules/code from the Bitbucket repo and include it in the directory uploaded for your cloud function. See https://cloud.google.com/functions/docs/writing/specifying-dependencies-python
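
For the BQ-loading example in this guide, requirements.txt could be as simple as the BigQuery client library (pinning a version is optional; the exact contents depend on what your function imports):

google-cloud-bigquery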

  4. Write the function we want to run in the cloud.
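
Below is a sketch of what main.py could look like for the BQ-loading use case. It assumes the new files are CSVs with a header row, that the destination table is passed in via a BQ_TABLE environment variable (set at deploy time in the next step), and that google-cloud-bigquery is listed in requirements.txt; adjust the load-job configuration to match your data.

# main.py
import os

from google.cloud import bigquery


def load_file_to_bq(event, context):
    """Background Cloud Function triggered when an object is finalized in GCS.

    Args:
        event (dict): The Cloud Storage event payload (bucket, name, etc.).
        context (google.cloud.functions.Context): Metadata about the event.
    """
    client = bigquery.Client()
    table_id = os.environ["BQ_TABLE"]  # e.g. "your_project.your_dataset.your_table"
    uri = f"gs://{event['bucket']}/{event['name']}"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # assumes a header row
        autodetect=True,       # infer the schema from the file
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # wait for the load to finish (raises on failure)
    print(f"Loaded {uri} into {table_id}")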

  5. Deploy the files to Cloud Functions.
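
A deploy command along these lines could live in deploy.sh; the function, bucket, table, runtime, and region values are placeholders, and it should be run from the repo root so --source picks up main.py and requirements.txt:

gcloud functions deploy load_file_to_bq \
    --runtime python37 \
    --entry-point load_file_to_bq \
    --source . \
    --trigger-resource your-bucket-name \
    --trigger-event google.storage.object.finalize \
    --set-env-vars BQ_TABLE=your_project.your_dataset.your_table \
    --region us-central1

The first positional argument is the Cloud Function's name; --entry-point must match the name of the Python function in main.py (the two don't have to be the same string).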

Note: To update the deployed code or other settings, simply rerun the gcloud functions deploy command with only the changed flags.
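
For example, to point the already-deployed function at a different table without touching anything else, something like this should be enough (same hypothetical names as above):

gcloud functions deploy load_file_to_bq --update-env-vars BQ_TABLE=your_project.your_dataset.another_table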

  6. Drop files into the chosen GCS bucket and watch the magic happen.
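
One way to watch it happen: copy a file into the bucket with gsutil, then read the function's recent logs (same hypothetical names as above):

gsutil cp local_data.csv gs://your-bucket-name/
gcloud functions logs read load_file_to_bq --limit 20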

A full example repo, with the deployment command in deploy.sh, can be found at https://github.com/ZaxR/gcs_triggered_google_cloud_functions.
