The goal of this post is to present a common case study for building a research environment in Google Cloud Platform (GCP).
Building an environment in the cloud raises several questions we need to take into consideration: How do I access resources in the cloud? Where and how do I store data? How do I protect the infrastructure?
Let’s consider the following architecture:
- Researchers will connect to the cloud environment remotely over the internet and connect to a Linux machine with data analytics tools
- Original data sets will be stored using file storage
- Output data will be stored and processed in a MySQL database
- Due to data sensitivity, data must be protected at all times
In the following sections we will break down the research team's requirements into best practices using built-in GCP services:
Infrastructure
- For the base OS image, we will use the most up-to-date Deep Learning VM Image, which uses Debian Linux and includes the latest security patches
- After deploying the VM, we will install the latest build of our analytics tools and development interpreters (such as Python)
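As a rough sketch, the VM described above could be provisioned with the gcloud CLI. The project defaults, zone, machine type, and instance and subnet names below are illustrative assumptions, not values taken from this post:

```shell
# Create a VM from the latest Deep Learning VM image family
# (Debian-based, maintained in the deeplearning-platform-release project).
# --no-address omits a public IP, since access is via VPN only.
gcloud compute instances create research-vm \
  --zone=us-central1-a \
  --machine-type=n1-standard-8 \
  --subnet=dmz-subnet \
  --image-family=common-cpu \
  --image-project=deeplearning-platform-release \
  --no-address
```

After the instance is up, additional analytics tools and interpreters can be installed over SSH with the distribution's package manager (e.g. `apt-get` on Debian).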
Network connectivity
- Remote access to the cloud environment will be secured by deploying OpenVPN from the Google Cloud Marketplace and connecting with OpenVPN clients
- All resources will be located in a single Google Cloud VPC, but the Linux VM and the MySQL database will be located in separate subnets
- The Linux VM will be located in a DMZ subnet, and access to this subnet will be protected using GCP firewall rules, allowing TCP port 22 for VPN-authenticated clients only
- The database will be located in a DB subnet, and access to this subnet will be protected using GCP firewall rules, allowing access to the Cloud SQL port (TCP 3306) from the DMZ subnet only
- Further explanation about GCP firewall rules can be found here: https://cloud.google.com/vpc/docs/using-firewalls#creating_firewall_rules
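The two firewall rules above might look roughly like the following; the VPC name, network tags, and the VPN client address range are assumptions for illustration:

```shell
# Allow SSH (TCP 22) to DMZ instances, only from the VPN client range.
gcloud compute firewall-rules create allow-ssh-from-vpn \
  --network=research-vpc \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:22 \
  --source-ranges=10.8.0.0/24 \
  --target-tags=dmz

# Allow MySQL (TCP 3306) to DB-subnet instances, only from DMZ instances.
gcloud compute firewall-rules create allow-mysql-from-dmz \
  --network=research-vpc \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:3306 \
  --source-tags=dmz \
  --target-tags=db
```

Because GCP firewall rules are deny-by-default for ingress, no additional "deny" rules are needed for traffic outside these two paths.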
Database
- The MySQL database will be deployed as a managed service using Google Cloud SQL for MySQL
- The traffic between the Linux machine and the Cloud SQL database will be encrypted using TLS, as explained here: https://cloud.google.com/sql/docs/mysql/configure-ssl-instance
- Data inside the Cloud SQL database will be encrypted at rest, as explained here: https://cloud.google.com/sql/faq#encryption-manage-rest and https://cloud.google.com/security/encryption-at-rest/default-encryption/
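As a sketch of the database setup, the managed instance can be created and then patched to reject non-TLS connections; the instance name, tier, and region below are assumptions:

```shell
# Create a managed MySQL instance (encrypted at rest by default).
gcloud sql instances create research-db \
  --database-version=MYSQL_5_7 \
  --tier=db-n1-standard-2 \
  --region=us-central1

# Require TLS for all client connections.
gcloud sql instances patch research-db --require-ssl

# Issue a client certificate for the Linux VM to connect with.
gcloud sql ssl client-certs create research-vm-cert client-key.pem \
  --instance=research-db
```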
Storage
- Data will be stored in Google Cloud Storage, as explained here: https://cloud.google.com/storage/docs/how-to
- Access to the Google Cloud Storage will be restricted by roles from Google IAM, as explained here: https://cloud.google.com/storage/docs/access-control/iam-reference
- Data inside the Google Cloud Storage will be encrypted at rest, as explained here: https://cloud.google.com/storage/docs/encryption/customer-managed-keys
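The storage bucket, its IAM restrictions, and the customer-managed encryption key can be wired together along these lines; the bucket name, group address, and KMS key path are illustrative assumptions:

```shell
# Create a bucket for the original data sets.
gsutil mb -l us-central1 gs://research-raw-data/

# Grant read-only access to the research group via an IAM role.
gsutil iam ch group:researchers@example.com:objectViewer \
  gs://research-raw-data

# Encrypt new objects with a customer-managed Cloud KMS key.
gsutil kms encryption \
  -k projects/my-project/locations/us-central1/keyRings/research-ring/cryptoKeys/research-key \
  gs://research-raw-data
```

Note that even without the last step, Cloud Storage encrypts data at rest with Google-managed keys by default; the CMEK option simply moves key control to the project owner.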
Authentication
- SSH access to the Linux VM will be granted through a Google IAM role and an SSH key attached to the user's Google G Suite account, as explained here: https://cloud.google.com/compute/docs/instances/managing-instance-access
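A minimal sketch of this setup using OS Login, assuming an illustrative project ID and user; the IAM role binding ties SSH access to the user's Google account:

```shell
# Enable OS Login for all VMs in the project, so SSH access is
# governed by IAM rather than per-instance metadata keys.
gcloud compute project-info add-metadata \
  --metadata enable-oslogin=TRUE

# Grant a researcher permission to log in to instances via SSH.
gcloud projects add-iam-policy-binding my-project \
  --member=user:researcher@example.com \
  --role=roles/compute.osLogin
```

Revoking the IAM binding then revokes SSH access centrally, without touching individual VMs.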
Auditing
- Access to all resources will be audited for further review using Stackdriver cloud audit logs, as explained here: https://cloud.google.com/logging/docs/audit/
- Alerts for suspicious activity will be raised using Google Cloud Security Command Center, as explained here: https://cloud.google.com/security-command-center/docs/
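As an example of reviewing the audit trail, admin-activity logs can be queried from the command line; the method-name filter below is an illustrative choice (here, IAM policy changes):

```shell
# Read the 10 most recent admin-activity audit log entries
# that record a change to an IAM policy.
gcloud logging read \
  'logName:"cloudaudit.googleapis.com" AND
   protoPayload.methodName:"SetIamPolicy"' \
  --limit=10 \
  --format=json
```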
Summary
In this post, I've explained how to use GCP services in order to build and maintain a secured research environment, keeping sensitive data protected while meeting the research requirements specified at the beginning of the post.
About the author
Eyal Estrin, cloud architect.