The goal of this post is to present a common case study for building a research environment in AWS.
Building an environment in the cloud raises several questions we need to consider: how do I access resources in the cloud, where and how do I store data, how do I protect the infrastructure, and so on.
Let’s consider the following architecture:
- Researchers will connect to the cloud environment remotely over the internet and connect to a Linux machine with data analytics tools.
- Original data sets will be stored using object storage.
- Output data will be processed in a MySQL database.
- Due to data sensitivity, data must be protected at all times.
In the following sections, we will break down the requirements of the research team and map them to best practices using built-in AWS services:
Infrastructure
- For the base OS image, we will use the most up-to-date Linux AMI, which contains the latest security patches.
- After deploying the VM, we will install the latest build of our analytics tools and development interpreters (such as Python).
- Once the image is fully installed, we will enable Amazon Inspector to make sure the image is assessed for security vulnerabilities on a regular basis.
An example of a possible solution can be seen here: https://aws.amazon.com/blogs/security/how-to-set-up-continuous-golden-ami-vulnerability-assessments-with-amazon-inspector/
- In order to deploy security patches on the Linux machine, we will use Patch Manager, a capability of AWS Systems Manager that relies on the SSM Agent running on the instance.
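The "most up-to-date AMI" choice in the first bullet can be sketched as a small helper. This is a minimal illustration, assuming the candidate images arrive as a list of dictionaries with `ImageId` and `CreationDate` keys (the shape EC2 image listings use); the image IDs and dates below are made up, and `newest_image` is a hypothetical helper name.

```python
from datetime import datetime


def newest_image(images):
    """Return the most recently created image from a list of image records."""
    if not images:
        raise ValueError("no candidate images to choose from")

    def created(image):
        # CreationDate is assumed to be an ISO 8601 string such as
        # "2024-03-01T10:15:00.000Z".
        return datetime.strptime(image["CreationDate"], "%Y-%m-%dT%H:%M:%S.%fZ")

    return max(images, key=created)


# Hypothetical candidates; real values would come from the EC2 API.
candidates = [
    {"ImageId": "ami-0aaa1111", "CreationDate": "2024-01-10T08:00:00.000Z"},
    {"ImageId": "ami-0bbb2222", "CreationDate": "2024-03-01T10:15:00.000Z"},
]

latest = newest_image(candidates)
print(latest["ImageId"])
```

Picking by creation date only selects the newest image; it does not replace the Amazon Inspector assessment or ongoing patching described above.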
Network connectivity
- Remote access to the cloud environment will go through AWS Client VPN.
- All resources will be located in a single Amazon VPC, but the Linux VM and the MySQL database will be located in separate subnets.
- The Linux VM will be located in a DMZ subnet, and access to it will be restricted by a security group that allows only VPN-authenticated clients on port 22/TCP (SSH).
- The database will be located in a DB subnet, and access to it will be restricted by a DB security group that allows the MySQL port (3306/TCP by default) from the DMZ subnet only.
- A further explanation of security groups can be found here: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.RDSSecurityGroups.html
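The two access rules above can be written down as ingress rule documents. The sketch below only builds the dictionaries (in the `IpPermissions` shape that the EC2 security group ingress API accepts) and does not call AWS. The CIDR ranges are assumptions for illustration: `10.0.100.0/22` stands in for the range VPN traffic arrives from and `10.0.1.0/24` for the DMZ subnet.

```python
import ipaddress
import json


def tcp_ingress_rule(port, source_cidr, description):
    """Build one TCP ingress rule that allows a single port from a single CIDR."""
    # Fail early on a malformed CIDR instead of sending it to AWS.
    ipaddress.ip_network(source_cidr, strict=True)
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": source_cidr, "Description": description}],
    }


# Hypothetical address ranges, not taken from the post.
VPN_SOURCE_CIDR = "10.0.100.0/22"
DMZ_SUBNET_CIDR = "10.0.1.0/24"

# Security group of the Linux VM: SSH from VPN clients only.
linux_vm_rules = [tcp_ingress_rule(22, VPN_SOURCE_CIDR, "SSH from VPN clients")]

# Security group of the database: MySQL from the DMZ subnet only.
database_rules = [tcp_ingress_rule(3306, DMZ_SUBNET_CIDR, "MySQL from DMZ subnet")]

print(json.dumps({"linux_vm": linux_vm_rules, "database": database_rules}, indent=2))
```

Because security groups deny inbound traffic by default, listing only these two rules is what keeps every other port and source closed.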
Database
- The MySQL database will be deployed as a managed service using Amazon RDS.
- Access privileges to the MySQL database will be restricted using AWS IAM roles.
- The traffic between the Linux machine and the MySQL database will be encrypted using TLS, as explained here: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html
- Data inside the MySQL database will be encrypted at rest, and the encryption keys will be managed in AWS KMS, as explained here: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.Encryption.html
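The database settings above can be collected into one set of instance parameters. This is a rough sketch that only builds the parameter dictionary, using field names from the RDS `CreateDBInstance` API, and does not create anything. The instance class, storage size, user name, and all identifiers are placeholders chosen for illustration.

```python
import json


def encrypted_mysql_params(identifier, kms_key_id, subnet_group, security_group_id):
    """Build parameters for a private, encrypted MySQL instance on Amazon RDS."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",
        "DBInstanceClass": "db.t3.medium",    # placeholder size
        "AllocatedStorage": 100,              # GiB, placeholder
        "MasterUsername": "research_admin",   # placeholder name
        "ManageMasterUserPassword": True,     # let RDS keep the password in Secrets Manager
        "StorageEncrypted": True,             # encryption at rest
        "KmsKeyId": kms_key_id,               # key managed in AWS KMS
        "PubliclyAccessible": False,          # reachable only from inside the VPC
        "EnableIAMDatabaseAuthentication": True,
        "DBSubnetGroupName": subnet_group,
        "VpcSecurityGroupIds": [security_group_id],
    }


# Hypothetical identifiers for illustration only.
params = encrypted_mysql_params(
    identifier="research-db",
    kms_key_id="alias/research-db-key",
    subnet_group="research-db-subnets",
    security_group_id="sg-0123456789abcdef0",
)
print(json.dumps(params, indent=2))
```

Encryption in transit is a client-side concern and is not covered here: the MySQL client on the Linux machine still has to connect with TLS enabled, as the linked RDS documentation describes.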
Storage
- Data will be stored in Amazon S3, in a private bucket, as explained here: https://aws.amazon.com/blogs/security/how-to-use-bucket-policies-and-apply-defense-in-depth-to-help-secure-your-amazon-s3-data/
- Access to the S3 bucket will be restricted using an IAM policy, as explained here: https://docs.aws.amazon.com/AmazonS3/latest/dev/example-policies-s3.html
- Data inside the S3 bucket will be encrypted at rest, as explained here: https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html
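As one possible illustration of the "private bucket" requirement, the sketch below builds a bucket policy that denies any request not sent over TLS, plus a public access block configuration. It only produces the JSON documents; the bucket name is hypothetical, and a real policy would also add statements limiting access to the research team's IAM principals.

```python
import json


def tls_only_bucket_policy(bucket_name):
    """Build a bucket policy that denies every request made without TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# Settings that block every form of public access to the bucket.
PUBLIC_ACCESS_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

# "research-raw-data" is a made-up bucket name.
policy = tls_only_bucket_policy("research-raw-data")
print(json.dumps(policy, indent=2))
```

An explicit Deny takes precedence over any Allow, so this statement holds even if a broader IAM policy grants access to the bucket.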
Authentication
- SSH logins to the Linux VM will be authenticated using AWS Directory Service, as explained here: https://aws.amazon.com/answers/security/aws-controlling-os-access-to-ec2/
Auditing
- Access to all resources will be audited for further review using AWS CloudTrail, as explained here: https://docs.aws.amazon.com/awscloudtrail/latest/userguide/best-practices-security.html
- Suspicious activity will trigger alarms using Amazon CloudWatch, as explained here: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
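One way to connect the two auditing bullets is to turn CloudTrail log events into a CloudWatch metric and alarm on it. The sketch below assumes CloudTrail is already delivering events to a CloudWatch Logs log group and treats denied API calls as the "suspicious activity"; that choice, the names, and the ARN are illustrative assumptions. The code only builds the request parameters for a metric filter and an alarm and does not call AWS.

```python
import json

# Hypothetical metric identity shared by the filter and the alarm.
METRIC_NAME = "UnauthorizedApiCalls"
METRIC_NAMESPACE = "ResearchEnvironment/Security"


def unauthorized_calls_filter(log_group_name):
    """Parameters for a metric filter that counts denied API calls."""
    return {
        "logGroupName": log_group_name,
        "filterName": "UnauthorizedApiCalls",
        "filterPattern": (
            '{ ($.errorCode = "*UnauthorizedOperation") '
            '|| ($.errorCode = "AccessDenied*") }'
        ),
        "metricTransformations": [
            {
                "metricName": METRIC_NAME,
                "metricNamespace": METRIC_NAMESPACE,
                "metricValue": "1",  # each matching event adds 1
            }
        ],
    }


def unauthorized_calls_alarm(sns_topic_arn):
    """Parameters for an alarm that fires on any denied call within 5 minutes."""
    return {
        "AlarmName": "research-unauthorized-api-calls",
        "MetricName": METRIC_NAME,
        "Namespace": METRIC_NAMESPACE,
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no events means no alarm
        "AlarmActions": [sns_topic_arn],     # e.g. an SNS topic that sends email
    }


# Made-up log group name and topic ARN.
metric_filter = unauthorized_calls_filter("research-cloudtrail-logs")
alarm = unauthorized_calls_alarm("arn:aws:sns:us-east-1:111122223333:research-alerts")
print(json.dumps({"filter": metric_filter, "alarm": alarm}, indent=2))
```

The threshold of one event per five minutes is deliberately strict for a small research team; a noisier environment would likely need a higher value.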
Summary
In this post, I’ve explained how to use AWS services to build and maintain a secure research environment that protects sensitive data and meets the research requirements specified at the beginning of the post.
About the author
Eyal Estrin, cloud architect.