The goal of this post is to present a common case study for building a research environment in Azure.
Building an environment in the cloud raises several questions we need to consider: how to access resources in the cloud, where and how to store data, and how to protect the infrastructure.
Let’s consider the following architecture:
- Researchers connect remotely to the cloud environment over the internet and log in to a Windows machine that hosts data analytics tools
- Original data sets will be stored using file storage
- Output data will be processed in an Azure SQL database
- Due to data sensitivity, data must be protected at all times
Here are some best practices for a research team to adopt using built-in Azure services:
Infrastructure
- For the base OS image, we will use the most up-to-date Windows Data Science Virtual Machine (DSVM) image, which contains the latest security patches
- After deploying the VM, we will install the latest build of our analytics tools and development interpreters (such as Python)
- Once the VM is fully installed, we will deploy the Azure Security Center agent so that the VM is assessed for security vulnerabilities on a regular basis. A short guide to using Azure Security Center can be found here: https://docs.microsoft.com/en-us/azure/security-center/tutorial-protect-resources
- In order to deploy security patches on the Windows machine, we will deploy an Update Management solution
Network connectivity
- Remote access to the cloud environment will be done using Azure Point-to-Site VPN
- All resources will be located in a single Azure Resource Group, but the Windows VM and the Azure SQL database will be located in separate subnets
- The Windows VM will be located in a DMZ subnet, and access to this subnet will be restricted by an Azure Network Security Group that allows only VPN-authenticated clients on TCP port 3389 (RDP)
- The database will be located in a DB subnet, and access to this subnet will be restricted by an Azure Network Security Group that allows access to the Azure SQL port from the DMZ subnet only
- Further explanations on Azure Network Security Groups can be found here: https://docs.microsoft.com/en-us/azure/virtual-network/security-overview
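The subnet rules above can be sketched in a few lines of Python. This is only an illustrative model of priority-ordered allow/deny evaluation, not the Azure API. The address ranges, the rule priorities, the `Rule` class and the `is_allowed` helper are all assumptions made for the example, and port 1433 is assumed as the Azure SQL port. For brevity, the rules of both subnets are kept in a single list, whereas the architecture uses one Network Security Group per subnet.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, ip_address, ip_network
from typing import Optional

# Hypothetical address ranges, chosen only for illustration.
VPN_CLIENT_POOL = ip_network("172.16.0.0/24")  # Point-to-Site client addresses
DMZ_SUBNET = ip_network("10.0.1.0/24")         # Windows VM
DB_SUBNET = ip_network("10.0.2.0/24")          # database
ANY = ip_network("0.0.0.0/0")


@dataclass(frozen=True)
class Rule:
    priority: int          # lower numbers are evaluated first
    source: IPv4Network
    destination: IPv4Network
    port: Optional[int]    # None matches any port
    allow: bool


INBOUND_RULES = [
    Rule(100, VPN_CLIENT_POOL, DMZ_SUBNET, 3389, True),  # RDP from VPN clients only
    Rule(110, DMZ_SUBNET, DB_SUBNET, 1433, True),        # SQL from the DMZ subnet only
    Rule(4096, ANY, ANY, None, False),                   # deny everything else
]


def is_allowed(src: str, dst: str, port: int, rules=INBOUND_RULES) -> bool:
    """Return the decision of the first matching rule, in priority order."""
    src_ip, dst_ip = ip_address(src), ip_address(dst)
    for rule in sorted(rules, key=lambda r: r.priority):
        if (src_ip in rule.source
                and dst_ip in rule.destination
                and rule.port in (None, port)):
            return rule.allow
    return False  # nothing matched: treat as denied


if __name__ == "__main__":
    print(is_allowed("172.16.0.10", "10.0.1.4", 3389))  # VPN client to VM over RDP
    print(is_allowed("203.0.113.5", "10.0.1.4", 3389))  # internet host to VM over RDP
    print(is_allowed("172.16.0.10", "10.0.2.4", 1433))  # VPN client straight to the database
```

The point of the model is that a researcher's VPN address can reach the VM over RDP but cannot reach the database directly; only traffic originating in the DMZ subnet is let through to the DB subnet.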
Database
- The database will be deployed as a managed service using Azure SQL Database
- Network access to the Azure SQL database will be restricted using firewall rules, as explained here: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-firewall-configure
- The traffic between the Windows machine and the Azure SQL database will be encrypted using TLS, as explained here: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-security-overview#information-protection-and-encryption
- Data inside the Azure SQL database will be encrypted at rest, and the encryption keys will be stored in Azure Key Vault, as explained here: https://docs.microsoft.com/en-us/azure/sql-database/transparent-data-encryption-azure-sql
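As a minimal sketch of the encrypted connection from the Windows machine, the function below assembles an ODBC connection string that requests encryption and refuses to trust an unvalidated server certificate. The function name, the server name and the database name are hypothetical, the driver name assumes Microsoft's ODBC Driver 18 for SQL Server is installed on the VM, and credentials are deliberately left out.

```python
def build_connection_string(server: str, database: str) -> str:
    """Build an ODBC connection string that enforces an encrypted connection."""
    parts = {
        "Driver": "{ODBC Driver 18 for SQL Server}",  # assumed to be installed
        "Server": f"tcp:{server},1433",
        "Database": database,
        "Encrypt": "yes",                  # require TLS on the connection
        "TrustServerCertificate": "no",    # validate the server certificate
        "Connection Timeout": "30",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


if __name__ == "__main__":
    # Hypothetical server and database names.
    print(build_connection_string(
        "example-research.database.windows.net", "research_output"))
```

The resulting string could be handed to an ODBC client library such as pyodbc, with the authentication settings added according to the method the team chooses.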
Storage
- Data will be stored in Azure Files, as explained here: https://docs.microsoft.com/en-us/azure/storage/files/storage-how-to-use-files-windows#using-an-azure-file-share-with-windows
- Access to the Azure file share will be restricted using role-based access control from Azure Active Directory, as explained here: https://docs.microsoft.com/en-us/azure/storage/files/storage-files-active-directory-enable
- Data in Azure Files will be encrypted at rest, as explained here: https://docs.microsoft.com/en-us/azure/storage/common/storage-service-encryption
Authentication
- RDP logins to the Windows VM will be authenticated against Azure Active Directory, as explained here: https://docs.microsoft.com/en-us/azure/active-directory-domain-services/join-windows-vm
Auditing
- Access to all resources will be audited for further review using Azure built-in auditing features, as explained here: https://docs.microsoft.com/en-us/azure/security/fundamentals/log-audit
- Suspicious activity will raise alerts in Azure Security Center, as explained here: https://docs.microsoft.com/en-us/azure/security-center/security-center-managing-and-responding-alerts
Summary
In this post, I’ve explained how to use Azure services to build and maintain a secure research environment that protects sensitive data while addressing the specific requirements of the research team.
About the author
Eyal Estrin, cloud architect.