Azure Architecture — Design Principles and Lessons Learned

Shanmuganathan Raju
Jun 12, 2021

Introduction

Over recent years I’ve been fortunate to work across a wide range of Microsoft Azure projects in various “architect” roles.

Some organizations are lifting-and-shifting existing workloads to virtual machines. Others are re-platforming legacy .NET applications to Windows containers running on Azure Kubernetes Service. Others still are building cloud-first applications using PaaS services or microservices.

Cloud offers great flexibility. But it also brings its own set of challenges around security, governance, availability and performance.

I wanted to share some of my lessons learned, thought processes and resources that I’ve found useful when designing and architecting Azure solutions.

Hopefully, some of these might be of interest if you also work as an Azure / Cloud / Solution / Platform architect, or if you have a more hands-on role such as an infrastructure or DevOps engineer.

Project lifecycle

The diagram below shows an example project lifecycle for an Azure project. I created it to show the different inputs, deliverables, etc., created throughout the project and how they relate to each other.

In my experience, it’s common for an architect to be involved to varying degrees throughout many of the stages of the project, whether that’s producing design deliverables, being hands-on during the build phase or simply providing guidance and governance.

Example Azure project lifecycle

Success factors

Let’s dive into what I’ve seen as some key contributors to a successful Azure project -

Project success factors

Requirements

From a platform perspective, non-functional requirements (NFR’s) are key inputs to the design phases. Without knowing RTO’s, RPO’s and other performance and security requirements, it’s impossible to make key design decisions about the target Azure architecture.

For example, does the solution need to be single / multi-region? Are there any compliance requirements, e.g. PCI DSS? How many concurrent users is the system expected to support? Etc.

A set of NFR’s agreed by the business is perhaps the most critical success factor. NFR’s are also important later on as part of non-functional testing. Having these defined early on ensures traceability throughout the project. I’ve seen several Azure projects where this wasn’t the case and the solution had to be reworked to meet a newly surfaced requirement.
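Traceability can be as simple as checking that every agreed NFR is referenced by at least one design decision. The following is a minimal sketch in Python, not a prescribed tool; the class names, the `NFR-01` style ids and the sample requirements are all hypothetical and only there for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Nfr:
    """A single non-functional requirement agreed with the business."""
    nfr_id: str
    description: str
    owner: str  # team that owns the requirement


@dataclass
class DesignDecision:
    """A design decision and the ids of the NFRs it is meant to satisfy."""
    title: str
    satisfies: list[str] = field(default_factory=list)


def untraced_nfrs(nfrs: list[Nfr], decisions: list[DesignDecision]) -> list[str]:
    """Return the ids of NFRs that no design decision references yet."""
    covered = {nfr_id for d in decisions for nfr_id in d.satisfies}
    return [n.nfr_id for n in nfrs if n.nfr_id not in covered]


# Hypothetical example data, for illustration only.
nfrs = [
    Nfr("NFR-01", "RTO of 4 hours", "Application owner"),
    Nfr("NFR-02", "RPO of 15 minutes", "Application owner"),
    Nfr("NFR-03", "PCI DSS compliance", "Security"),
]
decisions = [
    DesignDecision("Deploy to two regions", ["NFR-01"]),
    DesignDecision("Enable geo-replicated database backups", ["NFR-02"]),
]

print(untraced_nfrs(nfrs, decisions))  # prints ['NFR-03']
```

Running a check like this whenever the design changes highlights requirements that have no supporting decision before they resurface as rework.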

Obtaining NFR’s is often an organizational challenge as there may be multiple teams within the business responsible for different areas. The application owner is interested in performance and RTO/RPO’s, operations teams are interested in recoverability and security teams will have their own requirements.

Proof of Concepts

Azure projects will sometimes leverage services that the platform and application teams have limited experience with. For example, building a cloud-native application that leverages Cosmos DB as its database. We may already know that Cosmos DB meets our basic requirements on paper, but our teams don’t have any hands-on knowledge.

By running a proof-of-concept, we gain experience and unearth any limitations, gotchas and other lessons learned. I’d recommend time-boxing the PoC with a clear view of what you’re looking to validate or test.

Keeping it Agile

Design decisions made early on have implications for subsequent stages of an Azure project. For example, an overly complex resource group design may hinder implementing a simple RBAC model for managing those resource groups.

For this reason, I’ve typically found that working in an agile manner is more successful than a traditional waterfall process for some parts of the project. In practice, this can mean starting to build out CI/CD pipelines and Azure infrastructure before an HLD/LLD document is finalised. Getting that early feedback from the build team and iterating the fine details of a design quickly is invaluable.

This also helps prove the CI/CD process and infrastructure as code (IaC). As the design firms up, it should be possible to repeatedly tear down Azure infrastructure and redeploy with the updated configuration. Rebuilding and updating infrastructure is something the operations team will need to do. So we might as well test the process early on.

The other advantage of this approach is that there’s less chance of “configuration drift”, the age-old problem where what actually gets implemented differs from the design.
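One way to catch drift early is to diff the settings recorded in the design against what is actually deployed. The sketch below is a minimal illustration in Python, assuming both sides have already been exported into flat dictionaries. The function name and the setting names are hypothetical and are not taken from any Azure SDK or IaC tool.

```python
def find_drift(designed: dict, deployed: dict) -> dict:
    """Compare two flat settings dicts and return the differences.

    The result maps each differing key to a (designed, deployed) tuple.
    A key that is missing on one side is reported as None on that side.
    """
    drift = {}
    for key in sorted(designed.keys() | deployed.keys()):
        want = designed.get(key)
        have = deployed.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical settings for a single resource, for illustration only.
designed = {"sku": "Standard", "https_only": True, "min_tls_version": "1.2"}
deployed = {
    "sku": "Standard",
    "https_only": False,
    "min_tls_version": "1.2",
    "public_access": True,
}

for setting, (want, have) in find_drift(designed, deployed).items():
    print(f"{setting}: designed={want!r}, deployed={have!r}")
```

In practice the IaC tooling's own plan or what-if output plays this role; the point of the sketch is only that the design and the deployment should be comparable by machine.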

Collaboration

As architects, we’re normally in a fairly unique position to have the freedom to engage with application teams, project managers, business stakeholders, security and operations teams throughout the project lifecycle.

Taking a collaborative approach, socialising designs and engaging early and often with teams helps get buy-in to the project. Crucially, this reduces surprises for impacted teams, e.g. operations, and makes the transition from design → build → support easier.

Design Considerations

So far, we’ve covered (important) project management type considerations. Onto the more interesting Azure-specific topics!

We often talk about Azure “design” and architecture generically. But depending on the type of Azure project, the details of this can vary considerably. Initial questions that I ask to quantify the type of project include -

  • Is it greenfield, or are we deploying into the organization’s existing Azure subscription?
  • What are the application/s that will be deployed to Azure?
  • What is the application architecture? Monolithic, microservices, PaaS based etc.?
  • Which regulatory and compliance frameworks does the solution need to adhere to?

Knowing the above helps us start thinking about the target architecture and the key design considerations needed to support the project.

If it’s greenfield, there are “core infrastructure” considerations for deploying into new Azure subscriptions. There will typically also be a significant application element with its own set of design factors, e.g. hosting environments, CI/CD etc. Whilst there’s a lot of overlap, I’ve separated these below for clarity.

Core infrastructure

When an organization isn’t yet consuming Azure, multiple infrastructure areas need to be thought through. These become the scaffolding or foundation upon which we can start building and deploying our applications. This list isn’t meant to be exhaustive but shows some of the most important topics -

  • Management groups and subscription — what does this structure look like?
  • Subscriptions — single vs multiple subscriptions?
  • Identity — is there an existing Azure AD?
  • On-premises connectivity — ExpressRoute, VPN etc.
  • Perimeter security — Application Gateways, Firewalls etc.
  • Monitoring + logging — is integration with existing tools required?
  • Security + Governance — Azure Sentinel, Security Center, security policies, RBAC etc.
  • 3rd party services — what 3rd party services are needed, e.g. firewalls, SIEM’s etc.
  • Infrastructure configuration and deployment — Azure DevOps, GitLab, Terraform vs ARM etc.

To help with this, Microsoft’s Cloud Adoption Framework and landing zones are a great starting point. These will help ensure you have the basics in place and can move forward with confidence.

Initially, you may be putting in place the core infrastructure needed to support just one application. It’s important to ensure that it can easily scale and support future applications and products, for example by implementing a network hub and spoke topology.
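Part of making a hub and spoke design scale is reserving non-overlapping address ranges for spokes that don’t exist yet. The sketch below uses Python’s standard `ipaddress` module to carve a hub range and one range per spoke out of a single address space. The `10.0.0.0/16` range, the `/20` VNET size and the spoke names are assumptions chosen for illustration, not recommendations.

```python
from __future__ import annotations

import ipaddress


def plan_hub_and_spoke(
    address_space: str, vnet_prefix: int, spoke_names: list[str]
) -> dict[str, ipaddress.IPv4Network]:
    """Carve one hub range and one range per spoke out of address_space.

    Every VNET gets a block of size /vnet_prefix. Because the blocks come
    from the same subnet generator they cannot overlap each other.
    """
    blocks = ipaddress.ip_network(address_space).subnets(new_prefix=vnet_prefix)
    plan = {"hub": next(blocks)}
    for name in spoke_names:
        try:
            plan[name] = next(blocks)
        except StopIteration:
            raise ValueError(
                f"{address_space} has no room left for spoke '{name}'"
            ) from None
    return plan


# Hypothetical address space and spokes, for illustration only.
plan = plan_hub_and_spoke("10.0.0.0/16", 20, ["app1", "app2"])
for vnet, block in plan.items():
    print(f"{vnet}: {block}")
```

A `/16` split into `/20` blocks gives sixteen VNET-sized ranges, so this particular plan leaves room for thirteen further spokes before the address space has to be revisited.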

Application hosting

With an idea of what the core Azure infrastructure looks like, we can build upon this and turn our attention to application environments. These are the Azure services that we’ll leverage to host the application, e.g. API Management, Kubernetes, Web Apps, Functions, Logic Apps etc.

The project may be focusing on a single application or a series of applications that will be hosted on Azure. Either way, these are some of the key areas to consider -

  • Is the application being migrated from on-premises, or is it cloud-native?
  • Application architecture — microservices vs traditional monolith?
  • Azure subscriptions — dedicated vs shared application subscriptions?
  • Application authentication — Azure AD, B2B/B2C?
  • Will the application be public facing? Or via a private ExpressRoute or VPN connection?
  • What industry regulations does the solution need to comply with? e.g. PCI DSS, GDPR
  • Regional and data residency requirements?
  • Performance and availability — what are the RTO/RPO’s and performance metrics?
  • What Azure services does the application require? SaaS / PaaS / IaaS
  • What 3rd party services does the application require? e.g. Auth0, SendGrid etc.
  • Database requirements — Azure SQL / CosmosDB / PostgreSQL etc.
  • Data warehousing and archival? — Synapse, storage accounts etc.
  • Monitoring and alerting — integration with incident management tools
  • Network security — firewalls, VNET service endpoints, Private Link, NSG’s
  • Disaster Recovery and Backup — what does this look like?
  • Incident management for application issues
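For the performance and availability question in the list above, it helps to remember that availability compounds across dependencies: when a request needs several services in series, the composite availability is the product of the individual figures. The sketch below works this through in Python. The SLA percentages are hypothetical, and the redundancy formula assumes independent failures and instant failover, which real deployments only approximate.

```python
import math


def serial_availability(slas_percent):
    """Composite availability (%) when every component must be up."""
    return math.prod(s / 100 for s in slas_percent) * 100


def redundant_availability(sla_percent, copies):
    """Availability (%) of identical copies where any one copy is enough.

    Assumes failures are independent and failover is instant.
    """
    return (1 - (1 - sla_percent / 100) ** copies) * 100


def downtime_minutes(availability_percent, days=30):
    """Expected downtime in minutes over the given number of days."""
    return (1 - availability_percent / 100) * days * 24 * 60


# Hypothetical SLA figures for a web tier and a database used in series.
single_region = serial_availability([99.95, 99.99])
two_regions = redundant_availability(single_region, copies=2)

print(f"single region: {single_region:.3f}%")
print(f"two regions:   {two_regions:.5f}%")
print(f"downtime per 30 days (single region): {downtime_minutes(single_region):.1f} min")
```

Comparing the resulting figure with the agreed availability target is a quick way to see whether a single-region design is even a candidate.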

The large number of available Azure services can make it difficult to know where to start. I’ve found it helpful to break the solution down into larger, more manageable building blocks such as compute, data, network etc. I’d also highly recommend Microsoft’s application design patterns to help get you started.

Artefacts

All of the above considerations feed into the actual artefacts and deliverables that we create throughout the project. Before writing a design document, I confirm who the target audiences and stakeholders are. These can include -

  • Client architects — key stakeholder and likely responsible for document sign-offs
  • Microsoft architects — provide Azure related technical feedback
  • Project engineers — build the solution based upon the design
  • 3rd party vendors — are interested in any integrations
  • Operations team — are interested in support implications

Below I describe some of the most common design artefacts and their purpose -

Architecture Overview document

This document provides an overarching view or blueprint for the solution. It’s here that key design decisions around Azure subscriptions, management groups, regions, operations etc. are recorded. This document is the ideal place to tie design decisions back to individual NFR’s.

The Architecture Overview is also ideal for including design principles and statements of intent. For example, that native Azure AD groups will be used for managing RBAC across Azure subscriptions, or that Terraform and Azure DevOps are the preferred choice for infrastructure as code.

Think of this document as putting in place the big building blocks that are difficult or time consuming to change later.

High-Level Design document

The High-Level Design document expands upon the Architecture Overview and provides the next level of detail. It covers areas such as resource groups, VNET’s, environments, configuration of each Azure service. It also covers performance, regional resiliency, availability zones, disaster recovery and backups.

Low-Level Design / Wiki

I’m seeing traditional Low-Level Design documents being created less and less. Assuming the HLD has sufficient detail, it can often be used as the starting point for implementing the infrastructure as code and CI/CD pipelines.

Depending on an organization’s maturity and experience with IaC, they may be comfortable with not creating an LLD. I believe that well-commented and structured IaC is largely self-documenting anyway. Supporting documentation can be added to a wiki, which most CI/CD platforms such as Azure DevOps and GitLab provide.

The advantage of wikis is that they are more easily maintained and accessible. They are a useful home for the following types of information -

  • Pipeline guides and how to’s
  • Azure resource naming conventions
  • IP address schemas
  • etc.
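A naming convention is easier to follow when it is also encoded somewhere the pipeline can use it. The sketch below shows one possible shape in Python, assuming a `<type>-<workload>-<environment>-<region>-<instance>` pattern. The pattern, the abbreviations and the sample values are assumptions for illustration; the only Azure rule relied on is that storage account names must be 3 to 24 lowercase letters and digits.

```python
import re

# Hypothetical abbreviations; substitute your organization's own.
RESOURCE_ABBREVIATIONS = {
    "resource_group": "rg",
    "virtual_network": "vnet",
    "storage_account": "st",
}


def resource_name(resource_type, workload, environment, region, instance=1):
    """Build a resource name from its parts using the assumed pattern."""
    parts = [
        RESOURCE_ABBREVIATIONS[resource_type],
        workload,
        environment,
        region,
        f"{instance:03d}",
    ]
    if resource_type == "storage_account":
        # Storage accounts cannot contain hyphens, so join without a separator.
        name = "".join(parts).lower()
        if not re.fullmatch(r"[a-z0-9]{3,24}", name):
            raise ValueError(f"'{name}' is not a valid storage account name")
        return name
    return "-".join(parts).lower()


# Hypothetical workload, environment and region codes.
print(resource_name("resource_group", "payments", "prod", "uks"))
print(resource_name("storage_account", "payments", "prod", "uks"))
```

Keeping the rule in one function means the wiki page and the IaC templates cannot quietly disagree about what a valid name looks like.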

Reusability and Best Practice

I’m a big fan of re-using artefacts, whether they’re design document templates, snippets of ARM templates or Terraform modules. If I have to create something from scratch, I’ll consider how I can make it reusable and application agnostic. For example, an environment hosting design can often be templatized to allow it to be easily re-used.

This is a good approach if you’re regularly deploying similar Azure environments/subscriptions. Templatizing and creating a library of commonly used artefacts is the best way to accelerate designing for and deploying to Azure consistently.

When facing a particular design challenge in Azure, I’ve often found that there’s a good chance that someone else has already encountered it. For this reason, I find Microsoft’s Azure Architecture Center and Cloud Adoption Framework are good starting points for best practice recommendations and blueprints.

However, these are always evolving. For me, part of the fun of working with Azure is finding new ways to use emerging technologies to solve problems. So yes, current best practice is a good foundation, but don’t let it constrain your thinking and innovation.

This is something I regularly hear from Microsoft. They create an Azure service envisaging how it’s going to be used by customers. But they are constantly surprised by how customers find unique ways to use that service in ways they hadn’t imagined.

Some other tips to assist with the design process include -

  • Leverage any existing Microsoft agreement (EA, FastTrack etc.) to get design input. Their teams’ goal is to help customers be successful on Azure (and to consume and pay for Azure services, of course!)
  • Raise a support ticket via the Azure Portal to ask a specific technical question
  • Questions directed to Azure Q&A and GitHub are a good way of fostering collaboration across the wider community.

Choosing an Azure service

Choosing an Azure service for a solution can sometimes be challenging, given there are hundreds available and many overlap in their functionality. My recommendation is to consider -

  • Requirements — does it meet the project’s requirements?
  • Familiarity — do the operations and application team know the service?
  • Scalability — how does the service support current and future requirements?
  • Resilience and availability — does the service have the required SLA?
  • Cost — is the cost acceptable?
  • Limitations — what limitations does the service have?
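When two or three services look equally plausible, the criteria above can be turned into a simple weighted scoring matrix. The sketch below is one way to do that in Python. The weights, the 1 to 5 scores and the option names are all hypothetical and would need to be agreed with the project’s stakeholders.

```python
# Hypothetical weights reflecting how much each criterion matters.
CRITERIA_WEIGHTS = {
    "requirements": 5,
    "familiarity": 3,
    "scalability": 3,
    "resilience": 4,
    "cost": 3,
    "limitations": 2,
}


def weighted_score(scores):
    """Return the weighted average of per-criterion scores (1 to 5)."""
    missing = CRITERIA_WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(CRITERIA_WEIGHTS.values())
    weighted = sum(w * scores[c] for c, w in CRITERIA_WEIGHTS.items())
    return weighted / total_weight


def rank_options(options):
    """Return option names ordered from highest to lowest weighted score."""
    return sorted(options, key=lambda name: weighted_score(options[name]), reverse=True)


# Hypothetical candidates and scores, for illustration only.
options = {
    "Option A": {"requirements": 5, "familiarity": 2, "scalability": 5,
                 "resilience": 4, "cost": 3, "limitations": 3},
    "Option B": {"requirements": 4, "familiarity": 5, "scalability": 3,
                 "resilience": 4, "cost": 4, "limitations": 4},
}

for name in rank_options(options):
    print(f"{name}: {weighted_score(options[name]):.2f}")
```

The number matters less than the conversation it forces: in this made-up example the more familiar option edges ahead despite a weaker fit on paper, which is exactly the kind of trade-off worth discussing explicitly.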

Design tools

Apart from the standard Office applications, there are a few tools that I find useful when creating design artefacts -

Keeping up to date

New and updated services can influence an Azure implementation. On larger Azure projects of 6–9 months, it’s possible for an Azure service to be announced, enter preview and go into GA during the project timeframe.

Equally, an existing service that had a particular limitation could be updated during the project. An example of this is Azure Bastion. Originally Bastion had to be deployed for every VNET that you wished to access. It’s now possible to deploy a single Bastion and enable it across multiple VNET’s. This has a positive impact in terms of increasing simplicity and reducing cost.

Often the most difficult decision is when to take advantage of service improvements. For example, if you’re preparing to go live, it’s usually best to hold off and add them to a backlog of future improvements.

I find the official Azure Updates site the best place to keep up with the latest service updates. There are lots of updates, so consuming this via your favourite RSS reader is the best way to filter out what’s relevant!
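If your reader doesn’t filter well, a few lines of code can do it. The sketch below parses an RSS 2.0 document with Python’s standard library and keeps only items whose title or category mentions a service you care about. The embedded feed is a made-up sample, not real Azure Updates content, and fetching the live feed is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed in RSS 2.0 shape, for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sample updates feed</title>
    <item>
      <title>Hypothetical update about Azure Bastion</title>
      <category>Networking</category>
    </item>
    <item>
      <title>Hypothetical update about an unrelated service</title>
      <category>Media</category>
    </item>
    <item>
      <title>Hypothetical database announcement</title>
      <category>Cosmos DB</category>
    </item>
  </channel>
</rss>"""


def relevant_updates(feed_xml, keywords):
    """Return titles of feed items whose title or category matches a keyword."""
    root = ET.fromstring(feed_xml)
    wanted = [k.lower() for k in keywords]
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        categories = [c.text or "" for c in item.findall("category")]
        haystack = " ".join([title, *categories]).lower()
        if any(k in haystack for k in wanted):
            matches.append(title)
    return matches


for title in relevant_updates(SAMPLE_FEED, ["bastion", "cosmos db"]):
    print(title)
```

The keyword list is the part worth maintaining: it is effectively the list of Azure services your design depends on.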

Conclusion

Starting a new Azure project from scratch can be a daunting task. It’s tempting to have a big list of cool Azure services that you wish to implement. My recommendation is to start small, get the fundamentals right and learn to iterate swiftly. Only introduce an Azure service where there is a clear requirement to do so.

I hope you’ve found this post useful. It’s not meant to be an exhaustive list of resources or guidance but reflects my experience. I’m really interested in your experiences and any tips or approaches you’d like to share. Please let me know in the comments below!

Originally published at https://medium.com on June 12, 2021.

Shanmuganathan Raju

A multicloud architect with more than 16 years in the information technology industry and experience in architecture, solution design and cloud migration.