How to deploy gMSA on AKS with Terraform


Guest Vinicius Apolinario
The other day I posted a blog on how to deploy an AKS cluster that is ready for Windows workloads using Terraform. Today, I wanted to expand that to include gMSA, which is a highly requested feature from Windows customers running containers on AKS. Obviously, the complexity of the Terraform template grows a lot, so this blog post will provide the details on what is needed for that to work.

 

 

 

gMSA requirements and items outside of Terraform scope

 

Before diving into the Terraform template, it’s important to review the gMSA prerequisites and what falls outside Terraform’s scope when deploying the Azure resources:

 

  • Azure resources: As part of the gMSA environment, we need different Azure resources: an AKS cluster, an Azure Virtual Network, an Azure Key Vault, an Azure Managed Identity, access for the Managed Identity to the Azure Key Vault, a secret in the Azure Key Vault containing the standard user that retrieves the gMSA, and a Domain Controller. All of these will be created by the Terraform template.
  • Non-Azure resources: To use gMSA, you will need to manually configure Active Directory on the Domain Controller VM. This includes installing the AD role, creating a new forest with a root domain, and enabling gMSA in AD via the KDS feature. You also need to install the gMSA credential spec on your AKS cluster. These operations are sensitive, and the credential spec needs to be configured according to your environment.
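As a rough sketch, those manual AD steps on the DC VM might look like the following, run in an elevated PowerShell session. The domain name contoso.com is a placeholder, and this is not the exact procedure; adapt it to your environment:

```powershell
# Install the AD DS role and management tools
Install-WindowsFeature -Name AD-Domain-Services -IncludeManagementTools

# Create a new forest with a root domain (contoso.com is a placeholder)
Install-ADDSForest -DomainName "contoso.com" `
  -SafeModeAdministratorPassword (Read-Host -AsSecureString -Prompt "Safe Mode password")

# After the reboot: create the KDS root key that gMSA requires
# (backdating by 10 hours makes the key usable immediately; lab use only)
Add-KdsRootKey -EffectiveTime ((Get-Date).AddHours(-10))
```

In production you would let the KDS root key become effective on its normal schedule instead of backdating it.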

 

A few notes on the Terraform template:

 

  1. The template deploys a Domain Controller. If your environment already has a Domain Controller with Active Directory configured, you can remove that section of the Terraform template. Keep in mind that your AKS cluster needs to be configured with the IP address of the DC, so you will need to change that in the template. Also, make sure you read my other blog post with networking and AD considerations for gMSA on AKS.
  2. The script uses the same username and password for the Windows nodes on AKS and the Domain Controller. This simply makes the deployment easier; there’s no need to use the same credentials, and you can update the template to use different ones.
  3. The standard user account stored in Azure Key Vault doesn’t yet exist in AD when this script runs, since the DC itself is created by the script. Make sure you create the user account with the same username and password you provided when deploying the template.
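Once AD is up, creating that standard account (and the gMSA it will retrieve) might look like this. The account and domain names here are hypothetical placeholders; the username and password must match the gmsa_username and gmsa_userpassword values you supplied to Terraform, and the gMSA name must match your credential spec:

```powershell
# Standard (non-privileged) account; must match the values stored in Key Vault
New-ADUser -Name "gmsauser" -SamAccountName "gmsauser" `
  -AccountPassword (Read-Host -AsSecureString -Prompt "Password") -Enabled $true

# The gMSA itself; WebApp01 and contoso.com are placeholders
New-ADServiceAccount -Name "WebApp01" `
  -DnsHostName "WebApp01.contoso.com" `
  -PrincipalsAllowedToRetrieveManagedPassword "gmsauser"
```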

 

Since this is a more complex Terraform template, I invite you to collaborate on it: if you see an opportunity for improvement, please send your suggestions!

 

 

 

gMSA on AKS Terraform template

 

The Terraform deployment has two files. The main.tf file contains the resources to be deployed. The variables.tf file contains the variables used during the deployment. Note that some of the variables’ values are not set in the file, both because you need to define them for your deployment and because some are sensitive, such as passwords.

 

Here is the main.tf file:

 

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.55.0"
    }
  }
}

data "azurerm_client_config" "current" {}
data "azurerm_subscription" "current" {}

provider "azurerm" {
  features {
    key_vault {
      purge_soft_delete_on_destroy    = true
      recover_soft_deleted_key_vaults = false
    }
  }
}

 

#Creates Azure Resource Group
resource "azurerm_resource_group" "rg" {
  name     = var.resource_group
  location = var.location
}

#Creates Azure User Assigned Managed Identity
resource "azurerm_user_assigned_identity" "managed_identity" {
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  name                = "gmsami"
}

#Creates Azure Key Vault
resource "azurerm_key_vault" "akv" {
  name                       = "viniapgmsatest"
  location                   = azurerm_resource_group.rg.location
  resource_group_name        = azurerm_resource_group.rg.name
  tenant_id                  = data.azurerm_client_config.current.tenant_id
  soft_delete_retention_days = 90
  purge_protection_enabled   = false
  sku_name                   = "standard"
}

#Assign reader role to MI on Azure Key Vault
resource "azurerm_role_assignment" "mi_akv_reader" {
  scope                = azurerm_key_vault.akv.id
  role_definition_name = "Reader"
  principal_id         = azurerm_user_assigned_identity.managed_identity.principal_id
}

 

#Define AKV access policy for MI
resource "azurerm_key_vault_access_policy" "akvpolicy" {
  key_vault_id = azurerm_key_vault.akv.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = azurerm_user_assigned_identity.managed_identity.principal_id

  secret_permissions = [
    "Get"
  ]
}

#Define AKV access for terraform session
resource "azurerm_key_vault_access_policy" "tfpolicy" {
  key_vault_id = azurerm_key_vault.akv.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = data.azurerm_client_config.current.object_id

  secret_permissions = [
    "Get",
    "List",
    "Set"
  ]
}

#Creates the secret on Azure Key Vault (careful: this is the standard user on your AD)
resource "azurerm_key_vault_secret" "gmsa_secret" {
  name         = "gmsasecret"
  value        = "${var.netbios_name}\\${var.gmsa_username}:${var.gmsa_userpassword}"
  key_vault_id = azurerm_key_vault.akv.id
}

 

#Creates Azure Virtual Network
resource "azurerm_virtual_network" "vnet" {
  name                = "gmsavnet"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  address_space       = ["10.0.0.0/16", "10.1.0.0/26"]
}

#Creates the gMSA Subnet - both pods and Domain Controller will use this subnet
resource "azurerm_subnet" "gmsasubnet" {
  name                 = "gmsasubnet"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.0.0.0/16"]
}

#Optional: Creates the Azure Bastion subnet for RDP into DC01
resource "azurerm_subnet" "AzureBastionSubnet" {
  name                 = "AzureBastionSubnet"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.1.0.0/26"]
}

#Creates a vNIC for the DC VM - remove this if you have an existing DC
resource "azurerm_network_interface" "dc01_nic" {
  name                = "dc01_nic"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  ip_configuration {
    name                          = "dc01_nic"
    subnet_id                     = azurerm_subnet.gmsasubnet.id
    private_ip_address_allocation = "Dynamic"
  }
}

 

#Creates the DC VM - remove this if you have an existing VM
#You need to connect to this VM and finish the Active Directory configuration
resource "azurerm_windows_virtual_machine" "dc01" {
  name                = "DC01"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  size                = "Standard_D4s_v3"
  admin_username      = var.win_username
  admin_password      = var.win_userpass
  network_interface_ids = [
    azurerm_network_interface.dc01_nic.id
  ]

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "MicrosoftWindowsServer"
    offer     = "WindowsServer"
    sku       = "2022-Datacenter"
    version   = "latest"
  }
}

 

#Creates AKS cluster with Windows profile and gMSA enabled, and uses existing vNet
#This depends on the DC01 VM as we need to set the primary DNS IP for the Windows nodes
resource "azurerm_kubernetes_cluster" "aks" {
  name                = "ContosoCluster"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix          = "contosocluster"

  default_node_pool {
    name           = "lin"
    node_count     = var.node_count_linux
    vm_size        = "Standard_D2_v2"
    vnet_subnet_id = azurerm_subnet.gmsasubnet.id
  }

  windows_profile {
    admin_username = var.win_username
    admin_password = var.win_userpass
    gmsa {
      dns_server  = "10.0.0.4"
      root_domain = var.Domain_DNSName
    }
  }

  network_profile {
    network_plugin = "azure"
    service_cidr   = "10.240.0.0/16"
    dns_service_ip = "10.240.0.10"
  }

  identity {
    type = "SystemAssigned"
  }

  depends_on = [
    azurerm_windows_virtual_machine.dc01
  ]
}

 

#Creates Windows node pool on AKS cluster
resource "azurerm_kubernetes_cluster_node_pool" "win" {
  name                  = "wspool"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
  vm_size               = "Standard_D4s_v3"
  node_count            = var.node_count_windows
  os_type               = "Windows"
}

output "kube_config" {
  value     = azurerm_kubernetes_cluster.aks.kube_config_raw
  sensitive = true
}

 

#Assigns the User assigned Managed Identity to the Windows node pool
resource "null_resource" "identity_assign" {
  provisioner "local-exec" {
    command = "az vmss identity assign -g MC_${azurerm_resource_group.rg.name}_${azurerm_kubernetes_cluster.aks.name}_${azurerm_resource_group.rg.location} -n aks${azurerm_kubernetes_cluster_node_pool.win.name} --identities /subscriptions/${data.azurerm_subscription.current.subscription_id}/resourcegroups/${azurerm_resource_group.rg.name}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/${azurerm_user_assigned_identity.managed_identity.name}"
  }
  depends_on = [
    azurerm_kubernetes_cluster_node_pool.win
  ]
}

#Update the VMSS instances
resource "null_resource" "vmss_update" {
  provisioner "local-exec" {
    command = "az vmss update-instances -g MC_${azurerm_resource_group.rg.name}_${azurerm_kubernetes_cluster.aks.name}_${azurerm_resource_group.rg.location} -n aks${azurerm_kubernetes_cluster_node_pool.win.name} --instance-ids *"
  }
  depends_on = [
    null_resource.identity_assign
  ]
}

 

#Optional: Creates a public IP address for the Azure Bastion host
resource "azurerm_public_ip" "bastion_ip" {
  name                = "bastionip"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  allocation_method   = "Static"
  sku                 = "Standard"
}

#Optional: Creates a Bastion Host to connect to the DC VM via RDP
resource "azurerm_bastion_host" "gmsa_dc_bastion" {
  name                = "gmsabastion"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  ip_configuration {
    name                 = "configuration"
    subnet_id            = azurerm_subnet.AzureBastionSubnet.id
    public_ip_address_id = azurerm_public_ip.bastion_ip.id
  }
}

 

Here is the variables.tf file:

 

variable "resource_group" {
  type        = string
  description = "Resource group name"
  default     = "58TestRG"
}

variable "location" {
  type        = string
  description = "RG and resources location"
  default     = "East US"
}

variable "node_count_linux" {
  type        = number
  description = "Linux nodes count"
  default     = 1
}

variable "node_count_windows" {
  type        = number
  description = "Windows nodes count"
  default     = 2
}

variable "win_username" {
  description = "Windows node username"
  type        = string
  sensitive   = false
}

variable "win_userpass" {
  description = "Windows node password"
  type        = string
  sensitive   = true
}

variable "Domain_DNSName" {
  description = "FQDN for the Active Directory forest root domain"
  type        = string
  sensitive   = false
}

variable "netbios_name" {
  description = "NETBIOS name for the AD domain"
  type        = string
  sensitive   = false
}

variable "SafeModeAdministratorPassword" {
  description = "Password for AD Safe Mode recovery"
  type        = string
  sensitive   = true
}

variable "gmsa_username" {
  description = "Username for the standard domain account"
  type        = string
  sensitive   = false
}

variable "gmsa_userpassword" {
  description = "Password for standard domain account"
  type        = string
  sensitive   = true
}

 

With the two files in the same folder, you can run:

 

az login
az account set --subscription <subscription ID>
terraform init
terraform apply

 

I did not include the -auto-approve flag, as you probably want to confirm that everything will run as expected. Once Terraform shows the plan for the deployment, type yes to continue.

 

Now, let me go over the details of this template:

 

We start by creating a Resource group. The name and location for the RG are defined in the variables.tf file.

 

Next, we create the auxiliary Azure services (Key Vault and a user-assigned managed identity). You could use the regular identity from the AKS cluster once it’s deployed; I decided to go with a new one for testing and learning purposes. We then assign the managed identity the Reader role on the Azure Key Vault and give it the “Get” permission for secrets. This is what allows the managed identity to read the standard user account and then connect to AD. We then create the secret on Key Vault. Note that we also give the Terraform session itself “Get”, “List”, and “Set” permissions on the Key Vault, so it can write the value of the standard user account into that Key Vault’s secrets.
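The secret’s value follows the domain\user:password format produced by the template’s interpolation. As a sanity check, here is the same composition as a small Python sketch (the names and password are purely illustrative):

```python
def gmsa_secret_value(netbios_name: str, username: str, password: str) -> str:
    """Compose the Key Vault secret value the same way the Terraform
    interpolation does: NETBIOS\\user:password."""
    return f"{netbios_name}\\{username}:{password}"

print(gmsa_secret_value("CONTOSO", "gmsauser", "P@ssw0rd!"))
# CONTOSO\gmsauser:P@ssw0rd!
```

If the secret doesn’t follow this exact format, the CCG plugin on the Windows nodes won’t be able to parse the credentials.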

 

Moving on, we create the Azure virtual network, and two subnets. One for the AKS cluster and Domain Controller VM, and another for Azure Bastion. This last one is optional as you might not need it, but I added it just in case.

 

To create the Domain Controller VM, we create a network interface associated with the gMSA subnet, and then create the Windows VM on Azure with that vNIC attached. Here you can change the size and disk of the VM, depending on your environment and cost limitations. The image used here is a Windows Server 2022 image. While that’s the recommended version, this deployment would also work with Windows Server 2019. Keep in mind that you need to RDP/connect into this VM to finish the Active Directory configuration – this is outside the scope of this template.

 

We then finally create the AKS cluster. This is a standard AKS cluster with a simple default node pool of Linux nodes. Note that the subnet associated with it is the gMSA subnet created earlier. We also use a Windows profile for this cluster and configure gMSA right away. IMPORTANT: At this point, you must indicate the gMSA DNS server and the FQDN of the AD root domain. If you have an existing DC that is a DNS server, you should pass in the internal IP address of that machine. This is just like adding a primary (and secondary) DNS server in the IP configuration of a Windows instance. However, if you are using this template to deploy your DC, do not change the DNS server here. Since the DC VM is the first resource created in the subnet, it gets the first available IP address, which in this case is 10.0.0.4, hence the value in the template. For that to work, I set the “depends_on” flag on this resource, so the AKS cluster is created after the DC VM. Next, the Windows node pool is created with standard configurations. Here you can change the number of Windows nodes and the VM size.
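After the apply completes, you can check the gMSA settings on the cluster from the CLI. This is a sketch using the resource names from the template; the exact property path is my assumption, so inspect the full `az aks show` output if the query returns nothing:

```shell
# Show the Windows/gMSA profile of the cluster (names from the template)
az aks show --resource-group 58TestRG --name ContosoCluster \
  --query windowsProfile.gmsaProfile
```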

 

The final steps in the template are to assign the managed identity to the Virtual Machine Scale Set (VMSS) of the Windows node pool and then update its instances. Since the managed identity has access to the Azure Key Vault, and we’re associating the managed identity with the VMSS, all nodes in that VMSS will be able to access the secret and authenticate with AD.
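You can confirm the identity landed on the scale set after the run. A sketch assuming the MC_ naming convention from the template; note that the actual VMSS name carries a generated suffix, so list the names first:

```shell
# List VMSS names in the node resource group, then show the identities on one
az vmss list -g MC_58TestRG_ContosoCluster_eastus --query "[].name" -o tsv
az vmss identity show -g MC_58TestRG_ContosoCluster_eastus -n <vmss name>
```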

 

 

 

Post installation steps

 

The template does the heavy lifting of creating the Azure resources for gMSA to work. As mentioned before, there are additional steps, so let me go over them once again:

 

  • Finish the AD preparation on the DC VM.
    • This includes deploying Active Directory itself and configuring the KDS service.
    • You need to create the gMSA account which will be used in the credential spec.
    • You also need to create the standard user account to be stored in the Azure Key Vault.
  • Deploy the credential spec.
    • This is environment and application specific. Just keep in mind that some parameters used in the Terraform template are also needed in the credential spec.
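For reference, the credential spec ties together several values from this deployment: the domain names from the template variables and, for the Key Vault-based authentication, the managed identity and secret URI. The skeleton below uses the gMSA webhook’s GMSACredentialSpec custom resource with placeholders only; fill in every bracketed value per the AKS gMSA documentation for your environment:

```yaml
apiVersion: windows.k8s.io/v1
kind: GMSACredentialSpec
metadata:
  name: gmsa-spec                      # referenced by pods via securityContext
credspec:
  CmsPlugins:
    - ActiveDirectory
  DomainJoinConfig:
    DnsName: <Domain_DNSName>          # FQDN from variables.tf
    DnsTreeName: <Domain_DNSName>
    NetBiosName: <netbios_name>        # NETBIOS name from variables.tf
    Guid: <AD domain GUID>
    Sid: <AD domain SID>
    MachineAccountName: <gMSA account name>
  ActiveDirectoryConfig:
    GroupManagedServiceAccounts:
      - Name: <gMSA account name>
        Scope: <Domain_DNSName>
      - Name: <gMSA account name>
        Scope: <netbios_name>
    HostAccountConfig:
      PortableCcgVersion: "1"
      PluginGUID: <CCG plugin GUID from the AKS gMSA docs>
      PluginInput: ObjectId=<managed identity object ID>;SecretUri=<Key Vault secret URI>
```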

 

 

 

Conclusion

 

It is possible to deploy a gMSA application on Windows containers on an AKS cluster. Automating this process reduces the chances of errors in the future and allows you to set up a CI/CD pipeline. This blog post covered the Terraform deployment of the Azure resources needed for gMSA on AKS to work. It deploys and configures all the Azure resources, while some environment-specific actions are still needed.

 

I hope this is helpful. No doubt you’ll need to modify the template for your environment. Luckily, you can leverage the ITOpsTalk repo to do that, and even submit a PR if you have any feedback! Let us know what you think!

 
