Disaster protection for JFrog Artifactory in AKS with Astra Control Service and Azure NetApp Files

We describe how to use NetApp® Astra™ Control Service and Azure NetApp Files to protect a multi-tier application with multiple components, such as JFrog Artifactory, on Azure Kubernetes Service (AKS) against disasters such as the complete loss of a region. We demonstrate how pre- and post-snapshot execution hooks in Astra Control Service let us create application-consistent snapshots and backups across all application tiers, and how we recover the application to a different region after a disaster.

Co-authors: Patric Uebele, Sayan Saha

Introduction

NetApp® Astra™ Control is a solution that makes it easy to manage, protect, and move data-rich Kubernetes workloads within and across public clouds and on-premises environments. Astra Control provides persistent container storage that leverages NetApp’s proven and expansive storage portfolio in the public cloud and on premises, and it also supports Azure managed disks as a storage backend.

Astra Control also offers a rich set of application-aware data management functions (such as snapshot and restore, backup and restore, activity logs, and active cloning) for local data protection, disaster recovery, data audit, and mobility use cases for your modern apps. Astra Control provides complete protection of stateful Kubernetes applications by saving both the data and the metadata that constitute an application in Kubernetes, such as deployments, config maps, services, and secrets. Astra Control can be managed through its web-based user interface or through its powerful REST API.

Astra Control can run execution hooks before and after snapshots and backups, as well as after restores. Pre-snapshot hooks enable application-consistent snapshots and backups by quiescing the application before a snapshot is created, and post-restore hooks allow customized restores. The Verda open-source project hosts a variety of execution hooks for Astra Control.

Astra Control comes in two variants:

Astra Control Service (ACS) – A fully managed application-aware data management service that supports Azure Kubernetes Service (AKS), Azure Disk Storage, and Azure NetApp Files (ANF).
Astra Control Center (ACC) – application-aware data management for on-premises Kubernetes clusters, delivered as a customer-managed Kubernetes application from NetApp.

To showcase Astra Control’s backup and recovery capabilities in AKS, we use JFrog Artifactory, a universal binary and artifact manager that is used in the continuous integration (CI) / continuous delivery (CD) workflow in the DevOps process. JFrog Artifactory expedites application delivery and enables faster software releases.

JFrog Artifactory has two components in its setup. One is a database (PostgreSQL in our example) that stores the metadata about artifacts, builds, and binary packages; the other is a repository that stores the binary files, addressed by their checksums. Both components store their data in persistent volumes backed by Azure NetApp Files.

Scenario

In the following sections, we demonstrate how Astra Control can protect an Artifactory installation with application-consistent snapshots and backups, and we test the protection scheme in a disaster recovery simulation across two AKS clusters in separate regions.

Installing Artifactory

We deploy Artifactory on the AKS cluster pu-aks-test-1 in the Azure region westeurope. The cluster is already managed by our ACS account, with the Azure NetApp Files Standard service level (storage class netapp-anf-perf-standard) selected as the default storage class. ACS also automatically installed Astra Trident as the storage provisioner for persistent volumes backed by Azure NetApp Files:
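To check from the command line which storage class is the cluster default, a minimal shell sketch like the following can be used. It relies on the standard Kubernetes annotation storageclass.kubernetes.io/is-default-class; the helper name default_storage_class is our own, hypothetical choice, and the kubectl pipeline is shown as a comment so that the function can be tried without cluster access.

```shell
#!/usr/bin/env bash
# Sketch: print the name of the default storage class.
# Input on stdin: one "<name> <is-default-class annotation value>" pair per line.
default_storage_class() {
  local name flag
  while read -r name flag; do
    # The annotation value is the string "true" for the default class.
    if [ "$flag" = "true" ]; then
      echo "$name"
    fi
  done
}

# Live usage (requires kubectl access to the cluster):
# kubectl get storageclass \
#   -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\n"}{end}' \
#   | default_storage_class
```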

To deploy the Artifactory application, we follow the instructions from JFrog and use the JFrog Helm repository and chart:

~# helm repo add jfrog https://charts.jfrog.io
~# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "netapp-trident" chart repository
...Successfully got an update from the "jfrog" chart repository
...Successfully got an update from the "bitnami" chart repository
...Successfully got an update from the "azure-marketplace" chart repository
Update Complete. ⎈Happy Helming!⎈

~# helm upgrade --install artifactory --namespace artifactory jfrog/artifactory --create-namespace
Release "artifactory" does not exist. Installing it now.
NAME: artifactory
LAST DEPLOYED: Tue Dec 6 10:21:38 2022
NAMESPACE: artifactory
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Congratulations. You have just deployed JFrog Artifactory!

1. Get the Artifactory URL by running these commands:
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status of the service by running 'kubectl get svc --namespace artifactory -w artifactory-artifactory-nginx'
export SERVICE_IP=$(kubectl get svc --namespace artifactory artifactory-artifactory-nginx -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo http://$SERVICE_IP/

2. Open Artifactory in your browser
Default credential for Artifactory:
user: admin
password: password

After a few minutes, all pods and services are up:

~# kubectl get all,pvc -n artifactory
NAME                                                 READY   STATUS    RESTARTS        AGE
pod/artifactory-0                                    1/1     Running   0               11m
pod/artifactory-artifactory-nginx-5cb99466fd-wvh8j   1/1     Running   1 (3m13s ago)   11m
pod/artifactory-postgresql-0                         1/1     Running   0               11m

NAME                                      TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)                      AGE
service/artifactory                       ClusterIP      10.0.245.232   <none>           8082/TCP,8081/TCP            11m
service/artifactory-artifactory-nginx     LoadBalancer   10.0.46.129    20.103.196.178   80:30656/TCP,443:30734/TCP   11m
service/artifactory-postgresql            ClusterIP      10.0.226.193   <none>           5432/TCP                     11m
service/artifactory-postgresql-headless   ClusterIP      None           <none>           5432/TCP                     11m

NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/artifactory-artifactory-nginx   1/1     1            1           11m

NAME                                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/artifactory-artifactory-nginx-5cb99466fd   1         1         1       11m

NAME                                      READY   AGE
statefulset.apps/artifactory              1/1     11m
statefulset.apps/artifactory-postgresql   1/1     11m

NAME                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
persistentvolumeclaim/artifactory-volume-artifactory-0   Bound    pvc-48307448-6f2f-4a3e-ace1-b7b1e9cfeed5   100Gi      RWO            netapp-anf-perf-standard   11m
persistentvolumeclaim/data-artifactory-postgresql-0      Bound    pvc-d153c0d3-cf57-4207-8c7a-63f482c1dd89   200Gi      RWO            netapp-anf-perf-standard   11m
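Instead of re-running kubectl get until everything is up, the wait can be scripted. The following is a minimal sketch: kubectl wait is a standard kubectl command, while the helper all_bound and the 15-minute timeout are our own assumptions, not part of the JFrog or Astra Control tooling.

```shell
#!/usr/bin/env bash
# Sketch: verify that all PVCs in a namespace are Bound.
# Input on stdin: one "<pvc-name> <phase>" pair per line.
# Returns 0 only if at least one PVC was read and all of them are Bound.
all_bound() {
  local name phase seen=0
  while read -r name phase; do
    if [ -z "$name" ]; then
      continue
    fi
    seen=1
    [ "$phase" = "Bound" ] || return 1
  done
  [ "$seen" -eq 1 ]
}

# Live usage (requires kubectl access; the timeout is an arbitrary choice):
# kubectl wait --for=condition=Ready pod --all -n artifactory --timeout=15m
# kubectl get pvc -n artifactory \
#   -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' \
#   | all_bound && echo "all PVCs are Bound"
```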

and we can set the SERVICE_IP:

~# export SERVICE_IP=$(kubectl get svc --namespace artifactory artifactory-artifactory-nginx -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
~# echo http://$SERVICE_IP/
http://20.103.196.178/

Then we create a DNS record that maps the FQDN arti1.astrarocks.pu-store.de in our test domain astrarocks.pu-store.de to the SERVICE_IP, for easier access to the Artifactory service:

~# nslookup arti1.astrarocks.pu-store.de
Server: 192.168.178.73
Address: 192.168.178.73#53

Non-authoritative answer:
Name: arti1.astrarocks.pu-store.de
Address: 20.103.196.178
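A quick way to confirm that the DNS record points at the current LoadBalancer IP is to compare the two in a script. This minimal sketch assumes dig is installed on the workstation; the helper resolves_to is hypothetical and simply checks whether the expected address appears in the resolver output.

```shell
#!/usr/bin/env bash
# Sketch: check that a DNS name resolves to an expected IPv4 address.
# Input on stdin: resolved addresses, one per line (e.g. from "dig +short").
# Returns 0 if the expected address is among them.
resolves_to() {
  local expected="$1" addr
  while read -r addr; do
    if [ "$addr" = "$expected" ]; then
      return 0
    fi
  done
  return 1
}

# Live usage (requires kubectl and dig on the workstation):
# SERVICE_IP=$(kubectl get svc --namespace artifactory artifactory-artifactory-nginx \
#   -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# dig +short arti1.astrarocks.pu-store.de A | resolves_to "$SERVICE_IP" && echo "DNS is up to date"
```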

Now we can connect to the Artifactory instance at the FQDN arti1.astrarocks.pu-store.de, log in with the default admin credentials shown in the installation notes, and start its configuration:

After entering the Artifactory (trial) license key, we first change the admin password and then create a second user:

In the next step, let’s add a Docker and a Helm repository:

Protecting Artifactory with Astra Control Service

After this initial configuration of Artifactory, we can start to manage and protect it with ACS. In the ACS UI, under Applications -> Namespaces, we see that ACS has already discovered the artifactory namespace in which we deployed the Artifactory instance. We define the entire namespace as the application artifactory directly from the Actions menu:

Navigating to the application details, we see that the artifactory application is not protected yet:

Execution hooks for PostgreSQL database

To ensure that snapshots and backups are created in an application-consistent way, we use pre- and post-snapshot hooks for the PostgreSQL database. Among its execution hooks, the Verda project provides hooks for PostgreSQL that quiesce the database before snapshots and backups are taken. To use them with Astra Control, we upload the required hook scripts from our workstation to the Astra Control account in Accounts -> Scripts:

With the hook script for PostgreSQL uploaded to our ACS account, we can now add it to the artifactory application in the app’s details -> Execution hooks:

We start with the pre-snapshot hook:

Astra Control selects the containers in which a hook runs by matching their container image names, and regular expressions can be used for the match. To find the image name of the PostgreSQL container, we use kubectl describe:

~# kubectl describe pod/artifactory-postgresql-0 -n artifactory | grep Image
Image: releases-docker.jfrog.io/bitnami/postgresql:13.4.0-debian-10-r39
Image ID: releases-docker.jfrog.io/bitnami/postgresql@sha256:abfb7efd31afc36a8b16aa077bb9dd165c4f635412affef37c7859605fda762c
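Before configuring the hook, it can help to test a candidate regular expression against the image names of all containers in the pod. This sketch assumes that Astra Control treats the expression as an unanchored (partial) match, similar to grep -E; check the ACS documentation for the exact matching semantics. The helpers list_pod_images and image_matches are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: test which container images a hook regex would select.
# ASSUMPTION: the regex is applied as an unanchored match, like grep -E.

# Print the image of every container in a pod, one per line (needs kubectl).
list_pod_images() {
  local pod="$1" ns="$2"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.containers[*]}{.image}{"\n"}{end}'
}

# Return 0 if the image name matches the extended regular expression.
image_matches() {
  local regex="$1" image="$2"
  printf '%s\n' "$image" | grep -Eq -- "$regex"
}

# Live usage:
# for img in $(list_pod_images artifactory-postgresql-0 artifactory); do
#   image_matches 'bitnami/postgresql' "$img" && echo "hook would run in: $img"
# done
```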

And then add the post-snapshot hook for PostgreSQL:

Finally, we check the container image matches and confirm that the hooks will be executed in the PostgreSQL containers:

Check the ACS documentation, the Verda documentation, and this blog post to learn more about execution hooks in Astra Control.
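To illustrate the general shape of a hook script, here is a deliberately simplified, hypothetical skeleton that dispatches on a pre or post argument. It is not the Verda PostgreSQL hook and it does not freeze writes: its pre step only issues a PostgreSQL CHECKPOINT, which flushes dirty buffers so that less WAL has to be replayed after a restore. It assumes the hook is configured in Astra Control with one argument (pre or post) and that psql can connect as user postgres inside the container; for real deployments, use the Verda hooks.

```shell
#!/bin/sh
# HYPOTHETICAL execution hook skeleton, not the Verda PostgreSQL hook.
# Assumption: the hook is configured with one argument, "pre" or "post".
# Assumption: psql can connect locally as user postgres inside the container.

# Print the command instead of running it when DRY_RUN is set (for testing).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

hook_main() {
  case "$1" in
    pre)
      # Flush dirty buffers to disk; this does NOT block writes.
      run psql -U postgres -c "CHECKPOINT;"
      ;;
    post)
      # Nothing to undo after a plain CHECKPOINT.
      echo "post-snapshot: nothing to do"
      ;;
    *)
      echo "usage: $0 pre|post" >&2
      return 2
      ;;
  esac
}

# Dispatch only when an argument is passed (as Astra Control would do).
if [ $# -gt 0 ]; then
  hook_main "$@"
fi
```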

Protecting the application

With execution hooks for PostgreSQL configured, we can now begin to protect the artifactory application with snapshots and backups. Let’s first take an on-demand snapshot from the Data protection tab in the app’s details to test that the hooks are executed properly:

We accept the default snapshot name and start the snapshot creation:

Because Azure NetApp Files snapshots are based on NetApp’s proven ONTAP Snapshot technology, they are created quickly and efficiently. In Astra Control’s Activity view, we can follow the steps of the snapshot creation and confirm that the pre- and post-snapshot hooks were executed correctly:

To protect the application on a regular basis, we now configure a protection policy in the application’s Data protection tab, where the on-demand snapshot we just created is also listed:

We configure a protection schedule with an hourly snapshot at minute 30 of every hour, retaining the last four snapshots, and a daily backup at 12:00 UTC, retaining one backup:

Once the first scheduled backup and snapshot have been created, the application’s protection status in Astra Control changes to Fully protected, because it is now regularly protected by both snapshots and backups:
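The schedule has a direct consequence for disaster recovery: the hourly snapshots are stored locally with the cluster and are lost together with the region, so after a regional outage the most recent restore point is the last daily backup. The following shell sketch (our own illustration, not an Astra Control feature; the function name minutes_since_backup is hypothetical) computes how many minutes of changes would be lost for a given failure time, assuming a single daily backup at a fixed UTC time and ignoring how long the backup itself takes.

```shell
#!/usr/bin/env bash
# Sketch: minutes elapsed since the most recent daily backup.
# Assumption: exactly one backup per day at a fixed UTC time.
# Arguments: backup hour, backup minute, failure hour, failure minute (UTC).
minutes_since_backup() {
  # 10# forces base 10, so values such as "08" are not read as octal.
  local backup=$(( 10#$1 * 60 + 10#$2 ))
  local failure=$(( 10#$3 * 60 + 10#$4 ))
  # Wrap around midnight: 1440 minutes per day.
  echo $(( (failure - backup + 1440) % 1440 ))
}

# Example: backup at 12:00 UTC, failure at 11:00 UTC the next day
# minutes_since_backup 12 00 11 00   # prints 1380 (23 hours)
```

With one backup per day, the potential data loss after a regional outage is therefore close to 24 hours in the worst case.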

Simulating a disaster and recovering the application to a cluster in a different region

Next, we want to test the recovery of the Artifactory platform after a simulated disaster. To simulate the complete loss of the cluster pu-aks-test-1 hosting the artifactory application, we delete the cluster and its resources with a small helper script:

~# ./AKS_compute.sh delete pu-aks-test-1 westeurope rg-patricu-westeu
./AKS_compute.sh: Checking Azure login
./AKS_compute.sh: Getting AKS credentials
Merged "pu-aks-test-1" as current context in /root/.kube/config
NAME                                STATUS   ROLES   AGE   VERSION
aks-nodepool1-33509899-vmss000000   Ready    agent   23h   v1.23.12
aks-nodepool1-33509899-vmss000001   Ready    agent   23h   v1.23.12
aks-nodepool1-33509899-vmss000002   Ready    agent   23h   v1.23.12
./AKS_compute.sh: Getting node resource group ...MC_rg-patricu-westeu_pu-aks-test-1_westeurope
./AKS_compute.sh: Getting vnet name, please be patient
./AKS_compute.sh: vnet = aks-vnet-18022328
./AKS_compute.sh: Deleting non-system namespaces
./AKS_compute.sh: Deleting namespace artifactory
namespace "artifactory" deleted
./AKS_compute.sh: Wait until all PVs are deleted, please be patient
./AKS_compute.sh: Waiting for 2 PVs to be deleted for 1 min....
No resources found
./AKS_compute.sh: Waiting for 0 PVs to be deleted for 2 min....
Setting ANF subnet name ANF_SUBNET=anf-sso-subnet-pu-pu-aks-test-1
./AKS_compute.sh: Deleting subnet anf-sso-subnet-pu-pu-aks-test-1 from ANF
./AKS_compute.sh: Deleting AKS cluster pu-aks-test-1 in resource group rg-patricu-westeu
./AKS_compute.sh: Cleaning up
./AKS_compute.sh: Deleting context pu-aks-test-1
warning: this removed your active context, use "kubectl config use-context" to select a different one
deleted context pu-aks-test-1 from /root/.kube/config
./AKS_compute.sh: Deleting cluster pu-aks-test-1
deleted cluster pu-aks-test-1 from /root/.kube/config
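The AKS_compute.sh script itself is not shown here. For readers who want to run a similar test, the following hypothetical sketch covers the main teardown steps visible in the output above (deleting the namespace, waiting for the PVs to disappear, deleting the AKS cluster, and cleaning up the kubeconfig); it omits the ANF subnet cleanup. It uses only standard kubectl and az aks commands, assumes the kubeconfig context is named after the cluster (as az aks get-credentials does by default), and prints the commands instead of running them when DRY_RUN is set. Running it without DRY_RUN is destructive.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL teardown sketch, not the author's AKS_compute.sh.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Arguments: cluster name (= kubeconfig context), resource group, namespace.
simulate_disaster() {
  local cluster="$1" rg="$2" ns="$3"
  # Deleting the namespace deletes its PVCs; with a Delete reclaim policy,
  # Trident then removes the backing Azure NetApp Files volumes.
  run kubectl --context "$cluster" delete namespace "$ns"
  # Wait until no PVs are left (skipped in dry-run mode).
  if [ -z "${DRY_RUN:-}" ]; then
    while [ "$(kubectl --context "$cluster" get pv --no-headers 2>/dev/null | wc -l)" -gt 0 ]; do
      sleep 30
    done
  fi
  run az aks delete --resource-group "$rg" --name "$cluster" --yes
  run kubectl config delete-context "$cluster"
  run kubectl config delete-cluster "$cluster"
}

# Usage (destructive without DRY_RUN):
# DRY_RUN=1 simulate_disaster pu-aks-test-1 rg-patricu-westeu artifactory
```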

After a short while, ACS detects that neither the cluster nor the application is reachable anymore:

ACS then puts both the cluster and the application into the state Unavailable:

Because the snapshots are stored locally with the cluster’s volumes, they are no longer available after all cluster resources have been deleted, and the application’s protection status changes to Partially protected:

The backups, however, are stored in object storage. Because we can add buckets with a very high level of redundancy to Astra Control (see the ACS documentation and this blog post for instructions on how to add additional buckets to Astra Control for storing your backups), the backups remain available even after the loss of a region, and we can recover the application from an existing backup in such a scenario, as we show below.

To recover the application from our simulated loss of a complete Azure region, we bring up a new AKS cluster pu-aks-test-2 in the Azure region northeurope and add it to ACS. Because we are working within the same Azure subscription in our example, we can use the existing Service Principal in Clusters -> Add to discover and manage the newly deployed AKS cluster:

We select the same default storage class (netapp-anf-perf-standard) as in the original cluster:

Astra Control now manages the new cluster in the northeurope region:

Now we can restore the artifactory application from the scheduled backup to pu-aks-test-2 in the region northeurope. In the Data protection tab of the application’s details, we start the restore directly from the Actions menu next to the backup:

To restore to a different cluster, we select Restore to new namespace, choose the destination cluster pu-aks-test-2 from the dropdown menu, and enter the name of the destination namespace. We simply reuse the namespace name artifactory:

Next, we confirm the restore source:

After reviewing the restore information, we can start the restore process:

With kubectl, we can follow the creation of the artifactory namespace on the destination cluster:

~# kubectl config use-context pu-aks-test-2
Switched to context "pu-aks-test-2".
~# kubectl get ns
NAME              STATUS   AGE
artifactory       Active   2m15s
default           Active   3h44m
kube-node-lease   Active   3h44m
kube-public       Active   3h44m
kube-system       Active   3h44m
trident           Active   5m48s

and we can follow the restore jobs that Astra Control runs there:

~# kubectl get all,pvc -n artifactory
NAME                                           READY   STATUS    RESTARTS   AGE
pod/r-artifactory-volume-artifactory-0-fgswh   0/1     Pending   0          2m35s
pod/r-data-artifactory-postgresql-0-rk5c5      0/1     Pending   0          2m34s

NAME                                           COMPLETIONS   DURATION   AGE
job.batch/r-artifactory-volume-artifactory-0   0/1           2m35s      2m35s
job.batch/r-data-artifactory-postgresql-0      0/1           2m34s      2m34s

NAME                                                     STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS               AGE
persistentvolumeclaim/artifactory-volume-artifactory-0   Pending                                      netapp-anf-perf-standard   2m36s
persistentvolumeclaim/data-artifactory-postgresql-0      Pending                                      netapp-anf-perf-standard   2m36s

Once the data transfer from the backup finishes, Astra Control recreates the rest of the application resources, and the pods and services will come up.

~# kubectl get all,pvc -n artifactory
NAME                                                 READY   STATUS    RESTARTS   AGE
pod/artifactory-0                                    1/1     Running   0          3m40s
pod/artifactory-artifactory-nginx-5cb99466fd-h6zsl   1/1     Running   0          3m38s
pod/artifactory-postgresql-0                         1/1     Running   0          3m40s

NAME                                      TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)                      AGE
service/artifactory                       ClusterIP      10.0.3.17     <none>          8082/TCP,8081/TCP            3m37s
service/artifactory-artifactory-nginx     LoadBalancer   10.0.98.142   20.166.200.14   80:30947/TCP,443:32154/TCP   3m35s
service/artifactory-postgresql            ClusterIP      10.0.27.131   <none>          5432/TCP                     3m38s
service/artifactory-postgresql-headless   ClusterIP      None          <none>          5432/TCP                     3m38s

NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/artifactory-artifactory-nginx   1/1     1            1           3m38s

NAME                                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/artifactory-artifactory-nginx-5cb99466fd   1         1         1       3m38s

NAME                                      READY   AGE
statefulset.apps/artifactory              1/1     3m41s
statefulset.apps/artifactory-postgresql   1/1     3m40s

NAME                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
persistentvolumeclaim/artifactory-volume-artifactory-0   Bound    pvc-d01ef1a7-dd30-4d46-85ec-49ec3eeb49b7   100Gi      RWO            netapp-anf-perf-standard   9m25s
persistentvolumeclaim/data-artifactory-postgresql-0      Bound    pvc-fdf18583-fc43-4daf-963b-63a8153319fa   200Gi      RWO            netapp-anf-perf-standard   9m25s
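Rather than polling by hand, the wait for the restored workloads can be scripted. Because the StatefulSets only exist after Astra Control has finished the data transfer, kubectl rollout status fails with a NotFound error until then, so it is wrapped in a small retry loop. This is a minimal sketch; the retry helper, the attempt count, and the delays are our own assumptions.

```shell
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds or the attempts are used up.
# Arguments: number of attempts, delay in seconds between attempts, command...
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (requires kubectl; attempt counts and timeouts are arbitrary):
# retry 60 30 kubectl rollout status statefulset/artifactory-postgresql -n artifactory --timeout=60s
# retry 60 30 kubectl rollout status statefulset/artifactory -n artifactory --timeout=60s
```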

Because the LoadBalancer service comes up with a new external IP address, we update the DNS record for the FQDN arti1.astrarocks.pu-store.de in our DNS service to point to the new public IP address:

~ % nslookup arti1.astrarocks.pu-store.de
Server: 192.168.178.73
Address: 192.168.178.73#53

Non-authoritative answer:
Name: arti1.astrarocks.pu-store.de
Address: 20.166.200.14
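How the DNS record is updated depends on the DNS service in use, which is not specified here. Assuming, purely for illustration, that the zone astrarocks.pu-store.de were hosted in Azure DNS, the update could be scripted with the Azure CLI as in this sketch. The resource group is a placeholder, the helper repoint_a_record is hypothetical, and the new address is added before the old one is removed so that the record set never becomes empty.

```shell
#!/usr/bin/env bash
# Sketch: repoint an A record to the new LoadBalancer IP after a failover.
# ASSUMPTION: the zone is hosted in Azure DNS; <dns-rg> below is a placeholder.
# Set DRY_RUN=1 to print the az commands instead of executing them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Arguments: resource group, zone, record name, old IP, new IP.
repoint_a_record() {
  local rg="$1" zone="$2" record="$3" old_ip="$4" new_ip="$5"
  # Add the new address first so the record set is never empty.
  run az network dns record-set a add-record \
    --resource-group "$rg" --zone-name "$zone" \
    --record-set-name "$record" --ipv4-address "$new_ip"
  # Then remove the address of the lost cluster.
  run az network dns record-set a remove-record \
    --resource-group "$rg" --zone-name "$zone" \
    --record-set-name "$record" --ipv4-address "$old_ip"
}

# Example (dry run):
# DRY_RUN=1 repoint_a_record <dns-rg> astrarocks.pu-store.de arti1 20.103.196.178 20.166.200.14
```

Clients may keep resolving the old address until the record's TTL expires, so a short TTL on this record shortens the switchover.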

Now we can log in again to the restored Artifactory service, which is now running on the AKS cluster pu-aks-test-2 in the region northeurope:

Logging in with the user we created during the Artifactory configuration works, and the Docker and Helm repositories are available:

Summary

In this article, we described how to make JFrog Artifactory running on AKS with Azure NetApp Files resilient to disasters, enabling us to provide business continuity for the platform. NetApp® Astra™ Control makes it easy to protect business-critical AKS workloads (stateful and stateless) with just a few clicks. Get started with Astra Control Service today with a free plan.

