Accelerating Java Applications on Azure Kubernetes Service with CRaC

Xiaoyun_Ding

Overview​

Java applications often start slowly because of JVM initialization and class loading. In the cloud-native era, applications start and stop more frequently and must scale out to meet dynamic traffic demands, which makes the problem more prominent. CRaC (Coordinated Restore at Checkpoint) addresses this by allowing a running application to be checkpointed and later restored, avoiding the lengthy startup after the first initialization. In our experiment with the Spring PetClinic project, we observed a 7x improvement in startup speed after enabling CRaC on Azure Kubernetes Service.



In the final section, we will discuss CRaC's limitations and potential future developments. We welcome your feedback, which will help us continue improving and optimizing Java on Azure. Feel free to share your thoughts in the comments section at the end of this article.



Next, we will walk through how to:



1. Package and containerize a Java application locally.
2. Deploy it to Azure Kubernetes Service (AKS).
3. Utilize CRaC to create a checkpoint.
4. Create a new application to restore from the checkpoint.
5. Compare the startup performance between the original and restored applications.



Packaging a Java Application​

Before we deploy our Java application to AKS, we need to package it and create a container image. Follow these steps to clone and package the application:

1. Clone the Repository and build the Application:​

For this example, we will use the popular Spring PetClinic project, which can be found on GitHub.

Code:
 git clone -b crac-poc https://github.com/leonard520/spring-petclinic.git
 cd spring-petclinic

Note that this repository is a fork of the official Spring PetClinic project. The only modification is the addition of the CRaC dependency:

Code:
<dependency>
       <groupId>org.crac</groupId>
       <artifactId>crac</artifactId>
       <version>1.4.0</version>
</dependency>
For more details, refer to the "JVM Checkpoint Restore" page of the Spring Framework documentation.

2. Create a Dockerfile:​

Create a Dockerfile to define how the application is containerized. The Azul Zulu JDK build with CRaC support is used here, and the Java startup parameters include -XX:CRaCCheckpointTo, which sets the location where the checkpoint image will be stored.



Code:
FROM azul/zulu-openjdk:17-jdk-crac-latest AS builder

WORKDIR /home/app
COPY . /home/app/spring-petclinic
RUN cd spring-petclinic && ./mvnw -Dmaven.test.skip=true clean package

FROM azul/zulu-openjdk:17-jdk-crac-latest

WORKDIR /home/app
EXPOSE 8080
COPY --from=builder /home/app/spring-petclinic/target/*.jar petclinic.jar
ENTRYPOINT ["java", "-XX:CRaCCheckpointTo=/test", "-jar", "petclinic.jar"]



3. Build the Docker Image:​

Use Docker to build the image:

docker build -t spring-petclinic:crac .



Creating a Deployment on Azure Kubernetes Service​

With the application containerized, we can now deploy it on AKS. Follow these steps:



1. Create an AKS Cluster:​

If you don't have an AKS cluster, create one using the Azure CLI:

az aks create --resource-group myResourceGroup --name myAKSCluster --node-count 1 --enable-addons monitoring --generate-ssh-keys
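The kubectl commands in the following steps assume that kubectl is already pointed at the new cluster. A minimal sketch of that step, assuming the resource group and cluster names used above; the helper name connect_kubectl is hypothetical and not part of the original walkthrough.

```shell
#!/usr/bin/env bash
# Sketch only: connect_kubectl is a hypothetical helper name.
connect_kubectl() {
  local rg="${1:-myResourceGroup}" cluster="${2:-myAKSCluster}"
  # Merge the cluster credentials into the local kubeconfig.
  az aks get-credentials --resource-group "$rg" --name "$cluster" || return 1
  # List the nodes to confirm that kubectl can reach the cluster.
  kubectl get nodes
}
```

Calling connect_kubectl with no arguments uses the names from the az aks create command above.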



2. Push the Docker Image to Azure Container Registry (ACR):​

If you are using Azure Container Registry, tag the image and push it to ACR:

Code:
docker tag spring-petclinic:crac <acr-name>.azurecr.io/spring-petclinic:crac
docker push <acr-name>.azurecr.io/spring-petclinic:crac
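The push only succeeds if Docker is authenticated against the registry. A small sketch of the same tag-and-push sequence with a login step in front, assuming the Azure CLI is signed in; the helper names acr_image_ref and push_to_acr are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Build the fully qualified image reference for an ACR registry.
acr_image_ref() {
  local acr="$1" repo="${2:-spring-petclinic}" tag="${3:-crac}"
  printf '%s.azurecr.io/%s:%s\n' "$acr" "$repo" "$tag"
}

# Log in to the registry, then tag and push the locally built image.
push_to_acr() {
  local acr="$1" ref
  ref="$(acr_image_ref "$acr")"
  az acr login --name "$acr" || return 1
  docker tag spring-petclinic:crac "$ref" || return 1
  docker push "$ref"
}
```

For example, push_to_acr myregistry would push myregistry.azurecr.io/spring-petclinic:crac, where myregistry stands in for your own registry name.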

3. Create an image pull secret for your ACR


kubectl create secret docker-registry regcred --docker-server=<acr-name>.azurecr.io --docker-username=<acr-name> --docker-password=<acr-key>



4. Create an Azure Files share to mount into the deployment

Because restore speed depends heavily on disk performance, it is highly recommended to create the Azure Storage account in the same region as the AKS cluster.

Code:
az storage account create --name mystorageaccount --resource-group myResourceGroup --location eastus --kind FileStorage --sku Premium_LRS
az storage share-rm create --resource-group myResourceGroup --storage-account mystorageaccount --name myfileshare
az storage account keys list --resource-group myResourceGroup --account-name mystorageaccount
   
kubectl create secret generic azure-secret --from-literal=azurestorageaccountname=mystorageaccount --from-literal=azurestorageaccountkey=<storage-account-key>
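The key printed by the keys list command has to be copied into the secret by hand. As a sketch, the two steps can be chained so the key never needs to be pasted; the helper name create_storage_secret is hypothetical, and the query simply picks the first key in the list.

```shell
#!/usr/bin/env bash
# Sketch only: create_storage_secret is a hypothetical helper name.
create_storage_secret() {
  local rg="${1:-myResourceGroup}" account="${2:-mystorageaccount}" key
  # Read the first storage account key as plain text.
  key="$(az storage account keys list --resource-group "$rg" \
    --account-name "$account" --query '[0].value' --output tsv)" || return 1
  [ -n "$key" ] || { echo "no key returned for $account" >&2; return 1; }
  # Store the account name and key in the secret referenced by the deployment.
  kubectl create secret generic azure-secret \
    --from-literal=azurestorageaccountname="$account" \
    --from-literal=azurestorageaccountkey="$key"
}
```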



5. Create a Kubernetes Deployment:​

Create a deployment YAML file (`deployment.yaml`) for your application:



Code:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: <acr-name>.azurecr.io/spring-petclinic:crac
        ports:
        - containerPort: 8080
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add: # These two capabilities are required to create a checkpoint
            - SYS_PTRACE
            - CHECKPOINT_RESTORE
          privileged: false
        volumeMounts:
        - name: crac-storage
          mountPath: /test
      volumes:
      - name: crac-storage
        csi:
          driver: file.csi.azure.com
          volumeAttributes:
            secretName: azure-secret
            shareName: myfileshare
            mountOptions: 'dir_mode=0777,file_mode=0777,cache=strict,actimeo=30,nosharesock,nobrl'
      imagePullSecrets:
      - name: regcred



6. Deploy to AKS:​

Apply the deployment to your AKS cluster:

kubectl apply -f deployment.yaml

7. Check startup logs and duration:


Code:
kubectl logs -l app=myapp


              |\      _,,,--,,_
             /,`.-'`'   ._  \-;;,_
  _______ __|,4-  ) )_   .;.(__`'-'__     ___ __    _ ___ _______
 |       | '---''(_/._)-'(_\_)   |   |   |   |  |  | |   |       |
 |    _  |    ___|_     _|       |   |   |   |   |_| |   |       | __ _ _
 |   |_| |   |___  |   | |       |   |   |   |       |   |       | \ \ \ \
 |    ___|    ___| |   | |      _|   |___|   |  _    |   |      _|  \ \ \ \
 |   |   |   |___  |   | |     |_|       |   | | |   |   |     |_    ) ) ) )
 |___|   |_______| |___| |_______|_______|___|_|  |__|___|_______|  / / / /
 ==================================================================/_/_/_/

:: Built with Spring Boot :: 3.3.0


2024-09-26T14:59:41.464Z  INFO 129 --- [           main] o.s.s.petclinic.PetClinicApplication     : Starting PetClinicApplication v3.3.0-SNAPSHOT using Java 17.0.12 with PID 129 (/home/app/petclinic.jar started by root in /home/app)
2024-09-26T14:59:41.470Z  INFO 129 --- [           main] o.s.s.petclinic.PetClinicApplication     : No active profile set, falling back to 1 default profile: "default"
2024-09-26T14:59:42.994Z  INFO 129 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2024-09-26T14:59:43.071Z  INFO 129 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 66 ms. Found 2 JPA repository interfaces.
2024-09-26T14:59:44.125Z  INFO 129 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port 8080 (http)
2024-09-26T14:59:44.134Z  INFO 129 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2024-09-26T14:59:44.135Z  INFO 129 --- [           main] o.apache.catalina.core.StandardEngine    : Starting Servlet engine: [Apache Tomcat/10.1.24]
2024-09-26T14:59:44.176Z  INFO 129 --- [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2024-09-26T14:59:44.178Z  INFO 129 --- [           main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 2595 ms
2024-09-26T14:59:44.560Z  INFO 129 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Starting...
2024-09-26T14:59:44.779Z  INFO 129 --- [           main] com.zaxxer.hikari.pool.HikariPool        : HikariPool-1 - Added connection conn0: url=jdbc:h2:mem:131e3017-7e28-4a31-b704-5d3840cd46d6 user=SA
2024-09-26T14:59:44.781Z  INFO 129 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Start completed.
2024-09-26T14:59:45.011Z  INFO 129 --- [           main] o.hibernate.jpa.internal.util.LogHelper  : HHH000204: Processing PersistenceUnitInfo [name: default]
2024-09-26T14:59:45.073Z  INFO 129 --- [           main] org.hibernate.Version                    : HHH000412: Hibernate ORM core version 6.5.2.Final
2024-09-26T14:59:45.113Z  INFO 129 --- [           main] o.h.c.internal.RegionFactoryInitiator    : HHH000026: Second-level cache disabled
2024-09-26T14:59:45.451Z  INFO 129 --- [           main] o.s.o.j.p.SpringPersistenceUnitInfo      : No LoadTimeWeaver setup: ignoring JPA class transformer
2024-09-26T14:59:46.466Z  INFO 129 --- [           main] o.h.e.t.j.p.i.JtaPlatformInitiator       : HHH000489: No JTA platform available (set 'hibernate.transaction.jta.platform' to enable JTA platform integration)
2024-09-26T14:59:46.468Z  INFO 129 --- [           main] j.LocalContainerEntityManagerFactoryBean : Initialized JPA EntityManagerFactory for persistence unit 'default'
2024-09-26T14:59:46.826Z  INFO 129 --- [           main] o.s.d.j.r.query.QueryEnhancerFactory     : Hibernate is in classpath; If applicable, HQL parser will be used.
2024-09-26T14:59:48.666Z  INFO 129 --- [           main] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 14 endpoints beneath base path '/actuator'
2024-09-26T14:59:48.778Z  INFO 129 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port 8080 (http) with context path '/'
2024-09-26T14:59:48.810Z  INFO 129 --- [           main] o.s.s.petclinic.PetClinicApplication     : Started PetClinicApplication in 8.171 seconds (process running for 8.862)
As you can see, the startup typically takes a little over 8 seconds.
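To read the startup duration without scanning the whole log, the "Started ... in N seconds" line can be filtered out. A minimal sketch, assuming the Spring Boot log format shown above; the function name startup_seconds is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: startup_seconds is a hypothetical helper name.
# Reads log text on stdin and prints the seconds value from the last
# "Started <AppName> in N seconds" line, or nothing if no line matches.
startup_seconds() {
  sed -n 's/.*Started [A-Za-z]* in \([0-9][0-9.]*\) seconds.*/\1/p' | tail -n 1
}
```

A possible invocation is kubectl logs -l app=myapp --tail=-1 | startup_seconds. The --tail=-1 flag matters because kubectl limits output to the last 10 lines per pod when a label selector is used.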

Creating a Checkpoint with CRaC​

With the application running, the next step is to create a checkpoint using CRaC.



1. Create the Checkpoint:​

Once the application reaches the desired state (for example, after it has fully initialized), issue a checkpoint command. CRaC captures the application's state, which can later be restored for fast startup. The checkpoint image is written to /test, which is backed by the Azure Files share created earlier.

kubectl exec -it <pod-name> -- jcmd petclinic JDK.checkpoint
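The command above needs the pod name and should only run after the application has finished starting. One way to script both, as a sketch: look up the first pod with the app=myapp label, poll its log for the startup line, then issue the same jcmd call. The helper name checkpoint_first_pod, the 2-second poll interval, and the 30-attempt limit are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: checkpoint_first_pod is a hypothetical helper name.
checkpoint_first_pod() {
  local selector="${1:-app=myapp}" pod i
  # Resolve the name of the first pod that matches the label selector.
  pod="$(kubectl get pods -l "$selector" \
    -o jsonpath='{.items[0].metadata.name}')" || return 1
  [ -n "$pod" ] || { echo "no pod matches $selector" >&2; return 1; }
  # Poll the log until Spring Boot reports that startup has completed.
  for i in $(seq 1 30); do
    if kubectl logs "$pod" | grep -q 'Started PetClinicApplication'; then
      # Same checkpoint command as above, without the interactive flags.
      kubectl exec "$pod" -- jcmd petclinic JDK.checkpoint
      return $?
    fi
    sleep 2
  done
  echo "timed out waiting for $pod to start" >&2
  return 1
}
```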



Restoring from the Checkpoint​

Now that we have created a checkpoint, we can redeploy the application so that it restores from the checkpoint image on the mounted file share instead of starting from scratch.



1. Update the deployment to restore from the checkpoint image in AKS:

Modify your deployment YAML so that the container runs the restore command when it starts:

Code:
containers:
- command:
  - java
  - -XX:CRaCRestoreFrom=/test



Apply the changes:



kubectl apply -f deployment.yaml
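As an alternative to editing the YAML file, the same change can be sketched as a JSON patch against the live deployment. This assumes the container is the first one in the pod template, as in the deployment above; the helper names restore_patch and switch_to_restore and the 120-second timeout are hypothetical. Setting command replaces the image's ENTRYPOINT, so the JVM is started with only the restore flag.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Print a JSON patch that sets the container command to the restore invocation.
restore_patch() {
  local dir="${1:-/test}"
  printf '[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["java","-XX:CRaCRestoreFrom=%s"]}]\n' "$dir"
}

# Apply the patch and wait for the new pod to roll out.
switch_to_restore() {
  local deployment="${1:-myapp}"
  kubectl patch deployment "$deployment" --type=json \
    -p "$(restore_patch /test)" || return 1
  kubectl rollout status "deployment/$deployment" --timeout=120s
}
```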





2. Check startup time​


Code:
kubectl logs -l app=myapp

2024-09-26T15:01:42.400Z  INFO 129 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor  : Restarting Spring-managed lifecycle beans after JVM restore
2024-09-26T15:01:42.396Z  WARN 129 --- [l-1 housekeeper] com.zaxxer.hikari.pool.HikariPool        : HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=4m9s910ms846µs988ns).
2024-09-26T15:01:42.473Z  INFO 129 --- [Attach Listener] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port 8080 (http) with context path '/'
2024-09-26T15:01:42.474Z  INFO 129 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor  : Spring-managed lifecycle restart completed (restored JVM running for 1009 ms)
This time, the startup took just over one second!



Performance Comparison​

The final step is to compare the startup times of the original and restored versions of the application.



1. Measure Startup Time:​

For both the original and restored applications, measure the time from container start to application readiness. Compared with the original startup, which took over 8 seconds, restoring from the checkpoint reduced the startup time to just over 1 second, a 7x improvement. What's more, this boost only required adding the CRaC dependency, with no additional code changes.



2. Compare Results:​

The CRaC-enabled application starts significantly faster because it restores from a pre-initialized checkpoint. For the best results, create the checkpoint after giving your Java application sufficient time to warm up.
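The comparison can be reproduced from the two log lines with a little arithmetic. A sketch, assuming the log formats shown in this article; the function names restore_millis and speedup are hypothetical. For the single pair of log excerpts above (8.171 seconds and 1009 ms) the ratio works out to roughly 8, whereas the 7x figure is the author's reported result, so expect some run-to-run variation.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Reads log text on stdin and prints the milliseconds value from the last
# "restored JVM running for N ms" line.
restore_millis() {
  sed -n 's/.*restored JVM running for \([0-9][0-9]*\) ms.*/\1/p' | tail -n 1
}

# Prints cold-start seconds divided by restore time (given in ms),
# rounded to one decimal place.
speedup() {
  local cold_s="$1" restore_ms="$2"
  awk -v c="$cold_s" -v r="$restore_ms" \
    'BEGIN { if (r <= 0) exit 1; printf "%.1f\n", c / (r / 1000) }'
}
```

For example, speedup 8.171 1009 prints 8.1.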



Conclusion​

In this post, we walked through how to leverage CRaC to accelerate the startup of a Java application running on Azure Kubernetes Service. By checkpointing a fully-initialized application and restoring it later, we can drastically reduce startup times, improving performance for both cold and warm starts in containerized environments. CRaC is a promising technology, especially in environments where fast application startup is critical, such as serverless platforms or microservices architectures.
By comparison, Spring Native is another way to improve startup performance. It compiles Spring applications into native binaries using GraalVM, offering extremely fast startup and low memory usage, which is ideal for short-lived, stateless services. CRaC maintains full JVM capabilities, while Spring Native may require code adjustments and has longer build times.



However, as a relatively new technology, CRaC has its own limitations. Spring Boot, Quarkus, and Micronaut currently support CRaC, but many third-party frameworks and libraries still need to be adapted for CRaC compatibility. CRaC also requires that the application close all open file handles before the checkpoint is captured; see docs/fd-policies.md in the CRaC/docs repository for details. Finally, the environment at restore time must closely match the environment in which the checkpoint was created.
We will continue to closely monitor these limitations and work alongside the community to improve its broader applicability.



We would also love to hear your thoughts on this technology. Your feedback will help us improve how Java runs on Azure. Feel free to share your thoughts in the comments section at the end of this article.
