Guest amvin87
Posted January 17, 2023

One of the new features introduced with SQL Server 2022 is the ability to connect to any S3-compatible object storage: SQL Server supports both backup/restore and data lake virtualization through PolyBase integration. In this blog, we will demonstrate both of these features for SQL Server 2022 containers running on Kubernetes. As usual, I will use Azure Kubernetes Service (AKS) as my Kubernetes environment :) . To learn more about these features, refer to SQL Server backup to URL for S3-compatible object storage - SQL Server | Microsoft Learn and Access external data: S3-compatible object storage - PolyBase - SQL Server | Microsoft Learn.

Before we enable PolyBase for SQL Server containers, let's go over the prerequisites for this demo.

Pre-requisites:

1. S3-compatible object storage: You need an S3-compatible object storage endpoint that you can connect to. For this blog, I configured a MinIO-based S3 object storage server on Ubuntu with TLS enabled. For details on how to configure this, refer to Deploy MinIO: Single-Node Single-Drive — MinIO Object Storage for Linux and Network Encryption (TLS) — MinIO Object Storage for Linux.

   MinIO setting details:
   - The MinIO console is configured at https://10.0.0.82:9001/ and the server endpoint at https://10.0.0.82:9000/.
   - On the MinIO server, I have opened ports 9000 and 9001 in the firewall.
   - The public certificate that I used to configure TLS on the MinIO server is called "public.crt", and I have downloaded it and saved it on my Windows client.

2. Customized SQL Server container image: Let's create the customized SQL Server container image that will be used to deploy the SQL Server containers on the Kubernetes cluster. Here are the steps for creating it; for this demonstration, I built the custom image on the same Ubuntu-based Azure VM that I used to configure the MinIO server.

Step 1: Create a Dockerfile with the content below.

```dockerfile
# Base image: latest SQL Server 2022 on Ubuntu
FROM mcr.microsoft.com/mssql/server:2022-latest

USER root

# Install prerequisites needed to fetch the repo config for SQL Server
RUN apt-get update && \
    apt-get install -y software-properties-common && \
    rm -rf /var/lib/apt/lists/*

# Register the SQL Server 2022 repository and install the PolyBase package
RUN add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/20.04/mssql-server-2022.list)" && \
    apt-get install -y mssql-server-polybase

EXPOSE 1433

USER mssql

# Run the SQL Server process
CMD ["/opt/mssql/bin/sqlservr"]
```

Step 2: Build the container image.

```
sudo docker build -t mssql-pb-update .
```

Step 3: Tag the image and push it to your container registry; in this case I am using an Azure Container Registry (ACR) that I already have.

```
sudo docker tag mssql-pb-update amvinacr.azurecr.io/sql-pb-update:latest
sudo docker push amvinacr.azurecr.io/sql-pb-update:latest
```

To learn more about ACR, refer to Tutorial - Quick container image build - Azure Container Registry.

3. AKS cluster creation and attaching it to the Azure Container Registry (ACR): In this blog, I am deploying SQL Server containers on an AKS cluster called "sqlk8s", which I have integrated with the ACR to which I pushed the image in the sample code above.
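If you are creating the cluster from scratch, the Azure CLI flow looks roughly like the sketch below. Note that this sketch is my own addition: the resource group name, region, and node count are placeholder assumptions, while the cluster name sqlk8s and the registry amvinacr are the ones used in this demo.

```
# Create a resource group (name and region are hypothetical placeholders)
az group create --name sqlk8sRG --location eastus

# Create the AKS cluster and attach the ACR so the cluster can pull images from it
az aks create --resource-group sqlk8sRG --name sqlk8s --node-count 2 --attach-acr amvinacr --generate-ssh-keys

# Merge the cluster credentials into the local kubeconfig so kubectl can connect
az aks get-credentials --resource-group sqlk8sRG --name sqlk8s
```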
For detailed steps on AKS cluster creation and attaching it to your choice of container registry in Azure, please refer to the following two articles: Quickstart: Deploy an AKS cluster by using Azure CLI - Azure Kubernetes Service | Microsoft Learn and Integrate Azure Container Registry with Azure Kubernetes Service - Azure Kubernetes Service | Microsoft Learn.

Let's get started

With the pre-requisites done, we can start deploying SQL Server containers on our AKS cluster using the custom image we created, which installs PolyBase. Before we deploy, we need two configmaps and one secret object in the AKS cluster, created as shown below. I am running all of these commands from my Windows client, which has kubectl installed and can connect to the AKS cluster 'sqlk8s' that I created in the previous steps.

Because the endpoint is validated against the certificate, the first configmap loads the public.crt certificate required to configure the secure channel between SQL Server and the MinIO storage. I have copied the certificate public.crt to the following location on my Windows machine: "C:\pbdemo\certs\public.crt". Use the command below to generate the configmap from that file.

```
# Create the configmap from the file "public.crt"
kubectl create configmap mssql-tls --from-file="C:\pbdemo\certs\public.crt"
```

You can verify that the configmap was created using the command below; sample output follows, with obfuscated lines.

```
C:\>kubectl describe configmap mssql-tls
Name:         mssql-tls
Namespace:    default
Labels:       <none>
Annotations:  <none>

Data
====
public.crt:
----
-----BEGIN CERTIFICATE-----
MIIDjjCCAnagAwIBAgIUf/RP+g+9uFAKD4TB0R2ot6jZMJAwDQYJKoZIhvcNAQEL
BQAwZjELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAlZBMRIwEAYDVQQHDAlTb21ld2hl
cmUxDjAMBgNVBAoMBU15T3JnMQ0wCwYDVQQLDARNeU9VMRcwFQYDVQQDDA5taW5p
b3NlcnZlcnVidTAeFw0yMzAxMDYxNTA0MjhaFw0zMzAxMDMxNTA0MjhaMGYxCzAJ
BgNVBAYTAlVTMQswCQYDVQQIDAJWQTESMBAGA1UEBwwJU29tZXdoZXJlMQ4wDAYD
VQQKDAVNeU9yZzENMAsGA1UECwwETXlPVTEXMBUGA1UEAwwObWluaW9zZXJ2ZXJ1
YnUwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCcC0irpVFJQaVqiuKP
LuRQUdYxehTvFCuCJLpGhS03XczCWKzemd0a3W/UC01N+tbFzpmCi0/l+R5SFsVC
3tRPKlngtDOICnYPpKSaWz03o3BgwNqtuKtucboXp3Qo+5DSd7XDi1qY+plY6LwC
vbjLtjNBfyWlwS6nhTjDebvUY6w0xNmG8w1H9fPZcq/gsSGUnPyvywEENYmk5R6X
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1J/rLOJUVeKRvpbLraN25VoEoS8hz07DT3gmtVyJgpEK1oJ+mEs94ae76ujr5MGm
M4HgV4+o9wxqjWdPyFDny/1EPsv30VqbcwmbkuPE7UHG4JH/5gxqxnXJcPnr3BQy
4GU=
-----END CERTIFICATE-----
```
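As a quick sanity check before wiring the certificate into SQL Server (this step is my own addition, not part of the original walkthrough), you can confirm that the MinIO endpoint actually presents this certificate. From a bash shell with openssl available:

```
# Print the certificate chain presented by the MinIO server endpoint;
# the subject should match the public.crt loaded into the configmap.
openssl s_client -connect 10.0.0.82:9000 -showcerts </dev/null
```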
The second configmap loads mssql.conf, which contains the PolyBase trace flag 13702 required for SQL Server on Linux. This configmap is generated from the YAML below.

Create a file called mssql_config.yaml and save it to "C:\pbdemo\certs\mssql_config.yaml":

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: mssql
data:
  mssql.conf: |
    [EULA]
    accepteula = Y
    accepteulaml = Y

    [coredump]
    captureminiandfull = true
    coredumptype = full

    [language]
    lcid = 1033

    [filelocation]
    defaultdatadir = /var/opt/mssql/userdata
    defaultlogdir = /var/opt/mssql/userlog

    [traceflag]
    traceflag0 = 13702
```

We are ready to create the configmap using the command shown below:

```
kubectl apply -f "C:\pbdemo\certs\mssql_config.yaml"
```

You can inspect the configmap using the command shown below, with sample output:

```
C:\>kubectl describe configmap mssql
Name:         mssql
Namespace:    default
Labels:       <none>
Annotations:  <none>

Data
====
mssql.conf:
----
[EULA]
accepteula = Y
accepteulaml = Y

[coredump]
captureminiandfull = true
coredumptype = full

[language]
lcid = 1033

[filelocation]
defaultdatadir = /var/opt/mssql/userdata
defaultlogdir = /var/opt/mssql/userlog

[traceflag]
traceflag0 = 13702

Events:  <none>
```

The secret object in the cluster stores the sa password for SQL Server; the command to create the secret is shown below.

```
kubectl create secret generic mssql --from-literal=MSSQL_SA_PASSWORD="MyC0m9l&xP@ssw0rd"
```
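As with the configmaps, you can confirm the secret was created (this verification step is my own addition). kubectl does not print the secret value, only its key and size in bytes:

```
kubectl describe secret mssql
```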
Deploy SQL Server Containers

We are now ready to deploy SQL Server containers. Below is the deployment YAML, called "sqldeploy.yaml", which deploys one SQL Server container as a StatefulSet using Azure disk as the storage class. With this YAML we deploy:

- One SQL Server instance, with separate persistent volumes (PVs) and persistent volume claims (PVCs) for the user database, user log, and tempdb files.
- The public.crt certificate, mounted into the folder /var/opt/mssql/security/ca-certificates inside the container.
- The mssql.conf file, with its various parameters and the required trace flag 13702.
- At the end, the external load balancer service that lets us connect to SQL Server.

Create sqldeploy.yaml at the location "C:\pbdemo\certs\sqldeploy.yaml" and paste in the content below; you can modify it according to your requirements.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azure-disk
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Standard_LRS
  kind: Managed
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mssql
  labels:
    app: mssql
spec:
  serviceName: "mssql"
  replicas: 1
  selector:
    matchLabels:
      app: mssql
  template:
    metadata:
      labels:
        app: mssql
    spec:
      securityContext:
        fsGroup: 10001
      containers:
        - name: mssql
          command:
            - /bin/bash
            - -c
            - cp /var/opt/config/mssql.conf /var/opt/mssql/mssql.conf && /opt/mssql/bin/sqlservr
          image: amvinacr.azurecr.io/sql-pb-update:latest
          ports:
            - containerPort: 1433
          env:
            - name: ACCEPT_EULA
              value: "Y"
            - name: MSSQL_ENABLE_HADR
              value: "1"
            - name: MSSQL_SA_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mssql
                  key: MSSQL_SA_PASSWORD
          volumeMounts:
            - name: mssql
              mountPath: "/var/opt/mssql"
            - name: userdata
              mountPath: "/var/opt/mssql/userdata"
            - name: userlog
              mountPath: "/var/opt/mssql/userlog"
            - name: tempdb
              mountPath: "/var/opt/mssql/tempdb"
            - name: mssql-config-volume
              mountPath: "/var/opt/config"
            - name: mssql-tls-configmap
              mountPath: "/var/opt/mssql/security/ca-certificates/"
      volumes:
        - name: mssql-config-volume
          configMap:
            name: mssql
        - name: mssql-tls-configmap
          configMap:
            name: mssql-tls
  volumeClaimTemplates:
    - metadata:
        name: mssql
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
    - metadata:
        name: userdata
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
    - metadata:
        name: userlog
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
    - metadata:
        name: tempdb
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
---
apiVersion: v1
kind: Service
metadata:
  name: mssql-0
spec:
  type: LoadBalancer
  selector:
    statefulset.kubernetes.io/pod-name: mssql-0
  ports:
    - protocol: TCP
      port: 1433
      targetPort: 1433
      name: tcpsql
```

You can run the command below to deploy the SQL Server container from sqldeploy.yaml:

```
kubectl apply -f "C:\pbdemo\certs\sqldeploy.yaml"
```

You can run kubectl get to see the deployed SQL Server container; a sample command and output are shown below for reference.

```
C:\>kubectl get all
NAME          READY   STATUS    RESTARTS   AGE
pod/mssql-0   1/1     Running   0          92m

NAME                 TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
service/kubernetes   ClusterIP      10.0.0.1      <none>        443/TCP          16h
service/mssql-0      LoadBalancer   10.0.252.15   20.235.55.7   1433:31551/TCP   13s

NAME                     READY   AGE
statefulset.apps/mssql   1/1     97m
```

Connect to SQL Server - Demo time

We are ready to connect to SQL Server. Feel free to use your favorite SQL Server tool, viz. SQL Server Management Studio (SSMS) or Azure Data Studio (ADS), to connect and run the commands below to virtualize data (accessing CSV or Parquet files) and/or take backups of databases to the S3-based object storage. I am using SSMS and running the queries below.

First, I connect to SQL Server and confirm that the TLS certificate public.crt has been loaded by SQL Server. I can check this in the errorlog by running sp_readerrorlog from a T-SQL query window; the sample output below shows the certificate loaded.

Backup/Restore to S3-based object storage:

To take S3-based backups, I first need to create a bucket on the storage. I connect to the MinIO portal and create a bucket called "sqltest" as shown below, and then create a credential to access the storage.
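If you prefer the command line over the MinIO console, the bucket can also be created with the MinIO client (mc). This is a sketch under my own assumptions: the alias name "local" is arbitrary, the access key amit and secret key mypass12345 match the credential used later in this post, and mc must trust the server's TLS certificate (for example, by copying public.crt into mc's CA certificates directory).

```
# Register the MinIO endpoint under an alias (the alias name "local" is arbitrary)
mc alias set local https://10.0.0.82:9000 amit mypass12345

# Create the bucket used for backups and external data
mc mb local/sqltest

# Verify the bucket exists
mc ls local
```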
Now, use the sample T-SQL script below to create the credential and run the backup; the backup should complete successfully as shown below.

```sql
CREATE CREDENTIAL [s3://10.0.0.82:9000/sqltest]
WITH IDENTITY = 'S3 Access Key',
SECRET = 'amit:mypass12345';

BACKUP DATABASE S3_backup_demo
TO URL = 's3://10.0.0.82:9000/sqltest/S3_backup_demo_db.bak'
WITH STATS = 10
```

You can view the backup file created on the MinIO console as shown below.

Data Virtualization - PolyBase

Here are the steps to configure data virtualization; the T-SQL sample script below walks through the following:

1. Verify the PolyBase feature is installed, then configure and enable it for SQL Server.
2. Create the database scoped credential to access the object storage.
3. Create the external data source and link it to the credential created in the preceding step.
4. Query the data from the external data source.

```sql
-- Verify the PolyBase feature is installed:
SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Enable the PolyBase feature using the commands below:
exec sp_configure @configname = 'polybase enabled', @configvalue = 1;
reconfigure with override
exec sp_configure @configname = 'polybase enabled'

-- Next, let's create a database and a database scoped credential to access the object storage.
create database pb_demo

USE pb_demo
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'mypass123@';
CREATE DATABASE SCOPED CREDENTIAL s3_dc
WITH IDENTITY = 'S3 Access Key',
SECRET = 'amit:mypass12345';

-- To verify that the credential was created, you can run the command below:
SELECT * FROM sys.database_scoped_credentials;

-- Now go ahead and create the external data source:
CREATE EXTERNAL DATA SOURCE s3_ds
WITH (LOCATION = 's3://10.0.0.82:9000/', CREDENTIAL = s3_dc)
```

When creating the external data source, if you see the error below, restart the container and run the same command again; that should resolve it.

```
Msg 46530, Level 16, State 11, Line 20
External data sources are not supported with type GENERIC.
```

We now have the credential and the external data source needed to query files on the object storage. To query a CSV file that is already present on the storage, run the command shown below.

```sql
SELECT * FROM OPENROWSET(
    BULK '/sqltest/test.csv',
    FORMAT = 'CSV',
    DATA_SOURCE = 's3_ds'
) WITH ( c1 varchar(50) ) AS [Test1]
```

To query a Parquet file available on the object storage, run the query below. I copied a sample Parquet file from kylo/samples/sample-data/parquet at master · Teradata/kylo to my MinIO object storage.

```sql
SELECT * FROM OPENROWSET(
    BULK '/sqltest/userdata1.parquet',
    FORMAT = 'PARQUET',
    DATA_SOURCE = 's3_ds'
) AS [Test12]
```

Note: You can also use CREATE EXTERNAL TABLE AS SELECT (CETAS) to access both Parquet and CSV files with SQL Server on Linux.

I hope you find this helpful for getting started with PolyBase and S3 storage integration for SQL Server 2022 containers. Special thanks to my colleagues from the PolyBase team for helping me with this demo. Happy learning!