Deploying availability groups for SQL Server with HPE Serviceguard for Linux - The Ansible Way!!

It's time for a new blog on how Ansible can simplify SQL Server deployment, configuration, and availability. If you've read my previous blogs on Ansible for SQL Server installation and configuration, and the Pacemaker-based Always On availability group, you know how powerful Ansible can be. Now, let's talk about HPE Serviceguard for Linux (SGLX), a high availability/disaster recovery solution that provides business continuity for critical applications like SQL Server.



Deploying SQL Server Always On availability groups on HPE SGLX is a fully supported solution for production workloads.



Today, let's look at how you can configure an Always On availability group on HPE SGLX via Ansible. We have collaborated with our friends at HPE to enable Ansible-based deployment of HPE SGLX with SQL Server. This feature is available starting with HPE SGLX 15.10. For this demonstration, you can download the evaluation bits from the 'My HPE Software Center'. The Ansible bits with the scripts are available on GitHub.



Let’s look at what’s required to get this going.



Prerequisites:

  1. Three virtual machines (VMs): two running SQL Server and one acting as the quorum server. I am using Azure VMs running Red Hat Enterprise Linux 8 (RHEL 8) with SQL Server installed. Here are the VMs I will use for this demo:
    • Rhel1test: This VM is also my Ansible controller node and will be the primary replica for the SQL Server Always On availability group.
    • Rhel2test: This VM is the secondary replica, configured as a synchronous replica in the Always On availability group.
    • Rhelqs: This VM is the quorum server, acting as the arbitration witness for the cluster.
  • Ensure that you have an OpenJDK Java runtime and Python 3 installed on all the VMs.
    1. You can install OpenJDK using the command:
      sudo yum install java-1.8.0-openjdk
    2. You can install Python 3 and point /usr/bin/python at it using the commands:
      sudo yum install python39
      sudo ln -sf /usr/bin/python3 /usr/bin/python
  • On the controller node, install Ansible. Ensure that you install Ansible version 2.8 or later. This needs to be done on the controller node only; in my case, the rhel1test VM is the controller node.
  • Also on the controller node, download the Ansible bits with the scripts from GitHub (a consolidated setup sketch follows this list).
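
Here is a minimal consolidated sketch of the prerequisite setup, assuming RHEL 8 with yum and that an Ansible package is reachable from your configured repositories (for example EPEL):

# on every node: Java runtime and Python 3
sudo yum install -y java-1.8.0-openjdk python39
sudo ln -sf /usr/bin/python3 /usr/bin/python

# on the controller node (rhel1test) only: Ansible 2.8 or later
sudo yum install -y ansible
ansible --version    # confirm the version is 2.8 or later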

That is all that is needed. Now you can go ahead, install SQL Server, and configure the AG as described below:



Installing SQL Server and configuring Always On availability group:


After creating the Azure VMs, ensure that you configure the logical volume mounts before installing SQL Server. You can do this via Cockpit or the command line, as described below. SQL Server data files must be placed on logical volume (LVM) mounts to work with HPE SGLX.



I used the Cockpit web console available with Red Hat Enterprise Linux 8 to configure the logical volume on a drive and mount it at “/var/opt/mssql”.


Do note that when you mount the LVM using this technique, you must edit /etc/fstab and comment out the entry for this mount, as mounting is taken care of by HPE SGLX. That is why Cockpit shows a warning informing you that after a reboot the directory /var/opt/mssql will not be automounted.
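
If you prefer to script the edit, a one-liner along these lines (a sketch that matches on the mount point rather than the UUID) comments the entry out:

# comment out any active fstab entry for /var/opt/mssql,
# since HPE SGLX manages this mount itself
sudo sed -i 's|^\([^#].*/var/opt/mssql.*\)$|#\1|' /etc/fstab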



Here is how the entry in /etc/fstab should look on all three nodes; notice the commented-out line for /var/opt/mssql.



[amvin@rhel1test ansible-sglx]$ cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Mon May 29 14:37:54 2023
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
/dev/mapper/rootvg-rootlv / xfs defaults 0 0
UUID=1e30dc3b-c8a7-4089-b325-70f5a357cc6d /boot xfs defaults 0 0
UUID=8BEA-2ABD /boot/efi vfat defaults,uid=0,gid=0,umask=077,shortname=winnt 0 2
/dev/mapper/rootvg-homelv /home xfs defaults 0 0
/dev/mapper/rootvg-tmplv /tmp xfs defaults 0 0
/dev/mapper/rootvg-usrlv /usr xfs defaults 0 0
/dev/mapper/rootvg-varlv /var xfs defaults 0 0
#UUID=4a4bc6e7-7fbc-42b7-80c5-cfa4d6ece557 /var/opt/mssql auto x-parent=yXuRwb-ecd0-YBG2-FxLl-VMyC-fQdC-lm4UHX 0 0
/dev/disk/cloud/azure_resource-part1 /mnt auto defaults,nofail,x-systemd.requires=cloud-init.service,_netdev,comment=cloudconfig 0 2










After you create the required LVMs, go ahead and restart, then confirm that the automount does not happen. You can then mount manually using the following commands; you need to run these on all the nodes (consolidated in the sketch after this list).

  • Confirm the mount /var/opt/mssql is not visible when you run the command: df -h
  • Activate the Volume Group sqlvg by running the command: vgchange -ay sqlvg
  • Then go ahead and mount using the command: mount /dev/mapper/sqlvg-sqllv /var/opt/mssql
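
The same steps as a copy-paste sketch, assuming the sqlvg/sqllv names used in this demo:

# run as root on every node after the reboot
vgchange -ay sqlvg                              # activate the volume group
mount /dev/mapper/sqlvg-sqllv /var/opt/mssql    # remount SQL Server's data path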

The mount point should now be visible when you run the command:



[root@rhel1test ansible-sglx]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 41M 7.7G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/mapper/rootvg-rootlv 2.0G 1.7G 324M 85% /
/dev/mapper/rootvg-usrlv 10G 2.7G 7.4G 27% /usr
/dev/mapper/rootvg-varlv 8.0G 907M 7.2G 12% /var
/dev/mapper/rootvg-homelv 1014M 356M 659M 36% /home
/dev/mapper/rootvg-tmplv 2.0G 47M 2.0G 3% /tmp
/dev/sda1 496M 162M 335M 33% /boot
/dev/sda15 495M 5.8M 489M 2% /boot/efi
/dev/sdb1 32G 49M 30G 1% /mnt
tmpfs 1.6G 0 1.6G 0% /run/user/1000
/dev/mapper/sqlvg-sqllv 40G 444M 40G 2% /var/opt/mssql







Configure the Storage Layout via Command Line:


The steps below are required only if you have not configured the storage via Cockpit and would like to configure it via the command line. You need to run these commands on all the nodes (see the consolidated sketch after this list):



  • Create an LVM volume group sqlvg, with a file system mounted at /var/opt/mssql, to install SQL Server on Linux.
  • Create a physical volume based on the block device that you want to use: pvcreate -f /dev/mapper/disk
  • Validate the physical volume created by running the command: pvs
  • Create a volume group based on the physical volume that we just created: vgcreate sqlvg /dev/mapper/disk
  • Validate the volume group created by running the command: vgs
  • Activate the volume group sqlvg by running the command: vgchange -ay sqlvg
  • Create a logical volume on sqlvg by running the command: lvcreate -n sqllv -l 100%FREE sqlvg
  • Create a file system on top of sqllv by running the command: mkfs -t xfs /dev/mapper/sqlvg-sqllv
  • Then go ahead and mount using the command: mount /dev/mapper/sqlvg-sqllv /var/opt/mssql
  • The mount point should now be visible when you run the command:
    [root@rhel1test ansible-sglx]# df -h
    Filesystem Size Used Avail Use% Mounted on
    devtmpfs 7.8G 0 7.8G 0% /dev
    tmpfs 7.8G 0 7.8G 0% /dev/shm
    tmpfs 7.8G 41M 7.7G 1% /run
    tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
    /dev/mapper/rootvg-rootlv 2.0G 1.7G 324M 85% /
    /dev/mapper/rootvg-usrlv 10G 2.7G 7.4G 27% /usr
    /dev/mapper/rootvg-varlv 8.0G 907M 7.2G 12% /var
    /dev/mapper/rootvg-homelv 1014M 356M 659M 36% /home
    /dev/mapper/rootvg-tmplv 2.0G 47M 2.0G 3% /tmp
    /dev/sda1 496M 162M 335M 33% /boot
    /dev/sda15 495M 5.8M 489M 2% /boot/efi
    /dev/sdb1 32G 49M 30G 1% /mnt
    tmpfs 1.6G 0 1.6G 0% /run/user/1000
    /dev/mapper/sqlvg-sqllv 40G 444M 40G 2% /var/opt/mssql
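
Here is the same sequence end to end, as a sketch you can adapt (/dev/mapper/disk is a placeholder for the block device you want to dedicate to SQL Server):

# run as root on every node
pvcreate -f /dev/mapper/disk                    # physical volume
pvs                                             # verify
vgcreate sqlvg /dev/mapper/disk                 # volume group
vgs                                             # verify
vgchange -ay sqlvg                              # activate the volume group
lvcreate -n sqllv -l 100%FREE sqlvg             # logical volume using all free space
mkfs -t xfs /dev/mapper/sqlvg-sqllv             # XFS file system
mkdir -p /var/opt/mssql
mount /dev/mapper/sqlvg-sqllv /var/opt/mssql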

Once you have the mount points configured, install SQL Server and then configure the Always On availability group as documented here.



Install the HPE quorum server on the rhelqs node:


Since rhelqs is my quorum VM, I copy the HPE SGLX evaluation bits ('DVD_HPE_SGLX_V15_SGLX_151000.iso', available here) to this server.

  1. First mount the ISO using the command:
    sudo mount -o loop DVD_HPE_SGLX_V15_SGLX_151000.iso /tmp/qs
    where /tmp/qs is a directory you need to create beforehand.
  2. Switch to the directory '/tmp/qs/RedHat/RedHat8/QuorumServer/x86_64'; this is the location of the quorum server RPM serviceguard-qs-A.15.10.00-0.el8.x86_64.rpm. Install the quorum server using the command:
    sudo yum install ./serviceguard-qs-A.15.10.00-0.el8.x86_64.rpm
    Once you install the quorum server, it's time to set up the authfile located at '/usr/local/qs/conf/' and named qs_authfile.
  3. Add a single entry that looks like the line below, using the text editor of your choice. Note the space between the two plus signs.
    + +
  4. Once edited, the file should look as shown below:
    [amvin@rhelqs x86_64]$ cat /usr/local/qs/conf/qs_authfile
    + +
  5. Start the quorum service and open port 1238 (the whole sequence is consolidated in the sketch after this list):
    # start the quorum service
    sudo systemctl start qs
    # also, on the quorum server, open port 1238 using the commands:
    sudo firewall-cmd --zone=public --add-port=1238/tcp --permanent
    sudo firewall-cmd --reload
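
For reference, here is the whole quorum setup as one sketch, assuming the ISO was copied to the current directory:

# run as root on rhelqs
mkdir -p /tmp/qs
mount -o loop DVD_HPE_SGLX_V15_SGLX_151000.iso /tmp/qs
cd /tmp/qs/RedHat/RedHat8/QuorumServer/x86_64
yum install -y ./serviceguard-qs-A.15.10.00-0.el8.x86_64.rpm
echo '+ +' >> /usr/local/qs/conf/qs_authfile    # allow all hosts (demo only; restrict in production)
systemctl start qs
firewall-cmd --zone=public --add-port=1238/tcp --permanent
firewall-cmd --reload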


Deploy the Always On availability group (AG) and HPE SGLX using Ansible:


  • Configure the hosts file that was copied with the Ansible scripts you downloaded on the controller node. In this demo, rhel1test is my controller node, and I downloaded the Ansible scripts from GitHub to '/home/amvin/serviceguard/ansible-sglx/'.

Go ahead and edit the hosts file as shown below; I am adding rhel1test and rhel2test as the nodes of the cluster.



[amvin@rhelvm1 ansible]$ cat hosts
[sap-app-hosts]

[sap-hana-hosts]
#sgxhost2
#sgxhost3

[oracle-si-hosts]

[oracle-dg-hosts]

[mssql-aofi-hosts]

[mssql-aoai-hosts]
rhel1test
rhel2test

[custom-app-hosts]

[sglx-storage-flex-add-on-hosts:children]
sap-app-hosts
sap-hana-hosts
oracle-si-hosts
mssql-aofi-hosts
custom-app-hosts

[sglx-cluster-hosts:children]
sap-app-hosts
sap-hana-hosts
oracle-si-hosts
oracle-dg-hosts
mssql-aofi-hosts
mssql-aoai-hosts
custom-app-hosts










  • Update 'all.yml' as shown below. Notice I am updating the following variables:
    1. Pointing iso_location to the location where DVD_HPE_SGLX_V15_SGLX_151000.iso was downloaded on the controller node; in my case that was /home/amvin/, so I set iso_location to /home/amvin/DVD_HPE_SGLX_V15_SGLX_151000.iso.
    2. Setting sglx_addon to mssql.
    3. Setting sglx_quorum_server to 'rhelqs'.
    4. Setting sglx_sgmgr_password: "P@ssW0rd1!". I am using plain text, but you can use the vault option if needed (see the sketch after the file listing).
      cat /opt/hpe/ansible/group_vars/all.yml
      #####################################################################
      # HPE Serviceguard for Linux 15.1099.0001 (Under development version)
      #####################################################################

      # vars file used by all
      #vault_sglx_sgmgr_password: !vault |
      #  $ANSIBLE_VAULT;1.1;AES256
      #  34363834323266326237363636613833396665333061653138623431626261343064373363656165
      #  6639383863383633643035656336336639373161323663380a303331306337396435366535313663
      #  31336636333862303462346234336138393135393363323739633661653534306162323565646561
      #  6662396366333534350a663033303862646331613765306433353632316435306630343761623237
      #  3863
      vault_esxi_password: !vault |
        $ANSIBLE_VAULT;1.1;AES256
        34633130346365643739373631333462653235653039376537636236353337326231613339643332
        3732616333646430313264383465626539623263303131630a373335353837666335366333643532
        63643166363833383662613665616235313234633331353161393235393237613634303734316538
        3865613764663631380a316531663231303237316532316334393761363863343436626365636638
        3961

      sglx_version: 15.00.00

      # values can be ('iso', 'repo')
      # iso - iso based installation
      # repo - HPE SDR based installation

      sglx_inst_upg_mode: iso

      sglx_inst_upg_additional_params:
        # not considered if mode is iso. To be filled only when mode is repo
        repo_params:
          repo_server: SDR_PATH
          repo_username: USERNAME
          repo_usertoken: USER_TOKEN
        iso_params:
          iso_location: /home/amvin/DVD_HPE_SGLX_V15_SGLX_151000.iso

      sglx_add_on_inst_upg_params:
      # Workload is defined as parameters
      # sap - SAP specific integration
      # oracle - Oracle specific integration
      # mssql - Microsoft SQL Server specific integration
      # DB2 - DB2 specific integration
      # nfs - NFS specific integration
      # custom - No workload integrations will be installed
      # kvm - KVM specific integration.
      # storage-plugins - storage replication supported integrations will be installed

      sglx_addon: mssql

      # Storage plugin install Array specific integrations supported by Serviceguard.
      sglx_storage_plugin: yes

      storage_flex_add_on:
        # value can be ('yes', 'no')
        install_upgrade: no
        # values can be ('iso', 'repo')
        install_upg_mode: iso
        iso_params:
          iso_location: nfs://palio2.in.rdlabs.hpecorp.net/root/SGLX_DRBD_12_80_00
        repo_params:
          repo_server: SDR_PATH
          repo_username: USERNAME
          repo_usertoken: USER_TOKEN

      sglx_add_on_rpms: storage_flex_add_on

      #values can be ('locklun' ,'qs')
      sglx_arbitation_method: qs

      # shortname of the hostname to be used as arbitration
      sglx_quorum_server: rhelqs

      # value can be ('yes', 'no')
      sglx_smart_quorum: no


      #sglx_sgmgr_password: "{{ vault_sglx_sgmgr_password }}"
      sglx_sgmgr_password: "P@ssW0rd1!"

      # values can ('vcenter', 'esxi')
      sglx_vm_mgmt_mode: vcenter
      sglx_vm_mgmt_options:
        vcenter_password: "12iso*help"
        esxi_options:
          - esxi_host: ESXi_host1
            esxi_password: "{{ vault_esxi_password }}"
            # hostname of the vms that are controlled by ESXi
            esxi_vm_nodes: []
          - esxi_host: ESXi_host2
            esxi_password: "{{ vault_esxi_password }}"
            # hostname of the vms that are controlled by ESXi
            esxi_vm_nodes: []

      #http(s)_proxy required for YUM
      proxy_env:
        https_proxy: "http://web-proxy.in.hpecorp.net:8080"
        http_proxy: "http://web-proxy.in.hpecorp.net:8080"

      domain: ""
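
If you'd rather not keep the password in plain text, ansible-vault can generate the encrypted block for you; a quick sketch:

ansible-vault encrypt_string 'P@ssW0rd1!' --name 'vault_sglx_sgmgr_password'
# paste the output into all.yml, reference it as
#   sglx_sgmgr_password: "{{ vault_sglx_sgmgr_password }}"
# and run the playbook with --ask-vault-pass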







  • Now, finally, update mssql-aoai-hosts.yml as shown below (a quick connectivity check follows the file listing). In this file, I set the following variables:
    1. mssql_db_sid: "testdb". This is the name of the database (or databases) added to the AG.
    2. mssql_db_write_ip_subnet: "10.0.0.0". This is the subnet of the Always On listener that was previously configured.
    3. mssql_db_write_ip: "10.0.0.94". This is the IP address on which the Always On listener is listening.
    4. mssql_ag: "ag1". This is the name of the AG that was configured previously.
    5. mssql_db_user_password: "myP@ssw0rdforsql". This is the sa password used to connect to SQL Server; please use the vault option to avoid keeping the password in plain text.
      #####################################################################
      # HPE Serviceguard for Linux 15.10.00
      #####################################################################

      # Current encoded vault password is for "VAULT-PASSWORD"

      vault_mssql_db_user_password: !vault |
        $ANSIBLE_VAULT;1.1;AES256
        36343461353366353337363736393139386366613232333466303864353035613166323234626134
        6132353730333363373236363664316431626561646161320a633635616133386635316164616130
        63386135393837343534356431613937663166663130343763336362326432316338363263326564
        3362646466353338640a393837623161626361653130633266323562393430363565376139353565
        3233

      # Will be used only when sites_required is set to yes
      sites_config:
        - name: "primary"
          nodes: [ ]
        - name: "secondary"
          nodes: [ ]

      mssql_db_sid: "testdb"
      mssql_db_write_ip_subnet: "10.0.0.0"
      mssql_db_write_ip: "10.0.0.94"
      mssql_ag: "ag1"

      #mssql_db_user_password: "{{ vault_mssql_db_user_password }}"
      mssql_db_user_password: "myP@ssw0rdforsql"
      #
      #values can be 'yes' or 'no'
      mssql_sites_required: no




You are now ready to deploy Serviceguard and create the resources for the SQL Server AG. The command to run is:



ansible-playbook -l mssql-aoai-hosts -i hosts site.yml -vvv







This will install and configure HPE SGLX and deploy the SQL Server packages into the context of Serviceguard. The command takes about 5-8 minutes to finish, after which you should have a SQL Server AG configured for high availability, with fully automatic failover via HPE Serviceguard for Linux.

Switch to the superuser and view the cluster as shown below:



[root@rhel2test amvin]# cmviewcl

CLUSTER STATUS
rhel1test_cluster up

NODE STATUS STATE
rhel1test up running

PACKAGE STATUS STATE AUTO_RUN NODE
AOAI1_ag1_WRITE_PKG1 up running enabled rhel1test

NODE STATUS STATE
rhel2test up running

MULTI_NODE_PACKAGES
PACKAGE STATUS STATE AUTO_RUN SYSTEM
Microsoft_SQL_Server up running enabled no
AOAI1_ag1_PKG1 up running enabled no







Test automatic failover:

  1. View the cluster using the command below. Notice that rhel1test is the primary; as part of our failover test, we will stop the cluster service on rhel1test. Also notice that the quorum server is not shown as part of the cluster, and one quorum node can serve multiple clusters.
    [root@rhel1test ansible-sglx]# cmviewcl
    CLUSTER STATUS
    rhel1test_cluster up

    NODE STATUS STATE
    rhel1test up running

    PACKAGE STATUS STATE AUTO_RUN NODE
    AOAI1_ag1_WRITE_PKG1 up running enabled rhel1test

    NODE STATUS STATE
    rhel2test up running

    MULTI_NODE_PACKAGES
    PACKAGE STATUS STATE AUTO_RUN SYSTEM
    Microsoft_SQL_Server up running enabled no
    AOAI1_ag1_PKG1 up running enabled no
  2. Ensure you are on node rhel1test, the current primary, and stop the cluster service using the command cmhaltnode with -f (force), as shown below:
    [root@rhel1test ansible-sglx]# cmhaltnode -f
    Disabling all packages from starting on nodes to be halted.
    Warning: Do not modify or enable packages until the halt operation is completed.
    Disabling automatic failover for failover packages to be halted.
    Halting package AOAI1_ag1_WRITE_PKG1
    Successfully halted package AOAI1_ag1_WRITE_PKG1
    Halting package AOAI1_ag1_PKG1 on node rhel1test
    Successfully halted package AOAI1_ag1_PKG1 on node rhel1test
    Halting package Microsoft_SQL_Server on node rhel1test
    Successfully halted package Microsoft_SQL_Server on node rhel1test
    Waiting for nodes to halt ..... done
    Successfully halted all nodes specified.
    Halt operation complete.
  3. The cluster service is now stopped on rhel1test, so the packages are no longer running on both nodes but only on rhel2test. You will also notice that the write package for the AG is now on rhel2test, which is the current primary; it failed over after forming quorum with the quorum server.
    [root@rhel1test ansible-sglx]# cmviewcl
    CLUSTER STATUS
    rhel1test_cluster up

    NODE STATUS STATE
    rhel1test down halted
    rhel2test up running

    PACKAGE STATUS STATE AUTO_RUN NODE
    AOAI1_ag1_WRITE_PKG1 up running enabled rhel2test

    MULTI_NODE_PACKAGES
    PACKAGE STATUS STATE AUTO_RUN SYSTEM
    Microsoft_SQL_Server up (1/2) running enabled no
    AOAI1_ag1_PKG1 up (1/2) running enabled no
  4. On the node rhel1test, the SQL Server mount /var/opt/mssql is now unmounted; HPE SGLX unmounts it to ensure that this node does not become the primary.
    [root@rhel1test ansible-sglx]# df -h
    Filesystem Size Used Avail Use% Mounted on
    devtmpfs 7.8G 0 7.8G 0% /dev
    tmpfs 7.8G 0 7.8G 0% /dev/shm
    tmpfs 7.8G 41M 7.7G 1% /run
    tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
    /dev/mapper/rootvg-rootlv 2.0G 1.7G 324M 85% /
    /dev/mapper/rootvg-usrlv 10G 2.7G 7.4G 27% /usr
    /dev/mapper/rootvg-varlv 8.0G 913M 7.1G 12% /var
    /dev/mapper/rootvg-homelv 1014M 356M 659M 36% /home
    /dev/mapper/rootvg-tmplv 2.0G 47M 2.0G 3% /tmp
    /dev/sda1 496M 162M 335M 33% /boot
    /dev/sda15 495M 5.8M 489M 2% /boot/efi
    /dev/sdb1 32G 49M 30G 1% /mnt
    tmpfs 1.6G 0 1.6G 0% /run/user/1000
  5. Now bring the node back up on rhel1test using the command cmrunnode. Once the node is up, it also brings the SQL Server service up, and you can view the cluster: the node joins as the secondary, and notice that the SQL mount point is automatically mounted by HPE SGLX.
    [root@rhel1test ansible-sglx]# cmrunnode
    cmrunnode: Validating network configuration...
    cmrunnode: Network validation complete
    Checking for license.........
    Waiting for nodes to join .... done
    Cluster successfully formed.
    Check the syslog files on all nodes in the cluster to verify that no warnings occurred during startup.

    [root@rhel1test ansible-sglx]# cmviewcl

    CLUSTER STATUS
    rhel1test_cluster up

    NODE STATUS STATE
    rhel1test up running
    rhel2test up running

    PACKAGE STATUS STATE AUTO_RUN NODE
    AOAI1_ag1_WRITE_PKG1 up running enabled rhel2test

    MULTI_NODE_PACKAGES

    PACKAGE STATUS STATE AUTO_RUN SYSTEM
    Microsoft_SQL_Server up running enabled no
    AOAI1_ag1_PKG1 up running enabled no


    Once the node is up, the mount volume /var/opt/mssql is also up and running as shown below:

    [root@rhel1test ansible-sglx]# df -h
    Filesystem Size Used Avail Use% Mounted on
    devtmpfs 7.8G 0 7.8G 0% /dev
    tmpfs 7.8G 0 7.8G 0% /dev/shm
    tmpfs 7.8G 41M 7.7G 1% /run
    tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
    /dev/mapper/rootvg-rootlv 2.0G 1.7G 324M 85% /
    /dev/mapper/rootvg-usrlv 10G 2.7G 7.4G 27% /usr
    /dev/mapper/rootvg-varlv 8.0G 917M 7.1G 12% /var
    /dev/mapper/rootvg-homelv 1014M 356M 659M 36% /home
    /dev/mapper/rootvg-tmplv 2.0G 47M 2.0G 3% /tmp
    /dev/sda1 496M 162M 335M 33% /boot
    /dev/sda15 495M 5.8M 489M 2% /boot/efi
    /dev/sdb1 32G 49M 30G 1% /mnt
    tmpfs 1.6G 0 1.6G 0% /run/user/1000
    /dev/mapper/sqlvg-sqllv 40G 445M 40G 2% /var/opt/mssql

    That's it! I hope you give this a try; let us know if you have any feedback or improvements you'd like to see.
