I
Infant_Arockiaraj
Based on our extensive experience with customers using SQL on Linux, we have compiled a guide outlining fundamental troubleshooting steps and available tools to assist in resolving SQL on Linux issues. This guide aims to make it easier for SQL DBAs who have primarily worked on Windows operating systems over the years.
In this article, we will normalize three checks in Linux that we typically perform in Windows to start troubleshooting issues with SQL Server:
Note: The screenshots in the below examples are taken from RHEL and Ubuntu machines, and while the Linux flavors are different, the commands are similar in both environments.
System logs:
It contains the log messages that the system processes and applications, and these messages are written as plain text log files.
The rsyslog service keeps various log files in the
In RHEL and SLES, they are called messages, while in Ubuntu, you would find them as syslog.
RHEL and SLES: The primary system log file is located at
Ubuntu: The main system log file is
Log files and journals are crucial to a system administrator's work. They reveal a great deal of information about a system and are instrumental during troubleshooting and auditing.
For example, in RHEL, to display boot and other kernel messages, view
Use
Check the
Similarly in Ubuntu, we have
What does the syslog or messages file contain? how do we read them?
These log files contain events and messages generated by the kernel, applications, and users that log into the system.
The logs are written in the below format.
Since the message/syslog could contain thousands/millions of lines as log entries, using 'cat' command alone might not be the right choice at times. I suggest using grep for filtering text through the files.
Using tail -f command allows you to read the current log file in real time. You may combine it with 'grep' to filter on desired text.
An alternate method to validate the system events is via the journald
Systemd-journald :
journald is a component of systemd responsible for handling logging. It captures logs, records them, and makes them easy to find. Unlike traditional syslog implementations, journald offers features like structured logging, indexing for fast search, access control, and signed messages.
The systemd-journald service does not keep separate files, as rsyslog does. The idea is to avoid checking different files for issues. Systemd-journald saves the events and messages in a binary format that cannot be read with a text editor. You can query the journal with the
journald stores logs in memory (RAM) without persistent storage (by default), while the Traditional syslog (e.g., /var/log/messages) persists log data to flat files.
To show all event messages, use:
To view journal entries for today, use:
To check for messages related to the sql server service for the past hour, you can run:
(Keep in mind that you will require superuser permissions to traverse these folders and logs.)
This is also the default location for XEvents, dump files and trace files.
Consider the following scenario: We receive an alert indicating that the application is unable to connect to SQL following a maintenance activity over the weekend. We log into the server to verify if SQL Server is operational.
Upon inspection, we find that the SQL process is not running. We then check the status and attempt to start the SQL Server to see if it comes online.
On restarting SQL Server, we observe the following information in the error logs. Next, we need to examine the system event logs, specifically the
By validating the
From the logs, we identify a permission issue on a specific folder or file that is preventing SQL Server from starting. Granting the necessary permissions should resolve the issue.
Where is the Task Manager in Linux? There are command-line utilities that provide similar information to what we see in Windows. We will explore the
The first line of numbers on the dashboard includes the time, how long your computer has been running, the number of people logged in, and what the load average has been for the past one, five, and 15 minutes. The second line shows the number of tasks and their states: running, stopped, sleeping, or zombie.
The next 3 lines describe CPU, Memory utilization, Swap Memory of the server. This article describes the output in detail.
The column details of the process are tabulated as below.
To get into the details of a particular process, (In our case SQL Server) we can use the PID to get the further details of the SQL and the tasks that SQL is currently running.
To get the child PID of SQLServer, use this short command and run the top command on the pid.
We can also get them into single command as shown below,
We see the utilization of various tasks within SQL Server, that are consuming the resources.
Another utility that provides similar information is
While
References:
SLES: https://documentation.suse.com/sles/12-SP5/html/SLES-all/cha-util.html#
RHEL:https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/performance_tuning_guide/sect-red_hat_enterprise_linux-performance_tuning_guide-performance_monitoring_tools-built_in_command_line_tools
Ubuntu: https://manpages.ubuntu.com/manpages/jammy/man1/top.1.html#2.%20summary%20display
I hope this serves as a helpful introduction to basic troubleshooting for SQL on Linux.
Continue reading...
In this article, we will normalize three checks in Linux that we typically perform in Windows to start troubleshooting issues with SQL Server:
- System logs
- SQL Server logs
- Task Manager
Note: The screenshots in the below examples are taken from RHEL and Ubuntu machines, and while the Linux flavors are different, the commands are similar in both environments.
System logs:
It contains the log messages that the system processes and applications, and these messages are written as plain text log files.
The rsyslog service keeps various log files in the
/var/log
directory. You can open these files using native commands such as tail
,head
,more
,less
,cat
, and so forth, depending on what you are looking for.In RHEL and SLES, they are called messages, while in Ubuntu, you would find them as syslog.
RHEL and SLES: The primary system log file is located at
/var/log/messages
. Ubuntu: The main system log file is
/var/log/syslog
.Log files and journals are crucial to a system administrator's work. They reveal a great deal of information about a system and are instrumental during troubleshooting and auditing.
For example, in RHEL, to display boot and other kernel messages, view
/var/log/messages
:[server]$ cat /var/log/messages
Use
grep
and other filtering tools to gather more specific events from a file. You can also use tail
to view files as they are updated.[server]$ tail -f /var/log/messages
Check the
/var/log/secure
file to view users and their activities:[server]$ tail -f /var/log/secure
Similarly in Ubuntu, we have
/var/log/syslog
What does the syslog or messages file contain? how do we read them?
These log files contain events and messages generated by the kernel, applications, and users that log into the system.
The logs are written in the below format.
The timestamp indicates the time when a log entry was created in the format MMM dd HH:mm:ss. Notice that this format does not include a year.
Hostname is the host or system that originally create the message.
Application is the application that created the message.
Message contains the actual details of an event.
Since the message/syslog could contain thousands/millions of lines as log entries, using 'cat' command alone might not be the right choice at times. I suggest using grep for filtering text through the files.
Using tail -f command allows you to read the current log file in real time. You may combine it with 'grep' to filter on desired text.
An alternate method to validate the system events is via the journald
Systemd-journald :
journald is a component of systemd responsible for handling logging. It captures logs, records them, and makes them easy to find. Unlike traditional syslog implementations, journald offers features like structured logging, indexing for fast search, access control, and signed messages.
The systemd-journald service does not keep separate files, as rsyslog does. The idea is to avoid checking different files for issues. Systemd-journald saves the events and messages in a binary format that cannot be read with a text editor. You can query the journal with the
journalctl
command.journald stores logs in memory (RAM) without persistent storage (by default), while the Traditional syslog (e.g., /var/log/messages) persists log data to flat files.
To show all event messages, use:
[server]$ journalctl
To view journal entries for today, use:
[server]$ journalctl --since today
To check for messages related to the sql server service for the past hour, you can run:
journalctl --unit mssql-server.service --since "1 hour ago"
SQL Server Error logs:
The error log contains informational messages, warnings, and information about critical events. The error log also contains information about user-generated messages and auditing information such as logon events (success and failure).In Linux the default SQL Errorlog location is/var/opt/mssql/log
(Keep in mind that you will require superuser permissions to traverse these folders and logs.)
This is also the default location for XEvents, dump files and trace files.
Comparing Error and System Logs Output
You can use both the SQL Server error log and the system logs to identify the cause of problems. For example, while monitoring the SQL Server error log, you may encounter error messages that do not contain cause information. By comparing the dates and times for events between these logs, you can narrow the list of probable causes.Consider the following scenario: We receive an alert indicating that the application is unable to connect to SQL following a maintenance activity over the weekend. We log into the server to verify if SQL Server is operational.
Upon inspection, we find that the SQL process is not running. We then check the status and attempt to start the SQL Server to see if it comes online.
On restarting SQL Server, we observe the following information in the error logs. Next, we need to examine the system event logs, specifically the
messages
file in RHEL & SLES or the syslog
file in Ubuntu, where events are recorded.We can use the following command to retrieve service logs from the recent boot:journalctl --unit mssql-server.service --boot --no-pager
By validating the
messages
or syslog
file within the same timeframe, we gather additional information.From the logs, we identify a permission issue on a specific folder or file that is preventing SQL Server from starting. Granting the necessary permissions should resolve the issue.
Task Manager:
Where is the Task Manager in Linux? There are command-line utilities that provide similar information to what we see in Windows. We will explore the
top
and htop
utilitiestop
The first line of numbers on the dashboard includes the time, how long your computer has been running, the number of people logged in, and what the load average has been for the past one, five, and 15 minutes. The second line shows the number of tasks and their states: running, stopped, sleeping, or zombie.
The next 3 lines describe CPU, Memory utilization, Swap Memory of the server. This article describes the output in detail.
The column details of the process are tabulated as below.
To get into the details of a particular process, (In our case SQL Server) we can use the PID to get the further details of the SQL and the tasks that SQL is currently running.
To get the child PID of SQLServer, use this short command and run the top command on the pid.
pidof sqlservr | cut -d' ' -f1
top -p (pid output of the above command)
We can also get them into single command as shown below,
top -p $(pidof sqlservr | cut -d' ' -f1)
We see the utilization of various tasks within SQL Server, that are consuming the resources.
Another utility that provides similar information is
htop
. The key difference is that htop
offers a more user-friendly experience with its use of colors and graphs, compared to the top
command.While
top
and htop
provide valuable information, there are additional command-line tools such as vmstat
, System Activity Information (sar
), iostat
, and others. I suggest running these commands on your test machines and monitoring the output to become familiar with checking performance metrics on a Linux server.References:
SLES: https://documentation.suse.com/sles/12-SP5/html/SLES-all/cha-util.html#
RHEL:https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/performance_tuning_guide/sect-red_hat_enterprise_linux-performance_tuning_guide-performance_monitoring_tools-built_in_command_line_tools
Ubuntu: https://manpages.ubuntu.com/manpages/jammy/man1/top.1.html#2.%20summary%20display
I hope this serves as a helpful introduction to basic troubleshooting for SQL on Linux.
Continue reading...