Because of the potential impact on performance and storage costs, Azure Databricks clusters don't capture networking logs by default. Follow the instructions below if you need a tcpdump capture to investigate networking issues on a cluster. The capture runs for as long as the cluster is up. Note that the init script shown here checks DB_IS_DRIVER and therefore captures on the driver node only; remove that check to capture on the worker nodes as well.

IMPORTANT: Remove the tcpdump init script from the cluster once you have collected the capture, to avoid the performance impact and additional cost.

Configure a cluster-scoped init script for the cluster(s) in question

  1. Run the notebook cell shown in Step 1 on the cluster to create the cluster-scoped init script.
  2. The cell writes the script to dbfs:/databricks/init-scripts/tcpdump_pypi_repo.sh.
  3. On the cluster configuration page, open the Advanced Options toggle, select the Init Scripts tab, add the init script location, and enable a cluster logging path.
  4. Restart the cluster.
  5. Run the job on that cluster. The TCP dump is written to /dbfs/databricks/tcpdump/${DB_CLUSTER_ID}.

 

Step 1: Run the following cell in a notebook attached to the cluster to create the cluster-scoped init script.

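%scala

// Writes the init script to DBFS; the final `true` overwrites an existing file.
// The shebang must be the first line of the script, so it follows the opening quotes directly.
dbutils.fs.put("dbfs:/databricks/init-scripts/tcpdump_pypi_repo.sh", """#!/bin/bash
sleep 5s
# Trace each command in the init script log.
set -x

# Capture on the driver only; remove this check (and the matching fi) to capture on workers too.
if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  # hostname -I prints a trailing space; strip it.
  MYIP=$(hostname -I | sed 's/ *$//g')
  echo "initiating tcp dump"
  TCPDUMP_FILE="/tmp/trace_$(date +"%Y%m%d_%H%M")_${MYIP}.pcap"

  # -s256 keeps the first 256 bytes of each packet, -n skips name resolution,
  # -K skips checksum verification. -G 1800 rotates every 30 minutes; the file
  # name has no strftime pattern, so each rotation overwrites the same file.
  sudo tcpdump -w "$TCPDUMP_FILE" -W 1000 -G 1800 -K -n -s256 host files.pythonhosted.org and port 443 &
  sleep 15s
  echo "initiated tcp dump $(ls -ltrh "$TCPDUMP_FILE")"

  # Helper that copies the capture to DBFS whenever the file has grown.
  cat <<'EOF' > /tmp/copy_stats.sh
#!/bin/bash
SOURCE_FILE=$1
# The cluster ID is the first three dash-separated fields of the hostname.
DB_CLUSTER_ID=$(echo "$HOSTNAME" | awk -F '-' '{print $1"-"$2"-"$3}')

BASEDIR="/dbfs/databricks/tcpdump/${DB_CLUSTER_ID}"
#BASEDIR="/local_disk0/tcpdump/${DB_CLUSTER_ID}"
sudo mkdir -p "${BASEDIR}"

FILESIZE=0
while true; do
  CUR_FILESIZE=$(stat -c%s "$SOURCE_FILE")
  # Copy only when the capture has grown since the last check.
  if [ "$CUR_FILESIZE" -gt "$FILESIZE" ]; then
    sudo cp -f "$SOURCE_FILE" "${BASEDIR}/."
  fi
  FILESIZE=$CUR_FILESIZE
  sleep 1m
done
EOF

  chmod a+x /tmp/copy_stats.sh
  /tmp/copy_stats.sh "$TCPDUMP_FILE" &>/tmp/copy_stats.log & disown
fi
""", true)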
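The copy helper inside the init script builds the DBFS folder name from the node hostname rather than from an environment variable. The following is a minimal sketch of that derivation on its own, assuming the hostname starts with the three dash-separated parts of the cluster ID. The function name and the sample hostname are made up for illustration.

```shell
# Hypothetical helper mirroring the awk expression used in copy_stats.sh.
cluster_id_from_hostname() {
  # Keep the first three dash-separated fields of the hostname.
  echo "$1" | awk -F '-' '{print $1"-"$2"-"$3}'
}

# Sample hostname (made up); on a cluster node the script reads $HOSTNAME instead.
cluster_id_from_hostname "0420-101112-abc123-10-139-64-5"
# prints: 0420-101112-abc123
```

If the hostname on your nodes follows a different pattern, the capture will end up in a folder that does not match the cluster ID, so it is worth checking the output of this expression once.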

 

 

 

 

 

Step 2: After the cell runs, the init script is available at dbfs:/databricks/init-scripts/tcpdump_pypi_repo.sh.
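Before attaching the script to the cluster, it can help to confirm that the file really landed in DBFS. This is a small sketch, not part of the original procedure: the helper name check_init_script is hypothetical, and it assumes you run it on the cluster (for example in a %sh cell), where DBFS is mounted under /dbfs.

```shell
# Hypothetical helper: confirm the init script exists and contains the capture command.
check_init_script() {
  script_path="$1"
  # -s is true only for a file that exists and is not empty.
  if [ ! -s "$script_path" ]; then
    echo "missing or empty: $script_path"
    return 1
  fi
  if ! grep -q 'tcpdump -w' "$script_path"; then
    echo "no tcpdump command found in: $script_path"
    return 1
  fi
  echo "ok: $script_path"
}

# On the cluster:
# check_init_script /dbfs/databricks/init-scripts/tcpdump_pypi_repo.sh
```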

 

 


Step 3: On the cluster configuration page, open the Advanced Options toggle, select the Init Scripts tab, and add the init script location.

 

 


Enable cluster logging path.
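If you manage clusters through the Clusters API instead of the UI, the same two settings correspond to the init_scripts and cluster_log_conf fields of the cluster specification. The sketch below only prints that JSON fragment. The log destination dbfs:/cluster-logs is a placeholder I chose, not a value from this post, so substitute your own path.

```shell
# Init script location from Step 2.
INIT_SCRIPT="dbfs:/databricks/init-scripts/tcpdump_pypi_repo.sh"
# Hypothetical log destination; pick any DBFS path you prefer.
LOG_DESTINATION="dbfs:/cluster-logs"

# Print the two settings as a cluster-spec JSON fragment.
cluster_settings_json() {
  cat <<EOF
{
  "init_scripts": [
    { "dbfs": { "destination": "${INIT_SCRIPT}" } }
  ],
  "cluster_log_conf": {
    "dbfs": { "destination": "${LOG_DESTINATION}" }
  }
}
EOF
}

cluster_settings_json
```

The fragment would be merged into the rest of the cluster specification; on its own it is not a complete request body.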

 


Step 4: Restart the cluster.

 

 


Step 5: Run the job on that cluster. The TCP dump is written to /dbfs/databricks/tcpdump/${DB_CLUSTER_ID}.

 

 


Download the tcpdump capture to your local machine as follows.

 

 

Step 1: Install and configure the Databricks CLI on your local computer.

 

 


 

 

Step 2: Configure the Databricks CLI with a personal access token: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/auth#--azure-databricks-personal-access-tokens-for-workspace-users

 

 


 

 

Step 3: Check the DBFS folder structure in the workspace and download the .pcap file locally using the CLI's fs cp command.
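The three download steps can be sketched as shell commands. This assumes the legacy Databricks CLI, installed with pip install databricks-cli and configured interactively with databricks configure --token (it prompts for the workspace URL and the token). The helper names list_captures and download_capture are hypothetical wrappers; the cluster ID and file name you pass in are your own.

```shell
# Hypothetical wrappers around the legacy Databricks CLI `fs` commands.

# List the capture files written for one cluster.
list_captures() {
  # $1: cluster ID (the folder name under dbfs:/databricks/tcpdump)
  databricks fs ls "dbfs:/databricks/tcpdump/$1"
}

# Copy one capture file from DBFS to a local directory.
download_capture() {
  # $1: cluster ID, $2: .pcap file name, $3: local destination directory
  databricks fs cp "dbfs:/databricks/tcpdump/$1/$2" "$3/$2"
}

# Example calls (placeholders, not real values):
# list_captures <cluster-id>
# download_capture <cluster-id> <file>.pcap .
```

Once the file is local, it can be opened in Wireshark or read with tcpdump -n -r <file>.pcap.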

 

 


 

 

 

NOTE:

You can replace the “host” filter to suit your requirements. For example, if you are testing connectivity from the cluster to your Azure SQL server, replace the host with the IP address of the Azure SQL DB:

sudo tcpdump -w /tmp/trace_%Y_%m_%d_%H_%M_%S_${MYIP}.pcap -W 1000 -G 1800 -K -n host <IPAddress of the Azure SQLDB>
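To make swapping the target easier, the capture filter can be built from a host and an optional port. This is only a sketch: capture_filter is a hypothetical helper, 10.0.0.4 is a made-up address, and 1433 is the default SQL Server port, which may not match your setup.

```shell
# Hypothetical helper: build a tcpdump filter expression for one host.
capture_filter() {
  # $1: host name or IP address; $2: optional port
  if [ -n "${2:-}" ]; then
    echo "host $1 and port $2"
  else
    echo "host $1"
  fi
}

capture_filter 10.0.0.4 1433
# prints: host 10.0.0.4 and port 1433
```

In the init script, the result would replace the "host files.pythonhosted.org and port 443" part of the tcpdump command line.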

 

 

 

Use cluster-scoped init scripts - Azure Databricks

 

https://kb.databricks.com/en_US/dev-tools/use-tcpdump-create-pcap-files?_ga=2.234523927.800705506.1682010247-1215488715.1655150820

 

https://learn.microsoft.com/en-us/azure/databricks/dev-tools/cli/

 

https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/vnet-inject#--troubleshooting

 
