
Synapse Connectivity Series Part #3 - Synapse Managed VNET and Managed Private Endpoints



Guest FonsecaSergio

This is part 3 of a series related to Synapse Connectivity - check out the previous blog articles:

 

 

 

 

In this article we are going to talk about Synapse Managed Virtual Network and Managed Private Endpoints.

 

1 - Synapse Managed VNET and Data Exfiltration

 

2 - Managed Private Endpoints

 

3 - Synapse Managed VNET flavors

 

  • 3.1 - Option 1 - Synapse with NO VNET
  • 3.2 - Option 2 - Synapse with Managed VNET
  • 3.3 - Option 3 - Synapse with Managed VNET + DEP (Data Exfiltration Protection)
    • 3.3.1 - Alternative to SHIR VM

 

4 - Troubleshooting

 

 

 

1 - Synapse Managed VNET and Data Exfiltration

 

 

When you create your Azure Synapse workspace, you can choose to associate it with an Azure Virtual Network. The Virtual Network associated with your workspace is managed by Azure Synapse. This Virtual Network is called a Managed Workspace Virtual Network or Synapse Managed VNET.

 

 

 

 

Synapse with Managed VNET supports enabling Data Exfiltration Protection (DEP) for workspaces. With exfiltration protection, you can guard against malicious insiders accessing your Azure resources and exfiltrating sensitive data to locations outside of your organization’s scope. At the time of workspace creation, you can choose to configure the workspace with a managed virtual network and additional protection against data exfiltration.

 

  • Important note: You can only select these options (Managed VNET and DEP) during workspace creation, and you cannot change them afterwards.

 

 

 

mediumvv2px400.png.941aaa1c6102e663e231cdbd2e368ed1.png

 

 

 

 

 

Azure Synapse provides various analytic capabilities in a workspace:

 

  • Data integration (ADF)
  • Serverless Apache Spark pool
  • Dedicated SQL pool
  • Serverless SQL pool

 

 

 

If your workspace has a Managed VNET, the ADF Azure Integration Runtime (Azure IR) and Spark resources are deployed in the VNET. This means that when an Azure IR or Spark VM is created or started for an execution, it will get a private IP from this managed VNET and will comply with the rules of this managed VNET. If you have selected Data Exfiltration Protection, you cannot go out to ANY public endpoint. (More details below)

 

 

 

Dedicated SQL pool and serverless SQL pool are multi-tenant and therefore reside outside of the Managed workspace Virtual Network. Intra-workspace communication from ADF/Spark to dedicated SQL pool and serverless SQL pool uses Managed Private Endpoints. These private endpoints are automatically created for you when you create a workspace with a Managed VNET associated with it.

 

 

 

largevv2px999.png.240d1e6dee840d969804825a63bc9754.png

 

 

 

 

 

Taking into account all of the requirements mentioned, we have three variations of Synapse workspaces:

 

  • Option 1 - Synapse with Shared VNET (Shared VNET = No managed VNET)
  • Option 2 - Synapse with Managed VNET
  • Option 3 - Synapse with Managed VNET + DEP (Data Exfiltration Protection)

 

Before we dive into the details of the three options, we will explain what Managed Private Endpoints are.

 

 

 

2 - Managed Private Endpoints

 

 

Managed private endpoints are Private Endpoints created within a Synapse Managed VNET. Managed private endpoints establish a private link to Azure resources, and Azure Synapse manages these private endpoints on your behalf. You can create Managed private endpoints from your Azure Synapse workspace to access Azure services like Azure Storage or Azure Cosmos DB, as well as Azure-hosted customer/partner services.

 

 

 

IMPORTANT !!!

 

You cannot reuse existing private endpoints from your own (customer) Azure VNET. In this scenario we want to connect Synapse resources on a Managed VNET to an Azure resource, not your client directly to the resource, which means the traffic will not go through your VNET or through your firewall. It is a VM (ADF or Spark) on a Synapse Managed VNET accessing the resource directly.

 

 

 

largevv2px999.png.ebffaf4db53123ee6285da8aab27c82b.png

 

 

 

A Managed private endpoint uses a private IP address from your Managed Virtual Network to effectively bring the Azure service that your Azure Synapse workspace is communicating with into your Virtual Network. Managed private endpoints are mapped to a specific resource in Azure and not the entire service. Customers can limit connectivity to a specific resource approved by their organization.

 

Ref: Synapse Managed private endpoints

 

 

 

A private endpoint connection is created in a "Pending" state. The destination resource owner is responsible for approving or rejecting the connection. Only a Managed private endpoint in an approved state can be used to send traffic to the private link resource that is linked to the Managed private endpoint. You can also create private links between different subscriptions and even different tenants.
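
The approval rule above can be illustrated with a small sketch. The endpoint records and the field names (`name`, `state`) are hypothetical and made up for illustration; they are not the shape of any Azure API response. The sketch only encodes the rule described here: an endpoint starts as "Pending" and carries traffic only once it is "Approved".

```python
# Hypothetical records for illustration only; not an Azure API response shape.
endpoints = [
    {"name": "pe-storage-dfs", "state": "Approved"},
    {"name": "pe-cosmos", "state": "Pending"},
    {"name": "pe-sql", "state": "Rejected"},
]


def usable_endpoints(endpoints):
    """Return the names of endpoints that can carry traffic (approved only)."""
    return [e["name"] for e in endpoints if e["state"] == "Approved"]


def awaiting_approval(endpoints):
    """Return the names of endpoints still waiting on the resource owner."""
    return [e["name"] for e in endpoints if e["state"] == "Pending"]


print("Usable:", usable_endpoints(endpoints))
print("Waiting on resource owner:", awaiting_approval(endpoints))
```

A pipeline that fails right after a new Managed private endpoint is created is often just hitting an endpoint that is still pending approval on the destination resource.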

 

 

 

mediumvv2px400.png.ed55a7fba6a48a888ee5b12892e689d0.png

 

 

 

Ref: Manage Azure Private Endpoints

 

 

 

The image below shows that when you start an ADF (Azure IR) execution or a Spark job, a machine is needed to actually run it. These machines are created on demand because you pay per use, and because they need to be part of the VNET, they are created attached to it.

 

 

 

largevv2px999.png.79a640ecbf49463b85ac5bda850b225c.png

 

 

 

 

 

 

 

 

 

3 - Synapse Managed VNET flavors

 

3.1 - Option 1 - Synapse with NO VNET

 

 

ADF Azure IR and Spark VMs create a resource that will be used to process your workload. This process can take a few minutes to get ready.

 

  • This warm-up time can take up to 4 minutes considering the SLA (SLA for Azure Synapse Analytics)
    • We guarantee that at least 99.9% of the time, all Azure Synapse pipeline activity runs will initiate within 4 minutes of their scheduled execution times.
  • The VMs created are not part of any customer VNET and have no fixed outbound IP. More info at Azure Integration Runtime IP addresses
  • Because the VMs are not part of a Managed VNET, you cannot use Managed Private Endpoints to reach resources like a Storage account in a secure way. Because the VMs do not have a fixed IP, it is also not easy to open your destination firewall to a specific IP or IP range.
  • To connect to secured resources from a fixed IP, use a Self-Hosted Integration Runtime (SHIR). The ADF service communicates with the SHIR over port 443, and the SHIR is then responsible for connecting to the data source. Ref. Create and configure a self-hosted integration runtime
    • Note that the ADF service and SHIR need to communicate, and the communication protocol is crafted so that only outbound connections from the SHIR to the ADF service are required
  • Regarding inbound connections: it does not matter that you do not have a Managed VNET; your workspace can still be accessed through a public or private endpoint. Check out Synapse Connectivity Series Part #2 - Inbound Synapse Private Endpoints for more information.

 

largevv2px999.png.134d28a813269cf28ab485bc2a7ce8fc.png

 

 

 

3.2 - Option 2 - Synapse with Managed VNET

 

 

ADF Azure IR and Spark VMs create a resource that will be used to process your workload. This process can take a few minutes to get ready.

 

 
"By design, Managed VNet IR takes longer queue time than Azure IR as we are not reserving one compute node per service instance, so there is a warm up for each copy activity to start, and it occurs primarily on VNet join rather than Azure IR."

 

  • Being part of a Managed VNET means you can use Managed Private Endpoints to reach resources like a Storage account and other Azure resources in a secure way.
    • When reaching an Azure resource through a Managed Private Endpoint, there is no need to open the firewall on the destination resource because access is private.
    • When reaching an Azure resource without a Managed Private Endpoint, as said above, there is no fixed outbound IP. Check out Azure Integration Runtime IP addresses for more information.
  • You can also use a Self-Hosted Integration Runtime (SHIR) to access on-premises resources or resources that do not have a Managed Private Endpoint. The ADF service communicates with the SHIR over port 443, and the SHIR is then responsible for connecting to the data source. Ref. Create and configure a self-hosted integration runtime
    • Note that the ADF service and SHIR need to communicate, and the communication protocol is crafted so that only outbound connections from the SHIR to the ADF service are required
    • Regarding "resources that do not have a Managed Private Endpoint": the list of available Managed Private Endpoints is limited and does not include the ability to create a managed private endpoint to a public Web API. For example, it is not possible to create a managed private endpoint to access the public API management.azure.com, which is used in some automation scenarios. In such cases, the SHIR must be used to connect to it in a private manner. Check below a list of currently supported private endpoints (valid as of 2022-01-05):
      [attachment=33572:name] [attachment=33573:name]

 

 

 

 

largevv2px999.png.38e653b95b148c959c2629e1ae6a2af0.png

 

3.3 - Option 3 - Synapse with Managed VNET + DEP (Data Exfiltration Protection)

 

 

The difference from option 2 is that you are NOT allowed to access any public endpoint, even the ones that are part of your subscription. You need to access the resources using Managed Private Endpoints.

 

  • The only workaround is using a Self-Hosted Integration Runtime (SHIR)
  • You can still connect to resources from other subscriptions and other tenants as long as you approve them and as long as access is done through Managed Private Endpoints

 

mediumvv2px400.png.2460421ddd68ddd1f569f2b6fefa4c8e.png

 

Check out Data exfiltration protection for Azure Synapse Analytics workspaces for more information.
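
The three options can be summarized as a small lookup. This is only a sketch of the rules described in this article: the function name and the `managed_vnet` / `dep` flags are illustrative and are not Synapse API parameters.

```python
def outbound_access(managed_vnet, dep):
    """Summarize outbound paths for Azure IR / Spark under the three options.

    managed_vnet and dep are illustrative booleans, not Synapse API parameters.
    """
    if dep and not managed_vnet:
        # DEP is offered only together with a Managed VNET.
        raise ValueError("DEP requires a workspace with a Managed VNET")
    return {
        # Managed Private Endpoints exist only inside a Managed VNET.
        "managed_private_endpoints": managed_vnet,
        # With DEP enabled, no public endpoint can be reached.
        "public_endpoints": not dep,
        # A Self-Hosted Integration Runtime can be used in every option.
        "self_hosted_ir": True,
    }


for label, vnet, dep in [
    ("Option 1 - no Managed VNET", False, False),
    ("Option 2 - Managed VNET", True, False),
    ("Option 3 - Managed VNET + DEP", True, True),
]:
    print(label, outbound_access(vnet, dep))
```

Because both flags can only be chosen at workspace creation, a table like this is mostly useful when deciding which option to pick before you create the workspace.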

 

 

largevv2px999.png.8ebbcd8f99a5654f5811247f58d49c56.png

 

 

 

3.3.1 - Alternative to SHIR VM

 

 

Instead of using Self Hosted integration runtime you can use proxy machines. We will not go into the details of these solutions in this article, but the following documentation provides a step-by-step guide:

 

 

4 - Troubleshooting

 

 

Whether or not you have a Managed VNET has no influence on troubleshooting inbound connections. If that is your case, refer to Synapse Connectivity Series Part #2 - Inbound Synapse Private Endpoints.

 

 

 

Check the following troubleshooting items:

 

4.1 - Linked Services

 

 

Check if the linked service is using the managed private endpoint.

 

largevv2px999.png.c02f26031f600afe24bfe17d316250bd.png

 

 

 

4.2 - Managed Private Endpoints

 

 

Check if the Managed private endpoints exist and if they are approved.

 

*Pay attention: some services, like storage, have multiple endpoints (blob and dfs). Which one you need depends on the endpoint you are using.
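
As a sketch of why this matters: each storage sub-resource has its own host name, so a Managed private endpoint for blob does not cover dfs. The helper names below are hypothetical, `mystorageaccount` is a placeholder, and the `core.windows.net` suffix assumes the Azure public cloud.

```python
def storage_hosts(account, subresources=("blob", "dfs")):
    """Build the host name for each storage sub-resource of an account."""
    return {sub: f"{account}.{sub}.core.windows.net" for sub in subresources}


def subresource_for_host(host):
    """Return the sub-resource label ('blob', 'dfs', ...) from a storage host name."""
    parts = host.split(".")
    return parts[1] if len(parts) > 2 else None


# Placeholder account name; replace with your own.
for sub, host in storage_hosts("mystorageaccount").items():
    print(f"{sub}: {host}")

print(subresource_for_host("mystorageaccount.dfs.core.windows.net"))
```

If the linked service URL points at the dfs host, check that a Managed private endpoint for the dfs sub-resource exists and is approved, not only one for blob.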

 

largevv2px999.png.f0f89b59555250cbcd981e9cb58b4bae.png

 

 

 

You can also check it from the resource's point of view. The name of the private endpoint will be [WORKSPACENAME].[NAME YOU GAVE TO PE]
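
The naming pattern above can be used to spot which connections on a resource come from a given workspace. This is a minimal sketch; the workspace name, endpoint name, and helper names are all hypothetical.

```python
def expected_connection_name(workspace, endpoint_name):
    """Build the name as described above: [WORKSPACENAME].[NAME YOU GAVE TO PE]."""
    return f"{workspace}.{endpoint_name}"


def belongs_to_workspace(connection_name, workspace):
    """Check whether a connection name follows the pattern for this workspace."""
    return connection_name.startswith(workspace + ".")


# Hypothetical names for illustration only.
name = expected_connection_name("myworkspace", "pe-storage-dfs")
print(name)
print(belongs_to_workspace(name, "myworkspace"))
```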

 

largevv2px999.png.72f276b5540ba886d03312dc3ae76844.png

 

 

 

4.3 - Test connection

 

 

Check if it's using the managed private endpoint.

 

mediumvv2px400.png.7ef31c6ff03b38e16ae3cd6a0ffb4189.png

 

 

 

Enable interactive authoring to test connections. As mentioned before, we need a machine that exists on the Synapse Managed VNET to test this connection, and because that machine is created on demand, it is not available right away.

 

  1. mediumvv2px400.png.e57d81a29cb3a2dd655490760ce729b7.png
  2. mediumvv2px400.png.e6766443a8a3846301d58309fe7a66dd.png
  3. mediumvv2px400.png.9552183d1fb4f254e3123a54fad024a9.png
  4. mediumvv2px400.png.c6daebae97bedacb4a8228cb5dce2f4b.png
     
     

4.4 - Test name resolution and port (from Spark)

 

 

As we do not have an Azure VM inside the Managed VNET to do some tests, we can use Spark Notebooks to test it directly.

 

 

 

Check name resolution. The host name should resolve to a private address such as 10.x.x.x.

 

 

 

 

 

%%pyspark
import socket

# Host and port to test; replace with the endpoint you want to check
hostname = "management.azure.com"
port = 443

############################################################
def resolve_hostname(hostname):
    """Resolve the host name and print the IP address it maps to."""
    try:
        ip = socket.gethostbyname(hostname)
        print(f"{hostname} resolved to {ip}.")
        return ip
    except socket.error:
        print(f"Unable to resolve hostname {hostname}.")
        return None

############################################################
def is_port_open(hostname, port):
    """Try a TCP connection and report whether the port is reachable."""
    try:
        sock = socket.create_connection((hostname, port), timeout=1)
        sock.close()
        print(f"Port {port} is OPEN to {hostname}")
    except socket.error:
        print(f"Port {port} is CLOSED to {hostname}")

############################################################

resolve_hostname(hostname)
is_port_open(hostname, port)
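
To avoid reading the resolved address by eye, the check for "something private" can be automated with the standard `ipaddress` module. This is a minimal sketch: the helper names are hypothetical and the sample addresses are made up. Note that `is_private` is also true for loopback and link-local ranges, not only 10.x.x.x.

```python
import ipaddress


def is_private_ip(ip):
    """Return True when the address falls in a private (non-public) range."""
    return ipaddress.ip_address(ip).is_private


def describe_resolution(hostname, ip):
    """Turn a resolved IP (or None) into a short troubleshooting hint."""
    if ip is None:
        return f"{hostname} did not resolve"
    if is_private_ip(ip):
        return f"{hostname} -> {ip}: private address, consistent with a Managed private endpoint"
    return f"{hostname} -> {ip}: public address, likely not going through a Managed private endpoint"


# Made-up sample values for illustration only.
print(describe_resolution("mystorageaccount.dfs.core.windows.net", "10.0.0.4"))
print(describe_resolution("management.azure.com", "20.50.1.1"))
```

In the notebook, this can be combined with the function above as `describe_resolution(hostname, resolve_hostname(hostname))`.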

 

 

 

 

 

 

 

We can see below that Storage is open because we have a Managed private endpoint, but management.azure.com shows as closed because this workspace has DEP enabled and cannot reach public endpoints, as explained above.

 

 

 

[attachment=33585:name]

[attachment=33586:name]

 

 

 

 

 

 

 


 

 
