When working with Azure Synapse Analytics, connectivity issues can arise when trying to interact with external APIs, particularly if your workspace is configured with Managed Virtual Networks and Data Exfiltration Protection.
Recently, our team faced a challenge connecting Azure Synapse to an AWS-hosted API, and through troubleshooting, we identified the root cause and implemented an effective workaround. This post details the problem, why it occurred, and how we resolved it.
What Is a Synapse Workspace Managed Virtual Network?
When creating an Azure Synapse workspace, you have the option to associate it with a Managed Virtual Network (VNet). This is a fully managed network that Azure Synapse controls, designed to enhance security and simplify networking configurations.
Some key features and benefits include:
- Network Isolation: Resources within the Synapse workspace (such as Spark pools and integration runtimes) are deployed inside the Managed VNet, isolating them from other networks.
- Simplified Management: Azure Synapse automatically handles network configuration, so you don't need to manage subnets for Spark clusters or configure Network Security Groups (NSGs) yourself.
- Dynamic Resource Allocation: There’s no need to pre-allocate subnets based on peak usage, as the Managed Virtual Network dynamically adjusts resources.
- Elimination of Inbound NSG Configuration: Since Azure Synapse manages network security, inbound NSG rules do not need to be configured manually, which reduces the risk of misconfigured rules.
- Enhanced Data Protection: A Managed Virtual Network is a prerequisite for Data Exfiltration Protection, which restricts outbound data transfers to approved targets.
Limitations of Managed Virtual Networks
- No Control Over Network Rules: Since NSGs and Route Tables do not apply, outbound network rules cannot be adjusted manually.
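- Limited Outbound Connectivity: Managed Private Endpoints (built on Azure Private Link) are the supported way to reach services privately from the Managed VNet. A Managed VNet on its own still allows outbound internet access, but once Data Exfiltration Protection is enabled, outbound traffic is limited to approved targets reached through Managed Private Endpoints, so services that do not support Private Link cannot be accessed.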
- Irreversible Once Configured: If a workspace is created with a Managed Virtual Network, it cannot be removed or modified later. Similarly, you cannot add a Managed VNet to a workspace that was created without one.
For more details, refer to Microsoft’s official documentation on Synapse Managed Virtual Networks.
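To make the Private Link constraint concrete: a managed private endpoint is defined against the Azure resource ID of a Private Link-enabled service, not against an arbitrary hostname. Below is a minimal sketch that builds such a definition, assuming the `privateLinkResourceId` and `groupId` property names used by the Synapse managed private endpoint definition; the resource IDs are placeholders and the prefix check is a deliberate simplification, not full validation.

```python
import json


def managed_private_endpoint_body(private_link_resource_id: str, group_id: str) -> dict:
    """Build a Synapse managed private endpoint definition (sketch).

    Assumption: the properties are named privateLinkResourceId and groupId;
    verify against the current Synapse documentation before relying on it.
    """
    # A managed private endpoint targets the Azure resource ID of a Private
    # Link-enabled service. This prefix check is a simplification, not full
    # resource ID validation.
    if not private_link_resource_id.startswith("/subscriptions/"):
        raise ValueError(
            "target must be the Azure resource ID of a Private Link-enabled service"
        )
    return {
        "properties": {
            "privateLinkResourceId": private_link_resource_id,
            # Sub-resource to connect to, e.g. "blob" for a storage account.
            "groupId": group_id,
        }
    }


# Placeholder IDs; substitute real values.
storage_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Storage/storageAccounts/<account>"
)
print(json.dumps(managed_private_endpoint_body(storage_id, "blob"), indent=2))

# An external API URL has no Azure resource ID, so it cannot be a target.
try:
    managed_private_endpoint_body("https://api.example.com", "api")
except ValueError as err:
    print("rejected:", err)
```

The rejected call mirrors the situation with an API hosted outside Azure: there is no Azure resource for a managed private endpoint to point at.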
What Is Data Exfiltration Protection in Synapse?
Data Exfiltration Protection is an additional security feature in Azure Synapse Analytics that prevents unauthorized data transfers to external destinations. When enabled, outbound (egress) traffic from the workspace's Managed Virtual Network is allowed only to approved targets reached through Managed Private Endpoints, and those targets can be restricted to resources in approved Microsoft Entra tenants.
This feature prevents data leaks by enforcing strict network isolation, but it also introduces connectivity restrictions.
Key Considerations
- Outbound traffic is completely blocked unless the target supports Managed Private Endpoints.
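- Once enabled, it cannot be turned off; the setting is permanent for the workspace.
- Outbound API requests from the workspace's managed compute fail unless the target is reachable through a Managed Private Endpoint. Calls to other endpoints have to run outside the Managed VNet, for example on a self-hosted integration runtime.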
For more details, read Microsoft’s official documentation on Data Exfiltration Protection.
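Because both the Managed Virtual Network and Data Exfiltration Protection are chosen when the workspace is created, it helps to see exactly where they live in a workspace definition. Below is a minimal sketch of the network-related properties of a `Microsoft.Synapse/workspaces` resource, assuming the ARM property names `managedVirtualNetwork` and `managedVirtualNetworkSettings` (with `preventDataExfiltration` and `allowedAadTenantIdsForLinking`); other required properties, such as the default Data Lake Storage account, are omitted, and the tenant ID is a placeholder.

```python
import json
from typing import List, Optional


def workspace_network_properties(prevent_data_exfiltration: bool,
                                 allowed_tenant_ids: Optional[List[str]] = None) -> dict:
    """Network-related properties for a Microsoft.Synapse/workspaces resource (sketch).

    Assumption: property names follow the ARM template schema; check the
    current template reference. These values are fixed at creation time.
    """
    settings = {"preventDataExfiltration": prevent_data_exfiltration}
    if prevent_data_exfiltration:
        # Microsoft Entra tenants whose resources are approved targets.
        settings["allowedAadTenantIdsForLinking"] = list(allowed_tenant_ids or [])
    return {
        # "default" places the workspace's compute in a Managed Virtual Network.
        # Data Exfiltration Protection requires it, so it is always set here.
        "managedVirtualNetwork": "default",
        "managedVirtualNetworkSettings": settings,
    }


print(json.dumps(workspace_network_properties(True, ["<tenant-id>"]), indent=2))
```

Since neither value can be switched off after deployment, reviewing this fragment before the first deployment is the cheapest point to catch an unwanted setting.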
Troubleshooting the API Connection Issue
Initial Testing and Discovery
We tested connecting to the AWS API using a Synapse workspace with a Managed Virtual Network but without Data Exfiltration Protection, and the connection worked fine.
However, in our production workspace, where Data Exfiltration Protection was enabled, all outbound API calls failed.
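One way to reproduce this comparison is to run the same HTTPS request from a notebook in each workspace (and later from the VM hosting the self-hosted IR). Below is a minimal sketch using only the Python standard library; the URL is a placeholder, not the actual AWS API. It separates "the server answered" (any HTTP status, including 401 or 403) from "nothing came back" (DNS failure, refused or reset connection, timeout), which is where we would expect blocked egress to show up.

```python
import urllib.error
import urllib.request
from typing import Callable


def probe(url: str, timeout: float = 10.0,
          opener: Callable = urllib.request.urlopen) -> str:
    """Report whether an HTTPS endpoint answers from wherever this code runs."""
    try:
        with opener(url, timeout=timeout) as resp:
            return f"reachable: HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # The server responded (e.g. 401/403), so the network path itself works.
        return f"reachable: HTTP {err.code}"
    except OSError as err:
        # No response at all: DNS failure, refused/reset connection or timeout.
        # URLError and socket timeouts are both OSError subclasses.
        return f"unreachable: {err}"


# Usage (placeholder host; run once per environment you want to compare):
# print(probe("https://<api-host>/"))
```

The `opener` parameter exists only so the function can be exercised without network access; in a notebook the default `urllib.request.urlopen` is used.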
After reviewing Microsoft documentation, we confirmed:
- Data Exfiltration Protection blocks all outbound traffic, including API requests, unless the target supports Managed Private Endpoints.
- It cannot be disabled after the workspace is created, so our production workspace was permanently restricted from making direct calls to the AWS-hosted API.
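Before troubleshooting further, it is worth confirming which settings a workspace actually has. Below is a minimal sketch that summarizes the JSON printed by `az synapse workspace show --name <workspace> --resource-group <resource-group>`, assuming that output uses the camelCase keys `managedVirtualNetwork` and `managedVirtualNetworkSettings.preventDataExfiltration`; the sample string is an illustrative excerpt of that shape, not real output.

```python
import json


def egress_summary(workspace_json: str) -> dict:
    """Summarize egress-related settings from `az synapse workspace show` output.

    Assumption: the JSON uses the keys managedVirtualNetwork and
    managedVirtualNetworkSettings.preventDataExfiltration; check your own output.
    """
    ws = json.loads(workspace_json)
    settings = ws.get("managedVirtualNetworkSettings") or {}
    managed_vnet = ws.get("managedVirtualNetwork") == "default"
    dep = bool(settings.get("preventDataExfiltration"))
    return {
        "managed_vnet": managed_vnet,
        "data_exfiltration_protection": dep,
        # With DEP on, direct calls to targets without a Managed Private
        # Endpoint (such as an external API) are expected to fail.
        "direct_external_calls_blocked": managed_vnet and dep,
    }


# Illustrative excerpt shaped like the CLI output, not real data.
sample = """{
  "name": "<workspace>",
  "managedVirtualNetwork": "default",
  "managedVirtualNetworkSettings": {"preventDataExfiltration": true}
}"""
print(egress_summary(sample))
```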
The Solution: Using a Self-Hosted Integration Runtime
Since direct outbound internet access was blocked, we needed a workaround to proxy our API requests. The solution was to use a Self-Hosted Integration Runtime (IR) running on an Azure VM with internet access.
Steps to Implement the Self-Hosted IR
- Deploy a Self-Hosted Integration Runtime on an Azure VM
  - We installed the Self-Hosted Integration Runtime on a VM with internet access.
  - I walk through how to do this in my post on connecting a SQL Server database to Azure Synapse: Link a SQL Server Database to Azure Synapse Workspace
- Configure Synapse to Use the Self-Hosted IR
  - Instead of letting the workspace's default Azure integration runtime make the call from inside the Managed VNet, we set the linked service for the AWS API to connect via the self-hosted IR, so outbound requests leave from the VM.
- Test the Connection
  - After switching to the self-hosted IR, we successfully authenticated and reached the AWS API.
- Reuse an Existing Self-Hosted IR
  - Since we already had a self-hosted IR set up for an on-premises SQL Server connection, we reused it for API calls instead of spinning up a new VM.
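The routing in the configuration step comes down to the linked service's `connectVia` reference. Below is a minimal sketch that builds a REST linked service definition pointing at a self-hosted IR; the names `AwsApiLinkedService` and `SelfHostedIR` and the URL are hypothetical, and `Anonymous` authentication is a placeholder that would need to match whatever the real API requires.

```python
import json


def rest_linked_service(name: str, base_url: str, ir_name: str) -> dict:
    """REST linked service definition routed through a self-hosted IR (sketch).

    Names, URL and authentication type are hypothetical placeholders.
    """
    return {
        "name": name,
        "properties": {
            "type": "RestService",
            "typeProperties": {
                "url": base_url,
                "enableServerCertificateValidation": True,
                "authenticationType": "Anonymous",
            },
            # connectVia tells Synapse to make connections for this linked
            # service from the self-hosted IR node (the VM) rather than from
            # the default Azure integration runtime in the Managed VNet.
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


print(json.dumps(
    rest_linked_service("AwsApiLinkedService", "https://<api-host>/", "SelfHostedIR"),
    indent=2,
))
```

The same `connectVia` reference is what lets one self-hosted IR serve both the SQL Server linked service and the API linked service.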
Key Takeaways
- Managed Virtual Networks isolate Synapse resources, offering security but limiting outbound connectivity.
- Data Exfiltration Protection prevents all outbound traffic, including API calls, unless the destination supports Managed Private Endpoints.
- Both settings are fixed when the workspace is created and cannot be turned off later, so plan them up front.
- A Self-Hosted Integration Runtime (IR) allows Synapse to connect to external APIs by acting as a proxy on an Azure VM.
- Existing Self-Hosted IRs can be reused as long as they have the necessary outbound connectivity.
Conclusion
This experience underscores the importance of understanding Azure Synapse’s security features before deployment. While Managed Virtual Networks and Data Exfiltration Protection enhance security, they introduce strict limitations on outbound connectivity.
If you’re facing similar connectivity issues in a Synapse workspace with Data Exfiltration Protection enabled, a self-hosted IR on an Azure VM with internet access provides a reliable workaround.