Palo Alto GlobalProtect Issue: Split Tunnel VPN with Skype for Business

There was a weird issue when I first joined my current job: I was told was that because of the way Palo Alto GlobalProtect (GP) and Microsoft Skype for Business (SfB) works (or maybe was configured?), I needed to log-in to SfB first, then connect to the GP VPN. The rationale was that SfB wouldn’t connect, or it would take a long time to connect, AND THEN even after a period of time, SfB would start behaving weird and it’s Exchange connectivity would drop, so SfB wouldn’t get voicemails, missed calls, etc. Just all out weirdness going on. It’s 2020, so maybe some of this true to form for the year, but probably not.

Palo Alto GlobalProtect Skype for Business

Click here if you want to skip the context and go to the solution.

Uh…Skype for Business?

Full stop. I’m sure you’re asking yourself right now, “Why not just migrate to Microsoft Teams? Get rid of that whole on-premises stuff.”

Let me answer that in a meme:

Sean Beam Boromir Meme: One does not simply migrate from Skype for Business to Teams

Skype for Business is one of the integrative technologies that spans lots of technology stacks that isn’t exactly easy to just jump ship from, and Teams as a VoIP replacement is arguably not there yet.

Also, have you seen the UI comparisons? Going from a sleek floating window for calling, IM, and conferencing with SfB to the giant-lets-pack-lots-of-services-into-one-large-window that is Teams is kind of a hard sell on the user training side of things. Maybe I’m biased. Maybe, but I digress.

The Challenge

Ok, back to the GlobalProtect and Skype for Business issues.

I was admittedly puzzled that the solution — to instruct users to sign-in to SfB before they sign-in to the VPN — was the best solution; it doesn’t seem right from a user experience perspective, and then when you toss-in the sudden weird issues with Exchange connectivity, none of this seemed right, and I doubt that’s the ideal experience. So I brought this up and the team basically said, “we just haven’t had time to troubleshoot it, but if you want to figure it out, go for it.”

You know what that sounds like? An adventure. An itch to itch. Something to solve. A challenge! There could be only one response:

Challenge Accepted Meme

Why Split Tunnel Skype for Business?

Something you might be asking is “Why configure split tunnel in the first place? Isn’t split tunneling a headache to manage?”

Split tunneling can definitely be a PITA, but like a million IT questions out there, the answer ultimately to this is, “it depends.” From my experience, split tunneling becomes difficult when you have a lot of split tunneling to manage, but if you have one or two services, it’s not that bad.

For Skype for Business, it’s one of those technologies that is sensitive to jitter, latency, and packet loss. Why? It’s because it’s voice traffic, and just like voice traffic on the inside of the network, where there’s jitter, latency, and/or packet loss, users on opposite ends of calls/conferences will experience this as delayed audio or parts of the conversation will just break up and it leads to an overall poor experience.

When you configure split tunneling, particularly for technologies like SfB, you avoid the dual encryption scenarios and you allow the technology to use its own optimized methods for connecting voice and application traffic by letting the software connect to services over the internet directly versus through a tunnel.

Baseline

That said, what’s the baseline here? How is GlobalProtect configured with split tunneling and what issues are there?

For GlobalProtect, the split tunnel configuration was configured pretty much like this documentation from Palo Alto (using just the application split tunnel, nothing else). It looked like this:

GlobalProtect Split Tunnel Domain and Application Tab Showing Excluded lync.exe

Here are the issues that were encountered in this setup:

  1. Connectivity issues if connecting SfB after GP VPN is connected
  2. Exchange connectivity in the SfB client drops after a duration of time, even if connection is established before VPN connection
  3. Call transfers working inconsistently
  4. Application sharing working inconsistently
  5. Conference meetings working inconsistently

Issues 3-5 really came later because they were hard to pinpoint due to their inconsistency, but issues 1 and 2 brought some fast wins.

Let’s get to some solutions.

Solutions

Solution for 1 and 2: DNS. It’s always dns.

It’s kind of a joke, but DNS really does cause a lot of problems, and in a split tunnel configuration when you’ve split-tunnel the traffic by application, the application is still going to resolve addresses by the servers you specify in the GlobalProtect configuration. So if you haven’t changed DNS records, the application will split tunnel, but it will still try to connect to internal resources because that’s the records it has.

I don’t have a PCAP screenshot for this, but if you pull up Wireshark and look at the PCAPs for your network interface (non-GP interface), you’ll see attempts to get to SfB internal IP addresses that aren’t (typically) on your network, and thus services fail.

The solution is simple: for your VPN clients, serve the external IP addresses for A records being queried. I solved this by setting up dedicated DNS servers for VPN clients, then just creating the zones and root records for each FQDN. I did this for all the Skype for Business external IPs (edge and reverse proxy) and the external Exchange records.

After doing this, problems 1 and 2 went away because hostnames were being resolved correctly.

Solutions for 3 through 5: Firewall rules and IP Split Tunneling

Problems 3 through 5 were frustrating like no other because I couldn’t really narrow the problems down exactly. Some people had no problems with call transfers, application sharing, or conferencing, but then sometimes they would. So the thing to do is dig into the logs, and when I did I encountered a lot of this:

ms-diagnostics: 23;source="mediationServer.contoso.com";reason="Call failed to establish due to a media connectivity failure when one endpoint is internal and the other is remote"

Or

ms-diagnostics: 24;source="mediationServer.contoso.com";reason="Call failed to establish due to a media connectivity failure when both endpoints are remote"

Or even better:

ms-client-diagnostics: 52049; reason="Leaving app sharing because re-invite failed";UserType="Callee";MediaType="applicationsharing-video"

These all pointed to firewall issues, and even the ICEWARN messages noted something wrong with STUN, TURN, NAT, etc.

So I did some digging and found that firewall rules needed to be in-place to prevent VPN clients and internal SfB servers from communicating with one another. So I added some PAN policies, and things got better, but not perfect. Also, I added the external SfB IP addresses to the split tunnel in Network > GlobalProtect > Gateway > Agent > Client Settings > Client-Config > Split Tunnel > Exclude (which basically just adds static routes in the Windows routing table to send traffic for those IPs out the non-tunneled interface). Still the occasional error creeping up, and I could even witness it, but still can’t quite nail the problem.

Finally, I had a thought: why not get rid of the application process split tunnel? I mean, if I have DNS addreses configured, and IP split tunneling working, why is the application process split tunnel needed? Removed that from the setting and bam — all the problems went away. Like magic.

Shia Labeouf Magic

Here’s what the final outcome should look like for a GlobalProtect-Skype for Business-Exchange environment for split tunnel.

Palo Alto GlobalProtect Skype For Business Split Tunnel

Of course, I fully admit this is really more of a legacy design with everything on-premises, but you could just as easily send the Exchange traffic to Office 365 in the split tunnel.

Thoughts on GlobalProtect Application Process Split Tunnel

While I had configured the traditional methods of doing split tunnel configurations (IP split tunnel and DNS servers), I’m still a little puzzled to the fact that the Palo Alto GlobalProtect application process split tunnel seemed to cause issues. My guess is that something in the way the Skype for Business client is designed prevents the process from being completely split tunneled, and I think this has to do with the way Skype for Business operates with Windows.

If you get really bored on a Friday night and have nothing better to do in life, check out some of these deep dives on candidate path selection and other stuff related to media flow. What you’ll see in the SfB client log files is something like this:

Skype for Business candidate selection
Credit

Basically, SfB gets a selection of candidates that it uses from the interfaces on the computer. In a GP split tunnel set up (with or without application process split tunnel configured), you’ll see ALL IP addresses (including the tunnel address) listed as candidates, and my suspicion is that Skype for Business still tries to use a tunnel interface, and sometimes it gets around the Palo Alto GlobalProtect application exclusion, and then that causes calls, application sharing, and even conferences to fail. I can’t show my own logs seeing this for security reasons, so you’ll have to trust me on that one.

Solution (tl;dr)

Here’s the quick solution for GlobalProtect and Skype for Business Split Tunnel

  1. Create separate DNS servers for VPN clients and create the specific Skype for Business DNS records needed, and configure them for external IP addresses so that Skype for Business resolves external addresses and configures itself appropriately.
  2. Create firewall rules that block traffic to/from the VPN network to internal Skype for Business and Exchange IP addresses. We want the SfB client to determine it can’t go inside for traffic.
  3. In Panorama or PANOS, under Network > GlobalProtect > Gateway > Agent > Client Settings > Client-Config > Split Tunnel > Exclude, configure all external SfB addresses so that the GP client doesn’t send traffic for those IPs through the tunnel. Alternatively, under Network > GlobalProtect > Gateway > Agent > Client Settings > Client-Config > Split Tunnel > Domain and Application > Exclude Domain, you could add the SfB external FQDNs (that said, IIRC, the stuff under ‘Domain and Application’ requires the GlobalProtect license…technically).

Links, Further Reading, Credit

Skype for Business: Cleanly Shutting Down Server (Invoke-CSComputerFailover and More)

It’s been over three years since I managed and deployed a Skype for Business/Lync system, and at my new job I was hired on as a be a network engineer, but I noted in a past life I received a MCSE in Skype for Business, so I could definitely be the backup for the primary SME (subject matter expert) in SfB. However, in a strange twist, the primary SME left — and you know what, there’s just not a lot of Skype for Business/Lync engineers out there, especially in a small labor market, so I stepped up to help the organization because I was the most qualified by a long-shot.

So I’m back doing some Skype for Business again.

Captain America Here We Go Again

I actually have always liked voice routing, so it’s fun to be doing some of this stuff again (although SfB is a pretty intense, integrative technology, so it’s not all Pop-Tarts and unicorns).

However, I was trying to get reacquainted with some commands for cleanly shutting down a Skype For Business server, and I just didn’t find a lot of good information out there, so I thought I might write something up “real quick”. This is somewhat basic info for SfB enterprise deployments, but it might be helpful.

Enough of the pretext, let’s get to the first command…

Get-CSWindowsService

I’m starting with this command because it’s the most basic command you should already know, but plays a role for later in this post. Typically you’ll use the command to see how many connections are being used by a service, if a service is running, etc.

The command is straightforward: Get all or one of the SfB (or Communication Server, which is where the acronym CS comes from) Windows Services on the machine.

Stop-CSWindowsService

The command `Stop-CSWindowsService` is the most basic command you’ll use to stop all or one of the services on a SfB server. The command will execute stopping of services in the proper order of stopping SfB services, including any dependent services.

Typically you’ll be using this command on a ‘Standard’ deployment SfB server, or any non-front end server in an ‘Enterprise’ deployment such as mediation/edge servers (more info: standard vs enterprise deployment). However, there are probably rare situations in which you’ll stop just one service, so you’ll likely be stopping all of them.

If you’re doing this outside a maintenance window for some reason, I prefer to do the following: Stop-CSWindowsService -Graceful. The -Graceful is important here, because what it does is it puts the services into a paused state, preventing any new connections from happening and waiting on existing connections to disconnect. On mediation servers, whenever I’ve need to stop a server in a mediation pool, this is my preferred method so that I wait for the calls to end. However, it won’t stop until the call is done, so you might be waiting awhile.

Invoke-CSComputerFailover

For whatever reason, this command scared me at first, largely because of my ignorance of what it does. The official documentation on the command I don’t think does it justice, so here’s my attempt at it.

The command `Invoke-CSComputerFailover` will basically perform a Stop-CSWindowsService -Graceful operation, but it acts slightly different. The differences:

  1. It’s used on front end servers in an enterprise deployment (or at least I’ve never seen it documented or used on other SfB server pools). The command causes the front end server to be in a ‘failover’ state, making it unavailable to the rest of the front end pool.
  2. The command migrates data, routing groups, and more to the other front end servers.
  3. The command has a wait time of 1 hour per service, after which if the connections haven’t disconnected, it will force a disconnect. This default can be changed with the `-WaitTime` parameter.
  4. This command will make the server unavailable in the front end pool. After a reboot, or if for some reason you run Start-CSWindowsService, the server won’t be available until you run Invoke-CSComputerFailBack.

After working with it and using it several times, it’s not as scary as I thought. Just run it on one machine at a time lest you have some Windows Fabric issue due to quorum loss (or something to that effect).

Invoke-CSComputerFailover Hanging or Taking Awhile

Sometimes when you’re failing over a front end server, you get stuck waiting for some services to stop like this:

Status screen waiting for Invoke-CSComputerFailover to Progress

If you look at `Get-CSWindowsService`, you might actually find something like this:

Get-CSWindowsService Seeing Services With Hanging Connections

If you note the red and blue arrows, the services are left open, likely from a conference that has ended already, but is being left open for whatever reason. To speed up Invoke-CSComputerFailover, just open a separate elevated terminal and stop the services like this:

Stopping the stalled services with Stop-CSWindowsService in separate window

After which, `Invoke-CSComputerFailover` will continue on as expected.

Invoke-CSComputerFailover progressing

I originally tried out the idea on my own, but the following blog entry also helped me and explains it from a different perspective.

IIS URL Rewrite Basic Walkthrough

Over the years doing various Skype for Business deployments, or just doing some vanilla web server work, I’ve needed a reverse proxy that was simple and easy to deploy. There are quite a few out there such as HAProxy (my preference), NGINX, and then some commercial products like KEMP. However, the deployments I was doing didn’t really need the investment of a major appliance, and some of the users I was working with preferred to steer clear of Linux/Unix systems, so a great choice for this is IIS Application Request Routing. This is a simple reverse proxy that, after a few tweaks, can do the job well with minimal effort.

However, I wanted to get a little more complicated with the reverse proxy and it’s URL rewrite rules, so I decided dig in and figure out the URL rewrite logic a little better, which is the focus of this post. This is going to be GUI focused, but there are certainly better ways to do this via XML, but this was the easier approach that I took at the time.

(If you’re looking on how to set up IIS ARR, check this blog out, read the documentation from Microsoft on IIS ARR, or google it.)

Simple goals here:

  • Create two rules to reverse proxy the “cookies” and “cupcakes” traffic to the web server, both for HTTP and HTTPS
  • Create a catch-all rule to send everything else to giantmidgets.org

Setting Up HTTP Reverse Proxy Rule/Back-References Demonstrated

After setting up the server farms that the URL rewrite will direct traffic to, go to the root of the server and open up ‘URL Rewrite’, then I clicked ‘Add Rule(s)…’

Add Rule(s)...

I went ahead and selected ‘Inbound Blank Rule’. I want to keep this simple.

I named it something useful (I’m creating a rule for HTTP and HTTPS separately). Then I put in the pattern I needed:

Routing Rule for Match URL

This is a regex that looks for anything with “www.consentfactory.com/”, and for the URL path to either have “cupcakes” or “cookies”, then whatever string is available after that.

Next, I set up my conditions:

HTTP Conditions

The condition basically requires the FQDN to be present. Next comes the routing rule:

Route to Server farm rules
Something is wrong here.

Here I’m stating that the action type is to route to the server farm (basically the ARR component of this), then to send it as HTTP with the path taken from after the FQDN of the request. However, note the “Path” field; it says “/{R:0}”, but what the heck does that value come from? To see that value, click on ‘Test Pattern’ up at the top of the rule under ‘Match URL’:

Match URL Pattern Test

Input the URL that you’re trying to reverse proxy in the ‘Input data to test’ field, then click ‘Test’. This is actually how you can see those ‘{R:X}’ values will be derived. These are called ‘back references‘, and the format ‘{R:X}’ refers to matching rules from the ‘Match URL’ section. {R:0} will always contain the entire string being sent, which is why my routing action for routing to the web server is incorrect because if I were to leave it like that, anything after the FQDN would be sent, which currently would add “/www.consentfactory.com/cookies” to “www.consentfactory.com”, looking like “www.consentfactory.com/www.consentfactory.com/cookies”.

There are two ways to fix this.

One approach would be to just correct the routing action to use {R:1} and {R:2}, like this:

Routing rules with {R:1} and {R:2} concatenated

However, my preferred approach is to keep the regex more simple, which allows us to use the original routing action of {R:0}, so I configure my regex URL matching to look like this:

Cleaner Match URL with "www.consentfactory.com/" removed

Which tests out our back-reference values to look like this, thereby allowing the {R:0} rule:

{R:0} is cookies/mdm.pdf, {R:1} is cookies, and {R:2} is /mdm.pdf

Now that’s done, the HTTP rule is set up. The only thing left is to set up the HTTPS rule, and a catch all for anything that isn’t in a subdirectory.

HTTPS Reverse Proxy

The HTTPS rule is the same as the HTTP rule, except we adjust the condition to look for HTTPS being used like this:

HTTPS condition is set to 'on'

The routing rule will be configured like this:

Note the 'Scheme' field is set to HTTPS

Catch-All Redirect Rule

Finally, I’m creating a rule to just catch anything that isn’t a specific subdirectory of consentfactory.com. The rule will be the same as the HTTP rule, but the routing action will actually be a redirect somewhere else, like this:

Redirection to Another Site Using 'Redirect', the url of the site, and '301 Permanent' for redirect t ype

Hopefully this helps explain that process a bit. It helps me to see examples, so maybe this will help others.

(Edit (20171023): my HTTPS routing rule image was incorrect. It didn’t use “https://” for the ‘Scheme’, which is what we want it to route to.

Exchange 2016 Updates: Don’t forget to activate the components!

I’ve done a number of Exchange and Skype for Business server deployments over the last year, and recently I moved to Exchange 2016 versus 2013 just to get the deployments up and running on the latest. However, after performing my upgrade to Exchange 2016 (per these instructions), my EWS connections between Skype for Business and Exchange were not working correctly. Of course, Exchange isn’t fully running for anyone, I’m still testing things out, so not a big deal, but still. What the hell is going on?

In S4B, when I run Test-CSExStorageConnectivity, I’m getting “Test-CsExStorageConnectivity : ExCreateItem exchange operation failed, code=50043”.

testcsexstorageconnectivityerror

The standard response, and search result in Google, for a 50043 error is to check and make sure that your “ExchangeAutodiscoverUrl” property after running Get-CSOAuthConfiguration is configured for the Exchange server’s autodiscover metadata json URL (“https://<exchangeAutodiscover>/autodiscover/metadata/json/1”). But what happens if you’ve already checked that? The URL is correct and you’re good to go, so what changed?

Wait, didn’t I say I upgraded to the latest Exchange 2016 CU (CU3)? Did I completely follow the instructions?

Hmm..let’s check the Exchange server components (something new, AFAIK, to Exchange 2016):

Well. Guess I didn’t the follow instructions at the end that states you to have run the following:

Set-ServerComponentState EX2016SRV1 –Component ServerWideOffline –State Active –Requester Maintenance

followtheinstructions

I’m going to start tagging moments like this as ‘ya dummy’ moments.

Now, let’s check the component status:

get-servercomponentsactive

And then running Test-CSExStorageConnectivty works, and all is well.

So I guess one thing to look at if you’re getting a 50043 error and your have the Metadata URL correct is to verify that EWS is running on your Exchange box.