Palo Alto GlobalProtect Issue: Split Tunnel VPN with Skype for Business

There was a weird issue when I first joined my current job: I was told was that because of the way Palo Alto GlobalProtect (GP) and Microsoft Skype for Business (SfB) works (or maybe was configured?), I needed to log-in to SfB first, then connect to the GP VPN. The rationale was that SfB wouldn’t connect, or it would take a long time to connect, AND THEN even after a period of time, SfB would start behaving weird and it’s Exchange connectivity would drop, so SfB wouldn’t get voicemails, missed calls, etc. Just all out weirdness going on. It’s 2020, so maybe some of this true to form for the year, but probably not.

Palo Alto GlobalProtect Skype for Business

Click here if you want to skip the context and go to the solution.

Uh…Skype for Business?

Full stop. I’m sure you’re asking yourself right now, “Why not just migrate to Microsoft Teams? Get rid of that whole on-premises stuff.”

Let me answer that in a meme:

Sean Beam Boromir Meme: One does not simply migrate from Skype for Business to Teams

Skype for Business is one of the integrative technologies that spans lots of technology stacks that isn’t exactly easy to just jump ship from, and Teams as a VoIP replacement is arguably not there yet.

Also, have you seen the UI comparisons? Going from a sleek floating window for calling, IM, and conferencing with SfB to the giant-lets-pack-lots-of-services-into-one-large-window that is Teams is kind of a hard sell on the user training side of things. Maybe I’m biased. Maybe, but I digress.

The Challenge

Ok, back to the GlobalProtect and Skype for Business issues.

I was admittedly puzzled that the solution — to instruct users to sign-in to SfB before they sign-in to the VPN — was the best solution; it doesn’t seem right from a user experience perspective, and then when you toss-in the sudden weird issues with Exchange connectivity, none of this seemed right, and I doubt that’s the ideal experience. So I brought this up and the team basically said, “we just haven’t had time to troubleshoot it, but if you want to figure it out, go for it.”

You know what that sounds like? An adventure. An itch to itch. Something to solve. A challenge! There could be only one response:

Challenge Accepted Meme

Why Split Tunnel Skype for Business?

Something you might be asking is “Why configure split tunnel in the first place? Isn’t split tunneling a headache to manage?”

Split tunneling can definitely be a PITA, but like a million IT questions out there, the answer ultimately to this is, “it depends.” From my experience, split tunneling becomes difficult when you have a lot of split tunneling to manage, but if you have one or two services, it’s not that bad.

For Skype for Business, it’s one of those technologies that is sensitive to jitter, latency, and packet loss. Why? It’s because it’s voice traffic, and just like voice traffic on the inside of the network, where there’s jitter, latency, and/or packet loss, users on opposite ends of calls/conferences will experience this as delayed audio or parts of the conversation will just break up and it leads to an overall poor experience.

When you configure split tunneling, particularly for technologies like SfB, you avoid the dual encryption scenarios and you allow the technology to use its own optimized methods for connecting voice and application traffic by letting the software connect to services over the internet directly versus through a tunnel.

Baseline

That said, what’s the baseline here? How is GlobalProtect configured with split tunneling and what issues are there?

For GlobalProtect, the split tunnel configuration was configured pretty much like this documentation from Palo Alto (using just the application split tunnel, nothing else). It looked like this:

GlobalProtect Split Tunnel Domain and Application Tab Showing Excluded lync.exe

Here are the issues that were encountered in this setup:

  1. Connectivity issues if connecting SfB after GP VPN is connected
  2. Exchange connectivity in the SfB client drops after a duration of time, even if connection is established before VPN connection
  3. Call transfers working inconsistently
  4. Application sharing working inconsistently
  5. Conference meetings working inconsistently

Issues 3-5 really came later because they were hard to pinpoint due to their inconsistency, but issues 1 and 2 brought some fast wins.

Let’s get to some solutions.

Solutions

Solution for 1 and 2: DNS. It’s always dns.

It’s kind of a joke, but DNS really does cause a lot of problems, and in a split tunnel configuration when you’ve split-tunnel the traffic by application, the application is still going to resolve addresses by the servers you specify in the GlobalProtect configuration. So if you haven’t changed DNS records, the application will split tunnel, but it will still try to connect to internal resources because that’s the records it has.

I don’t have a PCAP screenshot for this, but if you pull up Wireshark and look at the PCAPs for your network interface (non-GP interface), you’ll see attempts to get to SfB internal IP addresses that aren’t (typically) on your network, and thus services fail.

The solution is simple: for your VPN clients, serve the external IP addresses for A records being queried. I solved this by setting up dedicated DNS servers for VPN clients, then just creating the zones and root records for each FQDN. I did this for all the Skype for Business external IPs (edge and reverse proxy) and the external Exchange records.

After doing this, problems 1 and 2 went away because hostnames were being resolved correctly.

Solutions for 3 through 5: Firewall rules and IP Split Tunneling

Problems 3 through 5 were frustrating like no other because I couldn’t really narrow the problems down exactly. Some people had no problems with call transfers, application sharing, or conferencing, but then sometimes they would. So the thing to do is dig into the logs, and when I did I encountered a lot of this:

Or

Or even better:

These all pointed to firewall issues, and even the ICEWARN messages noted something wrong with STUN, TURN, NAT, etc.

So I did some digging and found that firewall rules needed to be in-place to prevent VPN clients and internal SfB servers from communicating with one another. So I added some PAN policies, and things got better, but not perfect. Also, I added the external SfB IP addresses to the split tunnel in Network > GlobalProtect > Gateway > Agent > Client Settings > Client-Config > Split Tunnel > Exclude (which basically just adds static routes in the Windows routing table to send traffic for those IPs out the non-tunneled interface). Still the occasional error creeping up, and I could even witness it, but still can’t quite nail the problem.

Finally, I had a thought: why not get rid of the application process split tunnel? I mean, if I have DNS addreses configured, and IP split tunneling working, why is the application process split tunnel needed? Removed that from the setting and bam — all the problems went away. Like magic.

Shia Labeouf Magic

Here’s what the final outcome should look like for a GlobalProtect-Skype for Business-Exchange environment for split tunnel.

Palo Alto GlobalProtect Skype For Business Split Tunnel

Of course, I fully admit this is really more of a legacy design with everything on-premises, but you could just as easily send the Exchange traffic to Office 365 in the split tunnel.

Thoughts on GlobalProtect Application Process Split Tunnel

While I had configured the traditional methods of doing split tunnel configurations (IP split tunnel and DNS servers), I’m still a little puzzled to the fact that the Palo Alto GlobalProtect application process split tunnel seemed to cause issues. My guess is that something in the way the Skype for Business client is designed prevents the process from being completely split tunneled, and I think this has to do with the way Skype for Business operates with Windows.

If you get really bored on a Friday night and have nothing better to do in life, check out some of these deep dives on candidate selection and other stuff related to media flow. What you’ll see in the SfB client log files is something like this:

Skype for Business candidate selection
Credit

Basically, SfB gets a selection of candidates that it uses from the interfaces on the computer. In a GP split tunnel set up (with application process split tunnel configured), you’ll see ALL IP addresses (including the tunnel address) listed as candidates, and my suspicion is that Skype for Business still tries to use a tunnel interface, and sometimes it gets around the Palo Alto GlobalProtect application exclusion, and then that causes calls, application sharing, and even conferences to fail. I can’t show my own logs seeing this for security reasons, so you’ll have to trust me on that one.

Solution (tl;dr)

Here’s the quick solution for GlobalProtect and Skype for Business Split Tunnel

  1. Create separate DNS servers for VPN clients and create the specific Skype for Business DNS records needed, and configure them for external IP addresses so that Skype for Business resolves external addresses and configures itself appropriately.
  2. Create firewall rules that block traffic to/from the VPN network to internal Skype for Business and Exchange IP addresses. We want the SfB client to determine it can’t go inside for traffic.
  3. In Panorama or PANOS, under Network > GlobalProtect > Gateway > Agent > Client Settings > Client-Config > Split Tunnel > Exclude, configure all external SfB addresses so that the GP client doesn’t send traffic for those IPs through the tunnel. Alternatively, under Network > GlobalProtect > Gateway > Agent > Client Settings > Client-Config > Split Tunnel > Domain and Application > Exclude Domain, you could add the SfB external FQDNs (that said, IIRC, the stuff under ‘Domain and Application’ requires the GlobalProtect license…technically).

Links, Further Reading, Credit

pyribbon: Python Module for Sonus/Ribbon SBC REST API

Part of my recent stumbling into supporting VoIP systems again is supporting and maintaining Ribbon session border controllers (SBCs). SBCs are pretty much gateways/routers and even firewalls for VoIP traffic, which not only routes/limits traffic between disparate voice systems (like ISDN lines to SIP trunks), but can also manipulate the traffic such as adjusting the phone number format, media codec, changing SIP header information, yada yada — you get the point.

Diagram showing Ribbon SBC connecting to different systems

Pretty much all SBCs are configured via extensive web interfaces with lots of deep options, so if you have to manage these at scale, it can get a little time consuming — which is what my problem was. I have many SBCs scattered across our state, and I generally loathe having to make mass changes via any GUI, so does Ribbon have some method of configuring SBCs at scale? Yes, through its Ribbon API.

However, I’m not going to sit down and use Postman to configure these (well, maybe initially to test), so I need a scripting framework. I could use PowerShell or curl with bash scripting to do this, but most of my work in networking is done via Python, and thus because I couldn’t find any Python module built for this already (and it seemed like fun to create), I made this simple Python module and it’s on Github.

pyribbon: Python Module for Sonus/Ribbon SBC REST API

I wrote this with a VoIP engineer in-mind, and the basic idea is you import the module in a Python script, then interact with source of data to make changes or perform actions. For example, I’ve used it to create a ton of phone number changes with the routing and transformation tables, and I’ve used it to perform configuration backups across all systems.

To do most of this though, you need to get familiar with API and how object resources are designed. Personally I started with Postman and got a sense for how the API works, then I built the class to handle the API. Initially I started small, but then realized it was rather simple to add additional methods to the pyribbon class to perform different tasks. One could easily build additional methods such as reboot, backup, etc., but I just didn’t see the need for that — yet.

One potential cool use is to perhaps use the module to query for call statistics and send that to a database, then maybe graph it out with Grafana; or, if your NMS allows for script pollers, perhaps use that versus digging through SNMP OIDs.

Final thoughts: I know this is probably high on the esoteric use-cases. I mean, how many VoIP engineers are there in the world, then narrow that down to Sonus/Ribbon VoIP engineers, then dwindle that down to those that interact with the API, and then dwindle that down even more that do it via Python. Pretty small, but if you like Python for this kind of work, this should help.

Also, I thought about putting this in PyPi, but I can’t imagine this getting much use or further development on my end (it does what I need it to do well), and thus I’m shying away from that.

Let me know if you find this useful or have questions.

Skype for Business: Cleanly Shutting Down Server (Invoke-CSComputerFailover and More)

It’s been over three years since I managed and deployed a Skype for Business/Lync system, and at my new job I was hired on as a be a network engineer, but I noted in a past life I received a MCSE in Skype for Business, so I could definitely be the backup for the primary SME (subject matter expert) in SfB. However, in a strange twist, the primary SME left — and you know what, there’s just not a lot of Skype for Business/Lync engineers out there, especially in a small labor market, so I stepped up to help the organization because I was the most qualified by a long-shot.

So I’m back doing some Skype for Business again.

Captain America Here We Go Again

I actually have always liked voice routing, so it’s fun to be doing some of this stuff again (although SfB is a pretty intense, integrative technology, so it’s not all Pop-Tarts and unicorns).

However, I was trying to get reacquainted with some commands for cleanly shutting down a Skype For Business server, and I just didn’t find a lot of good information out there, so I thought I might write something up “real quick”. This is somewhat basic info for SfB enterprise deployments, but it might be helpful.

Enough of the pretext, let’s get to the first command…

Get-CSWindowsService

I’m starting with this command because it’s the most basic command you should already know, but plays a role for later in this post. Typically you’ll use the command to see how many connections are being used by a service, if a service is running, etc.

The command is straightforward: Get all or one of the SfB (or Communication Server, which is where the acronym CS comes from) Windows Services on the machine.

Stop-CSWindowsService

The command Stop-CSWindowsService is the most basic command you’ll use to stop all or one of the services on a SfB server. The command will execute stopping of services in the proper order of stopping SfB services, including any dependent services.

Typically you’ll be using this command on a ‘Standard’ deployment SfB server, or any non-front end server in an ‘Enterprise’ deployment such as mediation/edge servers (more info: standard vs enterprise deployment). However, there are probably rare situations in which you’ll stop just one service, so you’ll likely be stopping all of them.

If you’re doing this outside a maintenance window for some reason, I prefer to do the following: Stop-CSWindowsService -Graceful. The -Graceful is important here, because what it does is it puts the services into a paused state, preventing any new connections from happening and waiting on existing connections to disconnect. On mediation servers, whenever I’ve need to stop a server in a mediation pool, this is my preferred method so that I wait for the calls to end. However, it won’t stop until the call is done, so you might be waiting awhile.

Invoke-CSComputerFailover

For whatever reason, this command scared me at first, largely because of my ignorance of what it does. The official documentation on the command I don’t think does it justice, so here’s my attempt at it.

The command Invoke-CSComputerFailover will basically perform a Stop-CSWindowsService -Graceful operation, but it acts slightly different. The differences:

  1. It’s used on front end servers in an enterprise deployment (or at least I’ve never seen it documented or used on other SfB server pools). The command causes the front end server to be in a ‘failover’ state, making it unavailable to the rest of the front end pool.
  2. The command migrates data, routing groups, and more to the other front end servers.
  3. The command has a wait time of 1 hour per service, after which if the connections haven’t disconnected, it will force a disconnect. This default can be changed with the -WaitTime parameter.
  4. This command will make the server unavailable in the front end pool. After a reboot, or if for some reason you run Start-CSWindowsService, the server won’t be available until you run Invoke-CSComputerFailBack.

After working with it and using it several times, it’s not as scary as I thought. Just run it on one machine at a time lest you have some Windows Fabric issue due to quorum loss (or something to that effect).

Invoke-CSComputerFailover Hanging or Taking Awhile

Sometimes when you’re failing over a front end server, you get stuck waiting for some services to stop like this:

Status screen waiting for Invoke-CSComputerFailover to Progress

If you look at Get-CSWindowsService, you might actually find something like this:

Get-CSWindowsService Seeing Services With Hanging Connections

If you note the red and blue arrows, the services are left open, likely from a conference that has ended already, but is being left open for whatever reason. To speed up Invoke-CSComputerFailover, just open a separate elevated terminal and stop the services like this:

Stopping the stalled services with Stop-CSWindowsService in separate window

After which, Invoke-CSComputerFailover will continue on as expected.

Invoke-CSComputerFailover progressing

I originally tried out the idea on my own, but the following blog entry also helped me and explains it from a different perspective.

Modding EX2200 PoE Switch with Quieter Fans

Since COVID-19 hit, I moved out to my garage so I could work from home, and in doing so I realized how loud my network rack really is. My rack is not the prettiest rack, and to be frank I’ve never really been a fan of the sleek LED lit home lab. My aesthetic is more like something from a Tatooine droid shop: not disorganized, per se, but certainly not pretty.

Watto's Shop

I’ve really been wanting a Juniper switch for home, so I went shopping on ebay to see if I could find anything, and surprisingly, right now you can find EX2200 switches for really cheap — like, $75 shipped. I wondered then if I could modify the fans on the switch, and lo and behold, you can! So I bought a Juniper PoE switch, but the next part was purchasing the fans.

I came across the video below from Christian Scholz showing mostly how to replace the fans. It’s an excellent video, and shows you how to replace the two back chassis fans.

However, it doesn’t show you how to replace the fans for the power supply, largely because the power supply on non-PoE EX2200s doesn’t have fans (see YouTube preview above).

For the fans on a PoE EX2200, there’s actually just one fan, and if you want to replace it with a Noctua fan, it’s going to work, but it’s going to get a bit warm. One solution is to add a fan on the exhaust port that draws the air out, but it’s going to require a little modification for the chassis cover.

I’m going to briefly show what I did, but I didn’t think to take pictures during the process, but here’s a shot of what it looks like overall:

Uncovered EX2200 switch with fans added.

Backstory: I actually screwed up and got the wrong Noctua fan for the power supply fan. I thought it was a PWM for some reason, but it actually needed a FLX fan. However, I was able to still use the PWM fan. Here’s a breakdown of the above:

  • Blue – This is the power supply fan, a Noctua NF-A4x20 FLX fan.
  • Teal – This is a splice of the old cable adapter with the new using the Noctua omnijoin adapter set that comes with the fans (which is stupidly easy to use). Red and black wires matched up, and then I matched the yellow on the Noctua cable to the blue on the old fan plug.
  • Green – This is the Noctua NF-A4x20 PWM fan that mistakenly bought. The fan runs at 100% all the time, but not an issue. While a mistake, the fan came with…
  • Pink – Y-Adapter set that came with the PWM fan. I was able to plug this into the power plug port for the right FLX rear chassis fan (red), then plug that fan into the main y-adapter and then the PWM into the other port.
  • Red – These are FLX fans.
  • Purple – This was a port labeled “J9” that tried to use for power, but didn’t work (hence why I used the y-adapter).

Some notes I learned. For one, you’ll need to lift the power supply up in order to unscrew the screws:

EX2200 power supply lifted out

I also had to drill a fairly large hole in the chassis cover so that I could fit the the cable in (see below). Had I not screwed up and spliced the PWM cable, the hole could have been smaller and I probably could have fit the y-adapter within the chassis.

EX2200 Fan backside

I had to also modify the chassis for the PoE exhaust port by flattening the metal screen (so the attached fan didn’t rub against it) and I had to drill two additional holes so that the silicon screws could hold the fan down.

Finally, here’s a look from the CLI side:

EX2200 show chassis environment - Fans spinning normally

Final Thoughts

  • Overall, it’s working really well. It’s almost completely quiet — my EVE-NG server is actually louder than this thing.
  • It’s not pretty, but the ugliness is hidden in the back.
  • I DO NOT RECOMMEND YOU DO THIS TO PRODUCTION EQUIPMENT. I’m doing this to my home stuff, so I can live with it and the consequences, but I would never do this at work (too much work, TBH; better to buy an EX2300-C).
  • Screwing the chassis cover back on will indeed be a little tighter, but with a little force you can get all the screws on.
  • I don’t recommend using a drill to unscrew these (you can strip the screws pretty easily), but if you do, have a firm downward motion and screw/unscrew in bursts.
  • I used to loathe the EX2200s for how slow they are, but on the 12.3R12.4 software, they seem to work well.