Juniper EX3400: How to Recover from PoE Firmware Upgrade Failure

Updated 20200117. See below.
Updated 20200308. I might have a path for upgrade success. Maybe.

Did you know Juniper EX switches have PoE firmware updates to be applied?

Chelsea Lately - Great question. I had no idea.

Well, I didn’t until about a year ago when I did an upgrade and was checking on PoE power. Looking at the controller info from show poe controllerI noticed the following:

Juniper poe firmware available

Huh. Ok. Well, I’ve got a eight unit stack here, and the Juniper EX software upgrade is usually pretty solid, so let’s upgrade it — and it goes off without a hitch.

Fast forward nine months later, and I’m running into strange issues with PoE and Mercury door controllers, particularly model ‘MRE62E’. Basically the Juniper switches won’t provide power to this model, but the older MRE52’s had no problem. Checking out the firmware version using show chassis firmware detailI noticed that the switch had the older 1.x firmware and not the new 2.x.

PoE firmware 1.6.1.21.1

 

Alrighty then — let’s upgrade this stack. I upgrade the software using the latest JTAC recommended version (staying in 15.x), then upgrade the PoE firmware — no problem. Door controller is now getting power, I see a MAC address. Everything is hunky dory.

Now let’s upgrade this other stack.

No problems on EX software upgrade. Great. Now upgrade PoE firmware…

Ten minutes later, I get the following on the terminal:

Magic Thread Message

Of note, and the thing that made me panic, was that out of nine switches in the stack, only one came back online. Checking the firmware versions, I see the following:

Various PoE firmware versions, some missing, some 0.0.0.0.0, only one 2.x

Okay… F***. Well, let’s reboot the stack; perhaps a reboot is needed*. After reboot, I get the following:

PoE Device Fail on FPC 8. All but FPC 2 are missing.WTF.

Guy shaking head mouthing WTF

In the past when I’ve done a PoE firmware upgrade (between now and when I first learned about it), I had no recourse but to RMA the switch. Well in this case, I don’t have eight spare switches to fill this temporarily while I wait for an RMA! WTF am I going to do?!

Solving the PoE Firmware Upgrade Failure

If you’re in the same situation as I was in, take a deep breath — you’re not dead in the water.

There are two three scenarios for a PoE firmware upgrade failure that I’ve encountered, and I have a solution for both:

  • PoE Firmware Failure #1 – After firmware upgrade, you see a mixed result of firmware versions, some being 0.0.0.0.0, some being correct (2.1.1.19.3**), and some missing/blank (see picture above showing mixed/missing versions)
  • PoE Firmware Failure #2 – Perhaps you did as I did and rebooted and the PoE controller shows one with the message DEVICE_FAILED (see above)
  • PoE Firmware Failure #3 – #2 option doesn’t work and nothing you do is getting the PoE controller to upgrade. You may also have the process hang during the download, or if the controller is still at DEVICE_FAILED and you try to upgrade, you get a message Upgrade in progress, even after a reboot.

In all these solutions, here are some tips/info about the Poe upgrade procedure until Juniper fixes the process for upgrading them all at once:

  • Upgrade one at a time.

Solution for PoE Firmware Failure #1

If you encounter this failure, DON’T REBOOT THE STACK. You’ll make your life harder if you do.

Next, Juniper TAC (finally) has a solution — and it requires remote/on-site hands. If you’re going on-site or working with someone remotely, get yourself a cup of coffee (or beverage of choice) and some podcasts lined-up, because you’re going to be doing this awhile (~10 minutes for each switch/fpc).

From their site, the solution is the following (with my own notes):

  1. Power cycle the affected FPC (re-seat the power cord). Do not perform a soft reboot.
  2. After the FPC joins the VC or the standalone device reboots, execute one of the following commands in operational mode:
    request system firmware upgrade poe fpc-slot <slot>

    or
    Note: This is the method I used
    request system firmware upgrade poe fpc-slot 1 file /usr/libdata/poe_latest.s19
    JTAC Note: You need to change the fpc-slot number accordingly. Also, it is recommended that you push the PoE code one by one instead of adding all members in the virtual-chassis setup. (Emphasis mine)
  3. After the above command is executed, the FPC should automatically reboot. If not, reboot from the Command Line Interface.
    Note: Be patient and wait. No, seriously…wait. It takes awhile. If you need to reboot, you’re rebooting the whole unit AFAIK:
    request system reboot
  4. After the FPC is online, check the PoE version with the show chassis firmware detail command. The PoE version should be the latest version (2.1.1.19.3) after the above steps are completed.
  5. If the version is correct, the PoE devices should work.
  6. Repeat the above steps to upgrade the PoE versions on other FPCs in the virtual-chassis setup.

The one thing to note that when it’s doing its upgrade is that you can see the progress with show poe controller, but at some point it will hang at 95%, then disappear, then come back, then the process will be complete — in other words…WAIT, unless you want to try out the solution for failure #2. 😆

Solution for PoE Firmware Failure #2

In this scenario, you rebooted the stack and something failed. The following is similar to solution #1, but the failed PoE controller requires to basically upgrade it twice. The steps:

  1. Execute the following command to reload the firmware on the FPC:
    request system firmware upgrade poe fpc-slot 1 file /usr/libdata/poe_latest.s19
    Note: You need to change the fpc-slot number accordingly.
  2. The PoE controller will disappear when you run show poe controller, then come back and start upgrading like this:
    PoE firmware upgrading
  3. After the firmware upgrade completes, the firmware will likely be incorrect (it always was for me). Power cycle the affected FPC (re-seat the power cord). Do not perform a soft reboot.
  4. After the FPC joins the VC or the standalone device reboots, execute one of the following commands in operational mode:
    request system firmware upgrade poe fpc-slot 1 file /usr/libdata/poe_latest.s19
    JTAC Note: You need to change the fpc-slot number accordingly. Also, it is recommended that you push the PoE code one by one instead of adding all members in the virtual-chassis setup. (Emphasis mine)
  5. After the above command is executed, the FPC should automatically reboot. If not, reboot from the Command Line Interface.
    Note: Be patient and wait. No, seriously…wait. It takes awhile. If you need to reboot, you’re rebooting the whole unit AFAIK: request system reboot
  6. After the FPC is online, check the PoE version with the show chassis firmware detail command. The PoE version should be the latest version (2.1.1.19.3) after the above steps are completed.
  7. If the version is correct, the PoE devices should look like this:
    Successful PoE firmware upgrade
  8. Repeat the above steps to upgrade the PoE versions on other FPCs in the virtual-chassis setup.

Just like solution #1, one thing to note is that when it’s doing its upgrade you can see the progress with show poe controller, but at some point it will hang at 95%, then disappear, then come back, then the process will be complete — in other words…WAIT! You don’t really want to re-apply this whole process, do you?

Solution for PoE Firmware Failure #3 (Update 20200117)

I recently had some more issues, and solution #2 just wasn’t doing the trick, so I offer solution #3, which I’ve had success with but there’s a caveat/rabbit hole that may come of it. This is the nuke-from-orbit approach on the switch if you want to avoid doing an RMA (or if you have no choice).

The gist of it: disconnect the switch from the VC (if connected), perform an OAM recovery, zeroize and reboot the switch, then perform the firmware upgrade.

From my experience, there are a few different scenarios that you’ll encounter when you need to use this method:

  • During the firmware upgrade, the process just hangs/stalls. You’ll run show poe controller and at some point the download hangs/stalls like this:Terminal shows download hangs at 50%
  • You receive a DEVICE_FAIL for any reason and nothing is resolving it, like this:PoE Device Fail on FPC 8. All but FPC 2 are missing.
  • You’re switch is stuck at upgrading the firmware. No matter what you run, the switch displays the following message: Upgrade in progress. In this scenario, the switch just thinks it’s still in the process of upgrading, but no matter how long you wait (or if you can’t wait some indefinite period of time for it to upgrade), the switch won’t upgrade the firmware.

What we need to do at this point is just get the switch to fresh state so that we can upgrade the PoE controller; and believe it or not, this is actually one of the awesome things about Juniper equipment: when one component of the switch is hosed, the entire switch isn’t hosed and can still function normally. For instance, I have had a switch have a failed PoE controller, but the switch still operated like a non-PoE switch without issue; i.e., Juniper allows for components to be recoverable.

Here’s the solution I came up with:

  • Step 1: Zeroize the switch: request system zeroize
    In this step, we’re just starting fresh and clearing out the configuration, which takes about 10 minutes and then reboots. If the switch still thinks there’s an upgrade in progress for the PoE controller, we’re clearing it out. It’s possible that this may fail due to storage issues. If that’s the case go to the next step, otherwise skip to bullet #3.
  • If step 1 fails: Perform an OAM recovery: request system recover oam-volume
    This is an optional step, and I’ve had to do this when zeroize would fail. If step #1 happens, try this first. takes about 10 minutes as it copies the OAM partition then compresses it for the Junos volume.
    Caveat: EX3400s, even in 18.2 land, still have storage issues sometimes. I have one switch that couldn’t recover from oam-volume, and I’m not sure why. I’ll update this once I have a solution.
  • After the switch reboots, the controller will still come up as failed when you run show poe controller. Go ahead and run the upgrade again:
    request system firmware upgrade poe fpc-slot 1 file /usr/libdata/poe_latest.s19
    It should behave like this after running the command:PoE upgrade process for Juniper
  • The switch should behave normally at this point, upgrading normally. If it doesn’t then you’ll likely need to replace the switch (or live without PoE).

And reminder, just like solution #1 and #2, one thing to note is that when it’s doing its upgrade you can see the progress with show poe controller, but at some point it will hang at 95%, then disappear, then come back, then the process will be complete — in other words…WAIT! You don’t really want to re-apply this whole process, do you?

Final Thoughts

Here’s the kicker for me: I’ve had this work just fine for stacks and single switches alone, and fail on stacks and single switches alone — I can’t find the common denominator here. Perhaps there’s a hardware build that has this more than others, but I can’t figure it out. The official documentation doesn’t hint on a best practice for this (other than maintenance hours), so I’m uncertain on the best approach.

(Update) Juniper does have an official bug report for this, and is apparently fixed in 15.1X53-D592, but I had the issue on 18.2R3, so I’m not convinced it isn’t resolved yet.

Here’s some ideas I have to change my PoE firmware upgrade procedure (unsure if this will help):

  • Turning off PoE on all interfaces
  • Upgrading one at a time.
  • Trying an earlier version of the JTAC software, the going to the latest recommended. Example: I had no problems with 15.1X53-D59.4 or 15.1X53-D590, but the sample size for determining that is small (only two stacks attempted).
  • Update: I can’t find any rhyme or reason, TBH. I’ve had it fail multiple ways, so not sure the above will help.
  • Update 2: I have had some success with the following (but I don’t feel that confident about it yet):
    • Use the 18.2 branch
    • Upgrade one at a time
    • Waiting for a period of time after a software upgrade and reboot. Don’t get upgrade-happy. Give the hardware some time to get back up and going.
    • Cross your fingers. And legs. On a full moon.
  • Update 2: If you have a controller showing DEVICE FAIL, I’ve had success fixing it just by running:
    request system firmware upgrade poe fpc-slot 1 file /usr/libdata/poe_latest.s19 (change fpc-slot # accordingly)

Time will tell.

Hope this helps! If it doesn’t I’d love to know the different experiences others have. Please share if you’ve had success or failures with any of this!

* I swear I saw a message that a reboot is required, but I can’t confirm this (I didn’t screencap it)

** There is a version 3.4.8.0.26, but that’s on the 18.x software version line, and it requires a whole different set of upgrade procedures. This is outside the scope of this post.

ztpgenerator: A ZTP Python Script for Juniper Devices (Maybe more?)

This is a script I’ve been working on for simplifying the ZTP process for Juniper switches.

https://github.com/consentfactory/ztpgenerator

What does it do?

  • Updates ISC-DHCP for ZTP devices (creates DHCP reservations)
  • Creates Juniper configurations based on Jinja2 templates
  • Can also create virtual chassis configurations if desired

There’s some pre-work that needs to be done for set up, but it’s fairly simple to deploy and doesn’t require a lot.

All of the documentation is there on Github.

Hoping this helps others out there.

Juniper Lessons Learned Compilation

My experience with Juniper Networks and JunOS was never too deep, especially since I was trained in the Cisco world. Working for a VAR that sold Juniper gear and offered system services, I was a system engineer that knew enough about Juniper gear to get servers connected and off to the races. Fast-forward a few years, and being bored with system engineering/administration at my organization, an opportunity came up because of my Juniper experience to join the network engineering team and I grabbed onto it.JUNOSThis post is just a quick compilation of things I’ve learned about JunOS that I thought I’d share, be it good, bad, or maybe even ugly. Some of these things you can learn from the ‘Day One’ books, but I feel like some things just aren’t that clear for whatever reason.

Showing Juniper Configuration As Commands

Coming from the Cisco IOS command line world, I was always a little annoyed with the configuration of Juniper because it was in a dictionary/JSON type of format. I mean, it’s great for reading and breaking down configurations into a structured data format, but it always felt a little inefficient on display space, and if you need to make a quick change to copy-edit-paste the configuration, it’s a little bit of a pain.

Then I learned about the ‘display’ command, which opened up a few doors.

Typical Juniper configurations look something like this:

interfaces {
    ge-0/0/0 {
        unit 0 {
            family ethernet-switching {
                interface-mode access;
                vlan {
                    members 60;
                }
            }
        }
    }
}

However, you can get your configuration as commands with by piping your ‘show config’ command to ‘display set’, like this:

show configuration | display set

Which gives you the above configuration like this:

set interfaces ge-0/0/0 unit 0 family ethernet-switching interface-mode access
set interfaces ge-0/0/0 unit 0 family ethernet-switching vlan members 60

You can then show all the interfaces, make a quick copy-edit-paste change, and be done. Personally, I like this better then the ‘load merge|replace|set terminal” method, but that’s just me.

Even cooler about this is that when you do this, you can pipe your ‘show config’ command to ‘display set’ and then to your ‘match “0/0/0″‘ statement to find the configuration you’re looking for and identify the stanza it’s in. Something like this:

show configuration | display set | match "0/0/0"

Which, in the above example, which show you all configurations that include “0/0/0” in it, like this:

set interfaces ge-0/0/0 unit 0 family ethernet-switching interface-mode access
set interfaces ge-0/0/0 unit 0 family ethernet-switching vlan members 60
set switch-options voip interface ge-0/0/0.0 vlan PHONES
set switch-options voip interface ge-0/0/0.0 forwarding-class expedited-forwarding

The other thing about the display command is that it also has a bunch of cool output formats that I think are interesting to look at (‘display detail’ sure is interesting, and ‘json’ or ‘xml’ sure is helpful).

Juniper Wildcard Ranges for Configurations

When you’re in edit mode and you’ve just come from the Cisco world, you’re probably wondering where the heck the ‘interface range’ command is. If you’re like me, you started using ‘interface-range’ statements and just assumed JunOS is just quirky and resigned to not having range commands.

Well, you’re in luck if you’re reading this because JunOS does have the same feature and the command is ‘wildcard range’.

In Cisco, you may be used to doing something like this:

int range g1/0/1-2,g1/0/4,g1/0/6

With Juniper, it’s like this:

wildcard range set interfaces ge-0/0/[1-2,4,6]

Commit Sync By Default on Virtual Chassis’d Systems

This is a small one, but I thought it was worth sharing. One thing on any virtual chassis’d switches/routers that you want to remember to do is a ‘commit synchronize’ when you’re committing configurations. However, I’d prefer to not have to type ‘sync’ and just be done with it so I can ‘commit and-quit’ super quick. The solution is simple; run/add this to your config:

set system commit synchronize

Do one last ‘commit sync’ and then this is done automatically.

Juniper Has Rapid-PVST Interoperability

One of the issues my org has dealt with at our Juniper-deployed networks is that our Cisco routers completely block the VLANs for downstream ports when a loop is detected. Almost always it’s that a user has plugged their phone into two ports on the network, and since spanning tree was only set to the default configuration (RSTP) on our Juniper gear, these outages were somewhat frequent.

After doing some research and testing, I found out that Juniper has developed a protocol for rapid-pvst interoperability called ‘VSTP’. I’m still developing and ironing-out the details about our configuration, but here’s the base configuration we’ve done.

protocols {
    vstp {
        interface all {
            bpdu-timeout-action {
                block;
            }
        }
        vlan all;
    }
}

Of note, I’ve deleted RSTP because I don’t know the ramifications of keeping it in there (which Juniper notes is supported).

Also of note, if you choose to configure each interface individually instead of ‘interface all’, VSTP has an interface configuration limit of just over 300 ports (I found this out the hard way while trying to configure a 8-count virtual chassis stack). Juniper notes this somewhere in it’s documentation, but they don’t give the exact number and they just vaguely say there’s a limit because of processing load.

I plan on fleshing what I’ve learned with Juniper spanning tree in a future post.

No-More Using That Spacebar

Finally, here’s a super quick one. If you run a ‘show config’ and just need it to print out the configuration before a software update, pipe the command to ‘no-more’ and it will print out the entire configuration, much like setting your terminal length to 0. Command:

show configuration | no-more

Final Thoughts

Don’t really have much else to say other than I’ve come to appreciate JunOS a lot more than when I originally learned it. Overall, I think learning Juniper has made me a better engineer because I’ve had to think about network configuration and protocols in a broader sense, much like how learning Linux made me a better Windows administrator.

From my initial experience going down the Python networking development road, Juniper certainly is easier to work with than screen-scraping with Cisco. NAPALM is certainly my tool of choice as of right now.

SNMP with Juniper, however, leaves something to be desired. For example I added MIBs and configured some sensors in PRTG, but I get weird issues with interfaces such as getting their ‘logical’ interfaces in the form of something like ‘0/0/0.0’, but missing the interfaces that actually work in the form of ‘0/0/0’.

Lastly, what I’ve learned working between Cisco and Juniper is that I just don’t think Cisco has any special sauce with access switching anymore. As a result, what Cisco (and Juniper and others) are doing is selling you a platform instead of just hardware now. You have to buy into a giant platform narrative instead of just letting hardware compete with other hardware.