
Juniper EX3400: How to Recover from PoE Firmware Upgrade Failure

Updated 20200117. See below.
Updated 20200308. I might have a path for upgrade success. Maybe.

Did you know Juniper EX switches have PoE firmware updates to be applied?

Chelsea Lately - Great question. I had no idea.

Well, I didn’t until about a year ago when I did an upgrade and was checking on PoE power. Looking at the controller info from show poe controller, I noticed the following:

[Screenshot: Juniper PoE firmware available]

Huh. Ok. Well, I’ve got an eight-unit stack here, and the Juniper EX software upgrade is usually pretty solid, so let’s upgrade it. It goes off without a hitch.

Fast forward nine months, and I’m running into strange issues with PoE and Mercury door controllers, particularly model ‘MRE62E’. Basically the Juniper switches won’t provide power to this model, but the older MRE52s had no problem. Checking the firmware version using show chassis firmware detail, I noticed that the switch had the older 1.x firmware and not the new 2.x.

[Screenshot: PoE firmware 1.6.1.21.1]

Alrighty then — let’s upgrade this stack. I upgrade the software using the latest JTAC recommended version (staying in 15.x), then upgrade the PoE firmware — no problem. Door controller is now getting power, I see a MAC address. Everything is hunky dory.

Now let’s upgrade this other stack.

No problems on EX software upgrade. Great. Now upgrade PoE firmware…

Ten minutes later, I get the following on the terminal:

[Screenshot: Magic Thread Message]

Of note, and the thing that made me panic, was that out of nine switches in the stack, only one came back online. Checking the firmware versions, I see the following:

[Screenshot: Various PoE firmware versions, some missing, some 0.0.0.0.0, only one 2.x]

Okay… F***. Well, let’s reboot the stack; perhaps a reboot is needed*. After reboot, I get the following:

[Screenshot: PoE Device Fail on FPC 8. All but FPC 2 are missing.]

WTF.

Guy shaking head mouthing WTF

In the past, when a PoE firmware upgrade failed on me (between now and when I first learned about it), I had no recourse but to RMA the switch. Well, in this case, I don’t have eight spare switches to fill in temporarily while I wait for an RMA! WTF am I going to do?!

Solving the PoE Firmware Upgrade Failure

If you’re in the same situation as I was in, take a deep breath — you’re not dead in the water.

There are three scenarios for a PoE firmware upgrade failure that I’ve encountered, and I have a solution for each:

  • PoE Firmware Failure #1 – After firmware upgrade, you see a mixed result of firmware versions, some being 0.0.0.0.0, some being correct (2.1.1.19.3**), and some missing/blank (see picture above showing mixed/missing versions)
  • PoE Firmware Failure #2 – Perhaps you did as I did and rebooted and the PoE controller shows one with the message DEVICE_FAILED (see above)
  • PoE Firmware Failure #3 – The solution for #2 doesn’t work and nothing you do gets the PoE controller to upgrade. The process may also hang during the download, or, if the controller is still at DEVICE_FAILED and you try to upgrade, you get the message Upgrade in progress, even after a reboot.
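
To make the triage concrete, here is a minimal Python sketch that maps observed symptoms to the three scenarios above. The function name, its parameters, and the idea of passing the versions in as a dict are all hypothetical. Only the version strings and the scenario numbering come from this post.

```python
from typing import Dict, Optional

# Version strings that indicate a broken or missing PoE firmware readout.
# "0.0.0.0.0" and blank entries are the symptoms described in the post.
BROKEN_MARKERS = {None, "", "0.0.0.0.0"}


def classify_failure(
    versions: Dict[int, Optional[str]],
    device_failed: bool = False,
    upgrade_stuck: bool = False,
    solution2_tried: bool = False,
) -> int:
    """Return 1, 2 or 3 for the matching failure scenario, or 0 if none applies.

    versions        -- FPC slot -> PoE firmware version as reported (hypothetical input)
    device_failed   -- a controller shows DEVICE_FAILED
    upgrade_stuck   -- download hangs, or the switch keeps saying "Upgrade in progress"
    solution2_tried -- solution #2 was already attempted without success
    """
    if upgrade_stuck or (device_failed and solution2_tried):
        return 3
    if device_failed:
        return 2
    if any(version in BROKEN_MARKERS for version in versions.values()):
        return 1
    return 0


if __name__ == "__main__":
    mixed = {0: "2.1.1.19.3", 1: "0.0.0.0.0", 2: None}
    print(classify_failure(mixed))
```

This is only a decision aid for picking which section to read. It does not talk to a switch.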

For all of these solutions, here is a tip about the PoE upgrade procedure until Juniper fixes the process for upgrading them all at once:

  • Upgrade one at a time.
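
As a small illustration of "one at a time", the Python sketch below builds one upgrade command per FPC slot so they can be pasted and run sequentially, waiting for each to finish. The helper name is hypothetical. The command text and firmware path are the ones used in the solutions later in this post.

```python
from typing import Iterable, List

# Firmware image path used in this post's upgrade commands.
FIRMWARE_FILE = "/usr/libdata/poe_latest.s19"


def build_upgrade_commands(
    slots: Iterable[int], firmware_file: str = FIRMWARE_FILE
) -> List[str]:
    """Return one per-slot upgrade command, in ascending slot order.

    Duplicate slot numbers are collapsed so no FPC is upgraded twice by accident.
    """
    return [
        f"request system firmware upgrade poe fpc-slot {slot} file {firmware_file}"
        for slot in sorted(set(slots))
    ]


if __name__ == "__main__":
    for command in build_upgrade_commands([2, 0, 1]):
        # Run each command by hand and wait for that FPC to finish
        # before moving on to the next one.
        print(command)
```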

Solution for PoE Firmware Failure #1

If you encounter this failure, DON’T REBOOT THE STACK. You’ll make your life harder if you do.

Next, Juniper TAC (finally) has a solution, and it requires remote/on-site hands. If you’re going on-site or working with someone remotely, get yourself a cup of coffee (or beverage of choice) and some podcasts lined up, because you’re going to be doing this a while (~10 minutes for each switch/FPC).

From their site, the solution is the following (with my own notes):

  1. Power cycle the affected FPC (re-seat the power cord). Do not perform a soft reboot.
  2. After the FPC joins the VC or the standalone device reboots, execute one of the following commands in operational mode:
    request system firmware upgrade poe fpc-slot <slot>

    or
    Note: This is the method I used
    request system firmware upgrade poe fpc-slot 1 file /usr/libdata/poe_latest.s19
    JTAC Note: You need to change the fpc-slot number accordingly. Also, it is recommended that you push the PoE code one by one instead of adding all members in the virtual-chassis setup. (Emphasis mine)
  3. After the above command is executed, the FPC should automatically reboot. If not, reboot from the Command Line Interface.
    Note: Be patient and wait. No, seriously…wait. It takes a while. If you need to reboot, you’re rebooting the whole unit AFAIK:
    request system reboot
  4. After the FPC is online, check the PoE version with the show chassis firmware detail command. The PoE version should be the latest version (2.1.1.19.3) after the above steps are completed.
  5. If the version is correct, the PoE devices should work.
  6. Repeat the above steps to upgrade the PoE versions on other FPCs in the virtual-chassis setup.
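
Steps 4 to 6 boil down to "read the version per FPC, find the ones that are still wrong". Here is a rough Python sketch of that check, assuming you have saved the text of show chassis firmware detail. The function names are hypothetical, and the sample text is a trimmed version of the output a reader posted in the comments, so real output may be indented or spaced differently.

```python
import re
from typing import Dict, List

EXPECTED = "2.1.1.19.3"  # target PoE firmware version named in this post


def parse_poe_versions(output: str) -> Dict[int, str]:
    """Map each FPC slot to its reported PoE firmware string ("" if none shown)."""
    versions: Dict[int, str] = {}
    current = None
    for line in output.splitlines():
        fpc = re.match(r"\s*FPC\s+(\d+)\s*$", line)
        if fpc:
            current = int(fpc.group(1))
            versions.setdefault(current, "")  # FPC seen, version not yet known
            continue
        poe = re.match(r"\s*PoE firmware\s*(.*)$", line)
        if poe and current is not None:
            versions[current] = poe.group(1).strip()
    return versions


def slots_needing_upgrade(versions: Dict[int, str], expected: str = EXPECTED) -> List[int]:
    """Return the slots whose PoE firmware is anything other than the expected version."""
    return sorted(slot for slot, version in versions.items() if version != expected)


SAMPLE = """\
FPC 0
PoE firmware Unknown
Boot Firmware
FPC 3
PoE firmware 2.1.1.19.3
Boot Firmware
"""

if __name__ == "__main__":
    print(slots_needing_upgrade(parse_poe_versions(SAMPLE)))
```

An FPC that shows no PoE firmware line at all is kept with an empty version, so it still lands in the "needs upgrade" list instead of silently disappearing.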

One thing to note: while the upgrade is running you can see the progress with show poe controller, but at some point it will hang at 95%, then disappear, then come back, and then the process will be complete. In other words…WAIT, unless you want to try out the solution for failure #2. 😆
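
If you are logging show poe controller output while you wait, a small parser can pull out each controller's state and any progress percentage. The Python sketch below is an assumption-heavy illustration: the row layout is inferred from output readers pasted in the comments (rows stitched together from two different comments), the names are hypothetical, and the Management and Status columns are kept as a single string because they cannot be split reliably from flattened text.

```python
import re
from typing import Dict, Optional, Tuple

# index (optionally followed by "**"), three wattage columns,
# then Management/Status text, then the final Lldp priority column.
ROW = re.compile(r"^\s*(\d+)\**\s+\S+W\s+\S+W\s+\S+W\s+(.*\S)\s+\S+\s*$")
PERCENT = re.compile(r"\((\d+)%\)")


def parse_controllers(output: str) -> Dict[int, Tuple[str, Optional[int]]]:
    """Map controller index -> (state text, progress percent or None)."""
    controllers: Dict[int, Tuple[str, Optional[int]]] = {}
    for line in output.splitlines():
        row = ROW.match(line)
        if not row:
            continue  # header lines and notices do not start with an index
        state = row.group(2)
        percent = PERCENT.search(state)
        controllers[int(row.group(1))] = (
            state,
            int(percent.group(1)) if percent else None,
        )
    return controllers


SAMPLE = """\
Controller Maximum Power Guard Management Status Lldp
index power consumption band Priority
0** 1440W 0.00W 0W SW_DOWNLOAD(95%) Disabled
1 740W 0.00W 0W DEVICE FAIL Disabled
3 740W 5.40W 0W Class AT_MODE Disabled
"""

if __name__ == "__main__":
    for index, (state, percent) in parse_controllers(SAMPLE).items():
        print(index, state, percent)
```

A controller that has temporarily disappeared from the output simply will not be in the returned dict, which is worth treating as "still working" rather than "failed" while an upgrade is running.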

Solution for PoE Firmware Failure #2

In this scenario, you rebooted the stack and something failed. The following is similar to solution #1, but the failed PoE controller basically needs to be upgraded twice. The steps:

  1. Execute the following command to reload the firmware on the FPC:
    request system firmware upgrade poe fpc-slot 1 file /usr/libdata/poe_latest.s19
    Note: You need to change the fpc-slot number accordingly.
  2. The PoE controller will disappear when you run show poe controller, then come back and start upgrading like this:
    PoE firmware upgrading
  3. After the firmware upgrade completes, the firmware will likely be incorrect (it always was for me). Power cycle the affected FPC (re-seat the power cord). Do not perform a soft reboot.
  4. After the FPC joins the VC or the standalone device reboots, execute the following command in operational mode:
    request system firmware upgrade poe fpc-slot 1 file /usr/libdata/poe_latest.s19
    JTAC Note: You need to change the fpc-slot number accordingly. Also, it is recommended that you push the PoE code one by one instead of adding all members in the virtual-chassis setup. (Emphasis mine)
  5. After the above command is executed, the FPC should automatically reboot. If not, reboot from the Command Line Interface.
    Note: Be patient and wait. No, seriously…wait. It takes a while. If you need to reboot, you’re rebooting the whole unit AFAIK: request system reboot
  6. After the FPC is online, check the PoE version with the show chassis firmware detail command. The PoE version should be the latest version (2.1.1.19.3) after the above steps are completed.
  7. If the version is correct, the PoE devices should look like this:
    Successful PoE firmware upgrade
  8. Repeat the above steps to upgrade the PoE versions on other FPCs in the virtual-chassis setup.

Just like solution #1, one thing to note is that when it’s doing its upgrade you can see the progress with show poe controller, but at some point it will hang at 95%, then disappear, then come back, then the process will be complete — in other words…WAIT! You don’t really want to re-apply this whole process, do you?
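
The "WAIT" advice can be written as a polling loop with a generous timeout that tolerates the controller vanishing for a while. This is a minimal Python sketch under stated assumptions: read_version is a hypothetical callable you would supply (for example, something that runs show chassis firmware detail and returns the PoE version for one FPC, or an empty string while the controller is missing), and the timeout and interval are yours to pick.

```python
import time
from typing import Callable


def wait_for_version(
    read_version: Callable[[], str],
    expected: str,
    timeout_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until read_version() returns the expected version or the timeout passes.

    Blank or wrong readings are not treated as failure, because the controller
    can disappear and come back during the upgrade.
    """
    deadline = clock() + timeout_s
    while True:
        if read_version() == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Fake readings standing in for a real switch: missing, missing, then done.
    readings = iter(["", "", "2.1.1.19.3"])
    done = wait_for_version(lambda: next(readings), "2.1.1.19.3", timeout_s=60, interval_s=0)
    print(done)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting. Against a real switch you would leave them at their defaults and set the timeout well beyond the roughly 10 minutes per switch mentioned above.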

Solution for PoE Firmware Failure #3 (Update 20200117)

I recently had some more issues, and solution #2 just wasn’t doing the trick, so I offer solution #3, which I’ve had success with but there’s a caveat/rabbit hole that may come of it. This is the nuke-from-orbit approach on the switch if you want to avoid doing an RMA (or if you have no choice).

The gist of it: disconnect the switch from the VC (if connected), perform an OAM recovery, zeroize and reboot the switch, then perform the firmware upgrade.

From my experience, there are a few different scenarios that you’ll encounter when you need to use this method:

  • During the firmware upgrade, the process just hangs/stalls. You’ll run show poe controller and at some point the download hangs/stalls like this: [Screenshot: terminal shows download hangs at 50%]
  • You receive a DEVICE_FAIL for any reason and nothing is resolving it, like this: [Screenshot: PoE Device Fail on FPC 8. All but FPC 2 are missing.]
  • Your switch is stuck at upgrading the firmware. No matter what you run, the switch displays the following message: Upgrade in progress. In this scenario, the switch just thinks it’s still in the process of upgrading, but no matter how long you wait (or if you can’t wait some indefinite period of time for it to upgrade), the switch won’t upgrade the firmware.

What we need to do at this point is just get the switch to a fresh state so that we can upgrade the PoE controller. Believe it or not, this is actually one of the awesome things about Juniper equipment: when one component of the switch is hosed, the entire switch isn’t hosed and can still function normally. For instance, I have had a switch with a failed PoE controller that still operated like a non-PoE switch without issue; i.e., Juniper allows for components to be recoverable.

Here’s the solution I came up with:

  • Step 1: Zeroize the switch: request system zeroize
    In this step, we’re just starting fresh and clearing out the configuration, which takes about 10 minutes and then reboots. If the switch still thinks there’s an upgrade in progress for the PoE controller, we’re clearing it out. It’s possible that this may fail due to storage issues. If that’s the case go to the next step, otherwise skip to bullet #3.
  • If step 1 fails: Perform an OAM recovery: request system recover oam-volume
    This is an optional step, and I’ve had to do this when zeroize would fail. If step 1 fails, try this first. It takes about 10 minutes as it copies the OAM partition then compresses it for the Junos volume.
    Caveat: EX3400s, even in 18.2 land, still have storage issues sometimes. I have one switch that couldn’t recover from oam-volume, and I’m not sure why. I’ll update this once I have a solution.
  • After the switch reboots, the controller will still come up as failed when you run show poe controller. Go ahead and run the upgrade again:
    request system firmware upgrade poe fpc-slot 1 file /usr/libdata/poe_latest.s19
    It should behave like this after running the command: [Screenshot: PoE upgrade process for Juniper]
  • The switch should behave normally at this point, upgrading normally. If it doesn’t then you’ll likely need to replace the switch (or live without PoE).
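
The order of operations above (zeroize, fall back to OAM recovery if zeroize fails, then upgrade) can be sketched as a short Python routine. Treat this as an illustration of the control flow only: run is a hypothetical callable that executes one CLI command and returns True on success, and retrying zeroize after the OAM recovery is my reading of the gist earlier in this section, not something the steps spell out.

```python
from typing import Callable, List

ZEROIZE = "request system zeroize"
RECOVER_OAM = "request system recover oam-volume"
UPGRADE = "request system firmware upgrade poe fpc-slot {slot} file /usr/libdata/poe_latest.s19"


def recover(slot: int, run: Callable[[str], bool]) -> List[str]:
    """Attempt the nuke-from-orbit recovery and return the commands tried, in order.

    `run` is a hypothetical hook: it should execute the command, wait for any
    resulting reboot to finish, and report whether the command succeeded.
    """
    attempted: List[str] = []

    def attempt(command: str) -> bool:
        attempted.append(command)
        return run(command)

    if not attempt(ZEROIZE):
        # Zeroize failed (for example due to storage issues): recover the
        # OAM volume, then try zeroize once more (assumed retry).
        if not attempt(RECOVER_OAM) or not attempt(ZEROIZE):
            return attempted  # nothing left to try here; likely an RMA
    attempt(UPGRADE.format(slot=slot))
    return attempted


if __name__ == "__main__":
    # Dry run: pretend every command succeeds and just list the order.
    for command in recover(1, lambda command: True):
        print(command)
```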

And reminder, just like solution #1 and #2, one thing to note is that when it’s doing its upgrade you can see the progress with show poe controller, but at some point it will hang at 95%, then disappear, then come back, then the process will be complete — in other words…WAIT! You don’t really want to re-apply this whole process, do you?

Final Thoughts

Here’s the kicker for me: I’ve had this work just fine for stacks and single switches alone, and fail on stacks and single switches alone — I can’t find the common denominator here. Perhaps there’s a hardware build that has this more than others, but I can’t figure it out. The official documentation doesn’t hint on a best practice for this (other than maintenance hours), so I’m uncertain on the best approach.

(Update) Juniper does have an official bug report for this, and it is apparently fixed in 15.1X53-D592, but I had the issue on 18.2R3, so I’m not convinced it is resolved yet.

Here are some ideas I have to change my PoE firmware upgrade procedure (unsure if this will help):

  • Turning off PoE on all interfaces
  • Upgrading one at a time.
  • Trying an earlier version of the JTAC software, then going to the latest recommended. Example: I had no problems with 15.1X53-D59.4 or 15.1X53-D590, but the sample size for determining that is small (only two stacks attempted).
  • Update: I can’t find any rhyme or reason, TBH. I’ve had it fail multiple ways, so not sure the above will help.
  • Update 2: I have had some success with the following (but I don’t feel that confident about it yet):
    • Use the 18.2 branch
    • Upgrade one at a time
    • Waiting for a period of time after a software upgrade and reboot. Don’t get upgrade-happy. Give the hardware some time to get back up and going.
    • Cross your fingers. And legs. On a full moon.
  • Update 2: If you have a controller showing DEVICE FAIL, I’ve had success fixing it just by running:
    request system firmware upgrade poe fpc-slot 1 file /usr/libdata/poe_latest.s19 (change fpc-slot # accordingly)

Time will tell.

Hope this helps! If it doesn’t I’d love to know the different experiences others have. Please share if you’ve had success or failures with any of this!

* I swear I saw a message that a reboot is required, but I can’t confirm this (I didn’t screencap it)

** There is a version 3.4.8.0.26, but that’s on the 18.x software version line, and it requires a whole different set of upgrade procedures. This is outside the scope of this post.

21 thoughts to “Juniper EX3400: How to Recover from PoE Firmware Upgrade Failure”

  1. hey jimmy! Thanks for this article, i was quite happy to see someone else with the same problem and that actually wrote a detailed post about it!

    i’m having the same issue. 3 of 6 vc members are stuck in a DEVICE FAIL state. I’m running 18.2R3-S3.11. When i try to “request system firmware upgrade poe fpc-slot 1 file /usr/libdata/poe_latest.s19” there is just no “file” option i can add after the fpc-slot. it just doesn’t exist. I try with a normal account and root and i can’t add that parameter.

    show chassis firmware detail
    FPC 0
    PoE firmware Unknown
    Boot Firmware
    U-Boot U-Boot 2016.01-rc1 (Sep 01 2016 – 16:00:13 -0700) 1.3.0
    Boot Firmware
    loader FreeBSD/armv6 U-Boot loader 1.2
    FPC 1
    PoE firmware Unknown
    Boot Firmware
    U-Boot U-Boot 2016.01-rc1 (Sep 01 2016 – 16:00:13 -0700) 1.3.0
    Boot Firmware
    loader FreeBSD/armv6 U-Boot loader 1.2
    FPC 2
    PoE firmware Unknown
    Boot Firmware
    U-Boot U-Boot 2016.01-rc1 (Sep 01 2016 – 16:00:13 -0700) 1.3.0
    Boot Firmware
    loader FreeBSD/armv6 U-Boot loader 1.2
    FPC 3
    PoE firmware 2.1.1.19.3
    Boot Firmware
    U-Boot U-Boot 2016.01-rc1 (Sep 01 2016 – 16:00:13 -0700) 1.3.0
    Boot Firmware
    loader FreeBSD/armv6 U-Boot loader 1.2
    FPC 4
    PoE firmware 2.1.1.19.3
    Boot Firmware
    U-Boot U-Boot 2016.01-rc1 (Sep 01 2016 – 16:00:13 -0700) 1.3.0
    Boot Firmware
    loader FreeBSD/armv6 U-Boot loader 1.2
    FPC 5
    PoE firmware 2.1.1.19.3
    Boot Firmware
    U-Boot U-Boot 2016.01-rc1 (Sep 01 2016 – 16:00:13 -0700) 1.3.0
    Boot Firmware
    loader FreeBSD/armv6 U-Boot loader 1.2

    show poe controller
    Controller Maximum Power Guard Management Status Lldp
    index power consumption band Priority
    0 740W 0.00W 0W DEVICE FAIL Disabled
    1 740W 0.00W 0W DEVICE FAIL Disabled
    2 740W 0.00W 0W DEVICE FAIL Disabled
    3 740W 5.40W 0W Class AT_MODE Disabled
    4 740W 8.60W 0W Class AT_MODE Disabled
    5 740W 58.20W 0W Class AT_MODE Disabled

    The missing “file” parameter really puzzled me. I’m wondering where to go from there before RMA.

    Thanks

    1. Have you tried just copy and pasting the command? This is one of those commands that won’t autocomplete and are ‘hidden’, if you will.

      Try copying and pasting or just typing it out.

  2. Awesome writeup! I had to perform all methods, multiple times on a single EX2300 until finally it came up properly 😀

    1. Glad to have helped. This is one of those esoteric issues that can throw you for a loop. Hopefully Juniper fixes this (and storage issues) on newer access/edge switch models.

  3. Hello,
    thanks for the help.
    I have a device stucking at stage “POE_SW_ERASE” for hours.
    Even afer a zeroise it does not like to delete …what ever it deletes.

    any ideas?
    thanks
    Hristos

    1. I would zeroize the device again and then wait for the zeroize to process and reboot, then run:

      > request system firmware upgrade poe fpc-slot file /usr/libdata/poe_latest.s19

      If that doesn’t solve the issue, then I would say you probably need to RMA the switch.

  4. I have a EX-2200-C that is having a similar issue, however it won’t accept the upgrade command, as in it won’t even let me type characters after “file”. Any last ditch ideas I can try? I’ve zeroized it already.

    root@s1-ex2200> …stem firmware upgrade poe fpc-slot 0 file
    ^
    syntax error

    root@s1-ex2200> show poe controller
    Controller Maximum Power Guard Management Status Lldp
    index power consumption band Priority
    0 100.00W 0.00W 0W DEVICE FAIL Disabled

    root@s1-ex2200> show chassis firmware detail
    FPC 0
    Boot SYSPLD 20
    PoE firmware Unknown
    PFE-0 3
    Boot Firmware
    uboot U-Boot 1.1.6 (Apr 4 2013 – 10:33:10) 1.0.0
    loader FreeBSD/arm U-Boot loader 1.1

    {master:0}

    1. root@s1-ex2200> …stem firmware upgrade poe fpc-slot 0
      Unable to initiate firmware upgrade.

      1. root> show log messages
        Sep 9 10:00:02 S1-EX2200 newsyslog[2262]: logfile turned over due to size>128K
        Sep 9 10:00:04 S1-EX2200 chassism[1231]: PoE: Write to PoE controller failed
        Sep 9 10:00:04 S1-EX2200 chassism[1231]: PoE: poe_restore_factory_defaults failed

        Everything I can see points to a dead POE controller, but oddly it’s still providing PoE power, the chassis just can’t see it.

    2. What version of Junos are you running?

      I’ve done this procedure on a EX2200-C, and I know it works, so I’m wondering if there’s a software version issue.

      1. I’m on 12.3R12.4. Think it’s worth rolling back and trying the upgrade from an older version? Any idea on which one you would recommend?

  5. EX3400 v20.2R3.9 and I have the same issue. I’ll try your fixes, thanks for the blog post!

  6. hey Jimmy, awesome write up! I’ve done a few upgrades and have some switches that just hang at 0% and 95%. Do you think doing a hard reboot and following one of the methods above is the best next step or do you have other ideas?

    Thanks!

  7. Great post.

    I now have a switch ( standalone ) that just won’t progress after SW_DOWNLOAD(95%) when I do request system firmware upgrade poe fpc-slot 0 file /usr/libdata/poe_latest.s19.

    I gave it hours. Then I cold booted it. show poe controller showed nothing
    ran the upgrade , stalled at 95%… now leaving it overnight.

    to be continued.
    running 22.2R1.9 on a EX3400-48P

    1. 8 hour later.

      show poe controller
      Controller Maximum Power Guard Management Status Lldp
      index power consumption band Priority
      0** 1440W 0.00W 0W SW_DOWNLOAD(95%) Disabled
      **New PoE software upgrade available.
      Use ‘request system firmware upgrade poe fpc-slot ‘
      This procedure will take around 10 minutes (recommended to be performed during maintenance)

      not sure what to do now.
