Channelizing Ports: The Case of the Missing 1 Gbps Interfaces on EX4650 Switches

A few weeks ago while helping deploy a demo Juniper EX4650 aggregation/distribution layer switch, we ran into a problem where 1 Gbps interfaces would not function correctly; i.e., 1 Gbps interfaces were missing and wouldn’t appear on the EX4650s. The issue went something like this:

  • Plugged in 1 Gbps SFP modules
  • Ran show chassis hardware and verified SFPs were installed
  • Ran show interface ge-0/0/4 to check out the interface but received “error: device ge-0/0/4 not found”

Steve Brule Looking Confused

Huh? Come again?

Truth be told, we were using Cisco 1 Gbps SFPs on the switch, and not having any Juniper SFPs (we never have), we chalked up the issue to this being a newer switch and Juniper not capable of supporting non-Juniper SFPs. Thus we ordered Juniper SFPs, waited for them to ship to us, and then tried again — same result.

Steve Brule saying 'k'.

Fast forward through an evening of troubleshooting and waiting for TAC to get involved, and a Juniper engineer eventually gave us the solution: we needed to channelize our ports.

Channelizing Ports on Juniper EX4650s

First off, I recommend reading the documentation from Juniper on channelizing ports on EX4650s. It’s not required for what’s below, but it’s the place to go if you want more background.

Trivia time! EX4650-48Y-8C switches are the same hardware as QFX5120-48Y-8C, just a different OS package!

EX4650 switches come with 48 SFP+ ports that are capable of up to 25 Gbps, but come configured by default as 10 Gbps ports; they also come with 8 QSFP+ ports that are capable of up to 100 Gbps, but can operate at 40 Gbps, or can be broken up individually into 4 channels of 25 Gbps (100 Gbps to 4x25 Gbps via breakout cables) or 4 channels of 10 Gbps (40 Gbps to 4x10 Gbps via breakout cables). Breaking them up would be done via a cable like this (there are multiple options for breakout cables; this is just one option):

Breakout cable - 1 40 Gbps QSFP+ module to 4 SFP+ 10 Gbps modules

If you’re not familiar with channelizing ports (I wasn’t until this), channelizing is the process of configuring interface ports to operate in different capacities. The most important thing to note about QFX5120 or EX4650 switches is that the QSFP+ uplink ports are the only ports that perform auto-channelization: when you plug in a module, the port automatically switches between 100 Gbps and 40 Gbps (I’m not sure whether it would also do this for a 10 Gbps module). If you wish to use the 25/10 Gbps breakout cables, you’ll need to disable auto-channelization and manually configure the ports to operate as such (that’s outside my scope here, but read the link above for more info).

Why this is so important for my issue is that on EX4650s, the 48 SFP+ ports do not perform auto-channelization! These ports, by default, come configured as 10 Gbps ports, and if you wish to use 1 or 25 Gbps modules, you have to manually configure the switches to perform this. This is exactly why the 1 Gbps modules were not appearing, because the ports were not configured to operate in 1G mode!

To configure the ports for the speed needed, here is the configuration we needed:

Under the chassis > fpc 0 > pic 0 stanzas, we configured speeds for the ports. However, note that the port configurations above are broken up every four ports; this is because for the 48 SFP+ ports, port speeds are configured in groups of four (quads), and each quad can be 1, 10, or 25 Gbps. Here’s a visual of the quads:

EX4650 Port Quads - every group of four ports are colored and labeled by the first port - Port 0, port 4, etc.
Each quad is colorized above (port/quad 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44)

Therefore, in order to configure 1 Gbps interfaces on an EX4650 switch (or a QFX5120 for that matter), you need to manually set the configuration speed in the chassis configuration.
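
To make the quad grouping a bit more concrete, here is a rough Python sketch that works out which quad a port belongs to and builds one speed statement per quad. The helper names (quad_first_port, quad_members, speed_commands) are made up for this illustration, and the "set chassis fpc 0 pic 0 port N speed X" statement shape is an assumption based on the chassis > fpc 0 > pic 0 hierarchy described above, so check it against Juniper's documentation before pasting anything into a switch.

```python
# Sketch only: the "set chassis ..." statement shape is an assumption
# based on the chassis > fpc 0 > pic 0 hierarchy described in the post.
# Verify it against Juniper's documentation before using it.

PORTS_PER_QUAD = 4
SFP_PORT_COUNT = 48
VALID_SPEEDS = ("1G", "10G", "25G")


def quad_first_port(port):
    """Return the first port of the quad that contains `port`."""
    if not 0 <= port < SFP_PORT_COUNT:
        raise ValueError(f"port {port} is not one of the {SFP_PORT_COUNT} SFP+ ports")
    return (port // PORTS_PER_QUAD) * PORTS_PER_QUAD


def quad_members(port):
    """Return all four ports that share a speed setting with `port`."""
    first = quad_first_port(port)
    return list(range(first, first + PORTS_PER_QUAD))


def speed_commands(ports, speed):
    """Build one (assumed-syntax) set command per quad touched by `ports`."""
    if speed not in VALID_SPEEDS:
        raise ValueError(f"speed must be one of {VALID_SPEEDS}")
    quads = sorted({quad_first_port(p) for p in ports})
    return [f"set chassis fpc 0 pic 0 port {q} speed {speed}" for q in quads]


if __name__ == "__main__":
    # Ports 4 and 5 share a quad; port 9 lives in the quad starting at 8.
    for line in speed_commands([4, 5, 9], "1G"):
        print(line)
```

Asking for ports 4, 5, and 9 yields two statements (for quads 4 and 8), which is a handy reminder that changing one port's speed also changes its three neighbors.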

Problem solved. And Bob’s your uncle.

Some additional items that I’ve discovered in this process:

  • Because the ports are configured in groups of 4, all four ports in a quad will be the same speed. You cannot configure port 1, for example, at a different speed than port 0. This makes me suspect that on the backplane side of things, each quad is really a 100 Gbps port of some kind, like the uplink ports on the right, but is channelized in some way on the backend. Maybe. Not sure. This post makes me question that logic.
  • Like other EX series switches, non-Juniper copper 1 Gbps modules do not work. The copper modules must be Juniper (or Juniper-coded) SFPs.
  • Non-Juniper 1 Gbps optic modules do work correctly.
  • The eight uplink ports on the right are configured individually, not as quads.
  • Unlike older Juniper equipment, a system reboot is not required for changing port speeds.

Cisco Engage Boise 2020: Get Your Programmability On

Note: I wrote this the evening of the event. Just getting around to publishing it now.

Another day has gone and another Cisco Engage Boise is behind us. This year at Cisco Engage Boise 2020, there was a very clear theme that I took away: network engineers need to be adding programmability to their skill sets.

Cisco Engage

Note that I used the term programmability and not automation. I think Cisco did a fantastic job making this distinction because the term automation gets thrown around a lot in engineering circles, often mistakenly conflated with scripting, or perhaps even with programmability.

So what is programmability and how is it different from automation? In my mind and in the context of network engineering, programmability is an approach to network engineering that utilizes APIs and other tools within a software development framework to accomplish network engineering tasks. Automation, on the other hand, utilizes programmability, but uses it to repeat tasks at regular intervals or based on events, both without the need for human interaction.

Cisco made it abundantly clear that the future of networking involves incorporating programmability into your skill sets, and they’re emphasizing this so much that they’re incorporating programmability into the new certifications (click here or here for a great explanation). The engineering tracks will have 80% engineering and 20% programmability, whereas the new developer-focused certifications will have 80% development, 20% engineering.

Cisco certifications, showing an engineering track and developer track of certifications.

Cisco is also pushing this paradigm shift by offering a ton of resources on its learning platform DevNet, which was also a topic that received a lot of airplay today.

Personally, I’ve long held the belief that programmability is the future for network and system engineers alike, but to be completely honest, I’ve often been guilty of using the term automation to describe something that I didn’t have a great word for. It’s great to see Cisco finally start pushing this, but I must say other vendors have been pushing programmability and education on this topic for a while now; why this is more relevant now is because the big player in this industry, Cisco, is pushing it now, so hopefully we’ll get better programmability offerings such as APIs, structured data models, etc.

Sessions

Of course, it goes without saying that Cisco Engage Boise 2020 also had sessions, but to be honest, I don’t think there’s much to note that hasn’t already been mentioned on Reddit or discussed by Packet Pushers. There were a few different sessions to select from, but no tracks, so I went to sessions I thought were interesting for my interests right now (largely operational logistics and preventative analytics).

For conferences and sessions, I must say I have a bit of fatigue for security topics right now, especially on the networking front, so I avoided those. I don’t think anyone is really presenting anything new with security, and most of the fashionable topics are largely relevant to the higher tier of the NIST or CIS security frameworks. Most issues are addressed in the lower tiers, and it’s in those tiers where I’m interested.

That said, for the sessions I attended, I have just a few notes worth writing about:

  • Zero Trust –  Cisco’s philosophy on zero trust is like others: assume the traffic is malicious until proven otherwise. Cisco basically believes there are three components that need to be addressed in order to move towards a zero trust framework, so Cisco has an off-the-shelf framework for you to purchase and implement. The framework:
    • Users need to be interrogated and challenged for access. Cisco product: Duo.
    • Workloads need to be monitored and inspected for malicious activity. Cisco product: Tetration.
    • Workplaces need to be secured and addressed. Cisco products: DNA, SD-Access and/or ISE.
  • Hyperflex – This is Cisco’s hyperconverged infrastructure, and of note Cisco is going to be deploying Cisco Hyperflex Application Platform (HXAP), which is a hypervisor to compete with the likes of VMware and will include its own Kubernetes implementation. Initially HXAP will be focused on containers, but eventually will grow to allow VMs and so forth. They noted that you’ll be able to avoid the ‘VMware tax’, but I’m certain that will just be replaced with the ‘Cisco tax’.
  • DNA – DNA was definitely featured in a few of the sessions I attended, but I don’t care to go into the details of it. DNA basically is a platform to monitor and manage anything and everything about your Cisco network gear (well, only stuff that does 16.9, and basically on the hardware side is limited to 3850s and 9000 series gear). What really stood out to me about watching DNA in action for the first time was why Juniper made its Mist acquisition, because Juniper doesn’t really have a complete product to compete with DNA (yet, but they are making great strides). However, having features and having capabilities is one thing, but execution and proof of execution is a whole other thing, not to mention the price tag for the full vertical integration of everything to integrate with DNA.

Final Thoughts and An Evolving Mindset

When I signed up for Cisco Engage, I fully expected product evangelism — and I was not disappointed. I have a bit of an allergy to sales for some reason, but I’ve been challenging myself about this topic recently, and as a result, I found today’s conference quite useful.

That said, what I’m finally coming around to is the notion that I am a network engineer first (although I use engineer quite loosely here), and a user of products second. At a fundamentally high level, I am trying to solve networking problems for the organization I’m a part of, helping us accomplish our goals the best I think we can. It mostly doesn’t matter what products we deploy and use, as long as we can accomplish what the business needs us to.

I realize that’s not too much of a profound thought, but at times I feel like there’s a religious battle that continually takes place between vendors x, y, and z, aa, bb, etc. The reality is that, to paraphrase a podcast I was recently listening to, the technology landscape is much more diverse than just one or two vendors; there is plenty to learn from out there, and I think limiting yourself to one or even two vendors might force you into a paradigm/framework bubble that may not even be the best solution for your problems, maybe even stifling your ability to think more critically and out-of-the-box.

As a result, I’m trying to stay open to more ideas in networking, trying to not see every problem as a nail.

Connecting to Multiple Devices with Netmiko Using Python Threads and Queues

tl;dr – Click here to go straight to the Python example.

The journey to automation and scripting is fraught with mental obstacles, and one concept I continued to not really comprehend in Python was the concept of threading, multiprocessing, and queuing.

Python Logo

Up until recently, I felt like I basically had my dunce cap on (relatively speaking, of course) and was restricted to sequential loops and connections — in other words, I was stuck in “for i in x” loop land and could only connect to one device at a time. In order to speed up my scripts and connect to multiple devices at once (using Netmiko, for example), the path to that is through queues, and threading/multiprocessing.

Ultimately I landed on threading instead of multiprocessing because when you’re connecting to devices/APIs over the network, you’re typically waiting for a remote host to process the request, and thus your CPU is sitting there ‘idle’ waiting. To quote a great blog post that breaks down threading versus multiprocessing:

“[t]hreading is game-changing because many scripts related to network/data I/O spend the majority of their time waiting for data from a remote source. Because downloads might not be linked (i.e., scraping separate websites), the processor can download from different data sources in parallel and combine the result at the end.” (source)
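
To put some rough numbers on that idea, here is a small sketch that fakes the "waiting on a remote host" part with time.sleep and compares a plain for loop against a thread pool. The function names (fake_device_call, run_sequential, run_threaded) and the host names are invented for the example, and the sleep is only a stand-in for real network latency, so treat the timings as illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_device_call(host, delay=0.1):
    """Stand-in for a network round trip: sleeps instead of connecting."""
    time.sleep(delay)
    return f"{host}: done"


def run_sequential(hosts):
    """Call each host one after another and time the whole run."""
    start = time.perf_counter()
    results = [fake_device_call(host) for host in hosts]
    return results, time.perf_counter() - start


def run_threaded(hosts, workers=5):
    """Call the hosts from a pool of threads and time the whole run."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the results in the same order as the input hosts
        results = list(pool.map(fake_device_call, hosts))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    hosts = [f"switch{n}" for n in range(5)]
    _, seq_time = run_sequential(hosts)
    _, thr_time = run_threaded(hosts)
    print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

Because each call spends its time sleeping rather than computing, the threads overlap their waits and the threaded run should finish in roughly the time of a single call.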

While the above link has some great examples, for some reason I still didn’t quite grasp the concept of threads and queues, even after trying the example of other approaches. Why? Well, sometimes we need different perspectives to a problem because we all learn differently, thus my hope here is to provide a different perspective to threading and connecting to multiple devices with Python.

Netmiko Using Threaded Queue

I don’t want to waste too much time, so let’s just cut to the chase and get to the script:

I’m going to try a different approach here, so here’s an overly verbose perspective on how the script runs. It’s a step-by-step breakdown of how it processes. That said, as much as I tried to describe the process in a linear manner, it’s not going to be perfect.

  1. Load and stage the modules and terminal messages relating to hitting ctrl+c. (lines 6-20)
  2. Load the global variables (lines 23-38)
    1. Prompt the user to securely enter a password (var: password) (line 23)
    2. Read a list of IP addresses from text file (vars: ip_addrs_file and ip_addrs) (lines 26-27)
    3. Set up the number of threads to spin up (var: num_threads) (line 30)
    4. Set up the global queue that we’ll use to set up a list of ip addresses to be processed later (var: enclosure_queue) (line 32)
    5. Set up an object that we’ll use to lock up the screen to print the contents of a thread so as to avoid having threads print over each other later (var: print_lock) (line 34)
    6. Set up the command you’ll want to run. This is a simple one command script for the purpose of the demo. (var: command) (line 38)
  3. The two functions deviceconnector() and main() get loaded and staged. (lines 41-107)
  4. The main() function is called and begins the execution of the main components of the script (line 112)
    1. Loop through a number list (num_threads), and for each number in that list (i): (line 92)
      1. Load a thread that runs an instance of deviceconnector() sending to the function the thread number (i) and the global queue (enclosure_queue) (line 95)
        1. deviceconnector() accepts the variables i as i, and enclosure_queue as q (line 41)
        2. deviceconnector() starts an unending while loop that: (line 45)
          1. Attempts to acquire an IP address from the global queue (line 50)
            1. If there is no IP address in the queue, the while loop will be blocked and wait until there is an ip address in the queue
          2. Sets up dictionary for Netmiko (lines 54-59)
          3. Netmiko attempts to connect to the device (lines 52-53)
            1. If there is a time out, lock the process and print a time-out message, marking the queue item as processed and restarting the while loop (lines 55-58)
            2. If there is an authentication error, print an authentication error and exit the whole script (lines 70-73)
          4. Send command to the device, lock the process and print the output (lines 76-80)
          5. Disconnect from the device, mark the queue item as complete, and loop back (lines 83-86)
      2. Set the thread to run as a daemon/in the background (line 97)
      3. Start the thread (line 99)
    2. Loop through a list of IP addresses (ip_addrs), and for each IP address (ip_addr) (lines 102-103)
      1. Add the IP address (ip_addr) to the global queue  as an individual queue item to be processed (line 103)
    3. Wait for all queue items to be processed before exiting the queue and script (line 106)
    4. Print a statement to the console indicating the script is complete (line 107)
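
For reference, the flow described in the steps above can be sketched in a stripped-down form like this. It is not the original script, and the line numbers in the breakdown do not map onto it. The names deviceconnector, main, num_threads, enclosure_queue, and print_lock follow the breakdown, while run_command is a hypothetical placeholder for the Netmiko connect, send, and disconnect steps so that the sketch runs without any devices. The password prompt, the IP address file, the ctrl+c handling, and the time-out and authentication error handling are all left out.

```python
import threading
from queue import Queue

num_threads = 4                 # number of worker threads to spin up
print_lock = threading.Lock()   # keeps threads from printing over each other


def run_command(ip_addr, command):
    """Hypothetical stand-in for the Netmiko connect/send/disconnect steps."""
    return f"{ip_addr}: output of '{command}'"


def deviceconnector(i, q, command, results):
    """Worker loop: pull IP addresses off the queue until the script exits."""
    while True:
        ip_addr = q.get()       # blocks until an IP address is in the queue
        try:
            output = run_command(ip_addr, command)
            with print_lock:
                print(f"Thread {i}: {output}")
                results[ip_addr] = output
        finally:
            q.task_done()       # mark the queue item as processed


def main(ip_addrs, command="show version"):
    enclosure_queue = Queue()
    results = {}
    for i in range(num_threads):
        # Daemon threads run in the background and do not block script exit
        thread = threading.Thread(
            target=deviceconnector,
            args=(i, enclosure_queue, command, results),
            daemon=True,
        )
        thread.start()
    for ip_addr in ip_addrs:
        enclosure_queue.put(ip_addr)
    enclosure_queue.join()      # wait for every queue item to be processed
    print("*** Script complete")
    return results


if __name__ == "__main__":
    main(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
```

The workers are started before anything is queued, so they simply block on q.get() until main() starts adding addresses, and q.join() is what keeps the script alive until every address has been marked done.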

Use this as you wish, and hope it’s helpful.

Here’s the GitHub version.

Credit and Additional Info

This was inspired by a few different blog posts, so here’s some additional info to follow:

  • Multiprocessing Vs. Threading In Python: What You Need To Know.
    A great breakdown of threading versus multiprocessing, and influential for some of the work I’m doing.
  • How to Make Python Wait
    This one actually reignited my interest in figuring out how to use threading. It’s a good explanation of the different approaches to make a script wait in Python.
  • Queue – A thread-safe FIFO implementation
    Although written in Python 2, this post helped me put everything together so I could understand what the heck is going on. Some of the code I used here, but refactored for Python 3. Below is a crude diagram I did to help me figure out what was going on with this post, and the circles with arrows indicate loops, with the ‘f’ in the middle meaning ‘for’ loops and ‘w’ meaning ‘while’ loops.

Diagram that attempts to show how the thread and queue process works. Too complicated to explain in an alt tag, so look at code.

Quickie: Setting Up SMTP Relay With G Suite Domain Without Pulling Your Hair Out and Questioning Your Google-Fu Skills

For the love of Pete!

I just spent an hour or two in a furious state trying to set up SMTP relay with a G Suite domain, and it was stupidly frustrating because there was one little component stopping everything. Oh the rage!

Kid in Incredible Hulk outfit raging...kind of.

My intent was simple: SMTP relay using a domain account for authentication and TLS. Should be easy.

When you look at the instructions for setting up SMTP relay, they do appear on the surface pretty straightforward:

  • Go to G Suite > Apps > Gmail > Advanced Settings
  • Make sure you’re at the top level of the OU structure (you should be)
  • Add “SMTP relay service” (it gives you a few options, in my case I want to use an account to authenticate, see below)

SMTP Relay service options in G Suite

Then you configure your SMTP relay settings on your application to point to smtp-relay.gmail.com on port 587, input your SMTP authentication and then all done, right?

WRONG! You’re going to get continual authentication errors (Google’s SMTP error code “535 5.7.8”) and Google’s SMTP service will tell you to pound sand and send you to unhelpful help articles about having bad credentials.

You have to perform one more step not mentioned in the Google documentation (here’s where I found this fix): enable “Less secure app access” in your Google service SMTP account settings. Easiest way to get to it is go to Gmail > click on the service account profile > click ‘Manage your Google Account’ (tangent: why is ‘your’ not capitalized?) > then just search on the top for ‘Less secure app access’ and toggle the button to on. It looks like this:

Google account settings with Less Secure App toggle button

That’s it! After that, SMTP relay will start working correctly.
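
If you want a quick way to check the relay from a script rather than from your application, something like the following Python sketch should do it. It uses the standard library's smtplib with STARTTLS on port 587 against smtp-relay.gmail.com, as described above. The function names (build_message, send_via_relay) and the example addresses are placeholders made up for this illustration, and you would supply your own account credentials.

```python
import smtplib
from email.message import EmailMessage

RELAY_HOST = "smtp-relay.gmail.com"
RELAY_PORT = 587


def build_message(sender, recipient, subject, body):
    """Build a simple plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_relay(msg, username, password):
    """Send msg through the relay using STARTTLS and SMTP authentication."""
    try:
        with smtplib.SMTP(RELAY_HOST, RELAY_PORT, timeout=30) as smtp:
            smtp.starttls()                 # upgrade the connection to TLS
            smtp.login(username, password)  # authenticate with the domain account
            smtp.send_message(msg)
    except smtplib.SMTPAuthenticationError as err:
        # This is where the "535 5.7.8" response described above would surface
        raise RuntimeError(f"relay rejected the login (code {err.smtp_code})") from err


if __name__ == "__main__":
    # Placeholder addresses and credentials: replace with your own
    message = build_message(
        "relay-account@example.com",
        "someone@example.com",
        "SMTP relay test",
        "If you can read this, the relay works.",
    )
    send_via_relay(message, "relay-account@example.com", "your-password-here")
```

If the login is still being rejected, the RuntimeError carries the SMTP code, which makes it easier to tell an authentication problem apart from a network or TLS one.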

Maybe this will help prevent some Hulk transformations.

(Update 20191220.1238 – Reworked some parts because I was in a rush last night).