ACI and L4-7 Integration

I’ll start this post by stating that this is entirely an opinion piece. It’s certainly based on some real-life experience, but it doesn’t necessarily reflect the opinion/position of anyone but myself.

Since its launch, one of the key features that has been touted (especially by marketing!) has been ACI’s integration with the greater layer 4-7 services ecosystem. That’s basically a fluffy way to say that ACI is intended to integrate with L4-7 devices such as firewalls, IPS, and load balancers/ADCs. That’s all well and good; of course any data center switch/platform would need to integrate with an A10/F5/etc. ACI, however, takes that to a new level by having native integration directly into the APIC platform. What does this actually mean, and do we care? Let’s first talk about the three ways that we can integrate these third-party devices into the fabric.

1- “Traditional”

Perhaps this should be called the “legacy” method, but we’ll stick with “traditional” as it sounds classy. Simply put, we treat the fabric like we would treat any 5k/7k/Arista/Juniper/blah, and we run a trunk (or a vPC) up to the device — on top of this trunk/link we put a bunch of VLANs and/or we run a routing protocol. For ADCs this generally means one big vPC and a bunch of VLANs — the ADC then has an IP (or IPs) in whatever subnets are associated with those VLANs (EPGs). For firewalls this is generally a Layer 3 out (a routing construct in ACI) — we just route to/from the firewall, using a different VRF on each zone/nameif of the firewall to keep traffic isolated, and then reach the adjacent zones via routes learned from (or statically pointed at) the firewall. Pretty straightforward stuff, much like we do in traditional networks, just with fancy ACI words!
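
For what it’s worth, even in the “traditional” model the fabric side of things is still just ACI objects under the covers: trunking a VLAN to the port (or vPC) that the ADC/firewall hangs off of is a static path binding on an EPG. Here’s a rough sketch of what that looks like via the APIC REST API; the APIC address, tenant/app/EPG names, port, and credentials are all hypothetical, for illustration only:

import requests

APIC = "https://apic.example.com"   # hypothetical APIC address
session = requests.Session()

# Authenticate -- aaaLogin hands back a session cookie that the session reuses
session.post(APIC + "/api/aaaLogin.json",
             json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}},
             verify=False)   # lab sketch: skipping cert verification

# Static path binding: hook the (hypothetical) Web EPG to leaf 101, eth1/10,
# tagged with VLAN 999 -- i.e., "put a VLAN on the trunk" in ACI terms
binding = {"fvRsPathAtt": {"attributes": {
    "tDn": "topology/pod-1/paths-101/pathep-[eth1/10]",
    "encap": "vlan-999",
    "mode": "regular"}}}

resp = session.post(APIC + "/api/mo/uni/tn-Example/ap-WebApp/epg-Web.json",
                    json=binding, verify=False)
print(resp.status_code)

Point being: nothing exotic is required here, and the L4-7 device itself just sees a trunk/vPC like it always has.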

2- Device Packages – Managed Service Graphs

This is/was the grand idea around L4-7 services in ACI. Some manufacturer (Cisco, F5, A10, Palo Alto, etc.) creates what is called a device package. This device package is really just a script that tells the APIC what the device is capable of doing, and how the APIC can tell the device what needs to happen. The device is then used in a service graph. The service graph is basically an outline of your service chain, or how you want traffic to flow through the fabric and the third-party devices. Let’s say you have a three-tier web app — you may wish to have a load balancer sit in front of the web tier, and a firewall sit between the web tier and the app tier. You would basically tell the fabric where you want the services inserted and define some parameters about the service graph (I’m going to oversimplify this quite a bit since this post is not about the technical how of device packages/service graphs). You could pick, let’s say, a “one-armed” setting on the load balancer and define the VIP pool and real server IPs, and then on the firewall you would push some security policy outlining the particular flow. The APIC then makes API calls out to the third-party devices to configure all of what you just modeled (because it knows how to interact with these devices via the device package). More on this in a bit…
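
As a side note, once a device package has been imported and a logical device defined, they show up in the same object model and REST API as everything else in ACI. Here’s a quick sketch of poking at what the APIC knows about; the class names are from memory (if memory serves, vnsMDev represents an imported device package and vnsLDevVip a logical device cluster), so double-check against your APIC version, and the address/credentials are hypothetical:

import requests

APIC = "https://apic.example.com"   # hypothetical APIC address
session = requests.Session()
session.post(APIC + "/api/aaaLogin.json",
             json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}},
             verify=False)

# Device packages that have been imported into the APIC
pkgs = session.get(APIC + "/api/node/class/vnsMDev.json", verify=False).json()
for obj in pkgs["imdata"]:
    print("device package:", obj["vnsMDev"]["attributes"]["dn"])

# Logical devices (L4-7 device clusters) defined for use in service graphs
devs = session.get(APIC + "/api/node/class/vnsLDevVip.json", verify=False).json()
for obj in devs["imdata"]:
    print("logical device:", obj["vnsLDevVip"]["attributes"]["name"])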

3- Unmanaged Service Graphs

This is the latest, and perhaps greatest(?), flavor of L4-7 integration with ACI. As of the 1.2 code train we have the option for an “unmanaged service graph.” This is basically the same as the managed service graph, except the APIC will NOT configure any of the third-party devices. You still define them and model the flow in the controller; however, you must manually configure the load balancer or firewall or whatever. I think the spirit here is that we are still modeling the application, the flow, and the dependencies in ACI, but shedding some of the complexities involved with having the APIC configure third-party devices.

Okay, so that’s the super quick and dirty rundown of the ways we can integrate these load balancers and firewalls and the like into the ACI fabric. Now, let’s talk a bit about the pros and the cons of each.

1- “Traditional”

  • Pros
    • Not much changes, this is the same way we’ve been doing things for years.
    • It’s relatively simple to configure.
    • Easier for teams who are siloed (which is bad!).
    • Better from a CMP perspective (more on this later).
  • Cons
    • Not much changes, this is the same way we’ve been doing things for years.
    • Not very dynamic/programmatic.

2- Device Packages – Managed Service Graphs

  • Pros
    • Automated provisioning of stuff — do things in the APIC then don’t worry about the firewall/ADC/etc.
    • Visual documentation of what’s going on — the APIC is the central source of truth for the application and it’s basically self-documenting.
    • Easier to do complex things — like inserting a transparent device into the traffic flow. This could be done the Traditional way but would take a lot of steps; the service graph is intended to minimize the effort there.
  • Cons
    • Vendor support:
      • It’s up to the third-party vendor to build and maintain the device package, so what happens with code upgrades and supportability? The jury is still out on this, but it worries me.
      • Limited exposure to the APIC — not all features are exposed in a device package. What happens if you want to play with widget X, but it’s not in the device package? You’re back to manual configuration, no good.
    • Hard for siloed teams since they’ve got to give up control of their firewall/ADC to the network team (or join the network team I suppose).

3- Unmanaged Service Graphs

  • Pros
    • Visual documentation of what’s going on — the APIC is the central source of truth for the application and it’s basically self-documenting.
    • Less complicated/less dependencies than managed service graphs.
    • Easier for teams who are siloed (which is bad!).
    • Better from a CMP perspective (more on this later).
  • Cons
    • Not very dynamic/programmatic.

Okay, so that’s my rundown. I think it’s probably pretty clear where I stand on this at this point, but just in case it’s not, let me explain.

Historically I’ve been a strong advocate for option 1 above. Device packages are dependent upon the third-party vendor — meaning Cisco does not write or maintain these device packages. That’s given me (personal opinion) some pause — what happens when I upgrade my L4-7 device, or upgrade the fabric? Will there be any interoperability issues? (This is relatively unlikely, but I think about the possibility.) Depending on the third-party vendor, the device package may or may not be very robust; F5, for example, has done, from what I’ve seen, the best job at making a fully featured device package, but this is not necessarily representative of others. So the potential for lost functionality, and ultimately having to manually administer the L4-7 device to complete whatever task is at hand, is a real pitfall. Finally, being perfectly honest, device package deployment and configuration is not the most straightforward thing — at the very least, connecting a device to ACI and manually configuring an EPG is simpler.

Now that we have the unmanaged mode, that’s quickly becoming the preferred method. With this option we get some visibility into where on the fabric L4-7 devices sit, and how and where they interact with EPGs. With unmanaged mode you still configure the service chain in much the same way as managed mode; however, gone are the complexities/pitfalls of having the fabric configure the third-party device.

In my opinion the bigger question here isn’t really whether or not you can manage L4-7 devices via ACI — but whether you should. Most of the time, ACI is just one bit of the overall automation/orchestration push. Generally speaking, at some point a cloud management platform (CMP) of some sort will come into play — this could be UCS-D, CliQr, or any other similar type of platform. Once this platform is in the environment it is generally used as the central source of automation/orchestration, oftentimes in conjunction with some type of service catalog (ServiceNow is common). The CMP will make API calls to configure the various components required to execute a job or to build some environment/platform as permitted by the service catalog. Likely you’ll have API calls to the compute layer, the storage layer, and of course to ACI for the network and security (contracts) components. At this point the L4-7 bits come back into play — generally there will be something that has to be done on the F5 (for example) when deploying a new application via the CMP. At this point in our example the CMP has made API calls to three different platforms (compute, storage, network/security), and could easily be configured to make a fourth to the F5. If we are managing the F5 via ACI, then we must make an API call to the APIC from the CMP, which in turn kicks off another API call from the APIC to the F5. Not only is this an unnecessary layer of abstraction, it’s also potentially difficult to handle and creates more work. Ideally the CMP could simply make an API call (perhaps it’s natively supported) to the L4-7 device and cut out the middleman of ACI.
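
To make that concrete, here’s roughly what “cut out the middleman” looks like: the CMP (or any orchestration script) talking straight to the F5 via iControl REST rather than proxying through the APIC. This is just a sketch; the hostname, credentials, pool/VIP names, and addresses are made up, it assumes the member servers already exist or can be auto-created, and the exact member body may vary slightly by TMOS version:

import requests

BIGIP = "https://bigip.example.com"          # hypothetical BIG-IP mgmt address
auth = ("admin", "password")                 # iControl REST supports basic auth

# Create a pool for the web tier (members are illustrative)
pool = {"name": "web_pool",
        "members": [{"name": "10.1.2.10:80"}, {"name": "10.1.2.11:80"}]}
requests.post(BIGIP + "/mgmt/tm/ltm/pool", json=pool, auth=auth, verify=False)

# Create the one-armed VIP in front of it
vip = {"name": "web_vip",
       "destination": "10.1.1.100:80",
       "ipProtocol": "tcp",
       "pool": "web_pool",
       "sourceAddressTranslation": {"type": "automap"}}
requests.post(BIGIP + "/mgmt/tm/ltm/virtual", json=vip, auth=auth, verify=False)

The same job done via a managed service graph would mean the CMP calling the APIC, and the APIC (via the device package) calling the F5: one more hop, and one more thing to troubleshoot.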

Don’t get me wrong, ACI can absolutely handle the device package/managed mode, and it could of course be used in conjunction with a CMP. I just personally think that it’s more work and complexity than it needs to be. So, in short, my $0.02 is to go with the “Traditional” or Unmanaged Service Graph modes.

ACI Troubleshooting – The case of the Blackhole Leaf!

The other day I was working with some folks on their ACI fabric and we ran into some (what I thought at the time) weird stuff. Turns out the troubleshooting revealed the fabric was acting exactly the way it was supposed to (which may not be directly obvious if you’re not familiar with ACI), and the whole thing was a good reminder not to make assumptions about stuff 🙂 Anyway, I figured I’d try to explain the perceived problem, a bit of the troubleshooting steps, and the ultimate resolution so hopefully this doesn’t stump anyone else!

Let’s start with a quick overview of the topology we’ll be discussing:

[Topology diagram: blackholeleaf]

Nothing earth-shattering here — a basic ACI fabric (not totally drawn; only the two relevant leaf nodes are pictured to keep the clutter away), an L2 connection to another switch (over a “cloud” in this case), and a pair of out-of-band switches. We ended up needing to temporarily connect the out-of-band switches to the fabric to aid in some file transfers and the like (obviously not so out-of-band at that point, but it is just temporary!).

The setup here is pretty straightforward — both of the out-of-band switches have simple /30 links to the fabric — “OOB1” connects to Leaf_1, and “OOB2” connects to Leaf_2. OSPF peering is up and exchanging routes between the fabric and the out-of-band switches.

The other non-ACI switch (top of the diagram) is trunking some VLANs into ACI, including VLAN 999. VLAN 999 has IP interfaces on both the ACI fabric and the top switch. We’re basically using VLAN 999 to route over this otherwise simple L2 trunk. The fabric has 10.9.9.4/24 (remember, the whole fabric runs anycast gateways, so every leaf has this IP), and the top switch has 10.9.9.1, 10.9.9.2, and 10.9.9.3 (there is a second switch, not pictured, in an HSRP pair, hence the three IPs). The top switch has a static route to 10.10.10.0/24, which is our out-of-band subnet living on the OOB switches; the static route points to the 10.9.9.4 address living on VLAN 999 in the Fabric. The last important bit about the top switch is that it also has VLAN 1000 with 10.0.0.1/24 (more on this in a bit). The Fabric has a default route pointing back to the top switch on VLAN 999. Finally, the OOB switches are learning Fabric-local routes via OSPF, but have simple static routes for all RFC1918 subnets pointing at the Fabric to cover everything else (their default route needs to go another way, since that other OSPF peer has a better cost, so the default route on the OOB switches is still learned via OSPF, just from a different neighbor).

Phew… still with me? Lots of setup here for this post I guess!

So if you’re tracking this far: basically we have OSPF from the OOB switches to ACI, but we’re really just relying on the static RFC1918 routes in this direction. Then ACI has a default route via OSPF to VLAN 999 on the other switch. In the reverse direction, the “top switch” has a static route to the OOB subnet via VLAN 999 in ACI, and ACI knows about the OOB subnet via OSPF from the OOB switches.

Okay, so on to the issue: top switch (I really should have named that something… too late now!) can ping 10.10.10.1 and 10.10.10.2 sourced from VLAN 999, and it can ping 10.10.10.1 and 10.10.10.2 sourced from VLAN 1000 as well. So far so good, right? Here’s where it gets problematic — again from VLAN 999, top switch can NOT ping 10.10.10.3. Same story from VLAN 1000. So what gives?

Let’s start looking at basic routing to confirm that what I’ve outlined here isn’t a lie 🙂

OOB1 has the following routes in the FIB (note that 10.255.0.0/30 is the routed link between OOB1 and Leaf_1):

OOB1# show ip route
10.0.0.0/8, ubest/mbest: 1/0
 *via 10.255.0.2, [1/0], 2d07h, static
172.16.0.0/12, ubest/mbest: 1/0
 *via 10.255.0.2, [1/0], 2d07h, static
192.168.0.0/16, ubest/mbest: 1/0
 *via 10.255.0.2, [1/0], 2d07h, static

OOB2 has the same statics in the FIB, just pointing to 10.255.0.6 (Leaf_2). Okay, that’s great. What about ACI — and maybe a better question if you haven’t played with ACI — how the hell do I see what routes ACI has?

For now the easiest way to do this kind of troubleshooting in ACI is still the CLI (I know, I know, I know — don’t be the elevator operator, but that’s just where we are at for now). So SSH into your APIC (or you can go directly to your Leaf nodes if you have in-band or out-of-band management set up for that). Then you can connect to the Leaf node of your choosing with the following command:

attach <<LEAF_NAME>>

Of course <<LEAF_NAME>> is whatever you named the Leaf you want to get into. In our case, we’ll start at Leaf_1. When you get into the Leaf, you are initially dropped into iBash. This is a Linux shell that supports some NX-OS-like commands (stuff like “show vrf”). From here we want to get into vShell, which is still a Linux thing but looks like normal NX-OS (for most show commands at least). You can do that by simply running the command “vsh”.

Once in vShell you have a very NX-OS-like place to poke about. For now we want to simply look at the routing table for the appropriate VRF — remember that everything in ACI is in a VRF under the covers. In our case we just want to peek at the routing table for VRF “Tenant.”

Leaf_1# show ip route vrf Tenant
IP Route Table for VRF "Tenant"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%' in via output denotes VRF 
0.0.0.0/0, ubest/mbest: 1/0
 *via 10.9.9.1, vlan999, [110/1], 1d06h, ospf-default, type-2, tag 1
10.10.10.0/24, ubest/mbest: 2/0
 *via 10.255.0.1, vlan2, [110/15], 1d06h, ospf-default, intra

Some things to note about all this… the VLAN IDs in real life will be whatever ACI decides to assign, since this is all internal to the Fabric; I just changed them to make things a bit more readable. Anyway, you can see this Leaf has the default route as we expected and the more specific OSPF route for the OOB subnet via the OOB switches. The OOB switches are single-attached, so this Leaf only has the route from OOB1, but if that link failed the route from OOB2 would be populated in the FIB. So basically, so far so good, right? Let’s look at Leaf_2 just for kicks:

Leaf_2# show ip route vrf Tenant
IP Route Table for VRF "Tenant"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%' in via output denotes VRF 
0.0.0.0/0, ubest/mbest: 1/0
 *via 10.9.9.1, vlan999, [110/1], 1d06h, ospf-default, type-2, tag 1
10.10.10.0/24, ubest/mbest: 2/0
 *via 10.255.0.5, vlan2, [110/15], 1d06h, ospf-default, intra

Same deal here, just learning the OOB subnet via the locally attached OOB switch. Okay, great!

Finally, let’s take a look at the “top switch” routing table:

Top_Switch#sh ip route
S    10.10.10.0/24 [1/0] via 10.9.9.4

That looks good – of course there are more routes, but that’s the relevant one for us.

So the bottom line is that things are looking how you would expect them to look. Let’s test things out from top switch:

Top_Switch#traceroute 10.10.10.2

Type escape sequence to abort.
Tracing the route to 10.10.10.2
1 10.9.9.4 0 msec 0 msec 4 msec
2 10.255.0.1 0 msec 4 msec 0 msec

Top_Switch#traceroute 10.10.10.3
Type escape sequence to abort.
Tracing the route to 10.10.10.3
1 10.9.9.4 0 msec 0 msec 4 msec
2 10.255.0.1 0 msec 4 msec 0 msec
3 * * * 
4 * * * 
5 * * *

Okay, weird right? So basically top switch can hit OOB1, but traffic dies at OOB1 while trying to get to OOB2… If it can get to OOB1, and the destination is in the same subnet, you would think that we would be able to get to OOB2, unless OOB2 doesn’t have a route back to top switch… okay, so let’s look at that:

OOB2# traceroute 10.0.0.1
 traceroute to 10.0.0.1 (10.0.0.1), 30 hops max, 40 byte packets
 1 10.255.0.6 (10.255.0.6) 1.332 ms 1.048 ms 1.057 ms
 2 * * *
 3 * * *

Well, we get to Leaf_2… I guess that’s a start, but why the hell is it dying there? We know that Leaf_2 has a route back to top switch as well as to the OOB subnet, so why wouldn’t that work?

Let’s look back at this one key bit:

0.0.0.0/0, ubest/mbest: 1/0
 *via 10.9.9.1, vlan999, [110/1], 1d06h, ospf-default, type-2, tag 1

VLAN 999 shows up here on both leafs as where we are learning the default route from. If, however, we look at what’s going on in VLAN land on both leafs, we see something a bit confusing:

Leaf_1# show vlan
VLAN Name Status Ports
---- -------------------------------- --------- -------------------------------
999    --                               active    Eth1/48

Leaf_2# show vlan
VLAN Name Status Ports
---- -------------------------------- --------- -------------------------------

Obviously these switches would have some VLAN action happening, but the point I’m trying to make here is that VLAN 999 has not been created on Leaf_2. Of course in ACI there is no “vlan 999” config to enter, so what gives? Simply put, ACI will never instantiate a VLAN on a leaf node that doesn’t have an EPG configured with a static path binding on it. Okay, what does that mean in non-ACI wordy words? Basically, if there is no port on a leaf that needs to be configured for a VLAN, that VLAN will never get created on that leaf. This is a thing for a great reason — why the hell would we build a ton of VLANs that we don’t need? That’s the point: let’s not waste TCAM space and create config clutter when we don’t need to. So why is this a problem for us? Well… the leaf is trying to route to a next hop in a VLAN that it doesn’t have — probably won’t work out too well, eh? It turns out it doesn’t. Remember WAY back in the beginning of this post where I said don’t make assumptions? Yeah… well, I made the assumption that “top switch” was dual-attached to the fabric. If that were the case, then the EPG for VLAN 999 would have had a static path binding (basically hooking an EPG — and in turn a VLAN — to a port) on both leaf switches. Given that was NOT the case, VLAN 999 never existed on Leaf_2 and it basically became a giant black hole.

So the end “fix” is one of two things — either dual-attach “top switch,” or attach OOB2 directly to Leaf_1. Obviously the better thing to do would be to dual-attach “top switch”; in our case, since this was a temporary thing, we just attached OOB2 to Leaf_1.
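
If you want to sanity-check this kind of thing without clicking through the GUI, the APIC can tell you exactly which leafs/ports carry a static path binding for a given encap, which would have shown right away that VLAN 999 was only bound on Leaf_1. A rough sketch of that query (APIC address and credentials are hypothetical):

import requests

APIC = "https://apic.example.com"   # hypothetical APIC address
session = requests.Session()
session.post(APIC + "/api/aaaLogin.json",
             json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}},
             verify=False)

# Every static path binding using encap vlan-999: the dn tells you which EPG,
# and tDn tells you which leaf/port it is bound to
url = (APIC + "/api/node/class/fvRsPathAtt.json"
       '?query-target-filter=eq(fvRsPathAtt.encap,"vlan-999")')
bindings = session.get(url, verify=False).json()
for obj in bindings["imdata"]:
    attrs = obj["fvRsPathAtt"]["attributes"]
    print(attrs["dn"], "->", attrs["tDn"])

In our case a query like this would have returned a single binding on Leaf_1, which is a pretty strong hint that Leaf_2 was never going to have VLAN 999 (and therefore had no way to reach that next hop).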

 

Now, I’m fairly certain there was something else wrong with this fabric, unfortunately, but I didn’t have time with the customer to dig any deeper, and since it was a temporary thing it didn’t end up warranting too much effort to investigate. In any case, the point of this was just to walk through some troubleshooting that may seem a bit abnormal at first, but is really a lot of the same stuff we’ve been doing on “normal” Nexus gear for a while. If I’m being honest, this was a while ago, so some of the output has been made up a bit to help fill in the blanks in my memory, but all in all it should be relatively close to the real outputs 🙂