Guest Post! WTF Are all those Checkboxes? (ACI L3 Outs) – Part 2 of ???

My friend and colleague Mr. Jason Banker recently ran into some good times with the mysteries of the ACI L3 Out Checkbox Madness! He Slack’d me and told me he’d found some clown’s blog post about it (yours truly) and that some updates and additional information were needed, so he kindly volunteered some time to help out! Without further ado, here is Jason’s Checkbox Madness:


 

As we continue to deploy fabrics we always joke about these damn routing checkboxes shooting us in the foot.  We play with different scenarios in the lab to ensure we understand how these pesky boxes work and what other options we have for future deployments.   The scenario here was to get different OSPF areas connected to the same border leaf using ACI as the transit.  This scenario brings up certain challenges, and hopefully my testing will help others understand it a little better as well.

Design:

We have two external routers coming into a border leaf on ACI, two L3Outs (required because of multiple areas), one is Area 0 (backbone) and one is Area 1.  Here is the breakdown of routes on each router:

External Router 1 (Area 0):

  • Loopback0: 2.2.2.2/32
  • Loopback1: 4.4.4.4/32
  • Transits: 192.168.0.0/29

External Router 2 (Area 1):

  • Loopback0: 3.3.3.3/32
  • Loopback1: 5.5.5.5/32
  • Transits: 172.16.0.0/29

 

Using ACI as a transit we want routes from Area 0 to be imported into Area 1 and vice versa across the two L3Outs.   We will skip the build of the L3Out portion but I want to focus on those pesky checkboxes again.  Whenever I build an L3Out my network EPG usually looks something like this:

By default, “External Subnets for the External EPG” is checked (this checkbox simply enforces policy on this L3Out: contracts are applied to the specified subnet) and I am using the 0.0.0.0/0 network as a catch-all.  Moving along with the defaults, I show full adjacency:

 

As well as a full routing table within ACI, receiving the networks above in the fabric as expected:

Note:  Anything received from area 0 is shown as backbone and everything from area 1 is 0.0.0.1.
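
The two labels are the same 32-bit area ID written two ways. As a quick illustration (a Python sketch, nothing the fabric itself runs; the `ospf_area_label` helper is hypothetical and the `backbone` alias for area 0 is taken from the output described above), an area number maps to what the leaf displays like this:

```python
import ipaddress

def ospf_area_label(area_id: int) -> str:
    """Render a 32-bit OSPF area ID the way the leaf output shows it."""
    if area_id == 0:
        return "backbone"  # area 0 is shown by name in the output above
    # Any other area is shown in dotted-decimal form, like an IPv4 address.
    return str(ipaddress.IPv4Address(area_id))

print(ospf_area_label(0))
print(ospf_area_label(1))
```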

 

Now if we look at the routing table for External Router 1 we see no routes across the fabric being received from External Router 2.

 

Let me check my OSPF Neighbors:

 

So, we have no OSPF routes but we have a neighbor relationship.  Let’s go check External Router 2:

 

So, we are showing some OSPF routes but they are only the loopbacks of the ACI Fabric (Area 0), not necessarily what we are looking for.  ACI blocks transit routes between different L3Outs unless permitted by policy via an OSPF area filter-list (to verify, SSH to the border leaf and run “show route-map”).  Let’s go look at the Network EPG checkboxes again and see if we can get routing to occur between OSPF Areas across the fabric.

 

As we showed earlier we are using a catch-all 0.0.0.0/0 with “External Subnets for the External EPG”. If we select the “!” on the upper right portion of the screen we get a description of the properties on this screen:

 

Based on what this screen states, “Export Route Control Subnet” – controls the export route direction and “Import Route Control Subnet” – controls the import direction.  This sounds like what we need to get routing to traverse the fabric.  Let’s go ahead and select them for Area 0. Before we can select import, though, there is another setting we need to apply, otherwise the import option stays grayed out.  If we go back to the top of the L3Out in the navigation pane we need to select the “Route Control Enforcement:” import checkbox:

 

Now if we go back to the Network EPG we should have both options available to us:

 

Now let’s see if we have any changes within our routing table:

 

We still have no change in the table.  Remember that we are using a catch-all 0.0.0.0/0? This requires us to also select the aggregate export and import features on the subnet/Network EPG we have created for Area 0 and Area 1:
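
For anyone who prefers objects to checkboxes, the settings chosen so far can be sketched as the kind of JSON you might post to the APIC REST API. Treat this as a sketch under assumptions, not a tested deployment: the L3Out and EPG names are made up, the `l3out_payload` helper is hypothetical, and the class and attribute names (`l3extOut.enforceRtctrl`, `l3extSubnet.scope`, `l3extSubnet.aggregate`) should be confirmed with the API Inspector on your own APIC. The code only builds and prints the payload; it does not talk to a fabric.

```python
import json

def l3out_payload(l3out_name: str, epg_name: str) -> dict:
    """Build (but do not send) a payload mirroring the checkboxes described above."""
    subnet = {
        "l3extSubnet": {
            "attributes": {
                "ip": "0.0.0.0/0",
                # External Subnets for the External EPG,
                # plus Export and Import Route Control Subnet
                "scope": "import-security,export-rtctrl,import-rtctrl",
                # Aggregate Export and Aggregate Import for the catch-all
                "aggregate": "export-rtctrl,import-rtctrl",
            }
        }
    }
    ext_epg = {
        "l3extInstP": {
            "attributes": {"name": epg_name},
            "children": [subnet],
        }
    }
    return {
        "l3extOut": {
            "attributes": {
                "name": l3out_name,
                # "Route Control Enforcement" with the import box ticked
                "enforceRtctrl": "export,import",
            },
            "children": [ext_epg],
        }
    }

# Hypothetical names, used only for illustration
print(json.dumps(l3out_payload("L3Out-Area0", "Area0-NetEPG"), indent=2))
```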

 

Time to verify:

 

This looks great.  Now we will verify External Router 2:

 

If we want to take it a step further we can do network specific routes instead of a catch-all:

 

Router 2:

 

Verify Router 2 is receiving 2.2.2.2/32 from Router 1:

 

Now we can send 3.3.3.3/32 from Router 2 into Router 1:

 

Router 1:

 

As I stated earlier these checkboxes are updating route-maps and prefix-lists within ACI.  Prior to us selecting the import/export feature our route-maps had a deny all so no routes would traverse areas.  Upon selecting these checkboxes we can see the change:

 

Instead of just giving you route-map blah, I will try to break down the map for you, focusing on the export route-map. Prior to import/export our route-map looked like this:

route-map exp-ctx-2392064-deny-external-tag, deny, sequence 1
  Match clauses:
    tag: 4294967295
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, deny, sequence 9998
  Match clauses:
    ospf-area: backbone
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, deny, sequence 9999
  Match clauses:
    ospf-area: 0.0.0.1
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, permit, sequence 10000
  Match clauses:
  Set clauses:
route-map exp-ctx-2392064-set-external-tag, permit, sequence 2
  Match clauses:
  Set clauses:
    tag 4294967295
route-map imp-ctx-bgp-st-interleak-2392064, deny, sequence 1
  Match clauses:
    tag: 4294967295
  Set clauses:
route-map imp-ctx-bgp-st-interleak-2392064, permit, sequence 10000
  Match clauses:
  Set clauses:

 

You can see that we had “deny” for backbone and area 0.0.0.1 preventing us from using the fabric as a transit.  After we selected import/export features our route-map is updated as such (just focusing on the export route-map):

route-map exp-ctx-2392064-deny-external-tag, deny, sequence 1
  Match clauses:
    tag: 4294967295
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, permit, sequence 9801
  Match clauses:
    ip address prefix-lists: IPv4-ospf-rt2392064--0-dst-rtpfx
    ipv6 address prefix-lists: IPv6-deny-all
    ospf-area: backbone
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, permit, sequence 9802
  Match clauses:
    ip address prefix-lists: IPv4-ospf-rt2392064--1-dst-rtpfx
    ipv6 address prefix-lists: IPv6-deny-all
    ospf-area: 0.0.0.1
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, deny, sequence 9998
  Match clauses:
    ospf-area: backbone
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, deny, sequence 9999
  Match clauses:
    ospf-area: 0.0.0.1
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, permit, sequence 10000
  Match clauses:
  Set clauses:
route-map exp-ctx-2392064-set-external-tag, permit, sequence 2
  Match clauses:
  Set clauses:
    tag 4294967295
route-map imp-ctx-bgp-st-interleak-2392064, deny, sequence 1
  Match clauses:
    tag: 4294967295
  Set clauses:
route-map imp-ctx-bgp-st-interleak-2392064, permit, sequence 10000
  Match clauses:
  Set clauses:
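
One way to read the two dumps is to treat the route-map as an ordered list where the lowest-sequence entry whose clauses all match decides the outcome. The sketch below is a toy model in Python, not how the leaf implements it: the `Entry` class and `evaluate` function are hypothetical, the tag entry at sequence 1 is left out, `ospf-area` is treated as a plain attribute the route is compared against, and the prefix-list is reduced to a stand-in that permits everything (the catch-all case).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Entry:
    seq: int
    action: str                                        # "permit" or "deny"
    area: Optional[str] = None                         # ospf-area clause; None = no clause
    prefix_ok: Optional[Callable[[str], bool]] = None  # prefix-list clause; None = no clause

def evaluate(route_map: List[Entry], prefix: str, area: str) -> str:
    """Return the action of the first entry (lowest seq) whose clauses all match."""
    for entry in sorted(route_map, key=lambda e: e.seq):
        if entry.area is not None and entry.area != area:
            continue
        if entry.prefix_ok is not None and not entry.prefix_ok(prefix):
            continue
        return entry.action
    return "deny"  # nothing matched

def permit_all(prefix: str) -> bool:
    """Stand-in for the catch-all prefix-list."""
    return True

# Entries 9998-10000 from the first dump
before = [
    Entry(9998, "deny", area="backbone"),
    Entry(9999, "deny", area="0.0.0.1"),
    Entry(10000, "permit"),
]
# The second dump adds 9801 and 9802 ahead of the denies
after = [
    Entry(9801, "permit", area="backbone", prefix_ok=permit_all),
    Entry(9802, "permit", area="0.0.0.1", prefix_ok=permit_all),
] + before

print(evaluate(before, "3.3.3.3/32", "backbone"))  # deny
print(evaluate(after, "3.3.3.3/32", "backbone"))   # permit
```

The point of the model is only the ordering: the new permit entries sit at 9801 and 9802, so they are hit before the area denies at 9998 and 9999.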

Now that our route-map has been updated with prefix-lists to allow our traffic across areas, let’s look at the prefix-list itself:

 

Leaf-103# show ip prefix-list IPv4-ospf-rt2392064--0-dst-rtpfx
ip prefix-list IPv4-ospf-rt2392064--0-dst-rtpfx: 1 entries
   seq 1 permit 0.0.0.0/0 le 32

 

The 0.0.0.0/0 catch-all has been added and our routes can traverse the fabric.  I suggest you also peek at the import route-maps and see what is happening under the hood there as well.
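
To see why the `le 32` matters, here is a small Python sketch of prefix-list matching using the standard `ipaddress` module. The `matches` helper is hypothetical and models a single entry only. The assumption, worth confirming on your own leaf, is that without the aggregate checkboxes the entry would be a plain `0.0.0.0/0`, which matches the default route and nothing else.

```python
import ipaddress
from typing import Optional

def matches(entry: str, route: str, le: Optional[int] = None) -> bool:
    """Model one prefix-list entry: exact match, or 'le' to allow longer prefixes."""
    entry_net = ipaddress.ip_network(entry)
    route_net = ipaddress.ip_network(route)
    if le is None:
        # No 'le': the route must be exactly the listed prefix.
        return route_net == entry_net
    # With 'le': the route must sit inside the entry and be no longer than 'le'.
    return route_net.subnet_of(entry_net) and route_net.prefixlen <= le

print(matches("0.0.0.0/0", "3.3.3.3/32"))         # False
print(matches("0.0.0.0/0", "3.3.3.3/32", le=32))  # True
print(matches("2.2.2.2/32", "2.2.2.2/32"))        # True
```

The last line is the network-specific case from earlier: an exact /32 entry needs no aggregate at all.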

 

 

 

 

Post-TFD Segment Routing Roundtable Thoughts

My brain has now had a bit of time to recover from the information overload that was the Tech Field Day Segment Routing Round Table, so it is most definitely time to write a bit about what I learned. You may want to get a listen in on the Software Gone Wild Podcast with Ivan Pepelnjak for a solid foundation of what SR is before jumping into things. After that, head over to the TFD YouTube channel to check out the recordings from the event. We had some really great presentations from Walmart, Microsoft, and Comcast; each of these companies explained how Segment Routing is helping them in their particular environment. I would start with the presentation from Mark Pagan of Walmart as it goes over a lot of the real-world day 1 benefits of SR. Then take a listen to the Microsoft and Comcast presentations. They really kicked it up a lot in terms of complexity of their overall solutions, but also really highlighted a lot of what is possible with Segment Routing.

I’m not going to try to write anything too technical about SR because I am definitely not enough up to speed on it to talk about it at that level. What I am going to do is jot down my view on it as a technology, and its applicability (in my mind I guess) in the day-to-day network world. I also want to respond to my own thoughts/questions from my previous post before the TFD event.

I’ll try to address my own previous points first:

– Whatever happened to NSH: Guess I didn’t really get a solid answer here. As far as I can tell NSH is still technically a thing, but really seems to be fading away. I think ultimately it’s too big of a problem (or I guess solution) to really successfully implement. Somebody please chime in if there’s something new/interesting happening w/ NSH that I should be reading about. In any case, as compared to SR, they really are different beasts. I think there is some overlap in terms of what NSH was promising and what SR can do. Sure, SR can direct traffic through a network, and maybe even to or through some devices on the network, but it’s not intending to do “service-chaining” in the same way that NSH was/is.
– Config nightmare: Nope. I think my biggest takeaway is that SR is pretty much MPLS 3.0. I say 3.0 because it is just that much simpler, not only to configure, but to troubleshoot as well. I bring up troubleshooting since I think this is/was the biggest and most important part of the whole event: SIDs (Segment IDs) are globally significant. Sounds not very exciting/important by itself eh? Well, the reason I think that is so huge is that if you’ve ever worked w/ MPLS and tried to understand the end-to-end label switched path (LSP) while troubleshooting, then you will know that the labels are all over the place and are only significant to the local router. Now they are unified across the whole LSP… that’s pretty badass! I should also note that instead of using LDP, SR distributes label data via TLVs in OSPF or IS-IS, kinda sorta like LDP auto-config.
– Granularity/Service Chaining: I think you can do some of this with SR, but it’s really not its intended use case — a bit more on this later.
– Isn’t MPLS dead?: Heh… yes? No? Obviously it’s not dead in the service provider world, and likely won’t be for a long long time. In the data center… mostly dead is maybe fair? I can say I personally don’t see much/any MPLS in the DC at least. I think that part of why I was bringing this up before was because I was thinking more about an Enterprise DC (as that’s my day-to-day focus). I think you could absolutely use SR in an enterprise DC but I don’t think it’s really the best tool for that job. If you take a look at who presented though, you’ll see that while these are “Enterprises” (well plus Comcast as a Service Provider), they’re freaking huge, and they’re really their own SPs doing SP type things. (MS is using this in the DCs but in a very hyperscale/SP type way.)
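
The “globally significant” point from the config nightmare bullet can be made concrete with a little arithmetic. This is a simplified Python sketch, not router behaviour: it assumes every router uses the same SRGB (16000 is a commonly seen base, but treat the value as an assumption), the node names and SID indexes are made up, and the LDP side is imitated with arbitrary per-router numbers.

```python
SRGB_BASE = 16000  # assumed to be identical on every router

# Hypothetical prefix-SID indexes, as they would be advertised through the IGP
sid_index = {"PE1": 1, "PE2": 2}

def sr_label(destination: str) -> int:
    """With a common SRGB, every router derives the same label for a destination."""
    return SRGB_BASE + sid_index[destination]

# Imitation of LDP: each router hands out its own local label for the same destination
ldp_labels_for_pe2 = {"P1": 24005, "P2": 24017, "P3": 24002}

for hop in ("P1", "P2", "P3"):
    print(hop, "SR:", sr_label("PE2"), "LDP:", ldp_labels_for_pe2[hop])
```

Reading the output hop by hop, the SR column shows the same number at every router while the LDP column changes each time, which is exactly the troubleshooting difference described above.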

Alright so I guess that addresses the points from my previous post, now on to a bit more wordy words to recap my thoughts on SR.

I feel like SR is kind of a no-brainer in the SP/WAN world; it really does just seem like a way better way to do MPLS. You’ll still have to layer “stuff” on top of the SR bits (vpnv4/6 address family type stuff or whatever it is you’re running atop your MPLS), but SR just makes the rest seem so trivial. TE just got owned also… seems like there is basically no point to TE as we know it today if you can just use SR-TE to make your life so much easier. All that is well and good, but I don’t really live in a provider-centric world. I focus on data centers so…

While I am now a fan of SR, I feel like it doesn’t have a place in the data center. I know that the folks working on it will probably disagree, and I would like to agree with them but I can’t at this point. The current biggest challenge in the data center (at least at normal enterprise scale) is we still have to have L2 in some capacity. This is a super super super lame requirement, but it is what it is. This requirement is the reason we have janky spanning-tree kludges (MLAG, vPC, VSS, etc.), FabricPath/TRILL, OTV, VPLS, and now VxLAN. Now from what I understand, there isn’t technically any reason you couldn’t use SR w/ some AToM or VPLS (maybe PBB?) to provide L2 over L3 in the data center, but that sounds like a freaking headache. VxLAN has pretty much won the DC overlay wars, and I don’t see any reason to introduce SR into the DC. Between data centers SR certainly could have a role in providing transport services, or even L2 extension. However, even then, as VxLAN continues to mature and grow into that role it doesn’t feel like it’s worth it to tack on another protocol/feature to support that requirement. If SR was the panacea for service chaining that I was kinda hoping it would be, then perhaps I’d feel differently. So at this point, given our stupid “requirements” for L2, I think SR should/will likely stick to the WAN/hyperscale folks. There’s nothing wrong with that of course, but I do feel like it’s important to delineate where SR is best suited (at least in my mind!).

 

PS – Go watch Paul Mattes’ presentation (Microsoft). They’re using link-bandwidth in BGP, which has always seemed to me to be the best kept secret of BGP. I was very excited to hear they’re really taking advantage of it in production. I rarely see it, so that was fun. /end nerdgasm