State of the Union of ACI — Have we reached “peak” ACI?

Woohoo click bait title!! This post is actually a good long while overdue, but work and my other passion (car projects!) have been consuming a goodly amount of my time and attention.

So what exactly is “peak” ACI and why do I think we’ve reached it? Also… is that a good thing or a bad thing? I should start with a bit of back story to set the stage. ACI is the first product/solution that I’ve had as a central focus, and I’ve had it as a central focus pretty much since its commercial inception (FCS if you want to use Cisco wordy-words; that’s “first customer ship” for my non-Cisco sales-y friends). Previously in my career I was involved with, say, the Nexus 7000 line from the early days of that product, but I was also doing things on Cat6ks, ASRs, ISRs, different firewalls, and a menagerie of switches from other vendors too. So this journey with ACI has been a unique one for me.

Despite ACI being so much of a focus for me, I’m not actually associated with the product in any way (i.e. I don’t work for Cisco/INSBU) other than designing/installing it, and occasionally teaching a class on it. This gives me an interesting perspective on it. Now before I go any further, I should mention a few things:

1) (Obligatory) This post does not represent any viewpoints of my employer, or Cisco, or anyone other than myself

2) I am absolutely an ACI fan boy, not because of the logo on the box, but because I’ve worked on a lot of different products and I happen to think this is a good one. I’m happy to have a discussion about my thoughts on this at any time, feel free to tweet/comment away.

3) Just because I am a fan boy doesn’t mean I don’t have critical opinions (as you’ll see in this post)

With all that out of the way, what is “peak” ACI? In short, I think the product has achieved its desired goals, and everything from this point on is simply bloatware being added in to appease large customers and their giant POs. Awfully cynical eh? Let’s dissect that a bit more.

The very early days of ACI were, shall we say… interesting. It is my strong opinion that the folks who created ACI (and other “SDN” products for that matter) looked deep into their crystal balls and saw the future, or at least their hope for the future. They saw APIs (and with them the death of the CLI), containers and micro-services more generally, and the synergy of hardware and software (buzzworddddd binggoooooo!); in short, they had a vision. If you look back to the 1.x versions of ACI, and dispel the punditry and bemoaning that ran rampant on Twitter/blogs, you’ll see that the product itself was executed quite well, IF, and only if, you too believed in the aforementioned vision of how the data center should look. There was kind of a CLI, but not really. There was a GUI, but it was much loathed, yet the API gave you 100% clear programmatic control of the fabric. There were oddities (to “normal” network folk), things like not supporting routing “through” ACI. None of these shortcomings would matter in the least, though, if you too bought into the vision.

Frankly, all this talk about “visions” is total supposition on my part, though I do believe there is compelling reason to believe my version of this recent history. Ultimately I’m neither agreeing with nor diverging from the original vision of ACI; the “market” (channel partners, end customers, bloggers, tweeters, etc.), however, clearly did not agree. Cisco to their credit (though it seems no-one wants to acknowledge this) reacted relatively quickly. By the 1.2 code release major changes to the GUI were implemented, improving the aesthetics if nothing else. A very familiar “NX-OS style” CLI was implemented, ultimately doing nothing more than masking the very same API calls the creators of the product hoped you would use. Transit routing was quickly added into the mix as well. It is entirely possible that these additions were on the roadmap anyway, though given my thoughts on the “vision” of the product I find that a dubious assumption at best.

In any case, features have of course been added since the initial release of the product: things like multicast support, IPv6 support, additional nerd knobs on routing protocols, and many more (in my opinion) necessary features. With the relatively recent release of the 2.0 train, codenamed “Congo,” I, and many other folks I speak with, feel that ACI has reached “mature” status. That’s not to say it wasn’t stable before this (there were bugs to be sure, but there are bugs in EVERYTHING; there were bugs in my first “Hello World” script so you know…), just that ACI has moved from minimum viable product for a commercial release to a fully featured, mature data center platform.

This however does bring us back to my theory regarding “peak” ACI. We now have literally more features in ACI than I can keep up with, and recall that ACI has been my central/primary focus! Yet even so, the features keep pouring in. Now certainly there will always be feature adds as hardware/software improves (the new FX blades/switches are an excellent example of this; they add MACsec encryption all over the place, which was just not possible on previous hardware versions), but we are getting more and more features that don’t have an obvious use for 99% of the customers that I see. As with every other product in the history of products, money talks, so of course if large customers have demand for feature XYZ and money to back it, then feature XYZ it is! The problem with this is that for the 99% these features add confusion and a larger code base, which of course means a larger surface area for our friends the bugs to latch on to!

My principal complaint about the feature bloat isn’t even the expanded code base and potential for bugs (thus far with ACI at least, bugs have been very isolated to particular features, so hopefully that trend continues), but instead the confusion that this brings to the field. I should also make it clear at this point that this is of course not an ACI-specific problem. Regardless, the problem persists. I’ve been on many a call with partners and/or Cisco folks who are not actual practitioners/hands-on users of ACI, and yet they insist on pushing new features, eking out every last bit of (perceived?) “value” so that they can make the sale to customers. Don’t get me wrong: new features will absolutely solve needs customers have, but my primary concern is that the real value of ACI is overlooked, under-sold, or completely ignored, perhaps in the face of “progress,” or more likely in the face of sales targets for newer products.

Hence we have arrived at peak ACI. I don’t mean this in the stupid Gartner way. I mean to say that we have collectively moved on to a point where the core benefits of ACI are ignored, pushed aside, or otherwise looked over in favor of the up-sell. This post has turned into a bit of a call to action to remember that keeping it simple works. Always. All the time. I am, and will likely remain, an ACI fan boy; however I now am looking at new features with a cynical eye. I’ll still advocate ACI, however I will absolutely continue telling people to stick to the “KISS” (Keep It Simple Stupid!) principles.

WTF Are all those Checkboxes? (ACI L3 Outs) – Part 1 of ???

Here is a screenshot of some of said checkboxes in case you don’t know what the hell I’m going on about:

 

wtf-checkboxes

If you still don’t know what these things are, these are part of an L3 Out in ACI; specifically, they are options that are configurable on the “subnet” of a “network” on an L3 Out. Essentially these control the import/export of prefixes for the L3 Out. Since I literally always forget which one does what, I figured other people probably do too, so let’s try and go through each of them and figure it out.

First off, let’s outline the really really simple topology we’ll be using for this. I will be working with two leafs, and two virtual routers (CSR1000v) connected via two vPCs to a pair of UCS Fabric Interconnects. My CSRs each live in a single VLAN (two total VLANs, one per CSR) that is plumbed through UCS up into ACI (no VMM stuff, just a good old-fashioned VLAN piped up to the leafs). So more or less I have two routers connected via vPC to my leafs, and I’m routing on a VLAN between everything. Clear as mud?

I’m going to mostly ignore the actual setup of the L3 Out itself, as I want to focus on these damn check boxes. For now, we will just start with our L3 Out “CSR-1,” which is a simple OSPF area 0 L3 Out. The adjacencies between this L3 Out and CSR-1 (the router, not the L3 Out) are already up and look good.

screen-shot-2016-09-21-at-1-42-17-pm

Okay great, so we have an L3 Out and we’ve got some adjacencies. Now what? Well first things first, this is a “regular” area in OSPF, so we should be exchanging routes between the CSR and ACI (both CSR and ACI should be advertising their loopbacks: CSR-1 and CSR-2 are 1.1.1.1 and 2.2.2.2 respectively, and Leaf103 and Leaf104 are 3.3.3.3 and 4.4.4.4 respectively), so let’s take a look on CSR-1:

CSR-1#sh ip route ospf
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
 D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
 N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
 E1 - OSPF external type 1, E2 - OSPF external type 2
 i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
 ia - IS-IS inter area, * - candidate default, U - per-user static route
 o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
 a - application route
 + - replicated route, % - next hop override, p - overrides from PfR

Gateway of last resort is not set

3.0.0.0/32 is subnetted, 1 subnets
O 3.3.3.3 [110/2] via 10.1.1.2, 00:00:08, GigabitEthernet2
 4.0.0.0/32 is subnetted, 1 subnets
O 4.4.4.4 [110/2] via 10.1.1.3, 00:00:08, GigabitEthernet2
O E2 192.168.1.0/24 [110/20] via 10.1.1.3, 00:02:21, GigabitEthernet2
                    [110/20] via 10.1.1.2, 00:02:22, GigabitEthernet2

Okay cool, so obviously OSPF is up as we are learning about the loopbacks of the Leafs. Good start. How does it look on the ACI side (note you can do all the stuff I will be doing in the GUI if you’d like but in the interest of not taking up a million pages of screenshots I’ll use the CLI)?

Leaf103# show ip route ospf vrf Carl-Testbed:Carl-Testbed
IP Route Table for VRF "Carl-Testbed:Carl-Testbed"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

Leaf103#

So that’s not really what we want to see, is it…? We have relationships up, and it’s obviously working since the CSR gets the routes from ACI, so what gives? (Before you think it’s a stub area or something like that, it’s not; this has to do with check boxes, remember!?) Well, out of habit I created the External EPG (under “Network” in the L3 Out) with a subnet of 0.0.0.0/0. Why? Because usually I like to make ACI a stub area and just take a default route and be on with my life (which requires no check box foo). So what does this have to do with anything? Well, let’s take a look at those check boxes a little closer. We have the following options under the “Scope” heading:

  • Export Route Control Subnet
  • Import Route Control Subnet (currently grayed out)
  • External Subnets for the External EPG (checked by default)
  • Shared Route Control Subnet
  • Shared Security Import Subnet
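For anyone scripting this instead of clicking, here is a rough sketch of how these five boxes might map onto the `scope` flags of the subnet object behind the GUI. The flag strings are my understanding of the APIC object model (the `l3extSubnet` class) and are not shown anywhere in the screenshots, so treat them as an assumption and check them against your own APIC version. The `scope_string` helper is purely hypothetical.

```python
# Assumed mapping from the GUI "Scope" labels to l3extSubnet scope flags.
# The flag strings are an assumption based on the APIC object model;
# verify them on your own fabric before relying on them.
SCOPE_FLAGS = {
    "Export Route Control Subnet": "export-rtctrl",
    "Import Route Control Subnet": "import-rtctrl",
    "External Subnets for the External EPG": "import-security",
    "Shared Route Control Subnet": "shared-rtctrl",
    "Shared Security Import Subnet": "shared-security",
}


def scope_string(checked):
    """Return the comma-joined scope flags for the ticked boxes, in GUI order."""
    unknown = set(checked) - set(SCOPE_FLAGS)
    if unknown:
        raise ValueError(f"unknown checkbox: {sorted(unknown)}")
    return ",".join(flag for label, flag in SCOPE_FLAGS.items() if label in checked)


# Only the box that is ticked by default.
default_scope = scope_string({"External Subnets for the External EPG"})

# The default box plus "Import Route Control Subnet".
import_scope = scope_string(
    {"Import Route Control Subnet", "External Subnets for the External EPG"}
)

print(default_scope)
print(import_scope)
```

The point of the sketch is just that each box is an independent flag, so any combination of them can be set on the same subnet.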

So before we can have any traffic flowing between the CSR and ACI, we probably need to figure out why ACI is NOT getting any routes from the CSR (note that right now I have one of the CSRs shut down, so we are focusing just on CSR-1 at the moment). Let’s take a look at that mysterious little “i” button in the top right of the screen… it sometimes has pretty good information.

screen-shot-2016-09-22-at-8-06-21-am

 

Okay, so lots of words. I think we know what an IP is, but let’s take a bit closer look at that one to start with… “subnet IP address to be imported from the outside into the fabric.” Okay, so that basically sounds like it may be used in route filtering; let’s keep reading… “contracts associated with its parent […] are applied to the subnet.” Alright, so we now have an idea that the subnet(s) configured under the “Network” in the L3 Out have at least two purposes: 1) defining routes that can be imported into the fabric, and 2) relating the subnet to contracts somehow. Since we still don’t have a route from the CSR in the fabric, let’s dig a bit more as to why…

The information page mentions that this subnet matches a route to be imported into the fabric. Well, I just put in 0.0.0.0/0, which would match nothing but a default route, right? So let’s try adding a subnet to the “Network” (which is a bit of a confusing name for the parent folder) that corresponds with the loopback on CSR-1:

screen-shot-2016-09-22-at-8-11-50-am

 

Alrighty, subnet added (with only the default “External Subnets” box checked). Let’s see if we get a route on the leaf:

 

Leaf103# show ip route ospf vrf Carl-Testbed:Carl-Testbed
IP Route Table for VRF "Carl-Testbed:Carl-Testbed"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

Leaf103#

 

No? Well now we are matching an exact prefix, at least in theory, but what about those damn check boxes? If you’ll notice, there’s a box that is grayed out for “Import Route Control Subnet.” That sounds vaguely helpful right? The note from the “i” page says “Controls the import route direction.” Yeah, that seems kinda like what we want, but why is it grayed out, and more importantly how do we un-grey it out? To do that we have to take a look at the parent L3 Out object.

l3out-import

 

Back up on the top-level of the L3 Out (parent object), there are two boxes for “Route Control Enforcement” one for Import and one for Export. As far as I know Export is always checked, but Import is not checked by default. If we want to bring routes into the fabric, we will need to check this box, so I’m going to go ahead and do just that. Do note that Import control is not supported for EIGRP. With the Import Route Control box checked, our “Import Route Control Subnet” box is now check-able for our subnet, so I’ve gone ahead and ticked that box as well. In theory, we will now get the loopback from CSR-1 installed in the route table on our leaf switches… (drum roll please!):

Leaf3# show ip route ospf vrf Carl-Testbed:Carl-Testbed
IP Route Table for VRF "Carl-Testbed:Carl-Testbed"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

1.1.1.1/32, ubest/mbest: 1/0
 *via 10.1.1.5, vlan73, [110/5], 00:00:01, ospf-default, intra
Leaf3#
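As an aside for the API-inclined: the two boxes just ticked could be expressed roughly as the payload below. This is a sketch under assumptions, not a verified recipe. The class names (`l3extOut`, `l3extInstP`, `l3extSubnet`) and attributes (`enforceRtctrl`, `scope`) are my understanding of the APIC object model rather than anything captured from this fabric, and the `l3out_import_payload` helper is hypothetical. It only builds and prints the JSON; it does not talk to an APIC.

```python
import json


def l3out_import_payload(l3out_name, ext_epg_name, prefix):
    """Build a JSON-style dict for an L3 Out with import route control on.

    Assumed object model: l3extOut > l3extInstP ("Network") > l3extSubnet.
    """
    return {
        "l3extOut": {
            "attributes": {
                "name": l3out_name,
                # "Route Control Enforcement": Export plus the Import box.
                "enforceRtctrl": "export,import",
            },
            "children": [
                {
                    "l3extInstP": {
                        "attributes": {"name": ext_epg_name},
                        "children": [
                            {
                                "l3extSubnet": {
                                    "attributes": {
                                        "ip": prefix,
                                        # Import Route Control Subnet plus
                                        # External Subnets for the External EPG.
                                        "scope": "import-rtctrl,import-security",
                                    }
                                }
                            }
                        ],
                    }
                }
            ],
        }
    }


payload = l3out_import_payload("CSR-1", "ExternalEPG", "1.1.1.1/32")
print(json.dumps(payload, indent=2))
```

Node profiles, interface profiles, and the OSPF settings are left out on purpose, since the focus here is only on the check boxes.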

Well look at that, sure enough we now install the route for the loopback. Now what if I decided that I didn’t want a /32 loopback on my CSR and did something like this:

CSR-1#show run int lo0
Building configuration...

Current configuration : 63 bytes
!
interface Loopback0
 ip address 1.1.1.1 255.255.255.255
end

Carl-Test-1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
CSR-1(config)#int lo0
CSR-1(config-if)#ip add 1.1.1.1 255.255.255.0
CSR-1(config-if)#ip ospf network point-to-point
CSR-1(config-if)#end

So now my loopback is a /24, and I’ve added the OSPF network type of point-to-point to ensure OSPF doesn’t just insert the loopback IP as a /32 into the database. Let’s pop back over and see what the leaf thinks about this:

Leaf3# show ip route ospf vrf Carl-Testbed:Carl-Testbed
IP Route Table for VRF "Carl-Testbed:Carl-Testbed"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

Leaf3#

The leaf sure doesn’t think that’s very cool. When we first started looking at the list of check boxes, there was a whole other section that I left out: there are three boxes in the “Aggregate” section.

  • Aggregate Export
  • Aggregate Import
  • Aggregate Shared Routes

The information page is again our friend here. The very first line in the “Aggregate” section states: “when aggregation is not set, the subnets are matched exactly.” That perfectly explains why we lost the route in ACI when we changed our loopback IP. So knowing that we can summarize, I’m going to go ahead and delete the 1.1.1.1/32 subnet from the network, enable “Import Route Control Subnet” as well as “Aggregate Import” on the remaining 0.0.0.0/0 subnet, and then let’s take a look at the routing table on the leaf again:

Leaf3# show ip route ospf vrf Carl-Testbed:Carl-Testbed
IP Route Table for VRF "Carl-Testbed:Carl-Testbed"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

1.1.1.0/24, ubest/mbest: 1/0
 *via 10.1.1.5, vlan73, [110/5], 00:00:16, ospf-default, intra
Leaf3#
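As an aside, the import behavior seen across the last few outputs can be modeled in a few lines of Python. This is a deliberately simplified sketch: the fabric does not run anything like this, and the `import_permits` function is a hypothetical name of mine. It only mirrors the rule quoted from the information page, that a subnet is matched exactly unless aggregation is set.

```python
import ipaddress


def import_permits(route, subnet, aggregate_import=False):
    """Model whether a configured subnet lets a learned route into the fabric.

    Without aggregation the prefix must match exactly; with it, any route
    that falls inside the configured subnet is accepted.
    """
    route_net = ipaddress.ip_network(route)
    subnet_net = ipaddress.ip_network(subnet)
    if aggregate_import:
        return route_net.subnet_of(subnet_net)
    return route_net == subnet_net


# What the CSR advertises once the loopback is 1.1.1.1 with a /24 mask.
advertised = str(ipaddress.ip_interface("1.1.1.1/24").network)

print(advertised)
# /32 loopback against the 0.0.0.0/0 subnet alone: no exact match.
print(import_permits("1.1.1.1/32", "0.0.0.0/0"))
# /32 loopback against a 1.1.1.1/32 subnet: exact match.
print(import_permits("1.1.1.1/32", "1.1.1.1/32"))
# Loopback changed to a /24, subnet still 1.1.1.1/32: the match is lost.
print(import_permits(advertised, "1.1.1.1/32"))
# 0.0.0.0/0 with Aggregate Import: everything inside it is accepted.
print(import_permits(advertised, "0.0.0.0/0", aggregate_import=True))
```

Each call lines up with one of the leaf outputs above: route missing, route present, route missing again, and route present as a /24.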

Bam! Alrighty, so if you were paying attention earlier you may have noticed that there was a 192.168.1.0/24 prefix in the route table on the CSR. That subnet lives in this same VRF in ACI and contains an Ubuntu VM, so let’s pop over to that and see if we can ping through to our loopback.

screen-shot-2016-09-22-at-9-08-37-am

Well it starts out looking pretty good… we can ping our default gateway, but we can’t ping our loopback. We can however ping the loopback from a leaf. Remember that box checked by default, “External Subnets for the External EPG”? Turns out there’s a bit more to that! By default in ACI, VRFs are in “enforced” mode, which just means that you need to have contracts in place to permit traffic to flow between EPGs. The subnet we’ve been talking about is in fact an EPG; it is a prefix-based EPG. That little tick box that is set by default simply says “hey, this subnet that I’m defining is now part of the EPG ‘ExternalEPG’ (the name we gave our ‘network’).” Okay cool, so it’s an EPG, and we need contracts, but where do those go…

contracts

In our “Network” (what is basically an EPG!), there is a tab on the main Policy pane for contracts. If you open that tab you will have options for provide and consume. I’ve added a simple permit any contract between the EPG containing my VM and this EPG in our L3 Out, and another drum roll….

screen-shot-2016-09-22-at-9-21-02-am
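To make the “prefix-based EPG plus contract” idea a bit more concrete, here is a toy model of it. Everything in it is illustrative: `VM-EPG` is a made-up name for the EPG holding the Ubuntu VM, both functions are hypothetical, and real contracts have a provider/consumer direction and filters that this sketch ignores.

```python
import ipaddress

# External EPG name -> subnets flagged "External Subnets for the External EPG".
EXTERNAL_EPGS = {"ExternalEPG": ["0.0.0.0/0"]}


def classify_external(ip, external_epgs):
    """Return the external EPG whose longest matching subnet contains ip."""
    addr = ipaddress.ip_address(ip)
    best = None  # (prefix length, EPG name) of the best match so far
    for epg, subnets in external_epgs.items():
        for subnet in subnets:
            net = ipaddress.ip_network(subnet)
            if addr in net and (best is None or net.prefixlen > best[0]):
                best = (net.prefixlen, epg)
    return best[1] if best else None


def traffic_allowed(epg_a, epg_b, contracts, enforced=True):
    """In an enforced VRF, two different EPGs need a contract between them."""
    if not enforced or epg_a == epg_b:
        return True
    return frozenset((epg_a, epg_b)) in contracts


# The CSR loopback lands in the external EPG via the 0.0.0.0/0 subnet.
dst_epg = classify_external("1.1.1.1", EXTERNAL_EPGS)

# No contract yet: the ping from the VM fails.
before = traffic_allowed("VM-EPG", dst_epg, contracts=set())

# Permit-any contract between the two EPGs: the ping works.
after = traffic_allowed(
    "VM-EPG", dst_epg, contracts={frozenset(("VM-EPG", "ExternalEPG"))}
)

print(dst_epg, before, after)
```

The route being in the table and the traffic being permitted are two separate questions, which is why the ping failed even though the leaf already had the 1.1.1.0/24 route.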

Awesome! So far we’ve covered the main Import Route Control check box, Import Route Control Subnet, and External Subnets for the External EPG “Scope” check boxes, as well as the Aggregate Import check box. We’ve still got quite a few check boxes to figure out though, so stay tuned for some more!