Tidbit: ACI Fabric Certificate and NTP

Quick post here for a fun little issue I ran into this week.

I’ve been building out a greenfield ACI deployment with some Nexus 6ks “north” of the fabric acting as a core. The 6ks and the fabric are all greenfield at this point, with no connectivity to the rest of the customer’s network or the internet or anything. I configured the fabric to use the 6ks as the NTP server, with the intent that eventually the 6ks will point to a low stratum clock of some kind. I set this up, was happy that NTP was synced and the NTP faults in the fabric cleared, and went to grab some lunch. When I came back, none of the APICs could see each other (controller state unavailable), yet I could ping in-band from each APIC to any other APIC…

Well, the NTP configuration worked, but since the 6ks were fresh out of the box and didn’t have their clocks set, their time was the default — something in 2001. Normally that’s probably just an annoyance — “hey! my clock is way off!”… not this time! The APICs synced up with the 6ks, so their time was set to whatever time/date in 2001 as well. What I didn’t know then was that there is a certificate on the APICs (automagically there from Cisco) that has a “not valid before” date… and when NTP synced up, it flipped the clock on the APICs to a date before the cert was valid. The whole APIC cluster failed because the controllers couldn’t form the cluster while that certificate was deemed invalid! I should note here that even though I foobared the entire APIC cluster, the fabric was totally fine and passing traffic and all of that was good — I just couldn’t make changes to the fabric since the cluster was messed up.

You can verify the health of the cluster with the “acidiag avread” command — this will list out all the APICs in the cluster. The output isn’t very friendly to read, but you should be able to see all your APICs that way.

The “show controller” command is much more friendly:

apic1# show controller
Fabric Name : ACI-Pod-1
Operational Size : 1
Cluster Size : 3
Time Difference : -27031964
Fabric Security Mode : permissive

 ID   Address    In-Band IPv4 Address   In-Band IPv6 Address   OOB IPv4 Address   OOB IPv6 Address   Version   Flags   Serial Number   Health
 ---- ---------- ---------------------- ---------------------- ------------------ ------------------ --------- ------- --------------- -----------
 1*   x.x.x.x    0.0.0.0                0.0.0.0                x.x.x.x            0.0.0.0            1.2(2g)   crv-    xxxxxxxxxxx     fully-fit


Flags - c:Commissioned | r:Registered | v:Valid Certificate | a:Approved

From this output you can see each controller and some important information – the in-band IP (the TEP IP from the infra tenant), any in-band/OOB management IPs, the version, the serial number, and, importantly, the flags. In the output above we can see we have the flags “c”, “r”, and “v” — for this particular scenario the one I care about is the valid certificate flag. If you do like I did and foobar NTP to a time outside the cert’s validity window, a missing “v” flag is what to look for.
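If you’d rather poll this from a script than from the CLI, the same cluster information is exposed through the APIC REST API. Here’s a rough Python sketch that logs in and dumps each controller’s health — the apic1.example.com address and credentials are placeholders, and the infraWiNode class and its attributes are from memory, so verify them against your APIC’s object model (Visore) before relying on this:

#!/usr/bin/env python3
# Rough sketch: check APIC cluster membership/health over the REST API.
# The infraWiNode class name and its attributes are assumptions -- confirm
# them in Visore / the API reference for your release.
import requests

APIC = "https://apic1.example.com"      # placeholder APIC address
USER, PASSWORD = "admin", "password"    # lab credentials

session = requests.Session()
session.verify = False                  # lab fabric, default self-signed cert

# Standard APIC login -- the token comes back as a cookie on the session
login = {"aaaUser": {"attributes": {"name": USER, "pwd": PASSWORD}}}
session.post(f"{APIC}/api/aaaLogin.json", json=login).raise_for_status()

# infraWiNode (assumed) lists every APIC the cluster knows about
resp = session.get(f"{APIC}/api/node/class/infraWiNode.json")
resp.raise_for_status()

for obj in resp.json()["imdata"]:
    attrs = obj["infraWiNode"]["attributes"]
    # 'health' should read fully-fit on a happy controller; anything else
    # (unavailable, data-layer-partially-diverged, etc.) deserves a look
    print(f"APIC {attrs['id']} {attrs['addr']}: {attrs['health']}")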

Finally, and perhaps easiest of all, you can use the “acidiag verifyapic” command to check the certificate status as well as the dates that the certificate is valid for:

apic1# acidiag verifyapic
openssl_check: certificate details
subject= CN=xxxxxxxxxxx,serialNumber=PID:APIC-SERVER-M1 SN:xxxxxxxxxxx
issuer= CN=Cisco Manufacturing CA,O=Cisco Systems
notBefore=Mar 21 03:21:13 2015 GMT
notAfter=Mar 21 03:31:13 2025 GMT
openssl_check: passed
ssh_check: passed
all_checks: passed
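As an aside, you can reproduce the same basic comparison from any management host. The sketch below pulls the certificate presented on the APIC’s HTTPS endpoint and checks whether the local clock falls inside its validity window — note this is not the Cisco-installed cluster certificate that acidiag inspects, and the hostname is a placeholder; it just illustrates why a clock set before the “not valid before” date breaks validation:

#!/usr/bin/env python3
# Sketch only: compare a certificate's validity window against the local
# clock. Grabs the APIC's HTTPS cert, which is not the same certificate
# acidiag verifyapic checks -- this just illustrates the failure mode.
import ssl
from datetime import datetime, timezone

from cryptography import x509          # pip install cryptography

APIC = "apic1.example.com"              # placeholder hostname

pem = ssl.get_server_certificate((APIC, 443))
cert = x509.load_pem_x509_certificate(pem.encode())

now = datetime.now(timezone.utc)
not_before = cert.not_valid_before_utc  # cryptography >= 42; older releases use not_valid_before
not_after = cert.not_valid_after_utc

print(f"notBefore = {not_before}")
print(f"notAfter  = {not_after}")

if now < not_before:
    print("Clock is before notBefore -- this is the NTP-to-2001 failure mode")
elif now > not_after:
    print("Certificate has expired")
else:
    print("Clock falls inside the validity window")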

So to fix all of this hot mess, the date/time had to be set manually on the 6k, but by then the offset was too large for the APICs to sync back up, so we also had to manually set the time on the APIC itself. You can do that like so:

dbgtoken

login root

date MMDDhhmmYYYY

Note that you will need the debug token and internal access (TAC) to get this fixed.
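For example, assuming the standard Linux date syntax underneath, setting the clock to March 19th at 12:30 in 2015 would look like this:

date 031912302015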


ACI – Network vs Application Centric Deployments

If you’ve talked to folks about deploying ACI, chances are that the conversation about a ‘network centric’ vs ‘application centric’ deployment has come up. What the hell is the difference? Isn’t it ALL application centric? I mean, it is in the name after all! I don’t think that there is any official definition, really just some widely agreed-upon basics. So let’s start by trying to define what a ‘network centric’ deployment looks like.

I think in the simplest terms a network centric deployment basically just means that we are taking ACI and treating it just like it is a traditional Nexus switch. We do all the same things we’ve been doing for years on, say, a Nexus 7k or a Nexus 5k; we just happen to do it on ACI instead. But what does that even mean? That still doesn’t really answer any questions… so here is my definition of a net-centric deployment (with a rough config sketch after the list):

  • Every VLAN (that you would be building if this was indeed a traditional Nexus deployment) = 1 Bridge Domain = 1 EPG
    • Oftentimes the Bridge Domains are configured with flooding enabled
  • VRFs are unenforced, and/or VzAny (permit any/any at the VRF level instead of EPG level) is configured
  • ACI may or may not be the default gateway — i.e. Bridge Domains may or may not have a subnet
    • This often depends on the migration strategy — sometimes an existing 7k houses all default gateways
    • If the fabric isn’t the default gateway, or VLANs (EPGs) need to be extended to other traditional devices, the Bridge Domain requires flooding (hence the point above)
  • ACI may integrate with L4-7 services, however this is done in the ‘traditional’ (see previous post) method — no service graphs
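To make that list a bit more concrete, here is a rough sketch of what the 1 VLAN = 1 BD = 1 EPG pattern looks like when pushed through the APIC REST API. The tenant/VRF names, credentials, and address are all placeholders, and the class and attribute names (fvBD, fvAEPg, unkMacUcastAct, arpFlood, and so on) are from memory — double-check them in Visore or the API reference before using anything like this:

#!/usr/bin/env python3
# Sketch of the net-centric pattern: one flooding-enabled BD and one EPG per
# legacy VLAN, no subnet on the BD (the gateway stays on the existing 7k).
# Object/attribute names are assumptions -- verify before use.
import requests

APIC = "https://apic1.example.com"      # placeholder
TENANT, VRF, APP = "Prod", "Prod-VRF", "Legacy-VLANs"
VLANS = [10, 20, 30]                    # the VLANs you'd have built anyway

session = requests.Session()
session.verify = False
login = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}
session.post(f"{APIC}/api/aaaLogin.json", json=login).raise_for_status()

for vlan in VLANS:
    payload = {"fvTenant": {"attributes": {"name": TENANT}, "children": [
        # One BD per VLAN: flood unknown unicast and ARP, leave routing off
        {"fvBD": {"attributes": {"name": f"VLAN{vlan}-BD",
                                 "unkMacUcastAct": "flood",
                                 "arpFlood": "yes",
                                 "unicastRoute": "no"},
                  "children": [{"fvRsCtx": {"attributes": {"tnFvCtxName": VRF}}}]}},
        # One EPG per VLAN, tied back to its matching BD
        {"fvAp": {"attributes": {"name": APP}, "children": [
            {"fvAEPg": {"attributes": {"name": f"VLAN{vlan}-EPG"},
                        "children": [{"fvRsBd": {"attributes": {"tnFvBDName": f"VLAN{vlan}-BD"}}}]}}]}},
    ]}}
    resp = session.post(f"{APIC}/api/mo/uni.json", json=payload)
    resp.raise_for_status()
    print(f"VLAN {vlan}: BD and EPG pushed ({resp.status_code})")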

Obviously none of these are hard and fast rules, but this is generally speaking what I would consider a network-centric model. So if that’s the case, great, but what does application-centric mean? Do we just build some contracts and call it app-centric? Maybe? I think this is where the definitions start to get a bit fuzzy, but I’ll try to outline what I’ve seen and what I think comprises an app-centric deployment:

  • A single (or very few) Bridge Domains are configured
    • In an app-centric model, we don’t really need to care about flooding domains and extending layer 2 outside of the fabric. The idea here is that we can set our Bridge Domain(s) to be optimized, and just lump everything into a single BD for simplicity’s sake.
  • Following point one — all EPGs map back to the one or few Bridge Domains
  • VRFs are set to enforced mode — this means the fabric is not going to permit any traffic between EPGs by default, so contracts must be created and applied appropriately (see the contract sketch after this list)
  • L4-7 services may be integrated with managed or unmanaged service graphs
    • (ADC specifically) OR, and I’ve seen this recently, in the ‘traditional’ method but with a single interface into each VRF in the fabric. Normally you’d have the ADC keep a leg in each EPG (a vPC to the F5, for example, with a bunch of dot1q tags representing each EPG) — in this method you stick the ADC into a single EPG (call it ADC or Load-Balancer or whatever), and each EPG has a contract to the ADC EPG, which does all of its work off of this single interface (front end for VIPs, inside for SNAT/SubnetIP, or even acting as the default gateway for servers)
  • This one makes people’s heads hurt… Everything lives in a single subnet!
    • We could have multiple subnets roll up to our one BD, but why bother? We’re optimizing flooding, we’re isolating EPGs with contracts, so why not just have a single subnet?
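To put a little meat on the contract point above, here is what a minimal filter/contract plus a provider/consumer binding might look like as a REST payload, following the same aaaLogin-then-POST-to-/api/mo/uni.json pattern as the earlier sketch. The tenant, EPG, and contract names are placeholders, and the class names (vzFilter, vzBrCP, vzSubj, fvRsProv/fvRsCons) are from memory, so verify them before using this for anything real:

# Sketch only -- plug this payload into the session.post(f"{APIC}/api/mo/uni.json", ...)
# call from the earlier example. Names and classes are assumptions.
payload = {"fvTenant": {"attributes": {"name": "Prod"}, "children": [
    # Filter: just HTTPS between the tiers
    {"vzFilter": {"attributes": {"name": "https"},
                  "children": [{"vzEntry": {"attributes": {
                      "name": "tcp-443", "etherT": "ip", "prot": "tcp",
                      "dFromPort": "443", "dToPort": "443"}}}]}},
    # Contract wrapping that filter in a single subject
    {"vzBrCP": {"attributes": {"name": "web-to-app"},
                "children": [{"vzSubj": {"attributes": {"name": "https-only"},
                    "children": [{"vzRsSubjFiltAtt": {"attributes": {"tnVzFilterName": "https"}}}]}}]}},
    # App EPG provides the contract, Web EPG consumes it
    {"fvAp": {"attributes": {"name": "Ecommerce"}, "children": [
        {"fvAEPg": {"attributes": {"name": "App"},
                    "children": [{"fvRsProv": {"attributes": {"tnVzBrCPName": "web-to-app"}}}]}},
        {"fvAEPg": {"attributes": {"name": "Web"},
                    "children": [{"fvRsCons": {"attributes": {"tnVzBrCPName": "web-to-app"}}}]}},
    ]}},
]}}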

Again, none of this is hard and fast. If I had to be really picky, I would say that if you are doing 1:1 EPG:BD in place of where you would traditionally build VLANs, that’s net-centric. I would like to say that if you don’t map 1:1, and instead have a single BD with multiple EPGs and allow ACI to optimize flooding, that’s app-centric — but I think that’s just not fair to the people doing app-centric… app-centric really is a lot more than that. App-centric means understanding your application flows and building contracts as appropriate. It means not relying on extending L2 outside of the fabric, because you’re not only building a network that eschews L2 in favor of L3, but more critically building apps that don’t require L2 (including the ability to migrate without needing to retain IP addresses — this obviously only applies across multiple data centers, since we have anycast gateways within ACI).

So in summary I suppose that it’s easy (relatively speaking, obviously building a data center is not a trivial thing) to deploy a net-centric fabric, but it is an entirely different beast to really build an app-centric fabric. That being said, IF you can get everything you need to build contracts appropriately, understand the flows, and eliminate the need to stretch L2 around, an app-centric fabric is seriously the coolest thing ever. All the complexity, once you get over the initial hurdle at least, starts to go away. One BD, one big fat subnet, no L2 extension to worry about, and security in theory gets simpler since the fabric makes policies re-usable and semi-self-documenting.

To be fair, most people are deploying ACI in a network-centric model as it is definitely the path of least resistance, but I hope to keep seeing more application-centric deployments, as I really think that is the best way to take advantage of ACI and to begin integrating it with other tools like CliQr and Service Now to get the most out of it.