Cisco ACI Bootcamp: Notes and Thoughts Pt.3

Here comes part 3 of my ridiculously long notes taken from the ACI boot camp with Cisco this week. You can find part 1 here, and part 2 here. There is a TL;DR in the first post, so I’ll just jump into the details again here.

Before going too much further I wanted to include a tiny section outlining some of the ACI hierarchy, as it will get a bit confusing going forward without a high-level understanding.

ACI Hierarchy

The above (not very beautiful) diagram outlines the rough hierarchy of ACI. At the highest level we have the ‘root’; as the name implies, this is just where everything lives. Within that we can define our tenants (more on this below). Within each tenant we have a ‘Private-L3’ domain, which is essentially a VRF. I think you could imagine it as a VRF within a VRF, because I *think* that the tenant level is also a VRF.

Within the VRFs are bridge-domains. These aren’t actually VxLANs, or VLANs, or what you would normally think of as a bridge domain. A bridge-domain here is more like a container for subnets: it defines an L2 boundary, but not in the way a VLAN does. You could run legacy L2 protocols such as AppleTalk or IPX within a bridge-domain. The core idea at this layer is that L2 broadcasts are allowed within the container. It was explained that there will almost always be only a single bridge-domain per tenant, but you could have multiple bridge-domains if desired. The use case for multiple bridge-domains would be supporting some legacy server-to-server heartbeat or other proprietary L2 protocol that you want separated from the rest of your network.

Finally, within the bridge-domains we have subnets: VLANs, VxLANs, NVGRE (in the future), etc. This is basically ‘normal’ networking stuff. I added EPGs to the drawing as they essentially group things across the subnets; more on that below.
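
To make that nesting a bit more concrete, here’s the same hierarchy written out as a little Python structure. The names (‘Tenant-A’, ‘Ctx-1’, etc.) are made up and this is not the real APIC object model; it’s just a way to visualize root > tenant > Private-L3 (VRF) > bridge-domain > subnet, with EPGs cutting across subnets.

```python
# Illustrative only: a rough model of the ACI nesting described above.
# All names are invented; the real APIC object model uses its own classes.
aci_hierarchy = {
    "root": {
        "tenants": {
            "Tenant-A": {
                "private_l3_domains": {          # essentially VRFs
                    "Ctx-1": {
                        "bridge_domains": {      # containers for L2 / subnets
                            "BD-1": {
                                "subnets": ["10.1.1.0/24", "10.1.2.0/24"],
                            },
                        },
                    },
                },
                # EPGs group endpoints *across* subnets rather than living inside one
                "epgs": ["Web", "App", "DB"],
            },
        },
    },
}

# Walk the structure just to show the nesting order: root -> tenant -> VRF -> BD -> subnet
for tenant, t in aci_hierarchy["root"]["tenants"].items():
    for vrf, v in t["private_l3_domains"].items():
        for bd, b in v["bridge_domains"].items():
            for subnet in b["subnets"]:
                print(f"{tenant} / {vrf} / {bd} / {subnet}")
```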

Policy Framework in ACI:

  • ACI is all about abstraction — again I’m no UCS guy, but it was said multiple times that there are a lot of similarities to service profiles in UCS
    • EPGs are a central piece of ACI; an EPG is a group of endpoints (End Point Group; makes sense, eh?)
      • EPGs can span physical and virtual, and different VMMs (virtual machine manager domains) i.e. you can have a physical server, an ESX host, and a Hyper-V host all in one EPG
      • EPGs can be defined as a subnet, VLAN, vPort (in a hypervisor switch), VxLAN, or NVGRE (these are what’s available at FCS)
        • In the future we will also be able to define hosts for an EPG by: DNS, DHCP Pool, VM Attributes, and probably others
      • All members of an EPG have unfettered communication with each other
        • In the future, we will have a private-VLAN type functionality within an EPG
  • Multi-Tenancy
    • Support for 64k(!) tenants
    • Each tenant can have their own (non fabric-admin) dashboard to manage their slice of the ACI fabric
    • ACI supports overlapping of address space across tenants
      • It wasn’t specifically called out, but I believe this is done with normal MPLS VPNs — MP-BGP is used internally to the fabric, so it would make sense
      • A more practical use case for a standard enterprise customer would be to have a Dev/Prod/QA tenant, all of which use the same addressing for testing purposes; once a service passes through Dev/QA, it could be ‘promoted’ into Prod without having to re-IP.
  • Application Network Profiles
    • A set of defined contracts between EPGs — basically a parent object to store policy for a set of EPGs
  • Contracts
    • All traffic is denied by default — you can also run the fabric in a mode where everything is permitted by default
    • Contracts are meant to be re-usable; the whole contract policy model seems similar to MQC in that we can define classes and then re-use classes across different policies etc.
    • ACI has a very ‘provider’ / ‘consumer’ focus when creating contracts and application policies. Something (an EPG) provides a service, something else (another EPG) consumes that service (there’s a rough sketch of this after this list)
      • Web servers would provide a contract that users would consume, the web tier would consume services from an app tier, the app tier would consume services from the DB tier — going the other way, the DB tier provides services for the app tier, the app tier provides services for the web tier, and the web tier provides services to users
      • CONSUMER = using a service PROVIDER = providing the service — pretty obvious stuff, but very important when building contracts and application profiles
    • Contracts can be bundled; you can create and import/export bundles of contracts; the thought here is that a ‘management’ bundle could be created once and re-used across other ACI fabrics
  • Subjects
    • Subjects are contained within a contract; it’s basically what you are talking about (I believe this means EPGs)
    • Within subjects you ‘do’ things to the subject; filter, action (mark, drop, redirect, etc.), or label
    • There will be more ‘things’ available to do within a subject as ACI develops
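
To tie the provider/consumer idea together, here’s a rough sketch of what a simple contract between two EPGs might look like as an APIC-style XML payload. The class names (fvTenant, vzFilter, vzBrCP, vzSubj, fvAEPg, etc.) are my best recollection of the object model, and all of the tenant/EPG/filter names are invented, so treat this as illustrative rather than a tested config.

```python
# Sketch only: a contract where the Web EPG PROVIDES an http service and a
# Clients EPG CONSUMES it. Class names are my best recollection of the APIC
# object model and may not be exact; all object names are made up.
CONTRACT_XML = """
<fvTenant name="Tenant-A">
  <vzFilter name="http">
    <vzEntry name="tcp-80" etherT="ip" prot="tcp" dFromPort="80" dToPort="80"/>
  </vzFilter>
  <vzBrCP name="web-contract">
    <vzSubj name="http-subj">
      <vzRsSubjFiltAtt tnVzFilterName="http"/>
    </vzSubj>
  </vzBrCP>
  <fvAp name="my-app">
    <fvAEPg name="Web">
      <fvRsProv tnVzBrCPName="web-contract"/>   <!-- provider side -->
    </fvAEPg>
    <fvAEPg name="Clients">
      <fvRsCons tnVzBrCPName="web-contract"/>   <!-- consumer side -->
    </fvAEPg>
  </fvAp>
</fvTenant>
"""
# This would get POSTed to the APIC's REST API (see the API notes in Pt.2 below).
```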

Traffic Flow and Fabric Load Balancing:

  • Internal to the ACI fabric all traffic is ISIS+MP-BGP+VxLAN
    • My best guess is that ISIS and some sort of IP-unnumbered hack provides basic reachability for the VTEPs, and the MP-BGP is just there for multi-tenancy/VRFs
    • VxLAN in the fabric uses the reserved bits to add some proprietary magic; in theory even with this proprietary addition ACI VxLAN should be able to interoperate with other VTEPs (although not sure of a use case for that)
  • Policy is applied to traffic flows on ingress to the fabric IF the leaf node has already received the policy for a particular flow from the APIC
    • If the ingress leaf does NOT have the policy to enforce on the flow, the traffic is sent to the egress leaf where policy is applied — there is a single bit in the reserved field of the VxLAN header that is used to identify if traffic has had policy enforced on it.
    • It seems that this scenario where policy is enforced on the egress will only happen in an initial learning state — after some flows have hit the leaf nodes all the leafs will get the policies pushed to them and be able to apply policy on ingress.
    • All of this policy logic is flow based, much like a stateful firewall: the first packet in a flow is sort of ‘process switched’ and packets thereafter are ‘CEF’ switched (not real CEF, but that’s the idea anyway). I’ve sketched the ingress/egress decision logic out after this list.
  • ACI supports vPC across leaf node pairs
    • No peer-keepalive or peer-link is required! All the magic to make vPC work happens in the fabric
  • ACI can be thought of almost as a single large bridge – like FabricPath sort of. Because of the internal fabric magic, every leaf is a default gateway for every subnet. This holds true it seems when connecting ACI to ‘external networks’ as well.
  • Although ACI requires a leaf/spine topology, it doesn’t rely solely on good old fashioned ECMP for load balancing across the fabric.
    •  APIC maintains atomic counters for all flows to measure the exact time it takes to cross given links in the fabric; because APIC has this granular knowledge of time to cross each link, it can dynamically load balance across all of the available links.
    • Dynamic Flow Prioritization is also used to ensure that elephant flows don’t scare/crush the mice flows.
      • The gist of this is that the first few packets of EVERY flow get put into a priority queue — this allows those mice flows to ‘sneak by’ the elephant flows — and after those first few packets the elephant flows get relegated to the ‘normal’ queue
    • Flowlets
      • I need to read more on this, but they sound very interesting. The gist seems to be that since the fabric has these atomic counters and can so precisely know how long each path through the fabric takes to traverse, we can chop up a TCP flow and send different sections of the flow on different paths.
      • Normally this wouldn’t happen because we don’t want to have out-of-order delivery of the packets (TCP can of course handle that, but that’s extra work), but because we know the exact time it takes to send over each link, the fabric can send packets over different links and ensure in order delivery
      • Instructor said to read this: LINK to learn more
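
Here’s roughly how I understand the ingress vs. egress policy enforcement decision described above, written out as pseudologic in Python. The ‘policy applied’ flag lives somewhere in the proprietary reserved bits of the fabric VxLAN header; the cache and flag names below are my own inventions for illustration, not the real implementation.

```python
# Pseudologic only: how a leaf might decide whether to enforce policy on ingress
# or defer to the egress leaf. Everything here (names, structures) is invented
# for illustration; the real logic lives in the fabric hardware/firmware.

local_policy_cache = {}   # (src_epg, dst_epg) -> allowed?  (pushed down from the APIC)

def handle_ingress(src_epg, dst_epg, policy_applied_bit):
    key = (src_epg, dst_epg)
    if key in local_policy_cache:
        # Ingress enforcement: we already have the contract result from the APIC
        if not local_policy_cache[key]:
            return "drop"
        return "forward, set policy_applied_bit=1"
    if policy_applied_bit:
        # Policy was already enforced upstream for this flow
        return "forward"
    # No local policy yet: let the egress leaf enforce it
    return "forward, policy_applied_bit stays 0 (egress leaf enforces)"

# Example: first flow between two EPGs before this leaf has the policy cached
print(handle_ingress("Web", "App", policy_applied_bit=0))
```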

Service Insertion:

  • Service insertion is configured using contracts in Application Profiles — here we can redirect traffic to a firewall or load balancer etc.
  • Vendors (and Cisco for now) can write Device Packages; I liken this to a PDLM for NBAR, sort of. Device Packages contain information about the device (an F5 LTM, for example) that helps APIC understand what it is and what it can do (there’s a rough skeleton of the script piece after this list)
    • Device Packages contain an XML file that has generic information about the device, and a Python script
      • The Python script is a generic script that contains functions that the APIC can call — these functions are things like ‘add endpoint’ ‘remove endpoint’ ‘check health’ etc.
      • The vendor of the device is left to fill in the blanks in the script — all that APIC cares about is that it can call these functions and get information or cause an action to the node
  • At FCS we can do service insertion ‘linearly’; but in the future, we will be able to SPLIT packets off (copy them) to hit multiple services simultaneously. I can see this being used to split traffic and send some of it to a TAP while having the ‘main’ flow unaffected and without having to physically go through the extra hops required to get to the TAP.
  • Basic round robin load balancing is being built into the ACI fabric natively, but is not available yet; here’s looking at you, NSX
  • Service Graphs:
    • Define how a flow goes through service insertion type devices
    • Service Graphs are referenced in Contracts (which make up Application Profiles) and are assigned to Subjects within the Contracts (again, all very MQC feeling to me)
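
Based on how the Device Package script was described, here’s a skeleton of the sort of thing a vendor would fill in. The function names and signatures (check_health, add_endpoint, remove_endpoint) come from how it was explained in class and are almost certainly not the exact device package API, so take this as a sketch only.

```python
# Skeleton only: the kind of Python script a Device Package was described as
# containing. The APIC calls these functions; the vendor fills in the bodies.
# Names/signatures here are illustrative, NOT the real device package API.

def check_health(device):
    """Called by the APIC to verify the service device (e.g. an F5 LTM) is alive."""
    # vendor fills in: poll the device's own API and return a health score
    return {"state": "up", "score": 100}

def add_endpoint(device, endpoint):
    """Called when an endpoint joins an EPG that is stitched through this device."""
    # vendor fills in: e.g. add the endpoint to a load-balancer pool
    print(f"adding {endpoint} to {device}")

def remove_endpoint(device, endpoint):
    """Called when an endpoint leaves the EPG."""
    # vendor fills in: e.g. remove the endpoint from the pool
    print(f"removing {endpoint} from {device}")
```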

 

In the next (and final!) post in this section I’ll wrap up with ACI to ‘the outside world’ connectivity, and some miscellaneous notes/thoughts.

Cisco ACI Bootcamp: Notes and Thoughts Pt.2

Here we go with part 2 of my ridiculously long notes taken from the ACI boot camp with Cisco this week. You can find part 1 here. There is a TL;DR in the first post, so I’ll just jump into the details again here.

APIC Controller:

  • APICs must be deployed in an N+2 flavor. This has to do with the ‘shard’ (not shart… thankfully) data structure and how data is distributed across the controllers. It sounds like if you deploy three APICs and lose one, nothing will happen. If you lose a second controller simultaneously you may lose data. However, if the second failure doesn’t happen right away, it sounds like the controllers will have time to synchronize again, and in theory you could then lose a second controller without impact.
    • You can deploy up to 32 APICs… that’s crazy town. Even the instructor said there is almost never a requirement for more than three controllers. I suppose you can gain even more fault tolerance with more controllers, though, if you are super paranoid like that!
  • APICs connect directly to ACI Leaf nodes, and only to ACI leaf nodes.
  • On boot up the APIC verifies the ACI topology to make sure it’s a leaf/spine topology and that everything is looking good to go; it does this with LLDP ACI TLVs
    • The APIC will disable links that are out of policy — i.e. links that would not be allowed in a leaf/spine topology
  • APIC can be managed in-band or out-of-band; I’m assuming that initial management must be out-of-band or CLI, but the lab had a pre-setup jump box to get to the APIC, so I don’t know for sure.
  • Very importantly, the APIC is NOT in the data path and it is NOT the control plane.
    • The APIC is very much like a VSM in a 1000v deployment. It’s only there to define policy and then push that policy out to the Leafs/Spines (or VEMs in the 1000v example)
    • If the APIC(s) go offline everything keeps going as defined; you can’t define policy with the APICs offline though. Again, very 1000v VSM-esque
  • Management access to the APIC does normal RBAC-type stuff (AD/ACS/RADIUS)
    • Perhaps importantly you can also do full-blown RBAC on a per tenant basis
  • Observer:
    • This is a process that runs all the time to monitor the ‘health’ of the fabric
    • There is essentially a ‘health score’ that looks at things like link state, drops, latency, health score of dependent objects, remaining bandwidth, etc.; this is all monitored by ‘Observer’
    • In the future it sounds like there are plans to do this with weighted metrics per application; i.e. track jitter for voice calls and weight that higher than some other metric
  • API Magic Northbound
    • JSON and XML duh
    • Sounds like there is already a Python SDK to poke the REST API on the APICs (there’s a bare-bones example of hitting the API directly after this list)
      • It was stressed that literally EVERYTHING you can do in the GUI you can do in the CLI or via the API
    • DevOps Libraries; Chef and Puppet libraries are coming soon
  • API Magic Southbound
    • Not entirely sure how this part works, but APIC can obviously control ACI enabled devices
      • Right now this seems to mostly be Cisco stuff of course
      • Open vSwitch is coming (if not available now, don’t know for sure)
      • Hyper-V support is coming; there are also some serious hooks with Azure (the on-premises Azure Pack, not the public cloud service)
    • Vendors will be able to help develop more ‘L4-L7 Scripting APIs’ for APIC to control things like F5, Citrix, Sourcefire gear
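
For reference, here’s a bare-bones example of poking the northbound REST API with plain Python (no SDK). The aaaLogin flow and the class-query URL format are how I understand the APIC REST API to work; the address and credentials are obviously placeholders.

```python
# Minimal sketch of talking to the APIC's northbound REST API with plain Python.
# The login endpoint and class-query URL are my understanding of the APIC API;
# the IP address and credentials below are placeholders.
import requests

APIC = "https://10.0.0.1"   # placeholder APIC address

session = requests.Session()
login = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}
session.post(f"{APIC}/api/aaaLogin.json", json=login, verify=False)

# Query all tenants (class fvTenant) and print their names
resp = session.get(f"{APIC}/api/class/fvTenant.json", verify=False)
for obj in resp.json().get("imdata", []):
    print(obj["fvTenant"]["attributes"]["name"])
```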

Integrating with Hypervisors/Servers:

  • vCenter/ESX integration is shipping
  • Hyper-V is in production testing, and will be coming soon (6-9 months it sounds like)
  • KVM with Openstack integration is coming as well (perhaps similar timeline to Hyper-V?)
  • APIC ties into the hypervisor manager — again very much like VSM — vCenter, SCVMM, Redhat, others?
  • APIC can create port-groups in vCenter (VSM!!)
    • APIC does NOT need any particular hypervisor switch to do this; it will work with DvS, default Hyper-V switch, Open vSwitch, etc.
    • AVS (Application Virtual Switch) will enable more functionality, but is NOT required
    • It does this with API calls to vCenter; it creates its own “ACI” DvS (there’s a rough example of the EPG-to-VMM-domain binding that drives this after this list)
    • For existing VMs you can just assign port-groups to them as normal
  • APIC works with physical servers too; you still configure server-facing ports in basically the same way (access/trunk/VLANs/etc.), and the configuration can be done through the GUI quite easily. There is basically no magic here, it’s just a port
  • Different flavors of hypervisors (i.e. vCenter, SCVMM, etc.) are grouped into VMM Domains (virtual machine manager domains)
  • EPGs can stretch across VMM domains – we haven’t talked about EPGs yet, but the point is that even though the hypervisors are logically separated into VMM domains, we can still do any and all policy stuff between them and group a VM in vCenter and a VM in Hyper-V into a single policy element.
  • As stated before, right now we do NOT have a way to prevent hosts in the same subnet on the same hypervisor switch from communicating — this will come with vShield integration and/or the AVS (seriously Cisco… VSG does this today…)
  • APIC is intended to be managed by other automation tools like vCAC in the future (guess it could be done now since the APIs are open)
    • Goal being to be very service provider-y where there is a catalog and customers just ‘click click next finish’ to deploy services/applications
  • Hyper-V
    • Azure can be used as the automation tool described above – it’s limited at the moment, but can already deploy services and the like
    • Initial Hyper-V support will be only for VLANs — no NVGRE just yet, but it will come
    • APIC integration REQUIRES the Azure pack
    • You can actually manage some ACI things in Azure
      • This is mostly basic stuff at the moment but it looks cool
      • You do NOT have to use Azure to do things though — particularly important if you want single pane of glass but have vCenter and Hyper-V — you can still do all the ACI stuff via the APIC
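
To show what drives the port-group creation described above, here’s a rough example of the payload that binds an EPG to a VMM (vCenter) domain; that binding is what makes the APIC push the corresponding port-group onto its ‘ACI’ DvS. Class names (fvAEPg, fvRsDomAtt) are my best recollection of the object model, and the tenant/domain names are made up.

```python
# Sketch only: associating an EPG with a VMM domain. Class names are my best
# recollection and may not be exact; all object names are invented.
EPG_VMM_XML = """
<fvTenant name="Tenant-A">
  <fvAp name="my-app">
    <fvAEPg name="Web">
      <fvRsDomAtt tDn="uni/vmmp-VMware/dom-MyVMMDomain"/>
    </fvAEPg>
  </fvAp>
</fvTenant>
"""
# POSTed to the APIC REST API the same way as the earlier login/query example.
```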

 

That’s all I’ve got time to clean up this morning. Still to come:

  • Policy Framework in ACI
  • Traffic Flow and Fabric Load Balancing
  • Service Insertion
  • ACI Connectivity to the Outside World
  • Misc. Notes