Cisco ACI Bootcamp: Notes and Thoughts Pt.3

Here comes part 3 of my ridiculously long notes taken from the ACI boot camp with Cisco this week. You can find part 1 here, and part 2 here. There is a TL;DR in the first post, so I’ll just jump into the details again here.

Before going too much further, I wanted a tiny section outlining some of the ACI hierarchy, as it will get a bit confusing going forward without a high-level understanding.

ACI Hierarchy

The above (not very beautiful) diagram outlines the rough hierarchy of ACI. At the highest level we have the ‘root’, which is just where everything lives, like it sounds. Within that, we can define our tenants; more on this below. Within each tenant we have a ‘Private-L3’ domain, which is essentially a VRF. I think you could imagine it as a VRF within a VRF, because I *think* the tenant level is also a VRF.

Within the VRFs are bridge-domains. These aren’t actually VxLANs, or VLANs, or what you would normally define as a bridge domain. The bridge-domain is like a container for subnets: it’s used to define an L2 boundary, but not like a VLAN; it’s more a container for L2 domains. You could run legacy L2 services such as AppleTalk or IPX within a bridge-domain. Basically, the core of this layer is that it allows L2 broadcasts within the container. It was explained that there will almost always be only a single bridge-domain per tenant, but you could have multiple bridge domains if desired. The use case for having multiple bridge domains would be to support some legacy server-to-server heartbeat, or something that runs its own proprietary L2 protocol, that you want separated from the rest of your network.

Finally, within the bridge domains we have subnets — VLANs, VxLANs, NVGRE (in the future), etc. This is basically ‘normal’ networking stuff. I added EPGs to the drawing, as they essentially group things across the subnets, more on that below.
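
As a rough illustration of the hierarchy described above, here is a minimal Python sketch that models it as nested containers. The class and field names (`Root`, `Tenant`, `PrivateL3`, `BridgeDomain`), the tenant names, and the subnets are all made up for illustration; this is not the APIC object model or API, just a way to picture what lives inside what.

```python
from dataclasses import dataclass, field
from ipaddress import ip_network
from typing import Dict, List


@dataclass
class BridgeDomain:
    """Container for subnets; defines the L2 boundary."""
    name: str
    subnets: List = field(default_factory=list)


@dataclass
class PrivateL3:
    """The 'Private-L3' domain, essentially a VRF."""
    name: str
    bridge_domains: Dict[str, BridgeDomain] = field(default_factory=dict)


@dataclass
class Tenant:
    name: str
    private_l3s: Dict[str, PrivateL3] = field(default_factory=dict)


@dataclass
class Root:
    tenants: Dict[str, Tenant] = field(default_factory=dict)


def build_example() -> Root:
    """Two hypothetical tenants that deliberately reuse the same subnet."""
    root = Root()
    for tenant_name in ("dev", "prod"):
        bd = BridgeDomain("bd1", [ip_network("10.0.1.0/24")])
        vrf = PrivateL3("vrf1", {"bd1": bd})
        root.tenants[tenant_name] = Tenant(tenant_name, {"vrf1": vrf})
    return root


def subnet_paths(root: Root) -> List[str]:
    """Walk root -> tenant -> Private-L3 -> bridge-domain -> subnet."""
    paths = []
    for tenant in root.tenants.values():
        for vrf in tenant.private_l3s.values():
            for bd in vrf.bridge_domains.values():
                for subnet in bd.subnets:
                    paths.append(f"{tenant.name}/{vrf.name}/{bd.name}/{subnet}")
    return paths


for path in subnet_paths(build_example()):
    print(path)
```

Because every subnet is reached through its tenant, the same prefix can appear under both `dev` and `prod` without the two paths colliding.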

Policy Framework in ACI:

  • ACI is all about abstraction — again I’m no UCS guy, but it was said multiple times that there are a lot of similarities to service profiles in UCS
    • EPGs are a central piece of ACI; an EPG is a group of endpoints (End Point Group; makes sense, eh?)
      • EPGs can span physical and virtual, and different VMMs (virtual machine manager domains), i.e. you can have a physical server, an ESX host, and a Hyper-V host all in one EPG
      • EPGs can be defined as a subnet, VLAN, vPort (in a hypervisor switch), VxLAN, or NVGRE (these are available at FCS)
        • In the future we will also be able to define hosts for an EPG by: DNS, DHCP Pool, VM Attributes, and probably others
      • All members of an EPG have unfettered communication with each other
        • In the future, we will have a private-VLAN type functionality within an EPG
  • Multi-Tenancy
    • Support for 64k(!) tenants
    • Each tenant can have their own (non fabric-admin) dashboard to manage their slice of the ACI fabric
    • ACI supports overlapping of address space across tenants
      • It wasn’t specifically called out, but I believe this is done with normal MPLS VPNs — MP-BGP is used internally to the fabric, so it would make sense
      • A more practical use case for a standard enterprise customer would be to have a Dev/Prod/QA tenant, all of which use the same addressing for testing purposes; once a service passes through Dev/QA, it could be ‘promoted’ into Prod without having to re-IP.
  • Application Network Profiles
    • A set of defined contracts between EPGs — basically a parent object to store policy for a set of EPGs
  • Contracts
    • All traffic is denied by default — you can also run the fabric in a mode where everything is permitted by default
    • Contracts are meant to be re-usable; the whole contract policy model seems similar to MQC in that we can define classes and then re-use classes across different policies etc.
    • ACI has a very ‘provider’ / ‘consumer’ focus when creating contracts and application policies. Something (an EPG) provides a service, something else (another EPG) consumes that service
      • Web servers would provide a contract that users would consume, the web tier would consume services from an app tier, the app tier would consume services from the DB tier — going the other way, the DB tier provides services for the app tier, the app tier provides services for the web tier, and the web tier provides services to users
      • CONSUMER = using a service PROVIDER = providing the service — pretty obvious stuff, but very important when building contracts and application profiles
    • Contracts can be bundled; you can create and import/export bundles of contracts; the thought here is that a ‘management’ bundle could be created once and re-used across other ACI fabrics
  • Subjects
    • Subjects are contained within a contract; it’s basically what you are talking about (I believe this means EPGs)
    • Within subjects you ‘do’ things to the subject; filter, action (mark, drop, redirect, etc.), or label
    • There will be more ‘things’ available to do within a subject as ACI develops
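
To make the provider/consumer idea from the list above a bit more concrete, here is a small Python sketch of a default-deny check between EPGs. Everything in it is an assumption for illustration: the `Filter` and `Contract` classes, the `is_permitted` helper, the EPG names, and the ports are invented, and it only models the consumer-to-provider direction. It is not how APIC stores or evaluates policy.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Filter:
    protocol: str
    port: int


@dataclass
class Contract:
    name: str
    filters: List[Filter]
    providers: Set[str] = field(default_factory=set)  # EPGs offering the service
    consumers: Set[str] = field(default_factory=set)  # EPGs using the service


def is_permitted(contracts, src_epg, dst_epg, protocol, port):
    """Default deny: allow only intra-EPG traffic or a matching contract."""
    if src_epg == dst_epg:
        return True  # members of one EPG talk to each other freely
    wanted = Filter(protocol, port)
    for contract in contracts:
        if (src_epg in contract.consumers
                and dst_epg in contract.providers
                and wanted in contract.filters):
            return True
    return False


# Hypothetical three-tier application: users -> web -> app -> db
contracts = [
    Contract("web", [Filter("tcp", 80)], providers={"web"}, consumers={"users"}),
    Contract("app", [Filter("tcp", 8080)], providers={"app"}, consumers={"web"}),
    Contract("db", [Filter("tcp", 1433)], providers={"db"}, consumers={"app"}),
]

print(is_permitted(contracts, "users", "web", "tcp", 80))   # users consume web
print(is_permitted(contracts, "users", "db", "tcp", 1433))  # no contract exists
```

Re-using a contract in this sketch is just a matter of adding another EPG name to its `consumers` or `providers` set.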

Traffic Flow and Fabric Load Balancing:

  • Internal to the ACI fabric all traffic is ISIS+MP-BGP+VxLAN
    • My best guess is that ISIS and some sort of IP-unnumbered hack provides basic reachability for the VTEPs, and the MP-BGP is just there for multi-tenancy/VRFs
    • VxLAN in the fabric uses the reserved bits to add some proprietary magic; in theory even with this proprietary addition ACI VxLAN should be able to interoperate with other VTEPs (although not sure of a use case for that)
  • Policy is applied to traffic flows on ingress to the fabric IF the leaf node has already received the policy for a particular flow from the APIC
    • If the ingress leaf does NOT have the policy to enforce on the flow, the traffic is sent to the egress leaf where policy is applied — there is a single bit in the reserved field of the VxLAN header that is used to identify if traffic has had policy enforced on it.
    • It seems that this scenario where policy is enforced on the egress will only happen in an initial learning state — after some flows have hit the leaf nodes all the leafs will get the policies pushed to them and be able to apply policy on ingress.
    • All of this policy logic is flow based — much like a stateful firewall — the first packet in a flow is sort of ‘process switched’ and packets thereafter are ‘CEF’ switched (not real CEF but that’s the idea anyway)
  • ACI supports vPC from leaf node pairs
    • No peer-keepalive or peer-link is required! All the magic to make vPC work happens in the fabric
  • ACI can be thought of almost as a single large bridge – like FabricPath sort of. Because of the internal fabric magic, every leaf is a default gateway for every subnet. This holds true it seems when connecting ACI to ‘external networks’ as well.
  • Although ACI requires a leaf/spine topology, it doesn’t rely solely on good old-fashioned ECMP for load balancing across the fabric.
    • APIC maintains atomic counters for all flows to measure the exact time it takes to cross given links in the fabric; because APIC has this granular knowledge of time to cross each link, it can dynamically load balance across all of the available links.
    • Dynamic Flow Prioritization is also used to ensure that elephant flows don’t scare/crush the mice flows.
      • The gist of this is that the first few packets of EVERY flow get put into a priority queue — this allows those mice flows to ‘sneak by’ the elephant flows — and after those first few packets the elephant flows get relegated to the ‘normal’ queue
    • Flowlets
      • I need to read more on this, but they sound very interesting. The gist seems to be that since the fabric has these atomic counters and can so precisely know how long each path through the fabric takes to traverse, we can chop up a TCP flow and send different sections of the flow on different paths.
      • Normally this wouldn’t happen because we don’t want to have out-of-order delivery of the packets (TCP can of course handle that, but that’s extra work), but because we know the exact time it takes to send over each link, the fabric can send packets over different links and ensure in-order delivery
      • Instructor said to read this: LINK to learn more
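
The flowlet idea in the list above can be sketched in a few lines of Python. This follows the general flowlet-switching technique as I understand it (start a new flowlet whenever a flow goes idle for longer than some threshold) and is not confirmed to be how the ACI fabric does it. The function names, timings, thresholds, and path delays are all invented for illustration.

```python
def split_into_flowlets(arrival_times, gap_threshold):
    """Group packet indexes of ONE flow into flowlets.

    A new flowlet starts whenever the idle gap since the previous
    packet is larger than gap_threshold.
    """
    flowlets = []
    for i, t in enumerate(arrival_times):
        if i == 0 or t - arrival_times[i - 1] > gap_threshold:
            flowlets.append([i])
        else:
            flowlets[-1].append(i)
    return flowlets


def delivery_order(arrival_times, flowlets, path_for_flowlet, path_delays):
    """Return packet indexes in the order they reach the far side."""
    deliveries = []
    for flowlet, path in zip(flowlets, path_for_flowlet):
        for i in flowlet:
            deliveries.append((arrival_times[i] + path_delays[path], i))
    return [i for _, i in sorted(deliveries)]


# Hypothetical numbers: two paths whose delays differ by 15 time units.
path_delays = [10, 25]

# Gap threshold (20) larger than the delay difference: order is preserved
# even though consecutive flowlets take different paths.
times = [0, 1, 2, 50, 51, 100]
flowlets = split_into_flowlets(times, gap_threshold=20)
print(flowlets)
print(delivery_order(times, flowlets, [1, 0, 1], path_delays))

# Gap threshold (5) smaller than the delay difference: the second flowlet
# overtakes the first and packets arrive out of order.
times = [0, 1, 2, 10, 11]
flowlets = split_into_flowlets(times, gap_threshold=5)
print(delivery_order(times, flowlets, [1, 0], path_delays))
```

The point of the second case is that knowing the per-path delay matters: the idle gap has to be larger than the delay difference between the paths, otherwise a later flowlet on a faster path can pass an earlier one.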

Service Insertion:

  • Service insertion is configured using contracts in Application Profiles — here we can redirect traffic to a firewall or load balancer etc.
  • Vendors (and Cisco for now) can write Device Packages. I liken this to a PDLM for NBAR, sort of; Device Packages contain information about the device (an F5 LTM for example) that helps APIC understand what it is and what it can do
    • Device Packages contain an XML file that has generic information about the device, and a Python script
      • The Python script is a generic script that contains functions that the APIC can call — these functions are things like ‘add endpoint’ ‘remove endpoint’ ‘check health’ etc.
      • The vendor of the device is left to fill in the blanks in the script — all that APIC cares about is that it can call these functions and get information or cause an action to the node
  • At FCS we can do service insertion ‘linearly’; but in the future, we will be able to SPLIT packets off (copy them) to hit multiple services simultaneously. I can see this being used to split traffic and send some of it to a TAP while having the ‘main’ flow unaffected and without having to physically go through the extra hops required to get to the TAP.
  • Basic round-robin load balancing is being built into the ACI fabric natively, but is not available yet; here’s looking at you, NSX
  • Service Graphs:
    • Define how a flow goes through service insertion type devices
    • Service Graphs are referenced in Contracts (which make up Application Profiles) and are assigned to Subjects within the Contracts (again, all very MQC feeling to me)
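
The ‘fill in the blanks’ script idea from the Device Package bullets above might look something like the following Python sketch. To be clear, the class, the function names and signatures, and the return values are all hypothetical; the real Device Package script interface wasn't shown in detail, so this only illustrates the pattern of a controller calling a fixed set of vendor-implemented functions by name.

```python
class ExampleDeviceScript:
    """Hypothetical vendor-side script for some service device."""

    def __init__(self):
        self.endpoints = set()

    def add_endpoint(self, address):
        # A real vendor script would push config to the device here.
        self.endpoints.add(address)
        return True

    def remove_endpoint(self, address):
        self.endpoints.discard(address)
        return True

    def check_health(self):
        return {"status": "ok", "endpoints": len(self.endpoints)}


def controller_call(device, function_name, *args):
    """Stand-in for the controller: look a function up by name and call it."""
    func = getattr(device, function_name, None)
    if not callable(func):
        raise NotImplementedError(f"device script lacks {function_name!r}")
    return func(*args)


device = ExampleDeviceScript()
controller_call(device, "add_endpoint", "10.0.1.10")
print(controller_call(device, "check_health"))
```

The controller side never needs to know how the vendor implements each function, only that the agreed names exist and return something it can interpret.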


In the next (and final!) post in this section I’ll wrap up with ACI to ‘the outside world’ connectivity, and some miscellaneous notes/thoughts.

5 thoughts on “Cisco ACI Bootcamp: Notes and Thoughts Pt.3”

  1. Pingback: Cisco ACI Bootcamp: Notes and Thoughts Pt.4 | Come Route With Me!

  2. The bridge domain being wider than the EPGs, would this mean that broadcasts will be flooded between EPGs?

    Seems odd if you declare two EPGs that shouldn’t have direct access to each other.

    • Here’s my understanding (I am subject to being wrong!):

      The bridge-domain is all about weird L2 stuff — you could use it for some proprietary L2 keep alive or something — because it spans a tenant, you can pipe this L2 traffic across the whole ACI environment.

      As you point out, that gets a little weird from an EPG perspective. I believe that once you allocate hosts/subnets/whatever to an EPG, only hosts within that EPG will ‘hear’ each other’s L2 type stuff. So the bridge-domain is ‘wider’ than the EPG, but once you drill into an EPG it won’t allow non-EPG-member L2 junk to get to its EPG members. Does that make sense?

      Somebody please correct me if I am wrong 🙂

      • Carl,

        Flooded traffic (broadcast & multicast) will be sent to all (or a subset of, in the case of multicast) endpoints in a bridge domain, regardless of EPG membership. Unicasted extra-EPG L2 junk, as you well put it, will not be received by the endpoints.

        About having one bridge domain per tenant, that sounds natural in the context of ACI and it’s probably where things will eventually evolve: one (or a few) bridge domains, with lots of EPGs in it (them). Presently, however, I am seeing early adopters preferring to have one bridge domain for each EPG, and thus potentially lots of bridge domains per tenant. The ACI term for this is “legacy mode” and it essentially emulates a traditional VLAN – your endpoints who have direct reachability are also in the same broadcast domain. This is a network-centric approach, where the network is still the one dictating application flows, because this is what engineers are used to. And Cisco supports this as a “transition phase” if you will. In this model, the EPG is essentially a VLAN.

        However, in the application-centric approach, the EPG behaves more like a private VLAN, and is more clearly distinguishable as a port group than as an entire VLAN. More emphasis is placed on contracts and filters for policy and security.

  3. Awesome info, thanks for the clarification! Seems like it would be a bit of a pain to have a bunch of bridge domains, but perhaps the benefit of having it feel more traditional helps people wrap their head around things a bit more.
