The Mess that is Micro-segmentation

In the last few years, with the advent of “micro-segmentation,” there has been a recurring challenge in data center design and deployment… I think I can summarize that challenge pretty concisely: “WHAT THE HELL DO MY APPS ACTUALLY DO!?” Okay, okay, so maybe we know *what* they do, but who the hell knows how they work… seriously… if you know who knows please let me know.

This was never really an issue for networking folk until the last little bit and all this fuss about microseg, so we never really figured out how to address it. We’ve generally asked the server guys what VLAN they want the ports to be in, and then sometimes we have to ask about ports for public-facing things (stuff in the DMZ, perhaps), but that has been more or less the totality of our involvement with the applications on our networks. Now you can obviously look at that and say that’s maybe not the best thing… maybe we should have had more involvement and a more thorough understanding of what rides atop our (hopefully) super awesome networks. Yeah… probably, but let’s be honest, it’s really hard.

So while we never cared enough (or insert excuse/reason here) to make the effort to understand our applications before, we are now being forced to. Why? Well, it’s probably for the best to start, but the obvious trend toward micro-segmentation is the big driver here. I won’t deign to try to define “micro-segmentation” as I think it’s yet another buzzy buzz word, but regardless, the reality is that in a modern DC design you really can’t (and arguably shouldn’t) avoid it. So how do we go about understanding what our application flows look like when 90%+ of companies really don’t understand their own apps?

The way I see it, you really only have a handful of options, and none of them are terribly great. You could be cheap and rely on packet captures off of SPAN ports or via ERSPAN (yay Wireshark!). I actually kinda love this (also hate it though…) and I am working on a little side project w/ Python+Tshark+ACI to do some cool stuff (stay tuned for that), so this is a *viable*, but not great, option. The limitations are of course SPAN/ERSPAN support on devices, and the fact that you still have to SPAN the traffic to *something*, which is likely an old pizza box you found in the corner of the data center and shook the dust out of, or a VM. The dusty pizza box and the venerable VM both have the same limitation: the capacity to consume data. You’ll never be able to SPAN a busy EPG in ACI to a single 10Gb-attached box… that one EPG could be strewn across your data center with tens or hundreds of devices attached to it. In a non-Clos topology it *may* be a little easier, as you can SPAN at “choke points,” but the fact remains that you are throughput-limited and can’t scale SPAN/ERSPAN.
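
To give you a taste of what that Python+Tshark angle can look like, here’s a rough sketch in that spirit (to be clear: this is a toy, not the actual side project; it leans on the pyshark library, which wraps tshark, and the capture filename is made up):

```python
# Toy sketch: boil a SPAN/ERSPAN capture down to a flow summary.
# Assumes pyshark (a tshark wrapper) is installed; the pcap name is made up.
from collections import Counter

import pyshark

flows = Counter()
cap = pyshark.FileCapture('span-capture.pcap')
for pkt in cap:
    proto = pkt.transport_layer  # 'TCP', 'UDP', or None
    if proto is None:
        continue
    try:
        flows[(pkt.ip.src, pkt.ip.dst, proto, pkt[proto].dstport)] += 1
    except AttributeError:
        continue  # skip non-IP frames
cap.close()

# The top 25 flows is usually enough to start asking the app folks pointed questions.
for (src, dst, proto, dport), count in flows.most_common(25):
    print('{0} -> {1} {2}/{3}: {4} packets'.format(src, dst, proto, dport, count))
```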

The other primary option in my mind is the use of TAPs and specialized tooling. This option is REALLY good in a traditional environment (again, choke points are important), but only *if* you want to pony up! I don’t really have anything against this particular option other than the cost. That being said, any modern data center will likely be a Clos topology, which makes taking advantage of TAPs rather difficult (as in, you would miss all traffic local to a leaf due to anycast gateway functionality).

So, all that being said, how are we supposed to do application dependency mapping (ADM) in order to actually figure out what this whole micro-segmentation config should look like? I have no good answer! I think it will ultimately be a combination of more educated network folk, better cross-team communication (i.e. app guys talk to network guys and vice versa), and better tooling/analytics within the network itself.

Tooling is perhaps easier than the layer eight stuff (people!), but there is still no simple solution. In my mind there are a few critical aspects to address in whatever the tooling ends up looking like:

  • Distributed Systems work.
    • As I’ve said before, the only way to do things at scale is to do them in a distributed fashion. With respect to visibility/ADM, this mostly means that the flow data needs to come from the EDGE of the network. This is even more critical in the modern spine/leaf style data center… you can no longer rely on stuffing a TAP into a choke point, or SPAN’ing all traffic at that choke point; you will lose all visibility into traffic local to a leaf or a pair of leafs.
    • Not having a choke point drives the need for distributed capture points by itself, but there is another just-as-critical reason for distributing the flow-mapping effort: 40/100G pipes… yeah… you can’t really SPAN a 100G link to a VM. I mean, I guess you could, but I don’t think that VM would last long before turning into a smoldering pile of computer bits. What I’m getting at here is that you simply can’t keep up with the amount of data modern DCs are capable of pushing, so the only way to handle it is to parse smaller amounts of data, and the only way to do that is to do it at the edge of the DC.
  • But Centralized Management is key.
    • All that hullabaloo about distributed systems and why we have to investigate app flows at the edge aside, the reality is that nobody has time to manage a bajillion SPANs or TAPs, so we still need some sort of central management plane where we can distill all the info we gather at the edge into something actually consumable by humans.
  • Flexibility counts.
    • SPANs are great because pretty much every vendor supports SPAN, but there are definitely limitations. TAPs are great, but again, not very flexible. Hardware sensors (thinking about things like Cisco Tetration here) are great, but often costly and still not super flexible. Software sensors are cool (and like ALL THE WAY at the edge, which is good!), but agents are a PITA at a minimum and sometimes not viable at all in certain environments. The moral of the story here is that you likely need to support multiple flavors of these options in order to be dynamic enough to adapt to multiple environments.
  • API all the things!
    • This is just table stakes now. I know that not everyone cares, and I know that not everyone needs things to be all magic and programmable, but dear lord, just have an API… it’s not that hard and it will make things better for some people. I want to be able to consume that pared-down data in some easy fashion, and that really means an API, because I am so freaking done with spreadsheets and CSV files 🙂
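
To put a finer point on that last one, pulling flow data should be about a three-liner against whatever tool you land on. The endpoint and field names below are totally made up, but this is the shape of what I want:

```python
# Hypothetical ADM tool API; the endpoint and field names are made up.
# The point: flow data as JSON over HTTP beats exporting and massaging CSVs.
import requests

resp = requests.get('https://adm-tool.example.com/api/v1/flows',
                    params={'window': '24h'})
resp.raise_for_status()
for flow in resp.json():
    print(flow['src'], '->', flow['dst'], '{0}/{1}'.format(flow['proto'], flow['port']))
```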

One possible tool to assist with the Mess that is Micro-segmentation is Illumio’s Adaptive Security Platform. At the recent NFD12, Illumio presented their take on not just ADM, but also how they take that data and turn it into real-life security policies. From what we saw, they have a pretty slick solution that uses endpoint agents on Linux or Windows guests. These agents report information back up to a central controller (the Policy Compute Engine), from which you can choose to enforce security policies via the guests’ native firewall tooling (iptables or Windows Firewall). Take a look at the video from Illumio’s presentation (and the other presenters from NFD12!) on the TFD NFD12 YouTube channel, or dive right into the Illumio demo right here:

Selfishly, I can see Illumio ASP being really useful for me: working predominantly with ACI, I often work with customers who need exactly the kind of information Illumio can provide. Moreover, I found it extremely convenient that the language and structure Illumio employs is very similar to that of ACI, with a focus on a provide and consume relationship between tiers of applications (not that provide/consume is some mythical/odd thing, it just seemed nice and handy). All of that is wrapped up with an API which, based on the presentations, should hold all the data I would need to build out contracts in ACI; you could even employ both Illumio and ACI from a security perspective if you were so inclined. The possibilities really are quite interesting. I’ve been envisioning building out net-new servers/services with the Illumio agent already installed, plopping them into a “dev” type tenant and letting them run for some amount of time, then (all programmatically) promoting the workload into “prod” and automatically generating contracts based on the information Illumio has gathered for you. Exciting possibilities indeed!
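
To make that workflow a bit more concrete, here’s a rough sketch of the ACI half of it. The Illumio piece is a placeholder (I’m not going to pretend to know the PCE API’s shape from one demo), but the vzFilter/vzBrCP payloads are the real APIC objects you’d push to build a contract; the APIC address and credentials are of course made up:

```python
# Sketch: turn observed flows into an ACI filter + contract via the APIC REST API.
# get_observed_flows() is a stand-in for a query against Illumio's PCE API.
import requests

APIC = 'https://apic.example.com'  # made up
TENANT = 'Prod'


def get_observed_flows():
    # Placeholder: pretend this came from the PCE. (provider, consumer, proto, port)
    return [('app', 'web', 'tcp', '8443')]


session = requests.Session()
session.verify = False  # lab only!
session.post(APIC + '/api/aaaLogin.json',
             json={'aaaUser': {'attributes': {'name': 'admin', 'pwd': 'password'}}})

for provider, consumer, proto, port in get_observed_flows():
    name = '{0}-to-{1}-{2}-{3}'.format(consumer, provider, proto, port)
    # Filter: match the observed protocol/port.
    flt = {'vzFilter': {
        'attributes': {'dn': 'uni/tn-{0}/flt-{1}'.format(TENANT, name), 'name': name},
        'children': [{'vzEntry': {'attributes': {
            'name': name, 'etherT': 'ip', 'prot': proto,
            'dFromPort': port, 'dToPort': port}}}]}}
    # Contract: a subject referencing the filter by name.
    contract = {'vzBrCP': {
        'attributes': {'dn': 'uni/tn-{0}/brc-{1}'.format(TENANT, name), 'name': name},
        'children': [{'vzSubj': {
            'attributes': {'name': name},
            'children': [{'vzRsSubjFiltAtt': {'attributes': {'tnVzFilterName': name}}}]}}]}}
    for payload in (flt, contract):
        session.post(APIC + '/api/mo/uni.json', json=payload).raise_for_status()
```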

All of my selfish ACI musings aside, Illumio really ticks a lot of the boxes for me in terms of what an ADM tool should be, and then it goes above and beyond by actually providing a mechanism to enforce the policies. Illumio may not be a magical ADM panacea (if it is, figure out when/if they’re going to IPO!), but it sure looks intriguing!


Disclaimer: I was at Networking Field Day 12 as a delegate. The event was sponsored by vendors, including Illumio. But all the garbage I write here is my own opinion. I’m bad at disclaimers.

ACI Fabric Access Policy Best Practices

I’ve been meaning to write this post for quite a while now, as I think this is one of the most important pieces of a solid ACI deployment: having well-defined fabric access policies. If you screw this up early in your ACI deployment you can always fix it, but it can be a big pain in the ass. Over the course of the last two years(ish) and a bunch of ACI deployments, this is the best strategy I’ve seen; big thanks to my friends @highspeedsnow and @therealaciguy for coming up with the guts of this “design” (or whatever you wanna call it).

Before we dive in, let’s review some basics by going over the relevant bits we’ll be discussing:

Interface Policy Groups:

  • An Interface Policy Group is basically a folder that contains L1/L2 traits that you would like to apply to a port or group of ports.
  • Interface Policy Groups come in three main flavors: access, port-channel, and virtual port-channel (vPC).
  • Access Policy Groups can be reused as many times as you want (this will hopefully become clearer as we keep going, if it’s not already), whereas port-channel and vPC Policy Groups cannot be reused. PC/vPC Policy Groups aren’t reusable because they are essentially the “interface port-channel” config from traditional NX-OS. What I mean by that is interfaces get associated to the Policy Group and then become part of the port-channel; just as we use unique port-channel IDs for unique port-channels on a traditional switch, it’s much the same in ACI.
  • By themselves, Interface Policy Groups do pretty much nothing! We’ll need to apply the Interface Policy Group to an Interface Selector. Don’t worry, it all seems kinda crazy and like lots of work at first, but it’s all quite organized and logical.

Interface Profiles:

  • An Interface Profile by itself kinda doesn’t do anything either. It’s basically just a container for Interface Selectors.
  • Interface Profiles ultimately get tied to Switch Profiles — I swear this will all start to make sense shortly 🙂

Interface Selectors:

  • Interface Selectors tie an Interface Policy Group to a port or ports.
  • An Interface Selector is a child object of an Interface Profile.
  • Note that the Interface Selector “selects” a port (via port ID), but not a switch — i.e. Interface Policy Group “Carls-Access-Port” gets associated to port 1/1, but that doesn’t tell us which switch…

Switch Profiles:

  • Switch Profiles tie all the stuff we’ve talked about together and actually associate things to a specific switch or switches.

Okay, now that we’ve gotten that out of the way, let’s jump right into how I like to design this and why. We’ll go in the same order outlined above.

Interface Policy Groups:
[Screenshot: Interface Policy Groups in the APIC GUI]

  • As outlined above, only access-port Interface Policy Groups are reusable, so I always create a “Standard-Access” Policy Group for my access ports.
  • You’re probably used to having port-channels/vPCs identified by a number. Don’t do that! The name field for the Interface Policy Group accepts numbers, so you could easily just name your PC/vPC “1”, “2”, “3”, etc., but it’s a PITA later, as ACI will automatically allocate a PC ID and it will almost certainly not be what you entered!
  • Do yourself a favor and name the PC/vPC Policy Groups logically. In the above picture I have “F5-PR-A” and “F5-PR-B”; these represent F5 Prod A and B (nice and easy, right?). Oftentimes we end up naming the Interface Policy Groups after the hostname of the server they connect to.
  • You’ll need Interface Policy Groups for your L3 Outs as well, so I always just name those “L3-Out-XXXX”. Again, you can see I have some L3 Outs to ASAs, as well as a “Subifs” Policy Group for the primary L3 Out.
  • Note that in 2.0+ you will see folders for both “Leaf Policy Groups” and “Spine Policy Groups”; if you are on code prior to 2.0 you will simply have the parent “Policy Groups” folder.
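
If you’d rather push these via the API than click through the GUI, here’s roughly what the Policy Groups above look like as REST payloads (a sketch: the APIC address/credentials are made up, and I’m omitting the CDP/LLDP/etc. policy references you’d normally hang off these):

```python
# Sketch: create a reusable access Policy Group and a named vPC Policy Group.
import requests

APIC = 'https://apic.example.com'  # made up
session = requests.Session()
session.verify = False  # lab only!
session.post(APIC + '/api/aaaLogin.json',
             json={'aaaUser': {'attributes': {'name': 'admin', 'pwd': 'password'}}})

# Reusable access-port Policy Group.
access = {'infraAccPortGrp': {'attributes': {
    'dn': 'uni/infra/funcprof/accportgrp-Standard-Access',
    'name': 'Standard-Access'}}}

# Bundle Policy Group; lagT='node' means vPC, lagT='link' would be a plain PC.
vpc = {'infraAccBndlGrp': {'attributes': {
    'dn': 'uni/infra/funcprof/accbundle-F5-PR-A',
    'name': 'F5-PR-A', 'lagT': 'node'}}}

for payload in (access, vpc):
    session.post(APIC + '/api/mo/uni.json', json=payload).raise_for_status()
```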

Interface Profiles:
[Screenshot: Interface Profiles]

  • So far nothing too exciting — really just some common sense naming stuff, but this is where it starts to get a little more interesting.
  • My first few times through, I was building out Interface Profiles that were specific to a function. My logic was that if I had an Interface Profile for, let’s say, my F5 load balancers, and I eventually expanded so much that I had to add more F5s, I could simply deploy that Interface Profile again to the next set of leaf switches that would connect the additional F5s. Good logic, I guess, but I don’t advocate it!
  • My current strategy is to create an Interface Profile per leaf switch and per PAIR of leaf switches (vPC pairs); this should be pretty obvious in the pic above.
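
Via the API this part is trivial, since the per-leaf and per-pair profiles are just empty containers at this point (same sketch caveats as before: made-up APIC address/credentials):

```python
# Sketch: one Interface Profile per leaf and one per vPC pair.
import requests

APIC = 'https://apic.example.com'  # made up
session = requests.Session()
session.verify = False  # lab only!
session.post(APIC + '/api/aaaLogin.json',
             json={'aaaUser': {'attributes': {'name': 'admin', 'pwd': 'password'}}})

for name in ('L101', 'L102', 'L101-L102'):
    payload = {'infraAccPortP': {'attributes': {
        'dn': 'uni/infra/accportprof-{0}'.format(name), 'name': name}}}
    session.post(APIC + '/api/mo/uni.json', json=payload).raise_for_status()
```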

Interface Selectors:
[Screenshot: Interface Selectors]

  • This is perhaps the most important piece of my overall strategy: I always create an Interface Selector PER PORT. So even though we could do a range of ports, I always, always, always do a single port at a time.
  • At first glance that doesn’t make a ton of sense, right? Why create more work (building individual Interface Selectors) when we could simply have a range of, let’s say, 10 ports that are all access ports with the same Interface Policy Group attached to them? The answer is simple and pretty obvious (well, maybe not unless you’ve been burned by this before, which is why I’m writing this!): if I want to make a change to port 1/3, but I’ve previously created an Interface Selector that spans ports 1 through 10, I have to delete that Interface Selector (causing some faults and things to happen), create a new Interface Selector for port 1/3, and then create another two Interface Selectors for ports 1/1-1/2 and 1/4-1/10. Not cool!
  • Another less drastic example: say I had a port-channel with 8 total ports and I created an Interface Selector for ports 1/1-1/4 in my Interface Profile L101-L102. What happens if I decide that an 80G vPC to my QA A10 is overkill and I want to reclaim some of those ports? You guessed it: gotta delete, rebuild the previous Interface Selectors for the port-channel, then build a new one for whatever else I want to do.
  • So the moral of the story here is to be granular as it will save you a lot of hassle in the future.
  • Finally, I always name these nice and simply: “Port-1”, “Port-40”, etc. It keeps things easy to read, and my thinking is that the EPG and/or Interface Policy Group should be the real descriptor (or hey, just use the description field that is available pretty much everywhere in ACI!).
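
Nobody wants to click out 48 individual selectors in the GUI, which is exactly why the API matters here. Here’s a sketch of stamping out per-port selectors (made-up APIC details again; every port gets the reusable “Standard-Access” Policy Group, which you’d obviously swap out for your PC/vPC ports):

```python
# Sketch: one Interface Selector per port, each with a single-port block.
import requests

APIC = 'https://apic.example.com'  # made up
PROFILE = 'L101'                   # the Interface Profile to populate
session = requests.Session()
session.verify = False  # lab only!
session.post(APIC + '/api/aaaLogin.json',
             json={'aaaUser': {'attributes': {'name': 'admin', 'pwd': 'password'}}})

for port in range(1, 49):
    dn = 'uni/infra/accportprof-{0}/hports-Port-{1}-typ-range'.format(PROFILE, port)
    payload = {'infraHPortS': {
        'attributes': {'dn': dn, 'name': 'Port-{0}'.format(port), 'type': 'range'},
        'children': [
            # A "range" of exactly one port -- granular on purpose.
            {'infraPortBlk': {'attributes': {
                'name': 'block1', 'fromCard': '1', 'toCard': '1',
                'fromPort': str(port), 'toPort': str(port)}}},
            # Tie the selector to the reusable access Policy Group.
            {'infraRsAccBaseGrp': {'attributes': {
                'tDn': 'uni/infra/funcprof/accportgrp-Standard-Access'}}},
        ]}}
    session.post(APIC + '/api/mo/uni.json', json=payload).raise_for_status()
```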

Switch Profiles:
[Screenshot: Switch Profiles]

  • Okay, bringing it all together!! As you can see, the Interface Profiles I created map to individual switches and vPC pairs of switches (in name only, because they aren’t actually “hooked” to a switch yet), and I’ve created Switch Profiles with exactly the same naming convention.
  • And of course, as the name implies, this is where we actually tie all of the interface-level stuff to an actual switch. Logically, each Switch Profile is associated to the switches in its name; i.e. Switch Profile “L101-L102” is associated to the vPC switch pair of nodes 101 and 102.
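
And the last bit of glue as a payload: the Switch Profile selects the node IDs and pulls in the matching Interface Profile (same sketch caveats as the earlier snippets):

```python
# Sketch: Switch Profile for the vPC pair, tied to its Interface Profile.
import requests

APIC = 'https://apic.example.com'  # made up
session = requests.Session()
session.verify = False  # lab only!
session.post(APIC + '/api/aaaLogin.json',
             json={'aaaUser': {'attributes': {'name': 'admin', 'pwd': 'password'}}})

payload = {'infraNodeP': {
    'attributes': {'dn': 'uni/infra/nprof-L101-L102', 'name': 'L101-L102'},
    'children': [
        # Leaf selector: which nodes (101-102) this profile applies to.
        {'infraLeafS': {
            'attributes': {'name': 'L101-L102', 'type': 'range'},
            'children': [{'infraNodeBlk': {'attributes': {
                'name': 'nodes', 'from_': '101', 'to_': '102'}}}]}},
        # Pull in the matching Interface Profile (and its per-port selectors).
        {'infraRsAccPortP': {'attributes': {
            'tDn': 'uni/infra/accportprof-L101-L102'}}},
    ]}}
session.post(APIC + '/api/mo/uni.json', json=payload).raise_for_status()
```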

In closing, some of this may seem like extra work, and I suppose it is. That being said, blowing all this config in via the API is SUPER simple (check out the ACI Power Deployment Tool, a Python library I wrote to help folks get started w/ ACI/REST). Hopefully this all makes sense and you can see the value in being nice and granular with your naming/deployment strategy; it will help you out a ton later down the road!