NFD16 – Gigamon and Splunk (with a Dash of Phantom)

I had the opportunity to take a trip to the Gigamon mothership (no, not like the Apple Mothership, it’s a normal HQ) last week at Networking Field Day 16. I was pretty excited as I’ve not had a ton of hands-on time with Gigamon, but I’ve run into their products at a good portion of the customers I’ve worked with over the years.

If you’d like to take a peek at some Gigamon presentations before continuing, you can check out a ton of them here on Tech Field Day’s YouTube channel.

This event’s presentations were focused on leveraging Gigamon’s, and some of their partners’ (Splunk and Phantom), ability to react to security incidents. The core idea in their presentation is that security is a very hard problem (they ain’t wrong!), and that many organizations spend a large quantity of time and money on it, generally across a broad set of tools. To make security simpler, the idea is that you can use Gigamon and Splunk in concert to detect things that you would prefer not happen on your network, things that would require tons of other tools to catch without these platforms’ combined power. The integration between the platforms allows Splunk to fire off triggers which Gigamon can then react to by dropping or alerting on the offending traffic. Taking things a step further, via the integration with Phantom these events as seen in Splunk could kick off a whole host of mitigation tactics/processes in order to automate away a lot of the manual, tedious work that SOC personnel may have to deal with. All told, this is a pretty cool story. The integration with both of these other platforms seemed, from the demo, to be pretty smooth; flexible, but not super open-source-y (i.e. your average Joe/Jane could probably figure it out without pulling out too many hairs).

In a perfect world, I can certainly see Gigamon (and team) supplanting many other products by consolidating functionality. Gigamon itself is a pretty powerful platform; couple that with Splunk, which is a beast and can provide very interesting data correlation/insights, and finally wrap it all up with Phantom to put the sexy bow of automation on top, and things look interesting (unfortunately time was limited for the Phantom portion of the presentation so I don’t have much insight there, but it really did look awesome!). That being said, there are some challenges…

Gigamon at its core relies on being inline with traffic, or at least receiving traffic (of course if you’re not inline you can’t drop things, so keep that in mind). This has historically been more or less a non-issue — data centers have always had choke points, so go ahead and plop your Gigamon appliance right there and you’re in business. That whole Christmas tree type topology where we had easily defined choke points is not really a thing anymore (at least in data centers being deployed now — of course they still exist). Most data centers, and certainly the ones I’m involved in, are opting to build out in a Clos topology. In a Clos topology we can have crazy things — like 128-way ECMP!! Not that 128-way ECMP is common, but even in small 2-4 spine node topologies there aren’t any especially good places to place a device like a Gigamon. You can of course put Gigamon/IPS/whatever inline between leaf and spine nodes, however this is an atrociously expensive proposition — firstly for the sheer number of links that may entail, and secondly from a capacity perspective — if you’re going 40G, 100G, or looking toward the crazy 400G, you’re going to have to pay to play to run that kind of throughput through a device (Gigamon or otherwise). Depending on the topology, it may be easy to snag “north/south” traffic (border leaf nodes -> whatever is “northbound” for example), but with an ever-increasing focus on microsegmentation within the data center this is *probably* not sufficient for most orgs.

One option to address some of this that was not mentioned, or was mentioned only briefly, at the Gigamon presentation is the GigaVUE-VM. The idea here is that this is a user-space virtual machine that can be either inline with VM traffic, or sitting “listening” in promiscuous mode. Because this lives in user space there are no hypervisor requirements/caveats, it just kind of hangs out. If used in “inline mode” (which I’ve actually not seen, so maybe that’s not a thing?) there is the potential for this to replace the big iron hardware appliances, and fit more neatly into a Clos topology. I would have liked to see/hear more about this… a bit more about why at the end…

I had two major takeaways from the Gigamon presentation. Firstly — Splunk is like a magical glue to tie things together! The data being fed into Splunk could have come from any number of sources (syslog of course, agents on clients, HTTP events, etc.); in this case it was from Gigamon, and Gigamon performed drop actions based on the rules created in Splunk. I suspect that Splunk could be (relatively?) easily configured to make an API call to a firewall or other device/platform to react to data being fed into it. Splunk without data, though, is not really all that useful — and here is where Gigamon showed their value. Being able to capture LARGE amounts of data, and then do something to it (really just drop, but that’s an important thing) is very valuable.
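As a purely illustrative aside (this is not something Gigamon or Splunk demoed, and the firewall API below is entirely made up), a Splunk alert can be pointed at a custom script or webhook, and that script can then do whatever you like with the event. A rough Python sketch of the idea:

# Hypothetical sketch only: a script a Splunk alert action could invoke to tell
# some firewall to block an offending source IP. The firewall URL, endpoint,
# fields, and token are all invented for illustration.
import json
import sys

import requests

FIREWALL_API = "https://firewall.example.com/api/v1/block"  # hypothetical endpoint
API_TOKEN = "changeme"                                       # hypothetical token

def block_source(event: dict) -> None:
    """POST a block request for the source IP found in the alert event."""
    src_ip = event.get("src_ip")
    if not src_ip:
        return
    requests.post(
        FIREWALL_API,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"action": "deny", "source": src_ip, "reason": "splunk-alert"},
        timeout=10,
    )

if __name__ == "__main__":
    # Assume the alert hands us its payload as JSON on stdin.
    block_source(json.loads(sys.stdin.read()))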

That being said, my second takeaway was that this felt largely out of sync with what most customers I see are doing, at least in the data center. When pressed for how to practically adapt this into a Clos topology the answers were thin at best: (paraphrasing) “just tap all the links to leaf/spine,” “tap at chokepoints,” etc. This is all well and good, and depending on requirements and budget may be just fine, however I didn’t exactly get warm fuzzies that Gigamon knows how to play nice in these Clos data centers. Obviously tapping everything is a non-starter financially, and chokepoints are well and good, but that means that the substantial investment in Gigamon/Splunk (because it really does seem like they need to be deployed in unison to justify the expenditures) doesn’t actually do you much/any good for securing east/west traffic.

Having run into Gigamon in several Cisco ACI deployments I’ve been a part of, I can say that customers really love — or at least have invested so much that they feel the need to continue to get value out of — Gigamon, but each time I’ve seen this there has been a big struggle to find a good home for the appliances. This is why I really would have liked to have seen and heard more about the GigaVUE-VM — my knowledge of it is quite limited but it certainly seems to be a possible workaround for the challenge of finding choke points in a Clos fabric. The big caveat to this is that the Gigamon folks did mention that the VM does NOT have feature parity with the HC hardware appliances. It sounds like they are investing in adding these features though, which would obviously be helpful.

One final note: as I have very data-center-focused goggles on, I’ve more or less ignored the campus/WAN, but I definitely think this could be useful in those areas, perhaps much more so than in the data center.

 


Guest Post! WTF Are all those Checkboxes? (ACI L3 Outs) – Part 2 of ???

My friend and colleague Mr. Jason Banker recently ran into some good times with the mysteries of the ACI L3 Out Checkbox Madness! He Slack’d me and told me he’d found some clown’s blog post about it (yours truly) and that some updates and additional information were needed, so he kindly volunteered some time to help out! Without further ado, here is Jason’s Checkbox Madness:


 

As we continue to deploy fabrics we always joke about these damn routing checkboxes shooting us in the foot. We play with different scenarios in the lab to ensure we understand how these pesky boxes work and what other options we have for future deployments. The scenario here was to get different OSPF areas connected to the same border leaf using ACI as the transit. This scenario brings up some particular challenges, and hopefully my testing will help others understand it a little better as well.

Design:

We have two external routers coming into a border leaf on ACI, and two L3Outs (required because of the multiple areas): one is Area 0 (backbone) and one is Area 1. Here is the breakdown of routes on each router:

External Router 1 (Area 0):

  • Loopback0: 2.2.2.2/32
  • Loopback1: 4.4.4.4/32
  • Transits: 192.168.0.0/29

External Router 2 (Area 1):

  • Loopback0: 3.3.3.3/32
  • Loopback1: 5.5.5.5/32
  • Transits: 172.16.0.0/29

 

Using ACI as a transit we want routes from Area 0 to be imported into Area 1 and vice versa across the two L3Outs.   We will skip the build of the L3Out portion but I want to focus on those pesky checkboxes again.  Whenever I build an L3Out my network EPG usually looks something like this:

By default, “External Subnets for the External EPG” is checked (this checkbox simply enforces policy on this L3Out, and contracts are applied to the specific subnet) and I am using the 0.0.0.0/0 network as a catch-all. Moving along with the defaults, I show full adjacency:
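As a side note for the API-inclined: here is a rough sketch of what I believe that default subnet looks like when pushed through the APIC REST API rather than the GUI. The tenant/L3Out/EPG names are made up, and you should sanity-check the scope strings against your own APIC, but “import-security” is the value that corresponds to the “External Subnets for the External EPG” checkbox.

# Hypothetical sketch: create the external EPG with a 0.0.0.0/0 subnet and only
# the default "External Subnets for the External EPG" flag set.
# Names (tenant "lab", L3Out "l3out-area0", EPG "extepg-area0") are made up.
import requests

APIC = "https://apic.example.com"  # assumption: your APIC address
session = requests.Session()
session.verify = False             # lab only; use real certs in production

# Standard aaaLogin call; the auth cookie is kept on the session object
session.post(f"{APIC}/api/aaaLogin.json",
             json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

payload = {
    "l3extInstP": {
        "attributes": {"name": "extepg-area0"},
        "children": [
            {"l3extSubnet": {"attributes": {
                "ip": "0.0.0.0/0",
                "scope": "import-security",  # External Subnets for the External EPG
            }}}
        ],
    }
}

# Post the external EPG (and its subnet) under the Area 0 L3Out
resp = session.post(f"{APIC}/api/mo/uni/tn-lab/out-l3out-area0.json", json=payload)
print(resp.status_code, resp.text)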

 

As well as a full routing table within ACI, receiving the networks above in the fabric as expected:

Note:  Anything received from area 0 is shown as backbone and everything from area 1 is 0.0.0.1.

 

Now if we look at the routing table for External Router 1 we see no routes across the fabric being received from External Router 2.

 

Let me check my OSPF Neighbors:

 

So, we have no OSPF routes but we have a neighbor relationship.  Let’s go check External Router 2:

 

So, we are showing some OSPF routes, but they are only the loopbacks of the ACI fabric (Area 0), not what we are necessarily looking for. ACI blocks transit routes between different L3Outs unless permitted by policy; under the hood this is implemented with route-maps on the border leaf matching the OSPF area (to verify, SSH to the border leaf and run “show route-map”). Let’s go look at the Network EPG checkboxes again and see if we can get routing to occur between OSPF areas across the fabric.

 

As we showed earlier we are using a catch-all 0.0.0.0/0 with “External Subnets for the External EPG”. If we select the “!” in the upper right portion of the screen we can pull up the property descriptions for this screen:

 

Based on what this screen states, “Export Route Control Subnet” controls the export route direction and “Import Route Control Subnet” controls the import direction. This sounds like what we need to get routing to traverse the fabric. Let’s go ahead and select them for Area 0, but before we can select import there is one more piece of configuration we need to apply, otherwise the import option stays grayed out. If we go back to the top of the L3Out in the navigation pane we need to select the “Route Control Enforcement:” import checkbox:
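For those scripting this rather than clicking, my understanding is that the “Route Control Enforcement: import” checkbox maps to the enforceRtctrl attribute on the l3extOut object, and the two subnet checkboxes map to extra values in the subnet’s scope string. A hedged sketch, reusing the session and made-up names from the earlier snippet:

# Hypothetical continuation of the earlier sketch (APIC and session defined there).
# 1) Enable import route control enforcement on the L3Out itself.
enforce = {"l3extOut": {"attributes": {"name": "l3out-area0",
                                       "enforceRtctrl": "export,import"}}}
session.post(f"{APIC}/api/mo/uni/tn-lab/out-l3out-area0.json", json=enforce)

# 2) Add Export/Import Route Control Subnet to the existing 0.0.0.0/0 entry.
subnet = {"l3extInstP": {"attributes": {"name": "extepg-area0"}, "children": [
    {"l3extSubnet": {"attributes": {
        "ip": "0.0.0.0/0",
        # import-security == External Subnets for the External EPG
        # export-rtctrl   == Export Route Control Subnet
        # import-rtctrl   == Import Route Control Subnet
        "scope": "import-security,export-rtctrl,import-rtctrl",
    }}}]}}
session.post(f"{APIC}/api/mo/uni/tn-lab/out-l3out-area0.json", json=subnet)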

 

Now if we go back to the Network EPG we should have both options available to us:

 

Now let’s see if we have any changes within our routing table:

 

We still have no change in the table. Remember that we are using a catch-all 0.0.0.0/0? This requires us to also select the aggregate export and import features on the subnet/network EPG we have created for Area 0 and Area 1:
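Again just sketching how I believe this maps to the object model: the aggregate export/import checkboxes correspond to the aggregate attribute on that same l3extSubnet (and only make sense on a 0.0.0.0/0 entry). Same made-up names as before:

# Hypothetical: add Aggregate Export/Import to the 0.0.0.0/0 route control subnet.
# (APIC and session defined as in the earlier sketch.)
aggregate = {"l3extInstP": {"attributes": {"name": "extepg-area0"}, "children": [
    {"l3extSubnet": {"attributes": {
        "ip": "0.0.0.0/0",
        "scope": "import-security,export-rtctrl,import-rtctrl",
        "aggregate": "export-rtctrl,import-rtctrl",  # Aggregate Export + Import
    }}}]}}
session.post(f"{APIC}/api/mo/uni/tn-lab/out-l3out-area0.json", json=aggregate)
# Repeat the same change on the Area 1 L3Out/EPG as well.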

 

Time to verify:

 

This looks great.  Now we will verify External Router 2:

 

If we want to take it a step further we can do network-specific routes instead of a catch-all:
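The same idea can be expressed against the API with specific prefixes rather than the catch-all. My understanding (double-check this in your own lab) is that an export route control entry lives on the L3Out you want the route advertised out of, so Router 1’s loopback goes on the Area 1 external EPG and Router 2’s loopback on the Area 0 one. Names are still made up:

# Hypothetical: use specific transit prefixes instead of the aggregated catch-all.
# (APIC and session defined as in the earlier sketch.)
# Advertise Router 1's loopback (2.2.2.2/32) out of the Area 1 L3Out...
session.post(f"{APIC}/api/mo/uni/tn-lab/out-l3out-area1/instP-extepg-area1.json",
             json={"l3extInstP": {"attributes": {"name": "extepg-area1"}, "children": [
                 {"l3extSubnet": {"attributes": {"ip": "2.2.2.2/32",
                                                 "scope": "export-rtctrl"}}}]}})

# ...and Router 2's loopback (3.3.3.3/32) out of the Area 0 L3Out.
session.post(f"{APIC}/api/mo/uni/tn-lab/out-l3out-area0/instP-extepg-area0.json",
             json={"l3extInstP": {"attributes": {"name": "extepg-area0"}, "children": [
                 {"l3extSubnet": {"attributes": {"ip": "3.3.3.3/32",
                                                 "scope": "export-rtctrl"}}}]}})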

 

Router 2:

 

Verify Router 2 is receiving 2.2.2.2/32 from Router 1:

 

Now we can send 3.3.3.3/32 from Router 2 into Router 1:

 

Router 1:

 

As I stated earlier, these checkboxes are updating route-maps and prefix-lists within ACI. Prior to us selecting the import/export features our route-maps denied everything, so no routes would traverse areas. Upon selecting these checkboxes we can see the change:

 

Instead of giving you route-map blah, I will try and break down the map for you, focusing just on the export route-map. Prior to import/export our route-map looked like this:

route-map exp-ctx-2392064-deny-external-tag, deny, sequence 1
  Match clauses:
    tag: 4294967295
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, deny, sequence 9998
  Match clauses:
    ospf-area: backbone
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, deny, sequence 9999
  Match clauses:
    ospf-area: 0.0.0.1
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, permit, sequence 10000
  Match clauses:
  Set clauses:
route-map exp-ctx-2392064-set-external-tag, permit, sequence 2
  Match clauses:
  Set clauses:
    tag 4294967295
route-map imp-ctx-bgp-st-interleak-2392064, deny, sequence 1
  Match clauses:
    tag: 4294967295
  Set clauses:
route-map imp-ctx-bgp-st-interleak-2392064, permit, sequence 10000
  Match clauses:
  Set clauses:

 

You can see that we had “deny” entries for backbone and area 0.0.0.1, preventing us from using the fabric as a transit. After we selected the import/export features our route-map is updated as such (just focusing on the export route-map):

route-map exp-ctx-2392064-deny-external-tag, deny, sequence 1
  Match clauses:
    tag: 4294967295
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, permit, sequence 9801
  Match clauses:
    ip address prefix-lists: IPv4-ospf-rt2392064--0-dst-rtpfx
    ipv6 address prefix-lists: IPv6-deny-all
    ospf-area: backbone
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, permit, sequence 9802
  Match clauses:
    ip address prefix-lists: IPv4-ospf-rt2392064--1-dst-rtpfx
    ipv6 address prefix-lists: IPv6-deny-all
    ospf-area: 0.0.0.1
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, deny, sequence 9998
  Match clauses:
    ospf-area: backbone
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, deny, sequence 9999
  Match clauses:
    ospf-area: 0.0.0.1
  Set clauses:
route-map exp-ctx-2392064-deny-external-tag, permit, sequence 10000
  Match clauses:
  Set clauses:
route-map exp-ctx-2392064-set-external-tag, permit, sequence 2
  Match clauses:
  Set clauses:
    tag 4294967295
route-map imp-ctx-bgp-st-interleak-2392064, deny, sequence 1
  Match clauses:
    tag: 4294967295
  Set clauses:
route-map imp-ctx-bgp-st-interleak-2392064, permit, sequence 10000
  Match clauses:
  Set clauses:

Now that our route-map has been updated with prefix-lists to allow our traffic across areas, we will look at the prefix-list itself:

 

Leaf-103# show ip prefix-list IPv4-ospf-rt2392064--0-dst-rtpfx
ip prefix-list IPv4-ospf-rt2392064--0-dst-rtpfx: 1 entries
   seq 1 permit 0.0.0.0/0 le 32

 

The 0.0.0.0/0 catch-all has been added and our routes can traverse the fabric. I suggest you also peek at the import route-maps and see what is happening under the hood there as well.