This post, as the title suggests, is an open question to anyone out there who can answer it or wants to have a dialogue about it with me.
Unless you’ve been living under a rock lately, you have undoubtedly heard about Puppet, Chef, Ansible, and any number of other automation/orchestration engines. The goal of all these tools, of course, is to simplify how we deploy and manage different pieces of equipment across our network. Tons of ‘stuff’ is happening in server-land, and as always, there are gripes that the network isn’t adapting quickly enough. To that end, Puppet in particular seems to be making a lot of noise about moving further into the network space (in fact, there are already Puppet agents for some Cisco Nexus platforms).
So that’s cool, right? Puppet can do some network stuff, and Ansible can do even more since it’s basically just SSH, but what the hell are we supposed to do with them? Ansible makes a lot of sense to me from a templating perspective: build the template, just change the variables, and even deploy it to live gear with a bit of help from Python (supposedly that’s how it’s done; I haven’t done it first-hand, though). That is some super cool stuff, but again, it’s not ‘the SDNs’ (note the sarcasm, please).
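To make the “build the template and just change variables” idea concrete, here is a minimal Python sketch. Ansible itself uses Jinja2 templates; this uses the standard library’s string.Template only to stay dependency-free, and the interface template, port names, and VLAN numbers are all hypothetical.

```python
from string import Template

# Hypothetical access-port template. Ansible would use Jinja2; string.Template
# is used here only so the sketch needs nothing beyond the standard library.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " switchport access vlan $vlan\n"
    " no shutdown\n"
)


def render_interfaces(ports):
    """Render one config stanza per port dict and join them into one string.

    Each dict must supply every placeholder (name, description, vlan);
    substitute() raises KeyError if one is missing, which is a useful guard
    before anything gets pushed to live gear.
    """
    return "".join(INTERFACE_TEMPLATE.substitute(port) for port in ports)


if __name__ == "__main__":
    ports = [
        {"name": "Ethernet1/1", "description": "web-01", "vlan": 100},
        {"name": "Ethernet1/2", "description": "web-02", "vlan": 100},
    ]
    print(render_interfaces(ports))
```

The template never changes; only the list of variables does, which is exactly what makes this kind of thing easy to automate.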
So in terms of the dynamic network of the future, what are we automating? Take this example, which is very similar to a real-life scenario at a customer I visited recently:
We host pictures of famous cats. We have different tiers of storage. Cats that are less popular are stored in a slower storage tier and/or have fewer resources dedicated to serving those pictures up. A cat that was not that popular dies/gets married/has a new viral cat video/etc. and is now super popular. I want to pull that cat’s images from the slow storage to put into my fastest tier of storage and/or allocate more resources to serving this cat up.
In this example, we could automatically provision more VMs to serve these images (vCAC/vApp+Puppet or something), or shuffle the storage to the faster tier. We could drive this from some external input, such as the top 100 Google searches: when our favorite cat is popular we unfreeze the storage, and when his star falls we put it in mothballs. This is a bit outside my realm, but it could conceivably all be accomplished with tools available today plus a bit of extra programming work.
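A rough sketch of the decision logic, assuming a hypothetical external popularity list and just two tiers. The names choose_tier and plan_moves are made up for illustration; actually moving the storage or spinning up VMs would be handed off to whatever orchestration tool you use.

```python
# Hypothetical tier names; a real deployment would map these onto its own
# storage classes or VM resource pools.
FAST, SLOW = "fast", "slow"


def choose_tier(cat, top_list, size=100):
    """Return FAST if the cat is in the first `size` entries of an external
    popularity list (e.g. top searches), otherwise SLOW."""
    return FAST if cat in top_list[:size] else SLOW


def plan_moves(current_tiers, top_list, size=100):
    """Compare each cat's current tier with its desired tier and return
    (cat, from_tier, to_tier) tuples for only the cats that need to move."""
    moves = []
    for cat, tier in sorted(current_tiers.items()):
        wanted = choose_tier(cat, top_list, size)
        if wanted != tier:
            moves.append((cat, tier, wanted))
    return moves


if __name__ == "__main__":
    current = {"grumpy": SLOW, "keyboard": FAST, "tabby": SLOW}
    print(plan_moves(current, ["grumpy", "nyan"]))
```

With the sample data, grumpy gets promoted, keyboard gets demoted, and tabby stays put. Notice that nothing in this loop touches the network at all.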
So now we’ve finally arrived at my question: what the hell needs to be automated in the network? What is it that the app guys are complaining about? That’s a sincere question. Let’s take a look at what I think a modern (maybe not bleeding-edge, Facebook-type) data center looks like at this point. If you asked me, it would be almost entirely virtualized, run L3 ECMP all the way to the top of rack, use VxLAN via OVS/Nexus 1000v/NSX to provide L2 adjacency over the fabric where required, and possibly include some service-chaining-type functionality.
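For anyone less familiar with ECMP, the core idea is that each switch hashes a flow’s header fields and uses the result to pick one of several equal-cost uplinks. A minimal sketch, assuming a 5-tuple hash and using CRC32 purely as a stand-in for whatever hash the switch hardware actually implements; the spine names are hypothetical.

```python
import zlib


def ecmp_next_hop(flow, next_hops):
    """Pick one of several equal-cost next hops for a flow.

    flow is a 5-tuple (src_ip, dst_ip, proto, src_port, dst_port). Hashing the
    whole tuple keeps every packet of one flow on the same path (no
    reordering), while different flows spread across all the paths. CRC32 is
    an illustrative stand-in; real switch ASICs use their own hash functions.
    """
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


if __name__ == "__main__":
    spines = ["spine1", "spine2", "spine3", "spine4"]
    flow = ("10.0.1.10", "10.0.2.20", 6, 49152, 443)
    print(ecmp_next_hop(flow, spines))
```

The point for this discussion: once the underlay is built, this all happens on its own. Adding a spine changes the list of next hops, not the logic.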
My position is that if we, as network engineers, have done our job well, the network should be 100% transparent to the app guys. It almost pains me to say it, but the network should just be plumbing; it should just be the highway. It’s not sexy, but it is necessary. You can have the fastest car money can buy, but if there are no roads to drive it on, what’s the point?
Taking the 1000v as an example, we should be able to provision port-profiles, either VLAN or VxLAN, on day 1. These VxLANs or VLANs would then be available to whatever tools are reaching into the hypervisor to automate the deployment of virtual machines. Going a step further, we should also be able to define security policies around tenants, resource pools, or other VM or network attributes. Again, as new virtual machines are deployed, these automation tools should be able to use the network policies already in place.
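A minimal sketch of what “policies defined on attributes” could look like, assuming a made-up data model where VMs are just dictionaries of attributes (tenant, role) and rules allow specific ports between matching groups. This is not the 1000v or NSX API, only an illustration of why a new VM can pick up existing policy without anyone touching the network.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Allow traffic from VMs matching src to VMs matching dst on given ports.

    src and dst are dicts of required VM attributes (hypothetical keys such as
    "tenant" and "role"); an empty dict matches any VM.
    """
    src: dict
    dst: dict
    ports: set


def matches(vm_attrs, selector):
    """True if the VM carries every attribute/value pair in the selector."""
    return all(vm_attrs.get(k) == v for k, v in selector.items())


def is_allowed(rules, src_vm, dst_vm, port):
    """Default deny: pass only if some rule matches both ends and the port."""
    return any(
        matches(src_vm, r.src) and matches(dst_vm, r.dst) and port in r.ports
        for r in rules
    )


# Day-1 policy, written once against attributes rather than IPs or VM names.
RULES = [
    Rule(src={"tenant": "cats", "role": "web"},
         dst={"tenant": "cats", "role": "db"}, ports={5432}),
    Rule(src={}, dst={"tenant": "cats", "role": "web"}, ports={80, 443}),
]

if __name__ == "__main__":
    new_web = {"tenant": "cats", "role": "web", "name": "web-17"}
    db = {"tenant": "cats", "role": "db"}
    print(is_allowed(RULES, new_web, db, 5432))
    print(is_allowed(RULES, new_web, db, 22))
```

A freshly deployed VM tagged as a cats web server can reach the database on 5432 the moment it exists, because no rule ever mentioned it by name.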
All this is well and good, but what happens when you need to define NEW policies or NEW VxLANs? I don’t know! That’s a great question! Deploying a VxLAN can probably be automated fairly easily (VMware NSX certainly does this), but what about security policies? I gotta think there is enough complexity there that abstracting the process won’t really help... at least I don’t think it will. My thought is that you will still have to understand and define the ports/protocols/VM attributes/etc. that need to be matched and acted on in any policy, and therefore it would be basically impossible to automate, since you are going to have to type/click things no matter what. Thoughts?
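The “easy” half, allocating a new segment, can be sketched as a simple allocator. This assumes a hypothetical pool of VNIs (the 24-bit VXLAN Network Identifier) managed by the automation tool itself; NSX and friends do their own version of this, and the 5000–5999 range is an arbitrary choice. Note what it does not do: it can’t decide which ports or attributes a new security policy should match.

```python
class VniAllocator:
    """Hand out VXLAN Network Identifiers (VNIs) from a bounded pool.

    VNIs are 24-bit values. Asking again for an existing segment name returns
    the same VNI, so re-running the automation is harmless (idempotent).
    The pool is kept as an explicit list, so keep the range modest.
    """

    MAX_VNI = 2**24 - 1

    def __init__(self, first=5000, last=5999):
        if not (0 < first <= last <= self.MAX_VNI):
            raise ValueError("VNI range must lie within 1..16777215")
        # Stored in descending order so pop() hands out the lowest VNI first.
        self._free = list(range(last, first - 1, -1))
        self._by_name = {}

    def allocate(self, segment):
        """Return the VNI for a segment name, allocating one if needed."""
        if segment in self._by_name:
            return self._by_name[segment]
        if not self._free:
            raise RuntimeError("VNI pool exhausted")
        vni = self._free.pop()
        self._by_name[segment] = vni
        return vni

    def release(self, segment):
        """Return a segment's VNI to the pool for reuse."""
        self._free.append(self._by_name.pop(segment))


if __name__ == "__main__":
    pool = VniAllocator()
    print(pool.allocate("cats-web"), pool.allocate("cats-db"))
```

Creating the segment really is just bookkeeping like this; deciding what traffic should be allowed on it is the part that still needs someone who understands the application.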
Outside of the network edge configurations, what else would need to be automated in a network? If the network is built as I outlined above, it really is just like a service provider network (or voice networks before that; isn’t technology cyclical??), where the complexity gets pushed as far to the edge as possible. So what the hell needs to happen in the core? In a service provider network, the core would be nothing but IS-IS/OSPF and MPLS labels; in our example, the core would be nothing but an underlay running some routing protocol that does ECMP. Why would you need to change it? You wouldn’t, unless it was something that required manual intervention anyway, like hardware changes or re-cabling.
This post is getting a little long-winded, so I’ll try to wrap up. At the end of the day, I still have the open question of what the hell it is we are going to automate in the network. I think the network is just a road. The road should be awesome. It shouldn’t have a bunch of potholes, but it doesn’t need to do anything exciting. It should just allow the overlay to do what it needs to do (if an overlay is required at all; it may not even be necessary if the applications can be distributed and don’t rely on any silly L2 requirements). We can do service-chaining-type stuff with 1kv/NSX/OVS/etc. So what needs to be automated from your perspective? I’d love to have some good dialogue about this, since it’s fascinating stuff!