It's been far too long since a post… and that was true before the last post, too. Turns out it's hard to blog! Who woulda thought!? 🙂
I get on my soapbox a lot. Just ask anyone who knows me! I wanted to write a quick post about some of the things that influence my design decisions that I don't always get a chance to talk about. These are totally just my personal guidelines for designing networks, but they've come from lots of poor decisions (my own, and those in networks I've inherited or had to help out on), so I tend to be pretty adamant about them. So, without further ado, here are some of the things I always try to keep in mind when designing networks:
- Route ALL the damn time!
- I've talked about this a bunch… routing (all the way to the ToR/IDF) = tiny failure domains = when something breaks, it affects less stuff
- Since EIGRP is so dumb, I'd never recommend it… OSPF/IS-IS = vendor interoperability, which is convenient
- When things are deployed in pairs (like they should be almost all the time), it's easy to ‘flip’ traffic off a box by adjusting routing metrics, which basically lets you easily decommission a box for software/hardware upgrades
- Easy, fast convergence: I'm sure LACP is faster, but who cares… sub-second failure detection is available for OSPF, and every protocol ever has easily tunable timers = easy and fast failover
- Always push complexity as far to the edge as possible
- This is how the Internet and every respectable ISP is built: an L3/MPLS core, with all the magic happening out at the edge (PE/CE) devices. This scales, and it keeps the core very simple to maintain
- QoS is a perfect example of this in every respect: you always want to mark traffic and enforce policing as close to the edge as possible, so the core only has to look at the markings and provide bandwidth as configured
- Always avoid shared-fate scenarios, however unlikely faults in those configurations may be
- Stacked switches and VSS (or any other control-plane-sharing thing that lumps multiple switches together) represent one large single point of failure instead of multiple small single points of failure. I'm not saying stacks and VSS suck; they're just fine. When VSS first came out and had bugs, maybe it wasn't fine, but the technology is mature and stable now and runs at many Enterprise customers. Almost always, it's there so you can do MCEC/MLAG (whatever you want to call it) to downstream devices. If we followed design mantra #1, we could route to our IDFs (or whatever) and wouldn't need multi-chassis EtherChannel to cope with spanning tree sucking.
- vPC is the next best answer… it gets us multi-chassis port channels while keeping two separately managed switches with no shared control plane, meaning you can break one without breaking the other. I'd still rather do L3 everywhere, but this is an acceptable compromise.
- Always plan to scale further than you think you'll need to
- Simply put, never paint yourself into a corner. Just because you don't think the data center will ever grow doesn't mean it won't. That rolls right into point 2 here…
- Designing a network that CAN scale should be easy. You can “right-size” a network without sacrificing the ability to scale out. The easiest example of this in my mind is to do L3 everywhere. It’s VERY easy to add devices into a fully routed domain where you don’t have to worry about silly stuff like L2 loops and what VLAN goes where.
- Good enough is good enough
- Greg Ferro wrote about this mid last year (LINK!), and I've thought a lot about it ever since: good enough is always good enough. Sure, you want to design the Ferrari of networks, but if you do it right and build it out in a responsible way that lets you scale and easily upgrade the network in the future (L3 people… I'm just saying…), then you don't necessarily need the Ferrari… the Subaru may be just what you need. (Disclosure… I own a Subaru, and it's a great car!)
- Distributed systems ALWAYS scale better
- I don't think OpenFlow has worked… Obviously there are enhancements that allow for decentralized forwarding, but at the very least the initial spec seems AWFUL. Centralized forwarding is just a bad idea.
- NSX is a good example of doing things in a distributed fashion: in-kernel firewalls and forwarding at each hypervisor. 1000v/VSG is similar, and I like these things. Central management is okay, probably even preferable in most cases. Just don't centralize forwarding, please.
- Monolithic anything is bad. I’ve written about this. See next point.
- As always, complexity to the edge is better — see previous points.
- Monolithic anythings are terrible
- All of the above basically leads to this one truism: monoliths are awful… poignant, eh? They're hard to upgrade, hard to replace, their failures are cataclysmic, they smell funny, they often cost more (7k vs 9k = perfect example), and I just don't like them!
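To make the ‘flip traffic off a box’ idea from the first principle a bit more concrete, here's a minimal Python sketch. It models a hypothetical four-node topology (`idf1`, `core-a`, `core-b`, `dc`), runs a plain Dijkstra shortest-path calculation, and shows that raising the cost on every link of one core box drops it out of the equal-cost set. It's a toy model of link-state path selection, not any vendor's OSPF implementation. The function names are made up for illustration, and 65535 is simply the largest 16-bit OSPF interface cost.

```python
import heapq


def shortest_costs(graph, src):
    """Dijkstra: lowest total link cost from src to every reachable node."""
    dist = {src: 0}
    queue = [(0, src)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist[node]:
            continue  # stale queue entry; a cheaper path was already found
        for neighbor, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return dist


def next_hops(graph, src, dst):
    """Neighbors of src that sit on a lowest-cost path to dst (the ECMP set)."""
    best = shortest_costs(graph, src)[dst]
    return {
        neighbor
        for neighbor, link_cost in graph[src].items()
        if link_cost + shortest_costs(graph, neighbor).get(dst, float("inf")) == best
    }


def drain(graph, node, metric=65535):
    """Copy of graph with every link touching `node` raised to `metric` (both directions)."""
    drained = {n: dict(links) for n, links in graph.items()}
    for neighbor in drained[node]:
        drained[node][neighbor] = metric
        drained[neighbor][node] = metric
    return drained


# Hypothetical topology: one IDF dual-homed to two core boxes, which both reach the DC.
topology = {
    "idf1": {"core-a": 10, "core-b": 10},
    "core-a": {"idf1": 10, "dc": 10},
    "core-b": {"idf1": 10, "dc": 10},
    "dc": {"core-a": 10, "core-b": 10},
}

print(sorted(next_hops(topology, "idf1", "dc")))                   # ['core-a', 'core-b']
print(sorted(next_hops(drain(topology, "core-a"), "idf1", "dc")))  # ['core-b']
```

Nothing is actually torn down here: both core boxes stay up and still advertise reachability. Traffic just prefers the other box until you put the cost back, which is exactly what makes the upgrade window boring.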
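The fast-convergence point is mostly arithmetic. A neighbor gets declared down after roughly "interval × multiplier" worth of missed packets. This small Python sketch compares commonly cited OSPF defaults (10 s hello, dead interval of 4 × hello) with a tuned OSPF profile and a BFD session. The tuned and BFD values (1 s × 3, 50 ms × 3) are illustrative assumptions, not recommendations, so check what your platform actually supports.

```python
def detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Approximate time to declare a neighbor down: interval times missed-packet multiplier."""
    return interval_ms * multiplier


# Illustrative timer profiles; the tuned OSPF and BFD values are assumptions.
profiles = {
    "OSPF defaults (10 s hello, dead = 4 x hello)": (10_000, 4),
    "OSPF tuned (1 s hello, dead = 3 x hello)": (1_000, 3),
    "BFD (50 ms interval, multiplier 3)": (50, 3),
}

for name, (interval_ms, multiplier) in profiles.items():
    print(f"{name}: {detection_time_ms(interval_ms, multiplier)} ms")
```

With these numbers the defaults take 40000 ms to notice a dead neighbor, while the BFD profile takes 150 ms. That gap is why "easily tunable timers" matters more than whether LACP is a little faster.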
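The QoS example from the "push complexity to the edge" principle can be sketched the same way. In this minimal Python model, only the edge function knows which application a packet belongs to, and it sets the DSCP value. The core function never looks at the application; it just maps the DSCP marking to a queue. The DSCP values (EF = 46, AF41 = 34, default = 0) are the standard code points. The application labels, the policy table, and the queue names are made-up assumptions, and policing is left out for brevity.

```python
from dataclasses import dataclass

# Standard DSCP code points: Expedited Forwarding, AF41, and default (CS0).
DSCP_EF = 46
DSCP_AF41 = 34
DSCP_DEFAULT = 0


@dataclass
class Packet:
    app: str                 # hypothetical application label, only meaningful at the edge
    dscp: int = DSCP_DEFAULT


# Hypothetical edge policy: classification happens once, at the edge.
EDGE_POLICY = {"voice": DSCP_EF, "video": DSCP_AF41}


def edge_mark(packet: Packet) -> Packet:
    """Edge device: classify by application and write the DSCP marking."""
    packet.dscp = EDGE_POLICY.get(packet.app, DSCP_DEFAULT)
    return packet


# Hypothetical core queue map: the core only ever reads the DSCP value.
CORE_QUEUES = {DSCP_EF: "priority", DSCP_AF41: "video"}


def core_queue(packet: Packet) -> str:
    """Core device: pick a queue purely from the marking, ignoring the application."""
    return CORE_QUEUES.get(packet.dscp, "best-effort")


for app in ["voice", "video", "web"]:
    print(app, "->", core_queue(edge_mark(Packet(app))))
```

Adding a new application only touches the edge policy table. The core's queue map stays the same, which is the whole point of keeping the core dumb.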
That turned out to be a bit more than I originally thought I had, and some of it is a bit repetitive, but overall these are the principles that guide me, and so far they've served me pretty well. I'd love to hear what folks think, and whether they have anything to add!