Post-TFD Segment Routing Roundtable Thoughts

My brain has now had a bit of time to recover from the information overload that was the Tech Field Day Segment Routing Round Table, so it is most definitely time to write a bit about what I learned. You may want to listen to the Software Gone Wild Podcast with Ivan Pepelnjak for a solid foundation of what SR is before jumping into things. After that, head over to the TFD YouTube channel to check out the recordings from the event. We had some really great presentations from Walmart, Microsoft, and Comcast; each of these companies explained how Segment Routing is helping them in their particular environment. I would start with the presentation from Mark Pagan of Walmart, as it goes over a lot of the real-world day 1 benefits of SR. Then take a listen to the Microsoft and Comcast presentations. They really kicked it up a lot in terms of the complexity of their overall solutions, but also really highlighted a lot of what is possible with Segment Routing.

I’m not going to try to write anything too technical about SR because I am definitely not up to speed enough on it to talk about it at that level. What I am going to do is jot down my view on it as a technology, and its applicability (in my mind I guess) in the day-to-day networking world. I also want to respond to my own thoughts/questions from my previous post before the TFD event.

I’ll try to address my own previous points first:

– Whatever happened to NSH: Guess I didn’t really get a solid answer here. As far as I can tell NSH is still technically a thing, but really seems to be fading away. I think ultimately it’s too big of a problem (or I guess solution) to really successfully implement. Somebody please chime in if there’s something new/interesting happening w/ NSH that I should be reading about. In any case, as compared to SR, they really are different beasts. I think there is some overlap in terms of what NSH was promising and what SR can do. Sure, SR can direct traffic through a network, and maybe even to or through some devices on the network, but it’s not intending to do “service-chaining” in the same way that NSH was/is.
– Config nightmare: Nope. I think my biggest takeaway is that SR is pretty much MPLS 3.0. I say 3.0 because it is just that much simpler, not only to configure, but to troubleshoot as well. I bring up troubleshooting since I think this is/was the biggest and most important part of the whole event: SIDs (Segment IDs) are globally significant (at least the prefix/node SIDs are; adjacency SIDs are still local to a router). Sounds not very exciting/important by itself eh? Well, the reason I think that is so huge is that if you’ve ever worked w/ MPLS and tried to troubleshoot and understand the end-to-end label switched path (LSP), then you will know that the labels are all over the place and are only significant to the local router. Now they are unified across the whole LSP… that’s pretty badass! I should also note that instead of using LDP, SR distributes label/SID data via TLVs in OSPF or IS-IS, kinda sorta like LDP auto-config.
– Granularity/Service Chaining: I think you can do some of this with SR, but it’s really not its intended use case — a bit more on this later.
– Isn’t MPLS dead?: Heh… yes? No? Obviously it’s not dead in the service provider world, and likely won’t be for a long long time. In the data center… mostly dead is maybe fair? I can say I personally don’t see much/any MPLS in the DC at least. I think that part of why I was bringing this up before was because I was thinking more about an Enterprise DC (as that’s my day-to-day focus). I think you could absolutely use SR in an enterprise DC, but I don’t think it’s really the best tool for that job. If you take a look at who presented though, you’ll see that while these are “Enterprises” (well, plus Comcast as a Service Provider), they’re freaking huge, and they’re really their own SPs doing SP type things. (MS is using this in the DCs, but in a very hyperscale/SP type way.)
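
To make the “globally significant SIDs” point from the list above a bit more concrete, here is a tiny Python sketch of how a node SID turns into a label. It assumes every router in the domain uses the same SRGB (label block) and that each router advertises an index for its loopback; the base value 16000, the router names, and the index numbers are all made up for illustration, and penultimate-hop popping is ignored to keep things simple.

```python
# Hypothetical values for illustration only: the SRGB base, router names and
# SID indexes below are assumptions, not taken from any of the presentations.
SRGB_BASE = 16000  # assumed to be identical on every router in the SR domain

# Node SID index each router advertises for its loopback via the IGP.
NODE_SID_INDEX = {"R1": 1, "R2": 2, "R3": 3, "R4": 4}


def sr_label(dest, srgb_base=SRGB_BASE):
    """Label used to reach `dest`: SRGB base + the index `dest` advertises."""
    return srgb_base + NODE_SID_INDEX[dest]


def labels_along_path(path, dest):
    """Label seen at each hop in `path` for traffic headed to `dest`."""
    # With a shared SRGB every hop derives the same label, unlike LDP where
    # each router hands out its own locally significant label.
    return [sr_label(dest) for _ in path]


if __name__ == "__main__":
    # prints [16004, 16004, 16004]
    print(labels_along_path(["R1", "R2", "R3"], "R4"))
```

Seeing the same number at every hop is the whole troubleshooting win: you no longer have to walk router by router translating one local label into the next.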

Alright so I guess that addresses the points from my previous post, now on to a bit more wordy words to recap my thoughts on SR.

I feel like SR is kind of a no-brainer in the SP/WAN world; it really does just seem like a way better way to do MPLS. You’ll still have to layer “stuff” on top of the SR bits (vpnv4/6 address family type stuff or whatever it is you’re running atop your MPLS), but SR just makes the rest seem so trivial. TE just got owned also… seems like there is basically no point to TE as we know it today if you can just use SR-TE to make your life so much easier. All that is well and good, but I don’t really live in a provider-centric world. I focus on data centers, so…
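
Here is a rough Python sketch of why SR-TE feels so much lighter than classic TE: the head-end just attaches an ordered list of segments, and each node segment means “take the normal shortest path to this node”. The topology, node names, and the BFS standing in for the IGP are all assumptions for illustration; real implementations work on label stacks and IGP metrics rather than hop counts.

```python
from collections import deque

# Hypothetical topology with unweighted links, for illustration only.
LINKS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "E"],
    "D": ["B", "C", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}


def shortest_path(src, dst):
    """Plain BFS standing in for the IGP's shortest-path computation."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nbr in LINKS[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(path + [nbr])
    raise ValueError(f"no path from {src} to {dst}")


def forward(src, segment_list):
    """Return the hops a packet takes when steered by an ordered segment list."""
    hops = [src]
    current = src
    for segment in segment_list:
        # Each node segment is followed over the shortest path to that node,
        # then dropped from the list once the packet arrives there.
        leg = shortest_path(current, segment)
        hops.extend(leg[1:])
        current = segment
    return hops


if __name__ == "__main__":
    print(forward("A", ["F"]))       # default path, one segment
    print(forward("A", ["E", "F"]))  # steered via E by adding one segment
```

The only place the “tunnel” exists is in the segment list chosen at the head-end; the transit nodes in this sketch keep no per-path state at all.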

While I am now a fan of SR, I feel like it doesn’t have a place in the data center. I know that the folks working on it will probably disagree, and I would like to agree with them, but I can’t at this point. The current biggest challenge in the data center (at least at normal enterprise scale) is that we still have to have L2 in some capacity. This is a super super super lame requirement, but it is what it is. This requirement is the reason we have janky spanning-tree kludges (MLAG, vPC, VSS, etc.), FabricPath/TRILL, OTV, VPLS, and now VXLAN. Now from what I understand, there isn’t technically any reason you couldn’t use SR w/ some AToM or VPLS (maybe PBB?) to provide L2 over L3 in the data center, but that sounds like a freaking headache. VXLAN has pretty much won the DC overlay wars, and I don’t see any reason to introduce SR into the DC. Between data centers SR certainly could have a role in providing transport services, or even L2 extension. Even then, though, as VXLAN continues to mature and grow into that role, it doesn’t feel like it’s worth it to tack on another protocol/feature to support that requirement. If SR was the panacea for service chaining that I was kinda hoping it would be, then perhaps I’d feel differently. So at this point, given our stupid “requirements” for L2, I think SR should/will likely stick to the WAN/hyperscale folks. There’s nothing wrong with that of course, but I do feel like it’s important to delineate where SR is best suited (at least in my mind!).

 

PS – Go watch Paul Mattes’ presentation (Microsoft). They’re using link-bandwidth in BGP, which has always seemed to me to be the best kept secret of BGP. I was very excited to hear they’re really taking advantage of it in production. I rarely see it, so that was fun. /end nerdgasm

Pre-TFD Segment Routing Roundtable Thoughts

Next week I will be attending the Tech Field Day Segment Routing Round Table (that was a mouthful) in San Jose. As is clearly evident from the title of the event, I can only imagine we will be discussing Segment Routing. At this point my exposure to Segment Routing is limited to a few blog posts and a few YouTube videos just to get the lay of the land, so I’m very excited to go and hear a lot more details about this beast from some super smart people.

I figured I would just take this opportunity to jot down some thoughts/questions/comments about SR at this point, however unintelligible/ignorant:

  • My first thought is WTF ever happened to Network Services Headers?? My maybe not great summary of SR is that it’s, in a nutshell, source-routing + way way way easier MPLS-TE. The end goal of doing source routing and traffic flow manipulation could be to route certain traffic over certain paths of course, or it could be to route traffic through transparent devices like an IPS or something, or even to an active IPS/firewall/ADC/etc. Wasn’t NSH going to solve all of this?
    • Follow up thought/comment: I can totally see NSH being near impossible to implement since I would imagine we would be relying on applications/hosts to insert appropriate information into the header. I suppose SR is easier to implement/more realistic as we are handling control of this at the network layer.
  • Oh man, this is going to be a config nightmare. It seems like this could easily spiral out of control into a massive unmanageable config (obviously depending on how granular you want to be I guess). If we are going to do some of the same stuff NSH is/was intending to do (L4-7 redirection more or less) then I can imagine SR configs are going to get nutty… that leads to the next question:
  • How granular can we be? If we are going to do some of the flow redirection stuff, how do we classify flows? What I’ve seen so far is source prefix X.X.X.X gets a label that means it goes to point A, then from point A to point B etc., which is cool, but that’s a whole prefix. What if we wanted to redirect only HTTP/HTTPS traffic? Possible?
    • Part of my question/concern here is that one of the biggest issues I personally see right now is how and what traffic do you shove down your firewall/load balancer… those devices (because of the complicated stuff they are doing) will never be able to handle the same traffic load as a Clos 100G network… just not happening, so it would be really really nice to be able to redirect only the things I’m interested in seeing over to my firewall. This is IMO the biggest (only, if I’m being snarky) reason NSX is powerful: the PAN integration (and I think now CheckPoint?) is a big deal, because distributed, in-software, selective firewall-ing is freaking hard. (Note that you can do stateful in-kernel firewalling w/ AVS as well, just not as advanced as PAN+NSX.)
  • If I already have MPLS/TE, why bother? I say that for a few reasons:
    • Isn’t MPLS dead? I feel @etherealmind screaming at us about how nobody uses MPLS anymore 🙂
    • I feel like this is maybe most useful in the data center (where that redirection would be needed quite heavily), but will it be supported there? Would I even want it there? I feel like we already have a lot of flexibility/tools to do this in the DC
    • If you already have MPLS/TE investment, this certainly seems WAY easier, but can it fully replace TE (bandwidth reservation, class-based tunnel selection, etc.), and if so, how can you/will you migrate toward it?
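
Purely to illustrate the granularity question in the list above, here is a small Python sketch of what per-flow (rather than per-prefix) steering could look like. Everything in it is hypothetical: the node names, the port list, and especially the assumption that the head-end can match on L4 ports before choosing a segment list. It is not a claim about what any SR implementation actually supports.

```python
from dataclasses import dataclass

# Hypothetical names and values, chosen only to illustrate the question.
FIREWALL_PATH = ["FW1", "EGRESS"]  # detour through a firewall node segment
DEFAULT_PATH = ["EGRESS"]          # plain shortest path to the egress node
INSPECTED_PORTS = {80, 443}        # HTTP/HTTPS


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_ip: str
    protocol: str
    dst_port: int


def segment_list_for(flow):
    """Pick a segment list per flow instead of per source prefix."""
    if flow.protocol == "tcp" and flow.dst_port in INSPECTED_PORTS:
        return FIREWALL_PATH
    return DEFAULT_PATH


if __name__ == "__main__":
    web = Flow("10.0.0.1", "10.0.1.1", "tcp", 443)
    ssh = Flow("10.0.0.1", "10.0.1.1", "tcp", 22)
    print(segment_list_for(web))  # goes via the firewall
    print(segment_list_for(ssh))  # skips the firewall
```

If something like this were possible, only the web traffic would ever touch the firewall, which is exactly the kind of selective redirection I’m hoping to hear about.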

That may have come off a bit grumpy about SR, but that’s certainly not the intent (yet! Wait till after the Round Table hah!). I just want to make sure I’m fully understanding this beast, as I have for sure heard a lot of cool things about it. A friend in the Seattle area actually messaged me after I tweeted about going to this event to tell me how much he loves SR and how his customers are jumping all over it! I’m very much looking forward to learning more about SR next week, so tune into the live stream and poke us (the delegates) on Twitter so we can harass presenters accordingly 🙂 See you next week!

You can find out more about Tech Field Day here.

Disclaimer: Tech Field Day is being super cool and flying me down to San Jose for this event, probably even buying me some beer if I’m well behaved… just a heads up.