"This post is going to talk about managing networks in a whole new way. The concepts in it will change your life. They changed mine."
Ever since then I've been working in some way or another to bring the Awesome to networking. In this post I am going to look back at the sysadmin journey, and look forward to the journey ahead for networking. Awesome, in a business context, is the representation of how quickly the business can go from having an idea to making money on that idea. As a friend said: going from the AH-HA to the CHA-CHING. As that relates to networking, Awesome is the representation of how quickly the network team can provide network services to keep the business idea making money. That is, going from the AH-HA to the PING to the CHA-CHING.
Most people in networking are likely at point A — expending the maximum amount of effort for the least amount of awesome. Everyone wants to be at point Z — expending the least amount of effort for the maximum amount of awesome.
What does the journey to Awesome look like for networking? To better understand our future, we can study the past. Let's briefly reflect on the history of DevOps. Before DevOps, "a long time ago", sysadmins were manually installing servers. Some may have written Bash or Perl scripts to help automate basic installation tasks. (That sounds about where we are in the networking industry.) The history of DevOps is not my story to tell, but you can find a great, short video here. What I find most interesting is that DevOps, as a named movement, is only about seven years old. The inception is attributed to Patrick Debois back in 2007. Tracing this history of DevOps, we can see that something important happened: a turning point that launched the transformation.
- People began to think differently. It was a complete paradigm shift to think of managing server infrastructure configuration (packages, files, services, users, etc.) the same way an application developer thinks of managing code. This different way of thinking was called "Infrastructure as Code". But just thinking and talking about it was not enough.
- People were able to practice this new way of thinking. New force-multiplier technologies came on the market. Tools like Puppet and Chef gave sysadmins the ability to practice this new way of thinking. Puppet was founded back in 2005, and it wasn't until six years later, in 2011, that the company had its first commercial product. Chef, another company in this space, was founded in 2009. Both Puppet and Chef took inspiration from an even earlier project, CFEngine, which started as far back as 1993.
Gartner first reported on the rise of DevOps in 2011, predicting the start of mainstream adoption in 2015. Somewhere in 2015 the DevOps movement appears to have made the turn from niche to mainstream. An article in the Wall Street Journal suggests that 2016 will be the breakout mainstream year.
I would submit that it is very important to keep in mind that this transformation from manually installing, configuring, and operating servers to "The Awesome" was a long journey of iteration — not only in different ways of thinking, but advancement in tools and technology that made it humanly possible.
The networking industry is reenacting the history of the DevOps movement. As a guidepost I would submit we need to find our own turning point. We must find our new way of thinking about managing the network, and create new force-multiplier technologies that are focused on the needs of the networking team. But before suggesting what this new way of thinking might be, we should first review how we go about networking today.
While many people in networking would suggest that their network as a whole is a "unique snowflake", many of the specific parts of the network are based on well-known reference architectures. Ten years ago, for example, we were building data centers using a "Three-Tier Core/Agg/Access" architecture, and now we are using "IP Clos" based architectures. So as a general practice for building and managing a network, the network team would use the following types of documents:
- Architectural Reference and Design Guide - This document would outline the purpose and benefits of the architecture. The document would then cover specific design decisions that the network engineer would need to make within the context of the architecture that would align to the specific needs of the business. From these sets of design decisions the network team would formulate one or more specific reference designs.
- Implementation and Operations Guide - This document would start from a specific reference design, and then provide the specific configuration steps needed to implement that design in the network. In addition to the specific configuration steps, the document would describe the specific "show" commands that the network engineer would use to ensure that the network was in fact running as expected and could provide the network service. Depending on the design, there may also be specific tasks outlined, for example "how to add a new leaf switch"; these are often referred to as maintenance tasks. Each of these types of tasks has its own specific set of configurations and operational "show" validation steps.
- Troubleshooting Guide - This document would, in the context of a specific reference design, provide the necessary steps to identify issues (i.e. "show commands") and provide procedures to remediate problems. These procedures may be a specific set of configuration changes on a number of devices, executing more show commands, making additional configuration changes … repeat.
- Best Practices Guide - This type of document provides the network engineer with specific recommendations. Often these have to do with scaling and capacity issues. For example, a best practice may be to utilize an uplink on a specific device running a specific version of code at no more than 80% due to "reasons".
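A guideline like the 80% uplink example is the kind of rule that can be written down once and checked by a program instead of being remembered from a document. The following is only a minimal sketch of that idea in Python: the `Uplink` class, the `over_threshold` function, and the device names and traffic figures are all made up for illustration and do not come from any vendor guide or product.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    """Hypothetical record of one uplink and its current load."""
    device: str
    interface: str
    capacity_bps: float
    traffic_bps: float

    @property
    def utilization(self) -> float:
        # Fraction of the link capacity currently in use.
        return self.traffic_bps / self.capacity_bps


def over_threshold(uplinks, limit=0.80):
    """Return the uplinks whose utilization exceeds the best-practice limit."""
    return [u for u in uplinks if u.utilization > limit]


# Illustrative values only: two 10 Gb/s uplinks on an imaginary leaf switch.
uplinks = [
    Uplink("leaf1", "uplink1", 10e9, 7.0e9),
    Uplink("leaf1", "uplink2", 10e9, 8.5e9),
]

violations = over_threshold(uplinks)
for u in violations:
    print(f"{u.device} {u.interface}: {u.utilization:.0%} exceeds the limit")
```

The limit is a parameter rather than a constant because, as noted above, such recommendations tend to depend on the specific device and version of code.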
A network team would need to sift through hundreds, if not thousands, of pages of documentation in order to go from "AH-HA" to "PING". Matters get much worse if they want to work with different network vendors, because each document set is specific to one vendor. The amount of complexity and minutia that a network team needs to manage is simply staggering. It is, in fact, on the same scale as being both an infrastructure engineer (comparable to a server/DevOps engineer) and an application developer, since networking protocols are effectively distributed multi-vendor applications. In general, it is the operational complexity of the network that keeps a customer locked into a specific equipment vendor.
If we think about how the above documents are used, they map loosely onto three basic types of network engineering effort: Composition, Construction, and Consumption. When the network team is designing a network, making use of the Architectural Reference and Design Guides, they are composing the design of the network service: they are making decisions that in turn determine what is needed to actually create the network. Composition is one of the most difficult intellectual processes given the complex nature of networking services. Even when Architectural Reference and Design Guides are provided, the magnitude of information and design choices can be overwhelming.
When the network team builds the configurations, creates artifacts like cabling spreadsheets, identifies specific values like IP addresses, ASNs, VLANs, etc., and creates operational procedures for verifying services, they are constructing the network service.
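To make the Construction step concrete, here is a small sketch of deriving specific values from a composed design. It assumes a simple leaf/spine design in which every spine connects to every leaf, each device gets its own ASN counted up from a base value, and each link gets a /31 carved from one address pool. The function name `construct_fabric`, the device names, the pool, and the ASN base are all hypothetical choices for illustration, not a description of any particular tool.

```python
import ipaddress
from itertools import islice, product


def construct_fabric(spines, leaves, link_pool, base_asn):
    """Derive per-device ASNs and per-link /31 addresses for a leaf/spine design."""
    # One ASN per device, counted up from the base (an assumed numbering scheme).
    asns = {name: base_asn + i for i, name in enumerate(spines + leaves)}

    # One point-to-point /31 per spine-leaf pair, taken from the pool in order.
    pairs = list(product(spines, leaves))
    pool = ipaddress.ip_network(link_pool)
    subnets = list(islice(pool.subnets(new_prefix=31), len(pairs)))
    if len(subnets) < len(pairs):
        raise ValueError("link pool is too small for this design")

    links = []
    for (spine, leaf), net in zip(pairs, subnets):
        links.append({
            "spine": spine,
            "leaf": leaf,
            "spine_ip": f"{net[0]}/31",  # first address of the /31
            "leaf_ip": f"{net[1]}/31",   # second address of the /31
        })
    return asns, links


# Illustrative design: 2 spines, 3 leaves, private ASNs, a small link pool.
asns, links = construct_fabric(
    spines=["spine1", "spine2"],
    leaves=["leaf1", "leaf2", "leaf3"],
    link_pool="10.0.0.0/27",
    base_asn=65000,
)

for link in links:
    print(link["spine"], link["spine_ip"], "<->", link["leaf"], link["leaf_ip"])
```

The same derived values could feed a cabling spreadsheet or device configurations. The point of the sketch is only that these values follow mechanically from a handful of design decisions.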
When they put the network into production and perform the actual verification, the end-users of the network, i.e. the business applications, are consuming the network. At this point, the effort of the network team shifts to keeping the network operational so the business continues to make money.
At Apstra, we believe a new type of technology is needed that models the existing way networking teams build and manage networks. These tools need to be force-multipliers that deliver maximum Awesome with minimal human effort. We believe the way networking needs to think differently is to think holistically, vendor-agnostic, and distributed:
- Network teams need tools that provide computer-assisted design automation starting with a reference architecture concept, and that then holistically approach the management of the network services as a contextually-aware composition of configuration management, state gathering, monitoring telemetry, maintenance operations, reporting, analytics, etc.
- These tools need to be vendor-agnostic, meaning that the network team's user-experience for managing the network is the same, regardless of the underlying vendor-equipment/NOS.
- These tools need to internalize that the network service is a composition of many devices, often from different vendors, each playing a different role within the context of the service. As such, a network service is a distributed application, and the new technology must be designed to align with the distributed nature of both the underlying devices and the myriad of network protocols that compose the service.
- The most critical phase is Consumption, and the technology must provide Awesome capabilities to ensure the network is always ready for the business. These tools must give the networking team the ability to quickly identify that an issue exists before the end-user reports it ("Mean time to Awareness"), provide tools that quickly identify the context of the issue so specific action can be taken ("Mean time to Insight"), and provide maintenance tools that quickly assist in day-to-day operations.
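The two measures named in the last point can be tracked with very little machinery. The sketch below is one possible reading, not a definition from the post: it assumes each incident records when the issue started, when the team became aware of it, and when the team understood it well enough to act. It treats "Mean time to Awareness" as the average of start to awareness and "Mean time to Insight" as the average of awareness to understanding. The field names and timestamps are invented for illustration.

```python
from datetime import datetime
from statistics import mean


def mean_minutes(incidents, start_key, end_key):
    """Average elapsed minutes between two timestamps across all incidents."""
    return mean(
        (i[end_key] - i[start_key]).total_seconds() / 60 for i in incidents
    )


# Made-up incident records; the keys are assumptions for this sketch.
incidents = [
    {
        "started": datetime(2016, 1, 1, 9, 0),
        "aware": datetime(2016, 1, 1, 9, 4),
        "understood": datetime(2016, 1, 1, 9, 19),
    },
    {
        "started": datetime(2016, 1, 1, 13, 0),
        "aware": datetime(2016, 1, 1, 13, 10),
        "understood": datetime(2016, 1, 1, 13, 35),
    },
]

mtta = mean_minutes(incidents, "started", "aware")
mtti = mean_minutes(incidents, "aware", "understood")
print(f"Mean time to Awareness: {mtta:.1f} min")
print(f"Mean time to Insight:   {mtti:.1f} min")
```

Tracking both numbers separately shows whether effort should go into detection or into providing context once an issue is known.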
At Apstra we understand the magnitude of building this complex technology. We are a company solely focused on bringing more Awesome to the industry in a unique way. With the DevOps movement as our guide, we know there was a time before DevOps and there was a time after, and the industry is only going forward. The network industry is on the same journey. Our turning point is to take a new and different approach to managing the network.