A Story From Another Century
A few years after its initial release, I had the opportunity to discuss the development of Jini (more recently known as Apache River) with one of the lead software engineers on the project. Jini has a very complex distributed network architecture, and making it work was no trivial feat. As he told me the story of creating the software, the developer wasn't bragging about the technology; he was bragging about how smoothly the development went. The key factor was an intensive upfront design process.
The Development Process
It took the software engineering team something like three months of design work before anybody got to write even the first line of code. The story goes that after defining the requirements (and, I imagine, developing use cases) the team sat around conference tables testing the design by simulating the system. (Note I said testing the DESIGN, not testing the software.) Every developer in the room would represent an agent running on the system, and the table served as the network. To send a "packet" of data, you would write the details on a post-it note and put it on the table to be delivered. As packets arrived, people would discuss what they got and what they would do with it. Missing or incorrect data was a sign that the protocol needed to be refined. Unexpected behavior would result in changes to the definition of a call. Later on, as the design matured, the testing got more elaborate. Some post-it notes might be removed from the table before delivery, or re-ordered, to test resilience in the face of a misbehaving network. (These kinds of things are critical in a distributed application because there is no such thing as a network that is 100% reliable.)
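The tabletop exercise described above can be mimicked in a few lines of code. The following is only an illustrative sketch, not anything the Jini team used: the names `Note` and `Table`, and the `drop_rate` and `reorder` parameters, are assumptions made up for this example. The "table" holds notes until they are delivered and can optionally lose or shuffle them, the way a misbehaving network would.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """One post-it note: a hypothetical stand-in for a packet."""
    sender: str
    recipient: str
    payload: str


class Table:
    """The conference table acting as the network (illustrative only)."""

    def __init__(self, drop_rate=0.0, reorder=False, seed=0):
        self._notes = []
        self._rng = random.Random(seed)  # seeded so a session can be replayed
        self.drop_rate = drop_rate       # fraction of notes lost before delivery
        self.reorder = reorder           # whether notes may arrive out of order

    def send(self, note):
        """Put a note on the table."""
        self._notes.append(note)

    def deliver(self):
        """Hand over everything on the table, minus any notes that were lost."""
        notes, self._notes = self._notes, []
        survivors = [n for n in notes if self._rng.random() >= self.drop_rate]
        if self.reorder:
            self._rng.shuffle(survivors)
        return survivors
```

With the defaults the table is a perfect network; raising `drop_rate` or setting `reorder=True` corresponds to the later, more elaborate sessions where notes were removed or re-ordered.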
Only after the design was complete did anyone start writing the code. Writing the code was the easy part, and it went remarkably smoothly. Everybody knew what they had to do and got most of their individual debugging done with unit tests. When the pieces were ready, integration testing revealed a system that mostly worked the first time they tried it. Was it perfect? Of course not, but there was remarkably little pain. The initial coding time for the preliminary implementation was closer to a third of the time required.
The key thing that made this possible was that testing the design resulted in the creation of a very well-defined API that everybody on the team understood. There was a mandate about how the interfaces would behave: "If I send you this, here's how you should respond", "If you send me that, here's what I'm going to do", "If I don't get the response I'm expecting, I'll do this", and so on. These are software contracts that ensure that all the pieces of a system will interact smoothly. These contracts, or invariants, ensure the predictability of a system and promote its usability.
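To make the idea of a contract concrete, here is a minimal sketch of the three kinds of statement quoted above. Everything in it is hypothetical: `lookup`, `call_with_retry`, and the shape of the request and response dictionaries are invented for illustration and are not taken from Jini.

```python
from typing import Callable, Optional


def lookup(request: dict) -> dict:
    """Responder's side of the contract.

    "If you send me a request with a 'name', I answer with status 'ok'.
    If you send me anything else, I answer with status 'error'.
    I never raise an exception."
    """
    if not isinstance(request, dict) or "name" not in request:
        return {"status": "error", "reason": "missing name"}
    return {"status": "ok", "name": request["name"]}


def call_with_retry(send: Callable[[dict], Optional[dict]],
                    request: dict,
                    attempts: int = 3) -> dict:
    """Caller's side of the contract.

    "If I don't get the response I'm expecting, I retry up to `attempts`
    times and then report a well-defined error."
    `send` returns None when no response arrived.
    """
    for _ in range(attempts):
        response = send(request)
        if response is not None:
            return response
    return {"status": "error", "reason": "no response"}
```

Because each obligation is written down as observable behavior, it can later be checked with an ordinary unit test, for example by passing `call_with_retry` a `send` function that returns `None` a fixed number of times.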
What is Old is New Again
At a client site last week, I was retelling this story to a DevOps colleague of mine. We're working together on automating a new multi-account AWS environment that will rely heavily on Control Tower and CloudFormation. The significance of the story comes from our mutual desire to build in a way that ensures we get the fundamental design points right from the very beginning of our deployment. If we get something important wrong, it will be very difficult to go back and change it: schedules will slip, security could suffer, and the engineering team that we're encouraging to migrate into our new environment may decide that the hassles and risk are too great to bear. In short, an early failure could lead to abandonment of a project that we believe is critical for the ongoing growth of the company.
The design is elaborate, and we know that we'll need some custom resources in our CloudFormation Templates (CFTs) to take care of details that are not currently offered by CloudFormation or Control Tower.
For example, networking between accounts will be done using a Transit Gateway (TGW). When Control Tower vends (i.e. creates) a new account, it will deploy a CFT in the new account that creates the network resources needed by that account (VPC, subnets, etc.). Since we want the process to be fully automated, the understanding is that the new account is responsible for installing routes into various Transit Gateway route tables to enable access back to itself. However, the TGW isn't in the same account, and there aren't any native resources that you can put into a CFT that will allow it to update a route table in a different account. So, we need to define a custom resource that takes care of that. The expected behavior of said resource needs to be clearly (and correctly) defined, or networking between accounts in the new environment won't work (or will have to be set up manually... yuck!). The behavior I'm describing is tantamount to a software contract.
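As a rough illustration of what the contract for such a custom resource might look like, here is a minimal sketch of the core of a Lambda-style handler. It is an assumption-laden example, not our actual implementation: the function name `handle_route_request` and the property names `RouteTableId`, `DestinationCidrBlock` and `AttachmentId` are hypothetical. The `ec2` argument is assumed to be an EC2 client that is already authorized in the account that owns the TGW (for instance through an assumed role). Only Create and Delete are covered; Update handling and the reply that CloudFormation expects at the event's response URL are left out.

```python
def handle_route_request(event, ec2):
    """Apply one CloudFormation custom-resource event to a TGW route table.

    Contract (hypothetical):
      Create -> the route exists afterwards; returns its physical ID.
      Delete -> the route is removed; returns the same physical ID.
      Anything else -> ValueError, so the failure is explicit.
    """
    request_type = event["RequestType"]
    props = event["ResourceProperties"]
    table_id = props["RouteTableId"]          # hypothetical property name
    cidr = props["DestinationCidrBlock"]      # hypothetical property name

    if request_type == "Create":
        ec2.create_transit_gateway_route(
            TransitGatewayRouteTableId=table_id,
            DestinationCidrBlock=cidr,
            TransitGatewayAttachmentId=props["AttachmentId"],  # hypothetical
        )
    elif request_type == "Delete":
        ec2.delete_transit_gateway_route(
            TransitGatewayRouteTableId=table_id,
            DestinationCidrBlock=cidr,
        )
    else:
        # Update is deliberately not handled in this sketch.
        raise ValueError(f"unsupported request type: {request_type}")

    # A stable ID lets CloudFormation match a later Delete to this route.
    return f"{table_id}|{cidr}"
```

Passing the client in as an argument keeps the contract testable: a unit test can hand the function a fake client that records calls, without touching a real AWS account.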
So, we've got the requirements, and our use cases show how much better the new environment could be, but we both agree that the way to make progress is to ensure, as the Jini team did, that our design is carefully tested. We're breaking everything down and walking through it in detail. The result of our work has been a set of compartmentalized items that should be easy to code. After coding, it should also be easy to use unit tests to verify that the obligations of the software contracts are being met. As the design is finished, early indications are that the technique is working: so far, the coding has been easy and it seems to be meeting expectations.
My colleague and I are in agreement about the importance of this design-first methodology, and we're working to demonstrate its importance to other members of the DevOps team. In these days of infrastructure as code, it's not enough to write a configuration like a programmer; you have to write it like a software engineer.
If you would like to discuss how DevOps teams can learn from software engineers and their methodologies, please contact us.
Jini was first released in the late 1990s, so the conversation I'm basing this article on took place not quite two decades ago. I'm recounting it to the best of my ability, but I can't swear that I have all of the details exactly right. What I can tell you is that the gist is correct. Any inaccuracies in the description are my own fault.