I know a guy named Ben. Ben is a white-hat hacker who works to protect billion-dollar assets. The adversaries he defends against are hardcore.
Ben and I worked at the same place a few years back. He taught me a very important lesson about deployment security: Never trust a pre-compiled binary.
Ben had a justifiable fear that any pre-compiled binary could have the digital equivalent of the bubonic plague hidden deep in its bits. Once such a binary is stored in a company’s repo, it’s only a matter of time until the disease runs rampant through the hosting data center.
Ben thinks the best way to deploy code is to build all artifacts from scratch against a verified source, then put those binaries into a well-controlled enterprise repository. Also, you should make sure the binary is signed and the build information is documented in an auditable manner. Never trust the world outside the enterprise. There are just too many accidents waiting to happen.
It’s a good lesson, one that’s stuck with me to this day as I think about DevOps testing.
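As a minimal sketch of that idea (an illustration, not Ben's actual tooling), the check below refuses any artifact whose SHA-256 digest does not match the value recorded when it was built from verified source. The `manifest` dictionary and the function names are hypothetical stand-ins for whatever auditable build record an enterprise keeps. A digest comparison only proves the file is unchanged since the build; a real pipeline would also verify a cryptographic signature over the manifest itself.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, manifest: dict) -> bool:
    """True only if the artifact is listed in the manifest and its digest matches.

    `manifest` is a hypothetical mapping of file name -> digest recorded at build time.
    """
    expected = manifest.get(path.name)
    if expected is None:
        return False  # unknown artifacts are rejected, never trusted by default
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(sha256_of(path), expected)
```

Rejecting artifacts that are missing from the manifest is deliberate: the default is distrust, which is the point of Ben's lesson.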
I wish more companies were as concerned with safety and security as Ben. It’s been my experience that too many enterprises will sacrifice safety to save some time. Faced with a choice between delaying a deployment because of security concerns and releasing today to meet market demand, they often let a “get it out” agenda win.
That’s when the accidents happen. Just ask Target and Yahoo. Jeepers! A contractor working for the RNC left voter records publicly available without bothering to protect them with a username and password.
What keeps you up at night?
Back to Ben. As you can probably guess, Ben is my go-to guy when I want to get the latest on what’s happening in today’s world of security. During a recent chat, I asked, “What keeps you up at night?”
“Well, there are security practices that are concerned with nothing more than getting past the next audit. There are companies investing more in AI than in people, hoping that AI will solve all their security problems for fewer dollars. But the biggest thing that keeps me up at night is the code DevOps is writing for the infrastructure.
“Imagine this: What if I can get into your CI/Jenkins environment or your Git repo and put in a malicious piece of friendly-named code? What is the likelihood you’re going to discover the intrusion using nothing more than a visual inspection? Slim, I’d say. Tens, maybe a few hundred engineers have access to the deployment. On the face of it, you might assume the malicious code is just some plain-old developer code.
“Same with config DBs. Let’s say I don’t want to take down your system by destroying the data connection. Rather, I want to create a backdoor so I can have my way with things whenever I feel like it. So, I just get into the DB and add a few port numbers, which will allow me entry to a set of servers any time I want.
“Would that keep you up at night? It keeps me awake.
“A basic premise of DevOps is that we treat the infrastructure like code. Data storage, artifact deployment and environment provisioning are all objects to be manipulated. So we write scripts and programs to do all the work for us. But, as with any programming, there’s good code and there’s bad code. Faulty loops, poor memory allocation, buffer overruns, race conditions, crappy exception handling, bad cryptography: these are the things that can cause a security nightmare.”
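Ben's config DB scenario, where a few extra port numbers quietly appear, is the kind of change an automated comparison catches more reliably than a visual inspection. Here is a minimal sketch, assuming the configuration can be exported as a mapping of host name to open ports. The host names, the `APPROVED_PORTS` baseline and the function name are all hypothetical.

```python
from __future__ import annotations

# Hypothetical baseline, reviewed and kept under version control
APPROVED_PORTS = {"web-01": {443}, "db-01": {5432}}


def find_unapproved_ports(
    current: dict[str, set[int]], approved: dict[str, set[int]]
) -> dict[str, set[int]]:
    """Return, per host, the ports present in the config but absent from the baseline."""
    drift = {}
    for host, ports in current.items():
        # A host missing from the baseline has no approved ports at all
        extra = set(ports) - approved.get(host, set())
        if extra:
            drift[host] = extra
    return drift
```

Run on a schedule or as a deployment gate, a non-empty result would fail the build and leave a record in the audit trail. The check is only as trustworthy as the baseline, so the baseline needs the same protection as the code.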
I asked Ben, “So then, tell me, how do we address the infrastructure as code issue, in terms of security?”
“First we need to write code to test the code. Sounds strange, but that’s what we need. And that code needs to be tested against the unhappy path.
“Most programmers don’t think defensively; everything is a happy path. I take the opposite approach. I test for the exceptions. I test using bad data. I try to do SQL injection. I inject a string when an integer is expected. I want to see what happens. Test the edge cases. Do penetration testing. Attempt code injection. Do it all, and do it in a formal, verifiable, automated manner.
“Read 24 Deadly Sins of Software Security by Howard, LeBlanc and Viega. Also, become aware of projects such as Google’s OSS-Fuzz and Netflix’s Simian Army, a suite of testing tools that evolved from Chaos Monkey. These guys are dealing with security at scale. You don’t have to reinvent their wheel.
“The important thing is to have tests that are comprehensive and execute in a DevOps infrastructure that has a good audit trail. The code needs to be built in a trusted manner and that code needs to be properly signed.”
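To illustrate what testing the unhappy path can look like, here is a small sketch built around a hypothetical `parse_port` helper of the sort an infrastructure script might use on untrusted input. The helper, the `UNHAPPY_INPUTS` list and the `rejected` function are assumptions made for the example, not code Ben described.

```python
def parse_port(value: object) -> int:
    """Parse a TCP port from untrusted input, rejecting anything outside 1-65535."""
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise ValueError(f"unsupported port type: {type(value).__name__}")
    text = str(value).strip()
    if not text.isascii() or not text.isdigit():
        raise ValueError(f"port is not a plain integer: {value!r}")
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


# Bad data on purpose: strings where integers are expected, injection
# attempts, boundary values and wrong types
UNHAPPY_INPUTS = [
    "", "abc", "80; rm -rf /", "1 OR 1=1", "-1", "0", "65536", "8e3",
    None, 3.5, True,
]


def rejected(value: object) -> bool:
    """True if parse_port refuses the value with ValueError."""
    try:
        parse_port(value)
    except ValueError:
        return True
    return False
```

A test suite would assert that every entry in `UNHAPPY_INPUTS` is rejected and that ordinary values such as `"443"` still parse. Fuzzing tools extend the same idea by generating the bad inputs automatically.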
I sat silent. My mind wandered back to the past struggles I had at the programmer level getting more than one inexperienced project manager to give me the leeway to do adequate testing. Now, Ben is telling me that such rigorous, automated DevOps testing needs to exist at all levels, that infrastructure as code demands it.
It’s déjà vu all over again.
TDD and DevOps testing
I am a big supporter of test-driven development (TDD). I can’t imagine not doing TDD. Over the years, I have become a bit more flexible in my approach. Sometimes I’ll write a test that exercises the case I’m trying to implement before writing the code. Sometimes I’ll write the test after the code.
But I always write the test, and I always go for 100% code coverage. Also, I test for both the happy path and unhappy path. I design testable code. I hate getting a call at 3 a.m. from an upset production engineer claiming my code is blowing up.
So as I listen to Ben, it’s obvious that we need to apply the sensibility a TDD developer brings to programming for DevOps and DevOps testing. As we’ve learned in the field, testing is not something you do when you arrive at the destination; it’s something you do continuously throughout the journey. This means testing any automation artifact at the time of creation, whether it’s a small piece of code that merely compiles some source out of a GitHub repo or a monster script that configures containers for autoscaling under Kubernetes. DevOps testing needs to happen often and now, not sometimes and later.
As I mentioned earlier, TDD can be an organizational struggle. The perception is that TDD slows things down. Yes, TDD will slow things down if you are testing an artifact that was designed without testing in mind. Design infrastructure to be tested and your delays will diminish. Eventually, deployment speed will return to its usual velocity.
Still, many times perception wins over facts. The pressure to get product into production can make us cut corners. Too many times, security testing is one of those corners.
Snip, snip. Then one day, after cutting the corner and deploying the code, an enterprise wakes up to find that hackers have stolen all the credit card and Social Security numbers on record. The phones start going off at 3 a.m., and the lawyer gets a call at 9 a.m. The drama begins.
It’s the stuff that can keep you up at night. Ask Ben. He knows.