Friday, June 28, 2013

SilentCircle and technical debt

I hate to beat up on Silent Circle -- when they come for our crypto, it'll be Silent Circle testifying in the Senate defending our freedoms (that's why I give them money). However, their software is horribad.

That was on display this week with Mark Dowd's discovery of 0day vulns in the ZRTP library used by Silent Circle and others. That Silent Circle has vulns isn't the problem; vulns happen to the best of companies. The problem is Silent Circle's response to the vulns, which has been craptastic. They've done a poor job informing customers of the problem (a traditional press release and a Twitter update are needed). That they could not immediately release a fix demonstrates that their software has enormous technical debt.

The phrase "technical debt" is misused/misunderstood by the cybersec community to mean that companies should spend more effort getting rid of 0day vulns. This is nonsense. Instead, technical debt is about how companies respond to 0days.

Here is the way that a company like SilentCircle should function. I (a coder) should be able to go into their source code, make a few changes to fix the Mark Dowd 0days, and hit the "go" button. A couple hours later, the Android/Apple updates should hit the appropriate stores.

They can't do that, because of their technical debt. Among the things they lack is automated testing. Before a new build of the product can be shipped to customers, it must go through manual testing -- a process that can take days.
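As a rough illustration of what replaces that manual step, here is a minimal Python sketch of an automated regression gate. Everything in it is hypothetical: `parse_version` merely stands in for real product code, and `REGRESSION_CASES`, `run_regression`, and `ok_to_ship` are invented names. It is not a description of Silent Circle's or anyone else's actual build system.

```python
# Hypothetical function under test; it stands in for real product code.
def parse_version(text):
    """Parse 'major.minor.patch' into a tuple of ints."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected three fields, got {len(parts)}")
    return tuple(int(p) for p in parts)


# Each case pairs an input with the output a known-good build produced.
# When a bug is fixed, a case reproducing it is added here, so the bug
# cannot quietly come back in a later build.
REGRESSION_CASES = [
    ("1.8.3", (1, 8, 3)),
    (" 2.0.10\n", (2, 0, 10)),
    ("0.0.1", (0, 0, 1)),
]


def run_regression(func, cases):
    """Return (input, expected, actual) for every case that fails."""
    failures = []
    for given, expected in cases:
        try:
            actual = func(given)
        except Exception as exc:  # a crash counts as a failure too
            actual = exc
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


def ok_to_ship(func, cases):
    """The 'go' button: ship only if no recorded case fails."""
    return not run_regression(func, cases)


if __name__ == "__main__":
    print("ship" if ok_to_ship(parse_version, REGRESSION_CASES) else "hold")
```

The point of the sketch is that the ship/hold decision is made by a program, not by a person clicking through the product for days.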

The term "technical debt" is the measure of how far your engineers are from an ideal engineering process. It encompasses a lot of things, from version control, to documentation, to modularity. The further an engineering organization is from the ideal, the more costs it incurs maintaining code. The culture of most companies is to tolerate some technical debt. For example, most engineering organizations have some automated testing, but very few can ship a product update with no manual involvement.

Google can do this. In the latest pwn2own contest where hackers discovered vulnerabilities in Chrome, an update was shipped to all customers within hours. Chrome is a massive, complicated product, but fixes can be shipped in quasi-real-time because the entire testing process is automated instead of manual.

Likewise, I accomplished this with my own software. Back in the day with BlackICE, we could make changes and ship to customers within hours.

In the short term, creating automated regression tests is expensive. In the long term, they pay off enormously. That "long term", though, is often "within the next year".

In 1998, I left Network Associates on a Monday to start my own company. Tuesday morning I woke up, and started writing my regression test for BlackICE. This was before the first line of real code for BlackICE was written -- this was even before the design process started. Indeed, the needs of the regression test largely dictated the design.

This was a difficult decision. The company was self-funded (by me and two friends). Every day late shipping the product meant more money flowing from my pocket. I wanted more technical debt -- I was eager to make sacrifices that would incur costs in the long term in order to ship the product earlier, to start getting revenue, to stop the drain on my life's savings. Once revenue started coming in, we could repay the technical debt. So why waste time on the regression test? Because it starts paying off on the time horizon of 12 months. Spending the extra week at the start creating the regression test meant being able to meet the ship target in 12 months rather than 13 months.

SilentCircle didn't do the same at the start, and will find paying back this technical debt difficult. SilentCircle is still at the stage where all risks are existential. They don't know if they can create a sustainable business. Effort spent now creating a fully automated regression test means missing development targets, which means lack of sales, which means possibly going out of business before that regression test can pay off. I can see what's wrong with their company, but I have no clue how to fix it.

I'm critical of the proprietary software firm SilentCircle here, but the same applies to open-source. The ZRTP library that everyone uses has instructions on how to build the software, but nothing on how to regression test it. As far as I can tell, it has no automated regression test for the library. That's just stupid: no security library should exist without automated regression tests.

Conclusion

The purpose of this post was two-fold. The first was to talk about SilentCircle's problems, which are instructive to any startup in the cybersec industry -- technical debt with a 12-month payback should be solved first. The second was to talk about technical debt, a term that is rarely (if ever) used correctly in the cybersec industry -- technical debt is not about reducing the number of bugs, but about how bugs are addressed once they are found.




Update: At BlackICE, we'd never heard of the term "technical debt". We called it "code deficit". At the start of the project, we avoided code deficit. As the deadline neared, we started piling on the deficit.

We had three time frames: 1 year (the ship date), 3 years (sell the company date), and death (when we no longer cared). For example, we discussed what to do about 'time_t', the 32-bit time variable that expires in the year 2038. Since that's after we retire, we didn't care, so we added that to our list of code deficit. On the other hand, we spent a little (not much) effort on Y2K, because that was within the 3-year window before selling the company.
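For readers wondering where 2038 comes from, the arithmetic can be checked with a short Python sketch. It assumes 'time_t' is a signed 32-bit count of seconds since the Unix epoch (1970-01-01 UTC); the function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_INT32 = 2**31 - 1  # largest value a signed 32-bit integer can hold


def to_utc(seconds):
    """Convert a seconds-since-epoch count to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


def wrap_int32(value):
    """Simulate two's-complement wraparound of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


if __name__ == "__main__":
    # Last moment a signed 32-bit time_t can represent.
    print(to_utc(MAX_INT32).isoformat())  # 2038-01-19T03:14:07+00:00
    # One second later the counter wraps to the most negative value.
    print(to_utc(wrap_int32(MAX_INT32 + 1)).isoformat())  # 1901-12-13T20:45:52+00:00
```

So the counter runs out on 19 January 2038, and code that keeps counting ends up in 1901.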

So, 15 years later, IBM is still selling the product, and has since had to pay back the code deficit we created in 1998.

Update: At BlackICE, the frequent updates would get pretty obnoxious. While on the phone with a customer, I'd fix a bug, check it in, run the full regression, and ship it directly to the customer. This forced us to create weird version numbers like 1.8.3ebj, where the "version" indicated changes in major features, while the final letters were just the build number, which incremented several times a day.
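The post does not say how the letters were derived from the build number, so the following Python sketch is only one plausible scheme, assumed for illustration: a fixed-width base-26 counter with 'a' as zero. The function names are hypothetical, and this is not a claim about how BlackICE builds were actually numbered.

```python
import string

ALPHABET = string.ascii_lowercase  # 'a' = 0 ... 'z' = 25 (assumed encoding)


def build_suffix(build_number, width=3):
    """Encode a build counter as a fixed-width base-26 letter suffix."""
    if not 0 <= build_number < len(ALPHABET) ** width:
        raise ValueError("build number does not fit in the suffix width")
    letters = []
    for _ in range(width):
        build_number, digit = divmod(build_number, len(ALPHABET))
        letters.append(ALPHABET[digit])
    return "".join(reversed(letters))  # most significant letter first


def suffix_to_build(suffix):
    """Decode a letter suffix back into the build counter."""
    number = 0
    for letter in suffix:
        number = number * len(ALPHABET) + ALPHABET.index(letter)
    return number


def format_version(feature_version, build_number):
    """Join the feature version and the build suffix, e.g. '1.8.3' + 'ebj'."""
    return feature_version + build_suffix(build_number)


if __name__ == "__main__":
    print(format_version("1.8.3", 2739))      # 1.8.3ebj
    print(format_version("1.8.3", 2739 + 1))  # 1.8.3ebk
```

Under this assumed encoding, three letters give 26^3 = 17,576 builds before the suffix overflows, which is plenty of room for a counter that ticks several times a day.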

1 comment:

Matt said...

Great post. It is reprehensible that companies or projects release untested code. Especially security software.

The company I work for has a (non-security, 3-tier) business application that was started in 1998. No tests were written at the time. Much of the application is now untestable. We are re-architecting it to make it testable, but that will take years.