A 14-year-old could build 1998's Google using her Dad's credit card!

Enki 1/16/2012

Technology speeds everything up - more so in some areas than in others. While it's easy to see that erecting a 15-story building in two days is faster than it used to be done, it's harder to get a grip on the giant strides we're making in software construction. Nonetheless, software is probably the discipline with the most reliably accelerating productivity.

Just a few years ago software wasn't moving all that fast. In those days most programming time was spent squeezing the maximum performance out of CPU and memory, and programmers like John Carmack achieved lasting fame for their knack for teasing just a little bit more performance out of machines. And a few years before that, as prog21 points out, even writing a basic working spellchecker was a major achievement. You simply didn't have the memory to store a list of words. Never mind building a really good spellchecker - you'd be busy enough optimizing the basic case.
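To appreciate the contrast: today the basic spellchecker prog21 describes fits in a handful of lines, because holding a whole word list in memory is a non-issue. A minimal sketch in Python - the word-list path and the sample sentence are just placeholders:

```python
# A bare-bones spellchecker: load a word list into a hash set, test membership.
# /usr/share/dict/words is a common location on Unix systems; adjust as needed.

def load_dictionary(path="/usr/share/dict/words"):
    with open(path) as f:
        return {line.strip().lower() for line in f}

def misspelled(text, dictionary):
    """Return the words in `text` that aren't in the dictionary."""
    return [w for w in text.lower().split() if w.strip(".,!?") not in dictionary]

if __name__ == "__main__":
    words = load_dictionary()
    print(misspelled("Teh quick brown fox", words))  # -> ['teh'] on most word lists
```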

But whenever hardware advanced, it only took a jiffy till programmers would again use up all that extra power. Back then, software's progress was held back by the (comparatively) slow-moving physical nature of hardware. Computer networks, however, freed software from the shackles of hardware: if you built your software right, you could just add more servers as you needed them, instead of relying on manufacturers to provide faster and faster machines.

Google was among the pioneers of networked architecture and built its service on arrays of cheap commodity hardware from the start. But it wasn't until 2006, when Amazon introduced the Elastic Compute Cloud (EC2), that the chains of physicality were finally broken for everyone. Now you could add and remove machines from within your software, as needed. This development opened up a whole new way of thinking about software to every programmer on the planet.
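That "as needed" is worth dwelling on: provisioning stopped being a purchase order and became an API call. Here's a rough sketch of what that looks like with the boto library for EC2 - the AMI ID and instance counts are placeholders, and AWS credentials are assumed to come from the environment:

```python
# Sketch: growing and shrinking a fleet of EC2 machines from code,
# using the boto library (AWS credentials read from the environment).
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")

# Add capacity: launch a few small instances from a (placeholder) machine image.
reservation = conn.run_instances("ami-00000000", min_count=3, max_count=3,
                                 instance_type="m1.small")
instance_ids = [i.id for i in reservation.instances]
print("started:", instance_ids)

# ... run the workload ...

# Remove the capacity again once the work is done.
conn.terminate_instances(instance_ids=instance_ids)
```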

Three years later, in 2009, I proclaimed to a friend:

Today a fourteen year old could build 1998's Google using her dad's credit card!

To put that into perspective, there are three parts in play here: 1998, her dad's credit card, and the software engineering knowledge and tools required to build what Google built in 1998. Let's start with 1998: The internet was much smaller back then. By the end of 1998, Google had an index of about 60 million pages, and Larry Page's web crawler had been exploring the web since 1996. The total amount of data downloaded was probably in the ballpark of 200 gigabytes by the end of 1998. Storing 200GB of data on 34 bleeding-edge 6GB hard drives at $370 apiece would have cost about $12,580 in non-inflation-corrected dollars. That's without the hardware, electricity, or bandwidth needed to actually operate machines to fetch and store the data. In 2009 you'd have gotten a 1000GB drive for $74.99 and could have run it from a single computer, while people regularly download more than 200GB a month on unlimited cable broadband.

So getting the data had become cheap enough for our hypothetical teen's dad's credit card. Crunching the numbers would require a few more resources, but 1998's internet was small enough to be crunched on EC2 with her parents' credit card. Serving the data and answering queries on a popular website without feeling sluggish would also require a few more servers. Let's assume 30 servers would handle the peak load for a start (AltaVista was running off 20 multi-processor DEC 64-bit Alphas in 1998). At a not-the-cheapest-at-the-time $70 per server-month on Amazon, that would give us an estimated cost ceiling of $2100/month, with the expected cost being more along the lines of $500-$1000/month. Now I'm sure Dad wouldn't appreciate it, but for many people that's in the ballpark of what they pay for rent, and thus affordable.
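Spelled out as a back-of-the-envelope calculation (all figures are the rough ballpark numbers quoted above):

```python
import math

# 1998: storing a ~200GB crawl on bleeding-edge 6GB drives at $370 apiece
corpus_gb = 200
drives_1998 = math.ceil(corpus_gb / 6)       # 34 drives
storage_cost_1998 = drives_1998 * 370        # $12,580 for the drives alone

# 2009: a single 1000GB drive covers the whole corpus
storage_cost_2009 = 74.99

# 2009: serving queries - assume 30 servers cover peak load,
# at a not-the-cheapest $70 per server-month on EC2
serving_ceiling_per_month = 30 * 70          # $2100/month worst case
serving_expected_per_month = (500, 1000)     # the more realistic range

print(storage_cost_1998, storage_cost_2009, serving_ceiling_per_month)
# -> 12580 74.99 2100
```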

So theoretically speaking it could have been done in 2009. More than that, price-wise it probably could even have been done in 2006 without ever touching any hardware yourself. But of course money wouldn't have been the only difficulty for a hypothetical fourteen-year-old trying to build 1998's Google, whether in 1998 or even in 2009. Just because you can build anything you want in theory doesn't mean you know how to do so in practice.

But not only has hardware gotten cheaper and easier to handle (through software); during the same time our collective understanding of best-practice software engineering has progressed by leaps and bounds as well. Between 2006 and 2009 many more people became acquainted with highly distributed architectures. More tutorials, libraries, and open-source software were written than ever before. Distributed databases and NoSQL systems garnered a lot of attention. And with distributed version control systems like Git, code reuse and collaborative development became more commonplace. Many people stopped thinking about how to program individual computers altogether, and started thinking only in networked systems. Because of the wealth of information and code available, and a large community concerned with distributed-systems problems, a lot of things that were difficult a decade earlier had become fairly easy to do.

Sometime towards the end of 2009 I realized that writing Google as it was in 1998 didn't seem as daunting a task anymore. Not only was the hardware cheap and the information on how to do it widely available, but a lot of the functionality had already been implemented. Distributed datastores? Check. Batch data processing? Check. Web crawlers? Check. Web servers? Check. The age of big data had arrived, because self-taught kids who wanted to could now play with it without asking for permission.
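To make the web-crawler "check" concrete, here's the kind of toy crawler you could throw together from Python's standard library alone. It's a sketch, not production code - the seed URL is a placeholder, and there's no robots.txt handling, politeness delay, or persistence:

```python
# A toy breadth-first web crawler built from standard-library parts only.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects the href targets of all <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=50):
    """Fetch pages breadth-first, returning {url: html} for up to max_pages pages."""
    seen, pages, queue = {seed}, {}, deque([seed])
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip pages that fail to download or decode
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages

if __name__ == "__main__":
    # Placeholder seed; point it at something you're allowed to crawl.
    corpus = crawl("http://example.com/", max_pages=10)
    print(len(corpus), "pages fetched")
```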

So do I think a fourteen-year-old could have pulled it off? I sure think so, but I realize in 2009 it would have taken a very tenacious fourteen-year-old. But that was 2009. If you are skeptical about my predictions for the past, how about applying them to today, or to 2015? How long till the technology that at the time precipitated the fastest wealth generation humanity had ever seen can be replicated by a teenager with next to no budget during summer break? How long till it's something you write as a pet project while learning to program? Our tools are still getting more powerful, and our collective understanding of software problems is still increasing at a rapid pace. Do you think there's a ceiling to how good we can get at writing software? Are we close?

A few years ago, it was hard to find a library providing the functionality you'd need to write your application. A few decades ago the concept of reusable shared libraries had to be invented before you could write one. And if you'd wanted to use a hash table (if you even knew what a hash table was), you'd have had to write your own. Today programmers have a wealth of collaboratively developed libraries for almost any purpose at their disposal. When you write software today, you're standing on the shoulders of countless giants.
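The hash table is a nice micro-example of that shift: what once meant implementing hashing, collision handling, and resizing yourself is now a built-in you barely think about. Counting words in Python, for instance:

```python
# The built-in dict is a hash table; collections.Counter wraps it for counting.
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the end"
counts = Counter(text.split())   # hashing, collisions, resizing: all handled for you
print(counts.most_common(3))     # -> [('the', 3), ('quick', 1), ('brown', 1)]
```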

Personally, I believe that a large part of the reason we're collectively getting better at building software is that a portion of the software we write is software for building better tools for building better tools. We're not even close to peak productivity yet. We're drowning in so many solvable problems that we don't even know what peak productivity in software engineering would look like.

If you like my writing or have comments, you should follow me on Twitter and Facebook!
