Wave, the new X-Windows?

Enki 8/9/2009

While at the GTUG Campout in Mountain View, Jacob Rus and I turned Google Wave into a collaborative scientific computing environment using SciPy.

Working on Wave there, and later independently on Operational Transformation (the family of algorithms behind Wave), has led me to conclude that the protocols and ideas behind Wave are far more interesting than the promise of the current implementation.

Most observers are portraying Google Wave as the future of email or instant messaging, and celebrating or criticizing it as such. In the blog posts I've read, Wave’s merits are analyzed solely on the basis of the current developer UI preview. Calling it a preview is giving it a lot of credit - let's maybe call it a debugger or proof-of-concept instead. In other words, people are discussing Wave as if it were the newest social networking website, and not an infrastructure protocol under heavy development.

So let’s ignore the high-level “next-gen E-mail” sales pitch for a moment and take a look at the state of Wave today. At its heart is a technology called Operational Transformation (OT). It defines a set of operations, transformations, and the documents they can be applied to.

For Google Wave in particular, the components involved are:

  1. Documents that can be rendered to HTML.
  2. Operations and series of operations that can be applied to documents: For instance, insert a character ‘c’ at position 5, then delete the character at position 9 (Note: this is an example, not how Wave's operations really work).
  3. Transformations of Operations: These are rules to change operations, so that if operations are received in a different order on multiple clients, the document still eventually converges to the same consistent state.
  4. Robots that can monitor operations and create new operations based
    on the operations previously applied to a document.
  5. An early draft of a federation protocol that allows documents to be
    hosted on arbitrary servers instead of a central location, for example
    at Google.
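
To make components 2 and 3 concrete, here is a minimal Python sketch of operational transformation on plain strings. It is a toy under stated assumptions and, like the example in the list, not how Wave's operations really work: the Insert and Delete classes, the site tie-breaker, and the apply_op and transform functions are hypothetical names invented for illustration. It takes the two operations from the example above (insert 'c' at position 5, delete the character at position 9), treats them as concurrent edits from two clients, and checks that both clients end up with the same document.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    char: str
    site: int = 0  # hypothetical client id, used only to break ties


@dataclass(frozen=True)
class Delete:
    pos: int


Op = Optional[Union[Insert, Delete]]  # None stands for "no operation"


def apply_op(doc: str, op: Op) -> str:
    """Apply a single operation to a plain-string document."""
    if op is None:
        return doc
    if isinstance(op, Insert):
        return doc[:op.pos] + op.char + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(op: Op, other: Op) -> Op:
    """Rewrite `op` so it can be applied after the concurrent `other`."""
    if op is None or other is None:
        return op
    if isinstance(op, Insert) and isinstance(other, Insert):
        # An earlier insert (or a same-position insert from a lower site)
        # shifts our insert one place to the right.
        if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
            return Insert(op.pos + 1, op.char, op.site)
        return op
    if isinstance(op, Insert) and isinstance(other, Delete):
        if other.pos < op.pos:
            return Insert(op.pos - 1, op.char, op.site)
        return op
    if isinstance(op, Delete) and isinstance(other, Insert):
        if other.pos <= op.pos:
            return Delete(op.pos + 1)
        return op
    # Both operations are deletes.
    if other.pos < op.pos:
        return Delete(op.pos - 1)
    if other.pos == op.pos:
        return None  # the character is already gone
    return op


doc = "abcdefghijkl"
a = Insert(5, "c", site=1)  # issued by client A
b = Delete(9)               # issued concurrently by client B

# Each client applies its own operation first, then the transformed remote one.
client_a = apply_op(apply_op(doc, a), transform(b, a))
client_b = apply_op(apply_op(doc, b), transform(a, b))
assert client_a == client_b
```

The point of the sketch is only the shape of the problem: clients apply operations in different orders, and the transformation rules are what make the results agree.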

That's pretty much all there is to it. It essentially defines the data structures of a collaborative document editor, and we don’t really know yet where many of the components attached to them will lead: The federation API is still a moving target, there's no client/server protocol, and robots speak yet a different moving-target protocol. Add a web UI (which right now doesn't even properly speak OT, and is built on an unspecified and GWT-specific protocol) and what you have is a collaborative web document editor with big ambitions. That’s exactly where Wave is right now, and the competition has been here before (maybe without the big vision, but who's to say?).

Now where can we go from here? HTML is a document display system repurposed as an application platform. Wave seems headed down the same route. So what could that application platform look like?

Let me explain using an analogy.

Traditional programs can either directly access input and output devices like keyboards and graphics cards, or they can use the X-Windows protocol, which is more abstract but has certain advantages. With the X protocol, programs request graphics operations instead of instructing the hardware directly. This level of abstraction enables multiple programs to work on the same screen, cooperate via copy and paste, or enhance each other the way window managers do.

In theory, programs can even use the X protocol over a network, which means that a program and the server that drives the screen can run on different machines. I say in theory because graphics operations are ill-suited for high-latency, low-bandwidth environments such as the Internet, and X-Windows is a particularly chatty protocol. More efficient solutions like RFB/VNC and NX have been developed, but in general, the pixel transfer paradigm has proven inferior to the high-level web application paradigm, which runs parts of the UI logic on the clients in JavaScript.

Now I'm arguing that this new webapp paradigm is still like talking to the graphics hardware directly. Developing custom JavaScript code for changing documents requires re-inventing the collaboration and interoperability wheels over and over again. Wave could be the heart of a set of protocols and standards upon which we can develop the next generation of applications without re-implementing everything for each new app. Apps would be robots, possibly partially running within the browser, and all I/O would be operations on documents.

This means that every Wave application would be collaborative "for free", and you could combine multiple apps to interoperate on the same documents, like a latter-day OpenDoc. Eventually, documents could be more than just static data, and using the same protocol and algorithms to build, let's say, a collaborative Photoshop in Wave is entirely possible, just like classic web applications have transcended the original scope of HTML.
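
The idea of an app as a robot whose only I/O is operations on a shared document can be sketched in a few lines of Python. This is a toy under stated assumptions and has nothing to do with the actual Wave robot protocol: the Document class, its submit method, and calc_robot are hypothetical names made up for this illustration. The robot watches each operation applied to the document and, when a participant types something like "2+3=", answers with a new operation that inserts the result.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    author: str


@dataclass
class Document:
    text: str = ""
    robots: List[Callable] = field(default_factory=list)
    log: List[Insert] = field(default_factory=list)

    def submit(self, op: Insert) -> None:
        """Apply an operation, record it, and let every robot react to it."""
        self.text = self.text[:op.pos] + op.text + self.text[op.pos:]
        self.log.append(op)
        for robot in self.robots:
            for reply in robot(self, op):
                self.submit(reply)


def calc_robot(doc: Document, op: Insert) -> List[Insert]:
    """Hypothetical robot: answers inserted text of the form '<int>+<int>='."""
    if op.author == "calc-robot":
        return []  # never react to our own operations
    match = re.fullmatch(r"\s*(\d+)\s*\+\s*(\d+)\s*=", op.text)
    if match is None:
        return []
    result = int(match.group(1)) + int(match.group(2))
    # The robot's only output is another operation on the same document.
    return [Insert(op.pos + len(op.text), str(result), "calc-robot")]


doc = Document(robots=[calc_robot])
doc.submit(Insert(0, "2+3=", "alice"))
print(doc.text)
```

Because the robot only ever sees and emits operations, any other participant or robot working on the same document sees its output the same way it sees a human edit, which is what makes apps composable in this picture.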

To give you a little background: I am a software engineer working on distributed systems, and particularly on OT algorithms. I am posting this from the GTUG Campout at the Google Campus in Mountain View, where I have just built a few Wave-based apps with my co-conspirator Jacob Rus. Among them are a collaborative scientific computing environment (think Mathematica or Matlab - but based on the awesome Python SciPy environment - screenshot here) and a Wave Robot IDE, which allows you to write Wave robots collaboratively from within Wave. Both projects will be open-sourced soon. You should follow me here on Twitter if you want to get updates.

Thanks to Christiane Ruetten, Jacob Rus and someone knowledgeable who can't be named for reading drafts of this.

Discussion and comments on news.yc!