What if we were to design a web application (say, for example, a blogging program) that, from the start, was built around the same goals we would bring to a mass political organizing campaign? Not a small, local political organizing campaign but a massive, global campaign.

Let's say:

At the moment, there are precious few web applications that come near these goals (to be fair, there are precious few political organizing campaigns that adhere to these goals either). Most of our web applications (like blogs, or Drupal web sites, or wikis, or web-based databases) rely on a single server which, if removed, would cause the application to fail. Similarly, once a web site gets too many users, it can be very difficult to resolve that problem short of buying a new and more powerful server. Few of our currently designed programs enable us to securely accept help from others without fully trusting them.

In the corporate world, this type of design (minus the security/ownership part) is fairly standard. However, the tools are either proprietary or they are designed from a politically hierarchical perspective (rather than a single database server there is a cluster of database servers that still play a central role).

What would it mean to follow these goals in a web-application environment?

Authentication

Almost all web-applications have at least one piece of private data: every user's password. In an ideal world, we would all use a different password on every web application. However, we all know that world doesn't exist. Most people use the same password everywhere, making it an important piece of private information. For our purposes, we just can't use passwords this way without putting people at risk of having them compromised.

OpenID, a distributed authentication system that allows you to specify a third party to authenticate you, provides an elegant solution to this problem. If our application used OpenID for authentication, then the web application itself would never need to know the user's actual password.

However, I think there's an even better solution. The oldest distributed authentication system is Pretty Good Privacy (PGP) - commonly known by the free implementation called GNU Privacy Guard (gpg).

If our application used gpg for authentication, the process of getting an account would go something like this:

The primary added benefit of using gpg is that every time the user posts to their blog, they can digitally sign the post, so that if the post is ever altered by a malicious system administrator, the signature verification would fail, making that alteration easy to detect.
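To make the signing idea concrete, here is a minimal sketch that drives the gpg command line from Python. It is only an illustration: the function names and the file and key names in the usage note are my own assumptions, and it presumes gpg is installed and the relevant keys are already in the keyring.

```python
import subprocess


def sign_command(post_path, key_id):
    """Build the gpg command that writes an ASCII-armored detached
    signature (post_path + ".asc") using the given secret key."""
    return ["gpg", "--local-user", key_id, "--armor", "--detach-sign", post_path]


def verify_command(post_path, sig_path):
    """Build the gpg command that checks sig_path against post_path."""
    return ["gpg", "--verify", sig_path, post_path]


def is_unaltered(post_path, sig_path):
    """Return True if gpg accepts the signature for this post.

    gpg exits with a non-zero status when verification fails, for
    example because the post was edited after it was signed.
    """
    result = subprocess.run(verify_command(post_path, sig_path),
                            capture_output=True)
    return result.returncode == 0
```

Maria would run the equivalent of sign_command("post.txt", "maria@example.org") on her own computer, and any reader (or any spoke) could later call is_unaltered("post.txt", "post.txt.asc") to check that the post still matches what she signed.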

Distribution

How would this application work on multiple servers in a coordinated way?

I'll start by suggesting some terminology:

Suppose I announce to the world: I'm starting a new blog server for the purposes of supporting a global, coalition-based organizing campaign. One aspect of the campaign is to get coalition members, most of whom have never blogged before, to blog about the campaign. Anyone can get a blog by going to blogs.example.org.

I have already set up at least one spoke server (a.spoke.blogs.example.org), so when you go to blogs.example.org, you get a functional blogging application.

However, since I'm expecting a lot of growth (and I want to respect the three goals above) I also put a call out for tech collectives and organizations that are interested in supporting the project by donating server resources.

I am then contacted by Group B who says: We have a server connected to the Internet that we'd like to donate to the project.

I provide them with a script that they run on their server which downloads the source files, runs some tests to demonstrate that it is fully working, and then reports back to me that it is ready to go.

Then, I update the DNS record for blogs.example.org so that it will, round robin style, include the new server's IP address when a user goes to blogs.example.org.
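A toy model of that round-robin record might look like the following. This is a sketch of the behavior, not of any real name server's code: the class name is made up, and the IP addresses come from the ranges reserved for documentation.

```python
from collections import deque


class RoundRobinRecord:
    """Toy model of one DNS name that resolves to several addresses."""

    def __init__(self, name, addresses):
        self.name = name
        self._addresses = deque(addresses)

    def add(self, address):
        """Add a newly donated server to the rotation (ignore duplicates)."""
        if address not in self._addresses:
            self._addresses.append(address)

    def answer(self):
        """Return every address, then rotate so the next query
        sees a different address listed first."""
        current = list(self._addresses)
        self._addresses.rotate(-1)
        return current


record = RoundRobinRecord("blogs.example.org", ["192.0.2.10"])
record.add("192.0.2.20")  # Group B's donated server (hypothetical address)
first = record.answer()
second = record.answer()
```

Since most clients simply use the first address they are given, rotating the order spreads visitors across the spokes.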

So far so good. Now what?

User Maria goes to blogs.example.org (and lands on b.spoke.blogs.example.org). She logs in with her gpg key and then clicks the button to create a blog. The server generates links in a form to ensure that she stays on the same server, although to Maria, it appears as though she is always on blogs.example.org.

She's asked what username she wants for her blog and she selects "maria" (maria.blogs.example.org). She hits submit and receives a message saying that her request is pending and asking her to check back in a few minutes.

Meanwhile, b.spoke.blogs.example.org executes a DNS query (DNS being the oldest distributed database system in the world) to see if maria.blogs.example.org is taken. If not, it submits a request to the authoritative name server for blogs.example.org requesting that maria.blogs.example.org be created and given its own IP address. The authoritative DNS server could do any number of checks (complicated key exchanges, or simply checking whether the IP address belongs to a server in the DNS system), then it would create the record (provided nobody has slipped in earlier) and respond that the record was created.
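The check-then-create step on the authoritative server could be sketched like this. Everything here is hypothetical: the BlogZone class stands in for the name server's zone data, the only check it does is the simple "is this a known spoke" test, and I am assuming the new record points at the spoke that made the request.

```python
class BlogZone:
    """In-memory stand-in for the authoritative zone of the blog network."""

    def __init__(self, domain, spoke_addresses):
        self.domain = domain
        self.spokes = set(spoke_addresses)  # IPs of servers already in DNS
        self.records = {}                   # full name -> IP address

    def _full_name(self, username):
        return f"{username.lower()}.{self.domain}"

    def is_taken(self, username):
        """What the spoke's initial DNS query would tell it."""
        return self._full_name(username) in self.records

    def claim(self, username, requester_ip):
        """Create the record if the requester is a known spoke and
        nobody has slipped in earlier. Returns (ok, detail)."""
        if requester_ip not in self.spokes:
            return (False, "requester is not a known spoke")
        name = self._full_name(username)
        if name in self.records:
            return (False, "name already taken")
        # Assumption: the blog lives on the spoke that asked for it.
        self.records[name] = requester_ip
        return (True, name)


zone = BlogZone("blogs.example.org", {"192.0.2.10", "192.0.2.20"})
created = zone.claim("maria", "192.0.2.20")
duplicate = zone.claim("maria", "192.0.2.10")
outsider = zone.claim("ana", "203.0.113.5")
```

The important part is that the name server, not the spoke, makes the final decision, so two spokes racing for the same name cannot both win.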

Now, Maria can start blogging. When Maria publicizes her blog (http://maria.blogs.example.org), it is already set up to go to the right server, thanks to the domain name system.

Additional features could include: every user has a primary spoke plus one or more secondary spokes. All write requests are redirected to the primary spoke, while read requests are redirected to any of the secondary spokes. Perhaps when the original DNS record is created, it also creates edit.maria.blogs.example.org, which points to the primary spoke, while maria.blogs.example.org is set up round-robin style to go to all of the secondary spokes. The secondary spokes are responsible for pulling in data from the primary spoke to stay in sync.
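As a rough sketch of that naming scheme, the records for one blog could be generated like this. The function names and addresses are invented for illustration; only the edit. prefix comes from the idea above.

```python
def records_for_blog(username, primary_ip, secondary_ips,
                     domain="blogs.example.org"):
    """Return the DNS records for one blog: the edit name points at the
    primary spoke, the public name at every secondary spoke."""
    read_name = f"{username}.{domain}"
    write_name = f"edit.{username}.{domain}"
    return {write_name: [primary_ip], read_name: list(secondary_ips)}


def name_for_request(username, is_write, domain="blogs.example.org"):
    """Pick the host name a request should be redirected to."""
    prefix = "edit." if is_write else ""
    return f"{prefix}{username}.{domain}"


records = records_for_blog("maria", "192.0.2.20", ["192.0.2.10", "192.0.2.30"])
```

With this layout the application only has to decide whether a request is a write; DNS takes care of sending it to the right machine.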

This approach is redundant: if one server goes down, a routine can be run on the name server to remove that IP address and, if the server is a primary server, promote one of the secondary servers to take over as the primary server. It's also scalable. Servers can be configured to refuse new bloggers if they start running low on hard disk space, or additional secondary servers could be added for popular bloggers if a server runs low on performance-based system resources. Ideally there would be administrative scripts that could transfer blog accounts from server to server.
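The clean-up routine on the name server might look something like the sketch below. The data layout (a dictionary per blog with a primary and a list of secondaries) is my own assumption, as is the rule of promoting the first remaining secondary.

```python
def remove_failed_server(blogs, failed_ip):
    """Drop failed_ip from every blog and promote a secondary where the
    failed server was the primary. Changes blogs in place and returns
    the usernames whose primary was replaced."""
    promoted = []
    for username, hosts in blogs.items():
        # The failed server stops serving reads for every blog it mirrored.
        hosts["secondaries"] = [ip for ip in hosts["secondaries"]
                                if ip != failed_ip]
        if hosts["primary"] == failed_ip:
            if hosts["secondaries"]:
                # The first remaining secondary takes over writes and
                # leaves the read pool; a new secondary would have to be
                # added separately.
                hosts["primary"] = hosts["secondaries"].pop(0)
                promoted.append(username)
            else:
                hosts["primary"] = None  # nothing left to promote
    return promoted


blogs = {
    "maria": {"primary": "192.0.2.20",
              "secondaries": ["192.0.2.10", "192.0.2.30"]},
    "ana": {"primary": "192.0.2.10", "secondaries": ["192.0.2.20"]},
}
promoted = remove_failed_server(blogs, "192.0.2.20")
```

In this example Maria's blog gets a new primary, while Ana's blog keeps its primary but is left without a secondary, which is exactly the kind of situation an administrative script would need to flag.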

Aggregation and Indexing

There are still a few features that we would want that don't scale very well. The whole purpose of building a giant blog network is to provide a sense of unity - we want all of the blogs to be aggregated and searchable to build this unity. No matter which server you land on, it should show you the most recent 20 blog posts from across the entire network. You should also be able to type a search term in the search box and search all the blogs.

Aggregation and indexing would get significantly more difficult and resource intensive as we add more servers. Aggregating blogs on 5 or 10 servers is not so difficult, but doing that for hundreds of servers, each of which has hundreds of blogs, could become a monumental task. Although technically this is a scalability problem, given our model, it could be addressed by throwing more computers at it (as opposed to hitting a wall that can't be overcome without a hardware upgrade). We could organize concentric circles of aggregating servers, so each server is only aggregating 5 - 10 servers and then an upstream server aggregates their aggregates. The same model could be applied to indexing. This network could produce an inner circle of servers that contain the entire aggregate and index for all servers to access.
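Here is a small sketch of the concentric-circle idea for the "most recent 20 posts" feature. Posts are represented as (timestamp, title) pairs, and the function names and the fan_in parameter are made up for the example; a real system would be pulling feeds over the network rather than merging lists in memory.

```python
import heapq


def newest(posts, limit=20):
    """Return the most recent posts, newest first."""
    return heapq.nlargest(limit, posts, key=lambda post: post[0])


def aggregate_tier(feeds, fan_in=5, limit=20):
    """One circle of aggregators: each one combines up to fan_in feeds
    and keeps only the newest posts from that group."""
    return [newest([post for feed in feeds[i:i + fan_in] for post in feed],
                   limit)
            for i in range(0, len(feeds), fan_in)]


def aggregate_network(feeds, fan_in=5, limit=20):
    """Keep adding circles until a single, network-wide feed remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not feeds:
        return []
    while len(feeds) > 1:
        feeds = aggregate_tier(feeds, fan_in, limit)
    return newest(feeds[0], limit)


# Twelve hypothetical spokes with thirty posts each.
spoke_feeds = [[(spoke * 100 + n, f"post {spoke}-{n}") for n in range(30)]
               for spoke in range(12)]
front_page = aggregate_network(spoke_feeds, fan_in=5, limit=20)
```

Because every aggregator passes its own newest 20 posts upstream, nothing that belongs in the network-wide top 20 can get lost on the way, and no single server ever has to read more than fan_in feeds.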

Unsolved Problems

How would we establish lines of trust? While the system works well if one untrusted spoke flakes out, what happens if several spokes (which encompass the primary and secondary servers for a particular blog) all flake out together?