Lessons in Collaboration: nytimes-se.com

2008-01-08 8-minute read

On Wednesday, November 12, a collaboration of activists and pranksters distributed a remarkably high-quality spoof edition of the New York Times, along with an equally artful website.

The reaction was phenomenal. News of the project was forwarded, blogged, discussed, txt’ed; subjected to rants and raves; and picked up by news organizations all over the world.

The server hosting the website, run by May First/People Link members The Yes Men, got slammed almost immediately, and was practically unreachable for the first day.

By the next day, the website was being distributed over four servers in three locations (later extended to six servers in six locations) and was being supported by activist techies spanning close to a half dozen radical tech collectives throughout North America, including May First/People Link, Riseup, Indymedia, Koumbit, Guerillartivism and more. And, the site was delivering page views faster than anyone expected.

What happened? How was this turn-around possible?

Day 0

Before this idea was even born, The Yes Men chose to host with a politically progressive provider, not a corporate provider. This is one of the most important choices any politically focused organization can make. Here are a few reasons why:

  • Activist providers have more resources. This assertion sounds counter-intuitive to the way we think about the left and capitalism - we’re used to situations where the left is under-resourced and the capitalists are awash in venture capital funding. However, when it comes to the Internet, hardware/capital is only one of many resources we need. Labor, particularly highly skilled labor available on a moment’s notice, is far more critical. And activist providers are impressively organized, with high-caliber skills. Furthermore, hardware costs are coming down to a point where even under-financed activist groups can afford them. And finally, in case you haven’t noticed, venture capital is waning these days. Really. Waning.

  • Activist providers are more flexible. We’re used to calling our corporate service providers and getting someone on the phone relatively quickly. That’s great. Except the person we get on the phone usually knows less than we do and has no power or authority to do anything. Activist providers don’t have the resources to be on call 24/7, but once we get on a problem, we have the flexibility, authority, and knowledge to help. Furthermore, we have the ability to call in our networks to bring in resources beyond our organization.

  • Activist providers actually want you to get more traffic so that you’ll win. We see a lot of strange hand-wringing from our members over what happens if they get a lot of traffic. That’s understandable, since commercial providers want you to get more traffic so that they can charge you more. Activist providers want you to get more traffic so that you’ll win. Most of the activist providers who stepped in to support the Yes Men will be getting higher than usual bandwidth bills this month. Although we’re all strapped for cash, this is a good thing and the very reason why we exist.

  • Activist providers extend the organizing project to the Internet. By choosing an activist provider, the Yes Men made a conscious and active decision to share their success with the radical tech movement. We’re stronger as a result, meaning the next time we have a similar situation with another site or another group, we will be that much more prepared and ready thanks to this decision by the Yes Men.

By hosting with an activist provider, the project had a different level of access to the network of people and organizations that eventually made the site sing.

Day 1

The site, which runs the free content management system WordPress, is on a mostly dedicated, very powerful server (4 processors, 4 GB of RAM). Nonetheless, by the middle of day one the server was on its knees.

When the server went into overload, we immediately created an Internet chat room, open to the public, to help figure out how to get the site up and running. Although the handful of people working on the server were able to make small improvements, the real change happened when we were joined by our allies from Riseup and the network of tech activists they had access to. May First/People Link has at our disposal two dozen machines in 5 locations around the country and an impressively skilled tech team. When we were joined by our allies, the resources at our disposal (directly, and indirectly through our ability to grow even larger) became incalculable.

We spent the next several hours setting up caching servers around the country to reduce the load on the primary server. We hit a lot of serious technical hurdles in the process; however, between the half dozen techies involved, we were able to declare success at 1:30 am. We experienced a few blips and minor problems, but for the most part the site was being successfully served to everyone who came.

Day 2

The next day we all monitored the site and the caching servers, eventually increasing the number of servers involved from 4 to 6. We experienced a few minor problems; however, the site worked so well that conversation on the chat channel turned to brainstorming new content to add.

Lessons Learned

“If we could do it over again” is a common refrain on all projects and this one is no exception.

It is very difficult to predict success, and even harder to predict the server load of a successful project, since every year the number of people who might view our sites changes, along with the software and hardware we use to power them. At May First/People Link we have members predicting huge spikes that never materialize, while at the same time experiencing spikes they had no idea were coming. Nonetheless, with more advance warning, we could have had the caching network in place for the day of the launch. Additionally, we should have put a call out to our network for help immediately, which might have resulted in the caching system being set up earlier on the first day.

For obvious reasons, the project chose the commercial DNS provider joker.com to host their domain name, so that anyone who queried to see who was handling the domain would get back:

a.ns.joker.com
b.ns.joker.com
c.ns.joker.com
d.ns.joker.com

Yes, irresistible.

However, since the domain name was under the control of a corporation rather than one of our allies, we were unable to properly control the TTL, that is, how long resolvers are told to keep using a given IP address for nytimes-se.com. This lack of fine-grained control made it difficult to switch to the caching system we had in place, and made it difficult to remove mis-configured caching servers.
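With control of the zone, we could have set a very short TTL on the records ahead of time, so that a change of IP address would take effect within minutes. In a BIND-style zone file that would look something like this (the address is a placeholder):

; hypothetical records with a 5-minute TTL
nytimes-se.com.       300  IN  A  192.0.2.10
www.nytimes-se.com.   300  IN  A  192.0.2.10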

Technical details

For the technically curious, below is a brief synopsis of what we did.

Our first move was to install and enable the WP Super Cache plugin for WordPress. For a while we thought it was broken, because we saw a lot of redirect errors:

Request exceeded the limit of 10 internal redirects due to probable
configuration error. Use 'LimitInternalRecursion' to increase the limit if
necessary. Use 'LogLevel debug' to get a backtrace., referer:
http://www.nytimes-se.com/nytse/wp-content/themes/nytimes/style.css

However, we later realized that the errors were caused by an errant .htaccess file one directory up from the root directory.
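For background, WP Super Cache works by writing static copies of pages to disk and adding mod_rewrite rules to the site’s .htaccess, so Apache can serve anonymous page views without touching PHP. The rules it adds look roughly like this (the exact rules vary by plugin version):

RewriteEngine On
RewriteBase /
# serve a static cached copy if one exists and the visitor is anonymous
RewriteCond %{REQUEST_METHOD} !POST
RewriteCond %{QUERY_STRING} !.*=.*
RewriteCond %{HTTP_COOKIE} !^.*(comment_author_|wordpress_logged_in|wp-postpass_).*$
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html -f
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html [L]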

Next we worked on getting the right number of MaxClients set in the Apache configuration. Too many clients and the server load skyrocketed. Too few clients and the server would start refusing connections. After a lot of back and forth we settled on 256.
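In an Apache 2.x prefork configuration, that tuning lives in a block like the one below; only the MaxClients value of 256 reflects what we actually settled on, the other numbers are illustrative:

<IfModule mpm_prefork_module>
    StartServers         10
    MinSpareServers       5
    MaxSpareServers      20
    ServerLimit         256
    MaxClients          256
    MaxRequestsPerChild 2000
</IfModule>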

Also, to recycle half-closed and idle connections more quickly, the following sysctl parameters were lowered from 60 to 15 and from 7200 to 1800 respectively:

/proc/sys/net/ipv4/tcp_fin_timeout
/proc/sys/net/ipv4/tcp_keepalive_time
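On a standard Linux system the change amounts to:

# shorten the FIN-WAIT timeout (60 -> 15) and the keepalive idle time (7200 -> 1800)
echo 15 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 1800 > /proc/sys/net/ipv4/tcp_keepalive_time

# or, equivalently:
sysctl -w net.ipv4.tcp_fin_timeout=15
sysctl -w net.ipv4.tcp_keepalive_time=1800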

Next we tried to offload some of the large files (a few pdfs and mov files). After struggling for too long trying to get a fancy mod_rewrite setup to work that would have allowed load balancing between multiple servers, we just put in a RedirectTemp to a single server.
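The RedirectTemp approach is just a couple of lines of Apache configuration; the file names and offload host below are made up for illustration:

RedirectTemp /issue.pdf http://media.example.net/issue.pdf
RedirectTemp /video.mov http://media.example.net/video.mov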

Finally, we moved the database driving the WordPress site to another server.
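For WordPress, that only requires pointing the DB_HOST setting in wp-config.php at the new machine (the hostname here is a placeholder):

<?php
// excerpt from wp-config.php; db.example.net stands in for the separate database server
define('DB_HOST', 'db.example.net');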

For a few extra CPU cycles, we temporarily turned off two services: munin and cron.
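On a Debian-style system (the exact service names will vary) that is simply:

/etc/init.d/munin-node stop
/etc/init.d/cron stop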

Despite all of these efforts, we were hitting loads of up to 170 and nobody could access the site.

We next worked on what appeared to be some WordPress rewrite rules going awry, which may have been contributing to the load problems (loops and loops and loops). At the same time we started setting up Squid proxy servers to help balance the load. Between the pool of techies working on the project, we got four Squid servers up and running very quickly.
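Each of those machines ran Squid as a reverse proxy (an “accelerator”) in front of the origin server. A rough sketch of a Squid 2.x configuration for this, with a placeholder origin address, is:

http_port 80 accel defaultsite=www.nytimes-se.com vhost
cache_peer 192.0.2.10 parent 80 0 no-query originserver name=origin
acl nytse dstdomain www.nytimes-se.com nytimes-se.com
http_access allow nytse
cache_peer_access origin allow nytse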

The next period was the most frustrating. It was really hard to troubleshoot the rewrite/looping rules after we had the proxy system in place - we couldn’t tell if the looping was caused by the proxy setup or was an undetected error prior to moving to the proxy setup.

Eventually, we stopped the loops with a one line WordPress plugin.
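A common one-line fix for loops of this kind - not necessarily the exact line we used - is to disable WordPress’s canonical redirect from a tiny plugin:

<?php
/*
Plugin Name: Disable canonical redirects (illustrative sketch)
*/
// Stops WordPress from issuing canonical-URL redirects, a frequent source
// of redirect loops when requests arrive through a caching proxy.
remove_action('template_redirect', 'redirect_canonical');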

However, Squid and WordPress still gave us problems. They simply do not seem to get along very well (specifically, Squid re-writes the http requests that it passes back to WordPress in a way that WordPress cannot handle). After hours and hours of troubleshooting, a brilliant 1:00 am suggestion was made: let’s switch to Varnish (an alternative caching server). Varnish was set up in a matter of minutes (it is much simpler to configure) and worked extremely well. In the end, we got three Varnish servers up and running.
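A minimal configuration for the Varnish releases of that era is little more than a backend definition; the address below is a placeholder:

# /etc/varnish/default.vcl
backend default {
    .host = "192.0.2.10";
    .port = "80";
}

Started with something like varnishd -a :80 -f /etc/varnish/default.vcl, it then caches and serves pages in front of the origin server.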

The stats

A look at the stats is quite sobering. Normally, in our colo center in Telehouse, our combined membership uses just under 10 Mbit/s. Below is the graph for just the nytimes-se.com server:

[[images/nytse.png]]

And here are statistics for just one of the caching servers:

[[images/cache.png]]

The offline world

Fortunately, thanks to some members of the Rude Mechanical Orchestra (and many many others) there were other methods for people to access the paper:

[[images/paperboys.jpg]]