Sparkleshare

2012-01-08

Sparkleshare is a file sharing utility that keeps folders in sync on multiple computers in a similar fashion to the commercial and proprietary Dropbox. Sparkleshare recently released a Windows client, finally allowing me to start switching my co-workers at the Progressive Technology Project away from Dropbox.

Overall, I’m very impressed. In particular, I appreciate the Sparkleshare authors’ decision to build on top of existing tools (git for storage and revision control, ssh for transport, and ssh public key infrastructure for authorization and authentication). That means I don’t have to learn new tools and protocols to debug problems, and it means Sparkleshare can focus on the file sharing pieces.

Despite my overall enthusiasm, I do have some serious concerns.

Protecting your credentials

A researcher found a startling security flaw in Dropbox - if you can copy a particular file from a user’s computer to your own computer, you can impersonate them flawlessly, accessing all their Dropbox files without having to know the user’s password. Unfortunately, Sparkleshare suffers from this same vulnerability if you use the default configuration.

When you first install Sparkleshare, it creates a password-less ssh private/public key pair and then makes the public part easily accessible to you so you can add it to your server (or gitorious account, etc). Very convenient. But it also means that all an attacker needs to do is copy your private key (and your Sparkleshare configuration file) and they get complete control over your files.

This problem is easily avoided. If you are running Linux or Mac OS X and you have your own key loaded in your ssh agent, Sparkleshare will happily use that key. So - simply by authorizing your existing (presumably password-protected) key on your remote Sparkleshare servers, you can mitigate this problem. Sparkleshare will still load its own generated key, but if you don’t give that key access to anything, no harm is done.

Confirming each use of your key

However… that leads to a new problem. If you are like me, your ssh agent is configured to ask for confirmation every time your key is used. And Sparkleshare regularly polls the remote git repository for changes. At best, repeatedly clicking to confirm is tedious. At worst, it trains you to click without thinking, so you no longer intelligently reject malicious requests, thus defeating the whole purpose of the check.

It is possible to launch Sparkleshare via ssh-agent in an environment that does not require confirmation when Sparkleshare uses your key, while still requiring it for all other uses. However, given the trade-offs, I’ve decided to add a password to my Sparkleshare-provided ssh key rather than using my existing key:

ssh-keygen -p -f ~/.config/sparkleshare/sparkleshare.jamie@progressivetech.org.key

Now, I am prompted to enter my passphrase when I start Sparkleshare and don’t have to confirm every use of the key. And, I continue to confirm each use of my main key.
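For comparison, the separate-agent approach I decided against might look roughly like the sketch below. It is not something Sparkleshare provides: run_with_private_agent is a made-up helper name, and the sketch assumes the launched command stays in the foreground (if it forks into the background, the throwaway agent exits along with the wrapper).

```shell
# Run a command under its own throwaway ssh-agent.
# The key is added WITHOUT `ssh-add -c`, so this agent never asks for
# per-use confirmation, while your main agent is left untouched.
# usage: run_with_private_agent KEYFILE COMMAND [ARGS...]
run_with_private_agent() {
    key=$1
    shift
    # `ssh-agent COMMAND` starts a fresh agent, exposes it to COMMAND only,
    # and shuts the agent down when COMMAND exits.
    ssh-agent sh -c 'ssh-add "$1" && shift && exec "$@"' sh "$key" "$@"
}

# Hypothetical usage (key path and launcher invocation are assumptions):
#   run_with_private_agent ~/.ssh/id_rsa sparkleshare start
```

The trade-off is that anything running inside that environment can use the key silently, which is exactly what the confirmation prompt was meant to prevent.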

git was designed to store code, not documents

Just because something is designed for one purpose doesn’t mean it can’t be used for another. However, there are a few limitations.

Large files are one. git can handle files over 100MB, but may run into memory problems. I had to raise git’s windowMemory setting (pack.windowMemory in git’s configuration) above the size of the largest file.
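A minimal sketch of that change, assuming the setting in question is pack.windowMemory and that it is set per repository. The helper name, the repository path and the 512m value are only illustrations; pick a size larger than your biggest file.

```shell
# Raise git's pack window memory limit for a single repository.
# usage: set_window_memory REPO_DIR SIZE   (SIZE accepts git's k/m/g suffixes)
set_window_memory() {
    git -C "$1" config pack.windowMemory "$2"
}

# Hypothetical usage:
#   set_window_memory ~/SparkleShare/bar 512m
```

Running git config pack.windowMemory inside the repository afterwards prints the stored value.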

Another problem is disk space. Since git keeps the full revision history on every machine, you have to download more data than just the files that are checked out. The more edits you make to a repository, the more disk space is needed beyond what is checked out (and deleting files won’t help, because the deleted versions stay in the history).
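To get a feel for how much space the history takes compared to the files themselves, something like the following rough sketch can help. repo_overhead is a made-up helper name, and the numbers come from du, so they are approximate.

```shell
# Compare the space used by checked-out files with the space used by
# the history stored under .git.
# usage: repo_overhead REPO_DIR
repo_overhead() {
    repo=$1
    total_kb=$(du -sk "$repo" | cut -f1)         # working tree plus .git
    history_kb=$(du -sk "$repo/.git" | cut -f1)  # history only
    echo "checked out: $((total_kb - history_kb)) KiB"
    echo "history (.git): ${history_kb} KiB"
}
```

On a folder with many edits to large binary documents, the second number can be expected to grow well past the first.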

Lastly, you can’t rely on file modification times. With git, the file modification time depends on when you check out the files. With some fancy hook writing, you could tweak things so that the file modification date is the same as the commit date, but that still won’t help you if you add an existing directory to Sparkleshare, because all the files will have the same commit time.

This “bug” has been reported to git and it has been rejected because mucking with the modification time of files can have bad results when you are using make to compile code. As Linus colorfully put it:

I'm sorry. If you don't see how it's WRONG to set a datestamp back to something that will make a simple "make" miscompile your source tree, I don't know what definition of "wrong" you are talking about.
It's WRONG.
It's STUPID.
And it's totally INFEASIBLE to implement.

Well, did I mention that git was designed for source code?
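For the curious, the hook idea mentioned above could be sketched along these lines. This is not something git or Sparkleshare ships: restore_mtimes is a made-up name, and the sketch assumes GNU touch (for the -d @SECONDS form) and plain file names without newlines or characters that git would quote.

```shell
# Set each tracked file's modification time to the time of the last
# commit that touched it.
# usage: restore_mtimes REPO_DIR
restore_mtimes() {
    repo=$1
    git -C "$repo" ls-files | while IFS= read -r f; do
        # %ct is the committer timestamp in seconds since the epoch
        ts=$(git -C "$repo" log -1 --format=%ct -- "$f")
        if [ -n "$ts" ]; then
            touch -d "@$ts" "$repo/$f"
        fi
    done
}
```

It runs one git log per file, so it is slow on big folders, and files that arrived in the same commit still end up with identical timestamps.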

No server validation

Of all the problems I encountered, this one is by far the most serious.

Perusing ~/.config/sparkleshare/debug.log is very informative. On a default installation, after you have added a project, you’ll see:

19:32:27 [Fetcher][/home/jamie/SparkleShare/.tmp/bar] Fetching folder: ssh://foo@bar.org/foo/bar
19:32:27 [Fetcher] Disabled host key checking for bar.org

Wah.

Once the project is added, whatever host key the server presented is stuffed into your ~/.ssh/known_hosts file. Host key checking is only disabled when you initially create the project, so if you connect to the proper server the first time, you are protected from subsequent man-in-the-middle attacks. However… if the initial key is wrong, your personal ssh configuration is now poisoned. This seems like a very bad idea. I’ve opened a Debian bug to address it.
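Until that is fixed, one way to check what ended up in known_hosts is to print the stored fingerprint and compare it by eye with one obtained from the server administrator through some other channel. A small sketch, using only standard ssh-keygen options; stored_fingerprint is a made-up helper name.

```shell
# Print the fingerprint(s) recorded for HOST in a known_hosts file.
# usage: stored_fingerprint HOST [KNOWN_HOSTS_FILE]
stored_fingerprint() {
    host=$1
    file=${2:-$HOME/.ssh/known_hosts}
    # -F looks the host up (hashed entries included); -l shows fingerprints
    ssh-keygen -l -F "$host" -f "$file"
}

# Example, using the host from the log above:
#   stored_fingerprint bar.org
```

If the fingerprint does not match, ssh-keygen -R bar.org removes the entry so the host key can be verified properly on the next connection.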