Using git to collaborate on a paper

I’m working with others on a paper written in LaTeX, stored in a git repository.  I figured it would be easy for all of us to track and merge each other’s changes this way.  So my friend cloned the repo, committed a change, and sent me an email to let me know.  At that point I thought, great, let’s grab his changes, review them, and merge.

I’m sure this is obvious and documented elsewhere, but it wasn’t obvious to me. It’s not conceptually different from two programmers sharing code, which must be one of the most common uses of git.

So I thought, well, I can just track his ‘master’ branch in my repo and merge the changes.

# this doesn't work
git branch --track friend-master /path/to/other/repo master

No, this doesn’t work.  I have to add my friend’s repo as a remote in my local repo first.

# this works
git remote add friend /path/to/other/repo

This says, “Hey, I’m interested in the repo at /path/to/other/repo, and from now on I’m calling it ‘friend.’” Does this mean I can track the branch now?

# not yet
git branch --track friend-master friend/master

No, first I have to fetch the remote so that my local repo is aware of what branches exist there.

# this grabs stuff from my friend's repo
git fetch friend

Specifying “friend” is necessary, because git will fetch from your default remote if one is not specified (and your default is probably “origin”). At this point, we can track the branch and merge.

git merge friend/master

Summary:

git remote add friend /path/to/other/repo
git fetch friend
git merge friend/master

It’s actually pretty simple. It might even be obvious, provided you already understand the way git works. Otherwise, you might find this useful.

Posted in development

Evaluating Kyoto Cabinet

Today, I wanted to see whether Kyoto Cabinet could do a better job with table lookups than MySQL.

I have a 1.7+ billion-row table in MySQL with three columns: a 64-bit int and two 32-bit ints. This yields about 28GB on-disk in MySQL with MyISAM, which is about right if you multiply 1.7 billion rows by 16 bytes.  The sole reason this table exists is to look up the two smaller ints from the larger one.  Naturally, an index is warranted, and in this case the index takes up about the same amount of space as the table.  Okay, maybe a little more (29,478,814,827 bytes for the table and 30,052,789,248 bytes for the index).  Did I mention that the index takes about a day to generate, even on a machine with gobs (128GB) of memory and SSDs?
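As a quick sanity check, the row math works out (using a rounded-down row count):

```python
# Back-of-the-envelope check of the table size quoted above.
# Each row is one 64-bit int (8 bytes) plus two 32-bit ints (4 bytes each).
rows = 1_700_000_000          # "1.7+ billion", rounded down
bytes_per_row = 8 + 4 + 4     # 16 bytes of raw data per row
table_bytes = rows * bytes_per_row
print(table_bytes)            # 27200000000 bytes of raw data, in the same
                              # ballpark as the 29,478,814,827 MyISAM reports
```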

I figured, hey, I’m using it like a key-value store, so could I do better? Kyoto Cabinet seemed worth a try, since I’ve heard good things about its predecessor, Tokyo Cabinet.

Here’s what I did:

For space efficiency, I stored the 64-bit int as an 8-byte key and packed the two 32-bit ints into an 8-byte value.  I used KC’s HashDB and set the HashDB::TLINEAR option, tune_buckets to 2 billion (2 * 2^30), and tune_map to 16GB (16 * 2^30).   TLINEAR is recommended for space efficiency, tune_buckets should be within a factor of 2 of the expected key count, and tune_map should reflect the expected overall db size, so I think my values were in the right ballpark.
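The packing scheme described above can be sketched with Python’s struct module (the function names and the little-endian byte order are my choices here; KC just sees opaque byte strings either way):

```python
import struct

# Pack one record: the 64-bit int becomes an 8-byte key, and the two
# 32-bit ints are packed together into an 8-byte value.
def pack_record(big, a, b):
    key = struct.pack("<q", big)      # signed 64-bit int -> 8-byte key
    value = struct.pack("<ii", a, b)  # two signed 32-bit ints -> 8-byte value
    return key, value

def unpack_value(value):
    return struct.unpack("<ii", value)

key, value = pack_record(1234567890123, 42, 7)
assert len(key) == 8 and len(value) == 8
assert unpack_value(value) == (42, 7)
```

Sixteen bytes of payload per record, same as the MySQL row, with the lookup direction baked into the key/value split.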

What I found was that KC looked very fast for small data sizes, but its insertion time seemed to increase linearly with the number of keys inserted.  Here is a plot of insertion time (in seconds) for each batch of 64K keys versus the total number of keys inserted.

You can see that there are a couple of bumps in the curve, but the times keep increasing: 222 seconds for a 64K batch once 2M keys have been inserted, versus 1 second (or less) for the first few 64K batches.  That doesn’t seem like it’s going to scale to 2 billion.
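For reference, the measurement pattern behind that plot looks roughly like this (a plain dict stands in for the KC HashDB here, purely to show the batching and timing; the real test went through Kyoto Cabinet’s API):

```python
import time

BATCH = 64 * 1024  # 64K keys per timed batch

def timed_batches(store, records, batch=BATCH):
    """Insert records in fixed-size batches, returning seconds per batch."""
    times = []
    it = iter(records)
    while True:
        chunk = [r for _, r in zip(range(batch), it)]
        if not chunk:
            break
        start = time.monotonic()
        for key, value in chunk:
            store[key] = value  # stand-in for the db's set(key, value)
        times.append(time.monotonic() - start)
    return times

# Three full batches of dummy records -> three timing samples.
times = timed_batches({}, ((i, i) for i in range(3 * BATCH)))
assert len(times) == 3
```

Plotting each batch time against the cumulative key count is what exposes the linear growth.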

MySQL is looking pretty good here.  Though the index is large, it’s close to 16 bytes per record, which doesn’t sound that much bigger than KC’s 10 bytes.

I don’t know if I have any alternatives.  Perhaps MongoDB: I think it carries about 12 bytes of overhead per record just because of BSON, but if its insertion time is more reasonable, it may be worth a look.

Posted in development | 4 Comments