{"id":3,"date":"2011-05-21T01:33:25","date_gmt":"2011-05-21T01:33:25","guid":{"rendered":"http:\/\/imperfectcode.com\/?p=3"},"modified":"2011-05-21T01:42:53","modified_gmt":"2011-05-21T01:42:53","slug":"evaluating-kyoto-cabinet","status":"publish","type":"post","link":"https:\/\/imperfectcode.com\/?p=3","title":{"rendered":"Evaluating Kyoto Cabinet"},"content":{"rendered":"<p>Today, I wanted to see whether Kyoto Cabinet could do a better job with table lookups than MySQL.<\/p>\n<p>I have a 1.7+ billion row table in MySQL that has three columns, a 64-bit int and two 32-bit ints. This yields about 28GB on-disk in MySQL with MyISAM, which is about right if you multiply out 1.7 billion by 16 bytes.\u00a0 The sole reason this table exists is to provide lookups on the two smaller ints from the larger.\u00a0 Naturally, an index is warranted, and in this case, the index takes up about the same amount of space.\u00a0 Okay, maybe a little more (29,478,814,827 for the table and 30,052,789,248 for the index).\u00a0 Did I mention that the index takes about a day to generate on a machine with gobs (128GB) of memory and SSDs?<\/p>\n<p>I figured, hey, I&#8217;m using it like a key-value store, so could I do better? Kyoto Cabinet seems like something to try, since I&#8217;ve heard good things about Tokyo Cabinet.<\/p>\n<p>Here&#8217;s what I did:<\/p>\n<p>For space efficiency, I stored the 64-bit int as an 8-character key and packed the two 32-bit ints into an 8-character value.\u00a0 I used KC&#8217;s HashDB and set HashDB::TLINEAR, tune_buckets to 2 billion (2 * 2^30), and tune_map to 16GB (16*2^30).\u00a0\u00a0 TLINEAR is recommended for space efficiency, tune_buckets should be within a factor of 2 of the expected key count, and tune_map should reflect the expected overall db size.\u00a0 I think my values were in the right ballpark.<\/p>\n<p>What I found was that KC looked very fast for small data sizes, but its insertion time seemed to increase linearly as the number of keys inserted.\u00a0 Here is a plot relating insertion time (in seconds) for a batch of 64K keys versus the total number of keys inserted.<\/p>\n<p><a href=\"http:\/\/imperfectcode.com\/wp-content\/uploads\/2011\/05\/insertion.png\"><img loading=\"lazy\" class=\"alignnone size-medium wp-image-6\" title=\"insertion rate for kyoto-cabinet\" src=\"http:\/\/imperfectcode.com\/wp-content\/uploads\/2011\/05\/insertion-300x252.png\" alt=\"\" width=\"300\" height=\"252\" srcset=\"https:\/\/imperfectcode.com\/wp-content\/uploads\/2011\/05\/insertion-300x252.png 300w, https:\/\/imperfectcode.com\/wp-content\/uploads\/2011\/05\/insertion.png 440w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>You can see that there are a couple bumps in the curve, but the times seem to keep increasing: 222 seconds for 64K when 2M keys have been inserted, versus 1 second (or less) for the first few 64K batches.\u00a0 This doesn&#8217;t seem like it&#8217;s going to work for 2 billion.<\/p>\n<p>MySQL is looking pretty good here.\u00a0 Though the index is large, it&#8217;s close to 16 bytes per record, which doesn&#8217;t sound that much bigger than KC&#8217;s 10 bytes.<\/p>\n<p>I don&#8217;t know if I have any alternatives.\u00a0 Perhaps MongoDB. I think it has about 12 bytes in overhead per record, just because of BSON, but if it has a more reasonable insertion time, it may be worth a look.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today, I wanted to see whether Kyoto Cabinet could do a better job with table lookups than MySQL. I have a 1.7+ billion row table in MySQL that has three columns, a 64-bit int and two 32-bit ints. This yields &hellip; <a href=\"https:\/\/imperfectcode.com\/?p=3\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[4,5,6],"_links":{"self":[{"href":"https:\/\/imperfectcode.com\/index.php?rest_route=\/wp\/v2\/posts\/3"}],"collection":[{"href":"https:\/\/imperfectcode.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/imperfectcode.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/imperfectcode.com\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/imperfectcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3"}],"version-history":[{"count":7,"href":"https:\/\/imperfectcode.com\/index.php?rest_route=\/wp\/v2\/posts\/3\/revisions"}],"predecessor-version":[{"id":12,"href":"https:\/\/imperfectcode.com\/index.php?rest_route=\/wp\/v2\/posts\/3\/revisions\/12"}],"wp:attachment":[{"href":"https:\/\/imperfectcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/imperfectcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/imperfectcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}