An introduction

This is a semi-public place to dump text too flimsy to even become a blog post. I wouldn't recommend reading it unless you have a lot of time to waste. You'd be better off at my livejournal. I also have another blog, and write most of the French journal summaries at the Eurozine Review.

Why do I clutter up the internet with this stuff at all? Mainly because I'm trying to get into the habit of displaying as much as possible of what I'm doing in public. Also, Blogger is a decent interface for a notebook.

Thursday, July 8, 2010

MongoDB

MongoDB (and nosql generally) is an appealing idea. The words written about it, though, are problematic: too much hype, too little documentation. That'll change soon; we're over the peak of the nosql hype cycle and into the trough. People are looking at the nosql systems they've eagerly implemented in recent months and noticing that they won't solve every problem imaginable. For now, though, every blog post with MongoDB instructions is prefaced with grumbles about the lack of information.



So, I spent a ridiculous amount of time figuring out how to do grouping. I have a bunch of download logs and want to break them down by country.
The simplest way I could find of doing this is:



db.loglines.group({
  cond: {},                       // no filter: every log line
  initial: {count: 0},            // starting accumulator
  reduce: function(doc, out) {
    out.count++;                  // overall total
    if (out[doc.country] == undefined) { out[doc.country] = 0; }
    out[doc.country] += 1;        // per-country counter
  }
});
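To make the semantics of that call concrete, here is a rough in-memory model in Python of what `group()` does with `initial` and `reduce`. The helper `emulate_group` and the sample log documents are hypothetical, and `cond` is modelled as a Python predicate rather than a query document; it's a sketch of the idea, not how the server actually implements it.

```python
import copy


def emulate_group(docs, key_fields, cond, initial, reduce):
    """Hypothetical in-memory model of the old group() command.

    docs:       iterable of dicts standing in for log-line documents
    key_fields: field names to group on; [] puts every doc in one group
    cond:       Python predicate (stand-in for the query document)
    initial:    starting accumulator, copied fresh for each new group
    reduce:     function(doc, out) that mutates the accumulator in place
    """
    groups = {}
    for doc in docs:
        if not cond(doc):
            continue
        group_key = tuple(doc.get(f) for f in key_fields)
        if group_key not in groups:
            # each group starts from its own copy of `initial`, plus its key fields
            out = copy.deepcopy(initial)
            out.update(zip(key_fields, group_key))
            groups[group_key] = out
        reduce(doc, groups[group_key])
    return list(groups.values())


def count_by_country(doc, out):
    # Python translation of the JavaScript reduce function above
    out["count"] += 1
    if doc["country"] not in out:
        out[doc["country"]] = 0
    out[doc["country"]] += 1


if __name__ == "__main__":
    # hypothetical sample data, not real logs
    logs = [{"country": "AE"}, {"country": "AM"}, {"country": "AE"}]
    print(emulate_group(logs, [], lambda d: True, {"count": 0}, count_by_country))
    # prints [{'count': 3, 'AE': 2, 'AM': 1}]
```

With an empty key list every matching document lands in the same accumulator, which is why the call returns a single document holding the total and one counter per country. Grouping on `["country"]` instead would give one document per country.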



Or, the version in pymongo:




>>> reduce_func = """function(doc, out) {
        out.total++;
        if (out[doc.country] == undefined) { out[doc.country] = 0; }
        out[doc.country] += 1;
    }"""

>>> l.group(key={},
            condition={},
            initial={'total': 0},
            reduce=reduce_func)
[{u'AE': 215.0,
  u'AG': 23.0,
  u'AM': 140.0,
  u'AN': 58.0,
  u'AO': 56.0,
  ...
  u'total': 87901}]


[apologies for formatting; I've not really figured out how to edit js within a python repl]
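On later MongoDB versions `group()` is gone (deprecated in 3.4, removed in 4.2, and PyMongo 4.0 dropped `Collection.group()`), so the same breakdown would go through the aggregation framework's `$group` stage. A minimal sketch, assuming `l` is the same `loglines` collection; the function names are hypothetical, and rather than one document with a counter per country, `$group` returns one row per country, which this folds back into a dict.

```python
def country_pipeline(match=None):
    """Build an aggregation pipeline counting documents per country.

    match: optional query document applied first (plays the role of `cond`).
    """
    pipeline = []
    if match:
        pipeline.append({"$match": match})
    # one output row per distinct country value: {"_id": <country>, "count": <n>}
    pipeline.append({"$group": {"_id": "$country", "count": {"$sum": 1}}})
    return pipeline


def counts_by_country(collection, match=None):
    """Run the pipeline on a PyMongo-style collection (anything with .aggregate).

    Returns ({country: count, ...}, total), roughly the shape of the group() output.
    """
    counts = {}
    for row in collection.aggregate(country_pipeline(match)):
        counts[row["_id"]] = row["count"]
    return counts, sum(counts.values())


# Hypothetical usage with PyMongo (needs a running server):
#   counts, total = counts_by_country(l)
```

Documents with no `country` field end up under a `None` key, so a `$match` such as `{"country": {"$ne": None}}` can filter them out first.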
