An introduction

This is a semi-public place to dump text too flimsy to even become a blog post. I wouldn't recommend reading it unless you have a lot of time to waste. You'd be better off at my livejournal. I also have another blog, and write most of the French journal summaries at the Eurozine Review.

Why do I clutter up the internet with this stuff at all? Mainly because I'm trying to get into the habit of displaying as much as possible of what I'm doing in public. Also, Blogger is a decent interface for a notebook.

Saturday, November 27, 2010

thesaurus

Take a large sample of text. Run it through NLTK, looking for passages with multiple adjectives describing the same noun. Or, to keep it simple, just passages like a *big*, *strong* man.

For each such co-occurrence, record a link between the two adjectives: big and strong go together.
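A minimal sketch of the counting step, to make the idea concrete. It is not the real thing: instead of a proper part-of-speech tagger it uses a tiny hard-coded adjective list, and it assumes that whatever token ends a run of adjectives is the noun they describe. The names `ADJECTIVES` and `adjective_links` are made up for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for a real part-of-speech tagger:
# a tiny hand-picked list of adjectives.
ADJECTIVES = {"big", "strong", "tall", "dark", "quiet", "small"}


def adjective_links(text):
    """Count pairs of adjectives that occur in the same run, e.g. 'big, strong man'."""
    links = Counter()
    # Words become one token each; every other non-space character is its own token.
    tokens = re.findall(r"[a-z]+|[^\sa-z]", text.lower())
    run = []

    def flush():
        # Record one link for every unordered pair of adjectives in the run.
        for a, b in combinations(sorted(set(run)), 2):
            links[(a, b)] += 1
        run.clear()

    for tok in tokens:
        if tok in ADJECTIVES:
            run.append(tok)
        elif tok in {",", "and"} and run:
            continue  # separators allowed inside a run of adjectives
        else:
            flush()  # assumption: any other token ends the run
    flush()
    return links


sample = "A big, strong man met a tall, dark stranger. The big strong dog barked."
for pair, count in adjective_links(sample).most_common():
    print(pair, count)
```

Pairs are stored in alphabetical order so that "big, strong" and "strong, big" count as the same link.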

[My initial thought was to do this geometrically. Imagine an n-dimensional space, where n is the number of adjectives in the English language. Place each word at 1 in its own dimension and, in every other dimension, at the point given by some function of how often it co-occurs with that dimension's word.

But that seems silly. It's more like a standard regression data-mining kind of thing.]
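Silly or not, the geometric version is short enough to sketch. Everything here is an assumption for illustration: the link counts are a hand-written dict, the function of co-occurrence is an arbitrary squashing `c / (c + 1)`, and cosine similarity is just one plausible way to compare the resulting points.

```python
import math


def adjective_vectors(links, squash=lambda c: c / (c + 1.0)):
    """Place each adjective in a space with one dimension per adjective.

    `links` maps (adjective, adjective) pairs to co-occurrence counts.
    Each word sits at 1 in its own dimension and at squash(count)
    in the dimension of every word it co-occurs with.
    """
    words = sorted({w for pair in links for w in pair})
    index = {w: i for i, w in enumerate(words)}
    vectors = {w: [0.0] * len(words) for w in words}
    for w in words:
        vectors[w][index[w]] = 1.0
    for (a, b), count in links.items():
        vectors[a][index[b]] = squash(count)
        vectors[b][index[a]] = squash(count)
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


# Hand-written example counts (made up for illustration).
example_links = {("big", "strong"): 2, ("dark", "tall"): 1}
vecs = adjective_vectors(example_links)
print(cosine(vecs["big"], vecs["strong"]))  # linked words end up close together
print(cosine(vecs["big"], vecs["dark"]))    # unlinked words are orthogonal
```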

Anyway, a project for a rainy day. And there's still a need for some usable dictionary/thesaurus based on data-mining.
