Why Collocation Matters

If you haven’t read “Words as Puzzle Pieces” yet, then please check out that post first.

Obviously, collocation is the very crux of “solving” a puzzle. It also forms the basis of frames (as in frame theory) and scripts, which are crucial elements not only of stories and storytelling but also of information storage and retrieval (whether in automated, “artificial” information systems or in “natural” human psychology).

Language (and in particular here: written text) can be broken down into pieces. A story can be divided into sections, paragraphs are made up of sentences, sentences are made up of words, and words are constructed out of even smaller morphological elements. Moving from smaller constructs to wider contexts, relationships become weaker: the words that make up a sentence constrain each other more tightly than the sentences that make up a paragraph do.

In a similar manner, words not only constrain each other but also implicate elements. For example, “John rolled down the hill” could be interpreted as roughly equivalent [logically] to “John rolled John down the hill”. Sometimes listeners (or readers) can reliably predict the last word of a sentence before a speaker utters it, or before they reach the last word of a sentence in a written text.
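The prediction effect can be illustrated with a very simple model of collocation: count which word tends to follow which, then guess the most frequent follower. The sketch below is a toy bigram counter built on an invented four-sentence corpus; it is an assumption chosen for illustration, not a method the post itself proposes, and real predictors use far richer context than a single preceding word.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for each word, how often each other word follows it."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for left, right in zip(words, words[1:]):
            follows[left][right] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus, invented for this example only.
corpus = [
    "john rolled down the hill",
    "mary ran down the hill",
    "the ball rolled down the street",
    "jack fell down the hill",
]
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # hill
```

Even this crude table "knows" that *the* is most often followed by *hill* in its tiny world, which is the same kind of expectation a listener brings to the end of a familiar sentence.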

Noam Chomsky addressed this issue with his famous example sentence “Colorless green ideas sleep furiously”, which is grammatically well-formed yet makes no sense: the words simply do not fit together, much as puzzle pieces sometimes do not fit. In other words, they do not collocate agreeably, in much the same way that the arguments in a formula, algorithm, or equation also need to match up in order to make sense.
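One standard way to put a number on how well two words "fit" is pointwise mutual information (PMI), PMI(x, y) = log2(p(x, y) / (p(x) p(y))), where p(x, y) is the probability of seeing x immediately followed by y. PMI is a swapped-in, widely used measure, not something the post defines; the corpus below is invented, and counting only adjacent word pairs is a simplifying assumption. Pairs that never co-occur get negative infinity, a blunt way of saying "these pieces never fit".

```python
import math
from collections import Counter


def build_pmi(sentences):
    """Return a function computing PMI over adjacent word pairs in `sentences`."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())

    def pmi(x, y):
        joint = bigrams[(x, y)] / total_bi
        if joint == 0:
            return float("-inf")  # never observed side by side
        px = unigrams[x] / total_uni
        py = unigrams[y] / total_uni
        return math.log2(joint / (px * py))

    return pmi


# Hypothetical toy corpus, invented for this example only.
corpus = [
    "the children sleep soundly",
    "the dogs sleep soundly",
    "new ideas spread quickly",
    "the dogs bark loudly",
]
pmi = build_pmi(corpus)
print(pmi("sleep", "soundly") > 0)  # True: a comfortable collocation
print(pmi("ideas", "sleep"))        # -inf: never seen together
```

On a corpus this small the scores are only suggestive; in practice PMI is usually computed over large corpora and often smoothed or thresholded, because rare pairs can produce misleadingly extreme values.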

Oddly, it seems intuitively plausible that such complements (or “grammatically available” arguments in a sentence) might compete with each other in an information retrieval setting. A website like hotels.com might prefer to offer a user flights itself rather than send them to flights.com, and books.com might not be interested in linking to a website about authors, publishers, or other implicated or semantically closely related topics. In other words, relationships that seem obvious may actually be underrepresented among the links between closely related websites.

One way to measure the sophistication of advanced information systems might be the degree to which they facilitate links across such “obvious” relationships, that is, between closely correlated and closely collocated phenomena.
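Read literally, that yardstick could be sketched as a coverage ratio: of the pairs of sites judged to be closely related, what fraction are actually linked in either direction? The sketch below assumes someone has already produced the list of related pairs (the post does not say how relatedness would be judged), and the `.example` domains and the link set are hypothetical toy data, not observations about real websites.

```python
def link_coverage(related_pairs, links):
    """Fraction of related (a, b) pairs joined by a link in either direction."""
    if not related_pairs:
        return 0.0
    linked = sum(
        1 for a, b in related_pairs if (a, b) in links or (b, a) in links
    )
    return linked / len(related_pairs)


# Hypothetical relatedness judgments and links, for illustration only.
related = [
    ("hotels.com", "flights.com"),
    ("books.com", "authors.example"),
    ("books.com", "publishers.example"),
]
links = {("books.com", "publishers.example")}

print(link_coverage(related, links))  # 0.3333333333333333
```

A low ratio would correspond to the underrepresentation suggested above; a more careful version might weight pairs by how strongly they collocate (for instance by a PMI-style score) instead of counting every related pair equally.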

