Dicey Concepts Leading to Muddy Waters: Recall, Precision and Filter Failure

The first two of these three dicey concepts have a long tradition in information science: recall and precision are both based on another rather sketchy notion, relevance. The basic idea of relevance has a noble motivation: to return what the searcher considers to be truly relevant. However, what a searcher considers relevant will most likely vary wildly from day to day. There is simply no “true score” of “relevancy” that is valid for all people, or even for the same person in different situations. Beyond that, recall and precision are often purported to stand in some kind of inverse relationship. This is hogwash: both are simply ratios built on the same count of relevant results retrieved, and nothing in their definitions forces one to fall as the other rises. Please, do the math!
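For anyone who wants to do the math, here is a minimal sketch of the textbook definitions. It assumes a fixed, binary set of relevant documents, which is exactly the assumption questioned above, so treat it as an illustration only. The function name `precision_recall` and the document IDs are made up for this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result list.

    precision = relevant results retrieved / all results retrieved
    recall    = relevant results retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: four documents are deemed relevant.
relevant = {"d1", "d2", "d3", "d4"}

run_a = ["d1", "d2", "d8", "d9"]  # 2 hits among 4 results
run_b = ["d1", "d2", "d3"]        # 3 hits among 3 results

p_a, r_a = precision_recall(run_a, relevant)
p_b, r_b = precision_recall(run_b, relevant)

print(p_a, r_a)  # 0.5 0.5
print(p_b, r_b)  # 1.0 0.75
```

Going from the first result list to the second raises precision and recall at the same time, so the two measures are not tied together by any fixed inverse law.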

While the above fallacies are widely known in circles versed in information science (and in particular information retrieval), there is a far graver fallacy that was spread years ago by a professor of journalism, most of whose work I find quite all right. I am sorry to say that Clay Shirky seems to have muddied the waters greatly with his notion of “filter failure”. (Note that he has apparently not published anything about “filter failure” on his own website, but there are many videos on the web documenting a talk in which he proposed this idea.)

Designing a tool to filter muddy waters is far less efficient than simply excluding them in the first place. If you pour a mixture of spurious particles into a melting pot, you should not be surprised to find a rather shoddy amalgamation as the result. The main problem in information retrieval today is not, as Professor Shirky argues, “filter failure”. The main problem is the failure to adequately select the proper sources of information (sometimes also referred to as “information resources”) at the outset.

