Fuzzy Language and Fuzzy Vocabulary

Since I wrote about words vs. names a couple of weeks ago (see “Names vs. Words: Strings for Identity vs. Strings for Information”), I have had an uneasy feeling that something is not quite right… and I think maybe I have figured it out now.

The problem is this: online, there is no clear demarcation between words and names. As I indicated in the post linked above, the demarcation is not strict offline either. However, even though many dictionaries exist for each of the most common languages (and even though they differ in the vocabularies they document, how they document them, etc.), there is nonetheless a somewhat reliable order, such that anyone can be expected to “look up” any word in any dictionary and get a more or less reasonable explanation. Part of becoming literate is learning to use a dictionary; indeed, more or less any dictionary. Of course there are dictionaries that are unusable because they are not well researched, but they are the exception, not the rule. Most people depend on the notion of some standard dictionary, and such standard dictionaries describe the standard language.

As I wrote about 10 years ago in my first “Wisdom of the Language” article, languages will always be moving targets. We have to be able to deal with such “facts of life”.

But upon reflecting on the juxtaposition of “strings for identity” (names) versus “strings for information” (words), I notice a much more severe issue: this dichotomy leaves no room for dictionaries. In the back of my mind, I had reasoned that all of the registered strings in COM would make up the “commercial” dictionary, all of the strings in DE would make up the German dictionary, and so on. But each of these lists of registered strings also includes a significant number of brand names (in other words: “strings for identity”).

How will we know whether a string has been registered for a specific identity or for informative purposes? My hunch is that there may very well be attributes of the website / content that more or less clearly categorize the string as one type or the other. It might go like this: the more evidence there is of “grass roots” community involvement in how the content is managed, the more the string would tend to be a word used by that community. Less evidence of this, and more evidence of “top down” authoritarian management of the content, would point towards an individual or organization identifying himself / herself / itself with the string.
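To make the hunch a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the names `SiteEvidence` and `word_likeness` are hypothetical, the idea of simply counting signals is mine, and weighting every signal equally is certainly too crude. It only shows the shape of the idea: a degree between "name" and "word" rather than a yes/no answer.

```python
from dataclasses import dataclass


@dataclass
class SiteEvidence:
    """Hypothetical counts of signals observed on the site behind a string."""
    community_signals: int  # assumed examples: many independent contributors, open editing
    top_down_signals: int   # assumed examples: a single owner, closed editing


def word_likeness(evidence: SiteEvidence) -> float:
    """Return a fuzzy degree in [0, 1]: 1.0 leans 'word', 0.0 leans 'name'."""
    total = evidence.community_signals + evidence.top_down_signals
    if total == 0:
        # No evidence either way: stay undecided.
        return 0.5
    # Share of the evidence that points to grass-roots community use.
    return evidence.community_signals / total
```

A string with three community signals and one top-down signal would score 0.75, i.e. "probably a word, but not certainly"; which signals to count, and how to observe them, is exactly the part I have not figured out.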

I realize this seems rather wishy-washy. Maybe someday I will figure out something clearer, but until then I guess I will just have to cope with such fuzzy notions: fuzzy vocabulary and fuzzy language.


