Monthly Archives: October 2013

Riggr Mortis essay: Wikipedia as database

Note: this essay is no longer available online, but is reproduced here for its historical interest, and its applicability to current Infoboxes discussions. See here for downloadable Word format. (Republished under CC-by-SA license.)

Surturz/Riggr Mortis on Wikipedia as a database


Revision as of 23:26, 12 July 2013

Context

I wrote this little essay on my user page in late 2012. I deleted my user page in 2013 (retired—no really!). Without my involvement, the text was retrieved at the request of someone with whom I have never crossed paths, and placed here in April 2013. Noticing this rather unlikely event and seeing a few inbound links to this page, I have edited it, also in April, to emphasize the main point. Riggr Mortis (talk) 08:18, 6 April 2013 (UTC)

Wikipedia as database: the lasting reason I’m retiring

I’m very uncomfortable with the ever-growing push to turn Wikipedia into a database masquerading as an encyclopedia.

I don’t mind contributing free encyclopedia text to the world; it is a “service to humanity”, and there is little chance that someone can take advantage of that volunteer work—to make meaningful money by selling what is already free.

But there is an ever-growing push for Wikipedia articles to be repositories of “structured data” that is meant to be parsed by computers, and not human readers. This structured data wears many hats; put simply, it allows computer programs to more easily deduce what the birth date of J.S. Bach is, or who directed The Godfather. When scaled to millions or billions of data points, it allows for the creation of interesting things like Freebase (owned by Google). You might say that as Wikipedia is to an encyclopedia, “Freebase” is to a computer database. But Wikipedia is now also acting as the source for such databases, partly through the efforts of very single-minded editors who insist that it is within the scope of this project to create templates, infoboxes, and “microformats”[1] that specifically address the needs of those who wish to build online databases. The structured data, as with anything on Wikipedia, can be used by anyone, but it is most useful to, and more importantly most obviously valuable to, technology companies like Google and Apple.

[Three-month update: the project manager for Wikidata announced in July 2013 that he will be stepping down to work at Google, which funded Wikidata development and so funded, in effect, a project that now harnesses free web workers to collect data about everything, for Google’s benefit. Coincidence, right? (That’s Google, the company whose motto “Don’t be evil” apparently applies neither to the never-satiated crony-capitalistic desire to externalize costs nor to supplying the United States of Surveillance with the information of its clients.)]

In this regard, I happened across yet another “infobox war” that was explicit enough, in the following quote and other discussion, to make me acutely aware that I cannot ethically contribute here, so long as a brigade of editors forcefully and continuously add a “hidden back-end” to Wikipedia, that has nothing to do with the encyclopedic project, in order to aid the technologies of private companies:

“Watson, SIRI, and Google all use the infobox data.” [links added]

This comment, one must remember, was being used to justify the addition of an infobox, a “summary box” that proponents say is useful to readers. Why, then, such regular references to “data re-users”, which always take up at least equal weight among the arguments in favor of infoboxes? Should you think the above is an isolated comment, it is not: since that time I have seen references to Google’s use of “structured infobox data” at least three times during various infobox wars. One must wonder why these editors are so adamant to make it simple for Google or Apple or IBM, all companies who employ the smartest programmers on earth, to conveniently parse the birth date of J.S. Bach[2] and feed it back to a user through whatever query mechanism makes them money (cell phones, search engines, and all that comes next).

I object strongly to this agenda on Wikipedia, because it is beyond the scope of the project, and because, with the endless desertion of people adding real encyclopedic content, Wikipedia will gradually become a pseudo-encyclopedia, with each page having been tailored more to pretty infoboxes and the “emission of metadata” than to human learning (yeah, you know, the hard job—to write and to share understanding). You want new editors? That’s a shame, because the wall of impenetrable templates and magic words will only increase under the current “vision”. (I will leave to the reader’s imagination the effect of Wikidata on this enterprise. But here’s a hint.[3])[4][5]

At the root, this is about failing to promote encyclopedism. A long-running trend on Wikipedia continues: emphasizing the simple man’s notion that there are nothing but “facts” (property–value pairs) in the world: no interpretation, no points of view, no context. Infoboxes do it; structured data does it; Wikidata does it. Almanacs also do it: of course such information has its interest, its purpose, and its place. But Wikipedia is an encyclopedia. It is trying to do something very unique, by original conception, and let me tell you, if the concept is watered down by people who mostly talk about “data”, it wears at the foundation of the encyclopedia, because data collecting is so much easier than what Wikipedia, the encyclopedia, ideally asks of you.

This is not what I signed up for. I like art, for example; it is interesting to read about who might be depicted in the Mona Lisa, but your favorite question-answering search engine or talking cell phone will not be able to tell you about that history. It will tell you this.[6][7][8]

As long as Wikipedia drifts from its origins as a tool for human learning to a second-rate quasi-database—apparently to the benefit of ADD-inducing tech companies—I will no longer participate as a volunteer. Neither should you.

Sincerely, Riggr Mortis (talk) 06:58, 27 November 2012 (UTC)

1. On Wikipedia, the presentation and content of these “metadata-emitting” elements have been highly entangled. Recent discussions have shown their proponents to be unwilling to admit that content and presentation can be separated; presumably, doing so would dilute their overall project, even as it might quell some of the concerns of people who dislike certain infoboxes. See MOS/Infoboxes talk page as evidence.

2. I couldn’t make this stuff up, yet it is presented by one of the more rational voices in the matter: “What it does do is allow Google (or whoever) to answer a question like ‘Was Bach a singer?’ ”; “At present only infoboxes do the job we want for Google and other re-users” (emphasis added); “Did you even look at [the] Intelligence in Wikipedia [YouTube video of] a Google talk dated 2008?”; “in your mind other users like Google wanting – for more than five years – to make use of our content is ‘secondary’ ” – hmmm…. Yes? Is that allowed?

3. This link purposely brings you to an edit screen, because you need to see what the wiki-text looks like. (This link is likely to go bad at some point, as it goes to a test version of Wikipedia where the Wikidata extensions are already enabled.)

4. Wikidata, incidentally, was partially funded by Google.

5. Long before Wikidata was proposed, I predicted on someone’s talk page that Wikipedia would eventually turn into a place where the article on Tokyo would have a lead sentence like “{{Japan.getCapital}} is a {{$.administrativeDivisionType}} with a population of {{$.getPopulation(2013)}}.” I can’t find that diff now, but hopefully you’ll accept a similar one as proof. Point is: we’re getting closer to that. If the community doesn’t reject Wikidata for all but the most obvious use cases (current, changing data), it’ll be a disaster. Why would you put the unchanging fact of Shakespeare’s birthplace as a magic word in his infobox, making it dependent on an obscure mechanism, when you can write “Stratford-upon-Avon”? Why do I have to ask that question? But I expect it will have to be asked, because once you have a neat technology, it must be used! Let the tail wag the dog for the nth time on Wikipedia with regard to feature sets.

6. Scroll down to “depicts”.

7. Now do you see why some people dislike infoboxes in some topics? No, probably not…

8. I like IT topics; I like databases, and work with them. One editor called the infobox opponents at Talk:J.S. Bach “Luddites” [ok, “Luddite mentality”, if there’s a difference] – much more of an all-encompassing negative judgment, I expect, than simply being sworn at – and then got upset at the lack of harmony!
