Musings on crowd-sourced data

I've just finished a phone discussion with one of my Linux gurus. Topics included the consequences of my quite long-term dalliance with DVD Profiler (while using Windows-based PCs) to catalogue my growing collection, and how best to cross the intellectual and philosophical gulf between my present use of a simple, tab-separated ASCII data file for my "Videos" data1 and my aim, now in late 2015, of seeing it metamorphose into a well-tempered and probably more flexible Kodi system.

It's a "genre" thing

The quality of crowd-sourced data depends, very much, on the quality of the crowd being used as the source2 of the data. ("GIGO" still applies, naturally.) And I'm unaware of any effective quality control mechanism for crowd-sourced data that doesn't at some point call on the services of nit-picking pedants such as myself (in the publicly expressed opinion of at least one of my all-too-many managers during my working life at IBM). In the case of the "genre" classifications that members of the crowd of DVD Profiler users have been applying (I'm tempted to say "with gay3 abandon") to the 600,000 or so titles in that online database, it's clear that a hardcore subset of these users has no clear understanding of the meaning of the word "genre".

How else could 650 or so of my DVDs and BDs have initially ended up with the unhelpful genre "Television" attached to them? There are, of course, many further instances of data malfeasance buried among the titles, and I've been combing them out by filtering the data views and checking what ends up in each genre classification. My goal (not a terribly lofty one, I admit) is to assign each of my discs to a single genre. Take, for instance, my recent acquisition (on BD) of David E. Kelley's "Lake Placid". Comedy? Certainly. Horror? Certainly not.


When I added poor old Richard Wilson to the cast list of "Prick Up Your Ears", I could then see that, although he's correctly cross-referenced to "How to Get Ahead in Advertising" and "One Foot in the Grave", he's also linked to the Richard Wilsons who appear in the 1947 "The Lady from Shanghai" (as an assistant district attorney) and in "Incident at Oglala" (as a real-life Tribal Council Chairman), neither of whom is at all likely to be our cuddly Victor Meldrew. Slightly adapting an appropriate XKCD image:

[Image: "Duty Calls", adapted from XKCD]

The "alt text" on the original cartoon reads "What do you want me to do? LEAVE? Then they'll keep being wrong!" How true.


1  An historic remnant of my peculiarly simple-minded (some might say "naïve") approach to "databases" here in Technology Towers, though probably nowhere else in the world. It can be traced all the way back to late 1985, and the £29 "AtLastPlus" DB program that I ran on my initial £399 8-bit Z80 CP/M-based Amstrad LocoScript word-processing system whenever I wasn't too busy using it for the CICS book I had foolishly allowed myself to be contracted to write for Ellis Horwood. (Refusing to accept an advance for that was one of my wiser moves, as it gave me a perfect excuse to duck out of the horrid task several years later without troubling my conscience overmuch.)
2  It's a great shame that John Brunner went into such tantalisingly little detail on the workings of the "Delphic oracle" worldwide network that he predicted (and deployed to such good effect) in his novel "The Shockwave Rider" some four decades ago.
3  Any of my titles with a hint of homosexuality (to pick an example at random, the Alan Bennett-scripted "Prick Up Your Ears", made in 1987 and based on John Lahr's biography of Joe Orton) is almost guaranteed to have the genre "Special Interest" attached to it. The database crashed while I was loading it to check my facts. That gave me time to contemplate the scan of that particular film's back cover artwork while I backed up a freshly repaired copy. The scan showed me that none of the crowd of data suppliers had noted that Richard Wilson appears in "Prick" (as a psychiatrist). See what I mean about nit-picking?