Out of Sorts, take 3

Having, erm, sorted out book list sorting back in January 1989 (on my Amstrad CP/M PC), and, a year later book list sorting (under Acorn RISC OS), how about some really modern book list sorting, undertaken with the help of a SQLite DB and some adroit Python snake charming? On a 64-bit Linux Mint 17.2 system?

Funnily enough...

... I recently found the book by Margaret Visser I mentioned last time. But that's a different story.

During recent walks, I've been teased by Brian for my Luddite approach to data file handling. He maintains it can always be done better using Python. "It" being text field handling... basically. For the last umpty¹ years I've been keeping simple ASCII data files of my books, videos, and music. Until 1994, I would occasionally print these out as jolly nice DTP catalogues that the print room staff in the basement of IBM Hursley house were happy to copy, collate, and bind for me in return for small-scale financial inducements. Or even just a few good jokes, leavened with some of my more scandalous gossip.

What happened in 1994? Basically, the web.

By the time it arrived, I was actually well-prepared for it. (And delighted by the way the IBM Lab adopted it so quickly.) The books I'd been writing in IBM (well, the official ones, at least) for a decade or so had all used a variant markup language, so HTML seemed like just another set of tags to handle. With less paper² to be printed.

Having seen the ease and usefulness of HTML and web pages, my book lists have lived on in cyberspace. With the invaluable assistance of the block and column manipulation facilities of the excellent TextPad editor on Windows I would sort them by Author, by Title, by Genre, by price, by date of purchase... and then lazily whack them in between a pair of <pre> and </pre> tags, with a bit of HTML / CSS wrappering, and just let the user's web browser page rendering engine do all the hard work. Tab-separated data. What could be easier?

"Why do it that way?" asks Brian, visibly offended and horrified at the thought of treating data in such a haphazard, cavalier way. (He was too polite to say "amateurish", but I've known him long enough to know that's what he was thinking. And he's right.) "Show me a better way", says I. He very soon did.

That's why I'm now the proud owner of a 9,000+ entries SQLite DB stuffed full of properly organised books data fields. And a matching set of five Python query programs to extract the data (which SQLite will kindly sort in any order, or set of orders, desired, when in the act of extracting what's been requested and handing it over to Python). Python then formats it neatly (with some CSS assistance from me), looks after all the special cases introduced by authors whose names contain accents, or who insist on a surname beginning with a lowercase³ letter, or book titles similarly afflicted. At the end of all this data mangling, of course, the finished web page files are indistinguishable from the hand-crafted horrors I've been laboriously producing at fairly regular intervals for the last couple of decades.

However, there is one crucial difference.

On my quad core i7 BlackBeast PC all five generated web lists are produced — ready for uploading straight on to my web server(s) — in less than two seconds of CPU time. And perhaps 30 or 40 seconds of total typing time at a terminal prompt. Now, even given my reasonable level of skill with TextPad it could easily take me a couple of hours to prepare one of these files manually on Windows. (Mind you, I still remember the first time I successfully sorted and merged⁴ my books database on CP/M, too, using Mallard BASIC. It took over 20 minutes to chew its way through just over 5,000 records. Let's not even think about how long the subsequent noisy impact printing would then take, too.)

The five Python programs are:
1. CreateDataBase.py — to populate the SQLite DB "David.db" with the contents of the NewBooksMaster.txt ASCII file
2. AllByAuthorHTML.py — to create the (internal) All Authors SHTML listing
3. AllByTitleHTML.py — to create the (internal) All Titles SHTML listing
4. AllByDateHTML.py — to create the (internal) All by purchase date SHTML listing
5. PublicByGenreHTML.py — to create the two (public) SHTML listings, subsorted by genre

Date: 7 September 2015

See how they run!

A recent example, run on 9th May, 2016:

Footnotes

¹  In the case of books, "umpty" is an integer number of years approaching 50 though the lists were (as mentioned here) originally hand-written in notebooks, and later listed on a card index system of Baroque complexity. I was an odd child.
²  I was greatly amused (though not in a good way) when summoned on one occasion up to the Lab Director's office and told, on arrival, to explain to her secretary the art of printing out web pages so that the Lab Director could then read them. Indeed, I have often been greatly amused by the evidence I've seen with my own eyes of senior IBMers who variously failed to engage with new-fangled technology of various sorts. Not that I'm squeaky clean in this respect. (Or any other, for that matter.)
³  Author ee cummings managed a Doubleplusungood with both his name, and his book title: "may i feel said he". As for John Cuneo's "nEuROTIC" (a lovely little book of vicious cartoons), what can one say?
⁴  Both disk and memory were at such a premium back then I was forced to keep three separate data files of my books: fiction, non-fiction, and science fiction. Hence the need for some adroit merging.