Librarians at large: Dave Lyons, Data Plumber

This post comes to us from Dave Lyons, ILN Ambassador to China

I am a librarian who does not work in a library – in fact, I only did once for about a year, cutting my teeth on Cutter numbers and spine labels. My job title, Data Integration Architect, requires explanation sometimes even within the software development community. My boss and I routinely try to think of something less opaque. I am seriously considering changing it to Data Plumber, because that analogy seems more accurate and straightforward: I find sources of data, I build pipes and storage and filters, and I make sure the caches flush. Civilians don’t see the connection to libraries, but in many ways I am a type of serial librarian. All the data I work with is related in some way to scientific journals, such as ISSNs, DOIs, ORCID IDs, impact factors, altmetrics, classifications, abstracts, and citations.

It’s commonplace now to talk about libraries being at risk, and in terms of budgets for brick-and-mortar institutions this is in many cases true. But while we debate the future of libraries, whether they are now to become community centers, cafes, and even bars based on their long-standing role as a “third place”, or transform into archives dedicated to preservation rather than user services, it doesn’t follow that librarians are anachronistic or obsolete. To the contrary, our mission remains as it ever was, to “ensure access to information for all”, and our profession’s core competencies – reference, user services, cataloging, metadata, indexing, classification – are extremely relevant and sorely needed in the digital world we now inhabit.

How I ended up working in software development has a lot to do with why I went to library school. I had originally planned to pursue an advanced degree studying the history of Xinjiang, a far western province of China. The best resources are scattered in libraries and archives around the world, from Japan to Russia to Sweden to the United States. A complete history would draw on primary and secondary sources in dozens of languages, some dead or nearly so. For any given place or person or thing there could be names in a dozen languages, each with its own variations in spelling or phonetic system, and I would Google these things and wonder if someone collected those sorts of synonyms and put them in a search engine. Soon enough I was more interested in information seeking than the information itself, specifically metadata. The sort of data I wanted to work with was curated and authoritative, and that was the purview of librarians, not computer scientists.
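For the curious, here is a minimal sketch of what "collecting those sorts of synonyms" might look like in code. It is only an illustration, not how any real gazetteer or authority file works: the record IDs, the `AUTHORITY` table, and the function names are all made up for this example, and the variant lists are short samples rather than a complete set.

```python
import unicodedata

# Hypothetical authority records: one made-up ID per place, many name variants.
AUTHORITY = {
    "place:kashgar": ["Kashgar", "Kashi", "Qeshqer", "喀什"],
    "place:urumqi": ["Ürümqi", "Urumqi", "Urumchi", "Wulumuqi", "乌鲁木齐"],
}


def normalize(name: str) -> str:
    """Strip diacritics and fold case so 'Ürümqi' and 'urumqi' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold().strip()


def build_index(authority: dict) -> dict:
    """Map every normalized variant to the ID of the record it belongs to."""
    index = {}
    for record_id, variants in authority.items():
        for variant in variants:
            index[normalize(variant)] = record_id
    return index


INDEX = build_index(AUTHORITY)


def lookup(name: str):
    """Return the record ID for any known variant, or None if it is unknown."""
    return INDEX.get(normalize(name))
```

With this in place, `lookup("URUMCHI")` and `lookup("乌鲁木齐")` land on the same record, while a name that nobody has cataloged yet, such as `lookup("Khotan")` here, returns `None`. The hard part is not the code but the curated list of variants, which is exactly the librarian's contribution.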

Software doesn’t do anything without data, whether an external input or generated by the software itself. Finding, evaluating, and cleaning data is practically a full-time job for data scientists, who sometimes call this “data janitorial work”. (I’ve dabbled in data science, but I wouldn’t call myself a data scientist without a strong background in statistics, which I don’t have. Yet.) Designing seemingly simple data structures can be surprisingly complex, and programmers regularly stumble into problems that any cataloger will recognize. There’s a whole series of articles online titled “Falsehoods Programmers Believe About…” names, addresses, time, and geography, which raise issues that ought to be familiar to anyone who has looked at AACR2 or RDA. Software is basically lists of instructions according to lists of rules being carried out on lists of stuff, making more lists of more stuff and more lists of instructions. It’s lists, not turtles, all the way down. We know lists.
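To make the janitorial work concrete, here is a small sketch of one cleaning step on the kind of data I handle: checking that a string is a well-formed ISSN. The check-digit arithmetic follows the standard ISSN scheme (weights 8 down to 2 on the first seven digits, modulo 11, with X standing for 10); the function name and the choice to accept a missing hyphen or a lowercase x are my own assumptions for illustration.

```python
import re

# Four digits, optional hyphen, three digits, then a check character (0-9 or X).
ISSN_PATTERN = re.compile(r"^([0-9]{4})-?([0-9]{3})([0-9Xx])$")


def is_valid_issn(raw: str) -> bool:
    """Return True if raw is shaped like an ISSN and its check digit is correct."""
    match = ISSN_PATTERN.match(raw.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the first seven digits 8, 7, 6, 5, 4, 3, 2 and add them up.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    expected = (11 - total % 11) % 11
    check = "X" if expected == 10 else str(expected)
    return match.group(3).upper() == check
```

A check like this only says the string is internally consistent, so `is_valid_issn("0378-5955")` passes and `is_valid_issn("0378-5954")` fails. Whether the number is actually assigned to the journal you think it is remains a question for an authoritative registry.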

Librarians who work in software development are a growing community, gathering in groups like Code4Lib and DST4L, and interviewed on podcasts like Beyond The Stacks.

Increasingly, development works like Lego building blocks, and for very little money you can tinker with building a wide variety of fun and interesting things with barely any coding. Spin up a server on Amazon Web Services, Digital Ocean, or Docker Cloud, pull a Docker container for a GitHub project you like, and in five minutes you can build a website, an ILS, an academic social networking site, an academic journal, or a really impressive PDF to HTML converter (seriously, check out the Bible de Genève example). Play around on the free tiers of Heroku, Pantheon, Koding, Sandstorm, or

Want to learn to code? Codecademy is a great free place to start. There are others that offer initial free classes for Data Science or MongoDB. Coursera is packed with courses and specializations related to data, computer science, UI design, and more. Learn in tandem with your patrons.

It’s not so much that I think all librarians need to learn to code; rather, I think that there’s always been a substantial amount of overlap between us and computer scientists, and we haven’t always remembered that. More and more, software development requires roles like taxonomists (Amazon employs quite a few librarians that way) and metadata specialists, people who not only know how to find the data, but can understand the uses, limitations, and structure of that information.

While libraries may once again be transforming as physical institutions – taxpayer-supported public libraries are historically a rather new phenomenon – librarians and their unique set of skills are more relevant than ever for emerging new roles and opportunities in the rapidly expanding world of data around us.

Dave Lyons, ILN Ambassador to China

Posted in Discussion topics, Round 2016A.
