Wednesday, 25 November 2009

ebooks, catalogues, and discovery tools

I started a discussion on Twitter a few days back regarding e-books. The discussion was prompted by a question I answered on the Library Web2.0 mailing list (lis-web2@jiscmail.ac.uk) from Helen Leech (Virtual Content Manager - Surrey Library Service). Helen's question was about e-books from Project Gutenberg and how best to import them into her catalogue. Here is an overview of Project Gutenberg:
Project Gutenberg is a web site where you can download over 30,000 free e-books, with over 100,000 more free e-books available through partner and affiliate schemes.
Helen's question started me thinking about where the best place to 'catalogue' e-books within the library is. In terms of choice it really seems to boil down to two options:
  1. Catalogue e-books within the LMS
  2. Index e-books within a discovery tool
I'm using the term 'index' for the discovery tool, as opposed to 'catalogue', because it is more appropriate. However, the two operations are similar in nature: both involve recording resource information within the particular system. Here is a short overview of how discovery tools work:
Discovery tools can be thought of as next generation OPACs, but they can search far more than just the holdings in the LMS. They are in essence web sites that are dedicated to indexing, searching, and displaying information about resources. This information is compiled and fed into the discovery tool before any searching takes place, in a process called 'harvesting'. The information can be derived from many places including the holdings in the LMS, e-journals in SFX, institutional repositories, and a host of other resources both internal and on the web. Pre-loading the discovery tool with indexes of resource information in this way makes the search process very fast and very flexible.
Primo is Ex Libris' implementation of a discovery tool, but there are many alternatives out there (see my previous blog post here).
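To make the harvesting idea a little more concrete, here is a minimal sketch in Python. It is not how Primo, Aquabrowser, or VuFind actually work internally; the sample records, the source names, and the `harvest` and `search` functions are all made up for illustration. The point is only that the index is built before anyone searches, so a search becomes a quick lookup across every source at once.

```python
from collections import defaultdict

# Hypothetical sample records standing in for metadata harvested from each source.
SOURCES = {
    "lms": [{"id": "lms-1", "title": "Pride and Prejudice"}],
    "sfx": [{"id": "sfx-1", "title": "Journal of Documentation"}],
    "gutenberg": [{"id": "pg-1342", "title": "Pride and Prejudice"}],
}


def harvest(sources):
    """Build an inverted index (word -> record ids) before any searching happens."""
    index = defaultdict(set)
    records = {}
    for source, items in sources.items():
        for item in items:
            # Remember which system the record was harvested from.
            records[item["id"]] = {**item, "source": source}
            for word in item["title"].lower().split():
                index[word].add(item["id"])
    return index, records


def search(index, records, query):
    """Return records whose title contains every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    ids = set.intersection(*(index.get(word, set()) for word in words))
    return [records[record_id] for record_id in sorted(ids)]


index, records = harvest(SOURCES)
for hit in search(index, records, "pride prejudice"):
    print(hit["source"], hit["title"])
```

In this toy example the same title is found in both the LMS and the Gutenberg records with one query, which is the 'search everything in one place' behaviour described above.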

You can also see a few of these tools in action here:
Aquabrowser at Edinburgh University
Primo at University of East Anglia
VuFind at the LSE

So, we could catalogue the e-books in the LMS, or we could catalogue the e-books in the LMS and then export this information into the discovery tool, or we could just put the information straight into the discovery tool and bypass the LMS. So, where is the best place to catalogue e-books: in the LMS or in the discovery tool? I posted a question along these lines on Twitter and a discussion ensued, with some in favour of the LMS and others in favour of the discovery tool. Two very relevant blog posts were identified during the conversation:
Is an e-book a book? by Lukas Koster (Head of Library Systems Dept. Library Univ. of Amsterdam)
Library catalogues, search systems and data by Chris Keene (University of Sussex Library - developing information related web technologies)
Also Frank Vandepitte (Ghent University Library) emailed me with his views on the matter and a breakdown of where they catalogue their resources at Ghent Uni:
Ebook situation here in Ghent :

licensed stuff, stored in sfx (15.000)
licensed stuff, stored in aleph (136.000 ecco books)
licensed stuff, stored as static file (100.000 eebo books)
free stuff, stored as static file (30.000 gutenberg)
free stuff, stored as static file (2.500 dbnl books)
scanned stuff, stored in separate aleph database (40.000 ugent books sent to google)
scanned stuff, stored as static file (ca. 500.000 books from hathi)

My personal experience
  • in the case of the ecco & eebo books where we've bought the metadata, there's no advantage really in uploading this in aleph, since there’s no need to catalog, the metadata stay virtually unchanged (and so they should, you don't want your catalogers fiddling with these data)
  • in the case of licensed books metadata which we get on a regular basis from ex libris (sfx), there's a strong case not to integrate those in your ILS, in doing so you avoid the hassle of uploading, deduping, matching with print "manifestations" of the same title, etc. Keeping your ILS in sync with SFX is not easy, as I've discovered when we're still exporting e-journal data from SFX to Aleph and trying to match them with the printed ones.
  • the only case where I've found the ILS to be useful was to deal with digitized versions of print books. If you scan a book from your collection, it's logical to store the url in your ILS. As partners of Google Books we also have to catalogue some 50.000 extra books per year on top of the normal work volume. If we wouldn't be using the ILS cataloguing module we'd be in big trouble I fear...
ergo, in most cases just use your discovery tool to index the metadata directly, bypassing your traditional catalogue, and save yourself a lot of trouble
I did upload a first batch of ecco data in aleph but wouldn’t repeat it
btw, the endusers just don’t care where it comes from as long as they find it and access it
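As a rough aid to reading Frank's figures, the counts in his list can be tallied by where the records are stored. This is just arithmetic on the numbers quoted above (the Hathi figure is approximate in the original, so the totals are too); the variable names are my own.

```python
from collections import Counter

# (type of material, where it is stored, number of books) as listed in the email.
holdings = [
    ("licensed", "sfx", 15_000),
    ("licensed", "aleph", 136_000),
    ("licensed", "static file", 100_000),
    ("free", "static file", 30_000),
    ("free", "static file", 2_500),
    ("scanned", "separate aleph database", 40_000),
    ("scanned", "static file", 500_000),  # "ca." in the email, so approximate
]

by_store = Counter()
for _kind, store, count in holdings:
    by_store[store] += count

total = sum(by_store.values())
for store, count in by_store.most_common():
    print(f"{store}: {count:,} ({count / total:.1%})")
print(f"total: {total:,}")
```

On these figures, roughly three quarters of the e-book records at Ghent are held as static files rather than in Aleph or SFX.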
Frank's breakdown is really interesting because it shows what works best in a real world example. It seems like at Ghent they favour the discovery tool as the primary way to catalogue e-books. However, it is important to stress that the Ghent example is what works best for Ghent and may not be the best example for every institution. Thus, I think it would be wrong to try and answer the question of "where is the best place to catalogue e-books?". Instead we should be asking:
"where is the best place to catalogue e-books at Canterbury Christ Church?"
This is obviously a huge question that needs to be investigated and discussed, but Lukas did present me with a possible way to look at it from a different angle. Rather than trying to work out where the best place to catalogue is, perhaps it would be better to think about what the primary interface for our users is (or what it will be). At present we have two primary interfaces:
  1. The web OPAC for printed material
  2. MetaLib (coupled with SFX) for e-resources
If we decide that we are never going to move away from this model then it leaves us little option but to put e-books in the catalogue. However, we have rarely explored putting electronic resources on the OPAC and it would take a lot of investigation to find the best approach. For e-books that have a printed counterpart we could add the link to the existing record using the MARC 856 field, but I would imagine that there will be vast numbers of e-books where we do not have a printed version. So we would have to think about importing MARC records into the catalogue for e-books and managing e-book packages in a similar way to how we deal with e-journal packages in SFX.
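To illustrate the 856 option, here is a small sketch of what adding an e-book link to an existing print record might look like. It uses plain Python classes rather than any real MARC library or our LMS, and the `Field`, `Record`, `add_ebook_link`, and `render` names, the sample record, and the example.org URL are all hypothetical. The field itself follows MARC 21: 856 is Electronic Location and Access, first indicator 4 means HTTP, second indicator 1 means a version of the resource described by the record, $u holds the URL and $z a public note.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    tag: str
    indicators: str
    subfields: dict


@dataclass
class Record:
    fields: list = field(default_factory=list)


def add_ebook_link(record, url, note="Electronic version"):
    """Append an 856 field pointing at the e-book version of a print title."""
    # Indicators "41": access via HTTP, link is to a version of the resource.
    record.fields.append(Field("856", "41", {"u": url, "z": note}))
    return record


def render(f):
    """Show a field in the usual tag / indicators / $subfield display form."""
    subs = "".join(f"${code}{value}" for code, value in f.subfields.items())
    return f"{f.tag} {f.indicators} {subs}"


# Hypothetical print record with only a title field, for illustration.
print_record = Record([Field("245", "10", {"a": "Pride and prejudice"})])
add_ebook_link(print_record, "https://example.org/ebooks/1")

for f in print_record.fields:
    print(render(f))
```

This only covers the case where a print record already exists; e-books with no printed counterpart would still need full records of their own.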

The alternative is to develop a different approach to our primary interfaces, which would include a discovery tool:
  1. Discovery tool to search for everything that we can obtain an index for
  2. MetaLib for e-resources where no index is available (most databases at present)
The advantage of this system is that we would be a huge step closer to the 'one interface for search' that has been a goal of libraries for so long. All of our printed material, all of our e-journals, and any e-books we acquired would be searchable in one interface (databases would still need to be searched through MetaLib). Also, it appears to be easier to index resources in a discovery tool than it is to catalogue them within an LMS. However, adding a new system such as a discovery tool would need a lot of work and would have a price tag attached, even if we went down the 'open source' route.

I think what this shows is that even a simple question such as "where is the best place to catalogue e-books?" demands a lot of thought, discussion, and investigation in a whole host of areas, especially considering that this discussion only focused on cataloguing and searching, without even considering licensing or purchasing models. It also highlights how important an 'e-library' strategy is for ensuring that decisions made now take account of current and emerging technologies and that disparate projects are steered toward a common goal.

2 comments:

  1. Of course current cataloguing systems were not designed for anything digital. Some vendors are currently working on new integrated systems for print, digital and anything else: Ex Libris URM, OCLC WorldCat Webscale, etc.

  2. current ILS were indeed not designed for anything digital, however this hasn't stopped us from cataloguing our databases and web resources in Aleph and exporting them to the discovery tool ... you don't need MetaLib for this purpose. Biggest problem in reaching one interface to search them all are the articles, or rather the reluctance of the metadata proprietors to share these data
