MARC and machine readability (revisited)
I had an epiphany yesterday. After giving a class on FRBR, I listened to a GREAT session, “Cataloging: Where are we now? Where are we going?”, from the Texas State Library (the archived webcast may be accessible here http://www.tsl.state.tx.us/ld/workshops/teleconferences/webcastsarchived.html, or you may need to order the recording here http://www.tsl.state.tx.us/ld/workshops/teleconferences/videotapes.html). The presenters were Karen Coyle (she writes an excellent blog, http://kcoyle.blogspot.com/) and Renee Register (OCLC Senior Product Manager). A small part of the session covered MARC…well, Karen covered MARC. And I got it. I did. I totally understand.
MARC is a machine-readable format, BUT it is heavily text-driven. That is, tags, indicators, and subfield codes divide a MARC record up nicely, but that is not enough! I think Karen used the example of the 300 field (physical description), subfield a. For a monograph, 300 subfield a holds the extent, which for a book means its pagination. It may look like this:
300 _ _ $a xi, 567 p.
We humans (especially us librarian-humans) know this means the preliminary pages (the front matter) are numbered in Roman numerals up to xi and the main body of the book runs to 567 pages. To the computer, these are just pretty, pretty symbols to display. It would be like me looking at hieroglyphs. Pretty, pretty pictures.
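To make that concrete, here is a minimal Python sketch, assuming the display convention above: a three-digit tag, two indicator positions shown as `_` when blank, and `$` before each one-character subfield code. It splits the line by tag, indicators, and subfield codes, which is as far as MARC's own structure goes, and the $a value comes back as one undivided string. The function name `parse_marc_line` is hypothetical; real records are exchanged as ISO 2709 or MARCXML and are better handled with a dedicated MARC library.

```python
import re

def parse_marc_line(line: str) -> dict:
    """Split a display-style MARC line such as '300 _ _ $a xi, 567 p.'
    into tag, indicators, and subfields (hypothetical helper)."""
    # Assumed layout: 3-digit tag, space, indicator 1, space, indicator 2,
    # space, then subfields each introduced by '$' plus a one-character code.
    m = re.match(r"^(\d{3}) (\S) (\S) (.*)$", line)
    if m is None:
        raise ValueError(f"not a display-style MARC data field: {line!r}")
    tag, ind1, ind2, rest = m.groups()
    subfields = []
    for chunk in rest.split("$")[1:]:
        if chunk:
            # First character is the subfield code; the rest is its value.
            subfields.append((chunk[0], chunk[1:].strip()))
    return {"tag": tag, "indicators": (ind1, ind2), "subfields": subfields}

field = parse_marc_line("300 _ _ $a xi, 567 p.")
print(field["subfields"])  # → [('a', 'xi, 567 p.')]
```

The structure stops at the subfield: to a program, `'xi, 567 p.'` is still just a string of characters, with nothing saying which part is a page count.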
For the computer to understand, we need to break this down even further: one field for the ‘xi’ and another for the ‘567’, each stored as a number (so ‘xi’ would be recorded as 11). Perhaps we label these “front matter pagination (Roman numerals)” and “first pagination (Arabic numerals)” or whatever. Then we instruct the computer to index these fields as numbers, searchable by number range, etc. This means when a fifth grader comes in looking for a book “about a historical figure” with “at least 50 pages,” we could actually search on that field successfully! OMG. This would be awesome. Seriously. I am not being sarcastic; this would be great.
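A minimal sketch of that extra parsing, in Python. It assumes the simplest possible extent, an optional run of Roman numerals followed by an Arabic page count, as in ‘xi, 567 p.’. The functions `roman_to_int` and `parse_pagination`, the field names, and the sample records are all hypothetical. Real 300 $a extents (leaves, plates, bracketed unnumbered pages, multiple volumes) would need many more rules.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'xi' to an integer (11)."""
    total, prev = 0, 0
    # Walk right to left: a smaller value before a larger one is subtracted.
    for ch in reversed(numeral.lower()):
        value = ROMAN_VALUES[ch]
        if value < prev:
            total -= value
        else:
            total += value
            prev = value
    return total

def parse_pagination(extent: str) -> dict:
    """Split a simple extent like 'xi, 567 p.' into two numeric fields.
    Hypothetical: handles only '[roman, ]arabic p.' patterns."""
    m = re.match(r"^\s*(?:([ivxlcdm]+),\s*)?(\d+)\s*p\.?", extent, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognized extent: {extent!r}")
    roman, arabic = m.groups()
    return {
        "front_matter_pages": roman_to_int(roman) if roman else 0,
        "first_pagination": int(arabic),
    }

print(parse_pagination("xi, 567 p."))
# → {'front_matter_pages': 11, 'first_pagination': 567}

# Hypothetical sample records, then a numeric range limit: "at least 50 pages".
catalog = [
    {"title": "Sample record A", "extent": "xi, 567 p."},
    {"title": "Sample record B", "extent": "32 p."},
]
at_least_50 = [
    rec["title"]
    for rec in catalog
    if parse_pagination(rec["extent"])["first_pagination"] >= 50
]
print(at_least_50)  # → ['Sample record A']
```

Once the page count is an integer, "at least 50 pages" is an ordinary numeric comparison instead of guesswork against a display string.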
I can see great benefits to this. Parse it all out, make every piece a true access point, completely searchable, with the ability to limit by whatever field or to search only particular fields, etc.
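Here is a toy Python sketch of what "search particular fields, limit by others" might look like once records are already parsed. The records, field names, and the `search` function are made up for illustration; a real catalog would use a proper search index rather than a loop.

```python
# Hypothetical records whose data has already been parsed into separate fields.
records = [
    {"title": "Sample record A", "subject": "biography", "pages": 567},
    {"title": "Sample record B", "subject": "dinosaurs", "pages": 32},
    {"title": "Sample record C", "subject": "biography", "pages": 48},
]

def search(records, term, fields=None, min_pages=None):
    """Keyword search over all text fields, or only the ones named in
    `fields`, optionally limited by the numeric `pages` field."""
    term = term.lower()
    hits = []
    for rec in records:
        # Default: search every text (string-valued) field in the record.
        targets = fields or [k for k, v in rec.items() if isinstance(v, str)]
        if not any(term in str(rec.get(f, "")).lower() for f in targets):
            continue
        # Numeric limit: a real comparison, not a string match.
        if min_pages is not None and rec.get("pages", 0) < min_pages:
            continue
        hits.append(rec["title"])
    return hits

print(search(records, "biography", fields=["subject"], min_pages=50))  # → ['Sample record A']
print(search(records, "sample", fields=["subject"]))  # → []
```

Restricting the search to `subject` keeps "sample" in the titles from matching, and the page limit drops the 48-page record.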
Can you imagine such a system? I can. It’s about time for it. Perhaps even a merger of the format standard and the rules for input <gasp>. Many catalogers (and most copy catalogers) already treat what they see in MARC bibliographic records as the instructions for what to do (trust me, I do lots of teaching on the basics, and most people simply look at what existing records contain and copy it as best they can, without understanding the why of it).
From there it is an easy leap to “freeing the data” and “linking the data,” because the data would already be in small, parsed chunks in a standardized form.
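As a rough illustration of that leap: once the pagination lives in its own numeric fields, each field can be written out as a simple subject-predicate-object statement. This Python sketch emits N-Triples-style lines; the `to_triples` function, the example.org record URI, and the predicate names are placeholders, not a real cataloging vocabulary.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

def to_triples(record_uri, fields):
    """Emit one N-Triples-style line per parsed field (hypothetical helper).
    Predicates use a placeholder example.org namespace."""
    lines = []
    for name, value in fields.items():
        pred = f"<http://example.org/vocab/{name}>"
        if isinstance(value, int):
            # Numbers become typed literals, so they stay numbers downstream.
            obj = f'"{value}"^^<{XSD_INTEGER}>'
        else:
            # Plain string literal; escape backslashes and double quotes.
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{record_uri}> {pred} {obj} .")
    return lines

for line in to_triples("http://example.org/record/1",
                       {"frontMatterPages": 11, "firstPagination": 567}):
    print(line)
```

Each statement carries one fact with an explicit datatype, which is what lets other systems reuse (and link to) the data without re-parsing a display string.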
I wonder which of the systems out there will think of this first and put it into action. VTLS seems pretty close…they’re doing wonderful, wacky things.