MARC and machine readability (revisited)

I had an epiphany yesterday. I had given a class on FRBR and then listened to a GREAT session, “Cataloging: Where are we now? Where are we going?”, from the Texas State Library (possibly accessible here http://www.tsl.state.tx.us/ld/workshops/teleconferences/webcastsarchived.html or possibly you’ll need to order it http://www.tsl.state.tx.us/ld/workshops/teleconferences/videotapes.html) with Karen Coyle (excellent blog http://kcoyle.blogspot.com/) and Renee Register (OCLC Senior Product Manager) as presenters. A small bit discussed MARC…well, Karen discussed MARC. And I got it. I did. I totally understand.

MARC is a machine readable format BUT it is heavily text driven. That is, there are tags and indicators and subfield codes nicely dividing MARC out but it is not enough!  I think Karen used the example of the 300 field, subfield a.  For a monograph format, the 300 subfield a is for pagination information. It may present thus:

300 _ _ $a xi, 567 p.

For us humans (especially us librarian-humans), we know this means the front matter is xi (11) pages and the bulk of the book is 567 pages.  To the computer these are pretty, pretty symbols to display. It would be like me looking at hieroglyphs. Pretty, pretty pictures.

In order for the computer to understand, we need to parse this out even more. We need one field that accepts only the Roman-numeral count (‘xi’) and another, purely numeric field for ‘567’. Perhaps we label these as “front matter pagination (Roman numerals)” and “first pagination (Arabic numerals)” or whatever.  Then, we instruct the computer to index these fields as numbers (converting the Roman numerals to their values) and make them searchable by number range, etc.  This means when the fifth grader comes in looking for a book “about a historical figure” and containing “at least 50 pages” we could actually search on that field successfully! OMG. This would be awesome. Seriously. I am not being sarcastic, this would be great.
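
To make the idea concrete, here is a minimal Python sketch of what “parsing it out” could look like. Everything in it is an assumption for illustration: the names `Extent`, `parse_extent` and `titles_with_at_least` are made up, the sample titles are invented, and the pattern only handles the simple “xi, 567 p.” shape from the example above. Real 300 $a statements vary far more than this, so treat it as a sketch of the concept and not as how any actual system does it.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'xi' to its integer value."""
    total = 0
    previous = 0
    # Read right to left: a smaller value before a larger one is subtracted.
    for char in reversed(numeral.lower()):
        value = ROMAN_VALUES[char]
        if value < previous:
            total -= value
        else:
            total += value
            previous = value
    return total


@dataclass
class Extent:
    """Hypothetical parsed form of a 300 $a pagination statement."""
    front_matter_pages: Optional[int]  # from the Roman-numeral sequence, if any
    main_pages: int                    # from the Arabic-numeral sequence


# Assumption: only the simple "<roman>, <arabic> p." or "<arabic> p." shapes.
EXTENT_PATTERN = re.compile(
    r"^\s*(?:(?P<roman>[ivxlcdm]+)\s*,\s*)?(?P<arabic>\d+)\s*p\.?\s*$",
    re.IGNORECASE,
)


def parse_extent(subfield_a: str) -> Optional[Extent]:
    """Return the parsed pagination, or None when the text does not fit."""
    match = EXTENT_PATTERN.match(subfield_a)
    if match is None:
        return None
    roman = match.group("roman")
    return Extent(
        front_matter_pages=roman_to_int(roman) if roman else None,
        main_pages=int(match.group("arabic")),
    )


def titles_with_at_least(records: List[Tuple[str, str]], minimum_pages: int) -> List[str]:
    """Keep titles whose parsed main pagination is at least minimum_pages."""
    matches = []
    for title, subfield_a in records:
        extent = parse_extent(subfield_a)
        if extent is not None and extent.main_pages >= minimum_pages:
            matches.append(title)
    return matches


# Invented sample data: (title, 300 $a text).
SAMPLE_RECORDS = [
    ("Book A", "xi, 567 p."),
    ("Book B", "48 p."),
    ("Book C", "1 v. (various pagings)"),
]
```

With this in place, `titles_with_at_least(SAMPLE_RECORDS, 50)` keeps only “Book A”. “Book C” is skipped because its extent does not parse, which is exactly the problem with leaving the numbers buried in display text: anything the pattern does not anticipate is invisible to a number-range search.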

I can see great benefits to this. Parse it all out, make it all truly access points, completely searchable and able to limit to whatever field or just search particular fields, etc.

Can you imagine such a system? I can. It’s about time for it. Perhaps even a merger of the format standard and the rules for input <gasp>. Many catalogers (and most copy catalogers) are using what they see in MARC Bibliographic Records as what to do anyway (trust me, I do lots of teaching on basics and most are simply looking at what the records have and copying to the best of their ability but without understanding the why of it).

From here it is an easy leap to “freeing the data” and “linking the data” because it is in easy, parsed chunks in a standardized input.

I wonder which of the systems out there will think of this first and put it into action? VTLS seems pretty close…they’re doing wonderful, wacky things.

UPDATE, 31 March 2010: Karen Coyle appears on Talis too, addressing Semantic Web. Go listen. NOW!

Categories: marc
  1. Deanna
    March 11, 2010 at 11:48 am

    Interesting point about many catalogers just copying data into fields without understanding why it is there or how the format affects it. And that explains a lot about why downloaded records can be so unsatisfactory. I left grad school under the impression that OCLC had standards that controlled whose records were made available for downloading to protect the integrity of the shared records. The reality has been that OCLC records range from excellent to slapdash disaster, probably because too many people are creating records without truly understanding the “why” before attempting the “how.”

    • March 11, 2010 at 6:44 pm

      OCLC has never been a “clean” database. Understandable because of the manner in which they obtain records but it is a bit better than it used to be – not quite as many duplicates exist (well, if you disregard the newly approved and added “parallel” records). With the Expert Community, allowing a very open editing policy (I think we cannot touch the PCC records but most others are fair game), things are getting interesting. Lots of great things, but there are so many out there who do not understand the rules or the whys of this that it causes problems – I’ve seen so many posts to OCLC-CAT regarding records that were changed from correct to incorrect OR book format records changed to sound recordings, etc. Without proper training you cannot expect people to correctly use something as complex as OCLC.

  2. March 19, 2010 at 12:52 pm

    I think we need to purge the “master record” paradigm from our heads before we move into a newer notion of data. That, and the notion of “records” being the proper (and only) medium of exchange. I’d really urge folks to look at Karen Coyle’s Jan./Feb. issues of “Library Technology Reports,” where she carefully and clearly explains why those two ideas are past their usefulness. It’s impossible, in my opinion, to really address the data quality questions properly in the current record sharing environment, and frankly I think we’re going to have to move on!

  3. Ivy
    May 5, 2010 at 6:39 pm

    Some very similar thoughts (and discussion of some of the issues attached to change) here: http://www.librarything.com/topic/90309

  4. May 5, 2010 at 7:39 pm

    Thanks Ivy – I’m going to have to re-read Tim’s post. I think I get it but not quite. That is, I get the concept but not why it is difficult. I know I’m missing something that a re-read after a rest will resolve.
    Diane – I’ll look at the report. I do believe in a master record but I also believe in parsing the data. Maybe what I’m calling a ‘master record’ is more of a linking of information together to make a record. That is, huh, how to explain…I think we still need a record-type object. We need some way to manifest for display the information ‘about’ an item. Some way to link the bits and pieces out there to represent THIS thing rather than THAT thing.
    Oy, don’t know if I am making any sense. Best go to bed and think about it tomorrow.

  5. May 19, 2010 at 9:36 am

    Ivy – thanks again, I just re-read Tim’s post and yes, this is what I’m talking about and it is excellent to have a programmer-type view of this! Thanks so much!
