A Review of BIBFRAME and the Reasons for Its Creation
Cherry N. Lockett
University of Southern Mississippi
The goal of the library is to provide accessible information to its patrons. In pursuit of this goal, libraries have always been innovative in finding ways to organize their holdings, most often by adopting current technologies. These technological advances led to book catalogs, paper slips, and eventually cards. Later, technology allowed libraries to organize their holdings in virtual catalogs or computerized database management systems; however, an analysis of information technology over the 20th century and into the 21st century reveals that library organizational technology has fallen behind the ever-evolving world of technology. In addition, a closer look at the most popular cataloging format, Machine-Readable Cataloging (MARC), reveals that it has several operational problems. Libraries are now in the complicated process of replacing the old MARC cataloging format with one that moves away from closed database systems toward Web data and the World Wide Web Consortium’s Resource Description Framework (RDF) (Coyle, 2017).
The Reasons for Bibliographic Data Change
To understand why there is a rush to change, one must first understand the problems with MARC. MARC was created for computerized printing presses that would print catalog cards in the same format as the cards previously produced by typesetting. It was not until the 1980s, after libraries had struggled financially and with the burden of filing the catalog cards produced from MARC, that libraries created the first truly computerized catalogs based on MARC. Moreover, the way these catalogs were built caused most of their significant problems. At the time, the relational database was the most prevalent database management system (Coyle, 2017). In a relational database, data is divided into tables, and users rely on the relationships between the tables to bring data together in a useful way (Harkins, 2003). MARC, however, does not use entities and relations; therefore, MARC data could not be divided into related tables as a conventional relational design would require. Instead, each bibliographic record was stored as a single whole unit and accessed through keyword indexes. Keyword searching of bibliographic data was a significant improvement, and library patrons were pleased. Library catalogers, however, were concerned, because the advantage of keyword searching came with a cost: the loss of context and precision in a search (Coyle, 2017). MARC also had other issues, such as:
• Complex data elements. As mentioned by Tennant (2013), MARC 245 $a is not a title but an “AACR2-defined access point that may contain a concise bibliographic description” which is used to produce another one-to-many mapping.
• Redundancy. Data in MARC is repeated across fields (Tennant, 2013).
• Extra formatting requirements. An example of this is the set of ISBD rules for titles. ISBD capitalization rules require the first word of a title to be capitalized; however, this formatting must be removed when a MARC record is translated to a non-MARC standard because the format is not commonly used outside the library (Tennant, 2013).
• Useless punctuation. Some punctuation marks, like the commas used to indicate and separate the inverted parts of a name, are necessary and some are not. As pointed out by Tennant (2013), the comma in the 500 field is used only as another character in a stream of text.
• Ambiguity. Tennant (2013) explained that “The MARC 300 field has an ‘extent’ sense when it appears in a record that describes a sound recording. However, it has a ‘page count’ sense in a record that describes a printed book. The Crosswalk has to make an unreliable check for data in a free-text field to disambiguate the two senses. When distinctions exist in 5xx fields, they may not be recoverable at all.”
• Hidden assumptions. In MARC, a bibliographic record that describes a musical score will include the material type and the contributor’s role in a subfield; however, a bibliographic record for a printed book does not include the role of the author or the physical format of the work (Tennant, 2013).
• MARC is too extensive. MARC standards include a vast number of data elements, but most of the fields and subfields are rarely used. Among the commonly used fields are the 5xx note fields, whose data needs to be coded more explicitly (Tennant, 2013).
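The ambiguity problem in the list above can be illustrated with a small sketch. The record layout (plain dictionaries), the `type` key, and the function name `interpret_300` are hypothetical and chosen only for illustration; they are not part of any MARC library. The regular expression stands in for the kind of unreliable free-text check that Tennant (2013) describes.

```python
import re

# Hypothetical, simplified records: each maps a MARC tag to its free-text value.
book = {"type": "book", "300": "xii, 245 p. ; 24 cm"}
recording = {"type": "sound recording", "300": "1 sound disc (47 min.)"}
untyped = {"300": "3 v."}


def interpret_300(record):
    """Guess the sense of the 300 field; the answer depends on context outside the field."""
    text = record.get("300", "")
    kind = record.get("type")
    if kind == "book":
        return "page count"
    if kind == "sound recording":
        return "extent"
    # No reliable context is available: fall back to pattern matching on free text.
    if re.search(r"\b\d+\s*p\.", text):
        return "page count"
    return "unknown"


for rec in (book, recording, untyped):
    print(rec["300"], "->", interpret_300(rec))
```

The same tag yields a different meaning depending on information stored elsewhere in the record, and when that information is missing the program can only guess from the text.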
The 1990s and 2000s brought about changes in data sharing that resulted from an emphasis on using the World Wide Web, HTML formats, and later eXtensible Markup Language (XML). Libraries did begin to allow access to data over the Internet via web browsers, and they also developed a new version of MARC records that incorporated XML. However, the new version was nothing more than a serialization of the previous MARC records. Libraries failed to take advantage of the opportunity to improve the data model or format of MARC when moving to the XML version. Furthermore, by the time this change was made, the next generation of data models, the Resource Description Framework (RDF), was already being developed. The RDF model bridges gaps in information by providing an open, linked information environment across the Internet (Coyle, 2017). Taken together, these points are strong evidence that MARC is no longer the best fit for bibliographic data (Tennant, 2013).
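To make the point about serialization concrete, the following minimal sketch wraps a simplified MARC record in MARCXML-style elements using Python's standard library. The sample field values and the function name `to_marcxml` are hypothetical, and the leader and control fields of a real record are omitted for brevity. The element and attribute names (`record`, `datafield`, `subfield`, `tag`, `ind1`, `ind2`, `code`) follow the MARCXML schema.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = "http://www.loc.gov/MARC21/slim"

# Hypothetical, simplified MARC record: (tag, indicator 1, indicator 2, subfields).
fields = [
    ("245", "1", "0", [("a", "Example title :"), ("b", "an example subtitle /"), ("c", "A. Author.")]),
    ("300", " ", " ", [("a", "xii, 245 p. ;"), ("c", "24 cm")]),
]


def to_marcxml(fields):
    """Wrap each MARC field in XML elements without changing the data model."""
    record = ET.Element("record", {"xmlns": MARCXML_NS})
    for tag, ind1, ind2, subfields in fields:
        datafield = ET.SubElement(record, "datafield", {"tag": tag, "ind1": ind1, "ind2": ind2})
        for code, value in subfields:
            subfield = ET.SubElement(datafield, "subfield", {"code": code})
            subfield.text = value
    return ET.tostring(record, encoding="unicode")


xml_text = to_marcxml(fields)
print(xml_text)
```

The numeric tags, indicators, subfield codes, and even the ISBD punctuation pass through unchanged, so the XML form inherits every problem of the original record structure.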
BIBFRAME: The Replacement for MARC
Currently, the Library of Congress is building a new foundation for the future of bibliographic data. It is in the process of creating BIBFRAME (Bibliographic Framework), a new format that will replace MARC 21. In this initiative, the Library of Congress has several areas of focus for BIBFRAME, one of which is to preserve the vast amount of data stored in MARC records. BIBFRAME will be the foundation for sharing bibliographic descriptions on the web and in the networked world; therefore, it will integrate with and engage in the wider information community while still serving the needs of the library. To support this goal, the Library of Congress must focus on making BIBFRAME a cost-effective data exchange format that supports resource sharing. The Library of Congress will also try to avoid the pitfalls of MARC by differentiating clearly between conceptual content and its physical or digital manifestations, by unambiguously identifying information entities, and by leveraging and exposing relationships between and among entities. Although BIBFRAME is a new way to present the bibliographic data currently provided in MARC format, the Library of Congress is taking extra precautions to make sure that BIBFRAME is free of the issues that plagued MARC by investigating all aspects of bibliographic description, data creation, and data exchange. In addition, the Library of Congress is taking preventive measures by accommodating different content models and cataloging rules, exploring new methods of data entry, and evaluating current exchange protocols (Library of Congress, 2018).
BIBFRAME current implementation issues.
Although the Library of Congress has many important goals for BIBFRAME, pursuing these goals has raised a number of issues. According to Smith-Yoshimura (2014), most of the time spent on the BIBFRAME initiative goes to evaluating data and identifying problems or errors in converting MARC records to BIBFRAME using either the BIBFRAME Comparison Service or the Transformation Service. In addition, some BIBFRAME data is starting to be created from scratch using the BIBFRAME Editor. This has led to concerns about the time and staffing needed to create and test BIBFRAME. To address these concerns, some staff members were enrolled either in the Library Juice Academy series, which focuses on using XML and RDF, or in Zepheira’s Linked Data and BIBFRAME Practical Practitioner Training course. The creation of a BIBFRAME task force, whose primary focus is how to handle the conversion of music materials from MARC to BIBFRAME, also helps with the time issue.
One of the most difficult goals of BIBFRAME is to preserve the data stored in the MARC format. BIBFRAME may not handle all MARC fields and subfields. Nevertheless, this is not much of a concern because studies have already shown that many of the MARC fields and subfields are rarely used (Smith-Yoshimura, 2014; Tennant, 2014). The chief concern is how to implement Resource Description and Access (RDA) with BIBFRAME. RDA is thought to contain too many strings for linked data, and as a result, BIBFRAME metadata managers are considering using various other identifiers such as id.loc.gov, FAST, ISNI, ORCID, VIAF, and OCLC WorkIDs (Smith-Yoshimura, 2014).
Among the many issues plaguing BIBFRAME is vocabulary control. There have been questions and some confusion as to how authority data, that is, controlled uniform vocabularies that allow for shared access to data, fit into BIBFRAME (Tillet, 2003). A BIBFRAME Authority will be a resource representing a person, family, organization, jurisdiction, meeting, place, topic, or temporal expression related to a BIBFRAME work, instance, or annotation (Library of Congress, 2014). The Library of Congress explained that there are some benefits to vocabulary reuse but cautioned that it is not easy to design a system that includes multiple vocabularies and reaches many different stakeholders. It went on to explain that names and vocabularies drift over time and that it must think ahead to an infrastructure that can support the next 40 years of libraries (Library of Congress, 2018).
Another source of confusion among programmers and testers is how to differentiate FRBR works from BIBFRAME works. FRBR defines four entities: a work is an abstract entity defined as a distinct intellectual or artistic creation; an expression is the intellectual or artistic realization of a work through alpha-numeric notation, musical notation, choreographic notation, sound, image, object, movement, or any combination of these; a manifestation is the physical embodiment of an expression of a work; and an item is a single exemplar of a manifestation (Tillet, 2003). BIBFRAME organizes information into three levels: work, instance, and item. A BIBFRAME work is the conceptual essence of something, and examples of information recorded at the work level include authors, languages, subjects, agents, and events. BIBFRAME defines an instance as a reflection of the material embodiment of a work, and examples of instance-level information are publisher, place and date of publication, and format. An item is an actual physical or electronic copy of an instance and reflects information such as physical or virtual location, shelf mark, and barcode (Library of Congress, 2016; Library of Congress, 2018). Although FRBR and BIBFRAME entities are defined similarly, the two models are structured differently, as shown in the FRBR illustration provided by Tillet (2003) and the BIBFRAME illustration provided by the Library of Congress (2016).
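The three BIBFRAME levels can also be sketched as linked data. The following minimal example expresses a work, an instance, and an item as subject-predicate-object triples using only Python's standard library. The class and property names (`Work`, `Instance`, `Item`, `instanceOf`, `itemOf`) follow the BIBFRAME 2.0 vocabulary; the `example.org` URIs and the helper functions `to_ntriples` and `items_of_work` are hypothetical and exist only for illustration.

```python
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical resource URIs under example.org.
work = "http://example.org/work/1"
instance = "http://example.org/instance/1"
item = "http://example.org/item/1"

# Each triple is (subject, predicate, object); every value here is a URI.
triples = [
    (work, RDF_TYPE, BF + "Work"),
    (instance, RDF_TYPE, BF + "Instance"),
    (item, RDF_TYPE, BF + "Item"),
    (instance, BF + "instanceOf", work),
    (item, BF + "itemOf", instance),
]


def to_ntriples(triples):
    """Serialize URI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def items_of_work(work_uri, triples):
    """Follow itemOf and instanceOf links to find every item that embodies a work."""
    instances = {s for s, p, o in triples if p == BF + "instanceOf" and o == work_uri}
    return sorted(s for s, p, o in triples if p == BF + "itemOf" and o in instances)


print(to_ntriples(triples))
print(items_of_work(work, triples))
```

Because the levels are connected by explicit, identified relationships rather than by position inside a single record, a program can move from a work to its physical copies without parsing free text.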
There is a great deal of optimism about the future of cataloging. However, that future will not be achieved easily. There are several obstacles the BIBFRAME initiative must overcome to achieve its goal of evolving bibliographic description standards into linked data in order to make bibliographic information more useful both within and outside the library community. If the BIBFRAME initiative overcomes these obstacles, it would mean an end to most of the problems that have plagued library cataloging staff and the library community as a whole. Furthermore, it would mean the production of the first data model to bring real change to cataloging rather than a tweaked carryover of old data into new technology.
Coyle, K. (2017). Creating the Catalog, Before and After FRBR. Retrieved November 16, 2018, from http://kcoyle.net/mexico.html
Harkins, S. (2003). Relational databases: Defining relationships between database tables. Retrieved November 16, 2018, from https://www.techrepublic.com/article/relational-databases-defining-relationships-between-database-tables/
Library of Congress. (2014). BIBFRAME Authorities Draft Specification. Retrieved November 16, 2018, from https://www.loc.gov/bibframe/docs/bibframe-authorities.html
Library of Congress. (2016). Overview of the BIBFRAME 2.0 Model. Retrieved November 16, 2018, from http://www.loc.gov/bibframe/docs/bibframe2-model.html
Library of Congress. (2018). BIBFRAME Frequently Asked Questions. Retrieved November 16, 2018, from https://www.loc.gov/bibframe/faqs/
Smith-Yoshimura, K. (2014). BIBFRAME Testing and Implementation. Retrieved November 16, 2018, from http://hangingtogether.org/?p=4487
Tennant, R. (2013). What is the problem of MARC | SILAS – National Library Board. Retrieved November 16, 2018, from http://www.nlb.gov.sg/silas/what-is-the-problem-of-marc/
Tennant, R. (2014). The Variation and the Damage Done. Retrieved November 16, 2018, from http://hangingtogether.org/?p=4494
Tillet, B. (2003). The FRBR Model. Retrieved November 16, 2018, from https://www.loc.gov/catdir/cpso/frbreng.pdf