Abstracts

KEYNOTE ADDRESS: Classifications, links and contexts
Michael K. Buckland, Professor Emeritus at School of Information, University of California, Berkeley (USA)

Links commonly refer to models developed for the World Wide Web Consortium, but these are a special case within the wider field of links and references used in resource discovery, including subject indexes to classifications, relationships used in vocabulary control, and search term recommender services. There is a tension between standardised relationships (symbolized by Paul Otlet’s modernist universalism and the Semantic Web) and the particular, subjective situations in which individuals try to make sense (symbolized by Ludwik Fleck’s emphasis on the influence of local cultural contexts). A subject index to a classification is a collection of links, sometimes qualified by context. Different domains (specialties) have their own cultural contexts and benefit from differently tailored links even when searching within the same resources. Making links is a descriptive, language activity. Probabilistic methods can create links from familiar to unfamiliar vocabularies economically. Links commonly use a limited set of relationships, mainly equivalence, inclusion, and inheritance. A far wider range of relationships would help resource discovery. Extending resource discovery requires not only same-facet links to reach additional resources but also links across different facets to provide explanatory context.

Complementarity of perspectives for resource descriptions
Barbara B. Tillett, Washington, DC (USA)

Bibliographic data is used to describe resources held in the collections of libraries, archives and museums. That data is mostly available on the Web today and mostly as linked data. Also on the Web are the controlled vocabulary systems of name authority files, like the Virtual International Authority File (VIAF), classification systems, and subject terms. These systems offer their own linked data to potentially help users find the information they want - whether at their local library or anywhere in the world that is willing to make their resources available. We have found it beneficial to merge authority data for names on a global level, as the entities are relatively clear. That is not true for subject concepts and terminology that have categorisation systems developed according to varying principles and schemes and are in multiple languages. Rather than requiring everyone in the world to use the same categorisation/classification system in the same language, we know that the Web offers us the opportunity to add descriptors assigned around the world using multiple systems from multiple perspectives to identify our resources. Those descriptors add value to refine searches, help users worldwide and share globally what each library does locally.

Libraries, classifications and the network: bridging past and future
Maria Inês Cordeiro, National Library of Portugal

The history of controlled vocabularies in libraries has long carried the promise of improved retrieval systems based on common goals and shared efforts. This is especially true of subject vocabularies whose data have been managed at two levels. Firstly, in management systems of their own, from which vocabulary products are derived, as in the case of the most commonly used subject heading languages, thesauri and classification schemes. Secondly, in the so-called library authority files, which may apply such vocabularies or be used to develop local ones, to control subject access to a given library or bibliographic collection. This two-levelled model of building and sharing subject vocabularies’ data is about to change. For many decades the concept of ‘intellectual sharing’ has been prevalent in such activities, underpinning re-use by adoption and/or collaboration in the management of shared subject systems. Although the Internet has enabled the exposure of centrally managed subject systems and library authority files, it has done little to alleviate the significant efforts needed for a full deployment of network-shared vocabularies. In particular, the use of classification systems has not advanced much. While in theory their potential for subject interoperability across domains and languages is recognized, classifications remain poorly used in practice. They still lack intelligent means to effectively connect and communicate data throughout the network, in a synchronic and non-redundant way. More recently, the development of web technologies has raised the concepts of connection, communication and sharing of data to a new level, through linked data. What changes will this bring to the architecture of library subject authority data? Will classification systems live to see the age of their full and easy deployment?

Linking library data: contributions and role of subject data
Nuno Freire, The European Library (Netherlands)

Linking and sharing data across organizations has been practiced by libraries for many decades, and can be observed in some of the most common data resources from libraries, such as union catalogues, authority files, and controlled vocabularies. In the new global data space, the benefits of library linked data (LLD) have been widely recognized within the library community. LLD practice emerged as a new approach to data sharing within and beyond the library environment, transforming the old models of distribution and reuse of subject library data. There are several parallel on-going activities towards establishing standards and best practices for the creation and publishing of LLD to meet the growing prospects of a more data-oriented global information space. In this context, the opportunities for libraries are twofold: not only do LLD initiatives bring librarians' data management expertise into the limelight, they also extend the value and reach of library data resources which become widely and easily reusable across domains. Libraries are already contributing open linked datasets which are being re-used by many different communities and for a variety of purposes. Some success cases already exist, such as library controlled vocabularies becoming linkable major reference sources for certain types of entities, or fundamental data service infrastructures based on, or derived from, library authority files. An additional field for LLD use is emerging within the new research data e-infrastructures, which provides new opportunities for the application of library classification and subject authority control expertise and resources in research data management.

Application of FRBR and FRSAD to classification systems
Maja Žumer, University of Ljubljana (Slovenia)
Marcia Lei Zeng, Kent State University (USA)

The Functional Requirements for Subject Authority Data (FRSAD) conceptual model defines entities, attributes and relationships as they relate to subject authority data. FRSAD includes two main entities, thema (any entity used as the subject of a work) and nomen (any sign or arrangement of signs that a thema is known by, referred to, or addressed as). In a given controlled vocabulary and within a domain, a nomen is the appellation of only one thema. The authors consider the question: can the FRSAD conceptual model be extended beyond controlled vocabularies (its original focus) to model classification data? Models that are developed based on the structures and functions of controlled vocabularies (such as thesauri and subject heading systems) often need to be adjusted or extended to accommodate classification systems that have been developed with different focused functions, structures and fundamental theories. The Dewey Decimal Classification (DDC) system and Universal Decimal Classification (UDC) are used as a case study to test applicability of the FRSAD model for classification data and the applicability of the Functional Requirements for Bibliographic Records (FRBR) for modelling versions, such as different adaptations and different language editions.
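The thema/nomen distinction described in this abstract can be illustrated with a minimal Python sketch. It is not the FRSAD model itself: the class names `Nomen`, `Thema` and `SubjectAuthority`, the scheme labels and the identifiers are hypothetical, and the sketch collapses "controlled vocabulary and domain" into a single `scheme` field for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Nomen:
    """A sign or arrangement of signs by which a thema is known."""
    value: str   # e.g. a caption, a heading or a notation
    scheme: str  # hypothetical label for the vocabulary (and domain) it belongs to


@dataclass
class Thema:
    """Any entity used as the subject of a work."""
    identifier: str  # hypothetical local identifier
    nomens: set = field(default_factory=set)


class SubjectAuthority:
    """Enforces the rule that, within one scheme, a nomen names only one thema."""

    def __init__(self):
        self._by_nomen = {}  # maps Nomen -> Thema

    def add_appellation(self, thema, nomen):
        owner = self._by_nomen.get(nomen)
        if owner is not None and owner is not thema:
            raise ValueError(
                f"{nomen.value!r} already names {owner.identifier} in {nomen.scheme}"
            )
        self._by_nomen[nomen] = thema
        thema.nomens.add(nomen)

    def lookup(self, value, scheme):
        """Return the thema named by this nomen in this scheme, or None."""
        return self._by_nomen.get(Nomen(value, scheme))


# Illustrative data only: one thema with appellations in two hypothetical schemes.
authority = SubjectAuthority()
architecture = Thema("thema-1")
authority.add_appellation(architecture, Nomen("Architecture", "captions-en"))
authority.add_appellation(architecture, Nomen("Architektur", "captions-de"))
```

A thema may carry many nomens, but an attempt to attach an existing nomen of the same scheme to a second thema is rejected. Modelling classification data would presumably need more than this, for example notations built by synthesis, which the sketch does not attempt.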

Relational aspects of subject authority control: the contributions of classificatory structure
Rebecca Green, OCLC (USA)

The structure of a classification system contributes in a variety of ways to representing semantic relationships between its topics in the context of subject authority control. We explore this claim using the Dewey Decimal Classification (DDC) system as a case study. The DDC links its classes into a notational hierarchy, supplemented by a network of relationships between topics, expressed in class descriptions and in the Relative Index (RI). Topics/subjects are expressed both by the natural language text of the caption and notes (including Manual notes) in a class description and by the controlled vocabulary of the RI’s alphabetic index, which shows where topics are treated in the classificatory structure. The expression of relationships between topics depends on paradigmatic and syntagmatic relationships between natural language terms in captions, notes, and RI terms; on the meaning of specific note types; and on references recorded between RI terms. The specific means used in the DDC for capturing hierarchical (including disciplinary), equivalence and associative relationships are surveyed.

Distributed person data: using Semantic Web compliant data in subject name headings
Violeta Ilik, Northwestern University, Chicago (USA)

Providing efficient access to information is a crucial library mission. Subject classification is one of the major pillars that guarantee the accessibility of records in libraries. In this paper we discuss the need to associate person IDs and URIs with subjects when a named person happens to be the subject of the document. This is often the case with biographies, schools of thought in philosophy, politics, art, and literary criticism. Using Semantic Web compliant data in subject name headings enhances the ability to collocate topics about a person. Also, in retrieval, books about a person would be easily linked to works by that same person. In the context of the Semantic Web, it is expected that, as the available information grows, one would be more effective in the task of information retrieval. Information about a person or, as in the case of this paper, about a researcher exists in various databases, which can be discipline-specific or publishers’ databases, and in such cases the person has an assigned identifier. Such information also exists in institutional directory databases. We argue that these various databases can be leveraged to support improved discoverability and retrieval of research output for individual authors and institutions, as well as works about those authors.

Organization authority database design with classification principles
Dagobert Soergel, University of Buffalo (USA)
Denisa Popescu, World Bank Group, Washington, DC (USA)

We illustrate the principle of unified treatment of all authority data for any kind of entity (subjects/topics, places, events, persons, organizations, etc.) through the design and implementation of an enriched authority database for organizations. This database is maintained as an integral part of an authority database that also includes subject authority control / classification data, using the same structures for data and common modules for processing and display of data. Organization-related data are stored in information systems of many companies. We specifically examine the case of the World Bank Group (WBG) according to organization role: suppliers, partners, customers, competitors, authors, publishers, or subjects of documents, loan recipients, suppliers for WBG-funded projects and subunits of the organization itself. A central organization authority, where each organization is identified by a URI, represented by several names and linked to other organizations through hierarchical and other relationships, serves to link data from these disparate information systems. Designing the conceptual structure of a unified authority database requires integrating SKOS, the W3C Organization Ontology and other schemes into one comprehensive ontology. To populate the authority database with organizations, we import data from external sources (e.g., DBpedia and Library of Congress authorities) and internal sources (e.g., the lists of organizations from multiple WBG information systems).

Machine-learning methods for classification and content authority control in mathematics
Ulf Schöneberg, FIZ Karlsruhe, zbMATH (Germany)
Wolfram Sperber, FIZ Karlsruhe, zbMATH (Germany)

The abstracting and reviewing service zbMATH (zbMATH, 1931- ) is the most comprehensive bibliographic database of mathematical literature. The database uses reviews, keywords and classification for content analysis of mathematical publications. Controlled vocabularies and classification schemes are important for a uniform and standardised analysis of the content and precise information retrieval. Over the last few years, the zbMATH team has started developing machine-based concepts and tools to create controlled vocabularies and to improve the Mathematics Subject Classification (MSC) scheme. Concepts of natural language processing and other machine learning methods, especially neural networks, were adapted to the specific requirements of mathematical information, e.g., named mathematical entities and mathematical formulas. The tools are used for key phrase extraction and classification of mathematical publications. Based on the extracted key phrases, a prototype for a controlled vocabulary for mathematics was created. The tools and the state of the art are described briefly. These activities will help - in cooperation with authority control for authors, series and institutions - to automate the zbMATH workflow and improve the usefulness and information retrieval capabilities of the database.

Subject authority control supported by classification: the case of National Library of the Czech Republic
Marie Balíková, Czech National Library (Czech Republic)

From the very beginnings of library automation, subject authority control has been considered an important bibliographic tool in the Czech National Library (CNL). Effective subject access cannot exist without standardised access points. Subject authorities are considered an indispensable reference tool in supporting the selection of subject access points and normalizing content indexing. Most importantly, they are heavily relied upon when it comes to customisation of links between bibliographic records and subject access points in order to create a user-friendly subject browsing and searching environment. Because the Universal Decimal Classification (UDC) is widely used in Czech libraries, it has become a readily available, language-independent subject framework which can be complemented by a more user-friendly subject heading system. In this context, subject authority control offers a means of enhancing subject headings' access points with the terminology and semantic links available in UDC. Furthermore, classification is used to enrich relationships between authority records themselves. The author will discuss in more detail the different aspects and advantages of subject authorities in which a classification and a subject heading system complement one another, and the way this is implemented in the CNL.

Multilingual subject access and classification-based browsing through authority control: the experience of the ETH-Bibliothek, Zürich
Jiri Pika, UDC Editorial Team, UDC Consortium (Switzerland)
Milena Pika-Biolzi, ETH-Bibliothek (Switzerland)

The paper illustrates the benefits of subject authority control in improving multilingual subject access in NEBIS - Netzwerk von Bibliotheken und Informationsstellen in der Schweiz. This example of good practice focuses on some important aspects of classification and indexing. NEBIS subject authorities comprise a classification scheme and a multilingual subject descriptor system. A bibliographic system supported by subject authority control empowers libraries as it enables them to expand and adjust vocabulary and link subjects to suit their specific audience. Most importantly, it allows the management of different subject vocabularies in numerous languages. In addition, such an enriched subject index creates a re-usable and shareable source of subject statements that has value in the wider context of information exchange. The illustrations and supporting arguments are based on indexing practice, subject authority control and use of classification in ETH-Bibliothek, which is the largest library within the NEBIS network.

Development of a classification-oriented authority control: the experience of the National and University Library in Zagreb
Ana Vukadin, National and University Library in Zagreb (Croatia)

The paper presents experiences and challenges encountered during the planning and creation of the Universal Decimal Classification (UDC) authority database in the National and University Library in Zagreb, Croatia. The project started in 2014 with the objective of facilitating classification data management, improving the indexing consistency at the institutional level and the machine readability of data for eventual sharing and re-use in the Web environment. The paper discusses the advantages and disadvantages of UDC, which is an analytico-synthetic classification scheme tending towards a more faceted structure, in regard to various aspects of authority control. This discussion represents the referential framework for the project. It determines the choice of elements to be included in the authority file, e.g. distinguishing between syntagmatic and paradigmatic combinations of subjects. It also determines the future lines of development, e.g. interlinking with the subject headings authority file in order to provide searching by verbal expressions.

TinREAD – an integrative solution for subject authority control
Victoria Francu, "Carol I" Central University Library of Bucharest (Romania)
Liviu-Iulian Dediu, IME Romania Ltd. (Romania)

The paper introduces TinREAD (The Information Navigator for Readers), an integrated library system produced by IME Romania. The main feature of interest is the way TinREAD can handle a classification-based thesaurus in which verbal index terms are mapped to classification notations. It supports subject authority control by interlinking the authority files (subject headings and UDC system). Authority files are used for indexing consistency. Although it is said that intellectual indexing is, unlike automated indexing, both subjective and inconsistent, TinREAD uses intellectual indexing as input (the UDC notations assigned to documents) for the automated indexing resulting from the implementation of a thesaurus structure based on UDC. Each UDC notation is represented by a UNIMARC subject heading record as authority data. One classification notation can be used to search simultaneously in more than one corresponding thesaurus. This way natural language terms are used in indexing and, at the same time, the link with the corresponding classification notation is kept. Additionally, the system can also manage multilingual data for the authority files. These features, together with other characteristics of TinREAD, are discussed and illustrated at length in the paper. Problems encountered and possible solutions to tackle them are shown.

Alignment in medical sciences: towards improvement of UDC
Olívia Pestana, University of Porto (Portugal)

A classification scheme represents a powerful indexing and retrieval tool. Obsolete terminology and misalignment between widely used systems are key impediments to better use of classification. This paper looks into the issues caused by delay in the revision of the UDC class of medical sciences and possible solutions. Following a short description of the Universal Decimal Classification (UDC) and of the National Library of Medicine (NLM) Classification, the author analyses the notations and captions included in class 61 of the UDC Summary. All the classes, subclasses and special auxiliary subdivisions are covered in order to find compatible notations between the two schemes, as well as out-of-date vocabulary and out-of-date subdivisions of UDC. As a result of this study and in light of the most recent developments in medical sciences, one subdivision is questioned and several vocabulary expressions included in the caption fields are proposed to be changed or updated.

Commerce, see also Rhetoric: cross-discipline relationships as authority data for enhanced retrieval
Claudio Gnoli, University of Pavia (Italy)
Rodrigo De Santis, Paraná Federal Institute of Education Science & Technol., Irati (Brazil)
Laura Pusterla, University of Pavia (Italy)

Subjects in a classification scheme are often related to other subjects belonging to different hierarchies. This problem was already identified by Hugh of Saint Victor (1096?-1141). Even with present-day bibliographic classifications, a user browsing the class of architecture under the hierarchy of arts may miss relevant items classified in building or in civil engineering under the hierarchy of applied sciences. To address these limitations we have developed SciGator, a browsable interface to explore the collections of all scientific libraries at the University of Pavia. Besides showing subclasses of a given class, the interface points users to related classes in the Dewey Decimal Classification, or in other local schemes, and allows for expanded queries that include them. This is made possible by using a special field for related classes in the database structure which models classification authority data. Ontologically, many relationships between classes in different hierarchies are cases of existential dependence. Dependence can occur between disciplines in such disciplinary classifications as Dewey (e.g. architecture existentially depends on building), or between phenomena in such phenomenon-based classifications as the Integrative Levels Classification (e.g. fishing as a human activity existentially depends on fish as a class of organisms). We provide an example of the representation of existential dependence in OWL and discuss some details of it.
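The idea of a related-classes field that supports expanded queries can be sketched in Python as follows. This is only an illustration under stated assumptions, not SciGator's implementation: the `ClassRecord` structure, the `expand_query` function and the notations `A1`, `T1`, `T2` are all hypothetical, and the related-class links are followed one step only.

```python
from dataclasses import dataclass, field


@dataclass
class ClassRecord:
    """A simplified classification authority record (hypothetical structure)."""
    notation: str
    caption: str
    subclasses: list = field(default_factory=list)  # notations of narrower classes
    related: list = field(default_factory=list)     # notations in other hierarchies


def expand_query(notation, records, include_related=True):
    """Return the notations to search: the class, its subclasses and,
    optionally, its related classes together with their subclasses."""
    seen = []

    def visit(current, follow_related):
        if current in seen or current not in records:
            return
        seen.append(current)
        record = records[current]
        for child in record.subclasses:
            visit(child, False)
        if follow_related:
            # Related links are followed from the starting class only,
            # so the expansion does not drift across many hierarchies.
            for other in record.related:
                visit(other, False)

    visit(notation, include_related)
    return seen


# Illustrative records with made-up notations; the captions echo the abstract.
RECORDS = {
    "A1": ClassRecord("A1", "Architecture", subclasses=["A1.1"], related=["T1", "T2"]),
    "A1.1": ClassRecord("A1.1", "Architectural design"),
    "T1": ClassRecord("T1", "Building", subclasses=["T1.1"]),
    "T1.1": ClassRecord("T1.1", "Building materials"),
    "T2": ClassRecord("T2", "Civil engineering"),
}
```

With these records, `expand_query("A1", RECORDS)` returns the architecture class, its subclass, and the building and civil engineering classes with their subclasses, while `include_related=False` restricts the search to the arts hierarchy.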

Managing classification in libraries – a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues
Koraljka Golub, Linnaeus University (Sweden)
Joacim Hansson, Linnaeus University (Sweden)
Dagobert Soergel, University of Buffalo (USA)
Douglas Tudhope, University of South Wales (UK)

Subject terms play a crucial role in resource discovery but require substantial effort to produce. Automatic subject classification and indexing address problems of scale and sustainability and can be used to enrich existing bibliographic records, establish more connections across and between resources and enhance consistency of bibliographic data. The paper aims to put forward a complex methodological framework to evaluate automatic classification tools for Swedish textual documents based on the Dewey Decimal Classification (DDC), recently introduced to Swedish libraries. Three major complementary approaches are suggested: a quality-built gold standard, retrieval effects, and domain analysis. The gold standard is built on input from at least two catalogue librarians, end users who are experts in the subject, end users inexperienced in the subject, and automated tools. Retrieval effects are studied through a combination of assigned and free tasks, including factual and comprehensive types. The study also takes into consideration the different role and character of subject terms in various knowledge domains, such as scientific disciplines. As a theoretical framework, domain analysis is used and applied in relation to the implementation of DDC in Swedish libraries and chosen domains of knowledge within the DDC itself.

Automatic interpretation of complex UDC numbers: towards support for library systems
Attila Piros, University of Debrecen (Hungary)

Analytico-synthetic and faceted classifications, such as the Universal Decimal Classification (UDC), express the content of documents with complex, pre-combined classification codes. Without classification authority control that would help manage and access structured notations, the use of UDC codes in searching and browsing is limited. Existing UDC parsing solutions are usually created for a particular database system or a specific task and are not widely applicable. The approach described in this paper provides a solution by which the analysis and interpretation of UDC notations is stored in an intermediate format (in this case, XML) by automatic means without any data or information loss. Due to its richness, the output file can be converted into different formats, such as standard mark-up and data exchange formats or simple lists of the recommended entry points of a UDC number. The program can also be used to create authority records containing complex UDC numbers which can be comprehensively analysed in order to be retrieved effectively. The Java program, as well as the corresponding schema definition it employs, is under continuous development. The current version of the interpreter software is now available online for testing purposes at the following web site: http://interpreter-eto.rhcloud.com. The future plan is to implement conversion methods for standard formats and to create standard online interfaces in order to make it possible to use the features of the software as a service. This would allow the algorithm to be employed in both existing and future library systems to analyse UDC numbers without any significant programming effort.
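To give a rough sense of what storing an interpreted notation in an intermediate XML format might look like, here is a deliberately simplified Python sketch. It is not the Java interpreter described above and does not follow its schema: the element names `udc`, `added` and `related` are hypothetical, and only two connecting symbols are handled, `+` (addition) and `:` (simple relation). Real UDC syntax is far richer (for example `/`, `::`, square brackets and the common auxiliaries), and none of that is covered here.

```python
import xml.etree.ElementTree as ET


def udc_to_xml(notation):
    """Split a UDC-like string on '+' and then ':' and record the parts as XML.

    The original string is kept in the 'source' attribute so that nothing
    is lost even though the analysis itself is only partial.
    """
    root = ET.Element("udc", {"source": notation})
    for group in notation.split("+"):
        group_element = ET.SubElement(root, "added")
        for part in group.split(":"):
            ET.SubElement(group_element, "related").text = part.strip()
    return root


def entry_points(root):
    """Return a simple list of the component numbers found by udc_to_xml."""
    return [element.text for element in root.iter("related")]


# Illustrative input string only.
tree = udc_to_xml("72:624+94")
xml_text = ET.tostring(tree, encoding="unicode")
```

Keeping the parsed structure separate from any one output format is what allows later conversion to other formats or to a plain list of entry points, as `entry_points(tree)` does here.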

Knowledge maps for libraries and archives - uses and use cases
Andrea Scharnhorst, eHumanities DANS/KNAW (Netherlands)
Richard P. Smiraglia, University of Wisconsin, Milwaukee (USA)
Christophe Guéret, eHumanities DANS/KNAW (Netherlands)
Alkim Almila Akdag Salah, eHumanities DANS/KNAW (Netherlands)

At the last Digital Library Conference in London two workshops took place - both (in parallel) devoted to the use of visualization in presenting and navigating large collections. One was entitled Search Is Over! and the other Knowledge Maps and Information Retrieval. This anecdotal evidence stands for the growing and accelerating quest for visually enhanced interfaces to collections. Researchers from information visualization, computer human interaction, information retrieval, bibliometrics, digital humanities, art and network theory are engaged in this quest in parallel, often in ignorance of each other, sometimes in interdisciplinary alliances. This paper reviews the current state-of-the-art, with special emphasis on the work of the COST Action TD1210 Knowescape. We discuss in more depth two examples of the use of visual analytics to create a fingerprint of an archive or a library: a data archive and a national library. We present examples ranging from the micro-level of monitoring activities of users, through the meso-level of visualizing features of bibliographic records, to macroscopes (a term coined by Katy Börner) into libraries and archives. We also discuss how different ways to perform visual analytics inform each other, how they are related to questions of data mining and statistical analysis, and which methods need to be combined or which communities need to collaborate. To illustrate some of these points we analysed Universal Decimal Classification (UDC) codes in bibliographic datasets of the National Library of Portugal. This potential has yet to be fully exploited in improving interfaces to subject access and management of classification data. It should be noted that UDC notation strings stored in bibliographic databases require specialist knowledge in both UDC and programming for any visualization tools to be applied.
This UDC Seminar, which is devoted to authority control, is an opportunity to draw attention to the possibilities of visualization, whose wider application depends on readily structured, richer and more transparent subject metadata.

A second life for authority records
Shenghui Wang, OCLC (Netherlands)
Rob Koopman, OCLC (Netherlands)

Authority control is a standard practice in the library community that provides consistent, unique, and unambiguous reference to entities such as persons, places, concepts, etc. The ideal way of referring to authority records through unique identifiers is in line with current linked data principles. When presenting a bibliographic record, the linked authority records are expanded with the authoritative information. This way, any update in the authority records will not affect the indexing of the bibliographic records. The structural information in the authority files can also be leveraged to expand the user’s query to retrieve bibliographic records associated with all the variations, narrower terms or related terms. However, in many digital libraries, especially large-scale aggregations such as WorldCat and Europeana, name strings are often used instead of authority record identifiers. This is partly due to the lack of global authority records that are valid across countries and cultural heritage domains. But even when there are global authority systems, they are not applied at scale. For example, in WorldCat, only 15% of the records have DDC and 3% have UDC codes; less than 40% of the records have one or more topical terms catalogued in the 650 MARC field, many of which are too general (such as "sports" or "literature") to be useful for retrieving bibliographic records. Therefore, when a user query is based on a Dewey code, the results usually have high precision but the recall is much lower than it should be, and a search on a general topical term returns millions of hits without even being complete. All these practices make it difficult to leverage the key benefits of authority files. This is also true for authority files that have been transformed into linked data and enriched with mapping information. There are practical reasons for using name strings instead of identifiers. One is indexing and query response performance.
The future infrastructure design should take performance into account while embracing the benefit of linking instead of copying, without introducing extra complexity to users. Notwithstanding all the restrictions, we argue that large-scale aggregations also bring new opportunities for better exploiting the benefits of authority records. It is possible to use machine learning techniques to automatically link bibliographic records to authority records based on the manual input of cataloguers. Text mining and visualization techniques can offer a contextual view of authority records, which in turn can be used to retrieve missing or mis-catalogued records. In this talk, we will describe such opportunities in more detail.
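The query expansion mentioned in this abstract, and the gap between records linked by identifier and records that carry only name strings, can be illustrated with a small Python sketch. Everything in it is hypothetical (the identifiers `auth:1` and `auth:2`, the field names, the sample records and the `search` function); it is a toy under stated assumptions, not a description of how WorldCat or Europeana work.

```python
# Hypothetical authority file: identifier -> labels and narrower terms.
AUTHORITY = {
    "auth:1": {
        "preferred": "Football",
        "variants": ["Soccer", "Association football"],
        "narrower": ["auth:2"],
    },
    "auth:2": {"preferred": "Football coaching", "variants": [], "narrower": []},
}

# Hypothetical bibliographic records: some link to identifiers, some copy strings.
RECORDS = [
    {"id": "b1", "subject_ids": ["auth:1"], "subject_strings": []},
    {"id": "b2", "subject_ids": [], "subject_strings": ["Soccer"]},
    {"id": "b3", "subject_ids": ["auth:2"], "subject_strings": []},
    {"id": "b4", "subject_ids": [], "subject_strings": ["Sports"]},
]


def labels_of(record):
    """All labels of an authority record, lower-cased for comparison."""
    return {label.casefold() for label in [record["preferred"], *record["variants"]]}


def search(term, authority, records):
    """Expand a query term through the authority file, then match records
    either by authority identifier or by a copied name string."""
    wanted = term.casefold()
    matched = {aid for aid, rec in authority.items() if wanted in labels_of(rec)}

    # Follow narrower-term links transitively.
    stack = list(matched)
    while stack:
        for narrower in authority[stack.pop()]["narrower"]:
            if narrower not in matched:
                matched.add(narrower)
                stack.append(narrower)

    # Labels of all matched authorities also catch records that only hold strings.
    labels = {wanted}
    for aid in matched:
        labels |= labels_of(authority[aid])

    hits = []
    for record in records:
        strings = {s.casefold() for s in record["subject_strings"]}
        if matched & set(record["subject_ids"]) or labels & strings:
            hits.append(record["id"])
    return hits
```

In this toy data a search for "football" reaches the string-only record `b2` solely because the authority file supplies "Soccer" as a variant; without the expansion that record would be missed, which is the recall problem the abstract describes.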

POSTERS

Subject information and multilingualism in European bibliographic datasets: experiences with Universal Decimal Classification
Nuno Freire, The European Library (Netherlands)
Valentine Charles, Europeana Foundation (Netherlands)
Antoine Isaac, Europeana Foundation (Netherlands)

The Europeana Foundation is a non-profit governmental organization that collects, enriches, innovates and promotes cultural heritage data in Europe. Europeana also oversees The European Library’s dataset that has been provided by the National and Research Libraries of Europe. This poster focuses mainly on The European Library’s dataset as it is the richest in terms of library’s classification systems and subject headings within Europeana. The European Library’s dataset is very diverse in the languages it contains and also very rich in classification and subject indexing data. One of the key activities undertaken by The European Library is to support the discoverability of library resources through the cross-language linking of classification and subject indexing data. The Universal Decimal Classification scheme, or UDC, is one of the most widely used classification schemes for all fields of knowledge. UDC is used in libraries, bibliographic, documentation and information services in over 130 countries around the world and is published in over 40 languages. A small subset of UDC has already been published as linked data - the UDC Summary, and the complete UDC scheme will be available shortly. The mechanisms for sharing semantic data nowadays underlie most of the key data distribution channels to users of library data. Examples include the research e-infrastructures and cultural heritage networks in Europe, which construct and maintain their data models under the principles of linked data. Through these mechanisms, the data aggregated by The European Library can reach end-users enriched with multilingual classification and subject indexing information, when linked to the UDC scheme. With this objective in mind, The European Library is currently working on exploring linking possibilities between its aggregated bibliographic dataset, UDC and other ontologies available as linked data (particularly with subject heading systems). 
UDC is the central focus of The European Library’s work because of its great potential for the multilingual enrichment of subject data within the bibliographic dataset. This potential rests on the high level of language independence of UDC codes, the scheme’s availability in over 40 languages and its publication as open data under a license that enables its reuse. This poster presents the data linking strategy and methods used at The European Library in relation to UDC, as well as the experimental results currently available and how they will be distributed across the Europeana network, the research e-infrastructures and publishers. A second experimental use of UDC is also presented, in the application area of data mining subject information. An experiment investigated whether the UDC classification within The European Library’s dataset could help in locating bibliographic resources, from all of Europe and in any language, that meet the very specific information needs of humanities researchers working within a virtual research environment on topics related to the First World War. This work explored co-occurrences of UDC classification codes and subject data within First World War collections, so that the same co-occurrences can later be applied to general collections, such as library catalogues, to locate additional bibliographic resources for researchers. The poster also presents future work at Europeana and The European Library towards a metadata enrichment service based on UDC RDF data, to be made available through the Europeana Cloud and to address the multilingual enrichment of cultural heritage metadata from all domains represented in the Europeana Network.
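The co-occurrence analysis mentioned above can be illustrated with a minimal sketch. It is not the method used at The European Library: the record structure, the function names and the sample values are all assumptions made for illustration, and the UDC codes are simple placeholders rather than data from the actual collections.

```python
from collections import Counter
from itertools import product

# Hypothetical records from a thematic collection (illustrative values only).
records = [
    {"udc": ["94", "355"], "subjects": ["World War, 1914-1918", "Trench warfare"]},
    {"udc": ["94"], "subjects": ["World War, 1914-1918"]},
    {"udc": ["821"], "subjects": ["War poetry", "World War, 1914-1918"]},
]

def cooccurrences(records):
    """Count how often each (UDC code, subject term) pair appears in the same record."""
    counts = Counter()
    for rec in records:
        # Sets avoid double counting when a record repeats a code or a term.
        for code, term in product(set(rec["udc"]), set(rec["subjects"])):
            counts[(code, term)] += 1
    return counts

def codes_for_term(counts, term, min_count=1):
    """Rank the UDC codes that co-occur with a subject term, most frequent first."""
    ranked = [(code, n) for (code, t), n in counts.items()
              if t == term and n >= min_count]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))

counts = cooccurrences(records)
ranking = codes_for_term(counts, "World War, 1914-1918")
```

Under these assumptions, the codes ranked highest for a topic in the thematic collection could then be used as search keys in a general catalogue, whatever the language of its subject headings.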

Enhancing subject authority control at the UK Data Archive: a pilot study using UDC
Suzanne Barbalet, UK Data Archive, University of Essex (United Kingdom)

The UK Data Archive is an internationally acknowledged centre of expertise in data curation and holds the largest collection of digital data in the social sciences and humanities in the United Kingdom. To some degree, the application of subject authority control measures within the Archive has been influenced by the Data Documentation Initiative (DDI). The Archive adopts this standard for describing social science data and contributes to its development. The DDI makes provision not only for the maintenance of a controlled vocabulary of subject categories for classifying the data at data collection level, but also for a controlled vocabulary of keywords to index all topics included in the data at question/variable level. The keyword controlled vocabulary is the Archive’s in-house thesaurus HASSET (Humanities and Social Sciences Electronic Thesaurus). What the DDI does not do is provide guidance on the relationships between these separate controlled vocabularies. Here a faceted classification scheme may well provide the answer. A pilot study of an application of the Universal Decimal Classification (UDC) scheme demonstrated how successfully the scheme might ensure quick, controlled and accurate subject access to data collections for its community of data owners, producers, funders and users.

The BAsel Register of Thesauri, Ontologies & Classifications (BARTOC)
Andreas Ledl, University Library of Basel (Switzerland)

The Basel Register of Thesauri, Ontologies & Classifications (BARTOC) is a bibliographic database of knowledge organization systems, developed by the University Library of Basel, Switzerland. It is the largest database of its kind, multilingual both in content and features and it is still growing. It currently includes more than 1,500 vocabularies in 85 languages from different domains and areas of knowledge. The system is based on the bibliographic tradition, collects metadata and summary descriptions of controlled and structured vocabularies and then applies novel methods, tools and infrastructure to provide an advanced and user-friendly view of its content. BARTOC features a faceted, responsive web design search interface in 20 languages. Data are freely available in the public domain and are connected into a linked open data infrastructure with database dump and SPARQL endpoint. Each record is geo-referenced, so that a special "GeoSearch" function can be offered in addition to basic and advanced searches. What distinguishes BARTOC from other services such as Linked Open Vocabularies (LOV), VEST Directory, BioPortal, etc., is its inclusiveness with respect to vocabulary types (thesauri, ontologies, classifications, glossaries, controlled vocabularies, taxonomies), subject areas, publication formats and accessibility.

Visualisation of Warsaw University of Technology Main Library resources based on UDC
Agnieszka Maria Kowalczuk, Warsaw University of Technology (Poland)
Łukasz Skonieczny, Warsaw University of Technology (Poland)
Małgorzata Wornbard, Warsaw University of Technology (Poland)

The aim of information visualization is to improve the perception of information and thereby give a better understanding of the knowledge space. Our research project in the Main Library of the Warsaw University of Technology (WUT ML) explores how visualizing the knowledge classes by which the library collection is organized may improve resource discovery and reveal information in primary data that might otherwise remain hidden. The assumption is that this would improve the understanding of information and provide better access to knowledge. The first step was to look into ways of presenting subject data from our catalogue graphically. The expectation is that visualizing the rich catalogue data will make it easier for users to see the wider context of a sought class and its connections with other knowledge classes. Each document in the WUT ML catalogue has a UDC notation, keywords in Polish and a symbol from the local classification scheme. The project uses three types of catalogue data: the UDC notation, the keywords connected with that notation and the number of occurrences of each keyword with the UDC notation. The presentation of classes is based on the data from the current UDC Online English schedules (http://www.udc-hub.com/en/login.php). The research results presented in this poster were prepared using Data Driven Documents (D3), a JavaScript library for manipulating documents based on data. The visualization shows the knowledge classes in the WUT ML in relation to the knowledge classes from the UDC scheme, i.e. the classes of library documents are shown alongside classes taken from UDC Online. At the same time, users can view the Polish keywords and the local classification symbols that are mapped to UDC, which provides flexibility in accessing the subject content of the collection.
We hope that our research illustrates the potential for providing more user-friendly and effective interfaces and improved subject access in library catalogues.
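The three types of catalogue data named above (UDC notation, keyword, and number of co-occurrences) could be prepared for a D3 view roughly as follows. This is a sketch under stated assumptions, not the project's code: the sample records, the function names and the nested name/children/value layout are hypothetical, chosen because that layout is a common input shape for D3 hierarchy layouts.

```python
import json
from collections import Counter, defaultdict

# Hypothetical catalogue records (illustrative values only).
catalogue = [
    {"udc": "004.8", "keywords": ["sztuczna inteligencja", "sieci neuronowe"]},
    {"udc": "004.8", "keywords": ["sztuczna inteligencja"]},
    {"udc": "624.21", "keywords": ["mosty"]},
]

def keyword_counts(catalogue):
    """Count occurrences of each keyword together with each UDC notation."""
    counts = Counter()
    for doc in catalogue:
        for keyword in doc["keywords"]:
            counts[(doc["udc"], keyword)] += 1
    return counts

def to_hierarchy(counts):
    """Group keywords under their UDC notation as a name/children/value tree."""
    by_class = defaultdict(list)
    for (udc, keyword), n in sorted(counts.items()):
        by_class[udc].append({"name": keyword, "value": n})
    return {
        "name": "catalogue",
        "children": [{"name": udc, "children": kws} for udc, kws in by_class.items()],
    }

counts = keyword_counts(catalogue)
tree = to_hierarchy(counts)
# ensure_ascii=False keeps Polish diacritics readable in the exported JSON.
tree_json = json.dumps(tree, ensure_ascii=False, indent=2)
```

The resulting JSON could be loaded by a browser-side script, with keyword frequency driving the size of each node.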

Experience with UDC updates: the Slovenian perspective
Darija Rozman, The National and University Library - Ljubljana (Slovenia)

Since the early 1990s, a significant part of the Universal Decimal Classification (UDC) scheme has been revised and updated. Every major change in the standard classification scheme causes difficulties, which are even more prominent in systems without authority control. In Slovenia, most bibliographic records in the Cooperative Online Bibliographic System and Services (COBISS) contain UDC numbers. Since 1996, a standardised subset of the UDC scheme, called "UDC summary", has been implemented as an authority list of UDC notations for Slovenian libraries. Currently, the UDC summary list contains 1,051 UDC codes. In this way, a standard and fixed level of the classification scheme is made available for all resources in library collections and represents a useful model of classification authority control. Occasionally, UDC codes in this widely used authority list have been extended and edited following requests from librarians. This was especially the case when the new translations of the UDC into Slovenian were published, based on releases of the UDC Master Reference File (UDC MRF) in 2001, 2006 and 2011. This poster outlines the effects of UDC MRF changes on the content of the authority list of UDC numbers and other Slovenian update experiences.
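How a scheme revision propagates through an authority list and the records that depend on it can be sketched in a few lines. Everything here is hypothetical: the notations are placeholders rather than real UDC numbers, and the revision table and function names are assumptions, not a description of how COBISS handles updates.

```python
# Hypothetical authority list; the notations are placeholders, not real UDC data.
authority = {"A1", "A2", "A3"}
# Hypothetical revision table: cancelled notation -> replacement notation.
revisions = {"A2": "B2"}
# Hypothetical bibliographic records as (record id, notation) pairs.
records = [("rec-1", "A1"), ("rec-2", "A2"), ("rec-3", "Z9")]

def apply_revisions(authority, revisions):
    """Return the authority list with cancelled notations replaced."""
    return {revisions.get(code, code) for code in authority}

def audit(records, authority, revisions):
    """Sort records into valid, needing an update, or not in the authority list."""
    current = apply_revisions(authority, revisions)
    report = {"valid": [], "needs_update": [], "unknown": []}
    for record_id, code in records:
        if code in current:
            report["valid"].append(record_id)
        elif code in revisions:
            # The record still carries a cancelled notation; suggest the replacement.
            report["needs_update"].append((record_id, revisions[code]))
        else:
            report["unknown"].append(record_id)
    return report

report = audit(records, authority, revisions)
```

With a fixed authority list, the set of records affected by a revision can be enumerated in this way; without one, free-form notations would mostly fall into the "unknown" group.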

Towards the creation of integrated authority files in the domain of science and technology: an Italian use case
Elena Cardillo, Institute of Informatics and Telematics, National Research Council, Rende (Italy)
Iryna Solodovnik, Institute of Informatics and Telematics, National Research Council, Rende (Italy)
Maria Taverniti, Institute of Informatics and Telematics, National Research Council, Rende (Italy)

Over the years, different organizations have developed and shared a number of authority files with normalized personal names (e.g., Virtual International Authority File - VIAF), inviting others to use these sources as a “common language” and thereby helping to improve interoperability among resources and systems. Nevertheless, numerous data providers continue to create and use locally developed authority lists that are mined from the resources managed in local repositories and are not aligned with trustworthy external sources. These authority lists often remain locked in local databases, which inhibits the sharing, re-use and interoperability of their data. This poster presents a use case on the creation of integrated and dynamic local authority files for the personal names of important Italian scientists and academics, to be used within a federated Digital Library about Science and Technology. Terminology extraction techniques were applied to a corpus of 400 documents in the archive of the National Centre of Electronic Calculation (CNUCE). This resulted in an authority list of 700 personal names that was further aligned with VIAF and other authorities, such as Library of Congress Classification, via a manual mapping process, thus ensuring its interoperability and the retrieval of bibliographic data and topics for each name. Future work will include the creation of CNUCE subject headings based on the list of keywords extracted from the CNUCE corpus and their mapping to the Nuovo Soggettario and to LCSH.
