Illuminating chaos: using classification to harness the Web
, Professor Emeritus at University of Illinois (USA)
Paul Otlet (1868-1944) was a key figure in developing the UDC as a faceted classification that represented a new approach to knowledge organisation. As a young man in the early 1890s he had become acutely aware of a looming crisis resulting from the strains being placed on existing systems for managing the sources in which a rapidly and relentlessly diversifying, ever-expanding universe of knowledge was being recorded. What was needed for the effective organisation, dissemination and retrieval of the information that these literatures both offered for consultation and obscured? Very early on Otlet became convinced that the idea of bibliography should be expanded to encompass not just written texts but whatever contained information, regardless of format, technologically based expression or originating source. Whatever contained information, he suggested, should be called a “document.” The study of documents, the new kinds of processes that should be investigated to release, order, integrate and disseminate their contents, and the new technologies, systems and institutional arrangements that were necessary for these purposes should, he suggested, be called “Documentation.” For him a key aspect of documentation was visualisation, itself a kind of technological affordance. Visualisation involved not only the use of conventional illustrative materials of various kinds but also schematic representations such as drawings, charts, diagrams and graphs by means of which information could be visually represented, segmented, systematised, simplified and made apprehensible at a glance. This notion is captured by the neologistic signification he gave to the term “atlas.” This paper, as an historical introduction to the UDC seminar, will outline Otlet’s theories of knowledge organisation and the role of classification and visualisation in them.
The biblical rooting of the trees of life and knowledge ensured the prominence of arboreal visual metaphors for centuries to come. By the twelfth century, a widely legible visual language existed which connected the tree to the order of the day: hierarchies and lineages (Klapisch-Zuber 2007, 294). Families, morals, and religious tenets came to be symbolized by the tree, and soon enough knowledge itself became ordered through its branches. Where once knowledge existed on a simple line, beginning with man and ending at the divine, hierarchies began separating and relating disparate areas of study. This structuring culminated with the encyclopedists, who organized the knowledge in their encyclopedias into vast hierarchically nested trees, a trend which continued and found its way into early classification systems. The advent of faceted classifications broke the strict hierarchy at a time when graph drawings, a form of tree with no discernible hierarchy or specific root, were becoming popular for the first time. As the World Wide Web gains prominence and visualizations of vast networks become the norm, representations of the order of knowledge begin to take similar form.
This paper discusses early experiments of Paul Otlet that visualize multidimensional knowledge organization and interaction. It examines their potential for future information retrieval. “Likeness” has been a recurrent theme in classification theory; here we discuss the concept of “likeliness” and illustrate the role of cognitive and cultural forces of perception in knowledge interaction with examples from artistic expressions in various media that are more or less likely to interact. The implications for information retrieval will be explored in two ways: empirically and theoretically. The empirical research will build upon the analysis of two types of experiments with multi-modal non-semantic information retrieval: 1) experiments with search engines that query for similar structural features of multimedia expressions (likeness); and 2) experiments with collaborative filtering technology measuring the likeliness of similar associations. In a previous outline of an elementary theory of knowledge interaction in a multiverse of knowledge, we challenged the universe of knowledge metaphor. Our next step will be to analyze two visualizations aimed at making the classification of sciences compliant with the laws of quantum physics, and to explore the possibility of combining these approaches with the UDC for entities in the multiverse of knowledge.
In this paper, we discuss how knowledge structures should be mapped, displayed, and visualized. Three different approaches to knowledge structure visualization are presented and discussed. These approaches include visualizing knowledge structures that exist in a conceptual space, visualizing knowledge structures that need to be extracted and learned from a conceptual space, and visualizing knowledge structures through visual metaphors that can be imposed on a conceptual space. Each of the approaches can be powerful and effective for different purposes and uses of knowledge structures. Through several visualization prototypes that we built, we compare and discuss these different approaches and relate them to some common features of knowledge structures, including association, representation, organization, and access. The paper concludes that a good understanding of the impact of visualization on these features is essential if the visualization of knowledge structures is to be effective, useful and meaningful.
How do we use data mining of massive cultural data sets to question our cultural assumptions and biases, and "unlearn" what we know? How can we do research with massive visual collections of user-generated content containing billions of images? What new theoretical concepts do we need to deal with the new scale of born-digital culture? In 2007 I established the Software Studies Initiative (softwarestudies.com) to begin working on these questions. I will briefly present the techniques we developed for exploratory analysis of massive visual collections, and show examples of our projects, including analysis of 1 million pages from Manga books and 1 million artworks from deviantArt (an online community for user-created art). I will also discuss how computational analysis and visualization of big cultural data sets lead us to question traditional discrete categories used for cultural categorization, such as "style" and "period." http://lab.softwarestudies.com/2008/09/cultural-analytics.html http://lab.softwarestudies.com/2010/11/one-million-manga-pages.html
A large part of our history, as well as of our daily lives, is captured in visual data. Understanding visual collections requires careful categorization to reveal expected as well as implicit, hidden relations. Manual categorization is a demanding and cumbersome process. On the other hand, automatic methods still have limitations in performance. An optimal approach brings together the power of automatic bulk categorization with detailed and careful expert annotation. We will show how advanced visualizations can aid the categorization and subsequent exploration processes.
Changes in the technical and social dimensions of knowledge infrastructures are bringing diverse ontologies, classification schemes, and orderings of knowledge into contact and conflict with one another. A particularly energetic scene for this struggle for coherence is taking place in the library world, where emerging technical considerations—in particular the growing desire for open-data formats and the development of APIs (application programming interfaces) that make metadata in library information systems programmatically accessible—render local variations in classification schemes problematic for librarians and their patrons. A particularly fruitful site for observing these dynamics is the Digital Public Library of America (DPLA), a project seeking to make national digital scientific and cultural resources comprehensively accessible. As the DPLA brings digital collections from various institutional settings together, classificatory principles that organize those materials in their home collections come into contact—and even conflict—with one another. This talk will present research using data visualization and interviews with informants to discern the nature and structure of conflicts in ontological schema emerging in the context of the DPLA project, and to discover what such dynamics have to tell us about changing practices of knowledge ordering in institutional and networked settings.
In a market dominated by the long tail model (Anderson), with an increasing variety of products and information, we constantly have to choose among a large number of options, not only on the web but also in the physical world. While this availability is a richness we would not renounce, it is also true that an excess of choice often generates stress, and stress, in turn, leads to non-choice or non-purchase. This is the so-called paradox of choice (Schwartz). Some principles, however, show that the time and stress of choosing depend not so much on the number of options available as, above all, on the way the choices are organized and presented. The paradox of choice is therefore a matter of quality rather than quantity. Through concrete examples, the talk will show some key principles for improving choice in menus, catalogs and interfaces in general by acting on the architecture of the choices themselves.
The term “Memory Islands” was inspired by the ancient “Art of Memory”, which described how people in antiquity and the Middle Ages used spatialization to increase their memory capacity. The method of “loci” (plural of Latin locus, place or location) consists of creating a virtual map and associating each entity with a designated area on the map. In this paper, we propose a new method in the field of automated cartography, based on the notion of Memory Islands, for hierarchical knowledge. We first describe our novel method for the cartographic visualization of knowledge (e.g. an ontology and its skeleton, the taxonomy); we then show how the “Memory Island” technique helps users navigate through information contents, memorize their locations and retrieve them. We also discuss the design principles of this approach. Finally, we present an experimental prototype intended to evaluate the psychological relevance of Memory Islands, along with some preliminary empirical results showing that the use of Memory Islands provides advantages for inexperienced users tackling realistic browsing and visualization tasks.
In this paper we present our web application SeRE to explore semantically related concepts. Wikipedia and DBpedia are rich data sources to extract related entities for a given topic, like in- and out-links, broader and narrower terms, categorization information etc. We use the Wikipedia full text body to compute the semantic relatedness for extracted terms, which results in a list of entities that are most relevant for a topic. For any given query, the user interface of SeRE visualizes these related concepts, ordered by semantic relatedness, with snippets from Wikipedia articles that explain the connection between those two entities. In a user study we examine how SeRE can be used to find important entities and their relationships for a given topic and how the classification system can be used for filtering.
This reflection summarizes observations of faceted interface developments over a ten-year period in various environments: websites, library catalogue interfaces, and two complex knowledge domains, film and folktales. Two persistent constraints that impede forward movement are identified: metadata quality and interface design. The wicked problem of interface design is the focus of this work. Lessons drawn from the author’s corpus of observations suggest fruitful and creative areas for development of more robust approaches to faceted interface development.
Information & Communication Technology (ICT), including Web technology, is in general poorly exploited in French academic libraries. Its use is limited to library information systems (which are mainly for expert users, such as librarians, and only anecdotally for end-users, through an OPAC dedicated to document retrieval) and a very conventional federated Web portal for information retrieval from online databases. Meanwhile, the complexity of the university library has continued to grow, and inadequately trained users have great difficulty accessing collections. Since the advent of Web 2.0, however, many original and ergonomically enhanced interfaces have appeared and proved their usefulness for navigating complex websites and reducing information and cognitive overload. The Visual Catalog, a new generation of OPAC, was developed to help increase the use of library collections and improve users' ability to retrieve information. The results of studies of two academic libraries implementing the solution have confirmed improvements in user experience. This Web catalog, which links classification data, authority files and other essential UNIMARC data, is currently used by six French universities.
Classification offers a unique power in allowing for the systematic sorting of information items, and thus plays an important role in visualizing documents' content and their relationships in the process of information retrieval. The subjects of most documents are combinations of more than one concept. Therefore, the classification notation representing the content has to be synthesized. As a classifier combines two or more classes from the schedules, the citation order of the notation elements affects the position of the document in a sorted display. Among the concepts discussed in any document, a base theme and several particular themes can be identified. A general rule is that the notation representing the main theme should be cited first, thus producing a "helpful sequence" of compound classmarks. We propose a general method of information retrieval based on a double query combined with an appropriate systematic result display: classmarks starting with the searched concept should be displayed before those having it as an inner part. We will discuss this principle using the example of a simple information retrieval interface currently being developed at the University of Pavia.
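The display principle stated in this abstract can be illustrated with a minimal sketch. It is not the Pavia interface: the function names, the sample classmarks, the splitting of compound classmarks on ":" and "+", and the plain string sort within each group are all assumptions made for illustration, and real UDC filing order is more involved.

```python
import re


def components(classmark):
    """Split a compound classmark into its elements (assumes ':' and '+' as connectors)."""
    return re.split(r"[:+]", classmark)


def rank_classmarks(classmarks, query):
    """Return matches with the searched concept cited first, then those citing it later."""
    leading, inner = [], []
    for mark in classmarks:
        parts = components(mark)
        if parts[0].startswith(query):
            # First "query": the concept is the main theme (cited first).
            leading.append(mark)
        elif any(part.startswith(query) for part in parts[1:]):
            # Second "query": the concept appears as an inner part.
            inner.append(mark)
    # Plain string sort inside each group; a stand-in for a proper filing order.
    return sorted(leading) + sorted(inner)


# Hypothetical classmarks, searched for the concept "61".
sample = ["34:61", "61:34", "61", "7:61", "616.1", "161:5"]
ranked = rank_classmarks(sample, "61")
```

Here the classmarks that begin with the searched notation form the first block of the display, those that cite it later form the second, and "161:5" is left out because none of its elements starts with "61".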
Although developed primarily for controlled vocabularies, the Functional Requirements for Subject Authority Data (FRSAD) conceptual model has been extended to classification schemes, with a class corresponding to FRSAD’s thema, and a class notation and the hierarchically-contextualized caption of a class both corresponding to FRSAD’s nomen. This paper explores extending the FRSAD model to accommodate a topic-centered view of the Dewey Decimal Classification (DDC), in which topics are recognized as themas and Relative Index (RI) terms as nomens; a complex series of relationships involving topics and/or RI terms is also recognized. These subject authority data (which require local extensions to MARC classification and authority formats) support different user groups, including—in the DDC context—editors, translators, classifiers, information professional intermediaries, and end users. Use scenarios based on a topic-centered view of the DDC require system assistance for, e.g., an editor’s revision of a topic’s treatment throughout the DDC and an end user’s discovery of resources topically related to a known resource, but not necessarily assigned the same class number. Visualization strategies supporting these use scenarios are proposed.
The Chinese Classified Thesaurus (CCT) is a knowledge organization tool widely used in libraries in mainland China. It is an integrated structure combining the Chinese Library Classification (CLC) and the Chinese Thesaurus (CT). In practice, CLC and CCT share a common database and are managed by the same management system for synchronous updates. CCT has long played a key role in library cataloging and indexing. However, its complicated knowledge structure and internal relation maps remain implicit to end-users and urgently need to be adapted to the Semantic Web (SW) environment. This paper discusses semantic representations of CCT's subject authority data based on Simple Knowledge Organization System (SKOS) modelling. It then explores a web-native dynamic semantic visualization interface implemented on a terminology service platform. This could help users learn both explicit and implicit Chinese knowledge structures through an engaging search experience.
Users are overwhelmed by the linear presentation of indexing terms in catalogue records of COBISS, the Slovenian union library catalogue. About a third of all queries are subject queries, but understanding subject descriptions causes many problems for end-users and also for many librarians. The solution could be a properly designed, ontology-based web application that uses visualization techniques to support indexing and retrieval. Our goal is to support users in such queries by helping them navigate through large sets of retrieved records with tag clouds and similar tools for visualizing information. The first step is to transform the subject headings list. The hierarchy and relationships can be based on UDC numbers. A search for a subject heading would retrieve a set of records, which would be represented in the final view either with the image of the book cover or with the bibliographic reference and abstract of a journal paper. The tools can also help with indexing. We believe that new browsing and indexing features in a new-generation OPAC could enhance the experience of both cataloguers and end-users.
In a passage of the Iliad, Homer gives a list of the scenes represented on the shield of Achilles. The list is so long that many artists have attempted to create the shield, triggering a creative process to find a solution. In the same way, librarians have been struggling for years to give shape to classifications, a shape that has migrated from the paper book to the tablet computer in step with technological progress. The paper presents the results of innovative research on displaying and interacting with classifications, specifically the UDC, on tablet computers. Since these mobile devices force the designer to keep things simple, the classification will be displayed in the shape of a list, one of the simplest ways to represent information. The list will then be enriched to allow complete navigation of the semantic and syntactic structure of the classification, resulting in an advanced application able to handle screen rotation and manipulation of the classification.
The practical value of classification summaries in information management and integration
, University of Wisconsin, Milwaukee (USA)
, Royal Netherlands Academy of Arts and Sciences (Netherlands)
, Royal Netherlands Academy of Arts and Sciences (Netherlands)
In this paper we present different methods for analysing sets of UDC numbers retrieved from library collections. We argue that quantitative methods and related visualizations can be used to compare different instances of the use of the UDC with each other and with the designed system. Mapping out basic statistics on UDC classes as used in libraries could give users an overview of, and basic information about, the nature and focus of a specific collection. A closer look at combined UDC numbers reveals the complex nature of the UDC itself, which deserves future exploration from a complexity-theoretical perspective.
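As a rough illustration of the kind of basic statistics mentioned here, the sketch below counts UDC numbers per main class and measures the share of combined numbers. It is not the authors' method: the function names and sample numbers are hypothetical, the main class is assumed to be the leading digit, and a number is treated as combined when it contains ":" or "+".

```python
from collections import Counter


def main_class_profile(udc_numbers):
    """Count numbers per main class, taken here as the leading digit 0-9."""
    counts = Counter()
    for number in udc_numbers:
        first = number.strip()[:1]
        if first.isdigit():
            # Numbers that start with an auxiliary sign are skipped.
            counts[first] += 1
    return counts


def share_combined(udc_numbers):
    """Fraction of numbers built with ':' or '+' (assumed markers of combination)."""
    if not udc_numbers:
        return 0.0
    combined = sum(1 for n in udc_numbers if ":" in n or "+" in n)
    return combined / len(udc_numbers)


# Hypothetical sample standing in for numbers exported from a catalogue.
sample = ["821.111", "61:34", "004.8", "7.03", "61"]
profile = main_class_profile(sample)
ratio = share_combined(sample)
```

Comparing such profiles across collections, or against the distribution of classes in the scheme itself, is one simple way to see where a collection's focus lies.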