Abstracts

From the index card to the World City: knowledge organization and visualization in the work and ideas of Paul Otlet
W. Boyd Rayward, Professor Emeritus at University of Illinois (USA)

Paul Otlet (1868-1944) was a key figure in developing the UDC as a faceted classification that represented a new approach to knowledge organisation. As a young man in the early 1890s he had become acutely aware of a looming crisis arising from the strains being placed on existing systems for managing the sources in which a rapidly, relentlessly diversifying, ever-expanding universe of knowledge was being recorded. What was needed for the effective organisation, dissemination and retrieval of the information that these literatures both offered for consultation and obscured? Very early on Otlet became convinced that the idea of bibliography should be expanded to encompass not just written texts but whatever it was that contained information, regardless of format, technologically-based expression or originating source. Whatever contained information, he suggested, should be called a "document". The study of documents, the new kinds of processes that should be investigated to release, order, integrate and disseminate their contents, and the new technologies, systems and institutional arrangements that were necessary for these purposes, he suggested, should be called "Documentation". For him a key aspect of documentation was visualisation, itself a kind of technological affordance. Visualisation involved not only the use of conventional illustrative materials of various kinds but also schematic representations such as drawings, charts, diagrams and graphs by means of which information could be visually represented, segmented, systematised, simplified and made apprehensible at a glance. This notion is captured by the neologistic signification he gave to the term "atlas". This paper, as an historical introduction to the UDC seminar, will outline Otlet's theories of knowledge organisation and the role of classification and visualisation in them.

From trees to webs: uprooting knowledge through visualization
Scott B. Weingart, Indiana University (USA)

The biblical rooting of the trees of life and knowledge ensured the prominence of arboreal visual metaphors for centuries to come. By the twelfth century, a widely legible visual language existed which connected the tree to the order of the day: hierarchies and lineages (Klapisch-Zuber 2007, 294). Families, morals, and religious tenets came to be symbolized by the tree, and soon enough knowledge itself became ordered through its branches. Where knowledge once existed on a simple line, beginning with man and ending at the divine, hierarchies began separating and relating disparate areas of study. This structuring culminated with the encyclopedists, who organized knowledge into vast hierarchically nested trees, a trend which continued and found its way into early classification systems. The advent of faceted classifications broke the strict hierarchy at a time when graph drawings, a form of tree with no discernible hierarchy or specific root, were becoming popular for the first time. As the World Wide Web gains prominence and visualizations of vast networks become the norm, representations of the order of knowledge begin to take similar form.

Visualizing knowledge interaction in the multiverse of knowledge
Charles van den Heuvel, Royal Netherlands Academy of Arts and Sciences (Netherlands)
Richard Smiraglia, University of Wisconsin, Milwaukee (USA)

This paper discusses early experiments of Paul Otlet that visualize multidimensional knowledge organization and interaction. It examines their potential for future information retrieval. "Likeness" has been a recurrent theme in classification theory; here we discuss the concept of "likeliness" and illustrate the role of cognitive and cultural forces of perception in knowledge interaction with examples from artistic expressions in various media that are more or less likely to interact. The implications for information retrieval will be explored in two ways: empirically and theoretically. The empirical research will build upon the analysis of two types of experiments with multi-modal non-semantic information retrieval: 1) experiments with search engines that query for similar structural features of multimedia expressions (likeness); and 2) experiments with collaborative filtering technology measuring the likeliness of similar associations. In a previous outline of an elementary theory of knowledge interaction in a multiverse of knowledge, we challenged the universe of knowledge metaphor. Our next step will be to analyze two visualizations aimed at making the classification of sciences compliant with the laws of quantum physics, and to explore the possibility of combining these approaches with the UDC for entities in the multiverse of knowledge.

Challenges of knowledge structure visualization
Xia Lin, Drexel University (USA)
Jae-wook Ahn, Drexel University (USA)

In this paper, we discuss how knowledge structures should be mapped, displayed, and visualized. Three different approaches to knowledge structure visualization are presented and discussed: visualizing knowledge structures that exist in a conceptual space, visualizing knowledge structures that need to be extracted and learned from a conceptual space, and visualizing knowledge structures through visual metaphors that can be imposed on a conceptual space. Each of the approaches can be powerful and effective for different purposes and uses of knowledge structures. Through several visualization prototypes that we built, we compare and discuss these different approaches and relate them to some common features of knowledge structures, including association, representation, organization, and access. The paper concludes that a good understanding of the impact of visualization on these features is essential in order to utilize the power of visualization to support effective, useful and meaningful visualization of knowledge structures.

Looking at one million images: how visualization of big cultural data helps us to unlearn our cultural categories
Lev Manovich, City University of New York Graduate Center (USA)

How do we use data mining of massive cultural data sets to question our cultural assumptions and biases, and "unlearn" what we know? How can we do research with massive visual collections of user-generated content containing billions of images? What new theoretical concepts do we need to deal with the new scale of born-digital culture? In 2007 I established the Software Studies Initiative (softwarestudies.com) to begin working on these questions. I will briefly present the techniques we developed for exploratory analysis of massive visual collections, and show examples of our projects, including analysis of 1 million pages from Manga books and 1 million artworks from deviantArt (an online community for user-created art). I will also discuss how computational analysis and visualization of big cultural data sets lead us to question traditional discrete categories used for cultural categorization, such as "style" and "period". Visit http://lab.softwarestudies.com/2008/09/cultural-analytics.html and http://lab.softwarestudies.com/2010/11/one-million-manga-pages.html

Easy categorization of large image collections by automatic analysis and information visualization
Marcel Worring, University of Amsterdam (Netherlands)

A large part of our history as well as our daily lives is captured in visual data. Understanding visual collections requires careful categorization to reveal expected as well as hidden relations. Manual categorization is a demanding and cumbersome process. Automatic methods, on the other hand, still have limitations in performance. An optimal approach brings together the power of automatic bulk categorization with detailed and careful expert annotation. We will show how advanced visualizations can aid the categorization and subsequent exploration processes.

Data artefacts: tracking knowledge-ordering conflicts through visualization
Matthew Battles, metaLAB, Harvard University (USA)
Yanni Loukissas, metaLAB, Harvard University (USA)

Changes in the technical and social dimensions of knowledge infrastructures are bringing diverse ontologies, classification schemes, and orderings of knowledge into contact and conflict with one another. A particularly energetic scene of this struggle for coherence is the library world, where emerging technical considerations - in particular the growing desire for open-data formats and the development of APIs (application programming interfaces) that make metadata in library information systems programmatically accessible - render local variations in classification schemes problematic for librarians and their patrons. A particularly fruitful site for observing these dynamics is the Digital Public Library of America (DPLA), a project seeking to make national digital scientific and cultural resources comprehensively accessible. As the DPLA brings digital collections from various institutional settings together, the classificatory principles that organize those materials in their home collections come into contact - and even conflict - with one another. This talk will present research using data visualization and interviews with informants to discern the nature and structure of conflicts in ontological schemas emerging in the context of the DPLA project, and to discover what such dynamics have to tell us about changing practices of knowledge ordering in institutional and networked settings.

How to design interfaces for choice: the role of classification in information architecture
Luca Rosati, University for Foreigners of Perugia (Italy)

In a market dominated by the long tail model (Anderson), with an increasing variety of products and information, we constantly have to choose among a large number of options, not only on the web but also in the physical world. If, on the one hand, this availability is a richness we would not give up, it is also true that an excess of choice often generates stress and, in turn, non-choice or non-purchase. This is the so-called paradox of choice (Schwartz). Some principles, however, show that the time and stress of choosing depend not so much on the number of options available as on the way the choices are organized and presented. The paradox of choice is therefore a matter of quality rather than quantity. Through concrete examples, the talk will show some key principles for improving choice in menus, catalogues and interfaces in general, by acting on the architecture of the choices themselves.

Ghost in the shell: navigation, meaning and place-making in information space
Andrea Resmini, Jönköping International Business School (Sweden)

Space and place are two very different concepts: one, the base experience of embodiment, objective, impersonal, undifferentiated; the other, a way of being “there” that includes memories, experiences, emotions, and behaviours associated with a specific context. While space simply “is”, place is an unstable, transient construct. The author points out that spatial reasoning shapes the way we perceive and understand the world: we not only get around with a map and compass, but we “get out” of difficult predicaments. We also navigate the Web, or “go to Google”. What about places then? If our house is certainly a place, what about Facebook? With an average 25 hrs/week spent online in the EU, does our sense of place stretch out from homes and offices to include our mobile phones, tablets and digital alter-egos in a continuum that permeates every moment of our lives? Should it? And if so, how is this different from the Internet we have known so far? Following this line of thought the author looks into filmic and videogame language, literature, comics, pop references and Japanese anime. He uses a number of examples to explain the transition from digital to postdigital. He argues that the old approach of a literal representation of reality will be replaced with a continuum of abstract grammars which will play a key role in place-making and navigation in complex information environments.

Memory Islands: an approach to cartographic visualization
Bin Yang, University Pierre and Marie Curie - Sorbonne (France)
Jean-Gabriel Ganascia, University Pierre and Marie Curie - Sorbonne (France)

The term "Memory Islands" was inspired by the ancient "Art of Memory", which described how people in antiquity and the Middle Ages used spatialization to increase their memory capacity. The method of "loci" (plural of Latin locus for place or location) consists of creating a virtual map and associating each entity with a designated area on the map. In this paper, we propose a new method in the field of automated cartography based on the notion of Memory Islands for hierarchical knowledge. We first describe our novel method for cartographic visualization of knowledge (e.g. an ontology and its skeleton, a taxonomy). We then show how the "Memory Island" technique helps users to navigate through information contents, to memorize their locations and to retrieve them. We also discuss the design principles of this approach. Finally, we present an experimental prototype intended to evaluate the psychological relevance of Memory Islands, together with some preliminary empirical results showing that the use of Memory Islands provides advantages for inexperienced users tackling realistic browsing and visualization tasks.

Exploring semantically related concepts from Wikipedia: the case of SeRE
Daniel Hienert, GESIS (Germany)
Dennis Wegener, GESIS (Germany)
Siegfried Schomisch, GESIS (Germany)

In this paper we present our web application SeRE to explore semantically related concepts. Wikipedia and DBpedia are rich data sources to extract related entities for a given topic, like in- and out-links, broader and narrower terms, categorization information etc. We use the Wikipedia full text body to compute the semantic relatedness for extracted terms, which results in a list of entities that are most relevant for a topic. For any given query, the user interface of SeRE visualizes these related concepts, ordered by semantic relatedness, with snippets from Wikipedia articles that explain the connection between those two entities. In a user study we examine how SeRE can be used to find important entities and their relationships for a given topic and how the classification system can be used for filtering.

How can users get the gist of a taxonomy using tag clouds?
Nathalie Pinede, University of Bordeaux (France)
Véronique Lespinet-Najib, IMS-Cognitics, Polytechnic Institute of Bordeaux (France)

In our study, tag clouds are used to highlight the knowledge representations of a taxonomy. Several modes of representation are possible, but we think that tag clouds are an efficient means of visualization and want to test which kind of construction and representation is best suited in terms of use and global perception. This visual representation conveys awareness of the prevalent themes, but also knowledge of those that appear less frequently. First, we present our original methodological proposal for a taxonomy of hyperlinks, which is based on the hypertext lexical units present on the home page of an organizational website. Then, to highlight our data, we examine four different tag cloud layouts generated for our taxonomy: sequential with frequency weighting, sequential without frequency weighting, circular, and thematic clusters. Four groups of subjects are formed to test these layouts according to different criteria. For each layout, eye-tracking analysis (with a Tobii system) is used to measure visual attention. Finally, we discuss our results and show strategic applications in specific fields, such as organizational domains, in order to provide an overview of websites and what most characterizes them.

Sempre avanti? Some reflections on faceted interfaces
Kathryn La Barre, University of Illinois at Urbana-Champaign (USA)

This reflection summarizes observations, made over a ten-year period, of faceted interface developments in various environments: websites, library catalogue interfaces, and two complex knowledge domains, film and folktales. Two persistent constraints that impede forward movement are identified: metadata quality and interface design. The wicked problem of interface design is the focus of this work. Lessons drawn from the author's corpus of observations suggest fruitful and creative areas for development of more robust approaches to faceted interface development.

Classification and visualization: augmenting user independence and enhancing collections use
Fabrice Papy, University of Lorraine (France)

Information and communication technologies (ICT), including Web technologies, are in general poorly exploited in French academic libraries. They are limited to library information systems (which are mainly for expert users, such as librarians, and only marginally for end-users, through an OPAC dedicated to document retrieval) and a very conventional federated Web portal for information retrieval from online databases. Meanwhile, the complexity of university libraries has continued to grow, and inadequately trained users have great difficulty accessing collections. However, since the advent of Web 2.0, many original and ergonomically enhanced interfaces have appeared and proved useful for navigating complex websites, reducing information and cognitive overload. The Visual Catalog, a new-generation OPAC, was developed to help increase the use of library collections and improve users' ability to retrieve information. The results of studies of two academic libraries implementing the Visual Catalog solution have confirmed improvements in user experience. This Web catalog, which links classification data, authority files and other essential UNIMARC data, is currently used by six French universities.

Sorting documents by base theme with synthetic classification: the double query method
Claudio Gnoli, University of Pavia (Italy)
Alberto Cheti, ISKO Italy

Classification offers unique power in allowing for systematic sorting of information items, thus playing an important role in the visualization of documents' content and their mutual relationships in the process of information retrieval. The majority of subjects in documents are combinations of more than one concept. Therefore, the classification notation representing the content has to be synthesized. As a classifier combines two or more classes from the schedules, the citation order of the notation elements affects the position of the document in a sorted display. Among the concepts discussed in any document, a base theme and several particular themes can be identified. A general rule is that the notation representing the base theme should be cited first, thus producing a "helpful sequence" of compound classmarks. We propose a general method of information retrieval based on a double query combined with an appropriate systematic display of results: classmarks starting with the searched concept should be displayed before those having it as an inner part. We will illustrate this principle with the example of a simple information retrieval interface currently being developed at the University of Pavia and of ISKO's Knowledge Organization Literature online search interface.
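The display rule stated in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the colon-separated classmarks, the sample notations and the function name double_query are all hypothetical, and an element is taken to express the searched concept when its notation begins with the concept's notation.

```python
def double_query(classmarks, concept, sep=":"):
    """Return matching classmarks, base-theme matches first.

    Assumption: a classmark is a string of notation elements joined by
    `sep`, and the first element represents the base theme.
    """
    base_matches, inner_matches = [], []
    for mark in classmarks:
        elements = mark.split(sep)
        if elements[0].startswith(concept):
            # First query: the searched concept is the base theme.
            base_matches.append(mark)
        elif any(e.startswith(concept) for e in elements[1:]):
            # Second query: the concept occurs as an inner part.
            inner_matches.append(mark)

    # Sort each group element by element, so that "61:34" files before "616:57".
    def by_elements(mark):
        return mark.split(sep)

    return sorted(base_matches, key=by_elements) + sorted(inner_matches, key=by_elements)


# Hypothetical sample classmarks, for illustration only.
sample = ["59:61", "61", "61:34", "34:61", "616:57", "57"]
result = double_query(sample, "61")
print(result)
```

With the sample above, the classmarks whose first element begins with 61 are listed first, followed by those that only contain it in a later position; the classmark 57 is not retrieved at all.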

From modeling to visualization of topic relationships in classification schemes
Rebecca Green, OCLC (USA)
Diane Vizine-Goetz, OCLC (USA)
Marcia Lei Zeng, Kent State University (USA)
Maja Žumer, University of Ljubljana (Slovenia)

Although developed primarily for controlled vocabularies, the Functional Requirements for Subject Authority Data (FRSAD) conceptual model has been extended to classification schemes, with a class corresponding to FRSAD's thema, and a class notation and the hierarchically-contextualized caption of a class both corresponding to FRSAD's nomen. This paper explores extending the FRSAD model to accommodate a topic-centered view of the Dewey Decimal Classification (DDC), in which topics are recognized as themas and Relative Index (RI) terms as nomens; a complex series of relationships involving topics and/or RI terms is also recognized. These subject authority data (which require local extensions to MARC classification and authority formats) support different user groups, including - in the DDC context - editors, translators, classifiers, information professional intermediaries, and end users. Use scenarios based on a topic-centered view of the DDC require system assistance, for example, an editor's revision of a topic's treatment throughout the DDC and an end user's discovery of resources topically related to a known resource, but not necessarily assigned the same class number. Visualization strategies supporting these use scenarios are proposed.

Semantic visualization for subject authority data of Chinese Classified Thesaurus
Wei Fan, University of Sichuan (China)
Shuqing Bu, National Library of China
Qing Zou, Lakehead University, Ontario (Canada)

The Chinese Classified Thesaurus (CCT) is a widely used knowledge organization tool in libraries in mainland China, and is an integrated structure of the Chinese Library Classification (CLC) and the Chinese Thesaurus (CT). In practice, CLC and CCT share a common database and are managed by the same management system for synchronous updates. CCT has long played a key role in library cataloguing and indexing. However, its complicated knowledge structure and the relation maps within it are not explicit to end-users, and there is an urgent need to adapt it to the Semantic Web (SW) environment. This paper discusses semantic representations of CCT's subject authority data based on Simple Knowledge Organization System (SKOS) modelling. Then, it explores a web-native dynamic semantic visualization interface implemented on a terminology service platform. This could help the user to learn the explicit and implicit Chinese knowledge structure and to experience interesting searches.

Enhancing user browsing success through visualization of indexing terms
Špela Razpotnik, National and University Library (Slovenia)
Alenka Šauperl, University of Ljubljana (Slovenia)

Users are overwhelmed by the linear presentation of indexing terms in catalogue records of COBISS, the Slovenian union library catalogue. About a third of all queries are subject queries, but understanding subject descriptions causes many problems for end-users and also for many librarians. The solution could be a properly designed, ontology-based web application using visualization techniques to support indexing and retrieval. Our goal is to support users in such queries by helping them navigate through large sets of retrieved records via tag clouds and similar visualization tools. The first step is to transform the subject headings list. The hierarchy and relationships can be based on UDC numbers. A search for a subject heading would retrieve a set of records, which would be represented in the final view either with the image of the book cover or with the bibliographic reference and abstract of a journal paper. The tools can also help with indexing. We believe that new browsing and indexing features in a new-generation OPAC could enhance the experience of both groups of users: cataloguers and end-users.

The Homer's list or How classifications can be displayed on tablets
Dario Rodighiero, Médialab at Sciences Po, Paris (France)
Giorgio Di Michelis, University of Milano – Bicocca (Italy)

In a passage of the Iliad, Homer writes a list of scenes represented on the shield of Achilles. The list is so long that many artists have attempted to create the shield, triggering a creative process to find a solution. In the same way, librarians have been struggling for years to shape classifications. Thanks to the latest mobile technology, we can imagine how a shape can migrate from a paper book to a tablet computer, following technological progress. The paper presents the results of innovative research on displaying and interacting with classifications - targeting specifically the UDC - using tablet computers. Since these mobile devices force the designer to keep things simple, the classification will initially be displayed in the shape of a list, one of the simplest ways to represent information. Later, the list will be enriched with complete navigation of the semantic and syntactic structure of the classification which, in turn, will result in the creation of an advanced application able to handle screen rotation and manipulation of the classification.

UDC in action
Richard Smiraglia, University of Wisconsin, Milwaukee (USA)
Andrea Scharnhorst, Royal Netherlands Academy of Arts and Sciences (Netherlands)
Almila Akdag Salah, Royal Netherlands Academy of Arts and Sciences (Netherlands)
Cheng Gao, Royal Netherlands Academy of Arts and Sciences (Netherlands)

The UDC is not only a classification language with a long history; it also presents a complex cognitive system worthy of the attention of complexity theory. The elements of the UDC (classes, auxiliaries, and operations) are combined into symbolic strings which in essence represent a complex network of concepts. This network forms a backbone for the ordering of knowledge and at the same time allows expression of different perspectives on various products of human knowledge production. In this paper we look at UDC strings derived from the holdings of libraries. In particular we analyze the subject headings of holdings of the university library in Leuven, and an extraction of UDC numbers from OCLC WorldCat. Comparing those sets with the Master Reference File, we look into the length of strings, the occurrence of different auxiliary signs, and the resulting connections between UDC classes. We apply methods and representations from complexity theory. Mapping out basic statistics on UDC classes as used in libraries from the point of view of complexity theory offers several benefits. Deploying its structure could serve as an overview and basic information for users about the nature and focus of specific collections. A closer view into combined UDC numbers reveals the complex nature of the UDC as an example of a knowledge ordering system, which deserves future exploration from a complexity-theoretical perspective.
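The kind of basic statistics mentioned in this abstract (string length and occurrence of connecting signs) can be sketched as follows. This is a minimal illustration, not the authors' method: the sample strings are made up for the example rather than taken from the Leuven or WorldCat data, the function name string_statistics is hypothetical, and only four UDC connecting signs (+, /, : and ::) are counted.

```python
from collections import Counter


def string_statistics(udc_strings):
    """Return (lengths, sign_counts) for a list of UDC strings.

    Assumption: only the connecting signs "+", "/", ":" and "::" are
    counted; other auxiliaries (parentheses, quotes, "=") are ignored.
    """
    lengths = [len(s) for s in udc_strings]
    sign_counts = Counter()
    for s in udc_strings:
        # Count the double colon first, then mask it so that it is not
        # counted again as two single colons.
        sign_counts["::"] += s.count("::")
        rest = s.replace("::", " ")
        for sign in ("+", "/", ":"):
            sign_counts[sign] += rest.count(sign)
    return lengths, sign_counts


# Made-up sample strings, for illustration only.
sample = ["94(410)", "622+669", "17:7", "592/599", "77.044::355"]
lengths, sign_counts = string_statistics(sample)
print(lengths)
print(dict(sign_counts))
```

Pairs of classes joined by a sign could then be collected as edges to study the connections between UDC classes that the abstract refers to.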

Visualization and navigation of knowledge in pan-European resources: the case of The European Library
Nuno Freire, The European Library, Lisbon (Portugal)

The European Library provides access to research materials currently present in the collections of national and research libraries across Europe. Its most visible service is a portal which provides for searching and browsing collections, bibliographic records, digital objects and full text contents held by these libraries. This centralization of resources enables access to information under a unified knowledge organization system, and due to the diversity of languages and knowledge organization systems in use across European libraries, data mining technologies are being applied for automatic linkage of subject information. Current results are drawn from the project Multilingual Access to Subjects (MACS), which produced manual alignments between three major systems: the Library of Congress Subject Headings (LCSH), the Répertoire d'autorité-matière encyclopédique et alphabétique unifié (RAMEAU) and the Schlagwortnormdatei (SWD). On-going work is targeting wider coverage of subject systems, by exploring the alignment of language-independent systems such as subject classification systems (e.g. Dewey Decimal Classification and Universal Decimal Classification). Future work will address the integration with digital humanities research infrastructures, and how researchers in the field of digital humanities perceive the value of the resources offered by The European Library and Europeana, where it is expected that knowledge organization systems and ontologies have great relevance.

Cognitive Approach in Classification Visualization: end-users study
Veslava Osinska, Nicolaus Copernicus University, Torun (Poland)
Joanna Dreszer-Drogorob, Nicolaus Copernicus University, Torun (Poland)
Grzegorz Osinski, College of Social and Medial Culture, Torun (Poland)
Michal Gawarkiewicz, Nicolaus Copernicus University, Torun (Poland)

Visualization of scientific information extends the possibilities for exploring how science is organized and how it changes over time. Classified data in particular hold great potential for discovering the structure and dynamics of a specified domain. The authors applied a previously presented and tested approach to mapping the ACM CCS (Computing Classification System) classification onto the surface of a sphere. Classified documents form patterns according to their semantic similarity. Two main uses of the resulting visualizations were identified. First, they can serve as a multi-perspective analytical tool for the original classification and its structure. Second, the classification sphere may be considered an ergonomic interface for exploring scientific resources and for information retrieval. The graphical representations obtained also provide quantitative material for analysing the development and dynamics of the classification. The authors seek reliable tools for evaluating it. They constructed an appropriate interface and surveyed distinct groups of users, who were asked about key aspects of the visualization layout and its changes. The results of the study make it possible to evaluate the visualization of classification, and thereby to improve the proposed methodology and to discover new semantic features and laws in the visual layout.

Nederlab: visual analytics in a virtual research environment for humanities
Junte Zhang; Matthijs Brouwer; Hennie Brugman; Marc Kemps-Snijders; Jan Pieter Kunst; Nicoline van der Sijs; René van Stipriaan; Erik Tjong Kim Sang; Rob Zeeman, Meertens Institute, Royal Netherlands Academy of Arts and Sciences (Netherlands)

Nederlab (www.nederlab.nl) is a virtual research environment or laboratory for research on the patterns of change in the Dutch language and culture. Linguists and historians can use Nederlab to research Dutch language and cultural heritage by searching for and having interactive access to large amounts of historical texts and rich and structured metadata describing these resources. The text collections covered by Nederlab include literature (i.e. fiction and non-fiction resources) and massive amounts of newspaper articles, and the list of collections is set to increase. As an example, we demonstrate a concrete scenario for literary scholars, and show when, how and which visual analytics on metadata are powerful tools for exploring, finding, collecting and analyzing these texts for (historical and language) research. This includes visualizing the temporal and spatial dimensions for interactive search, other contextual information such as the names and gender of authors, and comparative analytics of selected results.

The CEDAR Project: classifying the Dutch historical censuses
Ashkan Ashkpour, Erasmus University, Rotterdam (Netherlands)
Albert Meroño-Peñuela, Vrije University Amsterdam (Netherlands)

The censuses are a rich source of historical information for researchers, describing demographic, social and economic structures and yielding a wealth of data on many issues over the course of time. The Dutch historical censuses are currently digitized, but notoriously difficult to compare, aggregate and query in a uniform fashion: meaningful historical information is currently hidden in thousands of disconnected Excel files and over 2,300 tables of aggregated data. The CEDAR project (eHumanities group) aims at enabling greater access to and use of this dataset by applying a specific data model (exploiting Resource Description Framework (RDF) technology) to make census data interlinkable with other hubs of historical socioeconomic and demographic data, together with various harmonization practices. A large part of census data harmonization depends on the classification of the data. Querying these RDF data, we create visualizations in order to explore the thousands of variables in our data set and create bottom-up classifications for housing variables, occupations, religious denominations, and so on. These visualizations correspond to different moments in history. We leverage animation techniques to display the conceptual changes that modified the social landscape in pivotal centuries of Europe's history.