2016-06-23: Joint Conference on Digital Libraries (JCDL) 2016 Trip Report

The ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL) is a major international conference that offers the opportunity to address technical, practical, and social issues associated with digital libraries. This year, the annual conference was held at the Paul Robeson Campus Center, Rutgers University, Newark, New Jersey, from June 19 to 23, 2016. Here is a list of the accepted papers and posters/demos.

The theme for this year's conference was Big Libraries, Big Data, Big Innovation. Computer/information scientists, librarians, archivists, social scientists, historians, and other participants from around the world and from other disciplines involved in digital library research and practice presented papers and posters, and took part in specialized workshops (see Mat's blog post about the WADL workshop), tutorials, panels, and a doctoral consortium (see Sawood's blog post about the doctoral consortium). Also joining this distinguished collection of professionals were five members of the ws-dl group: Dr. Michael Nelson, Dr. Michele Weigle, Sawood Alam, Mat Kelly, and myself (Alexander Nwala).
The first day (June 19, 2016) ran from 9am to 5pm with two concurrent events - Tutorials and a Doctoral Consortium - at the Paul Robeson Campus Center at Rutgers University. We attended the Doctoral Consortium, in which Sawood presented his work. The tutorials presented were as follows:
  1. Introduction to Digital Libraries, presented by Edward A. Fox (Virginia Tech).
  2. Introduction to the Digital Public Library of America (DPLA) API, presented by Unmil P. Karadkar (The University of Texas at Austin), Audrey Altman and Mark Breedlove (Digital Public Library of America).
  3. Information Extraction for Scholarly Digital Libraries, presented by Kyle Williams, Jian Wu, Zhaohui Wu and C. Lee Giles (Pennsylvania State University).

Day 1 (June 20, 2016):

The conference officially began on the second day with a keynote titled Future Digital Libraries: Research and Responsibilities by Maria Zemankova of the National Science Foundation, in which she talked about libraries, archives, museums, and collections. The talk began with a brief history of libraries before exploring the digital libraries of today and the future of digital libraries.

Two concurrent paper sessions (from 11am - 12:30pm) followed the keynote. The first paper session was about Wikipedia and Newspaper Analysis, and the second was about Curation and Education.

The first paper session, chaired by Dr. Herbert Van de Sompel (Los Alamos National Laboratory), was attended by the five members of the ws-dl group present and consisted of the following presentations:
  1. Querylog-based Assessment of Retrievability Bias in a Large Newspaper Corpus, by Myriam Traub, Thaer Samar, Jacco van Ossenbruggen, Jiyin He, Arjen de Vries and Lynda Hardman: Myriam Traub et al. addressed the problem of bias in digital libraries by assessing the effectiveness of a retrievability measure on a large collection of digital newspapers.
  2. Digital History Meets Wikipedia: Analyzing Historical Persons in Wikipedia, by Adam Jatowt, Daisuke Kawai, and Katsumi Tanaka: Adam Jatowt et al. conducted a temporal analysis of historical persons in Wikipedia, examining the hyperlink structure of documents in order to understand the relationship between time, link structure and article popularity.
  3. Quality assessment of Wikipedia articles without feature engineering, by Quang Vinh Dang and Claudia-Lavinia Ignat: Given the popularity of Wikipedia and concerns about the quality of its articles, Quang Vinh Dang et al. addressed the problem of assessing article quality without hand-engineering a list of features that indicate quality. Instead, they analyzed the content of the articles directly, using a deep learning/natural language processing framework.
  4. Glyph Miner: A System for Efficiently Extracting Glyphs from Early Prints in the Context of OCR, by Benedikt Budig, Thomas C. Van Dijk and Felix Kirchner: Benedikt Budig et al. devised a system that replaces a common part of the OCR training pipeline with a more efficient workflow. Given a set of scanned historical documents, their user-interactive system extracts large numbers of glyphs selected by the user.
The second paper session, chaired by Dr. Edward A. Fox (Virginia Tech), consisted of the following presentations:
  1. Enhancing Scholarly Use of Digital Libraries: A Comparative Survey Review of Bibliographic Metadata Ontologies, by Jacob Jett, Terhi Nurmikko-Fuller, Timothy W. Cole, Kevin R. Page and J. Stephen Downie.
  2. Data Curation with a Focus on Reuse, by Maria Esteva, Sandra Sweat, Robert McLay, Weijia Xu  and Sivakumar Kulasekaran.
  3. Unraveling K-12 Standard Alignment; Report on a New Attempt, by Byron Marshall, Rene Reitsma and Carleigh Samson.
  4. Research on the Follow-up Actions of College Students' Mobile Search, by Dan Wu and Shaobo Liang.
A short break followed the paper sessions, after which two concurrent sessions were held: a panel session and a third paper session. The first panel session, titled Issues of Dealing with Fluid Data in Digital Libraries, was chaired by Byron Marshall (Oregon State University) and consisted of the following panelists:
  1. Soo-yeon Hwang - School of Communication and Information, Rutgers University
  2. Melissa Cragin - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
  3. Michael Lesk - School of Communication and Information, Rutgers University
  4. Yu-Hung Lin - Rutgers University Libraries
  5. Daniel O'Connor - School of Communication and Information, Rutgers University
The third paper session (ws-dl members present) was titled Web Archiving. The paper session was chaired by Martin Klein (University of California, Los Angeles) and consisted of the following presentations:
  1. Routing Memento Requests Using Binary Classifiers, by Nicolas J. Bornand, Lyudmila Balakireva and Herbert Van de Sompel: Nicolas J. Bornand et al. explored the use of binary, archive-specific classifiers to determine whether or not to query an archive for a given URI. This method was shown to significantly decrease the number of requests and the overall response time of the aggregator, without compromising recall.
  2. The Dawn of Today's Popular Domains: A Study of the Archived German Web over 18 Years, by Helge Holzmann, Wolfgang Nejdl and Avishek Anand: In an effort to understand where the Web is headed, Helge Holzmann et al. conducted a longitudinal study of a German Web collection, retrospectively analyzing how the popular Web evolved over the past 18 years.
  3. ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation, by Helge Holzmann, Vinay Goel and Avishek Anand: Researchers exploring digital libraries require tools that provide efficient access to Web archive data for extraction and derivation of smaller datasets. To fulfill this need, Helge Holzmann et al. proposed ArchiveSpark, a framework for efficient and distributed Web archive processing.
Shortly after the third paper session, a plenary meeting was held in which the Chairs, Michael L. Nelson (JCDL Steering Committee Chair) and Sally Jo Cunningham (TCDL Chair), solicited feedback on how the conference may be improved.
The Minute Madness was next. During the Minute Madness, session Chairs Luis Francisco-Revilla and Ian Milligan conducted a strictly timed event in which poster presenters were given one minute to entice conference participants to visit their poster stands.
The Poster session and a reception began right after the Minute Madness.

Day 2 (June 21, 2016):

Day 2 of the conference, just like Day 1, began with a keynote address. Rachel Frick (DPLA) presented The State of Practice and Use of Digital Collections: the Digital Public Library of America as a platform for research, in which she talked about the DPLA, exploring its history as well as its past and current efforts.
The fourth paper session followed the keynote. It was chaired by Xiaozhong Liu (Indiana University) and consisted of the following presentations:
  1. Low-cost semantic enhancement to Digital Library metadata and indexing: Simple yet effective strategies, by Annika Hinze, David Bainbridge, Sally Jo Cunningham and J. Stephen Downie: Annika Hinze et al. addressed accessing digital libraries non-disruptively and cheaply by using the results of semantic analysis and disambiguation, while retaining a keyword-based search and lexicographic index.
  2. Desiderata for Exploratory Search Interfaces to Web Archives in Support of Scholarly Activities, by Andrew Jackson, Jimmy Lin, Ian Milligan and Nick Ruest: Andrew Jackson et al. described an exploratory search interface to web archives for humanities scholars and social scientists. They presented their initial implementation and discussed their findings in terms of desiderata for the system.
  3. Content Selection and Curation for Web Archiving: The Gatekeepers vs. the Masses, by Ian Milligan, Nick Ruest and Jimmy Lin: Ian Milligan et al. addressed the question "what should we archive?" through a case study of the 2015 Canadian federal elections, comparing a broad ("gatekeepers") crawl approach with a "the masses" crawl approach. Based on their study, they recommended a hybrid approach that combines social media and more traditional curatorial methods.
  4. Towards Better Understanding of Academic Search, by Madian Khabsa, Zhaohui Wu and C. Lee Giles: Madian Khabsa et al. studied the distribution of queries that are received by an academic search engine. They also introduced a machine learning approach to identify navigational academic queries.
  5. Investigating Cluster Stability when Analyzing Transaction Logs, by Paul Clough and Daniel Grech: Paul Clough et al. computed stability based on the Jaccard coefficient to investigate the cluster stability when using different subsets of transaction log data from
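The Jaccard-based notion of cluster stability mentioned in the last item above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one common formulation in which each cluster from a reference clustering is matched to its most similar cluster in a clustering built from a different data subset. The function names and the session IDs are hypothetical.

```python
from typing import Hashable, List, Set


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| (taken to be 1.0 for two empty sets)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def cluster_stability(reference: List[Set[Hashable]],
                      other: List[Set[Hashable]]) -> List[float]:
    """For each reference cluster, return its best Jaccard match in `other`."""
    return [max((jaccard(ref, c) for c in other), default=0.0)
            for ref in reference]


# Hypothetical clusterings of session IDs: one from the full log,
# one from a subset of the log (illustrative data only).
full_log = [{"s1", "s2", "s3"}, {"s4", "s5"}]
subset_log = [{"s1", "s2"}, {"s3", "s4", "s5"}]

scores = cluster_stability(full_log, subset_log)
mean_stability = sum(scores) / len(scores)
print(scores, mean_stability)
```

Scores close to 1 suggest that a cluster reappears almost unchanged when the clustering is repeated on a different subset; low scores suggest that the cluster is sensitive to the choice of data.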
After an hour-and-a-half lunch break, two concurrent events began - a second panel session titled Preserving Born-Digital News (chaired by Vivek Singh, Rutgers University), and a fifth paper session on Q&A and Gaming (chaired by Sally Jo Cunningham, University of Waikato).
The second panel session (attended by the ws-dl members) consisted of the following panelists:
  1. Edward McCain (Organizer) - Donald W. Reynolds Institute, University of Missouri Libraries
  2. Matthew Weber - School of Communication and Information, Rutgers University
  3. Martin Klein - University of California Los Angeles Libraries
The fifth paper session titled Q&A and Gaming and chaired by Sally Jo Cunningham (University of Waikato) consisted of the following papers and corresponding presenters:
  1. Experimental Evaluation of Affective Embodied Agents in an Information Literacy Game, by Yan Ru Guo, Dion Hoe-Lian Goh, Hurizan Bin Hussain Muhamad, Boon Kuang Ong and Zichao Lei.
  2. Evaluating the Quality of Educational Answers in Community Question-Answering, by Long Le, Chirag Shah and Erik Choi.
  3. Music Information Seeking via Social Q&A: An Analysis of Questions in Music StackExchange Community, by Hengyi Fu and Yun Fan.
The sixth paper session followed the previous paper/panel sessions. It was titled Publication Mining, chaired by Giorgio Maria Di Nunzio (University of Padua), and consisted of the following papers and corresponding presenters:
  1. PDFFigures++: Mining Figures from Research Papers, by Christopher Clark and Santosh Divvala: Christopher Clark et al. presented their tool/algorithm (PDFFigures 2.0) which extracts figures, tables, and captions from scholarly documents. Their algorithm showed impressive results (94% precision at 90% recall) on the test dataset, and surpassed the state of the art.
  2. Comparing Published Scientific Journal Articles to Their Pre-print Versions, by Martin Klein, Peter Broadwell, Sharon Farb and Todd Grappone: U.S. academic libraries paid $1.7 billion to academic publishers for serial subscriptions in 2008 alone. Against this backdrop, the analysis of Martin Klein et al. revealed that the text contents of scientific papers generally changed very little from their pre-print to final published versions, a finding that can inform economic decisions about subscriptions.
  3. Extracting Academic Genealogy Trees from the Networked Digital Library of Theses and Dissertations, by Wellington Dores, Fabricio Benevenuto and Alberto Laender: Given the decentralized storage of research theses and dissertations across local digital libraries, exploring the genealogy of researchers over time is challenging. Thus, Wellington Dores et al. presented a first step towards building a large repository that records the academic genealogy of researchers across different fields and countries.
  4. Probabilistic Assignment of Medical Subject Headings to PubMed Records Based on References and Abstract Similarity, by Adam Kehoe and Vetle Torvik: Adam K. Kehoe et al. described a method for assigning Medical Subject Headings (MeSH) to unlabeled documents by combining abstract similarities and direct citations to labeled MEDLINE records.
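As a reminder of what the precision and recall figures quoted for PDFFigures 2.0 in the first item above mean, here is a small worked sketch. The counts are made up purely so that the arithmetic lands on 94% precision and 90% recall; they are not taken from the paper, and the function name is hypothetical.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative counts only (NOT from the paper):
# 846 figures extracted correctly, 54 spurious extractions, 94 figures missed.
precision, recall = precision_recall(tp=846, fp=54, fn=94)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

With these counts, 846 of the 900 extracted figures are correct (precision) and 846 of the 940 true figures are found (recall).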
A banquet at the Newark Museum followed the sixth paper session.
At the banquet, the winners and runners-up of the best paper and best poster awards were announced and recognized. The Vannevar Bush best paper award went to Comparing Published Scientific Journal Articles to Their Pre-print Versions by Martin Klein, Peter Broadwell, Sharon E. Farb, and Todd Grappone.
Third place for best poster went to my poster: A Supervised Learning Algorithm for Binary Domain Classification of Web Queries using SERPs.
Second place, by one vote, went to Avoiding the Drunkard's Search: Investigating Collection Strategies for Building a Twitter Dataset, by Clare Llewellyn, Laura Cram, and Adrian Favero.

Day 3 (June 22, 2016):

The third day of the conference was split into three sections - the seventh paper session titled Recommendation and Prediction, a keynote by Stephen Bury (New York Art Resources Consortium - NYARC), and four workshops.
The seventh paper session was chaired by Edie Rasmussen (University of British Columbia), and consisted of the following presentations:
  1. Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?, by Chifumi Nishioka and Ansgar Scherp: To clarify how different factors of a scientific publication recommender system (based on users' tweets) influence recommendation performance, Chifumi Nishioka et al. examined three factors - profiling method, temporal decay, and richness of content.
  2. Early Prediction of Scholar Popularity, by Masoumeh Nezhadbiglari, Marcos Goncalves and Jussara Almeida: Masoumeh Nezhadbiglari et al. tackled the problem of predicting the popularity of scholars, attempting to make the predictions as early and as accurately as possible.
  3. Evaluating Link-based Recommendations for Wikipedia, by Malte Schwarzer, Moritz Schubotz, Norman Meuschke, Corinna Breitinger, Volker Markl and Bela Gipp: Malte Schwarzer et al. reported on the first large-scale investigation of the performance of the Co-Citation Proximity Analysis method of generating recommendations for Wikipedia. They analyzed links instead of citations to generate article recommendations.
The main conference ended following the presentations from the seventh paper session, but not before Ian Milligan invited us to attend JCDL 2017 in Canada!
--Nwala (@acnwala)

