The 20th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020) was hosted virtually (due to the COVID-19 pandemic) jointly by the School of Information Management at Wuhan University and the School of Public Administration at Northwest University, from August 1 to 5, 2020, in Wuhan, Hubei, China. Similar to last year's conference, we (members of WSDL) attended paper sessions, workshops (Web Archiving and Digital Libraries), and tutorials, where researchers from multiple disciplines presented their work via Zoom. The theme of this year's conference was "Speedier Innovation, Sustainable Development, Societal Transformation".

On August 1, Yasith Jayawardana (PhD student at WSDL) attended the JCDL 2020 Doctoral Consortium, which took place immediately before the technical program of JCDL 2020. The Doctoral Consortium allows students to present their thesis and research plans and receive feedback and general advice from others in the field. [Doctoral Consortium Trip Report]

Day 1

Welcome

Day 1 of JCDL 2020 started off with welcome statements by Prof. Qizhu Tang (Professor and Vice President at the Wuhan University), Prof. Xiaokang Lei (Professor at School of Public Administration, Northwest University), Prof. Dan Wu (JCDL 2020 General Chair, Professor at School of Information Management, Wuhan University), and Prof. Daqing He (JCDL 2020 Program Chair, Professor at School of Computing and Information, University of Pittsburgh).

Keynote 1

Following the welcome statements, Dr. Edward Fox from Virginia Polytechnic Institute and State University, delivered the first keynote of JCDL 2020. Dr. Edward Fox is the founder, Executive Director, and Chairman of the Board of the Networked Digital Library of Theses and Dissertations. The title of his keynote was: How Should One Explore the Digital Library of the Future?

Dr. Edward Fox's keynote emphasized that an essential attribute of humankind today is to explore and discover, and that the research process is a series of information needs, information sources, and information searches. He also suggested a mathematical way to think about the 5Ss (Streams, Structures, Spaces, Scenarios, and Societies) in Digital Libraries.

Day 2

Day 2 of JCDL 2020 started with the Scholarly Communication session, followed by seven other sessions, including the Minute Madness and the poster/demo session.

Scholarly Communication

This session began with Rima Hazra from Indian Institute of Technology presenting the first full paper of the session, titled: Characterizing Authors on the Extent of their Paper Acceptance: A Case Study of the Journal of High Energy Physics [paper]. The work explores how citation counts and reviews of papers are affected when authors with a lower acceptance rate collaborate with authors with a higher acceptance rate, and vice versa. Their results show that when authors with a lower acceptance rate (~20%) collaborate with authors with a higher acceptance rate (~80%), their papers receive more citations and attract positive reviews.

Next, Jingtao Han from Peking University, China, presented a full paper titled: Gatekeeper: Analyzing G-Indexes and Improving Service Quantification. They propose a tool that quantifies scholars’ service impact based on their roles as gatekeepers in conferences. A gatekeeper is a person who serves on the program committee of a given conference. They introduced three types of Gatekeeper-index (G-index) methods to quantify a gatekeeper's service impact, which can also capture service roles and the quality of a conference. They found that authors with a high service impact also have a high research impact.

Finally, Baani Lean Kaur Jolly and Lavina Jain from IIIT-Delhi, India presented their full paper titled: Unsupervised Anomaly Detection in Journal-Level Citation Networks. Their study focused on whether the impact factor of a journal could be manipulated by anomalous citations. They also introduced JoCAD, a novel dataset of synthetically injected citation anomalies, to support future research in this direction.

User in Search

In this session, Yusuke Yamamoto from Shizuoka University, Japan, presented the first full paper (Vannevar Bush Best Paper nominee) titled: Personalization Finder: A Search Interface for Identifying and Self-controlling Web Search Personalization. They developed a system that identifies search results that were personalized for a user, and surveyed how it affects the search process of participants. Their results show that most users were unaware that search results were personalized. Further, personalized results made users spend longer on SERPs, and led users to go deeper into SERPs for political topics, but shallower for entertainment topics. The survey also revealed that most participants did not want personalization for politics, and did not believe that most of their results were personalized by the search engine.

Next, Yao Zhang from Peking University, China, presented the full paper titled: Users’ Knowledge Use and Change during Information Searching Process: A Perspective of Vocabulary Usage. Their work analyzed how users' prior knowledge affects their search process, and how knowledge gained from the search process affects their knowledge structures. Their results show that prior knowledge plays a large role in query creation early on, while the use of prior-knowledge vocabulary decreases as the search session proceeds. They also found that many users copy/paste text from pages during a session and do not tend to modify their problem-solving process during search.

Finally, Ying Que from The University of Hong Kong presented the full paper titled: Exploring the Effect of Personalized Background Music on Reading Comprehension, in which they conducted a user study where participants listened to music while performing a reading task on English passages about emotionally neutral topics. Their user study revealed that background music had no detrimental effect on the reading task, but the users took more time to complete the task. The effective music type differed across age groups, with younger people preferring more energetic music. They also saw better accuracy on the assigned task when background music was present, at the cost of added cognitive load.

Digital Libraries 1

Mayank Singh from Indian Institute of Technology Gandhinagar, India began this session with a full paper presentation titled: Identification, Tracking and Impact: Understanding the Trade Secret of Catchphrases.

Next, Shuo Yu from Dalian University of Technology, China, presented the full paper (Vannevar Bush Best Paper Nominee and Best Student Paper Nominee) titled: Multivariate Relations Aggregation Learning in Social Networks. The presented work was done in collaboration with University of South Australia, Monash University, and Federation University Australia.

Finally, WSDL alumna Lulwah Alkwai (now at University of Hail, Saudi Arabia) presented the full paper titled: Making Recommendations from Web Archives for “Lost” Web Pages. The co-authors of the paper are WSDL faculty Dr. Michael Nelson and Dr. Michele Weigle. They proposed an approach to enhance HTTP 404 / 200 responses by surfacing archived web pages that the user may not know existed. They use the URI without the page's content, since many URIs already contain semantic information. Using this information, they showed how to classify a URI to make recommendations for lost pages. The main contributions of their work are extracting information from just the URI, classifying a web page from just its URI, and making recommendations based on just the URI. [Preprint] [Paper] [GitHub]

Shawn M. Jones (WSDL member) has re-applied the presented technique in "Social Cards Probably Provide For Better Understanding Of Web Archive Collections" to show how much of Archive-It seed URIs contain semantic information.
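To make the idea of "semantic information in a URI" concrete, here is a minimal sketch of pulling word-like tokens out of a URI. It is not the authors' method: the function name `uri_tokens`, the stop list `NOISE_TOKENS`, and the example URI are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse, unquote

# Hypothetical stop list; the paper's actual feature set is not described in this post.
NOISE_TOKENS = {"www", "com", "org", "net", "html", "htm", "php", "index"}

def uri_tokens(uri):
    """Split a URI into lowercase word-like tokens taken from its host and path."""
    parsed = urlparse(uri)
    text = unquote(parsed.netloc + " " + parsed.path)
    # Keep alphabetic runs only; digits and punctuation act as separators.
    words = re.findall(r"[a-z]+", text.lower())
    # Drop boilerplate tokens and very short fragments.
    return [w for w in words if w not in NOISE_TOKENS and len(w) > 2]

tokens = uri_tokens("http://www.example.com/sports/2012/olympic-games-london.html")
print(tokens)
```

Tokens such as these could then be fed to any text classifier, without ever fetching the page's content.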

Scholarly Knowledge 1

Allard Oelen from Leibniz University Hannover, Germany, began this session with their full paper presentation titled: Generate FAIR Literature Surveys with Scholarly Knowledge Graphs. The FAIR principles serve as a guide to generate metadata that retains the Findability, Accessibility, Interoperability, and Reusability of data. The authors have attempted to generate and publish literature surveys using the Open Research Knowledge Graph (ORKG).

Next, Lars Vogt, from Leibniz University Hannover, Germany, presented the full paper titled: Toward Representing Research Contributions in Scholarly Knowledge Graphs Using Knowledge Graph Cells.

Finally, Michael Färber from Karlsruhe Institute of Technology, Germany, presented their full paper titled: HybridCite: A Hybrid Model for Context-Aware Citation Recommendation.

Document Classification

Malte Ostendorff from German Research Center for Artificial Intelligence, Germany, began this session with their full paper presentation titled: Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles. According to the presenter, their work could potentially help recommender systems. They used a new dataset of 32,168 Wikipedia article pairs and Wikidata properties to evaluate their systems. From their results, they observed that the vanilla BERT system performed best, with the highest F1-score.

Next, Philipp Scharpf from University of Konstanz, Germany, presented their full paper titled: Classification and Clustering of arXiv Documents, Sections, and Abstracts, Comparing Encodings of Natural and Mathematical Language. In this setting, they found that combining text and math encodings does not improve the classification accuracy, doc2vec outperforms tf-idf, the multilayer perceptron is the most accurate classification algorithm, and K-NN, Random Forest, and HDBSCAN are the fastest clustering algorithms.

Finally, Julian Risch from Hasso-Plattner-Institute, University of Potsdam, Germany presented their full paper titled: Hierarchical Document Classification as a Sequence Generation Task. The neural network model they designed addresses hierarchical classification as a sequence generation task, and generates a sequence of class labels given a document’s text representation. The benefits of their approach are (1) the ability to encode parent-child relationships and (2) the ability to represent rare leaf nodes.
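As a rough illustration of what "a sequence of class labels" can look like, the sketch below turns a leaf class into its root-to-leaf path, which is the kind of target a sequence model could be trained to emit. The toy hierarchy `PARENT`, the function `label_sequence`, and the `<eos>` end marker are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical toy hierarchy mapping each class to its parent (None = top level).
PARENT = {
    "Physics": None,
    "Computer Science": None,
    "Optics": "Physics",
    "Information Retrieval": "Computer Science",
    "Digital Libraries": "Computer Science",
}

def label_sequence(leaf, parent=PARENT):
    """Return the root-to-leaf label path, followed by an end-of-sequence marker."""
    path = []
    node = leaf
    # Walk upwards from the leaf until the top level is reached.
    while node is not None:
        path.append(node)
        node = parent[node]
    # Reverse so that parents come before children in the target sequence.
    return list(reversed(path)) + ["<eos>"]

print(label_sequence("Digital Libraries"))
```

Because every child label is preceded by its parent, the parent-child relationship is encoded directly in the order of the generated sequence.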

Minute Madness and Demo / Poster session

Unlike previous years (2019, 2018), since the conference was virtual this year, the authors of posters had to create a minute-long video to advertise their respective posters to the conference attendees and upload it to their assigned Poster Hall in Padlet for the Minute Madness. Altogether, there were 3 poster halls: Poster Hall 1, Poster Hall 2, and Poster Hall 3. During the Minute Madness session, the conference host played all videos uploaded by the authors. The poster session began after the Minute Madness session. Each poster was assigned to a breakout room in Zoom. Conference attendees were directed to the breakout rooms according to their preference.

This year, we featured 3 posters from WSDL in Poster Hall 2. The posters from WSDL were:

Poster 313: "Streaming Analytics and Workflow Automation for Dataset File System (DFS)", by Yasith Jayawardana, Sampath Jayarathna.

Poster 316: "A Heuristic Baseline Method for Metadata Extraction from Scanned Electronic Theses and Dissertations", by Muntabir Choudhury, Jian Wu, William Ingram, and Edward Fox.

Poster 317: "Analyzing the Effect of Reading Patterns using Eye Tracking Measures", by Gavindya Jayawardena, Sampath Jayarathna, Jian Wu.

Natural Language Processing and Web Archive 1 sessions were held after the poster/demo session.

Natural Language Processing

Elvys Linhares Pontes from University of La Rochelle, France began this session with their short paper presentation titled: Linking Named Entities Across Languages Using Multilingual Word Embeddings.

Next, Hai Thi Tuyet Nguyen from University of La Rochelle, France presented their short paper titled: Neural Machine Translation with BERT for Post-OCR Error Detection and Correction. They presented a novel technique to detect OCR errors using BERT models and word embeddings. For evaluation, they used the monograph and periodical datasets from the ICDAR competitions in 2017 and 2019.

Afterwards, David Pride from KMi, The Open University, United Kingdom presented their short paper titled: An Authoritative Approach to Citation Classification. Their goal was to understand why some papers were cited. To conduct a survey, they used their own framework (ACT: An Annotation Platform for Citation Typing at Scale), which allows users to classify a citation into one of several predefined classes. The survey participants were first authors of multiple papers, and they were asked to annotate citations of their own papers.

Finally, Cornelius Ihle from Daimler AG, Germany / University of Wuppertal, Germany, presented their short paper titled: A First Step Towards Content Protecting Plagiarism Detection. In this presentation they introduced an approach which uses hashed feature subsets to prevent pre-image and dictionary attacks.

Web Archive 1

Ian Milligan from University of Waterloo, Canada, began this session with their full paper presentation titled: The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives. The goal of the Archives Unleashed Project is to improve scholarly access to web archives for Librarians/Archivists, Computer Scientists, and Scholars. The Archives Unleashed Project uses the FEAV model (Filter, Extract, Aggregate, Visualize) to help users learn from the WARCs in web archive collections. It also provides various derivatives that are useful to external applications, such as domain distribution for analysis of domains, domain webgraph for analysis of links to view network effects, plain text for text analysis, metadata analysis, etc. Recently, WSDL member Travis Reid covered how to use the Archives Unleashed Cloud in his blog post: Working With Archives Unleashed Cloud.
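The four FEAV steps can be sketched on toy data as follows. This does not use the Archives Unleashed tooling; the `records` list is a hypothetical stand-in for records parsed from WARC files, and `domain_distribution` is an illustrative name. The sketch computes a domain distribution like the derivative mentioned above.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical stand-in for records parsed from WARC files.
records = [
    {"url": "http://example.org/a", "mime": "text/html"},
    {"url": "http://example.org/b", "mime": "image/png"},
    {"url": "http://archive.example.net/c", "mime": "text/html"},
    {"url": "http://example.org/d", "mime": "text/html"},
]

def domain_distribution(records):
    # Filter: keep only HTML pages.
    pages = (r for r in records if r["mime"] == "text/html")
    # Extract: pull the host name out of each URL.
    domains = (urlparse(r["url"]).netloc for r in pages)
    # Aggregate: count pages per domain.
    return Counter(domains)

counts = domain_distribution(records)

# Visualize: a plain-text bar chart stands in for a real plot.
for domain, n in counts.most_common():
    print(f"{domain:25s} {'#' * n}")
```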

Next, Krutarth Patel from Kansas State University, United States (Vannevar Bush Best Paper Award nominee and Student Best Paper Award nominee) presented their full paper titled: Identifying Documents In-Scope of a Collection from Web Archives. They used three models, Bag-of-Words (BoW), CNN, and Structural (Str), to classify documents of UNT.edu, Texas.gov, and USDA.gov into different categories. With bag-of-words from partial documents, they discovered that their Random Forest method outperforms having to process the entire document and using a CNN. According to WSDL member Shawn M. Jones, the Off-topic Memento Toolkit (OTMT) from Dark and Stormy Archives uses memento TimeMaps from a web archive collection to find off-topic mementos, and it might be improved by this presented work because they are analyzing the whole collection. A better contrast between what Patel et al. describe and what OTMT does is: OTMT is intra-TimeMap whereas Patel et al. is inter-TimeMap (i.e., collection-granularity).

Finally, Xinyue Wang and Zhiwu Xie from Virginia Tech presented their full paper titled: The Case for Alternative Web Archival Formats to Expedite the Data-To-Insight Cycle. According to the presenters, the modern supply chain relies on barcodes, which are not user-friendly but are essential for moving products; WARCs were designed to serve users directly, whereas other big data formats (PARQUET, AVRO) work more efficiently for machines and permit new insights. While highlighting the shortcomings of the WARC file format for storing archived data, they proposed the use of big data systems. They ran benchmark experiments with serialization formats such as PARQUET and AVRO.

Day 3

Scholarly Knowledge 2

This session presented three full papers, with all presenters from the Asia-Pacific Region.

First, Jinsong Zhang from Dalian University of Technology, China presented "Characterize and Evaluate Scientific Domain and Domain Context Knowledge Map". The authors tested the precision and recall of domain knowledge maps, generated using different graph construction methods. They used information retrieval as their domain of study and curated a corpus of 6,171 publications from venues such as SIGIR, CIKM, CIVR, GIR, etc.

Next, Renli Wu from Wuhan University, China presented "The Knowledge Import and Export of LIS: The Destinations, Citation Peak Lag, and Changes". They have developed the metric "Citation Peak Lag (CPL)", which measures the flow of knowledge within/among fields. Their evaluation shows that the field "Computer Science" rapidly spreads into the field "Library and Information Science". Moreover, hard / pure scientific fields had a faster and greater flow among all disciplines within that category.

Finally, Souvic Chakraborty from IIT Kharagpur, India presented "Aspect-based Sentiment Analysis of Scientific Reviews". They propose employing aspect-based sentiment analysis, "which correlates well with the accept/reject decision." This is because "certain aspects that were present in a paper and discussed in its review, strongly determine the final decision."

Tutorial 2

This session, titled "Preparing Code and Data for Computational Reproducibility", was conducted by Dr. William Ingram and Dr. Edward Fox from Virginia Polytechnic Institute and State University, USA. In this session, they discussed key limitations of the research pipeline in the context of reproducible data. They provided a hands-on introduction to CodeOcean, a research collaboration platform that uses the concepts of containerization to create, collaborate, share, execute, and publish scientific code and data.

Digital Libraries 2

This session presented four short papers, with all presenters from the Asia-Pacific region.

First, Menasha Thilakaratne from University of Adelaide, Australia presented "Information Extraction in Digital Libraries: First Steps towards Portability of LBD Workflow". They analyzed how well DBpedia resembles MeSH, and claimed that "DBpedia resembles knowledge inferences performed using MeSH with a high precision".

Next, David Bainbridge from University of Waikato, New Zealand, presented “Finding a Safe Port: Cyber-Security Analysis for Open Source Digital Library Software”. In their analysis, they conducted port scanning, vulnerability scanning, penetration testing, static code analysis, and dynamic application security testing on two software packages: DSpace and Greenstone. They concluded that vulnerabilities arise when applications are left in their default configuration (the "Danger of Defaults").

Next, Toshiyuki Shimizu from Kyoto University, Japan presented “Keyword Recommendation Methods for Earth Science Data Considering Hierarchical Structure of Vocabularies”. He claimed that there are not enough keywords in the metadata of earth science papers, and that the similarity between keyword definitions and paper abstracts can be computed to generate recommended keywords. The Earth Science discipline benefits from this method "due to its controlled vocabulary". With this method, they were able to improve the recommendation accuracy compared to their baselines.
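The core idea of scoring keywords by the similarity between their definitions and an abstract can be sketched with a simple bag-of-words cosine similarity. This is only an illustration: the similarity measure, the toy vocabulary `VOCAB`, and the function names are assumptions, and the sketch ignores the hierarchical structure of the vocabulary that the paper considers.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Count lowercase alphabetic tokens in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical controlled vocabulary: keyword -> definition.
VOCAB = {
    "SEA ICE": "frozen ocean water that forms and melts in polar seas",
    "PRECIPITATION": "rain snow or hail falling from clouds to the ground",
    "SOIL MOISTURE": "water held in the soil near the land surface",
}

def recommend(abstract, vocab=VOCAB, top_k=2):
    """Rank keywords by how similar their definitions are to the abstract."""
    doc = bag_of_words(abstract)
    scored = [(cosine(doc, bag_of_words(definition)), keyword)
              for keyword, definition in vocab.items()]
    scored.sort(reverse=True)
    # Keywords with no overlap at all are never recommended.
    return [keyword for score, keyword in scored[:top_k] if score > 0]

print(recommend("Satellite observations of frozen ocean water in polar seas"))
```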

Finally, Ruilun Liu from the University of Hong Kong, China presented “A Multimodal Music Recommendation System with Listeners’ Personality and Physiological Signals”. In addition to applying personality traits and listening history to the recommendation training model, the authors also recorded physiological data via specific smartwatches. Unfortunately, they discovered that "physiological signals didn't improve the performance of their multimodal model".

Digital Humanities

This session presented four short papers, with all presenters from the Asia-Pacific region.

First, Dana McKay and George Buchanan from University of Melbourne, Australia presented "Strike a Pose: Gender and the Public and Private Performance of Magazine Reading". They explored the privacy aspect of reading. Interestingly, they found no difference between men and women in reading electronic magazines. However, the types of print magazines read showed notable differences.

Next, Xiao Hu from University of Hong Kong, Hong Kong presented "Evaluation of Low-end Virtual Reality Content of Cultural Heritage: A Preliminary Study with Eye Movement". Here, they have combined techniques of VR and Eye-Tracking to evaluate the understanding of cultural heritage of participants. They have visualized cultural monuments through VR, and observed eye tracking patterns of participants to test their hypotheses.

Next, Menasha Thilakaratne from University of Adelaide, Australia presented "Garbage In, Garbage Out? An Empirical Look at Information Richness of LBD Input Types". Here, the authors have applied "Information Foraging Theory", which explains the phenomenon of obtaining maximum gain from minimum effort, into Literature Based Discovery (LBD). She mentioned that the LBD framework could have potential applications in areas such as Drug Repositioning.

Finally, Shuran Liu and Jun Wang, from Peking University, China presented "How to Organize Digital Tools to Help Scholars in Digital Humanities Research?". By interviewing scholars and analyzing widely-used Digital Humanities (DH) tools, the authors derived 9 research tasks, 4 stages of the humanities research process, and 4 categories of research techniques. They categorized the used DH tools based on these, and developed a navigation website to enable scholars to find digital humanities resources easily.

Domain-Specific Applications

This session presented four short papers, with all presenters from the European region.

First, Corinna Breitinger from University of Konstanz, Germany presented "Supporting the Exploration of Semantic Features in Academic Literature Using Graph-based Visualizations". They introduced a visual interface, RecVis for exploring the results of academic literature retrieval using a force-directed graph layout. The biggest challenge, according to Corinna, is "finding the best standardized recommender system" .

Next, Malte Ostendorff from German Research Center for Artificial Intelligence, Germany presented "Towards an Open Platform for Legal Information". They showed that while legal information systems improve the accessibility and processing of legal documents, they have limitations in their scope and extensibility. The open source platform proposed in their presentation, uses improved metadata to enable transparent processing and access of Legal Open Data. As of publication, they have created such metadata for over 250,000 German laws and court decisions.

Next, Timo Spinde from University of Konstanz, Germany and University of Wuppertal, Germany presented "Enabling News Consumers to View and Understand Biased News Coverage: A Study on the Perception and Visualization of Media Bias". The authors manually annotated datasets and tested varying visualization strategies. Interestingly, they discovered that becoming aware of the media bias of certain political news had no strong effect on how it was perceived.

Finally, Bikash Gyawali from The Open University, UK presented "Open Access 2007 -- 2017: Country and University Level Perspective". They discovered that there was no link between the proportion of Open Access papers published by authors at a university, and the ranking of that university.

Keynote 2

The second keynote of JCDL 2020, titled "Towards a Sustainable Infrastructure for the Preservation of Cultural Heritage and Digital Scholarship", was presented by Dr. Peter Zhou. He is the Assistant University Librarian and Director of the C.V. Starr East Asian Library at UC Berkeley, USA, and holds a PhD in Linguistics and an MS in Library and Information Science from University of Illinois, USA.

In this keynote, he discussed the challenges of preserving digital content, and the sustainability of such methods. He explained the major building blocks of the lifecycle of digital content – 1) Creation > 2) Organization > 3) Preservation > and 4) Publication. He used the analogy of printed content to explain how digital content should be preserved. He emphasized that we need to think about collection, preservation, and dissemination of digital content similar to printed content, and that we need institutional power to ensure that preservation of digital content lasts sustainably for the years to come.

In the latter part of his keynote, Dr. Zhou explained how these concepts were applied to the task of digitally preserving the structure of the Dunhuang Cave in China. They captured digital images of the cave structure at different points in time, similar to how mementos are created for a webpage. He demonstrated how VR and 3D modeling were used to provide an immersive experience of how the Dunhuang Cave appeared in the past.

The keynote concluded with three key takeaways:

Facilitate DLC development
Digital preservation is important worldwide
The goal is to preserve digital content for the next 100 years

Scholarly Data

This session presented three long papers, with all presenters from the North American Region.

First, Jodi Schneider from iSchool at University of Illinois, USA presented “Towards Knowledge Maintenance in Scientific Digital Libraries with the Keystone Framework”. They attempted to address the question "How do we handle retracted papers?" by introducing a new framework, the keystone framework, designed to identify when and how citing unreliable findings impacts a paper. They conducted two pilot case studies and demonstrated that the keystone framework indeed can be applied for knowledge maintenance.

Next, Nicholas Weber from University of Washington, USA presented "Seeking Justification: How Expert Reviewers Validate Empirical Claims with Data Annotations". They used data annotations to increase the transparency and accessibility of data. In their study, they asked participants to review a research paper and judge the validity of its empirical claims. They discovered that data annotations did not improve the ability of reviewers to validate such claims. However, having data annotations improved the trust that reviewers had in a paper.

Finally, Maria Esteva and Weijia Xu from UT Austin, USA presented "Modeling Data Curation to Scientific Inquiry: A Case Study for Multimodal Data Integration". They presented an approach to optimize the curation of multimodal datasets and maximize data reuse, and demonstrated their case study, ASTRIAGraph.

Web Archive 2

This session presented four short papers, with presenters from the UK and the USA.

First, Kurtis Weir from University of Wolverhampton, UK presented "Creating a Bespoke Virtual Reality Personal Library Space for Persons with Severe Visual Disabilities". They have created a VR-driven personal library environment to allow people with visual impairments to engage in reading tasks. They claim it could be personalized to an extent of changing font sizes, colors, direction of the lighting source etc.

Next, Dr. Jian Wu from our WSDL Group at ODU presented “A Comparative Study of Sequence Tagging Methods for Domain Knowledge Entity Recognition in Biomedical Papers”. The paper evaluated different models for domain knowledge entity recognition, and concluded that 1) the ML model is not always better than the DL model, 2) the corpus size affects the effectiveness of this method, 3) pre-trained word embeddings (e.g., ELMo) play a role in improving NN models, and 4) attention mechanisms reduce the performance of these models.

Next, Xiaolang Jiang from University of Illinois, USA presented “On the Ambiguity and Relevance of Place Names in Scientific Text”. Here, the authors ask two questions, 1) "Can you retrieve a paper about a given location?", and 2) "Can we disambiguate place names in scientific text?" by applying a search interface on a sample of place name sentences from PubMed abstracts. They conclude that to study the role of place in scientific text, disambiguation of place names should be accompanied by assessing their degree of relevance.

Finally, Yuerong Hu, also from University of Illinois, USA presented "Improving Digital Libraries’ Provision of Digital Humanities Datasets: A Case Study of HTRC Literature Dataset". Here, they have investigated the limitations of the curated datasets provided by digital libraries in the context of digital humanities. They suggested that the usability of digital libraries could be improved by flagging the limitations of each dataset.

Day 4

Content Annotation

This session presented three long papers, with all presenters from the Asia-Pacific region.

First, Yimeng Dai from University of Melbourne, Australia presented "Person Name Recognition with Fine-grained Annotation". They addressed person name recognition in the contexts of academic homepages, academic resumes, articles in online forums and social media, through an annotation scheme based on anthroponymy, and a neural network CogNN.

Next, Bingyao Pang from Zhejiang University, China presented "Chinese Calligraphy Character Image Recognition and Its Applications in Web and Wechat Applet Platform". Here, the authors have applied deep learning for the recognition of Chinese characters, and were able to achieve better recognition than previous algorithms, especially for cursive and running characters.

Finally, Yuchen Qian from Nanjing University, China presented "The ACL FWS-RC: A Dataset for Recognition and Classification of Sentence about Future Works". The authors have constructed a corpus of “Sentences about future work (FWS)” based on full-text in academic papers, and grouped them into 6 main categories – method, resources, evaluation, application, problem and other, and 17 sub-categories. They found that the “method” category had the highest number of FWS. Other categories had a minimal difference in their counts.

Search and Recommendation

This session presented four short papers, with all presenters from the Asia-Pacific region.

First, Ping Liu from Wuhan University, China presented "A Grounded Theory Approach for Modelling the Knowledge Construction Process in Exploratory Search".

Next, Xiao Hu from University of Hong Kong, China presented "Personalized Book Recommendation to Young Readers: Two Online Prototypes and a Preliminary User Evaluation".

Next, Xiang Xue from Nanjing University, China presented "Interacting with Mobile Music Applications: Investigation of Influencing Factors of Music Information Encountering".

Finally, Jiaqi Chen from Beijing Normal University, China presented "A Comparative Study of the Relationship between the Subjective Difficulty, Objective Difficulty of Search Tasks and Search Behaviors". Here, the authors compared the "subjective" and "objective" difficulty of IR tasks using different behavioral aspects. Participants performed 5 IR tasks of different difficulties, and answered questionnaires to quantify their subjective difficulty.

Network and Learning

This session presented four short papers, with all presenters from the Asia-Pacific region.

First, Chaocheng He from Wuhan University, China presented "Spatial Research Leadership Flows and Spatial Research Leadership Rank: A Case Study of Pharmaceutical Field". Here, authors proposed a network structure to determine the dominance/leadership of institutions in collaborative research, which is generally assumed to be homogeneous.

Next, Soumya Banerjee from National Digital Library of India and IIT Kharagpur, India presented "Segmenting Scientific Abstracts into Discourse Categories: A Deep Learning-Based Approach for Sparse Labeled Data". Here, authors performed Sequential Sentence Classification on the abstracts of research papers, and observed that abstracts followed the pattern of Background > Technique > Observation. Using a Bi-LSTM model yielded better results than the baselines they used. By adding noise and augmenting their data, they were able to increase their accuracy further.

Next, Chanathip Pornprasit from Mahidol University, Thailand presented "ConvCN: A CNN-Based Citation Network Embedding Algorithm Towards Citation Recommendation".

Finally, Manjira Sinha from IIT Kharagpur, India presented "Relation Aware Attention Model for Uncertainty Detection in Text".

Digital Libraries - 3

This session presented three full papers, with all presenters from Europe.

First, Ygor Gallina from LS2N & Université de Nantes, France presented "Large-Scale Evaluation of Keyphrase Extraction Models". The authors evaluated keyphrase extraction models on nine datasets drawn from three domains: scientific articles, paper abstracts, and news articles.

Next, Tim Repke from University of Potsdam, Germany presented "Visualising Large Document Collections by Jointly Modeling Text and Network Structure". The authors combined text and graphs to visualize semantic information together with the relationships in the network structure. They introduced an algorithm based on multi-objective optimization to jointly position embedded documents and graph nodes in a 2D landscape, and provided a live demonstration of it.

Finally, Magdalena Chudy from Polish Academy of Sciences, Poland, Ewa Łukasik from Poznan University of Technology, Poland, and Tomasz Parkoła from Poznan Supercomputing and Networking Center, Poland, jointly presented "Digital Library Adaptation for Traditional Music and Content-Based Research: Polish Sound Archives and dLibra". Their work aims to provide the stable infrastructure and software solutions necessary to enable musicological research and, consequently, to open up traditional music resources to a larger group of users.

Neural Semantic Representation

This session presented three full papers, with all presenters from Europe.

First, Janus Wawrzinek from TU-Braunschweig, Germany presented "Explainable Word-Embeddings for Medical Digital Libraries – A Context-Aware Approach". Here, the authors used Drug-Disease Association (DDA) as their context, and introduced a five-step methodology to make their word embeddings explainable: 1) intermediate entity extraction, 2) embedded graph construction, 3) building explanation, 4) explanation ranking, and 5) metadata enrichment.

The next paper, "Mining Semantic Subspaces to Express Discipline-Specific Similarities", was also presented by Janus Wawrzinek. Here, the authors proposed extracting a semantic subspace from large embedding spaces that better fits the query semantics defined by a user. They used LASSO to visualize the most important features.

Finally, Andi Rexha from Know-Center GmbH, Austria presented "A Neural-based Architecture for Small Datasets Classification". Here, the authors proposed a neural-based architecture, based on BERT, for addressing the text classification problem on small datasets.

Tutorial 4

This session, titled "Writing about Data Science Research", was conducted by Dr. Kevin Cohen, from University of Colorado, USA and Dr. Daniel Gifu, from University of Iasi, Romania, and Romanian Academy - Iasi Branch, Romania. In this session, they presented a checklist for the results section of research papers.

The Results section answers the question that you raised in the Introduction section.
Consider starting your paper by thinking about the form of the answer.
Create faux tables/figures to see what your potential data looks like, so that you consider how to express your results while designing experiments. It can also help you determine how to sample data/participants.

"It is not always obvious what an answer to your question would look like", he said. If there are too many independent variables in a graph, "the fact that it takes some time to think about this means that it is a pretty terrible graph".

He also made some interesting points on data visualization:

Don't rely upon truncated axes to help prove your point.
If you have to truncate your axes to see the cool thing, there might not be a cool thing.
Don't let your own graph fool you.
Think about how people will read it.
Last but not least, include axis labels.
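
The point about truncated axes can be made concrete with a little arithmetic. The sketch below is only an illustration, not something shown in the tutorial: the function name `apparent_ratio` and the two accuracy values are hypothetical. It computes how much taller one bar looks than another for a given axis minimum.

```python
def apparent_ratio(a: float, b: float, axis_min: float = 0.0) -> float:
    """Ratio of the drawn bar heights for b versus a when the axis starts at axis_min."""
    if a <= axis_min or b <= axis_min:
        raise ValueError("both values must lie above the axis minimum")
    return (b - axis_min) / (a - axis_min)


# Hypothetical accuracies in percent, chosen only for illustration.
baseline, proposed = 90.0, 92.0

full = apparent_ratio(baseline, proposed)                # axis starts at 0
cut = apparent_ratio(baseline, proposed, axis_min=89.0)  # axis truncated at 89

print(f"full axis: {full:.3f}x, truncated axis: {cut:.1f}x")
```

With these made-up numbers, a two-point difference that is barely visible on a full axis is drawn as a bar three times as tall once the axis is cut at 89.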

Regarding tables, he suggested asking yourself the question, "is this table a reference source or does it tell your story?". He also pointed out that when keeping track of data, we should record the data's source and the name of the script that produced each table/figure (e.g., as a LaTeX comment). He further recommended using a programming language rather than Excel to create visualizations, because the program itself is a good record of how each one was generated.
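
One way to follow this advice is to let the script that builds a table also stamp it with its provenance. The following is a minimal sketch under our own assumptions, not code from the tutorial: the function `latex_table`, the file names, and the scores are all hypothetical.

```python
def latex_table(rows, header, source, script):
    """Return a LaTeX tabular whose leading comments record where it came from."""
    lines = [
        f"% data source: {source}",
        f"% generated by: {script}",
        # First column left-aligned, remaining columns right-aligned.
        r"\begin{tabular}{l" + "r" * (len(header) - 1) + "}",
        " & ".join(header) + r" \\",
        r"\hline",
    ]
    for row in rows:
        lines.append(" & ".join(str(cell) for cell in row) + r" \\")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


table = latex_table(
    rows=[("baseline", 0.81), ("proposed", 0.84)],  # made-up scores
    header=("Model", "F1"),
    source="results/f1_scores.csv",  # hypothetical data file
    script="make_f1_table.py",       # hypothetical script name
)
print(table)
```

Because the comments travel with the generated table, anyone opening the LaTeX source later can see which data file and which script to rerun.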

Day 5

Keynote 3

The third and final keynote of JCDL 2020, "Natural Language Technologies for Internet Applications", was presented by Dr. Luo Si from Alibaba Group Inc. This keynote discussed how Natural Language Processing (NLP) and related technologies are critical to the success of many Internet applications such as digital libraries, e-commerce, and customer service. It presented recent research efforts and trends in four sets of NLP technologies for Internet applications. First, neural language models have been a very popular research direction in the last few years; they serve as the foundation of many NLP technologies and have significantly improved the performance of many applications. Second, machine translation techniques have advanced substantially, helping to bridge language barriers in many Internet applications. Third, identifying inappropriate Internet text (e.g., pornographic content) is challenging because such text can be represented in many different ways. Fourth, machine reading comprehension has become an important question answering technology that directly satisfies the information needs of many Internet users. These technologies were discussed with examples from large-scale real-world applications.

Closing Ceremony

With the third and final keynote coming to an end, it was time to bid farewell to JCDL 2020. The closing ceremony began with Dr. Gary Marchionini from University of North Carolina, USA delivering the first closing remark. He highlighted points from the keynotes related to research maturity, and congratulated all the young scholars who brought such compelling work to JCDL 2020. He thanked the organizing teams at Wuhan University and Northwest University, China, and the program committee for making JCDL 2020 a success.

Next, the awards ceremony commenced, with the honorable mentions for the best Poster / Demo award at JCDL 2020.

Poster 316: "A Heuristic Baseline Method for Metadata Extraction from Scanned Electronic Theses and Dissertations", by Muntabir Choudhury, Jian Wu, William Ingram, and Edward Fox.
Poster 313: "Streaming Analytics and Workflow Automation for Dataset File System (DFS)", by Yasith Jayawardana, Sampath Jayarathna.
Poster 245: "Mathematical Formulae in Wikimedia Projects 2020", by Moritz Schubotz, Andre Greiner-Petter, Norman Meuschke, Olaf Teschke.

Here, two of the three honorable mentions (Posters 316 and 313) were from WSDL.

Next, the best Poster / Demo award was announced. It went to Poster 317: "Analyzing the Effect of Reading Patterns using Eye Tracking Measures", by Gavindya Jayawardena, Sampath Jayarathna, and Jian Wu, also from WSDL at ODU.

Next, the Vannevar Bush Best Paper Award was granted to "Personalization Finder: A Search Interface for Identifying and Self-controlling Web Search Personalization", by Yusuke Yamamoto from Shizuoka University, Japan and Takehiro Yamamoto from University of Hyogo, Japan.

Finally, Dr. Michael L. Nelson announced that Dr. J. Stephen Downie will be taking over his duties as the JCDL Steering Committee Chair. He acknowledged the great work Dr. Dan Wu has done to make JCDL 2020 a success. It was also announced that JCDL 2021 will be held virtually, although a hybrid format remains possible.

The Web Archiving and Digital Libraries Workshop (WADL 2020) was held after the closing ceremony. With that, JCDL 2020 officially came to an end.

--Yasith (@yasithmilinda), Gavindya (@Gavindya2)

Web Science and Digital Libraries Research Group

2020-08-16: Joint Conference on Digital Libraries (JCDL) 2020 Trip Report
