2018-06-08: Joint Conference on Digital Libraries (JCDL) 2018 Trip Report
The gathering place at the Cattle Raisers Museum, Fort Worth, Texas
This year's 18th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2018) took place at the University of North Texas (Fort Worth, Texas). From June 3 to June 6, members of WSDL attended paper sessions, workshops, tutorials, panels, and a doctoral consortium.
The first day of the conference was dedicated to the doctoral consortium, tutorials, and workshops. The doctoral consortium gave Ph.D. students in the early phases of their dissertations an opportunity to present their thesis and research plans and receive constructive feedback. I will provide a link to the Doctoral Consortium blog post when it becomes available.
We had a great #jcdl2018 / #wadl2018 / @jcdl2018! Here is the 2018 @WebSciDL reunion photo, but we're getting so large we weren't all there at once!— Michael L. Nelson (@phonedude_mln) June 7, 2018
Not pictured:
Incoming fac (Fall 2018): @OpenMaze @fanchyna
Alum: @johnaberlin
See you June 2-6, 2019 @iSchoolUI @JCDLConf pic.twitter.com/GmKpqKg0Zs
The theme of this year's conference was "From Data to Wisdom: Resilient Integration across Societies, Disciplines, and Systems." The conference gave researchers from disciplines ranging from digital libraries and Web science to library and information science the opportunity to communicate the findings of their research.
#JCDL2018 seems like a good time to announce recent @WebSciDL comings and goings.— Michael L. Nelson (@phonedude_mln) June 4, 2018
Present at @jdl2018 we're happy to have:
* 2 faculty
* 2 incoming faculty (!)
* 6 students
* 3 alumni https://t.co/vjdxuSWFjd
Day 1 (June 3, 2018)
Day 2 (June 4, 2018)
#jcdl2018 stats. About 37% acceptance rate for full papers this year. Participation is increasing too. pic.twitter.com/H1vk3OB9KR— Gianmaria Silvello (@giansilv) June 4, 2018
The conference officially began on the second day with Dr. Jiangping Chen's introduction of the conference and of the keynote speaker, Dr. Trevor Owens. Dr. Owens is a librarian, researcher, and policy maker, and the first head of Digital Content Management for library services at the Library of Congress. His talk was titled: "We Have Interesting Problems."
Great crowd for my #jcdl2018 keynote! Room set up feels a little like looking out at some sort of intergalactic senate. pic.twitter.com/UTlrmyuVAo— Trevor Owens 💾🗄🕚 (@tjowens) June 4, 2018
And @tjowens kicks off the #jcdl2018 opening keynote - appropriately titled “We Have Interesting Problems.” Really looking forward to this! pic.twitter.com/YWNAlHMjdf— Ian Milligan (@ianmilligan1) June 4, 2018
It started with a highlight of Ben Shneiderman's The New ABCs of Research, which gives students guidance on how to succeed in research, and gives senior researchers and policy makers guidance on how to respond to new problems and apply new technologies. The book may be roughly summarized with two acronyms it introduces: ABC (Applied, Basic, and Combined) and SED (Science, Engineering, and Design).
Additionally, he presented NDP@3, an IMLS framework for investments in digital infrastructures for libraries. He also presented multiple IMLS-funded projects, such as Image Analysis for Archival Discovery (AIDA), which explores various ways to use millions of images representing the digitized cultural record.
.@tjowens giving a nice run-down of cool funded IMLS projects - a nice shoutout to the WASAPI project. W/o WASAPI we’d be having a devil of a time with @unleasharchives - great community development. #jcdl2018— Ian Milligan (@ianmilligan1) June 4, 2018
another @US_IMLS web archiving project: "Combining Social Media Storytelling with Web Archives" https://t.co/QLp3VP365m— Michael L. Nelson (@phonedude_mln) June 4, 2018
supported @yasmina_anwar @acnwala @shawnmjones
some products: https://t.co/IcTJCYNzhP https://t.co/pulO7jNJt5
+ many papers #JCDL2018 @tjowens @WebSciDL
Interested in how National Digital Platform and/or @librarycongress might help with distribution and discoverability of new forms of digital scholarship, esp SUP's Mellon-funded web-based digital publications. #jcdl2018 https://t.co/zmenxltYbw— Jasmine Mulliken (@jasminemulliken) June 4, 2018
Next he talked about some resources at the Library of Congress Labs such as:
- Library of Congress Colors: provides the capability of exploring the colors in the Library of Congress collections.
- LC for Robots: provides a list of APIs, data and tutorials for exploring the digital collections at the Library of Congress.
Let's talk about .@LC_Labs https://t.co/EmYLrqWwcf #jcdl2018 @tjowens @opba pic.twitter.com/AlzNBZ3TYL— Martin Klein (@mart1nkle1n) June 4, 2018
Now @tjowens has mentioned the work done at @LC_Labs by @opba @MeghaninMotion @JaimeMears @liblaura @blprnt @whaleandpetunia @chelseastieber #jcdl2018 https://t.co/3Zt12OzBjT and also the work of @kzwa pic.twitter.com/HIAcyb3Efx— Shawn M. Jones (@shawnmjones) June 4, 2018
Work done by @LC_Labs mentioned by @tjowens, check out @github repositories by @blprnt: https://t.co/1RbDDOmdtc and @liblaura: https://t.co/JKcQC8gUWQ #jcdl2018 pic.twitter.com/CpW7eTUVTr— Shawn M. Jones (@shawnmjones) June 4, 2018
Following the keynote were three concurrent paper sessions with the themes Use, Collection Building, and Semantics & Linking. I will briefly describe the papers presented in two of these sessions.
Paper session 1B (Day 2)
Myriam Traub (best paper nominee), a PhD student at Centrum Wiskunde & Informatica (CWI), presented a full paper titled: "Impact of Crowdsourcing OCR Improvements on Retrievability Bias." She discussed how crowd-sourced correction of OCR errors affects the retrievability of documents in a historic newspaper corpus in a digital library.
@MyriamCTraub on " Impact of Crowdsourcing OCR Improvements on Retrievability Bias"https://t.co/SWiesqhYps #jcdl2018 pic.twitter.com/ywpanphtl4— Martin Klein (@mart1nkle1n) June 4, 2018
Super cool & nuanced paper exploring experiments on effects for discoverability of crowdsourcing fixes for OCR errors in digitized historical collections #JCDL2018 https://t.co/TxwEkBFF5M— Trevor Owens 💾🗄🕚 (@tjowens) June 4, 2018
Three short papers followed Traub's presentation. First, Karen Harker, a Collection Assessment Librarian at the University of North Texas Libraries, presented: "Applying the Analytic Hierarchy Process to an Institutional Repository Collection." She discussed the application of the Analytic Hierarchy Process (AHP) to create a model for evaluating collection development strategies of institutions. Second, Douglas Kennard presented: "Computer-Assisted Crowd Transcription of the U.S. Census with Personalized Assignments for Better Accuracy and Participation," where he introduced the Open Genealogy Data census transcription project that strives to make census data readily available to researchers and digital libraries. This was achieved through the use of automatic handwriting recognition to bootstrap their census database, and subsequent crowd-sourced correction of the data through a web interface. Finally, Mandy Neumann, a research associate at the Institute of Information Science at TH Köln, presented: "Prioritizing and Scheduling Conferences for Metadata Harvesting in dblp." She explored different features for ranking conference candidates by using a pseudo-relevance assessment.
Next up Karen Harker on— Martin Klein (@mart1nkle1n) June 4, 2018
"Applying the Analytic Hierarchy Process to an Institutional Repository Collection"https://t.co/L797rqtpCX #jcdl2018 pic.twitter.com/mGzUmt3bVl
Douglas J. Kennard aren't these doctors' prescriptions? ;-) "Computer-Assisted Crowd Transcription of the U.S. Census with Personalized Assignments for Better Accuracy and Participation" at #jcdl2018 pic.twitter.com/pfiQFaIe1D— Sawood Alam (@ibnesayeed) June 4, 2018
@protestreich on "Prioritizing and Scheduling Conferences for Metadata Harvesting in DBLP"https://t.co/HKcslsrxbd #jcdl2018 pic.twitter.com/viXLzkc4IW— Martin Klein (@mart1nkle1n) June 4, 2018
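To make the Analytic Hierarchy Process from Karen Harker's talk a bit more concrete, here is a minimal sketch of its core step: turning a pairwise-comparison matrix into priority weights. It uses the row geometric mean, a common approximation of the principal eigenvector. The criteria names and comparison values are made up for illustration and are not taken from the paper.

```python
import math

def ahp_priorities(matrix):
    """Return normalized priority weights for a square pairwise-comparison matrix.

    matrix[i][j] states how strongly criterion i is preferred over criterion j,
    so matrix[j][i] is expected to be 1 / matrix[i][j].
    """
    n = len(matrix)
    # Row geometric means approximate the principal eigenvector.
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical criteria for comparing collection development strategies.
criteria = ["usage", "uniqueness", "cost"]
comparisons = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights = ahp_priorities(comparisons)
for name, weight in zip(criteria, weights):
    print(f"{name}: {weight:.3f}")
```

A full AHP analysis would also check the consistency of the judgments, which this sketch leaves out.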
Paper session 1C (Day 2)
Dr. Federico Nanni (best paper nominee), a postdoctoral researcher at the Data and Web Science Group at the University of Mannheim, presented the first of three full papers, titled: "Entity-Aspect Linking: Providing Fine-Grained Semantics of Entities in Context," in which he introduced a method for obtaining specific descriptions of entities in text by retrieving the most related section from Wikipedia.
Onto the first paper session at #jcdl2018, @f_nanni et al.'s Entity-Aspect Linking: Providing Fine-Grained Semantics of Entities in Context. pic.twitter.com/SnK8BTk2ON— Mat Kelly (@machawk1) June 4, 2018
First up in the “Semantics and Linking” #jcdl2018 session is @f_nanni on “Entity-Aspect Linking: Providing Fine-Grained Semantics of Entities in Context.”— Ian Milligan (@ianmilligan1) June 4, 2018
Next, Gary Munnelly, a PhD student at the School of Computer Science and Statistics (SCSS) at Trinity College Dublin, presented: "Investigating Entity Linking in Early English Legal Documents," discussing the effectiveness of different entity linking systems for the task of disambiguating named entities in 17th-century depositions obtained during the 1641 Irish rebellion.
Gary Munnelly is talking about using @dbpedia as a source for information on entity linking for studying text, he identifies the problems of evolving entities, emerging entities, and more as part of "Investigating Entity Linking in Early English Legal Documents" #jcdl2018 pic.twitter.com/sZ6whIDS4e— Shawn M. Jones (@shawnmjones) June 4, 2018
Finally, Dr. Ahmed Tayeh presented: "An Analysis of Cross-Document Linking Mechanisms," where he discussed different strategies for linking or associating information across physical and digital documents.
The titles of other papers presented in a parallel session (1A) include:
- Understanding the Position of Information Professionals with regards to Linked Data: A Survey of Libraries, Archives, and Museums - Lucy McKenna et al.
- The role of pre-existing highlights in reader–text interactions and outcomes - Samuel Dodson et al.
- Evaluating Saccade-Bounded Eye Movement Features for the User Interest Modeling - Sampath Jayarathna et al.
- Interaction on an Academic Social Networking Sites: A Study of ResearchGate Q&A on Library and Information Science - Shengli Deng et al.
#jcdl2018 Ahmed Tayeh presents "An Analysis of Cross-Document Linking Mechanisms" - some users use annotations to link information, other users use physical/digital folders to link information - Tayeh & Beat Signer are creating linking tools to support linking across documents pic.twitter.com/9iEcFHGp0L— Shawn M. Jones (@shawnmjones) June 4, 2018
Open Cross-Document Linking Service Based on a Plug-in Architecture from Ahmed Tayeh
Paper session 2A (Day 2)
Two full papers were presented after a break. The first, titled "Putting Dates on the Map: Harvesting and Analyzing Street Names with Date Mentions and their Explanations," was presented by Rosita Andrade. She described her research on the automated analysis of street names with date references around the world, and showed that "temporal streets" are frequently used to commemorate important events such as a political change in a country.
#jcdl2018 has resumed again from lunch with Rosita Andrade presenting "Putting Dates on the Map: Harvesting and Analyzing Street Names with Date Mentions and their Explanations". pic.twitter.com/8EVH9gDumW— Mat Kelly (@machawk1) June 4, 2018
#jcdl2018 Rosita Andrade presents "Putting Dates on the Map: Harvesting and Analyzing Street Names with Date Mentions and their Explanations" pic.twitter.com/deqeIIdUSz— Shawn M. Jones (@shawnmjones) June 4, 2018
#jcdl2018 A temporal analysis of "temporal streets" - streets with date expressions in their name - has taught me a new term, in addition to temponyms there are also toponyms? pic.twitter.com/U92phMcGD8— Shawn M. Jones (@shawnmjones) June 4, 2018
#jcdl2018 As part of "Putting Dates on the Map", Rosita Andrade used HeidelTime https://t.co/ibnk5CaS1X a temporal tagger that was introduced to me by @jannikstroetgen at #www2016 pic.twitter.com/SBRb55SbK4— Shawn M. Jones (@shawnmjones) June 4, 2018
Rosita Andrade mentioned the website for temporal street information: https://t.co/nzvzSYJVpc #jcdl2018— Shawn M. Jones (@shawnmjones) June 4, 2018
Next, Dr. Philipp Mayr, a deputy department head and team leader at the GESIS department Knowledge Technologies for the Social Sciences, presented: "Contextualised Browsing in a Digital Library's Living Lab." He presented two approaches that contextualize browsing in a digital library. The first approach is based on document similarity, and the second utilizes implicit session information (e.g., queries and document metadata from user sessions).
@Philipp_Mayr presenting ‘’Contextualized Browsing’’ in #DigitalLibraries to address shortcomings of keyword-based search #jcdl2018 @jcdl2018 pic.twitter.com/UrY1qWxZ9W— Corinna Breitinger (@BreitingerC) June 4, 2018
Now @Philipp_Mayr on contextualized browsing in a DL's living lab. #jcdl2018 pic.twitter.com/25EATnkLS1— Mandy (@protestreich) June 4, 2018
.@Philipp_Mayr compared non-contextualized browsing just using query expansion with synonyms & translations with those using similarity measures and again with searches that used session-based contextualized browsing #jcdl2018 pic.twitter.com/fs7w96fhUt— Shawn M. Jones (@shawnmjones) June 4, 2018
Paper session 3A (Day 2)
Three concurrent paper sessions followed Dr. Philipp Mayr's presentation. Dr. Dominika Tkaczyk, a researcher and data scientist at the Applied Data Analysis Lab at the University of Warsaw (Poland), presented: "Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers," in which she compared different methods for parsing scholarly article references.
Dominika Tkaczyk shows the state of the art in scholarly article reference parsinghttps://t.co/4oRRsnmiMo #jcdl2018 pic.twitter.com/JwXy1YKwpU— Martin Klein (@mart1nkle1n) June 4, 2018
Dominika Tkaczyk et al. find that GROBID performs pretty well before *and* after retraininghttps://t.co/wWCbkahkXK #jcdl2018 pic.twitter.com/DGFTPwYsdV— Martin Klein (@mart1nkle1n) June 4, 2018
Anne Lauscher, a PhD student at the University of Mannheim, presented: "Linked Open Citation Database: Enabling Libraries to Contribute to an Open and Interconnected Citation Graph." She presented the current state of the workflow and implementation of the Linked Open Citation Database project, a distributed infrastructure based on linked data technology for efficiently cataloging citations in libraries.
this much #jcdl2018 pic.twitter.com/SvRqKtbZuq— Martin Klein (@mart1nkle1n) June 4, 2018
Paper session 3C (Day 2)
Norman Meuschke, a PhD student at the University of Konstanz, presented: "An Adaptive Image-based Plagiarism Detection Approach," in which he discussed his analysis of images in academic documents to detect disguised forms of plagiarism with approaches such as perceptual hashing, ratio hashing and position-aware OCR text matching.
An Adaptive Image-based Plagiarism Detection Approach by Norman Meuschke from Research Group Gipp (University of Konstanz)
#jcdl2018 @normeu is presenting "An Adaptive Image-based Plagiarism Detection Approach" starting with an overview of the forms of plagiarism & detection methods: copy & paste, shake & paste, technical disguise, paraphrasing, structural & idea plagiarism, and cross-lang plagiarism pic.twitter.com/6gYHcPsx7o— Shawn M. Jones (@shawnmjones) June 4, 2018
.@normeu mentions that images contain a lot of semantic information, visualizations and figures contain a lot of core information for a paper, and that there is a research gap here #jcdl2018 pic.twitter.com/Qt6Q3AG9mu— Shawn M. Jones (@shawnmjones) June 4, 2018
Tesseract OCR software https://t.co/5r7FSOiKJ9 was mentioned by @normeu as a method of detecting text in images, but it is not perfect, hence they use a method of position-aware text matching to account for OCR errors #jcdl2018 pic.twitter.com/4sNIB1S2ac— Shawn M. Jones (@shawnmjones) June 4, 2018
#PerceptualHashing technique utilized by @normeu at #JCDL2018 could be useful in establishing #ArchivalFixity, relevant to @WebSciDL @maturban1's work. pic.twitter.com/IC7JXh1pXl— Sawood Alam (@ibnesayeed) June 4, 2018
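For readers unfamiliar with perceptual hashing, here is a minimal sketch of one simple variant, the average hash, together with a Hamming distance for comparing hashes. This is my own illustration under simplifying assumptions (tiny made-up grayscale grids, no downscaling step) and not necessarily the variant used in the paper.

```python
def average_hash(pixels):
    """Compute an average hash for a small grayscale image.

    pixels is a list of rows with intensity values (0-255). A real pipeline
    would first downscale the image, e.g. to 8x8; that step is omitted here.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    # One bit per pixel: 1 if brighter than the mean, else 0.
    return [1 if value > mean else 0 for value in flat]

def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))

# Toy 4x4 images with made-up values.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]
# A uniformly brightened copy of the original.
brightened = [[value + 30 for value in row] for row in original]
# An image with a different layout in its top half.
different = [
    [200, 210, 10, 20],
    [205, 215, 15, 25],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]

print(hamming_distance(average_hash(original), average_hash(brightened)))
print(hamming_distance(average_hash(original), average_hash(different)))
```

The idea is that visually similar images (here, the brightened copy) yield hashes with a small Hamming distance, while structurally different images yield a larger one.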
Hisham Benotman presented his work: "Extending Multiple Diagram Navigation with Internal Diagram And Collection Connections." He discussed extending Multiple Diagram Navigation (MDN) so that diagram-to-content queries reach related collection documents that are not directly connected to the diagrams.
#jcdl2018 Hisham Benotman is presenting "Extending Multiple Diagram Navigation with Internal Diagram And Collection Connections" pic.twitter.com/swGQh6c5gR— Shawn M. Jones (@shawnmjones) June 4, 2018
Other papers presented in a parallel session (3B) include:
- Building a Theoretical Framework for the Development of Digital Scholarship Services in China's Universities - Fang Zhang et al.
- Formula Ranking within an Article - Ke Yuan
- Ranking Scientific Papers and Venues in Heterogeneous Academic Networks by Mutual Reinforcement - Fang Zhang et al.
Minute madness followed the paper sessions: poster presenters were each given one minute to advertise their posters to the conference attendees. The poster session began after the minute madness.
Graduate students lining up to present their research in one minute or less. #MinuteMadness #JCDL2018 pic.twitter.com/V7QCrpGo6E— JoAnn Livingston (@JoAnnLivingston) June 4, 2018
Now for my favourite part of #jcdl2018 - the “minute madness.” Dozens of presenters, one minute each, exhorting us to visit their posters. pic.twitter.com/d2wJVGgrUT— Ian Milligan (@ianmilligan1) June 4, 2018
I think this slide wins for most fun slide to share with no context #JCDL2018 pic.twitter.com/Riw1XcNtIM— Trevor Owens 💾🗄🕚 (@tjowens) June 4, 2018
Another contender! #JCDL2018 pic.twitter.com/LWlV8FpnvZ— Trevor Owens 💾🗄🕚 (@tjowens) June 4, 2018
I think @ibnesayeed is trying to start a new viral #webarchiving hashtag: #GiveOurToolbarsBack! #JCDL2018 pic.twitter.com/kv8ib2aWlo— Ian Milligan (@ianmilligan1) June 4, 2018
Felix Hamborg shares research from @BelaGipp’s group on the extraction of Main Event Descriptors from News Articles #jcdl2018 #minutemadness pic.twitter.com/mlcV3pXDwu— Corinna Breitinger (@BreitingerC) June 4, 2018
#UNTResearch on a roll here with #MinuteMadness at #JCDL2018 #GoMeanGreen pic.twitter.com/e7kiRFJmZ8— JoAnn Livingston (@JoAnnLivingston) June 4, 2018
And now the up close and personal poster presentations #JCDL2018 pic.twitter.com/sND1xTkmBZ— JoAnn Livingston (@JoAnnLivingston) June 4, 2018
@maturban1’s poster “ArchiveNow: Simplified, Extensible, Multi-Archive Preservation.” #JCDL2018 pic.twitter.com/y76YJ5APZR— Hany Alsalmi (@HanyAlsalmi) June 5, 2018
A busy poster session #JCDL2018 pic.twitter.com/yfSFvN2snQ— Hany Alsalmi (@HanyAlsalmi) June 5, 2018
Dr. Ali Shiri’s poster #JCDL2018 pic.twitter.com/A5q2rfcIhq— Hany Alsalmi (@HanyAlsalmi) June 5, 2018
Day 3 (June 5, 2018)
Day 3 of the conference began with Dr. Niall Gaffney's keynote. Dr. Niall Gaffney is an Astronomer and Director of Data Intensive Computing at the Texas Advanced Computing Center (TACC). He started by emphasizing the importance of scientific reproducibility before moving on to show some of the projects supported by the computational machinery at TACC such as Firefly.
Dr. Niall Gaffney emphasizing the importance of scientific reproducibility (keynote JCDL 2018)#jcdl2018 pic.twitter.com/UKVoDrjwLg— Alexander C. Nwala (@acnwala) June 5, 2018
Progress in Research vs. Reproducibility. Is it really tit for tat? #jcdl2018 keynote @de_Niled pic.twitter.com/ILxTpdqIXr— Min-Yen Kan (@knmnyn) June 5, 2018
Texas Super Computing @jcdl2018 #JCDL2018 presented by Niall Gaffney #stampede2 pic.twitter.com/TrcdFiWINY— Philipp Mayr (@Philipp_Mayr) June 5, 2018
. @de_Niled reminds me of @phonedude_mln‘s admonition to our @WebSciDL students: “No magic laptops!” #jcdl2018 pic.twitter.com/5RmBS1wajL— Michele Weigle (@weiglemc) June 5, 2018
Two concurrent paper sessions followed a short break.
Paper session 4A (Day 3)
Dr. Gianmaria Silvello, an assistant professor at the Department of Information Engineering of the University of Padua, presented a full paper titled: "Evaluation of Conformance Checkers for Long-Term Preservation of Multimedia Documents." He discussed the development of an evaluation framework for conformance checkers used in long-term preservation, which assesses correctness, usability, and usefulness.
#jcdl2018 session 4 has begun with @giansilv presenting "Evaluation of Conformance Checkers for Long-Term Preservation of Multimedia Documents". pic.twitter.com/tJWAMwZcLv— Mat Kelly (@machawk1) June 5, 2018
Next, Dr. Pavlos Fafalios, a researcher at L3S Research Center in Germany, presented a full paper titled: "Ranking Archived Documents for Structured Queries on Semantic Layers," in which he proposed two ranking models for archived documents that consider the similarity of documents to entities, the timeliness of documents, and the temporal relations between the entities.
Kudos to @pavlos098 et al. for making their evaluation data set from their #jcdl2018 presentation "Ranking Archived Documents for Structured Queries on Semantic Layers" publicly available https://t.co/sYqfmzw1vu pic.twitter.com/2E7QAtQD3b— Mat Kelly (@machawk1) June 5, 2018
The slides of my presentation "Ranking Archived Documents for Structured Queries on Semantic Layers" are available at https://t.co/8rYD7EVCPP— Pavlos Fafalios (@pavlos098) June 5, 2018
and the paper at https://t.co/ZkxP7TeDTX
#jcdl2018
The final paper in this session, a short paper titled "Modeling Author Contribution Rate With Blockchain," was presented by someone other than its authors. Three concurrent paper sessions (all full papers) followed after a break.
The last paper session on Topic Modeling and Detection consisted of three full papers. First, Julian Risch (best paper nominee), a PhD student at Hasso-Plattner Institute (Germany), presented: "My Approach = Your Apparatus? Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections." He presented a topic model combined with automatic domain term extraction and phrase segmentation that distinguishes collection-specific and collection-independent words based on information entropy.
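As a rough illustration of the entropy idea only (not the topic model from the paper), the sketch below computes the Shannon entropy of a word's usage across collections. My assumption here is that a word spread evenly over collections (high entropy) is treated as collection-independent, while a word concentrated in one collection (low entropy) is treated as collection-specific. The words and frequencies are made up.

```python
import math

def collection_entropy(relative_freqs):
    """Shannon entropy (in bits) of how a word's usage is spread across collections.

    relative_freqs holds the word's relative frequency in each collection.
    """
    total = sum(relative_freqs)
    if total <= 0:
        raise ValueError("the word must occur in at least one collection")
    shares = [freq / total for freq in relative_freqs if freq > 0]
    return sum(-share * math.log2(share) for share in shares)

# Made-up relative frequencies in three hypothetical collections.
word_freqs = {
    "method": [0.010, 0.010, 0.010],    # evenly spread
    "catalyst": [0.020, 0.000, 0.000],  # concentrated in one collection
}
for word, freqs in word_freqs.items():
    print(word, round(collection_entropy(freqs), 3))
```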
A dinner at the Fort Worth Museum of Science and History followed after a break. The best poster award was presented to Mohamed Aturban, a fellow PhD student at Old Dominion University and member of WSDL, for his poster "ArchiveNow: Simplified, Extensible, Multi-Archive Preservation."
Paper session 4B (Day 3)
Florian Mai, a graduate student at Kiel University in Germany, was the first presenter of the paper session on Text Collections. He presented a full paper titled: "Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Competitive Performance to Full-Text," in which he reported how deep learning models trained on titles compare to deep learning models trained on full texts.
Using deep learning for Title-based semantic subject indexing to reach competitive performance to full text— Giorgio Saez (@Giosd) June 5, 2018
With @_florianmai @jcdl2018 #jcdl2018 pic.twitter.com/D5115YP3IC
.@_florianmai - asks can the best title-based method outperform the best full text method for subject indexing? - he notes that his results are positive for his EconBiz dataset, but not for his PubMed dataset #jcdl2018 pic.twitter.com/eENpfhZvwk— Shawn M. Jones (@shawnmjones) June 5, 2018
#jcdl2018 Code for "Using Deep Learning For Title-Based Semantic Subject Indexing To Reach Competitive Performance to Full Text" by @_florianmai is available at https://t.co/1FqzfJ5KPk— Shawn M. Jones (@shawnmjones) June 5, 2018
Next, Chris Holstrom, a PhD student from the Information School at the University of Washington, presented a short paper: "Social Tagging: Organic and Retroactive Folksonomies," in which he showed that tags on MetaFilter and AskMetaFilter follow a power law distribution and that retroactive taggers do not use "organization" tags like professional indexers.
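A quick way to eyeball whether tag frequencies look like a power law is to fit a straight line to the rank-frequency data on a log-log scale. The sketch below does this with plain least squares on synthetic tags; it is my own illustration, not the analysis from the paper, and a straight line on a log-log plot is only suggestive (maximum-likelihood fitting is the more rigorous route).

```python
import math
from collections import Counter

def rank_frequency_slope(tags):
    """Fit log(frequency) = intercept + slope * log(rank) by least squares."""
    counts = sorted(Counter(tags).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct tags")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic data: the tag at rank r appears about 1000 / r times.
tags = [f"tag{r}" for r in range(1, 21) for _ in range(1000 // r)]
slope, intercept = rank_frequency_slope(tags)
print(f"slope: {slope:.2f}")
```

A slope that stays roughly constant across ranks (here close to -1 by construction) is what a power law would look like in this view.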
Chris Holstrom is presenting "Social Tagging: Organic and Retroactive Folksonomies" - organic tags are produced while users are producing the posts and retroactive tags are applied afterward - he evaluated content on https://t.co/fNpljfMguY and https://t.co/9RI8hOTrGN #jcdl2018 pic.twitter.com/L9XMtpCm2u— Shawn M. Jones (@shawnmjones) June 5, 2018
#jcdl2018 Chris Holstrom: Do tags fit a power law distribution? Yes. Do retroactive taggers use "organization" tags like professional indexers? No. Do retroactive taggers use preferred terms and avoid synonyms which are common in folksonomies? No, they add more synonyms. pic.twitter.com/pvzps4TSlM— Shawn M. Jones (@shawnmjones) June 5, 2018
Social tagging: Organic and retroactive Folksonomies (Chris Holstrom) @jcdl2018 #jcdl2018 @UW_iSchool @UNTCOI pic.twitter.com/5mGHWf3adL— Giorgio Saez (@Giosd) June 5, 2018
Next, Jens Willkomm, a PhD student at the Karlsruhe Institute of Technology in Germany, presented a full paper titled: "A Query Algebra for Temporal Text Corpora." He proposed a novel query algebra for accessing and analyzing words in large text corpora.
Jens Willkomm from #KIT presenting his interdisciplinary work on a query algebra for temporal text corpora #jcdl2018 pic.twitter.com/qnPCfr1bpR— Susanne Putze (@s_putze) June 5, 2018
Jens Willkomm presents "A Query Algebra for Temporal Text Corpora" - joint work with philosophers - if you could read 1 book per day, then you can read 365 books a year - there are many many books to get through - a query language may help #jcdl2018 pic.twitter.com/hy4PHjTQYP— Shawn M. Jones (@shawnmjones) June 5, 2018
Nice conclusion: Read and cite our paper! #jcdl2018 pic.twitter.com/MLku0nF2K2— Susanne Putze (@s_putze) June 5, 2018
Paper session 5A (Day 3)
Omar Alonso (best paper nominee) presented a full paper titled: "How it Happened: Discovering and Archiving the Evolution of a Story Using Social Signals." He introduced a method of showing the evolution of stories from the perspective of social media users as well as the articles that include social media as supporting evidence.
How it happened: Discovering and archiving the evolution of a story using social signals. (Omar Alonso) @elunca @jcdl2018 #jcdl2018 #Microsoft pic.twitter.com/pJnsrvKPQe— Giorgio Saez (@Giosd) June 5, 2018
Tobias Backes, a researcher at GESIS, presented his paper titled: "Keep it Simple: Effective Unsupervised Author Disambiguation with Relative Frequencies." He addressed the problem of author name homonymy in the Web Science domain by proposing a novel probabilistic similarity measure for author name disambiguation based on feature overlap.
Talk about author name disambiguation by Tobias Backes from @gesis_org #jcdl2018 pic.twitter.com/OUxiCMZW9c— Kai Eckert (@kaiec) June 5, 2018
The last paper (best paper nominee) presented in this session was titled: "Digital History meets Microblogging: Analyzing Collective Memories in Twitter."
Paper session 5B (Day 3)
Noah Siegel, a researcher at the Allen Institute for Artificial Intelligence, presented a full paper titled: "Extracting Scientific Figures with Distantly Supervised Neural Networks," where he introduced a system for extracting figures from a large number of scientific documents without human intervention.
Extracting Scientific Figures with Distantly Supervised Neural Networks https://t.co/taDFsZRNL2 presented by Noah Siegel @allenai_org #JCDL2018— Philipp Mayr (@Philipp_Mayr) June 5, 2018
Next, André Greiner-Petter presented his full paper titled: "Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context." He presented a new approach for mathematical format conversion that utilizes textual information to reduce the error rate. Additionally, he evaluated state-of-the-art tools for mathematical conversions and provided a public, manually created gold standard dataset for mathematical format conversion.
@GreinerPetter presenting @physikerwelt and @BelaGipp’s group research on ''Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Content'' at #jcdl2018 pic.twitter.com/xkM3fuYcbU— Corinna Breitinger (@BreitingerC) June 5, 2018
Next, Yuta Kobayashi presented a paper titled: "Citation Recommendation Using Distributed Representation of Discourse Facets in Scientific Articles," presenting the effectiveness of using facets of scientific articles such as "objective," "method," and "result" for citation recommendation by learning a multi-vector representation of scientific articles, in which each vector represents a facet in the article.
Paper session 5C (Day 3)
Catherine Marshall, an adjunct professor at Texas A&M University, presented: "Biography, Ephemera, and the Future of Social Media Archiving." She presented her findings from answering the following question: "Will the addition of new digital sources such as records repositories, digital libraries, social media, and collections of ephemera change biographical research practices?" She demonstrated how new digital resources unravel a subject's social network, thus exposing biographical information formerly invisible.
Next, I presented our full paper titled: "Scraping SERPs for Archival Seeds: It Matters When You Start" on behalf of co-authors Dr. Michele Weigle and Dr. Michael Nelson. I first highlighted the importance of web archive collections for studying important historical events ranging from elections to disease outbreaks. Next, I showed that search engines (specifically Google) can be used to generate seeds. Finally, I showed that it becomes harder to find the older URLs of news stories over time, so seed generators that utilize search engines should begin early and persist in order to capture the evolution of an event.
"Biography, emphera,and the future of social media archiving" @ccmarshall #jcdl2018 pic.twitter.com/VrpuXcre7W— Michael L. Nelson (@phonedude_mln) June 5, 2018
.@acnwala presenting "Scraping SERPs for Archival Seeds: It Matters When You Start" https://t.co/AZeJfohcaA #jcdl2018 pic.twitter.com/coQ99URUIV— Michael L. Nelson (@phonedude_mln) June 5, 2018
.@acnwala:"we don't have enough curators to capture seeds for all events", @internetarchive @archiveitorg often send out requests for seeds from volunteers. Collection building often begins with a search-can we use search engine result pages to help find seeds as well? #jcdl2018 pic.twitter.com/WgB289Hfll— Shawn M. Jones (@shawnmjones) June 5, 2018
In "Scraping SERPs for archival seeds: it matters when you start" @acnwala details how one can scrape search engine result pages (SERPs) to find seeds for use in web archive collections #preprint here: https://t.co/OB4j39rwsR #jcdl2018 pic.twitter.com/Hya5PH5h4n— Shawn M. Jones (@shawnmjones) June 5, 2018
Alexander Nwala @acnwala amazing to see how quickly Trump-Russia stories disappear after they show up in the news #JCDL2018 – crazy dynamics pic.twitter.com/RJmVV31g2X— Mike Hucka (@mhucka) June 5, 2018
This is cool from @acnwala - how search results move up and down various Google search result pages (or SERPs). Some persisting for dozens of days; others from page 1 to page 5; or even back from page 5 to 1 etc etc. Nice viz too. #JCDL2018 pic.twitter.com/IIuCRZIn2F— Ian Milligan (@ianmilligan1) June 5, 2018
Next, Mat Kelly (best paper nominee), a fellow PhD student at Old Dominion University and member of WSDL, presented his full paper titled: "A Framework for Aggregating Private and Public Web Archives." He showed a framework that provides a means of combining public web archive captures and private web captures (e.g., banking and social media information) without compromising sensitive information included in the private captures. This work utilizes Sawood Alam's MemGator, a Memento aggregator that supports multiple serialization formats such as Link, JSON, and CDXJ.
.@WebSciDL resources for: Scraping SERPs for Archival Seeds: It Matters When You Start tech report— Alexander C. Nwala (@acnwala) June 5, 2018
Tech report: https://t.co/2uvwKd7ykc
Slides: https://t.co/qFzZMftWQ9
151,602 URI from 7 months: https://t.co/0uUFp2QqBn
App. for Scraping Google: https://t.co/iBf4kHekNY #jcdl2018
Great slides by @machawk1 as he walks us through personas working with their own personal web archives, supplemented by other personal and public collections. #jcdl2018 pic.twitter.com/qTJMXNdWTu— Ian Milligan (@ianmilligan1) June 5, 2018
.@machawk1 from @WebSciDL shows that, in additional to customizing the results from aggregators, memento meta aggregators can allow access to private web archives #jcdl2018 pic.twitter.com/f41GquaCAm— Shawn M. Jones (@shawnmjones) June 5, 2018
Paper session 6A (Day 3)
The last paper session, on Topic Modeling and Detection, consisted of three full papers. First, Julian Risch (best paper nominee), a PhD student at Hasso-Plattner Institute (Germany), presented: "My Approach = Your Apparatus? Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections." He presented a topic model combined with automatic domain term extraction and phrase segmentation that distinguishes collection-specific and collection-independent words based on information entropy.
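To make the entropy idea concrete, here is a minimal sketch of one plausible reading of it. This is not the model from the paper: the function name `collection_entropy` and the toy token lists are hypothetical and serve only as illustration. The intuition is that a word spread evenly across collections has high entropy (collection-independent), while a word concentrated in one collection has low entropy (collection-specific).

```python
import math
from collections import Counter


def collection_entropy(word, collections):
    """Normalized entropy (0 to 1) of a word's relative frequency across collections.

    Hypothetical helper for illustration only; not the authors' implementation.
    `collections` is a list of token lists, one per text collection.
    """
    # Relative frequency of the word inside each collection.
    rel_freqs = []
    for tokens in collections:
        counts = Counter(tokens)
        rel_freqs.append(counts[word] / len(tokens) if tokens else 0.0)

    total = sum(rel_freqs)
    if total == 0 or len(collections) < 2:
        return 0.0

    # Turn the relative frequencies into a distribution over collections.
    probs = [f / total for f in rel_freqs if f > 0]
    entropy = -sum(p * math.log(p) for p in probs)

    # Divide by the maximum possible entropy so the score lies in [0, 1].
    return max(0.0, entropy / math.log(len(collections)))


# Toy data (assumed): two tiny "collections" of tokens.
toy_collections = [
    ["apparatus", "method", "apparatus", "claim"],
    ["approach", "method", "approach", "result"],
]
shared_score = collection_entropy("method", toy_collections)
specific_score = collection_entropy("apparatus", toy_collections)
```

In this toy setup, "method" appears equally often in both collections and so scores at the top of the range, while "apparatus" appears in only one collection and scores at the bottom. A real system would need a threshold or a probabilistic treatment to separate the two classes.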
#jcdl2018 Julian Risch presents "My Approach = Your Apparatus? Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections" pic.twitter.com/pr4PYqslA3— Shawn M. Jones (@shawnmjones) June 5, 2018
Next, Dr. Ralf Krestel, the head of the Web Science Research Group and Senior Researcher at Hasso-Plattner Institute (Germany), presented his full paper titled: "WELDA: Enhancing Topic Models by Incorporating Local Word Context." He proposed a new topic model called WELDA that combines word embeddings (WE) and Latent Dirichlet Allocation (LDA).
#jcdl2018 Thanks to Julian Risch for introducing Cross-collection LDA (ccLDA), a topic modeling technique I was not familiar with pic.twitter.com/JE6mUx8dC3— Shawn M. Jones (@shawnmjones) June 5, 2018
#jcdl2018 Ralf Krestel presents "WELDA: Enhancing Topic Models by Incorporating Local Word Context" : "You shall know a word by the company it keeps" - FIRTH, 1957 pic.twitter.com/s4dREHg6cY— Shawn M. Jones (@shawnmjones) June 5, 2018
Something one can do with word embeddings that cannot be done with topic models: word analogies - Ralf Krestel #jcdl2018 pic.twitter.com/fIDxR0L726— Shawn M. Jones (@shawnmjones) June 5, 2018
Ralf Krestel: improving topic models by leveraging word embeddings: WELDA #jcdl2018 pic.twitter.com/EGClhiD9lO— Shawn M. Jones (@shawnmjones) June 5, 2018
Ralf Krestel mentions that WELDA adds new steps to the Gibbs sampling algorithm for topic modeling #jcdl2018 pic.twitter.com/hAR5zN9Igo— Shawn M. Jones (@shawnmjones) June 5, 2018
Finally, Angelo Salatino, a PhD student at the Knowledge Media Institute (UK), presented a full paper titled: "AUGUR: Forecasting the Emergence of New Research Topics." He introduced AUGUR, a new approach for the early detection of research topics that helps stakeholders such as universities, institutional funding bodies, academic publishers, and companies recognize new research trends.
#jcdl2018 Ralf Krestel provides some answers to "How do we evaluate topic models?", such as "topic coherence" - reminds me of Chang's "Reading Tea Leaves: How Humans Interpret Topic Models" https://t.co/Wt4IJrttB8 pic.twitter.com/S9UY294e1D— Shawn M. Jones (@shawnmjones) June 5, 2018
Excited to see @angelosalatino’s @JCDL2018 presentation on forecasting the emergence of research topics! Paper here: https://t.co/4NpCzNl4jv @kmiou #JCDL2018 #ScholarlyCommunication #DigitalLibraries @skm3ou pic.twitter.com/hyuELic1BN— Dasha Herrmannova (@robodasha) June 5, 2018
#jcdl2018 @angelosalatino presents "AUGUR: Forecasting the Emergence of New Research Topics" https://t.co/NY1I6FgLUf - I met Angelo at #www2016 where he was exploring emerging topics in this paper https://t.co/w1gxKId6pp pic.twitter.com/DEjRiQ4U81— Shawn M. Jones (@shawnmjones) June 5, 2018
To look at emerging work @angelosalatino used Scopus and the Computer Science Ontology portal: https://t.co/9SemqzM7in #jcdl2018 pic.twitter.com/0755KjWbiM— Shawn M. Jones (@shawnmjones) June 5, 2018
It never ceases to amaze me how important graphs are to #computerscience - @angelosalatino modeled topics as a graph, and used a community detection algorithm to detect cliques and find topic communities #jcdl2018 pic.twitter.com/ll9HFFgFUF— Shawn M. Jones (@shawnmjones) June 5, 2018
#jcdl2018 Prior work by @angelosalatino on the emergence of topics related to his AUGUR work presented @jcdl2018: "How are topics born? Understanding the research dynamics preceding the emergence of new areas" : https://t.co/fmWBHGJw7x— Shawn M. Jones (@shawnmjones) June 5, 2018
After a break, a dinner followed at the Fort Worth Museum of Science and History. The best poster award was presented to Mohamed Aturban, a fellow PhD student at Old Dominion University and member of WSDL, for his poster "ArchiveNow: Simplified, Extensible, Multi-Archive Preservation."
— Michael L. Nelson (@phonedude_mln) June 5, 2018
Dr. Federico Nanni ("Providing Fine-Grained Semantics of Entities in Context") and Myriam Traub ("Impact of Crowdsourcing OCR Improvements on Retrievability Bias") tied for the Vannevar Bush best paper award. Myriam Traub also won the best student paper award.
Congrats to @MyriamCTraub and @f_nanni -- they split the Vannevar Bush best paper award, and @MyriamCTraub won the best student paper award! #jcdl2018 pic.twitter.com/n48sjXAJEW— Michael L. Nelson (@phonedude_mln) June 5, 2018
— Kai Eckert (@kaiec) June 5, 2018
Day 4 (June 6, 2018)
Day 4 began with a keynote from Dr. Carly Strasser, director of Strategic Development for the Collaborative Knowledge Foundation. Her keynote, "Open Source Tech for Scholarly Communication: Why It Matters," illustrated the problems in the submission, production, and delivery of scholarly communication. She talked about the disjoint nature (silos) of the various stages of scholarly communication, as well as expensive delivery, slow production, and static, less interoperable output.
Excited about @carlystrasser 's keynote titled "Open Source Tech for Scholarly Communication: Why It Matters." #jcdl2018 pic.twitter.com/mg695f2H5R— Martin Klein (@mart1nkle1n) June 6, 2018
#jcdl2018 @carlystrasser gives this morning's keynote: "Open Source Tech for Scholarly Communication: Why It Matters." pic.twitter.com/MosOOlgly5— Shawn M. Jones (@shawnmjones) June 6, 2018
#jcdl2018 @carlystrasser mentions the "chaos of the research process" and what is not necessarily being captured, with a focus on publications even though there are many other products of the scholarly process #scholarlycommunication pic.twitter.com/V29h9P0HD7— Shawn M. Jones (@shawnmjones) June 6, 2018
.@carlystrasser shared work of @olihb mapping scholarly collaboration https://t.co/YRFISJ6irn pic.twitter.com/88wyr30hlr— Shawn M. Jones (@shawnmjones) June 6, 2018
#jcdl2018 @carlystrasser shares Jennifer Lin's work of the "article nexus": https://t.co/FkEABBJCeQ pic.twitter.com/eJI0fgrHfD— Shawn M. Jones (@shawnmjones) June 6, 2018
She also presented a vision of scholarly communication that consists of living documents that link to open source code and data, a cheaper delivery system, faster production and more interoperable and dynamic output. Additionally, she talked about the organizations working to achieve various aspects of this vision.
.@carlystrasser gives a plug for @force11rescomm in October as well as other groups like @PLOS @ProjectJupyter @datadryad @biorxivpreprint and many others that are thinking about the future of research #jcdl2018 pic.twitter.com/chSIIBWBZX— Shawn M. Jones (@shawnmjones) June 6, 2018
#jcdl2018 @carlystrasser lists #opensource software successes, best when collaborative and community-driven pic.twitter.com/wHJlQPECXf— Shawn M. Jones (@shawnmjones) June 6, 2018
"Get everything into the browser" and "eliminate that production step in the middle" of #scholarlycommunication - @carlystrasser #jcdl2018 pic.twitter.com/FZNbVBOSTL— Shawn M. Jones (@shawnmjones) June 6, 2018
#jcdl2018 @carlystrasser describes "the open ecosystem" including tools by @CokoFoundation and other tools like @inveniosoftware pic.twitter.com/Ir5LBfR0hL— Shawn M. Jones (@shawnmjones) June 6, 2018
The main conference gave way to workshops and a preview of JCDL 2019, which is scheduled to take place at the School of Information Sciences at the University of Illinois, Urbana-Champaign, June 2-6, 2019.
#jcdl2018 isn't over yet, but #jcdl2019 will be @Illinois_Alma - @profdownie says that your best action plan is to plan to spend sometime in Chicago too! pic.twitter.com/m8ur0C42Xh— Shawn M. Jones (@shawnmjones) June 6, 2018
I would like to thank the organizers of the conference, the hosts, University of North Texas (UNT) College of Information and UNT Health Science Center, as well as SIGIR for the travel grants. Here are other trip reports including the Doctoral Consortium (from Shawn Jones), a preview of WADL (Web Archiving and Digital Libraries) workshop from Jasmine Mulliken, Digital Production Associate at Stanford University Press, Mat Kelly's (WADL) trip report, and Corren McCoy's Knowledge Discovery From Digital Libraries (KDDL) Workshop Trip Report. Dr. Min-Yen Kan set up a repository for all the slides from JCDL 2018; please upload your slides if you have not already done so.
After lunch photo Wednesday #jcdl2018 pic.twitter.com/JPRmB1U0Wz— Michele Whitehead (@WhiteheadML) June 6, 2018
-- Nwala (@acnwala)