Showing posts from May, 2023

2023-05-31: Towards an Ethical Framework for Full-Text Search in Web Archives

Figure 1:'s full-text search engine makes it effortless to search archived content on GeoCities. The exploitative search results for the query "party", not included in the figure, show the dangers of full-text search without an ethical framework. On International Women's Day, I wanted to highlight how web archives have preserved progress towards gender equality. Both Archive-It and the UK Web Archive have dedicated collections about this topic. Check out these women's rights web archive collection lists for #IWD2023: @UKWebArchive women's rights collections: Archive-It women's right collections: #WebArchiveWednesday — Lesley Frew (@lesley_elis) March 9, 2023 While the Internet Archive does not have any dedicated collections, it is possible to conduct a keyword/metadata search across the Wayback Machine holdings. I performed a Google search (Internet Archive Internatio

2023-05-29: 2023 WS-DL Research Expo

On May 8, 2023, we held our third annual WS-DL Research Expo. We stuck with the same format as the prior two years (2022 & 2021): one student from each professor giving a short overview of their research. Links to all the materials are gathered in the GitHub repo, but here again is the list of students and their presentations: Travis Reid: Game Walkthroughs and Web Archiving Project; Gavindya Jayawardena: RAEMAP: Real-time Advanced Eye Movements Analysis Pipeline; Yash Prakash: AutoDesc: Facilitating Convenient Perusal of Web Data Items for Blind Users; David Calano: Updates on Memento Damage; Kehinde Ajayi: A Study on Reproducibility and Replicability of Table Structure Recognition Methods. We were fortunate enough to welcome back some of our alumni, including: Justin Brunelle (PhD, 2016), Shawn Jones (PhD, 2021), Mat Kelly (PhD, 2019), Sawood Alam (2020). We really appreciate the ongoing relationship we have with our alumni, and all are welcome at th

2023-06-03: A Trip report on Bill Ingram's Visit to ODU

On Friday, March 24, 2023, we had the pleasure of hosting William A. Ingram, who holds the positions of associate dean and executive director for information technologies in the University Libraries of Virginia Tech. During his visit, he gave a presentation entitled "Maximizing Access to Long Scholarly Documents." This talk provided an overview of his recent research endeavors, focusing on data analysis, automatic metadata extraction, and strategies for enhancing accessibility to long scholarly documents. During his graduate seminar talk, he presented his research "Building A Large Collection of Multi-domain Electronic Theses and Dissertations" on making the collection of long scholarly documents computationally driven and excavating knowledge from this rich information source, focusing on electronic theses and dissertations. A Large Collection of Mu

2023-05-25: Generative Archive Restoration

Rise of the Machines! Machine Learning just cannot seem to keep itself out of news cycles. The third version of OpenAI's generative dialogue language model, ChatGPT, had tech giants all around scrambling in quite a circus trying to push out their own versions. Google's size and stagnation in recent years had it seeing red and bringing Sergey Brin back into the fold to aid in rushing out its own chat bot, Bard. Microsoft, comparatively, has been humming along for a while now with its own research in the AI agent space, but various headlines hint that its past and present efforts might not be paying off as well as they were hoping.  is a relatively new search engine leveraging machine learning in its own chat assistant YouChat and other services in an attempt to push the frontiers of a search engine through multi-modal search with integrated artificial intelligence enhancements. These efforts are all seeking to shake up how we seek and retrieve information o