Posts

Showing posts from 2023

2023-05-31: Towards an Ethical Framework for Full-Text Search in Web Archives

Figure 1: Arquivo.pt's full-text search engine makes it effortless to search archived content on GeoCities. The exploitative search results for the query "party site:geocities.com", not included in the figure, show the dangers of full-text search without an ethical framework.

On International Women's Day, I wanted to highlight how web archives have preserved progress towards gender equality. Both Archive-It and the UK Web Archive have dedicated collections about this topic.

Check out these women's rights web archive collection lists for #IWD2023:
@UKWebArchive women's rights collections: https://t.co/hfuik5pwDf
Archive-It women's right collections: https://t.co/IUzAWZsdFF
#WebArchiveWednesday
Lesley Frew (@lesley_elis), March 9, 2023

While the Internet Archive does not have any dedicated collections, it is possible to conduct a keyword/metadata search across the Wayback Machine holdings. I performed a Google search (Internet Archive Internatio

2023-05-29: 2023 WS-DL Research Expo

On May 8, 2023, we held our third annual WS-DL Research Expo. We stuck with the same format as the prior two years (2022 & 2021): one student from each professor giving a short overview of their research. Links to all the materials are gathered in the GitHub repo, but here again is the list of students and their presentations:

Travis Reid: Game Walkthroughs and Web Archiving Project
Gavindya Jayawardena: RAEMAP: Real-time Advanced Eye Movements Analysis Pipeline
Yash Prakash: AutoDesc: Facilitating Convenient Perusal of Web Data Items for Blind Users
David Calano: Updates on Memento Damage
Kehinde Ajayi: A Study on Reproducibility and Replicability of Table Structure Recognition Methods

We were fortunate enough to welcome back some of our alumni, including: Justin Brunelle (PhD, 2016), Shawn Jones (PhD, 2021), Mat Kelly (PhD, 2019), Sawood Alam (2020). We really appreciate the ongoing relationship we have with our alumni, and all are welcome at th

2023-05-25: Generative Archive Restoration

Rise of the Machines!

Machine Learning just cannot seem to keep itself out of news cycles. The third version of OpenAI's generative dialogue language model, ChatGPT, had tech giants all around scrambling, in quite a circus, to push out their own versions. Google's size and stagnation in recent years had it seeing red and bringing Sergey Brin back into the fold to aid in rushing out its own chatbot, Bard. Microsoft, by comparison, has been humming along for a while now with its own research in the AI agent space, but various headlines hint that its past and present efforts might not be paying off as well as it was hoping. You.com is a relatively new search engine that leverages machine learning in its own chat assistant, YouChat, and in other services, in an attempt to push the frontiers of search through multi-modal search with integrated artificial intelligence enhancements. These efforts are all seeking to shake up how we seek and retrieve information o