Posts

2015-12-22: 60% of Web Annotations are Orphaned or in Danger of Being Orphaned

Figure 1. An Annotation is defined by OAC as a set of connected resources

In our TPDL paper, we studied 6,281 highlighted text annotations (out of 7,744 annotations) available in the Hypothes.is annotation system in January 2015. The main goal was to investigate the prevalence of orphaned annotations, where neither the live web page nor an archived copy of it contains the text that had previously been annotated. Recently, we applied the same analysis to a larger number of annotations. Figure 2 illustrates that the number of annotations in Hypothes.is has been increasing since July 2013. Our TPDL paper focused on the 7,744 annotations available in January 2015. Our updated paper (available at arXiv.org) analyzed the 20,133 highlighted text annotations (out of 33,946 total annotations) available in August 2015. In this post, I will focus on reporting the results of our arXiv paper.

Figure 2. January 2015 - dataset used in TPDL paper August 2015
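
The definition of an orphaned annotation above can be sketched roughly as follows. This is only an illustration, not the procedure used in the papers: the function name, the use of exact substring matching, and every label other than "orphaned" are assumptions made here. The "at risk" label is a guess at what the post title means by "in danger of being orphaned" (text still on the live page but with no archived copy containing it).

```python
def classify_annotation(highlighted_text, live_page_text, archived_texts):
    """Classify one highlighted-text annotation (hypothetical helper).

    highlighted_text: the text that was annotated.
    live_page_text:   text of the live page, or None if the page is gone.
    archived_texts:   list of texts from archived copies (may be empty).
    """
    # Assumption: plain substring matching stands in for real text anchoring.
    on_live = live_page_text is not None and highlighted_text in live_page_text
    in_archive = any(highlighted_text in text for text in archived_texts)

    if not on_live and not in_archive:
        return "orphaned"      # matches the definition given in the post
    if on_live and not in_archive:
        return "at risk"       # assumed label: no archived fallback exists
    if in_archive and not on_live:
        return "recoverable"   # assumed label: only an archived copy has it
    return "attached"          # assumed label: live page and archive agree


# Example: the live page changed and no archived copy holds the text.
print(classify_annotation("web archiving", "new unrelated content", []))
```
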

2015-10-07: IMLS and NSF fund web archive research for WS-DL

In the spring and summer of 2015, the Web Science and Digital Libraries (WS-DL) group received a total of $950k of funding from the IMLS and the NSF to study various aspects of web archiving. Although previously announced on Twitter (IMLS: 2015-03-31 & NSF: 2015-08-25), here we provide greater context for how these awards support our vision for the future of web archiving*. Our IMLS proposal is titled "Combining Social Media Storytelling With Web Archives" and a PDF of the full proposal is available directly from the IMLS. This proposal is joint with our partners at Archive-It and is informed by our experiences in several areas, such as: Our previous attempts at visualizing Archive-It collections, where we ran into difficulty scaling conventional approaches (e.g., treemaps, timelines) to entire collections. "Storytelling" social media services; namely storify.com, but also pinterest.com, scoop.it, paper.li, and similar services. Th

2015-09-30: Digital Preservation - Magdeburg Germany Trip Report

Dr. Herzog: This large green area on your left is Sanssouci Park. It has 11 palaces in it.
Yasmin: I want to visit this park after we are back from the university, can we?
Dr. Herzog: We sure can... I think we will be back before sunset.
Yasmin: I love beautiful things.
Dr. Herzog: Who doesn't?
Sawood: [Smiles]

The three souls were heading to the Hochschule Magdeburg-Stendal University from Potsdam, Germany in Dr. Michael Herzog's car for a lunch lecture on the topic of Digital Preservation. Yasmin and Sawood from the Web Science and Digital Libraries Research Group of Old Dominion University, Norfolk, Virginia were invited for the talk by Dr. Herzog at his SPiRIT Research Group. The two WS-DL members had presented their work at TPDL 2015 in Poznan, Poland; on their way back home they stopped in Germany, where Dr. Herzog hosted them for the lunch lecture. You may also enjoy the TPDL 2015 trip report by Yasmin. Passing by beautiful landsc

2015-09-28: TPDL 2015 in Poznan, Poland

The Old Market Square in Poznan

On September 15, 2015, Sawood Alam and I (Yasmin AlNoamany) attended the 2015 Theory and Practice of Digital Libraries (TPDL) Conference in Poznan, Poland. This year, WS-DL had four papers accepted at TPDL from three students: Mohamed Aturban (who could not attend the conference because of visa issues), Sawood Alam, and Yasmin AlNoamany. Sawood and I arrived in Poznan on Monday, Sept. 14. Although we were tired from travel, we could not resist walking to the best area in Poznan, the old market square. It was fascinating to see those beautiful, colorful houses at night with the reflection of the water on them after the rain, accompanied by beautiful European music from the many artists playing in the street. The next morning we headed to the conference, which was held at the Poznań Supercomputing and Networking Center. The organization of the conference was amazing and the general conference co-chairs, Marcin Werla and Cezary Mazurek, wer

2015-09-21: InfoVis Spring 2015 Class Projects

In Spring 2015, I taught Information Visualization (CS 725/825) for MS and PhD students. This time we used Tamara Munzner's Visualization Analysis & Design textbook, which I highly recommend: "This highly readable and well-organized book not only covers the fundamentals of visualization design, but also provides a solid framework for analyzing visualizations and visualization problems with concrete examples from the academic community. I am looking forward to teaching from this book and sharing it with my research group." -- Michele C. Weigle, Old Dominion University. I also tried a flipped-classroom model, where students read and answer homework questions before class so that class time can focus on discussion, student presentations, and in-class exercises. It worked really well -- students liked the format, and I didn't have to convert a well-written textbook into PowerPoint slides. Here I highlight a couple of student projects from that course. (All c

2015-09-10: CDXJ: An Object Resource Stream Serialization Format

I have been working on an IIPC-funded project of profiling various web archives to summarize their holdings. The idea is to generate statistical measures of the holdings of an archive under various lookup keys, where a key can be a partial URI such as a Top Level Domain (TLD), a registered domain name, an entire domain name along with any number of sub-domain segments, a domain name and a few segments from the path, a given time, a language, or a combination of two or more of these. Such a document (or archive profile) can be used to answer queries like "how many *.edu Mementos are there in a given archive?", "how many copies of the pages are there in an archive that fall under netpreserve.org/projects/*", or "number of copies of *.cnn.com/* pages from 2010 in the Arabic language". The archive profile can also be used to determine the overlap between two archives or visualize their holdings in various ways. Early work on this research was presented at the Internet