2015-12-22: 60% of Web Annotations are Orphaned or in Danger of Being Orphaned
Figure 1. An Annotation is defined by OAC as a set of connected resources
Recently, we applied the same analysis as in our TPDL paper to a larger number of annotations. Figure 2 illustrates that the number of annotations in Hypothes.is has been increasing since July 2013. Our TPDL paper focused on the 7,744 annotations available in January 2015. Our updated paper (available at arXiv.org) analyzed the 20,133 highlighted text annotations (out of 33,946 total annotations) available in August 2015. In this post, I focus on the results reported in our arXiv paper.
Figure 2. January 2015: dataset used in TPDL paper; August 2015: dataset used in arXiv version
In analyzing web annotations in Hypothes.is, I have seen annotations created only to test how the system works (e.g., some annotations in Hypothes.is carry the tag "test"). Although some annotations may not be useful, the majority are valuable to the community in different ways. For example, 9 of the 10 most annotated websites in Hypothes.is are related to education, academic research, or publishing.
The Hypothes.is annotation system offers free accounts that let users annotate the Web by, for example, adding tags or notes to highlighted text or to a web page as a whole. Hypothes.is supports collaborative work by letting users reply to each other's comments, as shown in Figure 3.
Figure 3. Annotating the Web Using Hypothes.is Annotation System
Web pages are not fixed resources: they may change or become unavailable at any time, and such changes can affect the associated annotations. Figure 4 shows the target URI http://climatefeedback.org/ as it appeared in December 2014. The highlighted text “Scientific feedback for Climate Change information online” on the page was annotated with “After reading about your project at MIT news, I visited your page and ...”. By August 2015, this annotation could no longer be attached to the target web page because the highlighted text no longer appeared on the page, as shown in Figure 5. Although the live Web version of http://climatefeedback.org/ has changed and the annotation was in danger of being orphaned, the original version that was annotated has been archived and is available at the Internet Archive. The annotation could be re-attached to this archived resource, or memento.
Figure 4. http://climatefeedback.org/ in December 2014
Figure 5. http://climatefeedback.org/ in August 2015
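The attachment test in this example can be approximated with a simple text search. The following is only a minimal sketch, not the matching procedure used in the paper: the function names `normalize` and `can_attach` are hypothetical, and it assumes that an annotation "can be attached" when its highlighted text still appears in the page's text after whitespace is collapsed. Real annotation clients may use more tolerant matching.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line breaks in the page do not matter."""
    return re.sub(r"\s+", " ", text).strip()


def can_attach(highlighted_text: str, page_text: str) -> bool:
    """Assumed rule: the annotation attaches if the highlighted text
    still occurs verbatim (modulo whitespace) in the page text."""
    needle = normalize(highlighted_text)
    # An empty highlight gives nothing to anchor on, so treat it as unattachable.
    return bool(needle) and needle in normalize(page_text)
```

Under this assumption, the same check can be run against the live page and against each candidate memento.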
- Safe - The annotation can be attached to the target live web page and also to at least one memento.
- In Danger - The annotation can be attached to the target live web page but not to any memento. In this case, if the live web page changes such that the associated annotations become unattached, these annotations would, unfortunately, become orphaned.
- Re-attached - The annotation can no longer be attached to the live web page but, fortunately, it can be re-attached to at least one memento from public web archives.
- Orphaned - The annotation can be attached neither to the live web page nor to any memento.
Safe and re-attached annotations can be recovered from web archives, so they are in a better situation than the other two categories. We want to make annotations in the second category (In Danger) safe or re-attached by archiving their target web pages. We can do nothing about annotations in the Orphaned category. They are lost.
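The four categories above reduce to two yes/no questions: can the annotation be attached to the live page, and can it be attached to at least one memento? A minimal sketch of that decision follows; the names `Status` and `classify` are illustrative and do not come from the paper.

```python
from enum import Enum


class Status(Enum):
    SAFE = "safe"
    IN_DANGER = "in danger"
    REATTACHED = "re-attached"
    ORPHANED = "orphaned"


def classify(attaches_to_live: bool, attaches_to_memento: bool) -> Status:
    """Map the two attachment checks onto the four categories."""
    if attaches_to_live:
        # Live page still carries the annotation; an archived copy makes it safe.
        return Status.SAFE if attaches_to_memento else Status.IN_DANGER
    # Live page lost the annotation; only an archived copy can recover it.
    return Status.REATTACHED if attaches_to_memento else Status.ORPHANED
```

Here `attaches_to_memento` is meant to be true if the annotation attaches to any of the mementos tested for its target page.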
We used the LANL Memento Aggregator to look for archived copies of web pages (mementos) in the public archives. More specifically, we looked for the mementos closest to each annotation's creation date. In the example shown in Figure 4, we would need to find the closest mementos captured immediately before and after the annotation creation date (e.g., December 3, 2014 at 12:47 AM for the web page http://climatefeedback.org).
Figure 6(a) shows an example where mementos are available before and after the annotation creation date. In this example, only M1 and M3 will be tested to see if the associated annotations can be re-attached to these mementos. Figure 6(b) shows mementos that are only available before the annotation creation date while Figure 6(c) shows mementos that are only available after the annotation date. Finally, Figure 6(d) shows annotations that have no existing mementos for their target web pages in the web archives.
Figure 6. Discovering Mementos for Annotations' Target Web Pages
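Choosing the closest mementos before and after an annotation's creation date can be sketched as a lookup in a sorted list of capture times. This is an illustration only: it assumes the capture datetimes have already been obtained (for example, from an aggregator's response), and the function name `closest_mementos` is hypothetical. A capture made exactly at the creation time is counted here as "after", which is an arbitrary choice.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Optional, Tuple


def closest_mementos(
    capture_times: List[datetime], created: datetime
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Return (closest capture before, closest capture at/after) the creation
    time; either element is None when no such memento exists."""
    times = sorted(capture_times)
    i = bisect_left(times, created)  # index of first capture >= created
    before = times[i - 1] if i > 0 else None
    after = times[i] if i < len(times) else None
    return before, after
```

The four situations in Figure 6 correspond to the four possible shapes of the result: both values present, only `before`, only `after`, or neither.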
Figure 7. The Status of Current Hypothes.is Annotations
With 60% of annotations orphaned or in danger of being orphaned, we conclude that archiving web pages at the time of annotation is important for avoiding orphaned annotations.
-- Mohamed Aturban