Posts

Showing posts from January, 2017

2017-01-23: Finding URLs on Twitter - A simple recommendation

As part of a research experiment, I needed to find URLs embedded in tweets using Twitter's web search service. Most of the URLs were much older than 7 days, so the Twitter search API was not an option, since it searches only a sample of tweets published in the past 7 days; I used the web search service instead. I began the experiment by pasting URLs from tweets into the search box on twitter.com. I was able to find some URLs embedded in tweets, but not always. Based on my observations, finding the URLs was not correlated with the age of the tweet. I discussed this observation with Ed Summers, and he recommended adding a "url:" prefix to the URL before searching. For example, if the search URL is "http://www.cnn.com", he recommended searching for "url:http://www.cnn.com". I observed that prepending search URLs with the "url:" prefix improved my search success rate. For example, the se…

2017-01-20: CNN.com has been unarchivable since November 1st, 2016

CNN.com has been unarchivable since 2016-11-01T15:01:31, at least by the common web archiving systems employed by the Internet Archive, archive.is, and webcitation.org. The last known correctly archived page in the Internet Archive's Wayback Machine is 2016-11-01T13:15:40, with all versions since then producing some kind of error (including today's, 2017-01-20T09:16:50). This means that the most popular web archives have no record of the time immediately before the presidential election through at least today's presidential inauguration. Given the political controversy surrounding the election, one might conclude this is part of some grand conspiracy equivalent to those found in the TV series The X-Files. But rest assured, this is not the case; the page was archived as is, and the reasons behind the archival failure are not as fantastical as those found in the show. As we will explain below, other archival systems have successfully archived CNN.com during this period (e…

2017-01-15: Summary of "Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data"

Based on the paper:

Kuhn, T., Dumontier, M.: Trusty URIs: Verifiable, immutable, and permanent digital artifacts for linked data. Proceedings of the European Semantic Web Conference (ESWC) pp. 395–410 (2014).

A trusty URI is a URI that contains a cryptographic hash value of the content it identifies. The authors introduced this technique of using trusty URIs to make digital artifacts, especially those related to scholarly publications, immutable, verifiable, and permanent. With the assumption that a trusty URI, once created, is linked from other resources or stored by a third party, it becomes possible to detect if the content that the trusty URI identifies has been tampered with or manipulated on the way (e.g., to prevent man-in-the-middle attacks). In addition, trusty URIs can verify the content even if it is no longer found at the original URI but can still be retrieved from other locations, such as Google's cache and web archives (e.g., the Internet Archive).
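To make the idea concrete, here is a minimal sketch of a hash-bearing URI and its verification. It is a simplified illustration, not the encoding defined in the paper: the function names, the "." separator, the choice of SHA-256 with URL-safe Base64, and the example.org URI are all assumptions made for this example.

```python
import base64
import hashlib


def make_hashed_uri(base_uri: str, content: bytes) -> str:
    """Append a URL-safe SHA-256 digest of `content` to `base_uri`.

    Hypothetical scheme for illustration only; the real trusty URI
    format uses its own module identifiers and encoding rules.
    """
    digest = hashlib.sha256(content).digest()
    # URL-safe Base64 never contains ".", so "." is a safe separator here.
    suffix = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"{base_uri}.{suffix}"


def verify_hashed_uri(uri: str, content: bytes) -> bool:
    """Return True if the hash embedded in `uri` matches `content`."""
    base_uri, _, _suffix = uri.rpartition(".")
    if not base_uri:
        return False
    # Recompute the URI from the content and compare with the one given.
    return make_hashed_uri(base_uri, content) == uri


if __name__ == "__main__":
    artifact = b"some scholarly artifact"
    uri = make_hashed_uri("http://example.org/r1", artifact)
    print(uri)
    print(verify_hashed_uri(uri, artifact))         # content unchanged
    print(verify_hashed_uri(uri, artifact + b"!"))  # content modified
```

Because the check needs only the URI and the bytes, it works the same way whether the content comes from the original location or from a copy held in a cache or web archive.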

The core …

2017-01-08: Review of WS-DL's 2016

The Web Science and Digital Libraries Research Group had a productive 2016, with two Ph.D. and one M.S. students graduating, one large research grant awarded ($830k), 16 publications, and 15 trips to conferences, workshops, hackathons, etc.

For student graduations, we had:
Justin Brunelle defended his Ph.D. dissertation on February 5, 2016. Justin already had a full-time position at MITRE, but not coincidentally he had his choice of significant promotions at the conclusion of his Ph.D.
Yasmin AlNoamany defended her Ph.D. dissertation on June 16, 2016. Yasmin had several opportunities, and eventually decided on a postdoc fellow position in Software Curation at UC Berkeley, with Dr. Erik Mitchell.
Greg Szalkowski completed his M.S. in 2016 as well. We had hoped to keep him on for a Ph.D., but he's having too much fun traveling the world setting up military communications solutions.

Other student advancements:
Shawn Jones passed his breadth exam.
Alexander Nwala passed his breadth exam.

2017-01-07: Two WS-DL Classes Offered for Spring 2017

"One of the primary reasons I got hired was because of the [@WebSciDL] courses I took at ODU." -@prasanna_sajjan (@ODUnow, @oducs, MS '16) pic.twitter.com/hGJtRIQ8JY — ODU Computer Science (@oducs) December 5, 2016

Two WS-DL classes are offered for Spring 2017:

CS 725/825 - Information Visualization, Dr. Weigle
CS 432/532 - Introduction to Web Science, Dr. Nelson

Information Visualization is being offered both online (CRNs 26614/26617 (HR), 26615/26618 (VA), 26616/26619 (US)) and on-campus (CRNs 24698/24699). Web Science is offered on-campus only (CRNs 25728/25729). Although it's not a WS-DL course per se, WS-DL member Corren McCoy is also teaching CS 462/562 Cybersecurity Fundamentals this semester (see this F15 offering from Dr. Weigle for an idea about its contents).

--Michael