Posts

Showing posts from August, 2018

2018-08-30: Excited to Join WS-DL group in ODU!

I am an outlier compared with most computer scientists because I spent 10 years in a field called "Astronomy and Astrophysics". Very few computer scientists have followed the same path as I did, transferring from a seemingly unrelated major. But this is where my passion is, so I did it, and I made it!

Right after I graduated with a PhD in 2011, I joined the CiteSeerX group directed by Dr. C. Lee Giles at IST, Penn State University. I worked as a DBA for web crawling at the beginning and soon became the tech leader of the search engine, and recently the Co-PI of an NSF-awarded proposal on CiteSeerX. I spent six years as a postdoc, an unusually long time, and then was promoted to teaching faculty. However, I kept moving on, because I wanted to do research!

Luckily, Michael and Michele did not mind taking the risk and betting on me as a tenure-track faculty member at Old Dominion University. So I accepted the offer and became a member of the Web Science and Digital Libraries group at ODU CS.

I ap…

2018-08-25: Four WS-DL Classes Offered for Fall 2018

Four WS-DL classes are offered for Fall 2018:
CS 418/518 Web Programming is taught by Dr. Justin Brunelle, Tuesdays 4:20-7pm. This class teaches LAMP, the original web programming stack. Even if you end up using MEAN, you still need to know LAMP.
CS 431/531 Web Server Design is taught by Dr. Michael L. Nelson, Wednesdays 4:20-7pm. This class teaches REST, the primary architectural style for web programming, via implementing a fully functional web server from scratch.
CS 795/895 Intro to Data Science is taught by Dr. Sampath Jayarathna, Tuesdays & Thursdays, 5:45-7pm. This course will cover Python, machine learning, NumPy, pandas, and general data wrangling.
CS 795/895 Mining Scholarly Big Data is taught by Dr. Jian Wu, Tuesdays & Thursdays, 9:30-10:45am. This course will cover machine learning, data mining, and deep learning, as applied to the corpus of scholarly communication (via Dr. Wu's involvement in the CiteSeerX project).
Dr. Michele C. Weigle is not teaching this …

2018-08-01: A Preview of MementoEmbed: Embeddable Surrogates for Archived Web Pages

As I described to the audience of Dodging the Memory Hole last year, surrogates provide the reader with some clue about what exists behind a URI. The social card is one type of surrogate. Above we see a comparison between a Google URI and a social card generated from that URI. Unless a reader understands the structure of all URIs at google.com, they will not know what the referenced content is about until they click on it. The social card, on the other hand, provides clues to the reader that the underlying URI provides directions from Old Dominion University to Los Alamos National Laboratory. Surrogates allow readers to pierce the veil of the URI's opaqueness.

With the death of Storify, I've been examining alternatives for summarizing web archive collections. Key to these summaries are surrogates. I have discovered that there exist services that provide users with embeds. These embeds allow an author to insert a surrogate into the HTML of their blog post or other web page. These…