Posts

2018-02-27: Summary of Gathering Alumni Information from a Web Social Network

While researching my dissertation topic (slides 2--28) on social media profile discovery, I encountered a related paper titled "Gathering Alumni Information from a Web Social Network", written by Gabriel Resende Gonçalves, Anderson Almeida Ferreira, and Guilherme Tavares de Assis, which was published in the proceedings of the 9th IEEE Latin American Web Congress (LA-WEB). In this paper, the authors detail a semi-automated method for gathering information about alumni of a given undergraduate program at Brazilian higher education institutions. Specifically, they use the Google Custom Search Engine (CSE) to identify candidate LinkedIn pages based on a comparative evaluation of similar pages in their training set. The authors contend that alumni are found efficiently through their process, which is facilitated by focused crawling of data that the alumni themselves have posted publicly on social networks. The proposed methodology...

2018-01-08: Introducing Reconstructive - An Archival Replay ServiceWorker Module

Web pages are generally composed of many resources such as images, style sheets, JavaScript, fonts, iframe widgets, and other embedded media. These embedded resources can be referenced in many ways (such as a relative path, an absolute path, or a full URL). When the same page is archived and replayed from a different domain under a different base path, these references may not resolve as intended and hence may result in a damaged memento. For example, a memento (an archived copy) of the web page https://www.odu.edu/ can be seen at https://web.archive.org/web/20180107155037/https://www.odu.edu/. Note that the domain name has changed from www.odu.edu to web.archive.org and some extra path segments have been added to it. In order for this page to render properly, various resource references in it are rewritten; for example, images/logo-university.png in a CSS file is replaced with /web/20171225230642im_/http://www.odu.edu/etc/designs/odu/images/logo-university.png. Traditiona...
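
The rewriting step described in this excerpt can be sketched in a few lines of Python. This is only an illustration of the general idea, not the Reconstructive module or any archive's actual rewriter: the function name `rewrite_reference`, its parameters, and the CSS file name `style.css` are hypothetical, and the `im_` modifier is simply copied from the example above.

```python
from urllib.parse import urljoin


def rewrite_reference(reference: str, base_url: str, timestamp: str,
                      modifier: str = "im_") -> str:
    """Rewrite a resource reference so it resolves under an archive's replay path.

    reference: the reference as written in the page or style sheet
               (relative path, absolute path, or full URL).
    base_url:  the ORIGINAL URL of the document containing the reference.
    timestamp: the 14-digit capture datetime used in the replay path.
    modifier:  replay modifier appended to the timestamp (assumed here).
    """
    # Resolve the reference against the original location first, so that
    # relative paths, absolute paths, and full URLs all become full URLs.
    absolute = urljoin(base_url, reference)
    # Prefix the archive's replay path to the resolved original URL.
    return f"/web/{timestamp}{modifier}/{absolute}"


# Hypothetical CSS file location; only its directory matters for this example.
css_url = "http://www.odu.edu/etc/designs/odu/style.css"
print(rewrite_reference("images/logo-university.png", css_url, "20171225230642"))
```

Resolving against the original base URL before adding the prefix is what lets the same function handle all three reference styles mentioned in the excerpt.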

2018-01-07: Review of WS-DL's 2017

Great writeup of #jcdl2017 in Toronto by @acnwala, featuring @oducs @WebSciDL (2 faculty, 2 alums, 3 grad students) https://t.co/wSirB8Jhq9 pic.twitter.com/HM0XePiz8u -- ODU Computer Science (@oducs) July 28, 2017

.@WebSciDL luncheon, joint w/ Dr Li's group and several prospective students. pic.twitter.com/uvgDpHmPWc -- Michael L. Nelson (@phonedude_mln) February 10, 2017

The Web Science and Digital Libraries Research Group had a steady 2017, with one MS student graduated, one research grant awarded ($75k), 10 publications, and 15 trips to conferences, workshops, hackathons, internships, etc. In the last four years (2016--2013) we have graduated five PhD and three MS students, so the focus for this year was "recruiting" and we did pick up seven new students: three PhD and four MS. We had so many new and prospective students that Dr. Weigle and I created a new CS 891 web archiving seminar to introduce them to web archiving and graduate schoo...

2018-01-06: Two WSDL Classes Offered for Spring 2018

Two Web Science & Digital Library (WS-DL) courses will be offered in Spring 2018: CS 725/825 "Information Visualization", Wednesdays 9:30-12:15 pm (CRNs 24134 & 24135), offered by Dr. Michele C. Weigle. This will be an updated version of the same course most recently taught in Fall 2017 (from which the figure above is taken). CS 432/532 "Web Science", Tuesdays 4:20-7 pm (CRNs 24600 & 24601), offered by Alexander Nwala. This will be an updated version of the course most recently taught by Dr. Michael Nelson in Spring 2017. Although they are not WS-DL courses per se, WS-DL member Corren McCoy is also teaching CS 462 "Cybersecurity Fundamentals" again this semester, and WS-DL alumnus Dr. Charles Cartledge is teaching two classes: CS 395 "Data Wrangling" and CS 395 "Data Analysis". --Michael

2018-01-02: Link to Web Archives, not Search Engine Caches

Fig. 1: Link TheFoundingSon Web Cache
Fig. 2: TheFoundingSon Archived Post

In a recent article in Wired, "Yup, the Russian propagandists were blogging lies on Medium too," Matt Burgess makes reference to three now-suspended Twitter accounts: @TheFoundingSon (archived), @WadeHarriot (archived), and @jenn_abrams (archived), and their activity on the blogging service Medium.

Fig. 3: TheFoundingSon Suspended Medium Account

Burgess reports that these accounts were suspended on Twitter and Medium, and quotes a Medium spokesperson as saying: "With regards to the recent reporting around Russian accounts specifically, we’re paying close attention and working to ensure that our trust and safety processes continue to evolve and identify any accounts that violate our rules." Unfortunately, to provide evidence of the pages' former content, Burgess links to Google caches instead of web archives. At the time of this writing, two of the three links fo...

2017-12-31: Digital Blackness in the Archive - DocNow Symposium Trip Report

Digital Blackness in the Archive was such a beautiful event. Thank you. #BlackDigArchive pic.twitter.com/SpoWYWDOhL -- DocumentingTheNow (@documentnow) December 15, 2017

From December 11-12, 2017, I attended the second Documenting the Now Symposium in St. Louis, MO. The meeting presentations were recorded and are available along with an annotated agenda; for further background about the Documenting the Now project and my involvement via the advisory board, I suggest my 2016 trip report, as well as DocNow activity on github, slack, and Twitter. In addition, the meeting itself was extensively live-tweeted with #BlackDigArchive (see also the data set of Tweet ids collected by Bergis Jules).

kicking off @documentnow #BlackDigArchive at @fergusonlibrary! Livestream: https://t.co/K2XbN2HJdi pic.twitter.com/mVEBXSALTm -- bibliotekah (@tttkay) December 11, 2017

Awesome keynote by @amplify285! So glad she could be here. Just found out she’s also a @WUSTL a...