Posts

2013-03-27: ResourceSync Meeting and JCDL 2013 PC Meeting

On March 21 & 22, members of the ResourceSync technical group met in Ann Arbor, Michigan to work on the 0.5 version of the ResourceSync specification. In case you're not familiar, ResourceSync is a framework, intended to replace OAI-PMH, for specifying how a destination ("harvester" in PMH terms) can synchronize the web resources of a source ("repository" in PMH terms). The source publishes a list of resources that it makes available via ResourceSync (which may be a subset of valid resources at the web site) using Sitemaps, with the idea that if you're already using Sitemaps then you are already minimally compliant, and the more advanced features of ResourceSync also use the Sitemap syntax for consistency. Although the syntactic details are in flux, Herbert's presentation at the September 2012 NISO Forum is a good introduction to the framework, as are the two recent D-Lib Magazine articles (Sept/Oct 2012 and Jan/Feb 2013). Some important

2013-03-22: NTRS, Web Archives, and Why We Should Build Collections

At the ResourceSync meeting this week, Simeon Warner brought to my attention the fact that the NASA Technical Reports Server (NTRS) digital library had gone offline on March 19. Although I have not been involved with it since about 2004, I was the creator of NTRS and it was a central part of my early career. If you click on http://ntrs.nasa.gov/ now, you get a message saying the service is down. Technically, you get an "HTTP/1.1 503 Service Temporarily Unavailable" message:

    $ curl -I http://ntrs.nasa.gov/
    HTTP/1.1 503 Service Temporarily Unavailable
    Date: Sat, 23 Mar 2013 04:00:14 GMT
    Server: Apache/2.2.3 (Red Hat)
    Last-Modified: Fri, 22 Mar 2013 12:50:02 GMT
    ETag: "720003-300-4d882e4c05280"
    Accept-Ranges: bytes
    Content-Length: 768
    Connection: close
    Content-Type: text/html; charset=UTF-8

And the body of the page says: The NASA technical reports server will be unavailable for public access while the agency conducts a review of the site's conten
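A minimal sketch, in Python, of how the status line in the curl output above could be interpreted. The function names are hypothetical and introduced only for illustration; the sample string is the status line quoted in the post.

```python
def parse_status_line(line):
    """Split an HTTP status line into (version, status code, reason phrase)."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


def is_service_unavailable(line):
    """Return True when the status code in the line is 503."""
    return parse_status_line(line)[1] == 503


# Status line copied from the curl output quoted in the post.
status = parse_status_line("HTTP/1.1 503 Service Temporarily Unavailable")
print(status)
```

A 503 signals a temporary condition on the server side, which is why a harvester or crawler would typically retry later rather than treat the resource as gone.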

2013-03-02: NFL 2013 Salary Cap

The NFL salary cap for 2013 has been calculated to be about $123 million. All NFL teams must be in compliance with the salary cap by March 12th, when the new league year starts. March 12th also marks the start of the free agent market in the NFL. Teams that are over the salary cap must let some players go, and teams that are under the salary cap are looking to add new players to their rosters. The process sounds simple on the surface, but in reality it becomes confusing rather quickly. Many teams routinely exceed the salary cap by manipulating contracts. The Pittsburgh Steelers were about $14 million over the cap until they modified Ben Roethlisberger's contract and changed most of his pay into a signing bonus. Signing bonuses can be amortized over the life of a contract. Instead of receiving an $18 million salary, the player gets a $2 million salary and a $16 million bonus. The bonus will be divided by the number of years in the contract and thus reduce the i
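The amortization arithmetic described above can be sketched as follows. This is an illustration only: the function name is hypothetical, and the four-year contract length is an assumption, since the post does not state the length of the contract. Dollar figures are in millions and come from the example in the text.

```python
def yearly_cap_hit(salary, signing_bonus, contract_years):
    """Cap charge for one year: salary plus an equal share of the signing bonus."""
    if contract_years <= 0:
        raise ValueError("contract_years must be positive")
    return salary + signing_bonus / contract_years


# ASSUMPTION: a 4-year contract; the post does not give the actual length.
YEARS = 4

before = yearly_cap_hit(18, 0, YEARS)   # all $18M paid as salary
after = yearly_cap_hit(2, 16, YEARS)    # $2M salary + $16M bonus spread over YEARS
savings = before - after

print(before, after, savings)
```

Under that assumed length, the same $18 million in pay counts as $6 million against the current year's cap instead of $18 million, with the rest of the bonus charged to later years.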

2013-02-24: Personal Digital Archiving 2013

On February 21-22, Justin Brunelle (@justinfbrunelle) and I (@machawk1) traveled to College Park, Maryland for Personal Digital Archiving (PDA) 2013. Other members of the Web Science and Digital Libraries Research (WS-DL) Group at ODU had previously attended this conference (see 2012 Trip Report and 2011 Trip Report), which had always previously been held at the Internet Archive in San Francisco, and knew it would be informative and extremely relevant to both of our research efforts. We had both been anticipating a few of the presentations, namely the keynotes by Sally Bedell Smith and George Sanger and those that Erin Engle (@erinengle) promised on the Library of Congress digital preservation blog The Signal. For the sake of preservation, I captured videos of many of the presentations, which I posted on Internet Archive. Each available video is linked inline in this post, but for a more original experience, view the videos. As our sole mission at WS-DL is not only to document conferences (ok, admit

2013-01-18: NFL Conference Championship Predictions

The NFL Conference Championship games are this weekend and just one game separates the four remaining teams from Super Bowl XLVII. If you ignore the vapidity of the Te'o coverage, there is much discussion of how the loss of Rob Gronkowski will impact the performance of the Patriots this weekend. Aaron Hernandez should perform admirably and the entire team should be able to make up the difference and triumph. For our predictions we run a number of different types of algorithms in our research and compare the outputs. The three main algorithms that have consistently had the best performance are a Support Vector Machine (SVM), a Multilayer Perceptron Neural Network, and a ranking algorithm. All three algorithms sided with the favorites. The SVM gives us a binary output, winner or loser. There is nothing in between. The SVM chose San Francisco and New England. The Neural Network output is a continuous variable that is supposed to be the margin of victory. A positive score fa

2013-01-13: Three WS-DL Classes Offered for Spring 2013

Three WS-DL classes are offered for the Spring 2013 semester: one undergraduate elective and two upper-level graduate courses.
CS 418 Web Programming - This is a follow-on to last semester's Web Programming course. This semester it will be taught by PhD student Scott Ainsworth, who has extensive experience in this area. Students will learn to program in a LAMP environment.
CS 795/895 Applied Visual Analytics - In this class, taught by Dr. Weigle, students will review basic data mining and information visualization techniques and then work together in groups on particular challenges from the visual analytics community.
CS 895 Web-Based Information Retrieval - Taught by Dr. Nelson, this class will be a review of IR models, ranking, evaluation, DM/ML, etc.
CS 418 counts toward the Web Programming Minor, and the upper-level graduate classes will count toward the 24 hours of course work required for the PhD. The deadline to register is January 22.
--Michael

2013-01-10: NFL Divisional Playoff Predictions

For the NFL Divisional playoff week the predictions for all of our algorithms are in agreement. For our predictions we run a number of different types of algorithms in our research and compare the outputs. The three main algorithms that have consistently had the best performance are a Support Vector Machine (SVM), a Multilayer Perceptron Neural Network, and a ranking algorithm. All three algorithms sided with the favorites except for Atlanta. All three picked Seattle for the upset. All season long our algorithms have consistently shown that Atlanta is over-rated. Yes, they have won most of their games this season, but they have had the easiest strength of schedule this year out of all of the NFL teams. ESPN's adjusted strength of schedule shows that Atlanta has had an easy season. Big Lead Sports states that not only was Atlanta's season easy but it was the easiest season for any NFL team in a number of years. Most of our algorithms take the strength of sched

2013-01-05: NFL Playoff Predictions

The wildcard week of the playoffs is upon us. The numbers were crunched and the results were rather predictable. In three of the games the home team is the favorite to win for both the Support Vector Machine (SVM) and the PageRank model. For the fourth game the Seahawks were chosen by both the SVM and the PageRank model. The SVM gives us a binary result, so there is no degree or way to judge how close a game it may be. Our numbers indicate that the Redskins-Seahawks game is going to be close and probably a low-scoring game. Both teams like to run the ball, but the Seahawks' defense has performed better than the Redskins' this year. What may be interesting is that the Seahawks are a 2 to 3 point favorite and they are the visiting team. Our previous research has shown that home team underdogs are often a good bet to cover the spread. Vergin and Sosik found that not only has the home underdog been viable in some years but that the effect was more pronounced on nationally televised games ver

2012-12-21: The Performance of Betting Lines for Predicting the Outcome of NFL Games

It was the first week of the 2007 National Football League (NFL) season. After waiting all summer for the NFL season to begin, the fans were rabid with anticipation. The airwaves were filled with sportscasters debating the prospects of teams from both conferences and how they would perform. Of particular interest was the New England Patriots. They had two starters out with injuries and their star receiver, Randy Moss, was questionable for the game. New England was playing against the NY Jets and their simmering rivalry added heat to the fire. Many of the sportscasters were lining up with the Jets and Vegas was favoring the Jets with a 6-point line at home. When betting opened for the game the action on the Patriots was heavy. The sheer volume of bets placed on New England to win forced the sportsbooks to move the spread in an attempt to equalize betting on both sides. Eventually the line moved all the way to New England being a seven-point favorite by game time. New England went

2012-12-20: NFL Power Rankings Week 16

The NFL Playoffs are only a few weeks away. With the end of the regular season in sight there are a few trends that subtly change the game. One of the trends is the weather. Tennessee is playing at Green Bay and snow is in the forecast. Another end-of-season trend is displayed by teams that have clinched playoff positions. They rest their starting lineup and play backup players. That is more of a week 17 phenomenon, but with Atlanta and Houston both at 12-2 for the season they may play some non-starters during the game. This ranking system is based on team performance and does not take trends like the weather into account. Our ranking system is based on Google's PageRank algorithm. It is explained in some detail in past posts. A directed graph is created to represent the current year's season. Each team is represented by a node in the graph. For every game played a directed edge is created from the loser pointing to the winner and it is weighted by the Margin of Victo
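The graph construction described above can be sketched as a small weighted PageRank computed by power iteration. This is not the authors' implementation: the function name, the team labels, the game results, the damping factor of 0.85, and the choice to use the raw margin as the edge weight are all assumptions made for illustration, since the excerpt is cut off before the weighting is fully described.

```python
def pagerank(games, damping=0.85, iterations=100):
    """Rank teams from a list of (loser, winner, margin) tuples.

    Each game is an edge from the loser to the winner, weighted by the margin.
    """
    teams = sorted({t for loser, winner, _ in games for t in (loser, winner)})
    n = len(teams)

    # Total outgoing weight per team: the sum of margins in games it lost.
    out_weight = {t: 0.0 for t in teams}
    for loser, _, margin in games:
        if margin <= 0:
            raise ValueError("margin of victory must be positive")
        out_weight[loser] += margin

    rank = {t: 1.0 / n for t in teams}
    for _ in range(iterations):
        # Teams with no losses have no outgoing edges; spread their rank evenly.
        dangling = sum(rank[t] for t in teams if out_weight[t] == 0)
        new_rank = {t: (1 - damping) / n + damping * dangling / n for t in teams}
        for loser, winner, margin in games:
            new_rank[winner] += damping * rank[loser] * margin / out_weight[loser]
        rank = new_rank
    return rank


# HYPOTHETICAL results: team B lost to A by 7, C lost to A by 3, C lost to B by 10.
games = [("B", "A", 7), ("C", "A", 3), ("C", "B", 10)]
ranks = pagerank(games)
for team in sorted(ranks, key=ranks.get, reverse=True):
    print(team, round(ranks[team], 3))
```

In this toy season the undefeated team A ends up ranked first, followed by B and then C, because rank flows along edges from losers to winners in proportion to the margin.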

2012-12-17: Archive-It Partners Meeting

I attended the 2012 Archive-It Partners Meeting in Annapolis, MD on December 3. I decided to attend at the last minute, and Kristine and Lori graciously let me have 5 minutes to talk about our project and upcoming NEH proposal. We're looking for humanities-types and Archive-It partners to work with in evaluating our visualizations. After my presentation, I was able to make contact with several potential partners. (Slides: Visualizing Digital Collections at Archive-It, from Michele Weigle.) There were several nice talks in the half-day session. The full schedule and slides from all of the presentations are available. Related to what we're working on, Alex Thurman from Columbia University Libraries talked about their local portal to their Human Rights collection (collection at Archive-It). They offer a rotated list of screenshots for featured sites and have tabs to show the collection pages by title, URL, subject, place, and language. One nice feature they'v

2012-12-14: InfoVis at Grace Hopper

I was selected to give a 5-minute faculty lightning talk at the Grace Hopper Celebration of Women in Computing in October in Baltimore. Short talks are among the most difficult to prepare, especially short talks for a general audience. I decided to increase my level of difficulty by combining two topics in my 5-minute talk, information visualization (infovis) and web archiving. I ended up presenting a snapshot of the work that Kalpesh Padia and Yasmin AlNoamany did for their JCDL 2012 paper, Visualizing Digital Collections at Archive-It (see related blog post). (Slides: Information Visualization - Visualizing Digital Collections at Archive-It, from Michele Weigle.) The faculty lightning talks session was new at Grace Hopper, but went very well. We had a 45-minute session and got to hear about 8 totally different research projects. Info and slides from all of the presentations are available on the GHC wiki. Especially for work-in-progress, this format was a great w

2012-11-10: Site Transitions, Cool URIs, URI Slugs, Topsy

Recently I was emailing a friend and wanted to update her about the recent buzz we have enjoyed with Hany SalahEldeen's TPDL 2012 paper about the loss rate of resources shared over Twitter. I remembered that an article in the MIT Technology Review from the Physics arXiv blog started the whole wave of popular press (e.g., MIT Technology Review, BBC, The Atlantic, Spiegel). To help convey the amount of social media sharing of these stories, I was sending links to the sites using the social media search engine Topsy. Although I only recently discovered it, Topsy has quickly become one of my favorite sites. It does many things, but the part I enjoy most is the ability to prepend "http://topsy.com/" to a URI to discover how many times a URI has been shared and who is sharing it. For example: http://www.bbc.com/future/story/20120927-the-decaying-web becomes: http://topsy.com/http://www.bbc.com/future/story/20120927-the-decaying-web and you can see all the tweets th
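The prepending trick described above amounts to simple string concatenation. A minimal sketch, with a hypothetical helper name, using only the prefix and the example URI given in the post:

```python
TOPSY_PREFIX = "http://topsy.com/"


def topsy_uri(uri):
    """Build the Topsy lookup URI by prepending the Topsy prefix to the target URI."""
    return TOPSY_PREFIX + uri


# Example URI taken from the post.
example = topsy_uri("http://www.bbc.com/future/story/20120927-the-decaying-web")
print(example)
```

Note that the target URI is appended as-is, without percent-encoding, which matches the form of the example in the text.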

2012-11-06: TPDL 2012 Conference

It all started last April, particularly on the 9th, when I received an email from Dr. George Buchanan delivering the good news: my paper had been accepted at the annual international conference on Theory and Practice of Digital Libraries (TPDL 2012). Being the Program Chair, Dr. Buchanan sent me the reviews and feedback associated with my paper, which was entitled “Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?”, which paved the way in the following months for the preparation process to present this paper. Along with submitting the paper, Dr. Nelson gave me permission to submit my PhD proposal to be considered for the Doctoral Consortium at the conference. Scoring my second goal, Dr. Birger Larsen and Dr. Stefan Gradmann sent me a delightful email announcing the committee's acceptance of my proposal, and I was invited a day before the conference to present my work at the consortium. The hat-trick came a few weeks before