Posts

2012-04-12: Ruby on Rails Presentation for the ODU ACM

Recently I was invited to present Ruby on Rails as a rapid Web application development platform at an ODU ACM meeting. I divided the session into three phases: a presentation, followed by live development of an example application, and Q&A. The example application was a simple RESTful, database-driven application to store attendees' records, with several attributes of various data types. I started the presentation with an overview of the Ruby language. Ruby is a fairly new general-purpose, multi-paradigm (including functional, object-oriented, imperative and reflective) language, with syntax inspired by Python, Smalltalk and many other languages. Ruby is designed for programmer productivity and fun; hence, its primary focus is human rather than computer needs. This philosophy goes very well with rapidly changing client needs. Often, by the time programmers are done developing the application as per the specifications, the client's needs have already changed. Hence, R

2012-03-08: ResourceSync NISO Telecon

On March 6, 2012 we held a ResourceSync telecon to explain the goals of the project, present some preliminary technology explorations, and solicit potential partners in the development of a NISO standard. The project is a joint effort between NISO and the Open Archives Initiative (OAI), and is funded by the Alfred P. Sloan Foundation. The details of the project are yet to be decided, but the focus is on exploring differing modalities for change notification (CN) and content transfer (CT). We are exploring a variety of push technologies to augment conventional harvesting technologies (e.g., RSS, Atom, OAI-PMH). More details can be found in the slides Herbert and Rob covered during the telecon: "ResourceSync: Conceptual and Technical Problem Perspective" from Herbert Van de Sompel. The ResourceSync team had a face-to-face meeting in Baltimore, February 2-3, 2012, where we settled on some of the basic project parameters and discussed an early prototype. Prior t

2012-02-24: Personal Digital Archiving 2012

For its third consecutive year, the Personal Digital Archiving conference took place at the Internet Archive in San Francisco, CA. Ahmed and I attended a diverse range of fascinating sessions on how people think about creating and preserving personal digital archives. The environment was very nice and friendly (there were a baby and a dog on the second day ^_^). The conference was held on Feb. 24 and Feb. 25, 2012. The first day started at 9:00 am with a keynote and welcome intro by Brewster Kahle about the Internet Archive and personal archives. Brewster gave a quick intro to the history of the Internet Archive and asked an important question, “what would we want out of the Internet Archive in terms of preserving stuff that individuals are creating?”, which can be answered by knowing how to collect materials and make them useful for the people at this conference. Mike Ashenfelder from the Library of Congress gave a talk entitled “Personal Digital Archive Advice for the General Publ

2012-02-11: Losing My Revolution: A year after the Egyptian Revolution, 10% of the social media documentation is gone.

The Egyptian revolution of January 25, 2011 was unlike any other revolution in history because of the role of social media. Several blogs, Storify entries, web pages, and YouTube channels were created to document the revolution. Several books were even published documenting the 18 days. All of these contributions were made by the public, not historians, using the tools of Web 2.0. As a result, we have an enormous body of digital content, including thousands of posts, tweets, images, videos and sound files narrating and documenting the revolution. Unfortunately, by the first anniversary of the revolution, over 10% of this digital content is already gone. Websites like Twitter, YouTube, Facebook, Storify, 1000Memories, Blogger and IAmJan25 have allowed the public to document the events of the revolution in real time. Storify, for example, allows the user to create a timed organized collection of tweets, links, images, posts, map locations or vid

2012-02-05: Superbowl 46

Superbowl 46 is today, and whether you love football or just watch for the commercials, you are in for some entertainment tonight. Tonight's matchup is one of the closest in recent history. There is no doubt that New England has a great offense led by Tom Brady. New England has an offensive passing efficiency of 7.65 yards per play compared to a league average of 5.97. The Giants, led by Eli Manning, are not far behind with a passing efficiency of 7.32 yards per play. Both teams are in the top five for offensive passing. However, the differences are more dramatic on the defensive side of the house. The Giants have given up 5.97 yards per play, which is the league average. The Patriots rank 29th in pass defense and have given up an average of 6.68 yards per play. Running the algorithms the same way we have all year has the Patriots winning the Superbowl. The predicted margin of victory matches the Vegas line exactly, so this will be a close game. Because this is Sc

2012-01-23: Release of Warrick 2.0 Beta

After a long hiatus, the Warrick tool has been resurrected with some modifications. Warrick is a free utility for reconstructing (or recovering) a website. The original version of Warrick discovered archived versions of web resources by searching the Web Infrastructure (which includes search engine caches and the Internet Archive). It would automatically download and organize the best versions of the archived resources and package them into a copy of the deleted site. As discussed by Warrick's creator, Frank McCown, the original version of Warrick was prone to breaking due to frequent changes to search engine APIs and archive URLs. Warrick 2.0, adapted from Dr. McCown's original code by Justin F. Brunelle, interfaces with the Memento framework via the mcurl program (developed by Ahmed AlSum). By incorporating Memento timemaps, Warrick no longer has the responsibility of directly searching and communicating with the caches and archive

2012-01-22: 2011 NFL Season Conference Championship

The NFL conference championship games are today. Our models tend to reward teams that can pass the ball well, as passing efficiency correlates rather well with wins. Therefore it is no surprise that two of the three models predict New England will win over Baltimore. However, the neural network predicts that it will be a close game and that New England will not cover the spread of 7 points. The San Francisco / New York game is going to be a good game to watch. Both teams are very close, but the Giants have the edge on passing efficiency.

Favorite  Spread  Underdog  Discrete  Pagerank
At NE     4       BAL       NE        BAL
At SF     1       NYG       NYG       SF

-- Greg Szalkowski

2012-01-01: 2011 NFL Season Week 17

The last week of the regular season is here. Week 17 traditionally exhibits greater statistical dispersion than the other weeks. Teams that have locked in playoff spots will be resting their starting players, and teams that do not have a chance at the playoffs may be looking for a better draft pick for next year. Our algorithms have once again picked Green Bay to win, but most likely they will rest Aaron Rodgers and most of the starters, and Detroit will win the game. Green Bay is an enigma this year: they are 14-1 so far, yet they have given up more yards than they have gained over the year, which invites some interesting analysis.

Favorite  Spread  Underdog  Discrete  Pagerank
At PHI    10      WAS       PHI       PHI
At ATL    14      TB        ATL       ATL
SF        5       At STL    SF        SF
At MIN    6       CHI       CHI       CHI
At GB     8       DET       GB        GB

2011-12-15: 2011 NFL Season Week 15

So far this year all three of the prediction algorithms are 68% correct straight up. This is better than the predictions of most of the NFL "experts" such as the guys at ESPN. Last year we ended up right below 70% correct as well. Breaking the 70% barrier over the season seems to be rather hard to do, as seen on the Prediction Tracker. Looking into the statistics of the games we missed reveals some interesting information. In the majority of those games, the losing team had better box scores but still lost the game. We had thought that incorporating the betting line data this year would have an impact, but the accuracy of the straight-up predictions is not significantly better than last year. The season isn't over yet and anything can happen, so here are the predictions for week 15.

Favorite  Spread  Underdog  Discrete  Pagerank
DAL       7       at TB     DAL       DAL
at NYG    10      WAS       NYG       NYG

2011-12-14: Python & Memento Presentation for the ODU ACM

Earlier this semester, I was invited to present Python at an ODU ACM meeting. I presented a brief overview of the Python language and followed up with a walkthrough of the code I use to parse Memento timemaps in my current research. Python, of course, has advantages and disadvantages compared to other languages. Since most ODU undergrads have experience with C++, the presentation introduces Python with respect to C++. Python's advantages include a fast development cycle and an extensive collection of community libraries. Its primary disadvantage compared to C++ is execution speed. My experience is that Python is sometimes over 100 times slower. Python's basic syntax and semantics are straightforward, so the presentation focused on the Python equivalents of commonly used C++ constructs and the differences between static (C++) and dynamic (Python) typing. Python's implementation of high-level data types (lists, dictionaries, tuples, and sets) and functional code

2011-12-14: CS 495/595 Web Server Development for Spring 2012

The only WS-DL related class that will be offered in spring 2012 is CS 495/595 "Web Server Development". I had planned to offer CS 751/851 "Introduction to Digital Libraries", but I've taught that the last two springs and it has been a while since I've taught the web server development class (the last offering was actually from Martin Klein in spring 2010). The premise of this course is that the best way to really get to know HTTP is to build a fully functional web server from scratch in the language of your choice. That sounds simple enough, but it becomes quite challenging, in part because if you do a poor job of design at the beginning, you have to live with the consequences the entire semester. On the other hand, do a good job up front and each assignment will just drop into place (hello, software design). Along the way, you'll also become quite familiar with reading RFCs and the REST architectural model. Take a look at past offerings of the c

2011-12-08: Summer Microsoft Internship

It all started in the San Francisco airport while I was waiting for my luggage on my way to the PDA2011 conference. The recruiter from Microsoft called to inform me that I had been accepted to intern at Microsoft Silicon Valley this summer. I was ecstatic, and after a couple of months of bureaucracy and a ton of documents I was ready to leave Norfolk by the end of May. Since I hadn’t been on an adventure or a trip for a long time, and since I would definitely need a car in California for the three months of the summer, I decided to drive my car all the way across the continent. I have always wanted to make a road trip like that, where I can stop in every city or town along the way, check out the attractions and try the authentic local cuisine. At the same time, our colleague and best friend Moustafa Aly managed to secure a job at Amazon’s engineering office in San Francisco. So when he learned I was going to drive all the way there he told me: “forget the plane, I will join you!” We left Nor