Showing posts from March, 2016

2016-03-22: Language Detection: Where to start?

Language detection is not a simple task, and no method achieves 100% accuracy. Many packages for detecting languages are available online, and I have used several methods and tools to detect the language of websites and of plain text. Here is a review of the methods I came across while working on my JCDL 2015 paper, How Well are Arabic Websites Archived?. Here I discuss detecting a webpage's language using the HTTP language header and the HTML language tag. In addition, I review several language detection packages, including Guess-Language, Python-Language Detector, LangID, and Google Language Detection API. Since Python is my favorite coding language, I searched for tools written in Python. I found that a primary way to detect the language of a webpage is to use the HTTP language header and the HTML language tag. However, only a small percentage of pages include the language tag and sometimes the detected language is affected by the browser setti
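
The header and tag checks mentioned in this excerpt can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the paper: it assumes the "HTTP language header" means the Content-Language response header and the "HTML language tag" means the lang attribute on the html element, and the function names (lang_from_header, lang_from_html, declared_language) are hypothetical. It uses only the standard library and works on headers and markup that have already been fetched.

```python
from html.parser import HTMLParser


class HtmlLangParser(HTMLParser):
    """Record the lang attribute of the first <html> tag seen."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            for name, value in attrs:
                if name == "lang" and value:
                    self.lang = value.strip().lower()


def lang_from_header(headers):
    """Return the language tags listed in a Content-Language header.

    Assumption: `headers` is a plain dict of response headers.
    """
    for name, value in headers.items():
        if name.lower() == "content-language":
            return [t.strip().lower() for t in value.split(",") if t.strip()]
    return []


def lang_from_html(html_text):
    """Return the lang attribute of the <html> element, or None."""
    parser = HtmlLangParser()
    parser.feed(html_text)
    return parser.lang


def declared_language(headers, html_text):
    """Prefer the HTTP header; fall back to the HTML lang attribute."""
    header_langs = lang_from_header(headers)
    if header_langs:
        return header_langs[0]
    return lang_from_html(html_text)


if __name__ == "__main__":
    page = '<html lang="ar"><head><title>x</title></head><body></body></html>'
    print(declared_language({"Content-Language": "ar, en"}, page))
    print(declared_language({}, page))
```

Both signals only report what the server or page author declared, so a page with neither returns None here; that is the case where a content-based package such as the ones listed above would be needed.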

2016-03-07: Archives Unleashed Web Archive Hackathon Trip Report (#hackarchives)

The Thomas Fisher Rare Book Library (University of Toronto)

From March 3 to March 5, 2016, librarians, archivists, historians, computer scientists, and others came together for the Archives Unleashed Web Archive Hackathon at the University of Toronto Robarts Library in Toronto, Ontario, Canada. This event gave researchers the opportunity to collaboratively develop open-source tools for web archives. The event was organized by Ian Milligan (assistant professor of Canadian and digital history in the Department of History at the University of Waterloo), Nathalie Casemajor (assistant professor in communication studies in the Department of Social Sciences at the University of Québec in Outaouais (Canada)), Jimmy Lin (the David R. Cheriton Chair in the David R. Cheriton School of Computer Science at the University of Waterloo), Matthew Weber (Assistant Professor in the School of Communication and Information at Rutgers University), and Nicholas Worby (the Government Information &

2016-03-07: Custom Missions in the COVE Tool

When I am not studying Web Sciences at ODU, I work as a software developer at Analytical Mechanics Associates. In general, my work there aims to make satellite data more accessible. As part of this mission, one of my primary projects is the COVE tool. The COVE tool allows a user to view where a satellite could potentially take an image. The above image shows the ground swath of both Landsat 7 (red) and Landsat 8 (green) over a one-day period. The CEOS Visualization Environment (COVE) tool is a browser-based system that leverages Cesium, an open-source JavaScript library for 3D globes and maps, to display satellite sensor coverage areas and identify coincident scene locations. In other words, the COVE tool allows the user to see where a satellite could potentially take an image and where two or more satellite paths overlap during a specified time period. The Committee on Earth Observation Satellites (CEOS) is currently operating and planning hundreds of Earth observat