Posts

Showing posts from March, 2016

2016-03-22: Language Detection: Where to start?

Language detection is not a simple task, and no method achieves 100% accuracy. Many packages for detecting languages are available online. I have used several methods and tools to detect the language of websites and of plain text. Here is a review of the methods I came across while working on my JCDL 2015 paper, "How Well are Arabic Websites Archived?". I discuss detecting a webpage's language using the HTTP language header and the HTML language tag. In addition, I review several language detection packages, including Guess-Language, Python-Language Detector, LangID and Google Language Detection API. Since Python is my favorite programming language, I looked for tools written in Python.

I found that a primary way to detect the language of a webpage is to use the HTTP language header (Content-Language) and the HTML language tag (the lang attribute of the <html> element). However, only a small percentage of pages include the language tag, and sometimes the detected language is affected by the browser's language setting. Guess-La…
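The header-and-tag check can be sketched with Python's standard library alone. This is a minimal illustration, not the code used for the paper: it assumes the response headers and the HTML source have already been fetched, and the names declared_languages and _HtmlLangParser are hypothetical. It reads the Content-Language response header and the lang attribute of the <html> element, returning None for whichever is missing, which will often be the case for the tag.

```python
from html.parser import HTMLParser


class _HtmlLangParser(HTMLParser):
    """Hypothetical helper: records the lang attribute of the first <html> tag."""

    def __init__(self):
        super().__init__()
        self.seen_html = False
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            # An attribute written without a value (<html lang>) arrives as None.
            self.lang = (dict(attrs).get("lang") or "").strip() or None


def declared_languages(headers, html):
    """Return (http_language, html_lang) declared by a page; either may be None.

    headers: mapping of HTTP response header names to values (assumed pre-fetched).
    html: the page source as a string.
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    http_lang = lowered.get("content-language")
    if http_lang:
        # The header may list several comma-separated language tags; keep the first.
        http_lang = http_lang.split(",")[0].strip()

    parser = _HtmlLangParser()
    parser.feed(html)
    parser.close()
    return http_lang or None, parser.lang


page = '<!DOCTYPE html><html lang="ar" dir="rtl"><head></head><body>x</body></html>'
print(declared_languages({"Content-Language": "ar, en"}, page))  # ('ar', 'ar')
print(declared_languages({}, "<html><body>No tag</body></html>"))  # (None, None)
```

To check a live page, the headers and body could be obtained with urllib.request.urlopen; that step is left out so the sketch runs offline. Because the two sources often disagree or are absent, a content-based detector such as the packages reviewed here is still needed as a fallback.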

2016-03-07: Archives Unleashed Web Archive Hackathon Trip Report (#hackarchives)

From March 3 to March 5, 2016, librarians, archivists, historians, computer scientists, and others came together for the Archives Unleashed Web Archive Hackathon at the University of Toronto’s Robarts Library in Toronto, Ontario, Canada. The event gave researchers the opportunity to collaboratively develop open-source tools for web archives. It was organized by Ian Milligan (assistant professor of Canadian and digital history in the Department of History at the University of Waterloo), Nathalie Casemajor (assistant professor in communication studies in the Department of Social Sciences at the University of Québec in Outaouais, Canada), Jimmy Lin (the David R. Cheriton Chair in the David R. Cheriton School of Computer Science at the University of Waterloo), Matthew Weber (assistant professor in the School of Communication and Information at Rutgers University), and Nicholas Worby (the Government Information & Statistics Librarian at the University of Toronto’s Robarts Library).

2016-03-07: Custom Missions in the COVE Tool

When I am not studying Web Sciences at ODU, I work as a software developer at Analytical Mechanics Associates. In general, my work there aims to make satellite data more accessible. As part of this mission, one of my primary projects is the COVE tool.

The CEOS Visualization Environment (COVE) tool is a browser-based system that leverages Cesium, an open-source JavaScript library for 3D globes and maps, to display satellite sensor coverage areas and identify coincidence scene locations. In other words, the COVE tool lets the user see where a satellite could potentially take an image and where the paths of two or more satellites overlap during a specified time period. The member agencies of the Committee on Earth Observation Satellites (CEOS) currently operate and plan hundreds of Earth observation satellites. COVE initially began as a way to improve Standard Calibration and Validation (Cal/Val) exercises for these satellites. Cal/Val exercises need to compare near-simultaneous surface observati…
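For a single ground site, finding when two satellites' coverage overlaps during a specified time period reduces to intersecting time intervals. The sketch below is hypothetical and is not COVE's implementation (COVE itself runs in the browser on Cesium): it assumes each satellite's access windows over the site have already been computed as sorted, non-overlapping (start, end) pairs, it handles only two satellites, and the name coincidence_windows and the hour-based units are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end) in hours from the start of the period.
Window = Tuple[float, float]


def coincidence_windows(a: List[Window], b: List[Window],
                        period: Window) -> List[Window]:
    """Return the times when both satellites can observe the same site.

    a, b: sorted, non-overlapping access windows over one site, one list per satellite.
    period: the (start, end) time period the user is interested in.
    """
    p_start, p_end = period
    result = []
    i = j = 0
    # Two-pointer sweep over both sorted lists, like the merge step of merge sort.
    while i < len(a) and j < len(b):
        # Overlap of the two current windows, clipped to the requested period.
        start = max(a[i][0], b[j][0], p_start)
        end = min(a[i][1], b[j][1], p_end)
        if start < end:
            result.append((start, end))
        # The window that ends first cannot overlap any later window of the other list.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


sat_a = [(1.0, 3.0), (5.0, 8.0)]
sat_b = [(2.0, 6.0), (7.0, 9.0)]
print(coincidence_windows(sat_a, sat_b, (0.0, 10.0)))
# [(2.0, 3.0), (5.0, 6.0), (7.0, 8.0)]
```

Since the result is itself sorted and non-overlapping, a third satellite could be handled by intersecting the result with its windows in the same way. A stricter "near-simultaneous" requirement for Cal/Val, such as a maximum time offset between two observations, is not modeled here.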