Posts

Showing posts with the label Docker

2018-07-22: Tic-Tac-Toe and Magic Square Made Me a Problem Solver and Programmer

"How did you learn programming?" a student asked me at a recent summer camp. Dr. Yaohang Li organized the Machine Learning and Data Science Summer Camp for high school students of the Hampton Roads metropolitan region at the Department of Computer Science, Old Dominion University, from June 25 to July 9, 2018. The camp was funded by the Virginia Space Grant Consortium. More than 30 students participated. They were introduced to a variety of topics, including Data Structures, Statistics, Python, R, Machine Learning, Game Programming, Public Datasets, Web Archiving, and Docker, through discussions, hands-on labs, and lectures by professors and graduate students. I was invited to give a lecture about my research and Docker. At the end of my talk I solicited questions and distributed Docker swag. The question "How did you learn programming?" led me to draw a Tic-Tac-Toe game and a 3x3 Magic Square on the whiteboard. Then I told them a more t

2018-03-04: Installing Stanford CoreNLP in a Docker Container

Fig. 1: Example of text labeled with the CoreNLP Part-of-Speech, Named-Entity Recognizer, and Dependency annotators.
The Stanford CoreNLP suite provides a wide range of important natural language processing applications such as Part-of-Speech (POS) Tagging and Named-Entity Recognition (NER) Tagging. CoreNLP is written in Java, and there is support for other languages. I tested a couple of the latest Python wrappers that provide access to CoreNLP but was unable to get them working due to various environment-related complications. Fortunately, with the help of Sawood Alam, our very able Docker campus ambassador at Old Dominion University, I was able to create a Dockerfile that installs and runs the CoreNLP server (version 3.8.0) in a container. This eliminated the headaches of installing the server and also provided a simple method of accessing CoreNLP services through HTTP requests. How to run the CoreNLP server on localhost port 9000 from a D
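As a rough illustration of accessing CoreNLP services through HTTP requests, here is a minimal Python sketch. It assumes a CoreNLP server is already listening on localhost port 9000, as the excerpt describes. The helper names `corenlp_url` and `annotate` are hypothetical and are not taken from the original post. The sketch relies on the server's `properties` query parameter and JSON output format, and the annotator list is only an example.

```python
import json
import urllib.parse
import urllib.request


def corenlp_url(host="localhost", port=9000,
                annotators="tokenize,ssplit,pos,ner"):
    """Build the request URL for a CoreNLP server (host and port are assumptions)."""
    # The server reads its settings from a JSON object in the "properties" parameter.
    props = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return f"http://{host}:{port}/?{query}"


def annotate(text, url=None, timeout=30):
    """POST raw text to the server and return the parsed JSON response."""
    request = urllib.request.Request(
        url or corenlp_url(),
        data=text.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, calling `annotate("Some sentence.")` should return a dictionary whose `sentences` entries hold per-token annotations. Nothing here depends on a Python wrapper library, which sidesteps the environment problems mentioned above.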

2017-12-03: Introducing Docker - Application Containerization & Service Orchestration

For the last few years, Docker, the application containerization technology, has been gaining a lot of traction in the DevOps community, and lately it has made its way into academia and the research community as well. I have been following it since its inception in 2013. For the last couple of years, it has been a daily driver for me. At the same time, I have been encouraging my colleagues to use Docker in their research projects. As a result, we are gradually moving away from one virtual machine (VM) per project to a swarm of nodes running containers of various projects and services. If you have accessed MemGator, CarbonDate, Memento Damage, Story Graph, or some other WS-DL services lately, you have been served from our Docker deployment. We even have an on-demand PHP/MySQL application deployment system using Docker for the CS418 - Web Programming course. I (@ibnesayeed) have been selected as the @Docker Campus Ambassador for Old Dominion University! /cc @ODU @oducs

2017-11-22: Deploying the Memento-Damage Service

Many web services, such as archive.is, Archive-It, Internet Archive, and UK Web Archive, provide archived web pages, or mementos, for us to use. Nowadays, web archivists have shifted their focus from how to make a good archive to measuring how well the archive preserved the page. This raises the question of how to objectively measure the damage of a memento in a way that correctly emulates user (human) perception. Related to this, Justin Brunelle devised a prototype for measuring the impact of missing embedded resources (the damage) on a web page. Brunelle, in his IJDL paper (and the earlier JCDL version), describes how the quality of a memento depends on the availability of its resources. The straight percentage of missing resources in a memento is not always a good indicator of how "damaged" it is. For example, one page could be missing several small icons whose absence users never even notice, and a second page could be missing a single embedd

2017-08-14: Introducing Web Archiving and Docker to Summer Workshop Interns

Last Wednesday, August 9, 2017, I was invited to give a talk to some summer interns of the Computer Science Department at Old Dominion University. Every summer our department invites undergraduate students from India and hosts them for about a month to work on projects in one of our research labs as summer interns. During this period, various research groups introduce their work to the interns to encourage them to become potential graduate applicants. The interns also act as academic ambassadors who motivate their colleagues back in India to pursue higher studies. This year, Mr. Ajay Gupta invited a group of 20 students from Acharya Institute of Technology and B.N.M. Institute of Technology and supervised them during their stay at Old Dominion University. Like last year, I was selected from the Web Science and Digital Libraries Research Group to introduce them to the concept of web archiving and the various research projects of our lab. An overview of the talk can be fo

2017-02-22: Archive Now (archivenow): A Python Library to Integrate On-Demand Archives

Examples: Archive Now (archivenow) CLI
A small part of my research is to ensure that certain web pages are preserved in public web archives so that they will, hopefully, be available and retrievable whenever needed in the future. As archivists believe that "lots of copies keep stuff safe", I have created a Python library (Archive Now) to push web resources into several on-demand archives, such as The Internet Archive, WebCite, Perma.cc, and Archive.is. If one archive stops serving, temporarily or permanently, for any reason, it is likely that copies can be fetched from other archives. With Archive Now, one command like:
    $ archivenow --all www.cnn.com
is sufficient for the current CNN homepage to be captured and preserved by all configured archives in this Python library. Archive Now allows you to accomplish the following major tasks: A web page can be pushed into one archive. A web page can be pushed into multiple archives. A web page can be pushed into all archi
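To push several pages in one go, the `archivenow --all` command shown in the excerpt can be driven from a short script. The following is only a sketch: it assumes the `archivenow` executable is installed and on the PATH, it uses no flag other than the `--all` flag quoted above, and the helper names `build_command` and `push_all` are hypothetical. It also assumes, without confirmation from the excerpt, that the tool prints its results to standard output.

```python
import subprocess


def build_command(url):
    """Return the argument list for the command quoted in the post."""
    return ["archivenow", "--all", url]


def push_all(urls, runner=subprocess.run):
    """Run the command once per URL and collect whatever it prints.

    `runner` defaults to subprocess.run; it is a parameter so the
    function can be exercised without touching the network.
    """
    results = {}
    for url in urls:
        completed = runner(
            build_command(url),
            capture_output=True,  # keep stdout instead of printing it
            text=True,            # decode the output as str
            check=False,          # do not raise if one archive fails
        )
        results[url] = completed.stdout.splitlines()
    return results
```

Calling `push_all(["www.cnn.com"])` would then return a dictionary that maps each URL to the lines printed for it.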

2016-07-21: Dockerizing ArchiveSpark - A Tale of Pair Hacking

"Some doctors prescribe application of sandalwood paste to remedy headache, but making the paste and applying it is no less of a headache." -- an Urdu proverb
This is the translation of a couplet from an Urdu poem which is often used as a proverb. This couplet nicely reflects my feeling when Vinay Goel from the Internet Archive was demonstrating how suitable ArchiveSpark was for our IMLS Museums data analysis during the Archives Unleashed 2.0 Datathon at the Library of Congress, Washington DC, on June 14, 2016. ArchiveSpark allows easy data extraction, derivation, and analysis from standard web archive files (such as CDX and WARC). In the back of my mind I was thinking that it seems nice, cool, and awesome to use ArchiveSpark (or Warcbase) for the task, and certainly a good idea for serious archive data analysis, but perhaps overkill for a two-day hackathon event. Installing and configuring these tools would have required us to set up a Hadoop cluster, Jupyter