Wednesday, July 31, 2019

2019-03-27: Install ParsCit on Ubuntu

ParsCit is a citation parser developed jointly by Pennsylvania State University and the National University of Singapore. Over the past ten years, it has been compared with many other citation parsing tools and is still widely used. Although Neural ParsCit has been developed, its implementation is still not as easy to use as ParsCit. In particular, PDFMEF encapsulates ParsCit as its default citation parser.

However, many people have found that installing ParsCit is not very straightforward. This is partly because it is written in Perl and the instructions on the ParsCit website are not 100% accurate. In this blog post, I describe the installation procedure for ParsCit on an Ubuntu 16.04.6 LTS desktop. Installation on CentOS should be similar. The instructions do not cover Windows.

The following steps assume we install ParsCit under /home/username/github.
  1. Download the source code from and unzip it.
    $ unzip
  2. Install a C++ compiler
    $ sudo apt install g++
    To test it, write a simple program and run
    $ g++ -o hello
    $ ./hello
  3. Install ruby
    $ sudo apt install ruby-full
    To test it, run
    $ ruby --version
  4. Perl usually comes with the default Ubuntu installation. To test it, run
    $ perl --version
  5. Install Perl modules. First, start CPAN
    $ perl -MCPAN -e shell
    Choose the default settings until the CPAN prompt appears.
    Then install the packages one by one
    cpan[1]> install Class::Struct
    cpan[2]> install Getopt::Long
    cpan[3]> install Getopt::Std
    cpan[4]> install File::Basename
    cpan[5]> install File::Spec
    cpan[6]> install FindBin
    cpan[7]> install HTML::Entities
    cpan[8]> install IO::File
    cpan[9]> install POSIX
    cpan[10]> install XML::Parser
    cpan[11]> install XML::Twig
    Choose the default settings when prompted
    cpan[12]> install XML::Writer
    cpan[13]> install XML::Writer::String
  6. Install crfpp (version 0.51) from source.
    1. Get into the crfpp directory
      $ cd crfpp/
    2. Unzip the tar file
      $ tar xvf crf++-0.51.tar.gz
    3. Get into the CRF++ directory
      $ cd CRF++-0.51/
    4. Configure
      $ ./configure
    5. Compile
      $ make
      This will cause an error like the one below
      path.h:26:52: error: 'size_t' has not been declared
           void calcExpectation(double *expected, double, size_t) const;
      Makefile:375: recipe for target 'node.lo' failed
      make[1]: *** [node.lo] Error 1
      make[1]: Leaving directory '/home/jwu/github/ParsCit-master/crfpp/CRF++-0.51'
      Makefile:240: recipe for target 'all' failed
      make: *** [all] Error 2
      This is likely caused by two missing include lines in node.cpp and path.cpp. Add them before the other include statements, so that the beginning of each file looks like
      #include "stdlib.h"
      #include <iostream>
      #include <cmath>
      #include "path.h"
      #include "common.h"

      Then run ./configure and make again.
    6. Install crf++
      $ make clean
      $ make
      This should rebuild crf_test and crf_learn.
  7. Move the executables to where ParsCit expects to find them.
    $ cp crf_learn crf_test ..
    $ cd .libs
    $ cp -Rf * ../../.libs
  8. Test ParsCit. Under the bin/ directory, run
    $ ./ -m extract_all ../demodata/sample2.txt
    $ ./ -i xml -m extract_all ../demodata/E06-1050.xml
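
The prerequisite checks in steps 2 to 5 can also be scripted. The following is a minimal sketch in Python and is not part of ParsCit. The helper names (missing_tools, perl_module_command, missing_perl_modules, report) are hypothetical. It assumes that a tool counts as installed when it is found on PATH, and that a Perl module counts as installed when perl can load it with -M and exit with status 0.

```python
import shutil
import subprocess

# Tools and Perl modules named in the steps above.
TOOLS = ["g++", "ruby", "perl"]
PERL_MODULES = [
    "Class::Struct", "Getopt::Long", "Getopt::Std", "File::Basename",
    "File::Spec", "FindBin", "HTML::Entities", "IO::File", "POSIX",
    "XML::Parser", "XML::Twig", "XML::Writer", "XML::Writer::String",
]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def perl_module_command(module):
    """Build a command that exits with 0 only if Perl can load the module."""
    return ["perl", f"-M{module}", "-e", "1"]


def missing_perl_modules(modules, run=subprocess.run):
    """Return the modules that Perl fails to load."""
    missing = []
    for module in modules:
        result = run(perl_module_command(module), capture_output=True)
        if result.returncode != 0:
            missing.append(module)
    return missing


def report():
    """Print what is still missing; skip the module check if perl is absent."""
    tools = missing_tools(TOOLS)
    print("Missing tools:", tools or "none")
    if "perl" not in tools:
        modules = missing_perl_modules(PERL_MODULES)
        print("Missing Perl modules:", modules or "none")
```

Calling report() before step 6 lists anything that still needs to be installed; it does not check the crfpp build itself.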

Tuesday, July 30, 2019

2019-07-30: SIGIR 2019 in Paris Trip Report

ACM SIGIR 2019 was held in Paris, France July 21-25, 2019 in the conference center of the Cite des sciences et de l'industrie. Attendees were treated to great talks, delicious food, sunny skies, and warm weather. The final day of the conference was historic - Paris' hottest day on record (42.6 C, 108.7 F).
There were over 1000 attendees, including 623 for tutorials, 704 for workshops, and 918 for the main conference. The acceptance rate for full papers was a low 19.7%, with 84/426 submissions accepted. Short papers were presented as posters, set up during the coffee breaks, which allowed for nice interactions among participants and authors. (Conference schedule - contains links to videos of many of the talks)

Several previously-published ACM TOIS journal papers were invited for presentation as posters or oral presentations. We were honored to be invited to present our 2017 ACM TOIS paper, "Comparing the Archival Rate of Arabic, English, Danish, and Korean Language Web Pages" (Alkwai, Nelson, Weigle) during the conference.

Opening Reception

On Sunday, the conference opened with the Doctoral Consortium, tutorials, and a lovely reception at the Grande Galerie de l'Evolution.

Keynote 1 (July 22)

The opening keynote was given by Bruce Croft (@wbc11), Director of UMass' Center for Intelligent Information Retrieval, on the "Importance of Interaction in Information Retrieval" (slides).
Croft began by categorizing two IR research communities: CS as system-oriented and IS as user-oriented.
From there, he gave an overview of interaction in IR and pointed to questions and answers (and conversational recommendation) as an essential component of interactive systems. Asking clarifying questions is key to a quality interaction.  Interaction in IR requires a dialogue.

I appreciated the mentions of early infovis in IR.

I'll let these tweets summarize the rest of the talk, but if you missed it you should watch the video when it's available (I'll add a link).

SIRIP Panel (July 23)

The SIGIR Symposium on IR in Practice (SIRIP, formerly known as the "SIGIR industry track") panel session was led by Ricardo Baeza-Yates and focused on the question, "To what degree is academic research in IR/Search useful for industry, and vice versa?"

The panelists were:
It was an interesting discussion with nice insights into the roles of industrial and academic research and how they can work together.

    Women in IR Session (July 23)

    The keynote for the Women in IR (@WomenInIR) session was given by Mounia Lalmas (@mounialalmas) from Spotify.

    This was followed by a great panel discussion on several gender equity issues, including pay gap and hiring practices.

    Banquet (July 23)

    The conference banquet was held upstairs in the Cite des sciences et de l'industrie.

    During a break in the music, the conference award winners were announced:

    Best Presentation at the Doctoral Consortium: From Query Variations To Learned Relevance Modeling
    Binsheng Liu (RMIT University)

    Best Short Paper: Block-distributed Gradient Boosted Trees
    Theodore Vasiloudis (RISE AI, @thvasilo), Hyunsu Cho (Amazon Web Services), Henrik Boström (KTH Royal Institute of Technology)

    Best Short Paper (Honorable Mention): Critically Examining the "Neural Hype": Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models
    Wei Yang (University of Waterloo), Kuang Lu (University of Delaware), Peilin Yang (No affiliation), Jimmy Lin (University of Waterloo)

    Best Paper (Honorable Mention): Online Multi-modal Hashing with Dynamic Query-adaption
    Xu Lu (Shandong Normal University), Lei Zhu (Shandong Normal University), Zhiyong Cheng (Qilu University of Technology (Shandong Academy of Sciences)), Liqiang Nie (Shandong University), Huaxiang Zhang (Shandong Normal University)
    video of talk

    Best Paper: Variance Reduction in Gradient Exploration for Online Learning to Rank
    Huazheng Wang (University of Virginia), Sonwoo Kim (University of Virginia), Eric McCord-Snook (University of Virginia), Qingyun Wu (University of Virginia), Hongning Wang (University of Virginia)
    video of talk

    Test of Time Award: Novelty and Diversity in Information Retrieval Evaluation (pdf)
    Charles L. A. Clarke, Maheedhar Kolla (@imkolla), Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, Ian MacKinnon
    Published at SIGIR 2008, now with 881 citations 

    Keynote 2 (July 24)

    The final keynote was given by Cordelia Schmid (@CordeliaSchmid) from INRIA and Google on "Automatic Understanding of the Visual World" (video).
    She presented her work on understanding actions in video and interaction with the real world.  One interesting illustration was video of a person walking and then falling down.  Without taking enough context into account, a model may classify this as a person sitting (seeing only the result of the fall), but with tracking the action, their model can detect and correctly classify the falling action.

    My Talk (July 24)

    After the final keynote, I presented our 2017 ACM TOIS paper, "Comparing the Archival Rate of Arabic, English, Danish, and Korean Language Web Pages" (Alkwai, Nelson, Weigle) during Session 7B: Multilingual and Cross-modal Retrieval.

    Other Resources

    Check out these other takes on the conference:

    Au Revoir, Paris!


    Updated (2019-10-14):  added links to available videos

    Wednesday, July 17, 2019

    2019-07-17: Bathsheba Farrow (Computer Science PhD Student)

    My name is Bathsheba Farrow.  I joined Old Dominion University as a PhD student in the fall of 2016.  My PhD advisor is Dr. Sampath Jayarathna. I am currently researching various technologies for reliable data collection in individuals suffering from Post-Traumatic Stress Disorder (PTSD).  I intend to use machine learning algorithms to identify patterns in their physiological data to support rapid, reliable PTSD diagnosis.  However, diagnosis is only one side of the equation.  I also plan to investigate eye movement desensitization and reprocessing, brainwave technology, and other methods that may actually alleviate or eliminate PTSD symptoms.  I am working with partners at Eastern Virginia Medical School (EVMS) to discover more ways technology can be used to diagnose and treat PTSD patients.

    In May 2019, I wrote and submitted my first paper related to my PTSD research to the IEEE 20th International Conference on Information Reuse and Integration (IRI) for Data Science: Technological Advancements in Post-Traumatic Stress Disorder Detection:  A Survey.  The paper was accepted in June 2019 by the conference committee as a short paper.  I am currently scheduled to present the paper at the conference on 30 July 2019.  The paper describes brain structural irregularities and psychophysiological characteristics that can be used to diagnose PTSD.  It identifies technologies and methodologies used in past research to measure symptoms and biomarkers associated with PTSD that have aided or could aid in diagnosis.  The paper also describes some of the shortcomings of past research and other technologies that could be utilized in future studies.

    While working on my PhD, I also work full-time as a manager of a branch of military, civilian and contractor personnel within Naval Surface Warfare Center Dahlgren Division Dam Neck Activity (NSWCDD DNA).  I originally started my professional career with internships at the Bonneville Power Administration and Lucent Technologies.  Since 2000, I have worked as a full-time software engineer developing applications for Version, National Aeronautics and Space Administration (NASA), Defense Logistics Agency (DLA), Space and Naval Warfare (SPAWAR) Systems Command, and Naval Sea Systems Command (NAVSEA).  I have used a number of programming languages and technologies during my career including, but not limited to, Smalltalk, Java, C++, Hibernate, Enterprise Architect, SonarQube, and HP Fortify.

    I completed  a Master’s degree in Information Technology through Virginia Tech in 2007 and a Bachelor of Science degree in Computer Science at Norfolk State University in 2000.  I also completed other training courses through my employers including, but not limited to, Capability Maturity Model Integration (CMMI), Ethical Hacking, and other Defense Acquisition University courses.

    I am originally from the Hampton Roads area.  I have two children, with my oldest beginning her undergraduate computer science degree program in the fall 2019 semester.

    --Bathsheba Farrow

    Monday, July 15, 2019

    2019-07-15: Lab Streaming Layer (LSL) Tutorial for Windows

    First of all, I would like to give credit to Matt Gray for going through the major hassle of figuring out the prerequisites and for the awesome documentation he provided on how to Install and Use Lab Streaming Layer on Windows.
    In this blog post, I will show you how to install the open-source Lab Streaming Layer (LSL) and stream data (an eye-tracking example using the PupilLabs eye tracker) to the NeuroPype Academic Edition. Although a basic version of LSL is available along with NeuroPype, you will still need to complete the following prerequisites before installing LSL.
    Installation instructions for LSL are available online. The intention of this post is to provide an easier and more streamlined step-by-step guide for installing LSL and NeuroPype.
    LSL is a low-level technology for exchanging time series between programs and computers.

    Figure: LSL core components

    Christian A. Kothe, one of the developers of LSL, has a YouTube video in which he explains the structure and function of LSL.
    Figure: LSL network overview
    Installing Dependencies for LSL: LSL needs to be built and installed manually using CMake. We will need a C++ compiler to build LSL, and we can use Visual Studio 15 2017 as the C++ compiler. In addition to CMake and Visual Studio, Git, Qt, and Boost must be installed before LSL. Although Qt and Boost are not required for the core liblsl library, they are required for some of the apps used to connect to the actual devices.

    Installing Visual Studio 15.0 2017: Download and install Visual Studio. You must download Visual Studio 2017, since other versions (including the latest, 2019) do not work when building some of the dependencies. You can select the Community edition, as it is free.
    VS version - 2017

    The installation process will ask which Workloads you want to install additionally. Select the following Workloads to install.
            1. .NET desktop development
            2. Desktop development with C++
            3. Universal Windows Platform development

    Figure: Workloads that need to be installed additionally
    Installing Git: Git is an open-source distributed version control system. We will use Git to download the LSL Git repository. Download Git for Windows and continue the installation with the default settings, except feel free to choose your own default text editor (vim, notepad++, sublime, etc.) to use with Git. In addition, when you reach the Adjust your PATH environment page, make sure to choose the Git from the command line and also from 3rd-party software option so that you can execute git commands from the command prompt, Python prompts, and other third-party software.

    Installing CMake:
    Figure: First interface of CMake Installer
    CMake is a program for building and installing other programs on an OS.
    To download CMake, choose the cmake-3.14.3-win64-x64.msi file under Binary distributions.
    When installing, feel free to choose the default selections, except, when prompted, choose Add CMake to the system PATH for all users.

    Installing Qt:
    Qt is a GUI generation program mostly used to create user interfaces. Some of the LSL apps use it to create user interfaces for the end user to interact with when connecting to the device.
    Download and install the open-source version of Qt. An executable installer is provided, so installing should be easy.

    You will be asked to enter the details of a Qt account in the install wizard. You can either create an account or log in if you already have one.
    Figure: Qt Account creation step

    During the installation process, select the defaults for all options except on the Select Components page, where you should select the following to install:
    - Under Qt 5.12.3:
        - MSVC 2017 64-bit
        - MinGW 7.3.0 64-bit
        - UWP ARMv7 (MSVC 2017)
        - UWP x64 (MSVC 2017)
        - UWP x86 (MSVC 2017)
    Figure: Select Components to be installed in Qt

    The directory that you need for installing LSL is  C:\Qt\5.12.3\msvc2017_64\lib\cmake\Qt5

    Installing Boost
    Boost is a set of C++ libraries that provides additional functionality for C++ coding. Boost also needs to be compiled and installed manually. Instructions for doing this are available online.
    Download Boost and extract the downloaded file directly into your C:\ drive. Then open a command prompt window and navigate to the C:\boost_1_67_0 folder using the cd C:\boost_1_67_0 command.

    Then execute the following commands one after the other:
    1. bootstrap
    2. .\b2
    Figure: Executing bootstrap and .\b2 commands

    Figure: After Executing bootstrap and .\b2 commands

    The directory that you need for installing LSL is C:\boost_1_67_0\stage\lib

    Installing Lab Streaming Layer: Clone the Lab Streaming Layer repository from GitHub into your C:\ drive.

    In a command prompt, execute the following commands.
    1. cd C:\
    2. git clone --recursive
    Make a build directory in the labstreaminglayer folder
    3. cd labstreaminglayer
    4. mkdir build && cd build

    Configure lab streaming layer using CMake

    5. cmake C:\labstreaminglayer -G "Visual Studio 15 2017 Win64"  

    The above command configures LSL. With the -D options described below, it also defines which Apps are installed and tells LSL where Qt, Boost, and other dependencies are installed.
         i.     C:\labstreaminglayer is the path to the Lab Streaming Layer root directory (where you cloned LSL from GitHub).
         ii.    The -G option defines the compiler used to compile LSL (we use Visual Studio 15 2017 Win64).
         iii.   -D is the option for additional settings:
                1.     -DLSL_LSLBOOST_PATH → Path to the LSL Boost directory
                2.     -DQt5_DIR → Path to the Qt cmake files
                3.     -DBOOST_ROOT → Path to the installed Boost libraries
                4.     -DLSLAPPS_<App Name>=ON → These are the Apps located in the Apps folder (C:\labstreaminglayer\Apps) that you want installed. Add the name of the folder within the Apps folder directly after -DLSLAPPS_ with no spaces.
    Build (install) lab streaming layer using CMake
    6. cd ..
    7. mkdir install

    8. cmake --build C:\labstreaminglayer\build --config Release --target C:\labstreaminglayer\install
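
Because the configure command gets long once the -D options are added, it can help to assemble it programmatically. The following is a minimal sketch in Python; the helper name cmake_configure_command is hypothetical. The paths are the ones quoted in this post, and LabRecorder is used as the example app. Using C:\boost_1_67_0\stage\lib as the BOOST_ROOT value is an assumption, and -DLSL_LSLBOOST_PATH is left out because its value is not given here.

```python
def cmake_configure_command(source_dir, generator, options):
    """Return the cmake configure command as an argument list."""
    command = ["cmake", source_dir, "-G", generator]
    for name, value in options.items():
        # Each entry becomes one -D<name>=<value> cache definition.
        command.append(f"-D{name}={value}")
    return command


# Assumed values: directories quoted in this post, LabRecorder as the example app.
options = {
    "Qt5_DIR": r"C:\Qt\5.12.3\msvc2017_64\lib\cmake\Qt5",
    "BOOST_ROOT": r"C:\boost_1_67_0\stage\lib",
    "LSLAPPS_LabRecorder": "ON",
}
command = cmake_configure_command(
    r"C:\labstreaminglayer", "Visual Studio 15 2017 Win64", options
)

# Quote only the parts that contain spaces so the line can be pasted into a prompt.
print(" ".join(f'"{part}"' if " " in part else part for part in command))
```

The printed line can be pasted into the command prompt from the build directory, or the list can be passed to subprocess.run directly.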

    Now that the LSL installation is complete, we will have a look at LabRecorder. LabRecorder is the main LSL program for interacting with all the streams. You can find the LabRecorder program at C:\labstreaminglayer\install\LabRecorder\LabRecorder.exe.

    The LabRecorder interface looks like the following.
    Figure: LabRecorder interface when PupilLabs is streaming

    The green checkbox entries below Record from Streams are the streams from PupilLabs (the eye-tracking device). When all programs are installed and running for each respective device, the devices' streams will appear as above under Record from Streams. You can check the data streams you need from the devices listed, then press Start to begin recording data from all of them. The value under Saving to on the right specifies where the data files (in XDF format) will be saved.
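
Once a recording exists, the XDF file can be inspected outside LabRecorder. The following is a minimal sketch in Python that assumes the third-party pyxdf package (pip install pyxdf), which is not covered in this post. The helper names summarize_streams and summarize_file are hypothetical, and the file path is whatever appears under Saving to.

```python
def summarize_streams(streams):
    """Return (name, type, number of samples) for each recorded stream."""
    summary = []
    for stream in streams:
        info = stream["info"]
        # pyxdf stores each info field as a one-element list.
        summary.append(
            (info["name"][0], info["type"][0], len(stream["time_stamps"]))
        )
    return summary


def summarize_file(path):
    """Load an XDF file with pyxdf and summarize its streams."""
    import pyxdf  # imported here so the helper above works without pyxdf

    streams, _header = pyxdf.load_xdf(path)
    return summarize_streams(streams)
```

Calling summarize_file on a recorded file returns one tuple per stream, which is a quick way to confirm that the PupilLabs streams actually contain samples.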

    Installing PupilLabs LSL connection: Many devices can be connected with LSL. The Muse EEG device, the Emotiv EPOC EEG device, and the PupilLabs core eye tracker are some of them. The example below shows how to use the PupilLabs core eye tracker with LSL to stream data to NeuroPype.

    Figure : PupilLabs core eye tracker, Source -

    Let us begin by setting up the PupilLabs core eye tracker. Instructions for using and developing with PupilLabs are available online, as are the LSL install instructions for PupilLabs. Below, I provide the steps to set up everything from start to finish to work with LSL.

    To set up the PupilLabs eye tracker, first download the PupilLabs software. Choose the pupil_v1.11-4-gb8870a2_windows_x64.7z file and unzip it into your C:\ drive. You may need the 7z program to unzip it. Then plug the PupilLabs eye tracker into your computer. It will automatically begin to install drivers for the hardware.

    After that, run the Pupil Capture program located at C:\pupil_v1.11-4-gb8870a2_windows_x64\pupil_capture_windows_x64_v1.11-4-gb8870a2\pupil_capture.exe with administrative privileges so that it can install the necessary drivers. Next, follow the PupilLabs instructions to set up, calibrate, and use the eye tracker with the Pupil Capture program.

    Connect PupilLabs with LSL: Build liblsl-Python in a Python or Anaconda Prompt. You can use your regular command prompt as well. Execute the following commands:
    1. cd C:\labstreaminglayer\LSL\liblsl-Python
    2. python build

    Then, you have to install LSL as a plugin in the Pupil Capture program.
    a.     In the newly created C:\labstreaminglayer\LSL\liblsl-Python\build\lib folder, copy the pylsl folder and all its contents into the C:\Users\<user_profile>\pupil_capture_settings\plugins folder (replace <user_profile> with your Windows user profile).
    b.     In the C:\labstreaminglayer\Apps\PupilLabs folder, copy into the C:\Users\<user_profile>\pupil_capture_settings\plugins folder.
    Figure: Original Location of

    Figure: After copying and pylsl folder into C:\Users\<user_profile>\pupil_capture_settings\plugins folder

    If the pylsl folder does not have a lib folder containing liblsl64.dll, there is a problem with the pylsl build. As an alternative approach, install pylsl via pip by running the pip3 install pylsl command in a command prompt. Make sure you have installed pip on your computer before running these commands. You can use the pip3 show pylsl command to see where the pylsl module is installed on your computer. This module will include the pre-built library files. Copy this newly created pylsl module to the C:\Users\<user_profile>\pupil_capture_settings\plugins folder.
    In this example,  pylsl module  was installed in C:\Users\<user_profile>\AppData\Local\Python\Python37\Lib\site-packages\pylsl folder. It includes a lib folder which contains 
    Figure: Installation location of the pylsl module when installed with the pip3 install pylsl command
    As the next step, launch pupil_capture.exe and enable Pupil LSL Relay from the Plugin Manager in the Pupil Capture – World window.

    Figure: Enabled Pupil LSL Relay from Plugin Manager
    Now when you hit the R button on the left of the World window, you start recording from PupilLabs while streaming the data to LSL.  In LabRecorder, you can see the streams in green (see Figure LabRecorder interface when PupilLabs is streaming).
    Now, let's have a look at how to get data from LSL to NeuroPype.

    Getting Started with NeuroPype and Pipeline Designer:
    First, you have to download and install the NeuroPype Academic Edition (neuropype-suite-academic-edition-2017.3.2.exe). The NeuroPype Academic Edition includes a Pipeline Designer application, which you can use to design, edit, and execute NeuroPype pipelines using a visual ‘drag-and-drop’ interface.

    Before launching the NeuroPype Pipeline Designer, make sure that the NeuroPype server is running in the background. If it is not, you can start it by double-clicking the NeuroPype Academic icon. You can also set the NeuroPype server to launch on startup.
    The large white area in the following screenshot is the ‘canvas’ that shows your current pipeline, which you can edit using drag-and-drop and double-click actions. On the left you have the widget panel, which has all the available widgets or nodes that you can drag onto the canvas.

    Create an Example Pipeline: From the widget panel, select the LSL Input node in the Network section (green), Dejitter Timestamp in the Utilities section (light blue), Assign Channel Locations in the Source Localization section (pink), and Print To Console in the Diagnostics section (pink).
    After you create the pipeline, the canvas looks like the pipeline shown in the figure below. Once the nodes are on the canvas, you can connect them using the dashed curved lines on both sides of each node. Double-click on one of the dashed lines of a node and drag the connecting line to a dashed curved line of the other node. This creates a connection between the two nodes named Data.
    You can hover the mouse over any section or widget, or click on a widget on the canvas, to see a tooltip that briefly summarizes it.

    Figure: Pipeline created in Neuropype
    Start Processing
    LSL is not only a way to get data from one computer to another, but also to get data from your EEG system, or any other kind of sensor that supports it, into NeuroPype. You can also use it to get data out of NeuroPype into external real-time visualizations, stimulus presentation software, and so on.
    Make sure that the LSL Input node has a query string that matches your sensor. For instance, if you use PupilLabs, you need to enter type='Pupil Capture' as below. NeuroPype will then pick up data from the PupilLabs eye tracker.
    Figure: Set up type of LSL Input
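
If NeuroPype does not pick up any data, it can help to confirm independently that a stream matching the query is visible on the network. The following is a minimal sketch in Python using the pylsl module installed earlier; the helper names (describe_stream, find_streams, print_first_sample) are hypothetical, and the 5-second timeout is an arbitrary choice.

```python
def describe_stream(info):
    """Format the name, type, and channel count of one stream."""
    return f"{info.name()} (type={info.type()}, channels={info.channel_count()})"


def find_streams(stream_type="Pupil Capture", timeout=5.0):
    """Look for LSL streams of the given type, waiting up to timeout seconds."""
    from pylsl import resolve_byprop  # imported here so describe_stream works alone

    return resolve_byprop("type", stream_type, timeout=timeout)


def print_first_sample(stream_type="Pupil Capture", timeout=5.0):
    """Print the matching streams and one sample from the first of them."""
    from pylsl import StreamInlet

    infos = find_streams(stream_type, timeout)
    if not infos:
        print(f"No stream with type '{stream_type}' found")
        return
    for info in infos:
        print(describe_stream(info))
    inlet = StreamInlet(infos[0])
    # pull_sample returns (None, None) if no sample arrives before the timeout.
    sample, timestamp = inlet.pull_sample(timeout=timeout)
    print(timestamp, sample)
```

Run print_first_sample() while Pupil Capture is running with the Pupil LSL Relay plugin enabled. If nothing is found, the problem is on the LSL side rather than in the NeuroPype pipeline.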
    To launch the current patch, click the blue pause icon in the toolbar (initially engaged) to unpause it. The Pipeline Designer will then ask the NeuroPype server to start executing your pipeline, which will print some output.

    Congratulations! You have successfully set up LSL, PupilLabs, and the NeuroPype Academic Edition. Go ahead and experiment with your EEG system, or any other kind of sensor that supports LSL and NeuroPype.

    Feel free to tweet @Gavindya2 if you have any questions about this tutorial or need any help with your installation.

    --Gavindya Jayawardena (@Gavindya2)