2020-09-17: IEEE International Conference on Information Reuse and Integration for Data Science (IRI) 2020 Trip Report

The 21st International Conference on Information Reuse and Integration for Data Science (IRI 2020) was held virtually (due to the COVID-19 pandemic) instead of in Las Vegas as originally planned. IRI 2020 was hosted by the University of Cincinnati, USA, from August 11 to 13, 2020. The conference covers three major tracks: (1) information reuse, (2) information integration, and (3) reusable systems. It serves as a forum for researchers and practitioners from academia, industry, and government to present, discuss, and exchange ideas that address real-world problems with real-world solutions. Both theoretical and applied papers were included in IRI 2020. Similar to last year's conference (IRI 2019), the IRI 2020 program included paper sessions, workshops, panels, and keynote speeches. Most of them were held in parallel sessions.

Day 1


Following Stuart Rubin's welcome note, Chengcui Zhang from the University of Alabama at Birmingham, USA (@UofAlabama) introduced the general co-chairs, program co-chairs, and keynote speakers during her opening remarks. She mentioned that the regular research paper acceptance rate for this year's conference was 28%.

Keynote 1

Prof. Elisa Bertino (Samuel D. Conte Professor) from Computer Science, Purdue University, USA delivered the first keynote of IRI 2020, titled "Security, Privacy and Safety in the IoT - Research Roadmap". In her keynote, Prof. Bertino explained the security risks associated with IoT devices and how wearable devices such as contact lenses could invade privacy. She introduced LTEInspector, their approach for adversarial testing of 4G LTE, and mentioned that they uncovered 10 new attacks during the testing procedure of LTEInspector.

Session A11 - AI and Security

This session began with Kanwardeep Singh Walia from California State University, Sacramento, presenting the first full paper of the session, titled: "An Empirical Analysis on the Usability and Security of Passwords". The work explores how secure passwords are. They identified repeating passwords, dictionary words used as passwords, and short passwords as weak passwords. The key takeaways of their work are that more secure passwords should have higher pronounceability as well as higher Shannon entropy, that passphrases could be better than passwords, and that compound-word passwords are better than single-word passwords.
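To make the Shannon entropy takeaway concrete, here is a minimal sketch that computes the character-level Shannon entropy of a password in bits per character. It is only an illustration: the function name `shannon_entropy_bits` is my own, and the paper may compute entropy differently (for example, over a character pool rather than over observed character frequencies).

```python
import math
from collections import Counter


def shannon_entropy_bits(password: str) -> float:
    """Return the Shannon entropy of the password's characters, in bits per character.

    Assumption: entropy is estimated from the character frequencies observed
    in the password itself, which is only one of several possible definitions.
    """
    if not password:
        return 0.0
    counts = Counter(password)
    total = len(password)
    # H = sum(p * log2(1 / p)) over the observed character frequencies.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    for candidate in ["aaaaaaaa", "password", "correct horse battery staple"]:
        print(f"{candidate!r}: {shannon_entropy_bits(candidate):.2f} bits/char")
```

A password made of one repeated character scores zero, while a password that uses many distinct characters evenly scores higher. Note that this per-character measure ignores length, so it should be read together with the password's length rather than on its own.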

Next, Clifford Kemp from Florida Atlantic University presented their full paper, titled: "Detection Methods of Slow Read DoS Using Full Packet Capture Data". He explained their experimental and data collection procedures. For this study, they used eight classification algorithms, including Random Forest, Decision Trees, SVM, and Neural Networks, to build predictive models to detect Slow Read DoS. They applied Correlation feature selection, Consistency subset, and Single attribute to select features. Their final feature set consisted of eight features. Their results show that six out of eight predictive models performed well in detecting Slow Read HTTP DoS attacks.

Afterwards, Lydia Bouzar-Benlabiod from ESI presented their full paper, titled: "RNN-VED for Reducing False Positive Alerts in Host-based Anomaly Detection Systems". She explained the architecture and the procedure of the sequence-to-sequence model. For their study, they used one-class classification since their task was anomaly detection. Their method reports a BLEU score (a metric used in NLP) of ~90-99%.

Finally, Ibrahim Yilmaz from Tennessee Tech University presented their short paper, titled: "Addressing Imbalanced Data Problem with Generative Adversarial Network For Intrusion Detection". Their study consisted of three steps: the dataset analysis, the training of their neural network, and the implementation of the GAN. Their results showed higher accuracy due to the imbalanced dataset they used.

Session A21 - Computer Vision

This session began with Zhigang Zhu from The City College of New York, presenting their full paper, titled: "Multimodal Information Integration for Indoor Navigation Using a Smartphone". Next, I (Gavindya Jayawardena) presented our full paper (with Dr. Sampath Jayarathna), titled: "Automated Filtering of Advanced Eye Gaze Metrics from Dynamic Areas of Interest". Existing tools for defining Areas of Interest (AOIs) to extract eye movement data for gaze analysis require users to manually draw AOI boundaries on eye-tracking stimuli or to use markers to define AOIs in space. We therefore proposed a novel method to dynamically filter eye movement data from AOIs for the analysis of advanced eye gaze metrics. In this study, we incorporated pre-trained object detectors for offline detection of dynamic AOIs in dynamic eye-tracking stimuli such as video streams. We presented our implementation and evaluation of object detectors to find the best one to integrate into a real-time eye movement analysis pipeline that filters eye movement data falling within the polygonal boundaries of detected dynamic AOIs.

Presentation Slides of Automated Filtering of Advanced Eye Gaze Metrics from Dynamic Areas of Interest

Presentation Video of Automated Filtering of Advanced Eye Gaze Metrics from Dynamic Areas of Interest
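As a rough illustration of the filtering step described above (keeping only the gaze points that fall within the polygonal boundary of a detected AOI), here is a minimal sketch using the standard ray-casting point-in-polygon test. This is not the pipeline from the paper: the names `point_in_polygon` and `filter_gaze_points` are hypothetical, and it assumes the object detector has already produced the AOI as a list of (x, y) vertices for the current video frame.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point; odd means inside.
            if x < x_cross:
                inside = not inside
    return inside


def filter_gaze_points(gaze: Sequence[Point], aoi: Sequence[Point]) -> List[Point]:
    """Keep only the gaze points that fall inside the AOI polygon."""
    return [p for p in gaze if point_in_polygon(p, aoi)]


if __name__ == "__main__":
    # Hypothetical AOI (a rectangle given as a polygon) and gaze samples.
    aoi = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
    gaze = [(1.0, 1.0), (5.0, 1.0), (2.0, 2.5), (-1.0, 2.0)]
    print(filter_gaze_points(gaze, aoi))
```

For dynamic AOIs, the same filter would be applied per frame, pairing each gaze sample with the AOI polygon detected for the frame at that timestamp.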

Then, Trang Thanh Quynh Le from University of St. Thomas presented their full paper, titled: "Dynamic image for micro-expression recognition on region-based framework". They used three databases, CASME-II, SMIC, and SAMM, for their study. Finally, Fei Zhao from The University of Alabama at Birmingham presented their full paper, titled: "Building Damage Evaluation from Satellite Imagery using Deep Learning". They used two methods for building damage evaluation: Mask R-CNN models trained on the COCO dataset to detect buildings which were destroyed, and a Siamese-based framework.

Session A31 - Machine Learning and Data Mining I

This session began with Srikanth Amudala from Concordia University presenting the first paper of the session, titled: "Background Subtraction with a Hierarchical Pitman-Yor Process Mixture Model of Generalized Gaussian Distributions". They have been able to subtract backgrounds successfully. Next, Mahsa Amirkhani from Concordia University presented their paper, titled: "Fully Bayesian Learning of Multivariate Beta Mixture Models". She explained their model and experimental results on malaria cell image categorization. Then, Jayesh Patel from Rockstar Games presented his paper, titled: "The Democratization of Machine Learning Features".
Finally, Paromita S. Nitu from Marquette University presented their paper, titled: "Identifying Feature Pattern for Weighted Imbalance Data: A Feature Selection Study for Thoracolumbar Spine Fractures in Crash Injury Research".

Day 2

Keynote 2

Day 2 began with the second keynote of IRI 2020. Prof. Bhavani Thuraisingham, Founders Chair Professor of Computer Science, University of Texas at Dallas, USA delivered the second keynote, titled "SecAI: Integrating Cyber Security and Artificial Intelligence with Applications in Internet of Transportation and Infrastructures". She explained data science for cyber security applications and introduced an ensemble of models that could block an attack when one is detected in a continuous stream of data. The same ensemble of models could be applied to detect insider threats as well. She also explained the Internet of Things (IoT) in the domain of transportation and introduced a privacy-aware, policy-based data management framework for Internet of Transportation systems.

Session B31 - Machine Learning and Data Mining II

This session began with Salim Sazzed from Old Dominion University presenting the first paper of the session, titled: "Development of Sentiment Lexicon in Bengali utilizing Corpus and Cross-lingual Resources". He proposed a multi-step approach for developing a sentiment lexicon in Bengali. He used two datasets to evaluate the proposed method: a drama review dataset and a news comments dataset. His results show that the proposed lexicons have higher coverage at both the document and word level, and that the approach can be extended to other languages as well. Next, Tianyi Wang from Florida International University presented their paper, titled: "Multi-Label Multi-Task Learning with Dynamic Task Weight Balancing". They proposed a novel deep multi-label task attention network that utilizes multi-task learning to solve multi-label video information retrieval tasks. For the evaluation, they created a disaster video dataset using videos of 7 different types from YouTube. This video dataset was class imbalanced. Their results show that the text-specific network helps their model learn better. Afterwards, Michael J Mior from Rochester Institute of Technology presented their paper, titled: "Semantic Data Understanding With Character Level Learning".
Then, Maria Diaz from California State University, Fullerton, presented their paper, titled: "Natural Language-based Integration of Online Review Datasets for Identification of Sex Trafficking Businesses". They used two datasets: one with mostly illegal business data, and the other with mostly legitimate business data. Their approach consists of three steps: data integration, natural language processing, and machine learning. Their results show similar classification performance among the machine learning models. They also observed a slight decrease in the accuracy of the machine learning models and a drastic decrease in running time when sparse words were removed.
Finally, Yiu-Kai Ng from Brigham Young University, presented their paper, titled: "Using a Deep Learning Model, Content Features, and Author Metadata to Recommend Research Papers".

Virtual Banquet and Awards Ceremony

Day 2 of IRI 2020 ended with the Virtual Banquet and Awards Ceremony. The paper, titled: "Background Subtraction with a Hierarchical Pitman-Yor Process Mixture Model of Generalized Gaussian Distributions" by Srikanth Amudala, Samr Ali, and Nizar Bouguila from Concordia University, received an honorable mention.

The following papers won the best paper award and the best student paper award.

Day 3

Panel: IRI for Data Science - Ubiquitous AI and Reuse 

Day 3 of IRI 2020 started with the panel, "IRI for Data Science - Ubiquitous AI and Reuse". The session chair, Stuart H. Rubin from SPAWAR Systems Center Pacific (SSC-Pacific), opened the session. Prof. Bhavani Thuraisingham and Prof. Elisa Bertino were the panelists. Elaborating on reuse, Prof. Bhavani Thuraisingham talked about the possibility of reusing AI components while adapting them to other use cases, and about security challenges such as how to identify malware using machine learning. Prof. Elisa Bertino talked about machine learning and security, and explained transfer learning as an example of model reuse.

Data Integration and Mining (DIM) Workshop

The 9th Data Integration and Mining (DIM) Workshop of IRI 2020 started with Christopher Giossi's presentation on "Towards Agile Integration: Specification-based Data Alignment". The presentation focused on how to align data from two different datasets that are sampled at different time intervals. Next, Lacramioara Mazilu from University of Manchester presented "Fairness in Data Wrangling". She explained unfairness, how to tackle it, and dataset fairness metrics (e.g., sample size and proxy attributes). She also explained, step by step, their proposed approach to introduce fairness during data wrangling.
Afterwards, Abdullah Alhejaili presented "Latent Features Modelling for Recommender Systems". Finally, Sudarshan S. Chawathe from University of Maine presented "Mining Frequent Differences in File Collections". He explained how to use small differences between documents to understand the differences in file collections. In the presented work, he generates document subgraphs from only the frequent small differences in order to view them. He claimed that his approach makes it easier to visualize the differences in a collection of documents.

Session C32 - Novel Applications

Novel Applications was the final paper session of IRI. Starting this session, Long Cheng from ABB Inc. presented their paper, titled: "Ultra Wideband Indoor Positioning System based on Artificial Intelligence Techniques". He introduced a real-time ultra wideband indoor positioning system, elaborating on the positioning calculation, how it detects outliers, and how they correct the range measurement using regression analysis. According to Long Cheng, their goal is robot navigation. Next, Ye Qiu from Peking University presented their paper, titled: "BusinessDetect: An Advanced Business Information Mining Application for Intelligent Marketing". They proposed BusinessDetect, a system for business data with a data storage module, a data analysis module, and a user interface to select and view information on companies. They evaluated the proposed system using a dataset of company descriptions and news articles, and were able to mine insightful data using various data mining methods. Then, Vincent Santore from the BurrProject presented their paper, titled: "A Comparison of Machine Learning Algorithms Applied to American Legislature Polarization". They used four datasets for the evaluation and, according to their results, the best performing model was the Neural Network. Finally, Safwa Ameer from the University of Texas at San Antonio presented their paper, titled: "The EGRBAC Model for Smart Home IoT Access Control". The objective of their model is to change the permissions of the user's current active session according to his/her role. The system was explained using a sample use case, where kids are only permitted to access entertainment devices on weekends. With the Novel Applications session, IRI 2020 officially came to an end.
-- Gavindya Jayawardena (@Gavindya2)