2023-11-22: IEEE International Conference on Information Reuse and Integration for Data Science (IRI) 2023 Trip Report

 


The 24th IEEE International Conference on Information Reuse and Integration for Data Science (IRI 2023) took place at the University of Washington, Bothell campus from August 4 to 6, 2023. We (Yasasi and Bathsheba) attended the IRI 2023 conference in person and virtually to present our research work.

The IRI conference provides a platform for scholars and professionals from academia, industry, and government to come together to showcase, deliberate, and share ideas exploring three major tracks: information reuse, information integration, and reusable systems. This year, the full research paper acceptance rate of IRI was 29%. Researchers from 18 countries and 5 continents (North America, Europe, Africa, Asia, and Australia) submitted their work to IRI 2023 research and poster tracks.

Conference Venue - University of Washington, Bothell Campus

Day 1

Keynote 1: Dr. Cissy Ma

Day 1 of the conference started with the keynote by Dr. Cissy Ma, a research engineer at the Water Infrastructure Division, U.S. Environmental Protection Agency. Her speech was titled "Transforming Urban Water Systems Towards A More Sustainable Future". She presented a design of an urban water system for cities of the future, emphasizing how different water system services can be organized into an integrated whole and how such paradigm-shifting designs are substantially more efficient than existing centralized water systems.

During her keynote, Dr. Ma discussed how the water industry is moving toward a more sustainable future with AI and Machine Learning to make the water systems more efficient. She highlighted the need for data scientists to explore potential research avenues to investigate the system changes in the water industry.

Session A: Image Processing

The Image Processing session was chaired by Dr. Yongjin Lu, an associate professor at Oakland University, USA. The session began with Anvaya Rai from the Center for Development of Telematics, New Delhi, India, presenting their full paper on "InPosNet: Context-Aware DNN for Visual SLAM". This work proposes a novel deep neural network, named InPosNet, which provides a mechanism to compute an optimal feature space.

Next, Dr. Yongjin Lu, the session chair, presented "Hybrid Convolutional Autoencoder-Hierarchical Clustering Algorithm To Reveal Image Spam Source". In this work, they constructed a hybrid algorithm that uses convolutional autoencoders (CAEs) to extract useful visual features and built a hierarchical clustering framework that uses these features to cluster images collected from spam emails.

The final presentation of the session at IRI 2023 was by Dr. Mohammed Ouali from Adrian College, USA. He presented their paper titled "A new similarity measure and hierarchical clustering approach to color image segmentation". In this paper, they proposed a novel hierarchical clustering algorithm for color image segmentation that addresses the limitations of traditional clustering methods, such as partitioning and density-based approaches, in identifying natural clusters in datasets with elliptical and chained shapes.

Session B: Sequential Data/Time Series

The second paper session of the conference, "Sequential Data/Time Series", started after the lunch break. This session began with Somayeh Ghanbarzadeh from the University of North Texas, USA, presenting their full paper on "Improving the Reusability of Pre-trained Language Models in Real-world Applications". This work is a collaboration between Microsoft Research and the University of North Texas. This work proposes a training approach called Mask-tuning, which integrates Masked Language Modeling (MLM) training objectives into the fine-tuning process to enhance Pre-trained Language Models' generalization. This paper won the best paper award at IRI 2023.

Next, Abdulkareem Alsudais from PSAU, Saudi Arabia, presented their paper, "Comparing Open Arabic Named Entity Recognition (NER) Tools". They compared and evaluated the performance of three open Arabic NER tools: CAMeL, Stanza, and Hatmi. They then improved the results using two approaches: merging the outputs of the three NER tools and a voting approach. Their results indicate that the merge and vote methods yield better performance than using CAMeL, Stanza, or Hatmi individually.
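To make the two combination strategies concrete, here is a minimal Python sketch of how per-token labels from three NER tools could be merged or voted on. The exact rules used in the paper are not described in this report, so the `merge` and `vote` functions, the tie-breaking behavior, and the sample labels below are all illustrative assumptions and not the authors' method.

```python
from collections import Counter


def vote(labels_per_tool, fallback="O"):
    """Majority vote per token; a tie for first place falls back to `fallback`.

    Hypothetical rule for illustration only.
    """
    combined = []
    for labels in zip(*labels_per_tool):
        counts = Counter(labels).most_common()
        top_label, top_count = counts[0]
        if len(counts) > 1 and counts[1][1] == top_count:
            combined.append(fallback)  # no single winner
        else:
            combined.append(top_label)
    return combined


def merge(labels_per_tool, outside="O"):
    """Union-style merge: keep the first entity label that any tool assigns.

    Hypothetical rule for illustration only.
    """
    combined = []
    for labels in zip(*labels_per_tool):
        entity_labels = [label for label in labels if label != outside]
        combined.append(entity_labels[0] if entity_labels else outside)
    return combined


# Made-up outputs for four tokens from three tools (not real tool output).
tool_a = ["PER", "O", "LOC", "O"]
tool_b = ["PER", "O", "O", "ORG"]
tool_c = ["O", "O", "LOC", "O"]

print(vote([tool_a, tool_b, tool_c]))   # ['PER', 'O', 'LOC', 'O']
print(merge([tool_a, tool_b, tool_c]))  # ['PER', 'O', 'LOC', 'ORG']
```

In this toy setup, merging favors recall (any tool can contribute an entity), while voting favors precision (an entity needs agreement from most tools).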

Finally, the "Completeness of Natural Language Requirements: A Comparative Study of User Stories and Feature Descriptions" paper was presented at the Sequential Data/Time Series session by Dr. Nan Niu from the University of Cincinnati, USA. They proposed to measure the completeness of textual requirements based on a universal linguistic theory, namely Fillmore's frame semantics.

Session C: EM-RITE Workshop

The 12th IEEE International Workshop on Empirical Methods for Recognizing Inference in Text (IEEE EM-RITE 2023) was held after the evening coffee break. The EM-RITE workshop aims to provide a forum for original high-quality research contributions on empirical methods for recognizing inference in text as well as multidisciplinary research opportunities in conjunction with IRI 2023. This session was chaired by Dr. Min-Yuh Day, Associate Professor at National Taipei University, Taiwan.

The first presentation, "Enhancing Model Explainability in Financial Trading Using Training Aid Samples: A CNN-Based Candlestick Pattern Recognition Approach", was delivered by Yun-Cheng Tsai from National Taiwan Normal University, Taipei, Taiwan. This work presents a framework that enhances the explainability of CNN-based candlestick pattern recognition models.

Next, Prof. Shih-Hung Wu from Chaoyang University of Technology, Taiwan, presented their paper titled "Integrating Sarcastic Language Datasets in Various Standards for Sarcasm Detection". This research explores the generalizability of sarcasm datasets by comparing six such datasets using a RoBERTa-based classification model.

Next, Prof. Min-Yuh Day from National Taipei University, Taiwan presented "Speech-to-speech Low-resource Translation". This work addresses the lack of comprehensive research on speech-to-speech translation (S2ST) for low-resource languages. They conducted a systematic review of existing literature on S2ST for low-resource languages.

The final presentation of day 1 was "Building and Validating a Clinical Ultrasound Image Reporting Model" by Prof. Kuo-Chung Chu from the National Taipei University of Nursing and Health Sciences, Taiwan. In this work, they developed a deep learning-based Encoder-Decoder model for generating reports from ultrasound images.

Day 2

Keynote 2: Dr. Taghi M. Khoshgoftaar

Dr. Taghi M. Khoshgoftaar, Motorola Professor, Department of Electrical Engineering and Computer Science, Florida Atlantic University, USA, delivered the second keynote speech of the conference. Dr. Khoshgoftaar has been active in IRI since 2004, publishing papers, delivering keynotes, and chairing the conference program. His speech was titled "Evaluating Machine Learning Algorithms with Highly Imbalanced Big Data".

During his keynote, he discussed the challenges related to big data, class-imbalanced data, and performance evaluation. He showed their evaluation of different random undersampling (RUS) levels on distinct, highly imbalanced datasets for Medicare fraud detection. He demonstrated how the area under the receiver operating characteristic curve (AUC) can lead to potentially misleading conclusions, and suggested also considering the area under the precision-recall curve (AUPRC) when evaluating models on imbalanced big data.
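The gap between the two metrics is easy to reproduce on synthetic data. The following Python sketch is not taken from the keynote: the dataset, the scores, and the helper names `roc_auc` and `average_precision` are assumptions made for illustration, and average precision is used as a common step-wise estimate of AUPRC. With 1% positives, a scorer that ranks every positive above about 95% of the negatives looks strong by AUC, yet most of its top-ranked predictions are still false positives.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / true_positives


# Synthetic, highly imbalanced data: 990 negatives and 10 positives (1% positive).
# Negatives get scores 0..989; every positive scores 940.5, so 49 negatives outrank them.
labels = [0] * 990 + [1] * 10
scores = [float(i) for i in range(990)] + [940.5] * 10

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"AUC   = {auc:.3f}")  # about 0.951
print(f"AUPRC = {ap:.3f}")   # about 0.098
```

Here the same ranking yields an AUC of roughly 0.95 but an average precision of roughly 0.10, because the 49 negatives ranked above the positives are a small share of all negatives but a large share of the top predictions.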

Session D: AI for Human Activities Recognition

The next session, AI for Human Activities Recognition, began after the keynote speech and was chaired by Dr. Wentao Wang from Oracle. Our paper on "Gaze Analytic Dashboard for Distributed Eye Tracking" was presented in this session. In this paper, we (Yasasi, Bhanuka, Gavindya, Mohan, Dr. Vikas Ashok, and Dr. Sampath Jayarathna) introduced a distributed multiuser eye tracking system that supports both traditional and advanced gaze measures. This work is an extension of our previous work, DisETrac: Distributed Eye-Tracking for Online Collaboration, which was published in the Proceedings of the Conference on Human Information Interaction and Retrieval (CHIIR) in March 2023. The system proposed in this work extracts eye movements from multiple participants using common off-the-shelf eye trackers, generates real-time traditional positional gaze measures and advanced gaze measures such as the ambient-focal coefficient K, and displays them in an interactive dashboard. We conducted a pilot user study with 10 participants to evaluate user attention in an online collaborative jigsaw puzzle-solving task and to evaluate our dashboard using a user experience questionnaire (UEQ).
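For readers unfamiliar with the ambient-focal coefficient K, the sketch below follows the commonly cited definition (attributed to Krejtz et al.): for each fixation, K is the z-scored fixation duration minus the z-scored amplitude of the saccade that follows it. Positive values suggest focal viewing (long fixations, short saccades) and negative values suggest ambient viewing. This is a minimal illustration and not the dashboard's implementation; the function name, the use of the sample standard deviation, and the sample numbers are assumptions.

```python
from statistics import mean, stdev


def coefficient_k(fixation_durations, next_saccade_amplitudes):
    """Return one K value per fixation.

    next_saccade_amplitudes[i] is the amplitude of the saccade that follows
    fixation i. Using the sample standard deviation is an assumption here.
    """
    if len(fixation_durations) != len(next_saccade_amplitudes):
        raise ValueError("each fixation needs the amplitude of the saccade that follows it")
    if len(fixation_durations) < 2:
        raise ValueError("need at least two fixations to standardize")

    mu_d, sd_d = mean(fixation_durations), stdev(fixation_durations)
    mu_a, sd_a = mean(next_saccade_amplitudes), stdev(next_saccade_amplitudes)
    if sd_d == 0 or sd_a == 0:
        raise ValueError("durations and amplitudes must vary to be standardized")

    return [
        (d - mu_d) / sd_d - (a - mu_a) / sd_a
        for d, a in zip(fixation_durations, next_saccade_amplitudes)
    ]


# Made-up sample: two short fixations with long saccades, then two long
# fixations with short saccades (durations in ms, amplitudes in degrees).
durations = [100, 120, 400, 450]
amplitudes = [8.0, 7.0, 1.0, 1.5]

for k in coefficient_k(durations, amplitudes):
    print("focal" if k > 0 else "ambient", round(k, 2))
```

Because both terms are z-scores over the same sequence, the per-fixation values average to zero over the whole recording, so K is typically summarized over shorter time windows or per participant to see shifts between ambient and focal attention.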

Next, Dr. Sandeep Roy, post-doctoral research associate at VMASC, Old Dominion University (ODU), presented their paper "A Novel Smartphone-Based Human Activity Recognition Approach using Convolutional Autoencoder Long Short-Term Memory Network". In this paper, they proposed a novel framework (CAEL-HAR) that combines CNN, autoencoder, and long short-term memory network architectures for efficient smartphone-based human activity recognition.

Session E: ML and AI

The ML and AI session of the conference started after the lunch break and was chaired by Dr. Chengcui Zhang from the University of Alabama at Birmingham, USA. The first presentation of the session was "Enhancing Noisy Binary Search Efficiency through Deep Reinforcement Learning" by Rui Ma from the University of Miami, USA. In this study, they leverage deep reinforcement learning to approximate the optimal decision strategies.

Next, Hiroshi Dozono from Saga University, Japan presented their paper titled, "The Method of Making the Low-dimensional Map that Preserves the Distance Relationships from Selected Data Point". In his presentation, Hiroshi discussed their work on Ego-SOM which is a method of making a low-dimensional map that preserves the distance relationships from selected data points.

Session F1: Social Media

The Social Media session at IRI 2023 was chaired by Prof. Danda Rawat, a professor at Howard University, USA, and an alumnus of ODU Computer Science. This session began with Amani Alzahrani from Howard University, USA, presenting "A Hybrid Deep Learning Architecture for Misinformation Detection on Social Media". In this study, they proposed a hybrid deep learning model that utilizes a Features-Based model combined with pre-trained text embedding models such as Global Vectors for word representation (GloVe) and Universal Sentence Encoders (USE).

Next, Mohammad Shiri from ODU presented their paper titled "Meme It Up: Patterns of Emoji Usage on Twitter" at the Social Media session. In this work, they investigated the impact of emojis in tweets. Dr. Sampath Jayarathna from ODU and WS-DL research group is one of the co-authors of this work. They evaluated the impact of emoji usage on different facets of success within social media by treating the emojis as standardized memes.

Next, Sangkeun Lee presented their poster paper on "Predicting Power Outages During Extreme Weather with EAGLE-I and NWS Datasets" at the Social Media session. They introduced machine learning models that predict power outage risk at the state level during extreme weather events. This poster paper won the best poster award at IRI 2023.

The final presentation of the Social Media session at IRI 2023 was a poster paper titled, "Using BERT to Understand TikTok Users' ADHD Discussion". In this work, they analyzed ADHD discussion in TikTok through text analysis using BERT and set up a multi-label classifier to understand the general range of responses. This work was done by Kayla Pineda (an undergraduate student from our NIRDS Lab, ODU) in collaboration with Dr. Anne Perrotti (a professor at the Department of Communication Disorders & Special Education, ODU), Dr. Faryaneh Poursardar (a professor at WS-DL research group, ODU), Dr. Sampath Jayarathna (a professor at WS-DL research group, ODU), and Dani Graber (an undergraduate who participated in our NSF Research Experience for Undergraduates (REU) program at ODU Computer Science).


Session F2: AI for Health and Safety

In parallel to the Social Media session, the AI for Health and Safety session was held on day 2 of the IRI 2023 conference. This session was chaired by Dr. Wentao Wang from Oracle. Bathsheba Farrow from ODU and WS-DL research group presented "A Serverless Electroencephalogram Data Retrieval and Preprocessing Framework". Bathsheba presented a software-as-a-service (SaaS) solution for electroencephalogram (EEG) data retrieval and preprocessing. The system was deployed in the Amazon Web Services (AWS) public cloud and directly interfaces with the OpenNeuro cloud-based data repository.


Day 3

Keynote 3: Dr. Junwei Zhang

The third day of IRI 2023 kicked off with a keynote speech from Dr. Junwei Zhang, a Senior Software Engineer at DoorDash, USA. Dr. Zhang delivered his speech on "Exploring LLMs in the Food Delivery Sector: Driving Growth and Improving Recommendations".

He delved into the riveting fusion of state-of-the-art AI and the thriving food delivery tech industry. He highlighted GPT and its pivotal role in accelerating marketplace growth and enhancing food recommendation systems. His speech provided valuable insights into AI's current and potential applications in the industry, and how businesses can harness this technology for continual growth and high customer satisfaction.

Session G: Data Analysis

Following the keynote speech, the Data Analysis session started. The first presentation of the session was "Variability and Trend Analysis of a Grid-Scale Solar Photovoltaic Array above the Arctic Circle" by Henry Toal from the University of Alaska Fairbanks, USA. They analyzed data from a small, grid-scale PV array in Kotzebue, Alaska, located above the Arctic Circle.

The next presenter, Dr. Carson Leung from the University of Manitoba, Canada, presented "A transportation analytic solution for predicting weather-related flight cancellations". In this work, they presented a data science solution that integrates flight data, weather data, and other related data to determine key factors contributing to flight cancellations.

Social Event and Awards

On the evening of Saturday, August 5th, the social event was held at Bothell North Creek Event Center. During the social event, the best paper, best poster, and best reviewer awards were announced.

Best Reviewer Awards

The reviewers of IRI 2023 who provided the most accountable, timely, detailed, and constructive reviews received best reviewer awards. The winners were Vaibhav And (Montclair State University, USA), Xiaoyu Jin (Google, USA), Esmaeil Shakeri (University of Calgary, Canada), and Shan Suthaharan (University of North Carolina at Greensboro, USA).

Best Poster Award

The Best Poster Award went to "Predicting Power Outages During Extreme Weather with EAGLE-I and NWS Datasets" by Sangkeun Lee, Gang Seob Jung, Jong Youl Choi, Anika Tabassum, Nils Stenvig, and Supriya Chinthavali.

Best Paper Award

The best paper award at IRI 2023 was awarded to "Improving the Reusability of Pre-trained Language Models in Real-world Applications" by Somayeh Ghanbarzadeh, Hamid Palangi, Yan Huang, Radames Cruz Moreno, and Hamed Khanpour.

Wrap-up

After three years of virtual IRIs, this year's IRI conference took place in person. I was excited to present my first full paper as the first author in front of a live audience. I had the privilege of engaging with academic experts in the data science community from around the globe. The IRI 2023 conference, held in Bothell, Washington, offered me the chance to immerse myself in the state's natural beauty and diverse cultural experiences. I would like to express my gratitude to my co-authors, my advisor, and the NSF CAREER grant for supporting the work I presented at IRI 2023. Additionally, I extend thanks to IEEE TCMC for the travel support that facilitated my attendance at IRI 2023.

-- Yasasi Abeysinghe (@Yasasi_Abey)
