2022-05-20: EEG Data Standardization

Introduction

Clinicians and researchers are increasingly looking for ways to use electroencephalography (EEG) data to support the diagnosis and treatment of various physical and mental health conditions.  However, soon after becoming involved in EEG-related research, you will realize that there are numerous formats available for EEG data recordings.  This lack of format consistency continues to present challenges in the transmission, exchange, and analysis of such data among researchers, and it impedes advancements in brain-computer interface (BCI) and biomedical research.

At a minimum, EEG data consistency would reduce confusion among researchers and clinicians who rely on data sharing.  However, the benefits of EEG data standardization also include reducing or eliminating expensive, time-consuming, and labor-intensive conversions between formats.  This is significant because interdisciplinary, multidisciplinary, and comprehensive diagnosis and treatment plans often require cross-institutional data exchange and analysis.  This, in turn, would lead to decreased healthcare costs, improved quality of care, and increased support for research to better serve the community.

The reality is that it is not for lack of trying that an EEG data format standard does not exist.  There have been many attempts over the past few decades to establish a universally accepted standard.  This blog will review some of those efforts.

Current EEG Formats

There are a lot of options when it comes to EEG data formats.  As a matter of fact, Brain Electrical Source Analysis (BESA) Research currently lists at least 48 different EEG file formats that it supports, and even that list is not all-inclusive.  Many of these formats are proprietary and, therefore, would probably not serve well as a common format for electrophysiology research.  However, some were designed specifically to address the lack of EEG data standardization.  They have evolved over the years as researchers seek the best way to universally capture the characteristics of EEG data collected with different sensors and amplifiers, under different environmental conditions, and for different purposes.

The European Data Format (EDF) was first developed in 1987 to support the sharing of EEG sleep recordings among medical engineers.  It was published in 1992 and has since become the most commonly used format, serving as the de facto standard for EEG data although it has never been officially recognized as one.  EDF+ expanded the capabilities of EDF by supporting interrupted recordings, standard electrode names, and four-digit years in dates.
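To make the EDF layout a little more concrete, here is a minimal Python sketch of how the fixed 256-byte ASCII header at the start of an EDF file could be split into its fields.  The field widths follow the published EDF specification as I understand it.  The function names and the demo header are hypothetical and built in memory rather than taken from a real recording.  The year helper illustrates why four-digit years matter: plain EDF stores the start date as dd.mm.yy, and the EDF+ specification resolves the ambiguity with 1985 as a clipping year.

```python
# Widths (in bytes) of the fixed EDF header fields, in file order.
EDF_FIXED_LAYOUT = [
    ("version", 8), ("patient_id", 80), ("recording_id", 80),
    ("start_date", 8), ("start_time", 8), ("header_bytes", 8),
    ("reserved", 44), ("n_records", 8), ("record_duration", 8),
    ("n_signals", 4),
]


def parse_edf_fixed_header(raw):
    """Split the first 256 bytes of an EDF file into stripped text fields."""
    if len(raw) < 256:
        raise ValueError("the fixed EDF header is 256 bytes long")
    fields, offset = {}, 0
    for name, width in EDF_FIXED_LAYOUT:
        chunk = raw[offset:offset + width]
        fields[name] = chunk.decode("ascii", errors="replace").strip()
        offset += width
    return fields


def expand_two_digit_year(yy):
    """Map a two-digit EDF year to four digits using the 1985 clipping year."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 1900 + yy if yy >= 85 else 2000 + yy


def _pad(text, width):
    # EDF header fields are left-justified ASCII padded with spaces.
    return text.ljust(width).encode("ascii")


# Hypothetical header for a two-signal recording (256 + 2 * 256 header bytes).
demo_header = b"".join([
    _pad("0", 8), _pad("X X X X", 80), _pad("Startdate 20-MAY-2022 X X X", 80),
    _pad("20.05.22", 8), _pad("10.30.00", 8), _pad("768", 8),
    _pad("", 44), _pad("10", 8), _pad("1", 8), _pad("2", 4),
])

if __name__ == "__main__":
    header = parse_edf_fixed_header(demo_header)
    year = expand_two_digit_year(int(header["start_date"][-2:]))
    print(header["n_signals"], "signals, recorded in", year)
```

A real reader would go on to parse the per-signal header that follows these 256 bytes, which this sketch deliberately leaves out.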

A simple binary Extensible Biosignal (EBS) file format was introduced to the research community in 1996.  Large data sets can be arranged in these files in temporal or channel order, but the format requires the same data encoding, sampling frequency, and length for all channels.  It also supports certain metadata attributes per channel, and additional ones per recording, in its variable header.

Brain Vision, founded in 1999, is known for its widely used Brain Vision common data format in addition to its other hardware and software products.  The data format requires three separate files: a text header file (.vhdr) containing metadata, a text marker file (.vmrk) containing event data, and a binary data file (.eeg) containing raw EEG voltage values.  Brain Vision has made a number of updates to the format since its original release.
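Because the .vhdr header is plain text laid out much like an INI file, a rough Python sketch of reading it needs little more than the standard library.  The sample header below is hypothetical and trimmed to a few common keys, and read_vhdr is an illustrative name rather than part of any Brain Vision tool.  Real headers can also contain a free-text comment section that this simple approach would not handle.

```python
import configparser

# Hypothetical, trimmed header; not copied from a real recording.
SAMPLE_VHDR = """Brain Vision Data Exchange Header File Version 1.0
; illustrative example only

[Common Infos]
DataFile=demo.eeg
MarkerFile=demo.vmrk
DataFormat=BINARY
DataOrientation=MULTIPLEXED
NumberOfChannels=2
SamplingInterval=2000

[Binary Infos]
BinaryFormat=INT_16

[Channel Infos]
Ch1=Fp1,,0.1,µV
Ch2=Fp2,,0.1,µV
"""


def read_vhdr(text):
    """Pull the file names, channel names, and sampling rate out of a header."""
    # The first line identifies the file type and is not INI syntax.
    first_line, _, body = text.partition("\n")
    if "Data Exchange Header File" not in first_line:
        raise ValueError("this does not look like a .vhdr header")
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names such as 'Ch1' case-sensitive
    parser.read_string(body)
    common = parser["Common Infos"]
    # SamplingInterval is given in microseconds between samples.
    interval_us = float(common["SamplingInterval"])
    return {
        "data_file": common["DataFile"],
        "marker_file": common["MarkerFile"],
        "n_channels": int(common["NumberOfChannels"]),
        "sampling_rate_hz": 1_000_000 / interval_us,
        "channels": [v.split(",")[0] for v in parser["Channel Infos"].values()],
    }


if __name__ == "__main__":
    print(read_vhdr(SAMPLE_VHDR))
```

The header is what ties the three files together: it names the marker and data files and describes how the binary samples in the .eeg file should be interpreted.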

First cited in 2004, the EEGLAB interactive MATLAB toolbox has continuously added new features to become the product it is today.  It is intended for processing EEG and other electrophysiological data.  EEGLAB saves data, including metadata, to a .set file and the raw data to a .fdt file.  Because MATLAB and EEGLAB are popular, and because many plugins support importing and exporting various file types, EEGLAB can be used to convert data into a MATLAB format or another more common file format to support data exchange and analysis.

The General Data Format (GDF) was released in 2005 as a general-purpose data format for biomedical signals that addresses some of the limitations of EDF.  Open-source software for reading and writing GDF files was also made available, along with converters from other formats.  An update to the format was released in 2011.  It became a standard recognized by the Austrian Standardization Institute in 2015 but was never universally adopted.

The BioSemi Data Format (BDF) accommodates high sampling rates and stores each sample in 24 bits, compared with 16 bits in EDF.  Unfortunately, 24-bit integers are not a native type on most computers, so reading and writing the format requires extra, time-consuming conversions.  BDF+ extends the original BDF version to become a 24-bit version of EDF+ with only minor differences.
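The conversion in question looks roughly like the following Python sketch, which unpacks 24-bit samples into ordinary integers three bytes at a time.  It assumes the samples are little-endian, signed (two's complement) values, as in EDF-style files.  The function name decode_int24_le is hypothetical.

```python
def decode_int24_le(raw):
    """Decode consecutive 3-byte little-endian signed samples into ints."""
    if len(raw) % 3 != 0:
        raise ValueError("24-bit data must be a multiple of 3 bytes long")
    return [
        int.from_bytes(raw[i:i + 3], byteorder="little", signed=True)
        for i in range(0, len(raw), 3)
    ]


if __name__ == "__main__":
    # Four samples: 1, -1, the most negative value, and the most positive value.
    packed = bytes([0x01, 0x00, 0x00,
                    0xFF, 0xFF, 0xFF,
                    0x00, 0x00, 0x80,
                    0xFF, 0xFF, 0x7F])
    print(decode_int24_le(packed))
```

With 16-bit data the same job is a single bulk read into a native integer type, which is why the extra step adds up on long, many-channel recordings.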

Extensible Data Format (XDF) is a general-use format for capturing biosignal data.  Files of this type are produced by the LabRecorder software as part of the Lab Streaming Layer (LSL) system.  XDF is open source, highly extensible, and can support any number of streams.  The American National Standards Institute (ANSI) actually based one of its standards on version 1.0 of XDF.

Standardization Efforts

There have been attempts to solve the EEG data standardization issue for some time now.  Some involve developing common data file formats, such as those discussed above.  Others involve developing standards for EEG file structures to support data exchange.  Some of these standards, including their file formats, data models, and APIs, are discussed below.

In 2014, the Neurodata Without Borders (NWB) initiative developed a comprehensive standard for storing, sharing, and archiving neurophysiology data with a focus on ease of use, portability, extensibility, and preservability.  The NWB Neurophysiology (NWB:N) standard uses Hierarchical Data Format 5 (HDF5) and defines formal structures for organizing the data using basic primitives.  The PyNWB Python package has also been created to read, write, and manipulate NWB data.

The Electrophysiology Task Force produced the Neuroscience Information Exchange (NIX) format, cited in 2014, as a way to use generic data models to store fully annotated electrophysiology data.  Like some of the other standards, it uses HDF5 to store its data model.  C++, Python, Java, and MATLAB libraries are available for data input and output as well as for data analysis.  Because the standard's implementation is so generic, it can be used for more than just electrophysiology data.

While the original Brain Imaging Data Structure (BIDS) publication in 2016 detailed a hierarchical file structure standard for neuroimaging data, it was expanded three years later to include an EEG-BIDS extension.  The BIDS standard restricts raw EEG data files to four pre-existing formats:  EDF, BrainVision, EEGLAB, and BDF.  It also dictates that metadata be stored in JSON or TSV files.
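As a rough illustration of what that hierarchical structure means in practice, the Python sketch below builds a BIDS-style location for a raw EEG file and the JSON sidecar that sits next to it.  The naming pattern (sub-, ses-, and task- entities, an eeg folder, and an _eeg suffix) follows the BIDS specification as I understand it.  The helper name bids_eeg_path and the sidecar values are hypothetical, and a real dataset requires more metadata than is shown here.

```python
import json
from pathlib import PurePosixPath

# Entry-point extensions for the four raw formats: EDF, BrainVision, EEGLAB, BDF.
ALLOWED_EXTENSIONS = {".edf", ".vhdr", ".set", ".bdf"}


def bids_eeg_path(subject, task, extension, session=None):
    """Return the relative BIDS-style path for one raw EEG recording."""
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"{extension} is not one of the accepted raw formats")
    entities = [f"sub-{subject}"]
    folder = PurePosixPath(f"sub-{subject}")
    if session is not None:
        entities.append(f"ses-{session}")
        folder = folder / f"ses-{session}"
    entities.append(f"task-{task}")
    return folder / "eeg" / ("_".join(entities) + "_eeg" + extension)


if __name__ == "__main__":
    data_path = bids_eeg_path("01", "rest", ".edf", session="01")
    sidecar_path = data_path.with_suffix(".json")
    # Illustrative metadata values only.
    sidecar = {"TaskName": "rest", "SamplingFrequency": 500}
    print(data_path)
    print(sidecar_path)
    print(json.dumps(sidecar, indent=2))
```

Because both the folder layout and the file names are predictable, tools can locate a recording and its metadata without any per-lab configuration.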

The EEG Study Schema (ESS) is another specification intended to support the storage and sharing of EEG data.  It is an XML-based approach built around data containers, developed at the Swartz Center for Computational Neuroscience (SCCN) and detailed in 2016.  The specification includes containers for both raw and processed data, with preprocessing automated by the PREP software pipeline, which outputs data in EEGLAB .set files.  Raw data can be stored in several formats (e.g., EDF, BDF).

A 2021 publication in the journal Clinical Neurophysiology reported that DICOM Working Group 32 has been working on its own standards for sharing and storing neurophysiological data.  The work covers routine EEG data, with support for more complex EEG data to be added at a later time.

Summary

With so many simultaneous efforts toward EEG data standardization, this post was not intended to be a comprehensive list, but rather an introduction to the efforts and challenges surrounding EEG data sharing.  Also not addressed in this blog are validation methods for ensuring the quality of the data being shared, which is another concern.  Even when researchers share the same file formats, issues may arise from poorly recorded data.  Although a lot of work has been done toward EEG data standardization over the years, there is still a lot more to do.

--Bathsheba Farrow
