2021-09-16: Train Detection - 2021 Summer Internship at Bihrle Applied Research Inc

Toward the end of my second year as a Ph.D. student at Old Dominion University (ODU), I was fortunate to get a remote internship at Bihrle Applied Research Inc (BAR) -- an aerospace and aerodynamics company near NASA Langley Research Center in Hampton, VA. This was my second remote internship in the United States, after an excellent internship experience at Los Alamos National Laboratory in Summer 2020. Although the internship was remote, I was called in for three days of on-site training to familiarize myself with the project and the tasks I had to accomplish during the summer. I worked as a summer intern on the Rail-Inspector project -- cloud-based software that automatically processes aerial imagery of railroad tracks using machine learning and deep learning to identify track components, make measurements, and identify defects. During the internship, I worked with the technical development team of BAR’s Ardenna business unit to develop and enhance the AI-based algorithms used by Rail-Inspector. I was supervised by Stanton Coffey -- an ODU Mechanical Engineering alumnus, who is currently working as a Lead Computer Vision Engineer at BAR and heading up Ardenna's Rail-Inspector team.

BAR's Business Nature

Bihrle Applied Research Inc (BAR) is an aerospace company specializing in wind tunnel testing and simulation of aircraft. Its services include, but are not limited to, flight simulation, wind tunnel testing, and software solutions for the aircraft modeling & simulation market. In 2014, BAR’s UAS software development led to a partnership with BNSF Railway and FAA’s Pathfinder Program to study and develop technologies for drone-based supplemental inspection of railway infrastructure. The aim was to close a gap in the overall inspection market: automated processing of the vast amount of aerial imagery collected during UAS inspection flights.

Wind Tunnel Testing

To reduce the significant gap, one of BAR’s software solutions, called Rail-Inspector, offers an AI-based solution that automates the detection, classification, and reporting of anomalies to provide insightful and actionable data for infrastructure inspections. Rail-Inspector utilizes machine learning and deep learning-based algorithms to automate the processing of aerial track imagery to identify and analyze dozens of track components in a consistent and repeatable way. This helps the track owners to (a) correct anomalies before they become costly problems, (b) conduct capital planning activities, and (c) maintain comparative databases for predictive analysis.

On-site Training and Remote Work 

My background in deep learning and machine learning was closely related to the task. At BAR, I was responsible for developing an algorithm for train detection using computer vision techniques, annotating rail imagery containing trains and other rail components, and assessing images for testing. During three days of on-site training, my supervisor helped me with different software programs and introduced me to the other team members on this project. In the first week, I learned to use the Labeler program to annotate images, installed various library packages and software, and ran Python scripts related to track segmentation tasks.

Figure 1: Ardenna's Labeler program to label rail components and trains

After the training, I started working from home and communicated with the team via Slack. I also reported the progress of this project during the Scrum meeting. Scrum is an agile framework that many companies use to produce results faster by breaking a larger development project into smaller pieces. This helps the team achieve results and deliver products in a short period of time. My supervisor was the Scrum master and held the team meeting every day at noon. We discussed obstacles, results, progress, next tasks, and possible solutions for specific problems.


I was assigned to Ardenna's Rail-Inspector project and worked on train segmentation and detecting trains on rails. I spent the first month building ground truth, since labeled images are crucial for any image processing task. Ardenna's Labeler program (Figure 1) is an annotation tool that lets users draw rectangles, polygons, and arbitrary lines over images, outlining each object of interest by defining its X and Y coordinates. These annotations make it possible for machine learning algorithms to learn the location of a specific object within an image. The Labeler program saves the annotation results (e.g., coordinates) in XML files containing the polygons of the various rail components. Figure 1 illustrates how Ardenna's Labeler program was used to draw polygons or bounding boxes around trains, switchpoints, and frogs, and arbitrary lines for rails, ties, and gaps.
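To make the idea of XML annotations concrete, here is a minimal sketch of reading polygon labels and deriving a bounding box for each train. The XML layout (`annotation`, `object`, `polygon`, `pt`) and the function names are assumptions made up for illustration; the actual schema used by the Labeler program is not shown in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation layout; the real Labeler XML schema may differ.
SAMPLE_XML = """
<annotation image="frame_0001.jpg">
  <object label="train">
    <polygon>
      <pt x="10" y="20"/>
      <pt x="110" y="20"/>
      <pt x="110" y="60"/>
      <pt x="10" y="60"/>
    </polygon>
  </object>
  <object label="frog">
    <polygon>
      <pt x="200" y="150"/>
      <pt x="230" y="150"/>
      <pt x="215" y="180"/>
    </polygon>
  </object>
</annotation>
"""

def load_polygons(xml_text):
    """Return a list of (label, [(x, y), ...]) pairs from the annotation text."""
    root = ET.fromstring(xml_text)
    polygons = []
    for obj in root.iter("object"):
        points = [(int(pt.get("x")), int(pt.get("y"))) for pt in obj.iter("pt")]
        polygons.append((obj.get("label"), points))
    return polygons

def bounding_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing a polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

polygons = load_polygons(SAMPLE_XML)
train_boxes = [bounding_box(pts) for label, pts in polygons if label == "train"]
print(train_boxes)
```

In a segmentation pipeline, the same polygon points would typically be rasterized into a per-pixel mask rather than reduced to a box.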

Rail Concepts

Although I was responsible for labeling only trains, it was important to understand various rail components to avoid segmentation errors. This helped me during communication with the team while I was labeling trains. The following rail components were necessary to understand when I was working on this project.

Track on a railway or railroad consists of the rails, fasteners, railroad ties (also called sleepers), and ballast. Figure 2 illustrates various track components.

Figure 2: Track Components

Switchpoints and Frogs
Switches are located at the point where a single track (i.e., 2 rails) becomes a turnout (i.e., 4 rails). Frogs are at the other end, where the turnout becomes 2 separate tracks. Figure 3 illustrates the switches and frogs on the railroad.

Figure 3: Switchpoints and Frogs

Joint Bars and Gaps

Rail joints, or joint bars, are fastened to the ends of two rails to hold them together. The gap is the visible space where the two rail ends meet. Figure 4 illustrates joint bars and gaps on rails.

Figure 4: Joint Bars and Gaps

Work Accomplished

Ardenna's software was already able to detect track and rail components. My task was to enhance the model for train segmentation and detection. The dataset I labeled consisted of 3,952 images. In the following month, I started working on the implementation. I tweaked the Python scripts to extract the labels, wrote SQL queries, used OpenCV to write code for train segmentation, modified the hyperparameters of Rail-Inspector's neural network model, and ran several troubleshooting steps and experiments to optimize the network. Extracting the labels took approximately two and a half hours. The splitting algorithm uses K-fold cross-validation with K=5, so each split uses 80% of the dataset for training and 20% for validation.
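As a rough illustration of why K=5 gives an 80/20 split, the following sketch shuffles a list of image IDs and rotates which fifth is held out for validation. It is not Rail-Inspector's splitting code; the function name and the integer IDs are placeholders.

```python
import random

def k_fold_splits(items, k=5, seed=0):
    """Yield (train, validation) lists; each item is in validation exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items into k folds of nearly equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        yield train, validation

image_ids = list(range(3952))  # stand-in IDs; 3,952 is the dataset size from the post
splits = list(k_fold_splits(image_ids, k=5))
train, validation = splits[0]
print(len(train), len(validation))
```

Each of the five splits holds out a different fifth of the images, so every image is used for validation once and for training four times.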


Later, we fine-tuned a pre-trained ResNet18 model using the dataset we labeled. ResNet18 is a popular convolutional neural network (CNN) architecture, and PyTorch provides pre-trained weights for it. Ardenna's Rail-Inspector uses a fully convolutional network (FCN) for segmentation tasks, which reuses the pre-trained layers of ResNet18. An FCN uses a CNN to transform image pixels into pixel classes. It first uses the CNN to extract image features, then transforms the number of channels into the number of classes via a 1x1 convolutional layer, and finally transforms the height and width of the feature maps back to those of the input image via a transposed convolution. Figure 5 illustrates the model architecture of an FCN. As a result, the model output has the same height and width as the input image, and the channel dimension holds the predicted class scores for the input pixel at the same spatial position.

Figure 5: Example of Fully Convolutional Network

In general, the convolutional and pooling layers of a CNN reduce (i.e., downsample) the input's spatial dimensions (i.e., height and width). In semantic segmentation, which classifies at the pixel level, it is convenient if the spatial dimensions of the input and output are the same. For example, the channel dimension at one output pixel can then hold the classification results for the input pixel at the same spatial position. To achieve this after the spatial dimensions have been reduced by the CNN layers, we can use another type of CNN layer that increases (i.e., upsamples) the spatial dimensions of intermediate feature maps. This is called transposed convolution or deconvolution. In the FCN literature [1], the authors suggest a learnable deconvolution layer for upsampling. Ardenna's Rail-Inspector, however, uses plain interpolation for upsampling, together with batch normalization and dropout layers.
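The downsampling and upsampling arithmetic can be worked through with the standard output-size formulas. The sketch below assumes the strided stages of a standard ResNet18 and an illustrative 224-pixel input; the stride-32 transposed convolution at the end is one common choice, not something taken from Rail-Inspector.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def transposed_conv_out(size, kernel, stride, padding):
    """Output length of a transposed convolution (no output padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel

# Strided stages of a standard ResNet18: (name, kernel, stride, padding).
stages = [
    ("conv1", 7, 2, 3),
    ("maxpool", 3, 2, 1),
    ("layer2", 3, 2, 1),
    ("layer3", 3, 2, 1),
    ("layer4", 3, 2, 1),
]

size = 224  # illustrative input height
for name, kernel, stride, padding in stages:
    size = conv_out(size, kernel, stride, padding)
    print(name, size)

# A single transposed convolution with stride 32 undoes the 32x reduction.
restored = transposed_conv_out(size, kernel=64, stride=32, padding=16)
print("restored", restored)
```

Five stride-2 stages shrink 224 pixels to 7, and the transposed convolution brings the 7 back to 224. Interpolation reaches the same output size without learnable parameters.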

Training Results and Challenges

We trained the model in balanced class-weight mode with a batch size of 8, 15 epochs, a learning rate of 5e-3, and the cross-entropy loss function. After 15 epochs, the model achieved a poor accuracy of 27% for the train detection task on the validation set. We saw three ways to improve the performance: add more training images, oversample the training set, or tweak the class weight parameters. We found that the classes were imbalanced; as it turned out, only 761 images had trains in them. We also had to modify the splitting algorithm, which was designed to keep frogs and switchpoints evenly distributed across splits. We added several conditions for tracks and trains so that the splitting algorithm distinguishes images with both tracks and trains, with tracks but no trains, and with trains but no tracks. We then used five-fold cross-validation and trained the model for 20 epochs, which achieved 94% accuracy for the train detection task on the validation set. However, the accuracy for switchpoints decreased. We visualized the prediction results in TensorBoard using the following command.

tensorboard --logdir=Experiments --port 80
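The modified splitting logic described above, which groups images by whether they contain tracks and trains, might look roughly like the following sketch. It is not the actual Rail-Inspector splitting code: the stratum names, the round-robin assignment, and the synthetic metadata are all assumptions for illustration.

```python
import random
from collections import defaultdict

def stratum(has_track, has_train):
    """Group key built from the track/train presence conditions."""
    if has_track and has_train:
        return "track+train"
    if has_track:
        return "track_only"
    if has_train:
        return "train_only"
    return "neither"

def stratified_folds(images, k=5, seed=0):
    """Assign each image to one of k folds, spreading every stratum evenly."""
    groups = defaultdict(list)
    for image_id, has_track, has_train in images:
        groups[stratum(has_track, has_train)].append(image_id)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for key in sorted(groups):
        members = groups[key]
        rng.shuffle(members)
        # Deal the members of this stratum round-robin across the folds.
        for i, image_id in enumerate(members):
            folds[i % k].append(image_id)
    return folds

# Synthetic metadata: (image_id, has_track, has_train). Not the real dataset.
images = [(i, i % 10 != 0, i % 5 == 0) for i in range(1000)]
folds = stratified_folds(images, k=5)
train_ids = {i for i, _, has_train in images if has_train}
print([sum(1 for i in fold if i in train_ids) for fold in folds])
```

Because each stratum is dealt out separately, every fold receives the same share of train images, so no validation split ends up with almost no trains.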

Figure 6 illustrates the prediction result where the left image is the prediction, and the right is the ground truth. We noticed that shadows of trains on the surface had been picked up. This would be a problem because the train's shadow acts as a false train covering up the tracks.

Figure 6: Prediction result after 20 epochs, at 94% accuracy. Left image: prediction. Right image: ground truth

This was because the model overweighted trains. We solved this problem using oversampling. Finally, we increased the number of epochs to 50 and trained the FCN with ResNet18. The accuracy for train detection improved by 2 percentage points (to 96%), and the model no longer picked up noise or train shadows. Figure 7(a) illustrates the prediction result, and Figure 7(b) shows the confusion matrix, which describes the performance for each rail and track component.
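For readers unfamiliar with oversampling, here is a minimal sketch of random oversampling: minority-class samples are duplicated until every class has as many samples as the largest one. The post does not say exactly how oversampling was configured in Rail-Inspector, so the function and the image-level labels below are assumptions; only the counts (761 of 3,952 images with trains) come from the post.

```python
import random

def oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Draw extra copies with replacement to reach the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend((sample, label) for sample in group + extra)
    rng.shuffle(balanced)
    return balanced

# Image-level counts from the post: 761 of 3,952 images contain a train.
labels = ["train"] * 761 + ["no_train"] * (3952 - 761)
samples = list(range(len(labels)))  # stand-in image IDs
balanced = oversample(samples, labels)
counts = {label: sum(1 for _, l in balanced if l == label) for label in set(labels)}
print(counts)
```

Oversampling is normally applied to the training split only, so that duplicated images never leak into the validation set.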

Figure 7 (a): Prediction result after 50 epochs, at 96% accuracy. Left image: prediction. Right image: ground truth

Figure 7 (b): Performance of train detection and other components of rails


We used an FCN with ResNet18 for the train detection problem and achieved an accuracy of 96%. The overall experience of working as a summer intern at BAR was excellent. I reported my progress daily, reached out to other team members, and asked questions to solve problems. This experience greatly enhanced both my technical and soft skills. I found the work very engaging, and I had the opportunity to work with Jeremy Tavrisov -- a Computer Vision Engineer at BAR, who helped me understand rail concepts and guided me throughout the internship. This was an excellent opportunity, as I learned several new concepts in computer vision and deep learning.


[1] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640–651, 2017.


I am grateful to my advisor Dr. Jian Wu for encouraging me to apply for the summer internship. I am also grateful to the Department of Computer Science at ODU, through which I learned about this summer internship position. I am thankful to have worked with my supervisor Stanton Coffey, Jeremy Tavrisov, and a diverse and multi-disciplinary team at Bihrle Applied Research Inc.

-- Muntabir Choudhury (@TasinChoudhury)