Segmentation Analysis for Benthic Habitat Types and Coverage Using ROV Images

Matthew Liang - University of Washington

OCEAN 462 - Winter 2026

Scientific Context & Project Background

Nearshore benthic habitats are critical components of coastal ecosystems. The composition and distribution of seafloor features such as macroalgae, sediment, shell hash, rock, and other biological or anthropogenic materials influence biodiversity, productivity, and habitat quality, while also reflecting environmental change and ecosystem health. Because these habitats integrate both biological and physical processes, mapping their distribution can provide insight into coastal ecological health and the factors that structure nearshore communities.

At the same time, benthic habitats are difficult to quantify efficiently. ROVs and other imaging platforms can collect large volumes of underwater video and still imagery, but manual analysis of these data is slow, inconsistent, and difficult to scale. These challenges motivate the use of computer vision methods that can convert raw underwater imagery into quantitative ecological information.

This project, Segmentation Analysis for Benthic Habitat Types and Coverage Using ROV Images, explores segmentation-based approaches for benthic habitat analysis using two related datasets: Seattle Aquarium imagery and BlueROV video collected near the Seattle Waterfront. By using segmentation to estimate habitat coverage and identify recurring seafloor patterns, this project aims to study how benthic habitat types and coverage differ with proximity to urban infrastructure. In doing so, it hopes to make underwater imagery more useful for ecological monitoring and for future studies of urban nearshore benthic environments.


Dataset Description

This project uses two related image datasets to explore segmentation methods for benthic habitat analysis. The first dataset contains imagery captured by a GoPro camera mounted on an ROV by the Seattle Aquarium, originally collected for its Urban Kelp Research Project. The second contains underwater imagery collected by a BlueROV owned by the University of Washington Applied Physics Laboratory and operated by the project group.

Dataset 1: Aquarium Images

This dataset is a series of images captured by a downward-facing GoPro camera over a full 30-meter transect in Elliott Bay, along the Seattle Waterfront, in October 2024. The ROV flew the transect with the camera capturing an image every 3 seconds, producing a dataset of about 300 images. Red and green seaweeds are the most common visual features, set against sand and gravel seafloor. The dataset was used to train and evaluate the instance segmentation model that detects and classifies seaweed types.


Image 1: a sample aquarium image from this dataset showing red and green seaweeds as the dominant features

  • Source: Randell, Z. et al., Urban Kelp Research Project, Seattle Aquarium, March 2025. Images available at: https://www.dropbox.com/scl/fo/c6bw7jn0akgdu0wrshwnm/AM96nKZ9ApqJCPLfCwaqYsE?rlkey=9ej935k0ay349pj5w6v1er2qb&e=1&dl=0
  • Location: Elliott Bay
  • Image Background: a full 30 m transect, with a downward-facing GoPro 12 shooting 27.3 MP .GPR photos every 3 seconds
  • Primary Classes:
    • Green seaweed
    • Red seaweed
    • Background
  • Dataset Size: ~300 raw images
  • Preprocessing:
    • Auto-Orient: Applied
    • Resize: Stretch to 1024x1024
  • Augmentation Methods:
    • Saturation: Between -34% and +34%
    • Brightness: Between -24% and +24%
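The saturation and brightness augmentations listed above amount to random scaling of the S and V channels of an HSV image. The sketch below is illustrative only (the function names and the assumption of float HSV arrays in [0, 1] are mine, not the actual augmentation pipeline), but it uses the same ±34% and ±24% ranges:

```python
import numpy as np

def jitter_hsv(hsv, sat_shift, val_shift):
    """Scale the S and V channels of an HSV image by the given fractions.

    hsv: float array of shape (H, W, 3), channels in [0, 1].
    sat_shift / val_shift: e.g. +0.34 raises saturation by 34%.
    """
    out = hsv.copy()
    out[..., 1] = np.clip(out[..., 1] * (1.0 + sat_shift), 0.0, 1.0)
    out[..., 2] = np.clip(out[..., 2] * (1.0 + val_shift), 0.0, 1.0)
    return out

def random_augment(hsv, rng, sat_range=0.34, val_range=0.24):
    """Draw random saturation (±34%) and brightness (±24%) shifts."""
    s = rng.uniform(-sat_range, sat_range)
    v = rng.uniform(-val_range, val_range)
    return jitter_hsv(hsv, s, v)
```

Hue is deliberately left untouched, so the augmentation varies lighting-like properties without changing the red/green color identity the classifier depends on.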

Dataset 2: BlueROV Videos

This dataset consists of underwater imagery collected using a BlueROV operated by the project team near the Seattle Waterfront on February 20, 2026, inside a 20 × 5 m rectangular survey area. We conducted 5 transects perpendicular to the shore, producing about 40 minutes of video from the downward-facing camera mounted on the ROV. Frames from transects 2 and 3, which together capture one complete back-and-forth pass of the survey area, were then extracted from the video for further analysis. The images capture the composition of nearshore benthic habitats, including sediment, shells, rocks, debris, and biological material.


Image 2: a sample frame extracted from the BlueROV video in this dataset, showing a muddy seafloor with white shell hash and biological material, as well as some rocks and anthropogenic debris

  • Source: BlueROV underwater deployment
  • Location: Seattle nearshore environment
  • Image Background: Video frames extracted from ROV footage in a 20 × 5 m rectangular area
  • Preprocessing:
    • Adjusted brightness and contrast on the raw videos using Microsoft Clipchamp
    • Sped the footage up 4× to reduce the computational load
  • Dataset Size: ~40 min of raw video, selected and preprocessed into two 30 s videos
  • Primary Benthic Features Observed:
    • mud / sediment
    • shell hash
    • rock fragments
    • biological material
    • anthropogenic debris
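The 4× speed-up described in the preprocessing step is equivalent to keeping every fourth frame. A minimal sketch of that index selection (the function name is illustrative, not part of the actual pipeline):

```python
def kept_frame_indices(n_frames, speed_factor=4):
    """Indices of the frames retained when footage is sped up by speed_factor.

    Keeping every speed_factor-th frame cuts the data volume by the same
    factor while preserving spatial coverage along the transect.
    """
    return list(range(0, n_frames, speed_factor))
```

For roughly 40 minutes of 30 fps footage (~72,000 frames), a 4× reduction leaves ~18,000 frames before the two 30 s clips are selected.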

Unlike the aquarium dataset, these images contain complex natural substrates rather than clearly separated biological objects. Because detailed pixel-level annotations were not available, this dataset was used primarily for unsupervised segmentation analysis.


Model Selection & Description: Instance Segmentation

To analyze biological features in Dataset 1 (Aquarium Images), I implemented a supervised instance segmentation model to identify and classify seaweed types in underwater imagery.

Instance segmentation was selected instead of traditional object detection because the goal of this project is not only to detect the presence of seaweeds, but also to estimate their spatial coverage and structure within the image. Instance segmentation produces pixel-level masks for each detected object, allowing more detailed ecological interpretation than bounding boxes alone.
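Those pixel-level masks are what make coverage estimation possible: per-class coverage is just a pixel count over the union of each class's masks. A minimal numpy sketch, assuming each detected instance arrives as a boolean mask with a class label (function and variable names are illustrative):

```python
import numpy as np

def class_coverage(masks, labels, n_classes, image_shape):
    """Percent of image pixels covered by each class.

    masks: list of boolean arrays of shape image_shape, one per instance.
    labels: class index for each mask.
    Overlapping instances of the same class are merged with a logical OR
    so shared pixels are not double-counted.
    """
    total = image_shape[0] * image_shape[1]
    coverage = np.zeros(n_classes)
    for c in range(n_classes):
        union = np.zeros(image_shape, dtype=bool)
        for mask, label in zip(masks, labels):
            if label == c:
                union |= mask
        coverage[c] = 100.0 * union.sum() / total
    return coverage
```

Bounding boxes would overstate coverage for the irregular, frond-like shapes in these images, which is why the mask-based count is preferred.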

This project annotated 108 images with two classes:

  • Green seaweed
  • Red seaweed

All remaining pixels in each image were treated as background.

The annotated dataset was then split into training and validation sets:

  • Training set: 95%
  • Validation set: 5%

The annotated dataset was used to train an instance segmentation model with the Ultralytics YOLOv11 segmentation framework for 100 epochs.


Model Performance Evaluation for Instance Segmentation

F1–Confidence Curve


Image 3: F1-Confidence curve for the instance segmentation model, showing the overall performance of the model and the optimal confidence level

The F1–confidence curve illustrates the relationship between prediction confidence thresholds and model performance. The model achieves a peak F1 score of approximately 0.63 at a confidence threshold of 0.55. This F1 score is not ideal, but it is still a meaningful result given the complex lighting and contrast, irregular shapes, visual similarity between seaweeds and other vegetation, and the relatively small sample size.

The smoother curve for green seaweed indicates more stable model behavior on that class, likely because green seaweed is more abundant in the dataset.

Confusion Matrix


Image 4: Normalized confusion matrix for the instance segmentation model, showing the model's performance on the two classes and where it produces false positives and false negatives

The normalized confusion matrix summarizes the classification accuracy across the two classes and background. Overall, the model demonstrates a reasonable ability to distinguish between green seaweed and red seaweed. Green seaweed shows the strongest classification performance, with the highest proportion of correct predictions, and red seaweed is also detected relatively reliably. However, a small portion of seaweed pixels are still missed and classified as background, producing some false negatives.

There is also confusion between seaweed and background regions, likely due to similarities in color and lighting in underwater imagery. The model assigns a substantial number of background areas to one of the seaweed classes, producing many false positives.

Validation Predictions


Image 5: Validation predictions from the instance segmentation model, showing where the model predicts the two seaweed classes

Predictions on the validation set show the model detecting both red and green seaweed regions in the aquarium images. The predicted masks generally follow seaweed boundaries, but some false positives remain where the model labels background as seaweed, indicating that it has not fully learned the background conditions. This is expected given the relatively small training dataset and the complicated structure of the benthic habitats.

Overall Performance

These evaluation metrics suggest that the model has learned the major visual differences between the two seaweed classes while still struggling to separate complex textures from the surrounding environment. False negatives are few, false positives are comparatively common, and the model rarely confuses red with green seaweed. Overall, the instance segmentation model performs reliably in identifying seaweed classes within the aquarium imagery, though there is clear room for improvement.


Complementary Model: K-Means Segmentation

In addition to the supervised instance segmentation model, this project also explores an unsupervised approach using K-means segmentation. This method was applied primarily to the BlueROV imagery, where the seafloor contains complex substrates rather than clearly defined biological objects, making the videos especially hard to annotate or divide into classes. K-means groups pixels into a fixed number of clusters based on color similarity, without requiring any labeled data. Each cluster therefore represents a dominant visual region within the image, allowing the model to approximate different substrate types or benthic features. By calculating the relative pixel coverage of each cluster, the method provides a rough estimate of habitat composition within an image.

While the clusters may not always correspond directly to specific benthic features, visual comparison between the clustered images and the original imagery allows interpretation of which clusters represent features such as seaweed, sediment, shells, or rock surfaces. This approach provides a simpler but still effective way to summarize benthic composition in images where producing labeled training data is not practical.

K-Means on Aquarium Images: Testing the Method


Images 6, 7, & 8: A comparison of the original image, the K-means segmented image, and the color legend, showing how different colors and saturations in the original image (Image 6) are segmented into the clustered image (Image 7)

This model uses 6 clusters, and each cluster (or pair of similar clusters) distinguishes a specific benthic feature. For example, the red and purple clusters 5 and 2 mostly represent non-biological features such as sand, rocks, and brick, while cluster 4 mostly represents red seaweeds and cluster 1 the green seaweeds. Compared with the supervised instance segmentation model, the K-means model is significantly easier and more efficient to set up and run, while still giving a good overall picture of the distribution of benthic habitat features.

However, the K-means model has weaknesses as well. Because it clusters pixels by color similarity in the HSV color space, it is easily affected by lighting, natural color variation within a species, and complicated sediment distributions. It is also inconsistent across the transects: the model may group the same feature into different clusters in different images as lighting, contrast, and brightness vary.
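The clustering step can be sketched with scikit-learn's KMeans on HSV pixel values. This is a minimal illustration rather than the project's exact pipeline (matplotlib's rgb_to_hsv stands in for whatever color conversion was actually used, and the function name is mine):

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv
from sklearn.cluster import KMeans

def kmeans_segment(rgb, n_clusters=6, seed=0):
    """Cluster pixels by HSV color; return (label map, coverage fractions).

    rgb: float array of shape (H, W, 3) with values in [0, 1].
    The coverage vector gives each cluster's share of the image's pixels,
    which is the rough habitat-composition estimate described above.
    """
    hsv = rgb_to_hsv(rgb)
    pixels = hsv.reshape(-1, 3)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(pixels).reshape(rgb.shape[:2])
    coverage = np.bincount(labels.ravel(), minlength=n_clusters) / labels.size
    return labels, coverage
```

Because K-means is re-fit per image, cluster indices are arbitrary, which is one source of the frame-to-frame inconsistency noted above; fixing the centroids from a reference frame is one way to stabilize them.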

K-Means on BlueROV Videos: Habitat Distribution Over Transect Analysis

This project then applies the same K-means segmentation model to the BlueROV videos. Unlike the aquarium dataset, these videos mostly show bare seafloor habitats, including mud, rocks, shells, logs, and small biogenic debris, with very few visible seaweeds or kelp.


Images 9, 10, & 11: A set of images from Dataset 2 (BlueROV videos), using an example from transect 3, comparing the original image, the K-means segmented image, and the color legend, showing how different colors and saturations in the original image (Image 9) are segmented into the clustered image (Image 10)

From the clustered images, several dominant benthic features can be interpreted from the color groups:

  • Cluster 1: shells
  • Cluster 2: rocks
  • Cluster 4: mud or fine sediment
  • Clusters 0 and 3: small white biogenic debris or mixed materials

Although these are rough interpretations based on comparing the original and clustered images, they still provide a useful way to estimate relative habitat coverage across the transects.

[Figure: per-cluster pixel coverage along Transect 3]

Analyzing how different clusters vary along the ROV transects reveals spatial patterns in the benthic environment. In Transect 3, which runs from deeper offshore areas toward shallower water closer to the pier:

  • shell clusters show localized high concentrations at certain points,
  • rock clusters increase closer to shore,
  • clusters associated with small biogenic debris decrease toward shallower areas.

These patterns suggest that the benthic habitat composition changes gradually along the transect, potentially reflecting differences in depth, sediment transport, and proximity to urban infrastructure.
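The per-transect trends above come from tracking each cluster's pixel fraction frame by frame. A minimal sketch, assuming each frame has already been reduced to an integer label map by the clustering step (the function name is illustrative):

```python
import numpy as np

def coverage_series(label_maps, n_clusters):
    """Pixel fraction of every cluster in each frame.

    label_maps: list of integer arrays (one clustered frame each).
    Returns an array of shape (n_frames, n_clusters) whose rows sum to 1;
    plotting a column against frame index shows how that cluster's
    coverage changes along the transect.
    """
    series = np.empty((len(label_maps), n_clusters))
    for i, labels in enumerate(label_maps):
        series[i] = np.bincount(labels.ravel(), minlength=n_clusters) / labels.size
    return series
```

Since frame index maps roughly to position along the transect, the columns of this array are the spatial coverage profiles discussed above.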

Overall, the K-means segmentation model provides a quick and computationally efficient way to obtain a broad overview of seafloor habitat composition from the ROV videos. However, similar to the aquarium K-means model, the clustering results are easily impacted by lighting conditions, contrast, and saturation variations between frames. These factors can cause the same habitat type to appear in different clusters across images, which affects our ability to analyze the changing percentage of each habitat type over the course of the transect.


Future Work & Application

Future Improvements

First, increasing the size and diversity of the annotated dataset. This would likely improve the supervised instance segmentation model. The current training set contains a limited number of labeled aquarium images, which restricts the model's ability to generalize to more complex environments. Expanding the dataset with additional annotations covering more varied lighting conditions, camera angles, and habitat types would improve model performance.

Second, implementing additional preprocessing techniques. This could help reduce the impact of lighting variability in underwater videos. Adjustments such as improved color correction could reduce inconsistencies between frames and improve both supervised and unsupervised segmentation results.
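One simple color-correction option, offered here purely as an illustration of the idea rather than the project's planned method, is gray-world white balancing, which rescales each channel so the frame's mean color becomes neutral and reduces the blue-green cast common in underwater footage:

```python
import numpy as np

def gray_world(rgb):
    """Gray-world white balance.

    rgb: float array of shape (H, W, 3) in [0, 1].
    Each channel is scaled so its mean matches the overall mean
    intensity, making the average color of the frame neutral gray.
    """
    means = rgb.reshape(-1, 3).mean(axis=0)
    target = means.mean()
    out = rgb * (target / np.maximum(means, 1e-8))
    return np.clip(out, 0.0, 1.0)
```

Applying a consistent correction like this before clustering would help keep the same habitat type in the same cluster across frames.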

Finally, integrating both the supervised instance segmentation and the unsupervised K-means segmentation approaches into a single analysis pipeline. Instance segmentation could detect specific biological organisms such as seaweeds, while clustering could estimate the composition and coverage of the remaining image area. Together, these methods are expected to better support benthic habitat monitoring.

Application of the Model

One application of this project is the automated analysis of benthic habitat composition along nearshore transects using ROV imagery. Coastal habitats near urban environments often experience changes in substrate type, biological diversity, and debris distribution due to human infrastructure.

Using the segmentation models and concepts developed in this project, underwater images from ROV surveys could be processed automatically to estimate the relative coverage of different benthic features across space. This would help address the question of how benthic habitat types and coverage differ with proximity to urban infrastructure. Areas closer to urban infrastructure are expected to show different habitat composition compared with areas of less human activity. If combined with other observations such as water quality or seasonal data, this project could provide a useful method for studying changes in benthic habitat and for understanding the potential factors impacting our coastal environment.


Disclaimer

This computer vision project is part of a mentored research project in the Ocean Technology Studio, mentored by engineer Aaron Marburg of the UW Applied Physics Laboratory. The larger project aims to use an ROV to study various environments around Seattle, including comparisons between fresh and salt water and between urban and natural areas. Dataset 2 (BlueROV videos) comes directly from a deployment of an ROV used by the project team. Dr. Zach Randell of the Seattle Aquarium helped set up the seafloor survey area at the bottom of the Seattle Waterfront for the deployment and provided access to Dataset 1 (Aquarium images).

This project is ongoing. We aim to conduct more deployments in the future, both at the same locations in a different season and at new locations with varying proximity to urban infrastructure, to study how human activities may impact coastal benthic environments differently across the area and over time.
