My name is Ade. I am a software engineering intern at JAIA Robotics and a final-year Computer Science master's student at Brown University. I have been working at JAIA for two semesters now, and while I dabble in a little bit of everything, I have mainly been applying my AI background to the JaiaBot system.
I wanted to share a concrete example of how open source libraries significantly accelerated development for the JaiaBot.
Goal
We were interested in classifying bottom sediment types during dives using onboard sensors. Specifically, we wanted to distinguish between softer and harder bottom types using only signals available on the vehicle.
Rather than building custom ML infrastructure, we leveraged open source tooling to move quickly from HDF5 dive logs to a deployable model.
Step 1: Data Filtering from HDF5 Logs
JaiaBots log large HDF5 files containing many topics. For ML development we filtered each file down to only the datasets relevant to bottom interaction:
- jaiabot::imu
- jaiabot::pressure_adjusted
- jaiabot::mission_dive
- jaiabot::task_packet;14
Using h5py allowed us to programmatically extract only the needed groups and create smaller filtered files. This dramatically reduced iteration time when running experiments.
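As a rough illustration of this kind of filtering (a sketch under assumptions, not the exact script), the function below copies a fixed set of top-level objects from one HDF5 file into a new, smaller one. The helper name `filter_log` is hypothetical, and treating the four names above as top-level keys of the log file is an assumption about the file layout.

```python
import h5py

# Names to keep, taken from the list above. That they are top-level keys
# in a real log file is an assumption made for this sketch.
KEEP = (
    "jaiabot::imu",
    "jaiabot::pressure_adjusted",
    "jaiabot::mission_dive",
    "jaiabot::task_packet;14",
)


def filter_log(src_path, dst_path, keep=KEEP):
    """Copy only the named top-level objects from src_path into dst_path.

    Returns the list of names that were found and copied.
    """
    copied = []
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        for name in keep:
            if name in src:
                # Group.copy recursively copies a group or dataset,
                # including its attributes, into the destination file.
                src.copy(name, dst, name=name)
                copied.append(name)
    return copied
```

Names missing from a given log are skipped rather than raising, so the returned list shows which topics a file actually contained.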
Step 2: Feature Extraction from Bottom Events
From each dive file we:
- Identified bottom dive events from task packets.
- Located the maximum depth point as the likely bottom impact moment.
- Extracted a time window around that impact.
- Computed physics-motivated features:
  - Peak vertical acceleration
  - Impact duration above threshold
  - Maximum jerk
We used:
- NumPy for vectorized computation
- pandas for time-aligned processing
Because these are mature libraries, the signal processing logic was concise and easy to validate.
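The three features above can be sketched in a few lines of NumPy. This is an illustrative version, not the exact implementation: it assumes depth and vertical acceleration have already been resampled onto one uniform clock (for example with pandas), and the function name `impact_features`, the one-second half window, and the threshold value are all hypothetical choices.

```python
import numpy as np


def impact_features(depth_m, accel_z, sample_hz, half_window_s=1.0, threshold=2.0):
    """Compute impact features in a window centred on the deepest sample.

    depth_m and accel_z are assumed to be equal-length arrays sampled at
    sample_hz. The window size and threshold are illustrative defaults.
    """
    depth_m = np.asarray(depth_m, dtype=float)
    accel_z = np.asarray(accel_z, dtype=float)
    dt = 1.0 / sample_hz

    # Maximum depth is taken as the likely bottom impact moment.
    i_bottom = int(np.argmax(depth_m))
    half = int(round(half_window_s * sample_hz))
    lo = max(i_bottom - half, 0)
    hi = min(i_bottom + half + 1, accel_z.size)
    window = accel_z[lo:hi]

    # Jerk is the time derivative of acceleration (finite differences).
    jerk = np.gradient(window, dt)

    return {
        "peak_accel": float(np.max(np.abs(window))),
        # Total time the acceleration magnitude spends above the threshold.
        "impact_duration_s": float(np.count_nonzero(np.abs(window) > threshold) * dt),
        "max_jerk": float(np.max(np.abs(jerk))),
    }
```

Because every feature is a plain reduction over a short window, each one can be checked by hand against a synthetic spike before trusting it on field data.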
Step 3: Unsupervised Structure Discovery
Instead of immediately training a supervised model, we first explored whether the data naturally clustered.
Using scikit-learn:
- StandardScaler for normalization
- KMeans for clustering
- F1 scoring to compare clusters against ascent type
We searched over combinations of features to find the smallest set that best separated two clusters.
This gave us two emergent groups that correlated strongly with powered versus unpowered ascent behavior. We treated ascent type as a proxy for soft versus hard bottom interaction, because a bot on a soft bottom usually has to wiggle loose, which triggers a powered ascent.
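A minimal sketch of such a feature-subset search is shown below, assuming a feature matrix `X` with one row per dive and a binary `reference` array where 1 means powered ascent. The function name `best_feature_subset` and the tie-breaking rule are assumptions for illustration. Subsets are tried in order of increasing size and only a strictly better F1 replaces the current best, so ties go to the smaller subset.

```python
from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import f1_score
from sklearn.preprocessing import StandardScaler


def best_feature_subset(X, reference, names, seed=0):
    """Return (f1, subset_names, labels) for the best two-cluster split."""
    X = np.asarray(X, dtype=float)
    reference = np.asarray(reference, dtype=int)
    best = (-1.0, (), None)

    for k in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), k):
            scaled = StandardScaler().fit_transform(X[:, list(cols)])
            labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(scaled)

            # Cluster ids are arbitrary, so keep whichever labeling
            # agrees better with the reference.
            flipped = 1 - labels
            if f1_score(reference, flipped) > f1_score(reference, labels):
                labels = flipped
            score = f1_score(reference, labels)

            if score > best[0]:
                best = (score, tuple(names[c] for c in cols), labels)
    return best
```

The search is exhaustive, which is only practical for a handful of candidate features. With many more, a greedy or forward-selection variant would be the usual substitute.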
Step 4: Training a Lightweight Classifier
Once clusters were stable, we trained supervised classifiers to reproduce the cluster assignments:
- RandomForest
- GradientBoosting
- LogisticRegression
- SVM
Cross validation was handled entirely by scikit-learn utilities.
The best model was wrapped in a pipeline with scaling and exported using joblib. This produced a small serialized artifact that can be integrated into the Jaia software stack.
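The compare-then-export flow can be sketched as follows. This is an illustration under assumptions rather than the actual training script: the helper name `train_and_export`, the 5-fold split, the F1 scoring choice, and the default hyperparameters are all hypothetical.

```python
import joblib
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train_and_export(X, y, out_path, seed=0):
    """Cross-validate four classifiers, refit the best one, and save it.

    y holds the binary cluster assignments used as training labels.
    Returns (best_name, scores), where scores maps name to mean F1.
    """
    candidates = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "svm": SVC(random_state=seed),
    }

    scores = {}
    for name, clf in candidates.items():
        # Scaling lives inside the pipeline so each fold is scaled
        # using statistics from its own training split only.
        pipe = make_pipeline(StandardScaler(), clf)
        scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="f1").mean()

    best_name = max(scores, key=scores.get)
    best = make_pipeline(StandardScaler(), candidates[best_name]).fit(X, y)

    # The saved artifact contains both the scaler and the classifier.
    joblib.dump(best, out_path)
    return best_name, scores
```

Loading the artifact with `joblib.load(out_path)` returns the fitted pipeline, so the consumer calls `predict` on raw features without re-implementing the scaling. The loading environment needs a compatible scikit-learn version, which is worth checking early on the target hardware.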
Why Open Source Made This Fast
Key advantages:
- No custom clustering implementation required
- Built-in cross-validation and metrics
- Easy feature scaling and pipeline composition
- Clean model serialization for deployment
- Well-tested numerical stability
Instead of debugging algorithms, we focused on domain questions:
- Is the impact window correctly defined?
- Are the features physically meaningful?
- Are clusters consistent across dives?
The open source ecosystem allowed us to move from raw dive logs to a reproducible model quickly and confidently.
Lessons for Marine Robotics Teams
- Start with unsupervised learning when labels are scarce.
- Use physics-informed features before deep models.
- Lean heavily on mature open source libraries.
- Serialize early and test deployment constraints early.
For small marine robotics teams, this approach avoids building infrastructure that already exists and lets you spend time where it matters most, which is on domain understanding and field validation.
If there is interest, I am happy to share more about:
- Handling noisy underwater IMU data
- Validating clusters across missions
- Transitioning from clustering to fully supervised models
- Constraints when deploying ML onboard embedded systems
I would love to hear how others are using open source ML in marine field systems.