Roessner Restoration Initiative · Stewardship · Sovereignty · Regeneration

Conservation tech · multiple hotspots

WhaleID

A computer vision pipeline for re-identifying individual humpback whales — built with local partners across a network of humpback hotspots.

Indian, Pacific & Atlantic basins · Wikidata Q139583604 · whaleid.org

Built with Dr. Caine Delacy and a network of local partners working across humpback whale hotspots in the Indian, Pacific, and Atlantic basins. The thesis is plain: effective protection starts with the ability to track whales as individuals, not as a population in aggregate.

The pipeline is computer vision built around an embedding model and similarity search. Re-identification reduces to an approximate nearest-neighbor lookup against the catalog index, with sub-millisecond search latency. Inputs are full-body underwater photographs (head tubercle patterns, dorsal ridges, pectoral scarring, fluke geometry, peduncle texture), not isolated fluke patches. Cross-population matches are ranked by a multiplicative fusion score and tiered into four outcomes: auto-merge, known individual, needs review, or new individual.
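As a rough illustration of the lookup-and-tier idea described above, here is a minimal Python sketch. It is not WhaleID's code: the catalog IDs, the three-dimensional toy embeddings, and the tier thresholds are all hypothetical, and the brute-force scan is linear in catalog size, whereas the model stack on this page lists pgvector with an HNSW index for this step.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_matches(query, catalog, k=3):
    """Brute-force scan: score every catalog entry, return the k best."""
    scored = [(cosine(query, emb), whale_id) for whale_id, emb in catalog.items()]
    scored.sort(reverse=True)
    return scored[:k]

# Hypothetical thresholds, chosen only for illustration.
TIERS = [
    (0.90, "auto-merge"),
    (0.75, "known individual"),
    (0.50, "needs review"),
]

def tier(score):
    """Map a match score to one of the four outcomes named on this page."""
    for threshold, label in TIERS:
        if score >= threshold:
            return label
    return "new individual"

# Toy catalog with made-up IDs and embeddings.
catalog = {
    "W001": [1.0, 0.0, 0.0],
    "W002": [0.0, 1.0, 0.0],
    "W003": [0.6, 0.8, 0.0],
}
query = [0.0, 0.9, 0.1]

best_score, best_id = top_matches(query, catalog, k=1)[0]
print(best_id, round(best_score, 3), tier(best_score))
```

Scores that fall below every threshold land in "new individual", so a weak match never merges into an existing catalog entry by default.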

The pipeline runs five stages: detection (Roboflow YOLO), segmentation (SAM2 Hiera Large), orientation normalization (OpenCV PCA), multi-model feature extraction (MIEW-ID v3, MegaDescriptor-L, ALIKED + LightGlue, NeuralWhale shape and texture descriptors), and a conservative decisioning layer that prioritizes catalog integrity over false confidence. Every stage produces auditable artifacts — masks, overlays, keypoint visualizations — so an expert reviewer can see exactly how a match was made.
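The orientation-normalization stage can be pictured with a small sketch. The page names OpenCV PCA. The version below swaps in the closed-form 2-D principal-axis formula in plain Python so it runs without dependencies, and the function names and toy point set are assumptions for illustration only.

```python
import math

def centroid(points):
    """Mean (x, y) of a list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def principal_angle(points):
    """Angle in radians of the first principal axis of 2-D points."""
    mx, my = centroid(points)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed form for the leading eigenvector of a 2x2 covariance matrix.
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)

def normalize_orientation(points):
    """Rotate points about their centroid so the principal axis is horizontal."""
    theta = principal_angle(points)
    mx, my = centroid(points)
    c, s = math.cos(-theta), math.sin(-theta)
    return [
        ((x - mx) * c - (y - my) * s, (x - mx) * s + (y - my) * c)
        for x, y in points
    ]

# Toy "mask": pixel coordinates lying along a 45-degree diagonal.
mask_points = [(float(i), float(i)) for i in range(10)]
print(round(math.degrees(principal_angle(mask_points)), 1))
rotated = normalize_orientation(mask_points)
print(max(abs(y) for _, y in rotated) < 1e-9)
```

A principal axis has no direction, so this alone cannot tell head from tail. Resolving that 180-degree ambiguity would need an extra cue, which this sketch does not attempt.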

The architecture is meant to run locally. Field biologists, conservation NGOs, and community monitors can upload imagery, receive identifications, and contribute to a shared catalog without cloud infrastructure, proprietary tools, or outside control over the data.

The model is not the contribution. The contribution is conservation infrastructure for longitudinal study at population scale across multiple hotspots simultaneously. A co-authored paper on the methodology is expected in 2026.

Body regions detected: 6
Networks in the model stack: 7
Pipeline stages: 5
Catalog search latency: <1 ms

Six body regions

Most whale photos aren't tail shots. They're dorsal views from boats, lateral shots from shore, or head-on encounters. WhaleID extracts identity from whatever body parts are visible.

| Region | What it identifies |
| --- | --- |
| Head | Tubercle patterns (the bumpy nodules on the rostrum) |
| Pectoral fin | Scarring, white patches, trailing-edge geometry |
| Body (lateral) | Scar patterns, pigmentation markings |
| Dorsal fin | Shape, nicks, trailing edge |
| Fluke | Trailing edges, pigmentation, distinctive notches |
| Caudal peduncle | Skin texture, scarring near the tail base |

Humpback Departure · Bazaruto Archipelago, Mozambique

The model stack

Detection runs first; per-region feature extractors run in parallel; multiplicative fusion produces a single identity score.

| Component | Technology |
| --- | --- |
| Body-part detection | Gemini 2.5 Pro · 4 regions per photo · ~20s |
| Cross-view network | DINOv2 · 768-dim |
| Identity network | MIEW-ID · 2,152-dim wildlife re-identification |
| Feature network | MegaDescriptor · 1,536-dim Swin-L |
| Custom extractors | TubercleMap · FlukeID · PectoralPattern · WhitePatch · DorsalFinprint |
| Keypoint matching | LightGlue (ALIKED descriptors) · NeuralWhale |
| Vector search | pgvector + HNSW · sub-millisecond |
| Decision | Atlas-first body-part matching with multiplicative fusion (8 tiers) |

Field Capture — W004 · Bazaruto Archipelago, Mozambique
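The flow described in this section (per-region extractors running in parallel, then multiplicative fusion into one score) might look roughly like the sketch below. The page does not publish the fusion formula, so the geometric mean used here is an assumption: it is one product-based rule that stays comparable when different photos show different numbers of regions. The scorer functions, region keys, and fixed similarity values are hypothetical stand-ins.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def score_regions(scorers, visible):
    """Run one scorer per visible region in parallel threads.

    scorers: region name -> function(crop) returning a similarity in [0, 1]
    visible: region name -> crop for each region found in the photo
    """
    with ThreadPoolExecutor() as pool:
        futures = {
            region: pool.submit(scorers[region], crop)
            for region, crop in visible.items()
            if region in scorers
        }
        return {region: f.result() for region, f in futures.items()}

def fuse(scores):
    """Geometric mean of per-region similarities (assumed fusion rule)."""
    if not scores or any(s <= 0.0 for s in scores.values()):
        return 0.0
    log_sum = sum(math.log(s) for s in scores.values())
    return math.exp(log_sum / len(scores))

# Hypothetical scorers that return fixed similarities for the demo.
scorers = {
    "head": lambda crop: 0.95,
    "fluke": lambda crop: 0.90,
    "dorsal_fin": lambda crop: 0.80,
}

# Only two regions are visible in this made-up photo.
visible = {"fluke": "fluke-crop", "dorsal_fin": "dorsal-crop"}

scores = score_regions(scorers, visible)
print(sorted(scores), round(fuse(scores), 3))
```

Because the score is a product, a single region that disagrees strongly pulls the fused score down, and a zero from any region vetoes the match. That behavior fits the conservative decisioning this page describes, though the real weighting may differ.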