
crabs-exploration

A toolkit for detecting and tracking crabs in the field.

Getting Started

Prerequisites

crabs uses neural networks to detect and track multiple crabs in the field. The detection model is based on the Faster R-CNN architecture. The tracking model is based on the SORT tracking algorithm.

The package supports Python 3.9 or 3.10, and is tested on Linux and macOS.

We highly recommend running crabs on a machine with a dedicated graphics device, such as an NVIDIA GPU or an Apple M1+ chip.

Installation

Users

To install the crabs package, first clone this git repository.

git clone https://github.com/SainsburyWellcomeCentre/crabs-exploration.git

Then, navigate to the root directory of the repository and install the crabs package in a conda environment:

conda create -n crabs-env python=3.10 -y
conda activate crabs-env
pip install .
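
As a quick check that the installation succeeded, confirm that the command-line tools described in the rest of this README are available:

train-detector --help
evaluate-detector --help
detect-and-track-video --help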

Developers

For development, we recommend installing the package in editable mode and with additional dev dependencies:

pip install -e .[dev]  # or ".[dev]" if you are using zsh
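
If you would like to verify the development setup, and assuming the dev dependencies include pytest (check the project configuration to confirm), the test suite can be run from the root of the repository:

pytest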

CrabsField - Sept2023 dataset

We trained the detector model on our CrabsField - Sept2023 dataset. The dataset consists of 53,041 annotations (bounding boxes) over 544 frames extracted from 28 videos of crabs in the field.

The dataset is currently private. If you have access to the GIN repository, you can download the dataset using the GIN CLI tool. To set up the GIN CLI tool:

  1. Create a GIN account.
  2. Download GIN CLI and set it up by running:
    $ gin login
    
    You will be prompted for your GIN username and password.
  3. Confirm that everything is working properly by typing:
    $ gin --version
    

Then to download the dataset, run the following command from the directory you want the data to be in:

gin get SainsburyWellcomeCentre/CrabsField

This command will clone the data repository to the current working directory, and download the large files in the dataset as lightweight placeholder files. To download the content of these placeholder files, run:

gin download --content

Because the large files in the dataset are locked, this command will download the content to the git annex subdirectory, and turn the placeholder files in the working directory into symlinks that point to that content. For more information on how to work with a GIN repository, see the corresponding NIU HowTo guide.
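
Putting the above together, a typical download session might look like the following sketch, assuming the clone is created in a CrabsField subdirectory (the parent directory name is arbitrary and used only for illustration):

mkdir crabs-data && cd crabs-data
gin login                                   # authenticate with your GIN credentials
gin get SainsburyWellcomeCentre/CrabsField  # clone the data repository with placeholder files
cd CrabsField
gin download --content                      # fetch the content of the placeholder files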

Basic commands

Train a detector

To train a detector on an existing dataset, run the following command:

train-detector --dataset_dirs <list-of-dataset-directories>

This command assumes each dataset directory has the following structure:

dataset
|_ frames
|_ annotations
    |_ VIA_JSON_combined_coco_gen.json

The default name assumed for the annotations file is VIA_JSON_combined_coco_gen.json. Other filenames (or full paths to annotation files) can be passed with the --annotation_files command-line argument.

To see the full list of possible arguments to the train-detector command run:

train-detector --help
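
As a hypothetical example, assuming multiple values are passed as space-separated lists, training on two datasets whose annotation files do not use the default name could look like this (all paths are illustrative):

train-detector \
  --dataset_dirs /data/crabs/dataset1 /data/crabs/dataset2 \
  --annotation_files annotations1.json annotations2.json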

Monitor a training job

We use MLflow to monitor the training of the detector and log the hyperparameters used.

To run MLflow, execute the following command from your crabs-env conda environment:

mlflow ui --backend-store-uri file:///<path-to-ml-runs>

Replace <path-to-ml-runs> with the path to the directory where the MLflow output is stored. By default, the output is placed in an ml-runs folder under the directory from which the train-detector command is launched.
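
For example, if training was launched from /home/<user>/crabs-exploration and the default output location was used, the command would be (the path is illustrative):

mlflow ui --backend-store-uri file:///home/<user>/crabs-exploration/ml-runs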

In the MLflow browser-based user interface, you can find the path to the checkpoints directory for any run under the path_to_checkpoints parameter. This will be useful when evaluating the trained model. The model from the end of the training job is saved as last.ckpt in that checkpoints directory.

Evaluate a detector

To evaluate a trained detector on a split of the dataset, run the following command:

evaluate-detector --trained_model_path <path-to-ckpt-file>

This command assumes the trained detector model (a .ckpt checkpoint file) is saved in an MLflow database structure. That is, the checkpoint is assumed to be under a checkpoints directory, which in turn should be under a <mlflow-experiment-hash>/<mlflow-run-hash> directory. This will be the case if the model has been trained using the train-detector command.
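
For illustration, the relevant part of the MLflow output directory will look roughly like this (the hashes are placeholders):

ml-runs
|_ <mlflow-experiment-hash>
    |_ <mlflow-run-hash>
        |_ checkpoints
            |_ last.ckpt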

The evaluate-detector command will print to screen the average precision and average recall of the detector on the validation set by default. To evaluate the model on the test set instead, use the --use_test_set flag.

The command will also log those performance metrics to the MLflow database, along with the hyperparameters of the evaluation job. To visualise the MLflow summary of the evaluation job, run:

mlflow ui --backend-store-uri file:///<path-to-ml-runs>

where <path-to-ml-runs> is the path to the directory where the MLflow output is stored.

The evaluated samples can be inspected visually by exporting them with the --save_frames flag. In this case, the frames with the predicted and ground-truth bounding boxes are saved in a directory called evaluation_output_<timestamp> under the current working directory.

To see the full list of possible arguments to the evaluate-detector command, run it with the --help flag.
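
For example, to evaluate a model on the test set and export the evaluated frames for visual inspection, the command could look like this (the checkpoint path is illustrative):

evaluate-detector \
  --trained_model_path ml-runs/<mlflow-experiment-hash>/<mlflow-run-hash>/checkpoints/last.ckpt \
  --use_test_set \
  --save_frames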

Run detector+tracking on a video

To track crabs in a new video, using a trained detector and a tracker, run the following command:

detect-and-track-video --trained_model_path <path-to-ckpt-file> --video_path <path-to-input-video>

This will produce a tracking_output_<timestamp> directory with the output from tracking under the current working directory. To avoid adding the <timestamp> suffix to the directory name, run the command with the --output_dir_no_timestamp flag. To see the full list of possible arguments to the detect-and-track-video command, run it with the --help flag.

The tracking output consists of:

  • a .csv file named <video-name>_tracks.csv, with the tracked bounding boxes data;
  • if the flag --save_video is added to the command: a video file named <video-name>_tracks.mp4, with the tracked bounding boxes;
  • if the flag --save_frames is added to the command: a subdirectory named <video-name>_frames, with the extracted video frames.

The .csv file with tracked bounding boxes can be imported into movement for further analysis. See the movement documentation for more details.

Note that when using --save_frames, the frames of the video are saved as-is, without added bounding boxes. The aim is to support the visualisation and correction of the predictions using the VGG Image Annotator (VIA) tool. To do so, follow the instructions of the VIA Face track annotation tutorial.

If a file with ground-truth annotations is passed to the command (with the --annotations_file flag), the MOTA metric for evaluating tracking is computed and printed to screen.
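
For example, to track crabs in a video, save a video with the predicted bounding boxes, and compute the MOTA metric against a ground-truth annotations file, the command could look like this (all paths are illustrative):

detect-and-track-video \
  --trained_model_path ml-runs/<mlflow-experiment-hash>/<mlflow-run-hash>/checkpoints/last.ckpt \
  --video_path /data/videos/crab_video.mp4 \
  --save_video \
  --annotations_file /data/annotations/crab_video_ground_truth.csv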

Task-specific guides

For further information on specific tasks, such as launching a training job or evaluating a set of models on the HPC cluster, please see our guides.