Merge pull request #8 from jaspersiebring/cli_rework
Refactored monolithic CLI to more dedicated Typer commands with input validation
jaspersiebring authored Sep 21, 2023
2 parents e05edbb + 572c43c commit 9f508a0
Showing 12 changed files with 495 additions and 288 deletions.
149 changes: 83 additions & 66 deletions README.md
@@ -11,7 +11,7 @@ Built with `Pydantic` and `pycocotools`, it features a complete implementation o
# Key features
- User-friendly: GeoCOCO is designed for ease of use, requiring minimal configuration and domain knowledge
- Version Control: Datasets created with GeoCOCO are versioned and designed for expansion with future annotations
- Command-line Tool: Use GeoCOCO from your terminal for quick conversions
- Command-line Tool: Use GeoCOCO from your terminal to create, append and copy COCO datasets
- Python Module: Integrate GeoCOCO in your own data applications with the `geococo` package
- Representation: GeoCOCO maximizes label representation through an adaptive moving window approach
- COCO Standard: Output datasets are fully compatible with other COCO-accepting applications
@@ -29,67 +29,83 @@ pip install geococo
After installing `geococo`, there are a number of ways you can interact with its API.

#### Command line interface
The easiest way to use `geococo` is to simply call it from your preferred terminal. You can use the tool entirely from your terminal by providing paths to your input data and the desired output image sizes like this.

````
# Example with local data and non-existent JSON file
geococo image.tif labels.shp coco_folder dataset.json 512 512
Creating new dataset..
Dataset version: 0.1.0
Dataset description: Test dataset
Dataset contributor: User
Dataset date: 2023-09-05 18:12:31.435591
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234/234 [00:04<00:00, 50.36it/s]
The easiest way to use `geococo` is to simply call it from your preferred terminal with one of three commands: `new`, `add` and `copy`.

````
$ geococo --help
Usage: geococo [OPTIONS] COMMAND [ARGS]...
Transform your GIS annotations into COCO datasets.
Options:
--help Show this message and exit.
Commands:
add Transform and add GIS annotations to an existing CocoDataset
copy Copy and (optionally) update the metadata of an existing CocoDataset
new Initialize a new CocoDataset with user-prompted metadata
````

Starting a new CocoDataset (will prompt user for metadata like Description and Contributor)

````
$ geococo new dataset.json
````
For more information on the different options, call `geococo` with `--help`

Adding new annotations to existing CocoDataset (will increment version based on input data and update any new categories/images)
````
$ geococo add image.tif labels.shp dataset.json images/ 512 512 --id-attribute ids
````
geococo --help
Usage: cli.py [OPTIONS] IMAGE_PATH LABELS_PATH JSON_PATH OUTPUT_DIR WIDTH HEIGHT

Transform your GIS annotations into a COCO dataset.
For more information on the different commands, call `geococo COMMAND` with `--help`.
````
$ geococo add --help
Usage: geococo add [OPTIONS] IMAGE_PATH LABELS_PATH JSON_PATH OUTPUT_DIR WIDTH
HEIGHT
Transform and add GIS annotations to an existing COCO dataset.
This method generates a COCO dataset by moving across the given image
(image_path) with a moving window (image_size), constantly checking for
intersecting annotations (labels_path) that represent image objects in said
image (e.g. buildings in satellite imagery; denoted by category_attribute).
Each valid intersection will add n Annotations entries to the dataset
(json_path) and save a subset of the input image that contained these entries
(output_dir).
(image_path) with a moving window (width, height), constantly checking for
intersecting annotations (labels_path) that represent image objects in said
image (e.g. buildings in satellite imagery; denoted by (super)category name
and/or id). Each valid intersection will add n Annotations entries to the
dataset (json_path) and save a subset of the input image that contained
these entries (output_dir).
The output data size depends on your input labels, as the moving window
adjusts its step size to accommodate the average annotation size, optimizing
dataset representation and minimizing tool configuration.
adjusts its step size to accommodate the average annotation size, optimizing
dataset representation and minimizing tool configuration. Each addition will
also increment the dataset version: patch if using the same image_path,
minor if using a new image_path, and major if using a new output_dir.
Arguments:
IMAGE_PATH Path to the geospatial image containing image
objects (e.g. buildings in satellite imagery)
[required]
LABELS_PATH Path to the annotations representing these image
objects (='category_id') [required]
JSON_PATH Path to the json file that will store the COCO
dataset (will be appended to if already exists)
[required]
OUTPUT_DIR Path to the output directory for image subsets
[required]
WIDTH Width of the output images [required]
HEIGHT Height of the output images [required]
IMAGE_PATH Path to geospatial image containing image objects [required]
LABELS_PATH Path to vector file containing annotated image objects
[required]
JSON_PATH Path to json file containing the COCO dataset [required]
OUTPUT_DIR Path to output directory for image subsets [required]
WIDTH Width of image subsets [required]
HEIGHT Height of image subsets [required]
Options:
--category-attribute TEXT Column that contains category_id values per
annotation feature [default: category_id]
--help Show this message and exit.
--id-attribute TEXT Name of column containing category_id values
(optional if --name_attribute is given)
--name-attribute TEXT Name of column containing category_name values
(optional if --id_attribute is given)
--super-attribute TEXT Name of column containing supercategory values
--help Show this message and exit.
````
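
The version-increment rule described in the `add` help text above (patch for the same `image_path`, minor for a new `image_path`, major for a new `output_dir`) can be sketched roughly as follows. This is an illustration only, not GeoCOCO's actual implementation: the function name `bump_version` and its boolean flags are hypothetical, and two behaviors are assumptions that the help text does not state, namely that lower components reset to zero in the usual semantic-versioning fashion and that a new `output_dir` takes precedence over a new `image_path`.

```python
def bump_version(version: str, *, new_output_dir: bool, new_image: bool) -> str:
    """Return the next 'major.minor.patch' string for one `add` call.

    Hypothetical helper: illustrates the rule from the help text,
    not the code GeoCOCO itself uses.
    """
    major, minor, patch = (int(part) for part in version.split("."))

    if new_output_dir:
        # New output directory: major bump (assumed to reset minor and patch).
        return f"{major + 1}.0.0"
    if new_image:
        # New input image, same output directory: minor bump (assumed to reset patch).
        return f"{major}.{minor + 1}.0"
    # Same image and same output directory: patch bump.
    return f"{major}.{minor}.{patch + 1}"


print(bump_version("0.1.0", new_output_dir=False, new_image=False))  # 0.1.1
print(bump_version("0.1.3", new_output_dir=False, new_image=True))   # 0.2.0
print(bump_version("0.2.5", new_output_dir=True, new_image=True))    # 1.0.0
```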


#### Python module
This is recommended for most developers as it gives you more granular control over the various steps. It does assume a basic understanding of the `geopandas` and `rasterio` packages.
This is recommended for most developers as it gives you more granular control over the various steps. It does assume a basic familiarity with the `geopandas` and `rasterio` packages (i.e. GIS modules that help you manage vector and raster data respectively).

````
import pathlib
import geopandas as gpd
import rasterio
from datetime import datetime
from geococo import create_dataset, load_dataset, save_dataset, labels_to_dataset
from geococo import create_dataset, load_dataset, save_dataset, append_dataset
# Replace this with your preferred output paths
data_path = pathlib.Path("path/to/your/coco/output/images")
@@ -98,40 +114,41 @@ json_path = pathlib.Path("path/to/your/coco/json/file")
# Dimensions of the moving window and output images
width, height = 512, 512
# Creating dataset instance from scratch
# Starting a new CocoDataset
description = "My First Dataset"
contributor = "User'
date_created = datetime.now()
dataset = create_dataset(
version = version,
description = description,
contributor = contributor,
date_created = date_created
)
contributor = "User"
# version and date_created are automatically set
dataset = create_dataset(description=description, contributor=contributor)
# You can also load existing COCO datasets
# dataset = load_dataset(json_path=json_path)
# Loading GIS data with rasterio and geopandas
labels = gpd.read_file(labels_path)
raster_source = rasterio.open(image_path)
# Moving across raster_source and appending all intersecting annotations
dataset = labels_to_dataset(
dataset = dataset,
images_dir = output_dir,
src = raster_source,
labels = labels,
window_bounds = [(width, height)]
)
labels = gpd.read_file(some_labels_path)
raster_source = rasterio.open(some_image_path)
# (Optional) Apply any spatial or attribute queries here
# labels = labels.loc[labels["ids"].isin([1, 2, 3])]
# labels = labels.loc[labels.within(some_polygon)]
# Find and save all Annotation instances
dataset = append_dataset(
dataset=dataset,
images_dir=data_path,
src=raster_source,
labels=labels,
window_bounds=[(width, height)],
id_attribute=None, # column with category_id values
name_attribute="ids", # column with category_name values
super_attribute=None, # optional column with super_category values
)
# Encode CocoDataset instance as JSON and save to json_path
save_dataset(dataset=dataset, json_path=json_path)
````
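
After `save_dataset` has written the JSON file, a quick sanity check can be done with the standard library alone. The sketch below assumes the saved file follows the standard COCO layout with top-level `images`, `annotations` and `categories` lists; the helper names `summarize_coco` and `summarize_coco_file` are hypothetical and not part of `geococo`.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict, Union


def summarize_coco(coco: Dict[str, Any]) -> Dict[str, Any]:
    """Count images, annotations and annotations per category in a COCO dict."""
    # Map category ids to names so the per-category counts are readable.
    names = {cat["id"]: cat.get("name", cat["id"]) for cat in coco.get("categories", [])}
    annotations = coco.get("annotations", [])
    per_category = Counter(
        names.get(ann["category_id"], ann["category_id"]) for ann in annotations
    )
    return {
        "images": len(coco.get("images", [])),
        "annotations": len(annotations),
        "per_category": dict(per_category),
    }


def summarize_coco_file(json_path: Union[str, Path]) -> Dict[str, Any]:
    """Load a COCO JSON file from disk and summarize it."""
    with open(json_path, encoding="utf-8") as handle:
        return summarize_coco(json.load(handle))


# Small in-memory example (made-up values, for illustration only)
example = {
    "images": [{"id": 1}, {"id": 2}],
    "categories": [{"id": 1, "name": "building"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 2, "category_id": 1},
    ],
}
print(summarize_coco(example))
```

For a file on disk, `summarize_coco_file(json_path)` would be called with the same `json_path` that was passed to `save_dataset`.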

# Visualization with FiftyOne
Like the official COCO project, the open source tool [FiftyOne](https://docs.voxel51.com/) can be used to visualize and evaluate your datasets. This does require the `fiftyone` and `pycocotools` packages (the former of which is not installed by `geococo` so you would need to install this separately, see https://docs.voxel51.com/getting_started/install.html for instructions). After installing `fiftyone`, you can run the following to inspect your data in your browser.
Like the official COCO project, the open source tool [FiftyOne](https://docs.voxel51.com/) can be used to visualize and evaluate your datasets. To do this, you'll need the `fiftyone` and `pycocotools` packages. Note that `geococo` does not install `fiftyone` by default, so you'll need to install it separately (instructions for installation can be found [here](https://docs.voxel51.com/getting_started/install.html)). Once you have `fiftyone` installed, you can use the following command to inspect your COCO dataset in your web browser.

````
# requires pycocotools and fiftyone
3 changes: 1 addition & 2 deletions geococo/__init__.py
@@ -1,8 +1,7 @@
import warnings
from rasterio.errors import NotGeoreferencedWarning
from .coco_processing import labels_to_dataset
from .coco_processing import append_dataset
from .coco_manager import create_dataset, load_dataset, save_dataset
from .cli import build_coco

# We make this specific warning 'catchable' to ensure that all rasterio clips are valid
warnings.filterwarnings(