Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

New features

  • Convert Cuboid2D annotation to/from 3D data (#1639)
  • Add label groups for hierarchical classification in ImageNet (#1645)

Enhancements

  • Enhance 'id_from_image_name' transform to ensure each identifier is unique (#1635)
  • Optimize path assignment to handle point cloud in JSON without images (#1643)
  • Add documentation for framework conversion (#1659)

Bug fixes

  • Fix assertion to compare hashkeys against expected value (#1641)

Q4 2024 Release 1.10.0

New features

  • Support KITTI 3D format (#1619, #1621)
  • Add PseudoLabeling transform for unlabeled dataset (#1594)

Enhancements

  • Raise an appropriate error when exporting a datumaro dataset if its subset name contains path separators. (#1615)
  • Update docs for transform plugins (#1599)
  • Update ov ir model for explorer openvino launcher with CLIP ViT-L/14@336px model (#1603)
  • Optimize path assignment to handle point cloud in JSON without images (#1643)
  • Set TabularTransform to process clean transform in parallel (#1648)

Bug fixes

  • Fix datumaro format to load visibility information from Points annotations (#1644)

Q4 2024 Release 1.9.1

Enhancements

  • Support multiple labels for kaggle format (#1607)
  • Use DataFrame.map instead of DataFrame.applymap (#1613)

Bug fixes

  • Fix StreamDataset merging when importing in eager mode (#1609)

Q3 2024 Release 1.9.0

New features

  • Add a new CLI command: datum format (#1570)
  • Add a new Cuboid2D annotation type (#1601)
  • Support language dataset for DmTorchDataset (#1592)

Enhancements

  • Change _Shape to Shape and add comments for subclasses of Shape (#1568)
  • Fix kitti_raw importer and exporter for dimensions (height, width, length) in meters (#1596)

Bug fixes

  • Fix KITTI-3D importer and exporter (#1596)

Q3 2024 Release 1.8.0

New features

  • Add TabularValidator (#1498)
  • Add Clean Transform for tabular data type (#1520)

Enhancements

  • Set label name with parents to avoid duplicates for AstypeAnnotations (#1492)
  • Pass Keyword Argument to TabularDataBase (#1522)
  • Support hierarchical structure for ImageNet dataset format (#1528)
  • Enable dtype argument when calling media.data (#1546)

Bug fixes

  • Preserve end_frame information of a video when it is zero. (#1541)
  • Changed the Datumaro format to ensure exported videos have relative paths and to prevent the same video from being overwritten. (#1547)

Q2 2024 Release 1.7.0

New features

  • Support 'Video' media type in datumaro format (#1491)
  • Add ann_types property for dataset (#1422, #1479)
  • Add AnnotationType.rotated_bbox for oriented object detection (#1459)
  • Add DOTA data format for oriented object detection task (#1475)
  • Add AstypeAnnotations Transform (#1484)
  • Enhance DatasetItem annotations for semantic segmentation model training use case (#1503)
  • Add TabularValidator (#1498)
  • Add Clean Transform for tabular data type (#1520)
  • Add notebook for data handling of kaggle dataset (#1534)

Enhancements

  • Fix ambiguous COCO format detector (#1442)
  • Get target information for tabular dataset (#1471)
  • Add ExtractedMask and update the importers that can use it (#1480)
  • Improve PIL and COLOR_BGR context image decode performance (#1501)
  • Improve get_area() of Polygon through Shoelace formula (#1507)
  • Improve _Shape point converter (#1508)
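
The Shoelace formula named in the get_area() entry above can be sketched as follows. This is only an illustration of the formula, not the code from #1507; the function name shoelace_area and the flat [x0, y0, x1, y1, ...] point layout are assumptions made for the example.

```python
from typing import Sequence


def shoelace_area(points: Sequence[float]) -> float:
    """Area of a simple polygon given as a flat [x0, y0, x1, y1, ...] list.

    Hypothetical helper for illustration; the flat layout is an assumption.
    """
    if len(points) % 2 != 0:
        raise ValueError("points must contain an even number of coordinates")

    xs = points[0::2]
    ys = points[1::2]
    n = len(xs)
    if n < 3:
        return 0.0  # fewer than three vertices enclose no area

    # Sum the cross products of consecutive vertices, wrapping around at the end.
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]

    # The sign depends on vertex orientation, so take the absolute value.
    return abs(twice_area) / 2.0
```

For a 4 x 3 axis-aligned rectangle, shoelace_area([0, 0, 4, 0, 4, 3, 0, 3]) evaluates to 12.0. The formula works directly on vertex coordinates and needs no rasterization of the polygon.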

Bug fixes

  • Split the video directory into subsets to avoid overwriting (#1485)
  • Update docs to replace --save-images with --save-media (#1514)

May 2024 Release 1.6.1

Enhancements

  • Prevent AcLauncher for OpenVINO 2024.0 (#1450)

Bug fixes

  • Modify lxml dependency constraint (#1460)
  • Fix CLI error occurring when installed with default option only (#1444, #1454)
  • Relax Pillow dependency constraint (#1436)
  • Modify Numpy dependency constraint (#1435)
  • Relax old pandas version constraint (#1467)

Apr. 2024 Release 1.6.0

New features

  • Changed supported Python version range (>=3.9, <=3.11) (#1269)
  • Support MMDetection COCO format (#1213)
  • Develop JsonSectionPageMapper in Rust API (#1224)
  • Add Filtering via User-Provided Python Functions (#1230, #1233)
  • Remove supporting MacOS platform (#1235)
  • Support Kaggle image data (KaggleImageCsvBase, KaggleImageTxtBase, KaggleImageMaskBase, KaggleVocBase, KaggleYoloBase) (#1240)
  • Add __getitem__() for random accessing with O(1) time complexity (#1247)
  • Add Data-aware Anchor Generator (#1251)
  • Support bounding box import within Kaggle extractors and add KaggleCocoBase (#1273)

Enhancements

  • Optimize Python import to make CLI entrypoint faster (#1182)
  • Add ImageColorScale context manager (#1194)
  • Enhance visualizer to toggle plot title visibility (#1228)
  • Enhance Datumaro data format detect() to be memory-bounded and performant (#1229)
  • Change RoIImage and MosaicImage to have np.uint8 dtype as default (#1245)
  • Enable image backend and color channel format to be selectable (#1246)
  • Boost up CityscapesBase and KaggleImageMaskBase by dropping np.unique (#1261)
  • Enhance RISE algorithm for explainable AI (#1263)
  • Enhance explore unit test to use real dataset from ImageNet (#1266)
  • Fix each method of the comparator to be used separately (#1290)
  • Bump ONNX version to 1.16.0 (#1376)
  • Print the color channel format (RGB) for datum stats command (#1389)
  • Add ignore_index argument to Mask.as_class_mask() and Mask.as_instance_mask() (#1409)

Bug fixes

  • Fix wrong example of Datumaro dataset creation in document (#1195)
  • Fix wrong command to install datumaro from github (#1202, #1207)
  • Update document to correct wrong datum project import command and add filtering example to filter out items containing annotations. (#1210)
  • Fix label compare of distance method (#1205)
  • Fix Datumaro visualizer's import errors after introducing lazy import (#1220)
  • Fix broken link to supported formats in readme (#1221)
  • Fix Kinetics data format to have media data (#1223)
  • Handle undefined labels in the annotation statistics (#1232)
  • Add unit test for item rename (#1237)
  • Fix a bug in the previous behavior when importing nested datasets in the project (#1243)
  • Fix Kaggle importer when adding duplicated labels (#1244)
  • Fix input tensor shape in model interpreter for OpenVINO 2023.3 (#1251)
  • Add default value for target in prune cli (#1253)
  • Remove deprecated MediaManager (#1262)
  • Fix explore command without project (#1271)
  • Enable COCO to import only bboxes (#1360)
  • Fix resize transform for RleMask annotation (#1361)
  • Fix import YOLO variants from extractor when urls is not specified (#1362)

Jan. 2024 Release 1.5.2

Enhancements

  • Add memory bounded datumaro data format detect to release 1.5.1 (#1241)
  • Bump version string to 1.5.2 (#1249)
  • Remove Protobuf version limitation (<4) (#1248)

Nov. 2023 Release 1.5.1

Enhancements

  • Enhance Datumaro data format stream importer performance (#1153)
  • Change image default dtype from float32 to uint8 (#1175)
  • Add comparison level-up doc (#1174)
  • Add ImportError to catch GitPython import error (#1174)

Bug fixes

  • Modify the draw function in the visualizer not to raise an error for unsupported annotation types. (#1180)
  • Correct explore path in the related document. (#1176)
  • Fix errata in the voc document. Color values in the labelmap.txt should be separated by commas, not colons. (#1162)
  • Fix hyperlink errors in the document (#1159, #1161)
  • Fix memory unbounded Arrow data format export/import (#1169)
  • Update CVAT format doc to bypass warning (#1183)

15/09/2023 - Release 1.5.0

New features

  • Add SAMAutomaticMaskGeneration transform (#1168)
  • Add tabular data import/export (#1089)
  • Support video annotation import/export (#1124)
  • Add multiframework (PyTorch, Tensorflow) converter (#1125)
  • Add SAM OVMS and Triton server Docker image builders (#1129)
  • Add SAMBboxToInstanceMask transform (#1133, #1134)
  • Add ConfigurableValidator (#1142)

Enhancements

  • Enhance ClassificationValidator for multi-label classification datasets with label_groups (#1116)
  • Replace Roboflow xml.etree with defusedxml (#1117)
  • Define GroupType with IntEnum, where 0 is EXCLUSIVE (#1116)
  • Add Rust API to optimize COCOPageMapper performance (#1120)
  • Support a dictionary input in addition to a single image input for the model launcher to support Segment Anything Model (#1133)
  • Remove deprecated features announced for removal in 1.5.0 (#1140)
  • Add multi-threading option to ModelTransform and SAMBboxToInstanceMask (#1145, #1149)
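
As a rough illustration of the GroupType entry above, an IntEnum lets group types be compared with and rebuilt from plain integers. Only EXCLUSIVE = 0 comes from this changelog; the INCLUSIVE member below is an assumption added purely to make the example meaningful.

```python
from enum import IntEnum


class GroupType(IntEnum):
    EXCLUSIVE = 0  # value stated in the changelog entry (#1116)
    INCLUSIVE = 1  # assumed member, for illustration only


# IntEnum members compare equal to plain integers, which keeps
# integer-based serialization straightforward.
stored_value = int(GroupType.EXCLUSIVE)  # 0
restored = GroupType(stored_value)       # back to GroupType.EXCLUSIVE
```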

Bug fixes

  • Coco exporter can export annotations even if there is no media, except for mask annotations which require media info. (#1147, #1158)
  • Fix bugs for Tile transform (#1123)
  • Disable Roboflow Tfrecord format when Tensorflow is not installed (#1130)
  • Raise VcsAlreadyExists error if vcs directory exists (#1138)

27/07/2023 - Release 1.4.1

Bug fixes

  • Report errors for COCO (stream) and Datumaro importers (#1110)

21/07/2023 - Release 1.4.0

New features

  • Add documentation and notebook example for Prune API (#1070)
  • Changed supported Python version range (>=3.8, <=3.11) (#1083)
  • Migrate OpenVINO v2023.0.0 (#1036)
  • Add Roboflow data format support (COCO JSON, Pascal VOC XML, YOLOv5-PyTorch, YOLOv7-PyTorch, YOLOv8, YOLOv5 Oriented Bounding Boxes, Multiclass CSV, TFRecord, CreateML JSON) (#1044)
  • Add MissingAnnotationDetection transform (#1049, #1063, #1064)
  • Add OVMSLauncher (#1056)
  • Add Prune API (#1058)
  • Add TritonLauncher (#1059)
  • Migrate DVC v3.0.0 (#1072)
  • Stream dataset import/export (#1077, #1081, #1082, #1091, #1093, #1098, #1102)
  • Support mask annotations for CVAT data format (#1078)

Enhancements

  • Support list query for explorer (#1087)
  • Update contributing.md (#1094)
  • Update 3rd-party.txt for release 1.4.0 (#1099)
  • Announce that deprecated features will be removed in datumaro==1.5.0 (#1085)
  • Unify COCO, Datumaro, VOC, YOLO importer/exporter progress reporter descriptions (#1100)
  • Enhance import performance for built-in plugins (#1031)
  • Change default dtype of load_image() to np.uint8 (#1041)
  • Add OTX ATSS detector model interpreter & refactor interfaces (#1047)
  • Refactor Launcher and ModelInterpreter (#1055)
  • Add CVAT data format document (#1060)
  • Reduce peak memory usage when importing COCO and Datumaro formats (#1061)
  • Enhance the error message for datum stats to be more user friendly (#1069)
  • Refactor dataset.py to separate DatasetStorage (#1073)

Bug fixes

  • Create cache dir under only writable filesystem (#1088)
  • Fix: Dataset infos() can be broken if a transform not redefining infos() is stacked on the top (#1101)
  • Fix warnings in test_visualizer.py (#1039)
  • Fix LabelMe data format (#1053)
  • Prevent installing protobuf>=4 (#1054)
  • Fix UnionMerge (#1086)

26/05/2023 - Release 1.3.2

Enhancements

  • Let CocoBase continue even if an InvalidAnnotationError is raised (#1050)

Bug fixes

  • Install dvc version to 2.x (#1048)
  • Replace np.append() in Validator (#1050)

26/05/2023 - Release 1.3.1

Bug fixes

  • Fix Cityscapes format mis-detection (#1029)

25/05/2023 - Release 1.3.0

New features

  • Add CocoRoboflowImporter (#976, #1000)
  • Add SynthiaSfImporter and SynthiaAlImporter (#987)
  • Add intermediate skill docs for filter (#996)
  • Add VocInstanceSegmentationImporter and VocInstanceSegmentationExporter (#997)
  • Add Segment Anything data format support (#1005, #1009)
  • Add Correct transformation (#1006)
  • Implement ReindexAnnotations transform (#1008)
  • Add notebook examples for importing/exporting detection and segmentation data (#1020, #1023)
  • Update CLI from diff to compare, add TableComparator (#1012)

Enhancements

  • Use autosummary for fully-automatic Python module docs generation (#973)
  • Enrich stack trace for better user experience when importing (#992)
  • Save and load hashkey for explorer (#981, #1003)
  • Add MOT and MOTS data format docs (#999)
  • Improve RemoveAnnotations to remove specific annotations with ids (#1004)
  • Add Jupyter notebook example of noisy label detection for detection tasks (#1011)

Bug fixes

  • Fix Mapillary Vistas data format (#977)
  • Fix bytes property returning None if function is given to data (#978)
  • Fix Synthia-Rand data format (#987)
  • Fix person_layout categories and action_classification attributes in imported Pascal-VOC dataset (#997)
  • Drop a malformed transform from StackedTransform automatically (#1001)
  • Fix Cityscapes to drop ImgsFine directory (#1023)

04/05/2023 - Release 1.2.1

Bug fixes

  • Fix project level CVAT for images format import (#980)
  • Fix an info message when using the convert CLI command with no args.input_format (#982)
  • Fix media contents not returning bytes in arrow format (#986)

20/04/2023 - Release 1.2.0

New features

Enhancements

  • Add multiprocessing to DatumaroBinaryBase (#897)
  • Refactor merge code (#901, #906)
  • Refactor download CLI commands (#909)
  • Refactor CLI commands w/ and w/o project (#910, #952)
  • Refactor Media to be initialized from explicit sources (#911, #921, #944)
  • Refactor hl_ops.py (#912)
  • Add tfds:uc_merced and tfds:eurosat download (#914)
  • Migrate documentation framework to Sphinx (#917, #922, #947, #954, #958, #961, #962, #963, #964, #965, #969)
  • Update merge tutorial for real life usecase (#930)
  • Abbreviate "detect-format" to "detect" for prettifying (#951)

Bug fixes

  • Add UserWarning if an invalid media_type comes to image statistics computation (#891)
  • Fix negated is_encrypted (#907)
  • Save extra images of PointCloud when exporting to datumaro format (#918)
  • Fix log issue when importing celeba and align celeba dataset (#919)

28/03/2023 - Release 1.1.1

Bug fixes

  • Fix to not export absolute media path in Datumaro and DatumaroBinary formats (#896)
  • Change pypi_publish.yml to publish_sdist_to_pypi.yml (#895)

23/03/2023 - Release 1.1.0

New features

  • Add with_subset_dirs decorator (Add ImagenetWithSubsetDirsImporter) (#816)
  • Add CommonSemanticSegmentationWithSubsetDirsImporter (#826)
  • Add DatumaroBinary format (#828, #829, #830, #831, #880, #883)
  • Add Explorer CLI documentation (#838)
  • Add version to dataset exported as datumaro format (#842)
  • Add Ava action data format support (#847)
  • Add Shift Analyzer (both covariate and label shifts) (#855)
  • Add YOLO Loose format (#856)
  • Add Ultralytics YOLO format (#859)

Enhancements

  • Refactor Datumaro format code and test code (#824)
  • Add publish to PyPI Github action (#867)
  • Add --no-media-encryption option (#875)

Bug fixes

  • Fix image filenames and anomaly mask appearance in MVTec exporter (#835)
  • Fix CIFAR10 and 100 detect function (#836)
  • Fix celeba and align_celeba detect function (#837)
  • Choose the top priority detect format for all directory depths (#839)
  • Fix MVTec format detect function (#843)
  • Fix wrong __len__() of Subset when the item is removed (#854)
  • Fix mask visualization bug (#860)
  • Fix detect unit tests to test false negatives as well (#868)

24/02/2023 - Release v1.0.0

New features

  • Add Data Explorer (#773)
  • Add Ellipse annotation type (#807)
  • Add MVTec anomaly data support (#810)

Enhancements

  • Refactor existing tests (#803)
  • Raise ImportError on importing malformed COCO directory (#812)
  • Remove the duplicated and cyclical category context in documentation (#822)

Bug fixes

27/01/2023 - Release v0.5.0

New features

  • Add Tile transformation (#790)
  • Add Video keyframe extraction (#791)
  • Add TileTransform documentation and Jupyter notebook example (#794)
  • Add MergeTile transformation (#796)

Enhancements

  • Improved mask_to_rle performance (#770)
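
For readers unfamiliar with run-length encoding of masks, here is a minimal pure-Python sketch of the idea behind a mask_to_rle-style function. It is not the Datumaro implementation from #770. The name rle_counts is hypothetical, and the example assumes the COCO-style convention in which counts alternate between runs of zeros and runs of ones, starting with zeros.

```python
from itertools import groupby
from typing import List, Sequence


def rle_counts(flat_mask: Sequence[int]) -> List[int]:
    """Run lengths of a flattened binary mask, starting with the run of zeros.

    Hypothetical helper for illustration only.
    """
    counts: List[int] = []
    expected = 0  # by convention the first count refers to zeros
    for value, run in groupby(1 if v else 0 for v in flat_mask):
        if value != expected:
            # The mask starts with ones, so record an empty run of zeros first.
            counts.append(0)
        counts.append(sum(1 for _ in run))
        expected = 1 - value
    return counts
```

The flattened mask [0, 0, 1, 1, 1, 0] encodes to [2, 3, 1], and a mask that starts with ones, such as [1, 1, 0], encodes to [0, 2, 1].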

Deprecated

  • N/A

Removed

  • N/A

Bug fixes

  • Fix MacOS CI failures (#789)
  • Fix auto-documentation for the data_format plugins (#793)

Security

  • Add security.md file for the SDL (#798)

06/12/2022 - Release v0.4.0.1

New features

  • Support for label exclusivity with LabelGroup (#742)
  • Jupyter samples
    • Introducing how to merge datasets (#738)
    • Introducing how to visualize dataset (#747)
    • Introducing how to filter dataset (#748)
    • Introducing how to transform dataset (#759)
  • Visualization Python API
    • Bbox feature (#744)
    • Label, Points, Polygon, PolyLine, and Caption visualization features (#746)
    • Mask, SuperResolution, Depth visualization features (#747)
  • Documentation for Python API (#753)
    • dataset handler, visualizer, filter descriptions (#761)
  • __repr__ for Dataset (#750)
  • Support for exporting as CVAT video format (#757)
  • CodeCov coverage reporting feature to CI/CD (#756)
  • Jupyter notebook example rendering to documentation (#758)
  • An interface to manipulate 'infos' to store the dataset meta-info (#767)
  • 'bbox' annotation when importing a COCO dataset (#772)

Enhancements

  • Wrap title text according to its plot width (#769)
  • Get list of subsets and support only Image media type in visualizer (#768)

Deprecated

  • N/A

Removed

  • N/A

Bug fixes

  • Correcting static type checking (#743)
  • Fixing a VOC dataset export when a label contains 'space' (#771)

Security

  • N/A

06/09/2022 - Release v0.3.1

New features

  • Support for custom media types, new PointCloud media type, DatasetItem.media and .media_as(type) members (#539)
  • [API] A way to request dataset and extractor media type with media_type (#539)
  • BraTS format (import-only) (.npy and .nii.gz), new MultiframeImage media type (#628)
  • Common Semantic Segmentation dataset format (import-only) (#685)
  • An option to disable data/ prefix inclusion in YOLO export (#689)
  • New command describe-downloads to print information about downloadable datasets (#678)
  • Detection for Cityscapes format (#680)
  • Maximum recursion --depth parameter for detect-dataset CLI command (#680)
  • An option to save a single subset in the download command (#697)
  • Common Super Resolution dataset format (import-only) (#700)
  • Kinetics 400/600/700 dataset format (import-only) (#706)
  • NYU Depth Dataset V2 format (import-only) (#712)

Enhancements

  • env.detect_dataset() now returns a list of detected formats at all recursion levels instead of just the lowest one (#680)
  • Open Images: allowed to store annotations file in root path as well (#680)
  • Improved parsing error messages in COCO, VOC and YOLO formats (#684, #686, #687)
  • YOLO format now supports almost any subset names, except backup, names and classes (instead of just train and valid). The reserved names now raise an error on exporting. (#688)

Deprecated

  • --save-images is replaced with --save-media in CLI and converter API (#539)
  • [API] image, point_cloud and related_images of DatasetItem are replaced with media and media_as(type) members and c-tor parameters (#539)

Removed

  • N/A

Bug fixes

  • Detection for LFW format (#680)
  • Adding depth value of image when dataset is exported in VOC format (#726)
  • Proper handling of numerical labels in task chains (#726)
  • Fixing the issue that annotations inside another annotation (polygon) are duplicated during import for VOC format (#726)

Security

  • N/A

21/02/2022 - Release v0.3

New features

  • Ability to import a video as frames with the video_frames format and to split a video into frames with the datum util split_video command (#555)
  • --subset parameter in the image_dir format (#555)
  • MediaManager API to control loaded media resources at runtime (#555)
  • Command to detect the format of a dataset (#576)
  • More comfortable access to library API via import datumaro (#630)
  • CLI command-like free functions (export, transform, ...) (#630)
  • Reading specific annotation files for train dataset in Cityscapes (#632)
  • Random sampling transforms (random_sampler, label_random_sampler) to create smaller datasets from bigger ones (#636, #640)
  • API to report dataset import and export progress; API to report dataset import and export errors and take action (skip, fail) (supported in COCO, VOC and YOLO formats) (#650)
  • Support for downloading the ImageNetV2 and COCO datasets (#653, #659)
  • A way for formats to signal that they don't support detection (#665)
  • Removal transforms to remove items/annotations/attributes from dataset (remove_items, remove_annotations, remove_attributes) (#670)

Enhancements

  • Allowed direct file paths in datum import. Such sources are imported like when the rpath parameter is specified, however, only the selected path is copied into the project (#555)
  • Improved stats performance, added new filtering parameters, image stats (unique, repeated) moved to the dataset section, removed mean and std from the dataset section (#621)
  • Allowed Image creation from just size info (#634)
  • Added image search in VOC XML-based subformats (#634)
  • Added image path equality checks in simple merge, when applicable (#634)
  • Supported saving box attributes when downloading the TFDS version of VOC (#668)
  • Switched to a pyproject.toml-based build (#671)

Deprecated

  • TBD

Removed

  • Official support of Python 3.6 (due to its EOL) (#617)
  • Backward compatibility annotation symbols in components.extractor (#630)

Bug fixes

  • Prohibited calling add, import and export commands without a project (#555)
  • Calling make_dataset on empty project tree now produces the error properly (#555)
  • Saving (overwriting) a dataset in a project when rpath is used (#613)
  • Output image extension preserving in the Resize transform (#606)
  • Memory overuse in the Resize transform (#607)
  • Invalid image pixels produced by the Resize transform (#618)
  • Numeric warnings that sometimes occurred in stats command (e.g. #607) (#621)
  • Added missing item attribute merging in simple merge (#634)
  • Inability to disambiguate VOC from LabelMe in some cases (#658)

Security

  • TBD

28/01/2022 - Release v0.2.3

New features

  • Command to download public datasets (#582)
  • Extension autodetection in ByteImage (#595)
  • MPII Human Pose Dataset (import-only) (.mat and .json) (#584)
  • MARS format (import-only) (#585)

Enhancements

  • The pycocotools dependency lower bound is raised to 2.0.4. (#449)
  • smooth_line from datumaro.util.annotation_util - the function is renamed to approximate_line and has updated interface (#592)

Deprecated

  • Python 3.6 support

Removed

  • TBD

Bug fixes

  • Fails in multimerge when lines are not approximated and when there are no label categories (#592)
  • Cannot convert LabelMe dataset, that has no subsets (#600)

Security

  • TBD

24/12/2021 - Release v0.2.2

New features

  • Video reading API (#521)
  • Python API documentation (#526)
  • Mapillary Vistas dataset format (Import-only) (#537)
  • Datumaro can now be installed on Windows on Python 3.9 (#547)
  • Import for SYNTHIA dataset format (#532)
  • Support of score attribute in KITTI detection (#571)
  • Support for Accuracy Checker dataset meta files in formats (#553, #569, #575)
  • Import for VoTT dataset format (#573)
  • Image resizing transform (#581)

Enhancements

  • The following formats can now be detected unambiguously: ade20k2017, ade20k2020, camvid, coco, cvat, datumaro, icdar_text_localization, icdar_text_segmentation, icdar_word_recognition, imagenet_txt, kitti_raw, label_me, lfw, mot_seq, open_images, vgg_face2, voc, widerface, yolo (#531, #536, #550, #557, #558)
  • Allowed Pytest-native tests (#563)
  • Allowed export options in the datum merge command (#545)

Deprecated

  • Using Image, ByteImage from datumaro.util.image - these classes are moved to datumaro.components.media (#538)

Removed

  • Equality comparison support between datumaro.components.media.Image and numpy.ndarray (#568)

Bug fixes

  • Bug #560: import issue with MOT dataset when using seqinfo.ini file (#564)
  • Empty lines in VOC subset lists are not ignored (#587)

Security

  • TBD

16/11/2021 - Release v0.2.1

New features

  • Import for CelebA dataset format. (#484)

Enhancements

  • File people.txt became optional in LFW (#509)
  • File image_ids_and_rotation.csv became optional in Open Images (#509)
  • Allowed underscores (_) in subset names in COCO (#509)
  • Allowed annotation files with arbitrary names in COCO (#509)
  • The icdar_text_localization format is no longer detected in every directory (#531)
  • Updated pycocotools version to 2.0.2 (#534)

Deprecated

  • TBD

Removed

  • TBD

Bug fixes

  • Unhandled exception when a file is specified as the source for a COCO or MOTS dataset (#530)
  • Exporting dataset without color attribute into the icdar_text_segmentation format (#556)

Security

  • TBD

14/10/2021 - Release v0.2

New features

  • A new installation target: pip install datumaro[default], which should be used by default. The plain datumaro target is intended for library users. (#238)
  • Dataset and project versioning capabilities (Git-like) (#238)
  • "dataset revpath" concept in CLI, allowing to pass a dataset path with the dataset format in diff, merge, explain and info CLI commands (#238)
  • import, remove, commit, checkout, log, status, info CLI commands (#238)
  • Coco*Extractor classes now have an option to preserve label IDs from the original annotation file (#453)
  • patch CLI command to patch datasets (#401)
  • ProjectLabels transform to change dataset labels for merging etc. (#401, #478)
  • Support for custom labels in the KITTI detection format (#481)
  • Type annotations and docs for Annotation classes (#493)
  • Options to control label loading behavior in imagenet_txt import (#434, #489)

Enhancements

  • A project can contain and manage multiple datasets instead of a single one. CLI operations can be applied to the whole project, or to separate datasets. Datasets are modified inplace, by default (#328)
  • CLI help for builtin plugins doesn't require project (#328)
  • Annotation-related classes were moved into a new module, datumaro.components.annotation (#439)
  • Rollback utilities replaced with Scope utilities (#444)
  • The Project class from datumaro.components is changed completely (#238)
  • diff and ediff are joined into a single diff CLI command (#238)
  • Projects use new file layout, incompatible with old projects. An old project can be updated with datum project migrate (#238)
  • Inheriting CliPlugin is not required in plugin classes (#238)
  • Importers do not create Projects anymore and just return a list of extractor configurations (#238)

Deprecated

  • TBD

Removed

  • import, project merge CLI commands (#238)
  • Support for project hierarchies. A project cannot be a source anymore (#238)
  • Project cannot have independent internal dataset anymore. All the project data must be stored in the project data sources (#238)
  • datumaro_project format (#238)
  • Unused path field of DatasetItem (#455)

Bug fixes

  • Deprecation warning in open_images_format.py (#440)
  • lazy_image returning unrelated data sometimes (#409)
  • Invalid call to pycocotools.mask.iou (#450)
  • Importing of Open Images datasets without image data (#463)
  • Return value type in Dataset.is_modified (#401)
  • Remapping of secondary categories in RemapLabels (#401)
  • VOC dataset patching for classification and segmentation tasks (#478)
  • Exported mask label ids in KITTI segmentation (#481)
  • Missing label for Points read in the LFW format (#494)

Security

  • TBD

24/08/2021 - Release v0.1.11

New features

Enhancements

  • Datumaro no longer depends on scikit-image (#379)
  • Dataset remembers export options on saving / exporting for the first time (#386)

Deprecated

  • TBD

Removed

  • TBD

Bug fixes

  • Application of remap_labels to dataset categories of different length (#314)
  • Patching of datasets in formats (#348)
  • Improved Cityscapes export performance (#367)
  • Incorrect format of *_labelIds.png in Cityscapes export (#325, #342)
  • Item id in ImageNet format (#371)
  • Double quotes for ICDAR Word Recognition (#375)
  • Wrong display of builtin formats in CLI (#332)
  • Non utf-8 encoding of annotation files in Market-1501 export (#392)
  • Import of ICDAR, PASCAL VOC and VGGFace2 images from subdirectories on Windows (#392)
  • Saving of images with Unicode paths on Windows (#392)
  • Calling ProjectDataset.transform() with a string argument (#402)
  • Attributes casting for CVAT format (#403)
  • Loading of custom project plugins (#404)
  • Reading, writing anno file and saving name of the subset for test subset (#447)

Security

  • Fixed unsafe unpickling in CIFAR import (#362)
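
To show why plain pickle.load is unsafe on untrusted files and what a restricted loader can look like, here is a minimal sketch based on the "restricting globals" pattern from the Python standard library documentation. It is not necessarily how #362 was implemented; RestrictedUnpickler and safe_loads are names chosen for this example.

```python
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global (class or function)."""

    def find_class(self, module, name):
        # Arbitrary code execution in pickle goes through resolved globals,
        # so rejecting all of them leaves only plain built-in containers.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_loads(data: bytes):
    """Load a pickle that may contain only basic Python types."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

With this loader, safe_loads(pickle.dumps({"labels": [1, 2]})) returns the dictionary, while a payload that references a callable such as builtins.print raises pickle.UnpicklingError. A real dataset loader would likely need an explicit allow-list in find_class for the few types it expects, rather than rejecting everything.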

14/07/2021 - Release v0.1.10

New features

  • Support for import/export zip archives with images (#273)
  • Subformat importers for VOC and COCO (#281)
  • Support for KITTI dataset segmentation and detection format (#282)
  • Updated YOLO format user manual (#295)
  • ItemTransform class, which describes item-wise dataset Transforms (#297)
  • keep-empty export parameter in VOC format (#297)
  • A base class for dataset validation plugins (#299)
  • Partial support for the Open Images format; only images and image-level labels can be read/written (#291, #315).
  • Support for Supervisely Point Cloud dataset format (#245, #353)
  • Support for KITTI Raw / Velodyne Points dataset format (#245)
  • Support for CIFAR-100 and documentation for CIFAR-10/100 (#301)

Enhancements

  • Tensorflow AVX check is made optional in API and disabled by default (#305)
  • Extensions for images in ImageNet_txt are now mandatory (#302)
  • Several dependencies now have lower bounds (#308)

Deprecated

  • TBD

Removed

  • TBD

Bug fixes

  • Incorrect image layout on saving and a problem with encoding on loading (#284)
  • An error when XPath filter is applied to the dataset or its subset (#259)
  • Tracking of Dataset changes done by transforms (#297)
  • Improved CLI startup time in several cases (#306)

Security

  • Known issue: loading CIFAR can result in arbitrary code execution (#327)

03/06/2021 - Release v0.1.9

New features

  • Support for escaping in attribute values in LabelMe format (#49)
  • Support for Segmentation Splitting (#223)
  • Support for CIFAR-10/100 dataset format (#225, #243)
  • Support for COCO panoptic and stuff format (#210)
  • Documentation file and integration tests for Pascal VOC format (#228)
  • Support for MNIST and MNIST in CSV dataset formats (#234)
  • Documentation file for COCO format (#241)
  • Documentation file and integration tests for YOLO format (#246)
  • Support for Cityscapes dataset format (#249)
  • Support for Validator configurable threshold (#250)

Enhancements

  • LabelMe format saves dataset items with their relative paths by subsets without changing names (#200)
  • Allowed arbitrary subset count and names in classification and detection splitters (#207)
  • Annotation-less dataset elements now participate in subset splitting (#211)
  • Classification task in LFW dataset format (#222)
  • Testing is now performed with pytest instead of unittest (#248)

Deprecated

  • TBD

Removed

  • TBD

Bug fixes

  • Added support for auto-merging (joining) of datasets with no labels and having labels (#200)
  • Allowed explicit label removal in remap_labels transform (#203)
  • Image extension in CVAT format export (#214)
  • Added a label "face" for bounding boxes in Wider Face (#215)
  • Allowed adding "difficult", "truncated", "occluded" attributes when converting to Pascal VOC if these attributes are not present (#216)
  • Empty lines in YOLO annotations are ignored (#221)
  • Export in VOC format when no image info is available (#239)
  • Fixed saving attribute in WiderFace extractor (#251)
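
To illustrate the YOLO fix listed above (empty lines in annotation files are ignored), here is a minimal sketch of a tolerant parser. It assumes the usual YOLO text layout of `class_id x_center y_center width height` per line; the function name `parse_yolo_annotations` is hypothetical and this is not the code from #221.

```python
from typing import List, Tuple

# (class_id, x_center, y_center, width, height), coordinates normalized to [0, 1]
YoloBox = Tuple[int, float, float, float, float]


def parse_yolo_annotations(text: str) -> List[YoloBox]:
    """Parse YOLO annotation text, skipping empty and whitespace-only lines."""
    boxes: List[YoloBox] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Blank lines carry no annotation, so they are ignored, not treated as errors.
            continue
        class_id, x_center, y_center, width, height = line.split()
        boxes.append(
            (int(class_id), float(x_center), float(y_center), float(width), float(height))
        )
    return boxes


sample = "0 0.5 0.5 0.2 0.4\n\n   \n1 0.25 0.75 0.1 0.1\n"
boxes = parse_yolo_annotations(sample)
```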

Security

  • TBD

31/03/2021 - Release v0.1.8

New features

  • TBD

Enhancements

  • Added an option to allow undeclared annotation attributes in CVAT format export (#192)
  • COCO exports images in separate dirs by subsets. Added an option to control this (#195)

Deprecated

  • TBD

Removed

  • TBD

Bug fixes

  • Instance masks of background class no longer introduce an instance (#188)
  • Added support for label attributes in Datumaro format (#192)

Security

  • TBD

24/03/2021 - Release v0.1.7

New features

  • OpenVINO plugin examples (#159)
  • Dataset validation for classification and detection datasets (#160)
  • Arbitrary image extensions in formats (import and export) (#166)
  • Ability to set a custom subset name for an imported dataset (#166)
  • CLI support for NDR (#178)

Enhancements

  • Common ICDAR format is split into 3 sub-formats (#174)

Deprecated

  • TBD

Removed

  • TBD

Bug fixes

  • The ability to work with file names containing Cyrillic and spaces (#148)
  • Image reading and saving in ICDAR formats (#174)
  • Unnecessary image loading on dataset saving (#176)
  • Allowed spaces in ICDAR captions (#182)
  • Saving of masks in VOC when masks are not requested (#184)

Security

  • TBD

03/02/2021 - Release v0.1.6.1 (hotfix)

New features

  • TBD

Enhancements

  • TBD

Deprecated

  • TBD

Removed

  • TBD

Bug fixes

  • Images with no annotations are exported again in VOC formats (#123)
  • Inference result for only one output layer in OpenVINO launcher (#125)

Security

  • TBD

02/26/2021 - Release v0.1.6

New features

  • Icdar13/15 dataset format (#96)
  • Laziness, source caching, tracking of changes and partial updating for Dataset (#102)
  • Market-1501 dataset format (#108)
  • LFW dataset format (#110)
  • Support of polygons' and masks' confusion matrices and mismatching classes in diff command (#117)
  • Add near duplicate image removal plugin (#113)
  • Sampler Plugin that analyzes inference results from the given dataset and selects samples for annotation (#115)

Enhancements

  • OpenVINO model launcher is updated for OpenVINO r2021.1 (#100)

Deprecated

  • TBD

Removed

  • TBD

Bug fixes

  • High memory consumption and low performance of mask import/export, #53 (#101)
  • Masks, covered by class 0 (background), should be exported with holes inside (#104)
  • diff command invocation problem with missing class methods (#117)

Security

  • TBD

01/23/2021 - Release v0.1.5

New features

  • WiderFace dataset format (#65, #90)
  • Function to transform annotations to labels (#66)
  • Dataset splits for classification, detection and re-id tasks (#68, #81)
  • VGGFace2 dataset format (#69, #82)
  • Unique image count statistic (#87)
  • Installation with pip by name datumaro

Enhancements

  • Dataset class extended with new operations: save, load, export, import_from, detect, run_model (#71)
  • Allowed importing Extractor-only defined formats (in Project.import_from, dataset.import_from and CLI/project import) (#71)
  • datum project ... commands replaced with datum ... commands (#84)
  • Supported more image formats in ImageNet extractors (#85)
  • Allowed adding Importer-defined formats as project sources (source add) (#86)
  • Added max search depth in ImageDir format and importers (#86)

Deprecated

  • datum project ... CLI context (#84)

Removed

  • TBD

Bug fixes

  • Allow plugins inherited from Extractor (instead of only SourceExtractor) (#70)
  • Windows installation with pip for pycocotools (#73)
  • YOLO extractor path matching on Windows (#73)
  • Fixed inplace file copying when saving images (#76)
  • Fixed labelmap parameter type checking in VOC converter (#76)
  • Fixed model copying on addition in CLI (#94)

Security

  • TBD

12/10/2020 - Release v0.1.4

New features

  • CamVid dataset format (#57)
  • Ability to install opencv-python-headless dependency with DATUMARO_HEADLESS=1 environment variable instead of opencv-python (#62)
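
As a rough illustration of the `DATUMARO_HEADLESS=1` switch, the sketch below shows how an install script could pick the OpenCV dependency from the environment. This is an assumption about the mechanism, not the project's actual setup code, and the helper name `opencv_requirement` is hypothetical.

```python
import os
from typing import Mapping, Optional


def opencv_requirement(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenCV package name to depend on.

    Assumes that DATUMARO_HEADLESS=1 selects the headless build and that
    any other value (or no value) selects the regular one.
    """
    if environ is None:
        environ = os.environ
    if environ.get("DATUMARO_HEADLESS") == "1":
        return "opencv-python-headless"
    return "opencv-python"


# The chosen name would then be appended to the install requirements.
requirements = ["numpy", opencv_requirement({"DATUMARO_HEADLESS": "1"})]
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.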

Enhancements

  • Allow empty supercategory in COCO (#54)
  • Allow Pascal VOC to search in subdirectories (#50)

Deprecated

  • TBD

Removed

  • TBD

Bug fixes

  • TBD

Security

  • TBD

10/28/2020 - Release v0.1.3

New features

  • ImageNet and ImageNetTxt dataset formats (#41)

Enhancements

  • TBD

Deprecated

  • TBD

Removed

  • TBD

Bug fixes

  • Default label-map parameter value for VOC converter (#34)
  • Randomness of random split transform (#38)
  • Transform.subsets() method (#38)
  • Supported unknown image formats in TF Detection API converter (#40)
  • Supported empty attribute values in CVAT extractor (#45)

Security

  • TBD

10/05/2020 - Release v0.1.2

New features

  • ByteImage class to represent encoded images in memory and avoid recoding on save (#27)

Enhancements

  • Implementation of format plugins simplified (#22)
  • default is now the default subset name, instead of None. The values are interchangeable. (#22)
  • Improved performance of transforms (#22)

Deprecated

  • TBD

Removed

  • image/depth value from VOC export (#27)

Bug fixes

  • Zero division errors in dataset statistics (#31)

Security

  • TBD

09/24/2020 - Release v0.1.1

New features

Enhancements

  • TBD

Deprecated

  • TBD

Removed

  • TBD

Bug fixes

  • TBD

Security

  • TBD

09/10/2020 - Release v0.1.0

New features

  • Initial release

Template

## [Unreleased]
### New features
- TBD

### Enhancements
- TBD

### Deprecated
- TBD

### Removed
- TBD

### Bug fixes
- TBD

### Security
- TBD