Add possibility to merge several classes to dataset scripts #156
base: master
I think this should be implemented in all the remaining readers (openimages, csv, etc.). Maybe replace self.classes with a dict, and when merging, all values are 0? Or just use a new parent method?
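A rough sketch of the parent-method idea, so each reader would not need its own merge logic. The class name BaseReader and the method label_for are hypothetical and do not appear in this pull request; only classes and merge_classes mirror attributes seen in the diff.

```python
class BaseReader(object):
    """Hypothetical parent class for dataset readers (illustrative only)."""

    def __init__(self, classes, merge_classes=False):
        self.classes = list(classes)
        self.merge_classes = merge_classes

    def label_for(self, class_name):
        """Map a class name to its integer label (hypothetical helper)."""
        if self.merge_classes:
            # When merging, every class collapses into label 0.
            return 0
        return self.classes.index(class_name)


merged = BaseReader(['car', 'bus', 'truck'], merge_classes=True)
plain = BaseReader(['car', 'bus', 'truck'])
print(merged.label_for('truck'), plain.label_for('truck'))
```

With something like this in the parent, the openimages and csv readers would only call label_for instead of indexing self.classes directly.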
Now, when classes are merged, the class.json file, which is used for labeling predicted videos/pics, will hold a single empty label. This fixes a problem in which all predictions generated from a dataset that had merged classes would be labeled with the label of the first class. E.g., if you generated a dataset based on coco that filtered out all classes except car, bus, and truck, and that also merged these classes into a single class, the predictions generated by predict.py or the web server would be labeled as car, when in fact they could have been a truck or a bus too.
The idea behind completely removing the label instead of picking a new one is that a model trained to predict a single type of object would convey this information more globally instead of on a per-object basis, for example in the name of the .jpg or .mpg file it creates. Still, there are probably use cases in which it would be useful to let the user pick the label. This could be added in the future after choosing a good console argument name for this parameter, and seeing if we could somehow merge it with the --merge argument while also maintaining the ability to have the label be empty.
@@ -67,16 +68,18 @@ def transform(dataset_reader, data_dir, output_dir, splits, only_classes,
# All splits must have a consistent set of classes.
classes = None

merge_classes = merge_classes in ('True', 'true', 'TRUE')
Use click types instead.
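A minimal sketch of what this could look like, assuming the option keeps taking a value on the command line. click.BOOL is click's built-in boolean type; the body of transform below is a stand-in for illustration, not the real function from this pull request.

```python
import click
from click.testing import CliRunner


@click.command()
@click.option('--only-classes', multiple=True, help='Whitelist of classes.')
@click.option('--merge-classes', type=click.BOOL, default=False,
              help='Merge all classes into a single class.')
def transform(only_classes, merge_classes):
    # merge_classes arrives as a real bool, so the manual comparison
    # against 'True', 'true' and 'TRUE' is no longer needed.
    click.echo('merge_classes={!r} only_classes={!r}'.format(
        merge_classes, list(only_classes)))


if __name__ == '__main__':
    # Exercise the command in-process instead of parsing sys.argv.
    runner = CliRunner()
    print(runner.invoke(transform, ['--merge-classes', 'true']).output)
```

If the option should not take a value at all, is_flag=True would be the other common way to declare it.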
@@ -30,16 +30,17 @@ def get_output_subfolder(only_classes, only_images, limit_examples,
@click.option('--data-dir', help='Where to locate the original data.')
@click.option('--output-dir', help='Where to save the transformed data.')
@click.option('splits', '--split', required=True, multiple=True, help='Which splits to transform.') # noqa
@click.option('--only-classes', help='Whitelist of classes.')
@click.option('--only-classes', multiple=True, help='Whitelist of classes.')
@click.option('--merge-classes', help='Merge all classes into a single class')
Can't we name it --single-class or --merge-all to make it more explicit that only a single class is created?
json.dump(self._reader.classes, tf.gfile.GFile(classes_file, 'w'))
if self._reader.merge_classes:
    # Don't assign a name to the class if its a merge of several others
    json.dump([''], tf.gfile.GFile(classes_file, 'w'))
Have an option to set the class name before merging the pull request.
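One possible shape for this, as a sketch only. The function classes_payload and its merged_name parameter are hypothetical and not part of the pull request; the default keeps the empty label that the diff currently writes.

```python
import json


def classes_payload(classes, merge_classes, merged_name=''):
    """Return the list to store in the classes file.

    merged_name is a hypothetical parameter, not part of the pull request.
    """
    if merge_classes:
        # One merged class; its label stays empty unless the user names it.
        return [merged_name]
    return list(classes)


coco_subset = ['car', 'bus', 'truck']
print(json.dumps(classes_payload(coco_subset, True)))
print(json.dumps(classes_payload(coco_subset, True, 'vehicle')))
print(json.dumps(classes_payload(coco_subset, False)))
```

Keeping the empty string as the default would preserve the current behaviour for users who do not pass a name.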
Adds the ability to merge classes with the --merge-classes option. Very useful when used with --only-classes to create datasets for detecting certain types of objects. For example, a dataset for detecting 4-wheeled vehicles that merges car, bus and truck from the coco dataset.
Could also be useful without the --only-classes option to train the network to behave as a sort of more discriminative RPN.