
Performance on CUHK03 #14

Edelbert opened this issue Dec 6, 2017 · 10 comments

@Edelbert

Edelbert commented Dec 6, 2017

Hello, authors.
I was wondering if you could provide some extra details about training on CUHK03. There is a third-party re-implementation of your work. This implementation shows almost the same performance on Market-1501, and according to their benchmarks they did not use test-time data augmentation. However, your performance on CUHK03 is quite far from theirs. Why? Can test-time data augmentation influence the final result that much? By the way, did you use only one GPU for training?

@lucasb-eyer
Member

  1. Since they compute a mAP score on CUHK03, I'm assuming they use the "new" evaluation strategy introduced by Liang Zheng. We used the "old" one described in the original CUHK03 paper, because that's what most papers use. You cannot meaningfully compare scores across the two.
  2. Yes, augmentation can make a big difference, especially on smaller datasets (see the sketch below).
  3. Only one GPU.
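
To make point 2 concrete, the kind of test-time augmentation meant here is: embed several flipped/cropped views of each test image and average the resulting embeddings. A minimal sketch, where the `embed_fn` callable and the 90% crop size are placeholder assumptions and not code from this repository:

```python
import numpy as np

def embed_with_tta(embed_fn, image):
    """Average embeddings over horizontal flips and five crops.

    `embed_fn` is a placeholder for whatever maps an (H, W, 3) image
    to an embedding vector (resizing internally as needed).
    """
    h, w, _ = image.shape
    ch, cw = int(0.9 * h), int(0.9 * w)  # crop size; 90% is an assumption
    crops = [
        image[:ch, :cw], image[:ch, -cw:],    # top-left, top-right
        image[-ch:, :cw], image[-ch:, -cw:],  # bottom-left, bottom-right
        image[(h - ch) // 2:(h + ch) // 2, (w - cw) // 2:(w + cw) // 2],
    ]
    views = [v for c in crops for v in (c, c[:, ::-1])]  # add horizontal flips
    return np.mean([embed_fn(v) for v in views], axis=0)
```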

@Pandoro
Member

Pandoro commented Dec 6, 2017

Regarding the scores from the third-party re-implementation, I quickly skimmed their code and they do actually load the original splits and have the option to train on each of them. However, it is a little unclear whether the scores in their benchmark were obtained by actually doing the 20 trainings and reporting the average. If the results come from a single split, that would explain why the scores differ, since performance varies quite a bit across the different splits.

It's important not to look at their CMC allshot evaluation. That is not the typical CUHK03 evaluation protocol and thus not comparable to numbers you find in the literature. When comparing their CUHK03 result (85.4) with ours (89.6/87.6), I think slightly different implementations and test-time augmentation can explain the difference.

For our CUHK03 experiments we combined the training and validation sets (hence we don't need the validation set) and used the same hyperparameters as for our Market-1501 and MARS training. The only thing we changed was the input size: we used 256x96 instead of 256x128 to better match the aspect ratio of the original CUHK03 crops. The averaging over splits is sketched below.
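
For clarity, the reporting in the original protocol boils down to something like the following sketch. `train_and_evaluate` is a hypothetical stand-in for a full training plus single-shot evaluation run on one split; it is not a function from this repository.

```python
import numpy as np

def train_and_evaluate(split_index):
    """Hypothetical stand-in: train on one of the 20 original CUHK03
    splits and return the rank-1 score on that split's test set."""
    return 0.0  # placeholder; a real run trains the full model here

# The original protocol reports the mean over all 20 splits,
# never the score of a single split.
rank1 = [train_and_evaluate(i) for i in range(20)]
print("rank-1: %.1f +/- %.1f" % (np.mean(rank1), np.std(rank1)))
```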

@nihao88

nihao88 commented Feb 3, 2018

Dear authors,
as I see from the discussion, for CUHK03 you use the same training procedure with almost the same parameters as for Market-1501... Am I correct? The thing is, when I use the code as-is for CUHK03 training, I get 30% CMC rank-1.
For testing I use 100 persons from the first camera and 100 persons from the second camera. Could I ask you to reveal the CUHK03 testing procedure that you use? Thank you.

@lucasb-eyer
Member

Almost the same parameters, yes. The main difference is "H = 256, W = 96". As for the testing procedure, we follow the "original" 20-split one, which is detailed in the original CUHK03 paper; a rough sketch of the single-shot evaluation is below.

We have never gotten anything nearly as low as 30% rank-1; that's a very bad score, indicating that you're doing something very wrong or have a bug hidden somewhere. The most frequent mistake we see is people forgetting to load the pre-trained ImageNet weights.
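
Roughly, the per-trial single-shot CMC computation looks like the following sketch. The names `dist`, `query_pids`, and `gallery_pids` are illustrative (NumPy arrays, not code from this repository), and the random gallery sampling plus the averaging over the 20 splits happen outside of this function:

```python
import numpy as np

def single_shot_cmc(dist, query_pids, gallery_pids, top_k=5):
    """CMC curve when every gallery identity has exactly one image.

    dist:         (num_query, num_gallery) distance matrix
    query_pids:   (num_query,) identity labels of the queries
    gallery_pids: (num_gallery,) identity labels of the gallery
    """
    cmc = np.zeros(top_k)
    for i in range(len(query_pids)):
        order = np.argsort(dist[i])
        # Position of the single correct match in the ranking.
        rank = int(np.flatnonzero(gallery_pids[order] == query_pids[i])[0])
        if rank < top_k:
            cmc[rank:] += 1
    return cmc / len(query_pids)
```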

@nihao88

nihao88 commented Feb 3, 2018

Thank you for the quick response. I will double-check and come back with details.

@liangbh6

liangbh6 commented Apr 2, 2018

Actually, I am interested in the mAP score of TriNet on CUHK03.

@lucasb-eyer
Member

I don't fully understand what you mean, @liangbh6. In case you didn't notice, we have included CUHK03 scores in the latest arXiv version of the paper.

@liangbh6

liangbh6 commented Apr 4, 2018

I have found the rank-1 and rank-5 scores on CUHK03 in the latest arXiv version of the paper. But mAP is a different measure from those.

@Pandoro
Member

Pandoro commented Apr 4, 2018

@liangbh6 aaah! In fact, both @lucasb-eyer and I were a bit confused by your comment, since we do provide CUHK03 results, but only now do I realize that we do not provide the mAP score. This is simply because the mAP score is not meaningful on the CUHK03 dataset: you can only retrieve a single ground-truth match, so mAP collapses into the reciprocal rank (see the sketch below).
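
To make the single-match point concrete: with exactly one relevant gallery item, average precision reduces to the reciprocal of the rank at which that item appears, so mAP degenerates into mean reciprocal rank and adds little over the CMC curve. A minimal sketch (illustrative code, not from our evaluation scripts):

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP for one query given a binary relevance vector in ranked order."""
    relevant = np.flatnonzero(ranked_relevance)
    if len(relevant) == 0:
        return 0.0
    precisions = [(k + 1) / (r + 1) for k, r in enumerate(relevant)]
    return float(np.mean(precisions))

# With a single ground-truth match at rank 3, AP is just 1/3:
print(average_precision([0, 0, 1, 0]))  # ~0.333
```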

Some of the more recent papers have stopped using the original evaluation protocol and instead created a single new train-test split, for which mAP seems to make more sense. It should be noted, though, that these scores are not comparable, and you should always pay attention to the evaluation protocol when looking at CUHK03 scores in a paper. To be honest, even within the original evaluation protocol there are some ambiguities, and many papers seem to evaluate in slightly different ways; I have always wondered how comparable the scores are at all. The new split might actually fix this to some extent.

@liangbh6

Well, thanks for your explanation!
