
[Feature request] Quantisation of floating point arguments in MaxLIPO+TR optimiser #1608

Open
MattWenham opened this issue Jan 3, 2019 · 9 comments


@MattWenham

MattWenham commented Jan 3, 2019

I am writing an interface between XGBoost and the MaxLIPO+TR optimiser in dlib, using the latter to tune the hyperparameters of the former. I have found a situation where it would be useful to have quantisation of the floating point arguments being optimised, or alternatively to be able to switch on the Trust Region (TR) part of the optimiser for integer arguments. I will attempt to explain below.

Some hyperparameters of XGBoost, such as subsample and colsample_by*, represent the proportion of the input data to be (in these cases) subsampled. They therefore effectively take discrete, quantised values between 0 and 1, quantised to 1/N where N is the extent of the input data in the appropriate dimension.
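To make the quantisation concrete, here is a minimal sketch (a hypothetical helper, not part of dlib or XGBoost) of snapping a proportion such as subsample to the nearest realisable multiple of 1/N:

```python
def quantise_proportion(p, n_rows):
    """Snap a proportion in (0, 1] to the nearest multiple of 1/n_rows,
    since subsampling n_rows rows can only realise these values."""
    k = max(1, round(p * n_rows))  # subsample at least one row
    return k / n_rows

# With 1000 rows, any subsample value within half a quantum (1/2000)
# of a grid point maps to the same effective proportion:
print(quantise_proportion(0.84151, 1000))
```

Any two parameter values that quantise to the same grid point produce identical training behaviour, which is the source of the wasted function calls described below.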

When attempting to optimise using these parameters, I have two choices with the MaxLIPO+TR optimiser in its current configuration:

  1. Treat these arguments as true floating point, or
  2. Treat these arguments as integers, and include a divisor to quantise the values actually sent to XGBoost to appropriate values.

Here's what appears to happen in the two cases:

  1. TR is turned on for these arguments (✔ VG), but the TR part of the optimiser eventually wastes function calls to XGBoost by making tiny changes to these arguments, smaller than the quantisation size, which produce exactly the same function output because I specify a seed value to XGBoost for reproducibility. In the cases I have looked at so far, the correct 'valley' is apparently found relatively quickly, but TR then makes tiny changes to no avail in an attempt to refine an unrefinable solution.

  2. TR is turned off for these arguments as they are integers. MaxLIPO still does a good job, but for a set number of function calls, the optimisation is not as good as in 1.
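A minimal sketch of option 2, under assumptions (N and the objective are placeholders; the real objective would train XGBoost and return a validation loss):

```python
N = 1000  # assumed number of training rows

def objective(subsample_k, learning_rate):
    """Objective as the optimiser sees it: subsample_k is an integer in
    [1, N], and the division recovers the proportion passed to XGBoost.
    The XGBoost call is stubbed out with a toy quadratic score."""
    subsample = subsample_k / N
    # ... here you would train XGBoost with `subsample` and return the
    # cross-validation loss; stubbed for illustration:
    return (subsample - 0.8) ** 2 + (learning_rate - 0.1) ** 2

# Because subsample_k is declared integer, the solver handles it with
# MaxLIPO only (no TR), which is exactly the trade-off described above.
```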

Ideally, I would like to be able to combine the two methods, i.e. use TR together with quantisation, provided the quantisation is sufficiently fine, as my suspicion is that the lack of TR in case 2 is what makes its performance worse than case 1.

My C++ is not up to attempting to implement this, and I have no idea how much work it would be. I would be grateful if you could consider this as an enhancement, and if you would like to see example output to support my case, I will be happy to provide some.

@davisking
Owner

Yes, I think this is a good idea. To make sure I understand: upgrading the interface to allow a per-parameter epsilon would do the trick, right? Right now there is an overall epsilon parameter which can be set; once the solver finds a local solution to that accuracy, it stops running the TR part and does only MaxLIPO in an attempt to find a better place to run TR. This could be upgraded to allow setting different epsilon values for each parameter. Does that sound like it addresses this issue?

@MattWenham
Author

This is not about quantisation of the solution, but quantisation of the parameters. If what you are proposing is a lower bound on the amount by which each individual parameter should be refined, then yes, that would work. The current epsilon could also remain as an and/or option.

@davisking
Owner

Yes, I mean an epsilon on the parameter values. The behavior would be to disable the TR solver when all the parameter changes the TR solver wants to make are below their respective epsilon values.

Right, the current epsilon on the objective value would be kept as a feature.
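A minimal sketch of the stopping rule being discussed (assumed semantics, not dlib's actual code): the TR step is considered converged only when every component of the proposed change is below that parameter's own epsilon:

```python
def tr_should_stop(proposed_step, per_param_eps):
    """Return True when every parameter change the TR solver proposes is
    smaller than that parameter's own epsilon (assumed semantics)."""
    return all(abs(d) < eps for d, eps in zip(proposed_step, per_param_eps))

# subsample quantised to 1/1000, learning_rate continuous:
eps = [1e-3, 1e-6]
print(tr_should_stop([5e-4, 2e-3], eps))  # subsample change is sub-quantum,
                                          # but learning_rate still moving -> False
print(tr_should_stop([5e-4, 5e-7], eps))  # all below threshold -> True
```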

@MattWenham
Author

MattWenham commented Jan 3, 2019

That sounds to me like it would work as I'd need it. Presumably, the refinement of each parameter by TR would be turned off individually - as it is in the case of integers - as each parameter change falls below its given threshold?

I have also noticed that in a very restricted integer-only search space, the solver 'repeats itself', i.e. it revisits combinations of parameters it has already visited rather than terminating early or trying as-yet unvisited combinations. This is a bit of an edge case, and is effectively a fallback to a grid search, but I can open another issue for it if you'd like me to.

@davisking
Owner

davisking commented Jan 4, 2019

I think turning them off as a group is fine. That is, letting TR run on them all, even when some of the parameters are below their epsilon, should not cause any issues. It also mitigates users giving bad epsilon values, i.e. values that are accidentally too large for one parameter. As another case, there may be an epsilon below which a parameter is "good enough" even though the objective is still continuous below it; if the TR model is going to keep running because other parameters are still too large, it might as well improve the "good enough" parameters as well.

Optimization of the LIPO surrogate function is done by randomly sampling 5000 points and picking the max. So it's possible that it fails to find a new point to sample when using MaxLIPO. This should be pretty unlikely to occur. But maybe it should just terminate. Does it happen in cases when there really are more unvisited combinations? I would expect that to be rare. This is not to say that this behavior shouldn't be improved though.
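The sampling step described can be sketched as follows (a toy stand-in for the surrogate; dlib's actual upper-bound model is more involved):

```python
import random

def maximise_by_sampling(surrogate, lower, upper, n_samples=5000, seed=0):
    """Pick the best of n_samples uniform random points, mirroring the
    'sample 5000 points and take the max' strategy described above."""
    rng = random.Random(seed)
    best_x, best_y = None, float("-inf")
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        y = surrogate(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y

# Toy surrogate with its maximum at (0.8, 0.1):
x, y = maximise_by_sampling(
    lambda p: -((p[0] - 0.8) ** 2 + (p[1] - 0.1) ** 2),
    lower=[0.0, 0.0], upper=[1.0, 1.0])
```

In a small, integer-only search space the best sampled point can coincide with one already evaluated, which would produce the repeated visits reported above.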

@MattWenham
Author

Surrage? Typo, or am I insufficient of vocabulary? 😊

I have seen both behaviours in an integer-only search space, i.e. revisiting previously visited sets of parameters whether or not all combinations have already been visited.

@davisking
Owner

Oops, I mean surrogate :)

Ok, both should be possible. But just to make sure it's not something surprising, can you say how many parameters were being optimized and what their ranges were?

@MattWenham
Author

One case was a simple test: two integer parameters with four possible values each, and 100 function evaluations. I was surprised to see that all 100 were used and there was no fallback to grid search.

The other had on the order of 4×4×250×15 possible combinations of four integer parameters. I can try to reproduce it if you need me to.

@davisking
Owner

Na, that all is as expected. Thanks for reporting this. I'll update the solver at some point in the coming weeks when I get some time.
