
[Feature request] Quantisation of floating point arguments in MaxLIPO+TR optimiser #1608

Open
MattWenham opened this issue Jan 3, 2019 · 9 comments


@MattWenham

MattWenham commented Jan 3, 2019

I am writing an interface between XGBoost and the MaxLIPO+TR optimiser in dlib, using the latter to tune the hyperparameters of the former. I have found a situation where it would be useful to have quantisation of the floating point arguments being optimised, or alternatively to be able to switch on the Trust Region (TR) part of the optimiser for integer arguments. I will attempt to explain below.

Some hyperparameters of XGBoost, such as subsample and colsample_by*, represent the proportion of the input data to be (in these cases) subsampled. They therefore effectively take discrete, quantised values between 0 and 1, quantised to 1/N where N is the extent of the input data in the appropriate dimension.
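To make the quantisation concrete, here is a minimal sketch (a hypothetical helper, not part of dlib or XGBoost) of snapping a proportion such as subsample to the nearest realisable multiple of 1/N:

```python
def quantise_proportion(p, n_rows):
    """Snap a proportion in (0, 1] to the nearest multiple of 1/n_rows,
    since subsampling n_rows rows can only realise these values."""
    k = max(1, round(p * n_rows))  # subsample at least one row
    return k / n_rows

# With 1000 rows, any subsample value within half a quantum (1/2000)
# of a grid point maps to the same effective proportion:
print(quantise_proportion(0.84151, 1000))
```

Any two parameter values that quantise to the same grid point produce identical training behaviour, which is the source of the wasted function calls described below.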

When attempting to optimise using these parameters, I have two choices with the MaxLIPO+TR optimiser in its current configuration:

  1. Treat these arguments as true floating point, or
  2. Treat these arguments as integers, and include a divisor to quantise the values actually sent to XGBoost to appropriate values.

Here's what appears to happen in the two cases:

  1. TR is turned on for these arguments (✔ VG), but the TR part of the optimiser eventually wastes function calls to XGBoost by making tiny changes to these arguments, smaller than the quantisation size, which produce exactly the same function output because I specify a seed value to XGBoost for reproducibility. In the cases I have looked at so far, the correct 'valley' is apparently found relatively quickly, but TR then makes tiny changes to no avail in an attempt to refine an unrefinable solution.

  2. TR is turned off for these arguments as they are integers. MaxLIPO still does a good job, but for a set number of function calls, the optimisation is not as good as in 1.
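A minimal sketch of option 2, under assumptions (N and the objective are placeholders; the real objective would train XGBoost and return a validation loss):

```python
N = 1000  # assumed number of training rows

def objective(subsample_k, learning_rate):
    """Objective as the optimiser sees it: subsample_k is an integer in
    [1, N], and the division recovers the proportion passed to XGBoost.
    The XGBoost call is stubbed out with a toy quadratic score."""
    subsample = subsample_k / N
    # ... here you would train XGBoost with `subsample` and return the
    # cross-validation loss; stubbed for illustration:
    return (subsample - 0.8) ** 2 + (learning_rate - 0.1) ** 2

# Because subsample_k is declared integer, the solver handles it with
# MaxLIPO only (no TR), which is exactly the trade-off described above.
```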

Ideally, I would like to be able to combine the two methods, i.e. use TR together with quantisation, provided the quantisation is sufficiently fine, as my suspicion is that the lack of TR in case 2 is what makes its performance worse than case 1.

My C++ is not up to attempting to implement this, and I have no idea how much work it would be. I would be grateful if you could consider this as an enhancement, and if you would like to see example output to support my case, I will be happy to provide some.

@davisking
Owner

Yes, I think this is a good idea. To make sure I understand: upgrading the interface to allow a per-parameter epsilon would do the trick, right? Right now there is an overall epsilon parameter which can be set; once the solver finds a local solution to that accuracy, it stops running the TR part and does only MaxLIPO in an attempt to find a better place to run TR. This could be upgraded to allow setting different epsilon values for each parameter. Does that sound like it addresses this issue?

@MattWenham
Author

This is not about quantisation of the solution, but quantisation of the parameters. If what you are proposing is a lower bound on the amount by which each individual parameter should be refined, then yes, that would work. The current epsilon could also remain as an and/or option.

@davisking
Owner

Yes, I mean an epsilon on the parameter values. The behavior would be to disable the TR solver when all the parameter changes the TR solver wants to make are below their respective epsilon values.

Right, the current epsilon on the objective value would be kept as a feature.
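A minimal sketch of the stopping rule being discussed (assumed semantics, not dlib's actual code): the TR step is considered converged only when every component of the proposed change is below that parameter's own epsilon:

```python
def tr_should_stop(proposed_step, per_param_eps):
    """Return True when every parameter change the TR solver proposes is
    smaller than that parameter's own epsilon (assumed semantics)."""
    return all(abs(d) < eps for d, eps in zip(proposed_step, per_param_eps))

# subsample quantised to 1/1000, learning_rate continuous:
eps = [1e-3, 1e-6]
print(tr_should_stop([5e-4, 2e-3], eps))  # subsample change is sub-quantum,
                                          # but learning_rate still moving -> False
print(tr_should_stop([5e-4, 5e-7], eps))  # all below threshold -> True
```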

@MattWenham
Author

MattWenham commented Jan 3, 2019

That sounds to me like it would work as I'd need it. Presumably, the refinement of each parameter by TR would be turned off individually - as it is in the case of integers - as each parameter change falls below its given threshold?

I have also noticed that in a very restricted integer-only search space, the solver 'repeats itself', i.e. it revisits combinations of parameters it has already visited rather than terminating early or trying as-yet unvisited combinations. This is a bit of an edge case, and is effectively a fallback to a grid search, but I can open another issue for it if you'd like me to.

@davisking
Owner

davisking commented Jan 4, 2019

I think turning them off as a group is fine. That is, letting TR run on them all, even when some of the parameters are below their epsilon, should not cause any issues. It also mitigates users giving bad epsilon values, i.e. values that are accidentally too large for one parameter. As another case, there may be an epsilon below which a parameter is "good enough" even though the objective is still continuous below it; if the TR model is going to keep running because other parameters are still too large, it might as well improve the "good enough" parameters as well.

Optimization of the LIPO surrogate function is done by randomly sampling 5000 points and picking the max. So it's possible that it fails to find a new point to sample when using MaxLIPO. This should be pretty unlikely to occur. But maybe it should just terminate. Does it happen in cases when there really are more unvisited combinations? I would expect that to be rare. This is not to say that this behavior shouldn't be improved though.
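The sampling step described can be sketched as follows (a toy stand-in for the surrogate; dlib's actual upper-bound model is more involved):

```python
import random

def maximise_by_sampling(surrogate, lower, upper, n_samples=5000, seed=0):
    """Pick the best of n_samples uniform random points, mirroring the
    'sample 5000 points and take the max' strategy described above."""
    rng = random.Random(seed)
    best_x, best_y = None, float("-inf")
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        y = surrogate(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y

# Toy surrogate with its maximum at (0.8, 0.1):
x, y = maximise_by_sampling(
    lambda p: -((p[0] - 0.8) ** 2 + (p[1] - 0.1) ** 2),
    lower=[0.0, 0.0], upper=[1.0, 1.0])
```

In a small, integer-only search space the best sampled point can coincide with one already evaluated, which would produce the repeated visits reported above.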

@MattWenham
Author

Surrage? Typo, or am I insufficient of vocabulary? 😊

I have seen both behaviours in an integer-only search space, i.e. revisiting previously visited sets of parameters whether or not all combinations have already been visited.

@davisking
Owner

Oops, I mean surrogate :)

Ok, both should be possible. But just to make sure it's not something surprising, can you say how many parameters were being optimized and what their ranges were?

@MattWenham
Author

One case was a simple test: two integer parameters with four possible values each, and 100 function evaluations. I was surprised to see that all 100 were used and there was no fallback to grid search.

The other had on the order of 4×4×250×15 possible combinations of four integer parameters. I can try to reproduce it if you need me to.

@davisking
Owner

Na, that all is as expected. Thanks for reporting this. I'll update the solver at some point in the coming weeks when I get some time.
