[hlopt] cache based on opcode array #11797
base: development
Conversation
It's a curious approach to avoid looking at every opcode by looking at every opcode... Is the goal here to map unoptimized code to optimized code? In that case I wonder why you even need to identify the function; in theory multiple functions could have the same code, which then ends up as the same optimized code.
Yes, the goal is to map unoptimized code to optimized code. You're right, multiple functions could have the same code. That would require a more complex data structure for the cache and I'm not sure if it's a true gain; let me check.
There might be a world where this is implemented as a [...]. I didn't exactly think this through, though.
Where f has more unused regs at the end but the code is the same.
I tried several implementations; the performance is very hard to measure, so I also tried to explain my impressions. I think that using a pure code-based cache can only reduce the memory footprint and can potentially have lower timing overhead for the first compilation, especially for very small functions (about 40% of functions share the "same code", mostly < 10 opcodes I think); it's slower for subsequent runs' lookups due to the creation of a new array.
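For illustration, here is a minimal, hypothetical sketch contrasting the two keying strategies (these are not the PR's actual data structures; all names are invented, and plain int arrays stand in for opcode arrays). A path-keyed cache keeps one entry per function and validates it by comparing the stored unoptimized code; a pure code-keyed cache uses the unoptimized body itself as the key, so identical bodies collapse into one entry at the cost of copying the array when storing it.

```ocaml
(* Path-keyed: one entry per function, validated by comparing the stored
   unoptimized code before reusing the optimized result. *)
let path_cache : (string, int array * int array) Hashtbl.t = Hashtbl.create 256

(* Code-keyed: the unoptimized body itself is the key, so functions with
   identical code collapse into one entry. *)
let code_cache : (int array, int array) Hashtbl.t = Hashtbl.create 256

let lookup_by_path path code =
  match Hashtbl.find_opt path_cache path with
  | Some (old_code, optimized) when old_code = code -> Some optimized
  | _ -> None

let store_by_path path code optimized =
  (* copy the code so later in-place edits by the optimizer don't corrupt the entry *)
  Hashtbl.replace path_cache path (Array.copy code, optimized)

let lookup_by_code code = Hashtbl.find_opt code_cache code

let store_by_code code optimized =
  (* the key itself must be a copy here, which is the extra first-run cost *)
  Hashtbl.replace code_cache (Array.copy code) optimized
```

This also shows where the extra first-run cost of a pure code-based cache would come from: the key itself is a full array copy.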
```
@@ -1051,121 +1051,24 @@ let _optimize (f:fundecl) =
let same_op op1 op2 =
  match op1, op2 with
  | OMov (a1,b1), OMov (a2, b2) -> a1 = a2 && b1 = b2
  | OInt (r1,_), OInt (r2, _) -> r1 = r2
```
Looking at this and other lines here, I don't understand why OInt(1, 1) and OInt(1, 2) should be considered the same op.
It's OInt reg * int index; what only matters in optimized code is the used register / control flow, and not the index in the global int table (e.g. in two different runs, the same int 55 can have index 10 or 35). The indexes are "fixed" by code related to c_remap_indexes.
I should double check that the replacement is always correct; I'm also trying to remap field indexes but there are some errors x(
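A self-contained sketch of the comparison being discussed, assuming a simplified opcode type (not the real one from hlcode.ml) and using the same_op_except_index name suggested further down: only registers and control-flow operands are compared, so two OInt ops writing to the same register compare equal even when their int-table indexes differ, with the indexes remapped after the cached result is reused.

```ocaml
type reg = int

type opcode =
  | OMov of reg * reg
  | OInt of reg * int      (* destination register, index into the int table *)
  | OJAlways of int        (* a control-flow op: the offset must match exactly *)

(* Compare two opcodes while ignoring constant-table indexes: only registers
   and control flow influence the optimization result, so OInt (1, 1) and
   OInt (1, 2) count as "the same op" for caching purposes. *)
let same_op_except_index op1 op2 =
  match op1, op2 with
  | OMov (a1, b1), OMov (a2, b2) -> a1 = a2 && b1 = b2
  | OInt (r1, _), OInt (r2, _) -> r1 = r2
  | OJAlways d1, OJAlways d2 -> d1 = d2
  | _ -> false

(* Whole-array comparison used to decide whether a cache entry is still valid. *)
let same_code c1 c2 =
  Array.length c1 = Array.length c2
  && (let ok = ref true in
      Array.iteri (fun i op -> if not (same_op_except_index op c2.(i)) then ok := false) c1;
      !ok)
```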
same_op is confusing. I should probably rename the function but I don't have a good idea. Maybe same_op_except_index.
Unused args' regs should remain.
The code can be sent as the optimize result as-is if there is no reg map and no nop operations. Make a copy of the code if the cache entry is used in this run.
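One possible reading of that rule, as a hedged sketch with invented names (not the PR's actual types): the cached array can be handed back directly only when the optimizer neither remapped registers nor introduced nops and the entry hasn't already been handed out in this run; otherwise a copy is made so per-run patching doesn't mutate the shared entry.

```ocaml
type 'op cache_entry = {
  opt_code : 'op array;          (* cached optimized opcodes *)
  reg_map : int array option;    (* None = registers were not remapped *)
  has_nops : bool;               (* optimization introduced nop padding *)
  mutable used_this_run : bool;
}

let take_cached_code entry =
  if entry.reg_map = None && not entry.has_nops && not entry.used_this_run then begin
    (* nothing run-specific to patch and nobody else holds it yet: hand it back as-is *)
    entry.used_this_run <- true;
    entry.opt_code
  end else
    (* the entry is shared or still needs per-run patching: work on a copy *)
    Array.copy entry.opt_code
```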
I think now I need to bind the hl_no_opt_cache option to server mode (e.g. only activate the cache if [...]).
Related to #8082
Use the function path - or the function position if the path is empty - as the cache key, then compare the whole opcode array (except constant/function indexes) to ensure the cache entry is still valid before replacing the code with the cached result.
The performance is not easy to measure, but I did some tests on different game projects.
For a game taking around 6s in generate-hl (2-2.5s measured for hlopt), this adds about 1s of overhead on the first run (not all of it is measured inside hlopt, but it is likely related to the Array.copy on the opcode array), and it reduces subsequent runs to 3.5-4.5s (a gain of about 2s compared to no cache). Currently the cache can be disabled with -D hl_no_opt_cache. I would like to only enable the cache when running with a compilation server, but I didn't find out how to do it.
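For readers skimming the description, here is a hypothetical outline of that lookup flow (invented names; the actual change lives in hlopt.ml). The opcode comparison and the optimizer are passed in as functions so the sketch stays independent of the real opcode type; cache_enabled would be false when -D hl_no_opt_cache is set.

```ocaml
type 'op entry = { src : 'op array; opt : 'op array }

(* function path, or its position when the path is empty, as the cache key *)
let cache_key path pos = if path <> "" then path else Printf.sprintf "@%d" pos

let optimize_with_cache ~cache_enabled ~same_code ~optimize cache key code =
  if not cache_enabled then optimize code            (* -D hl_no_opt_cache: bypass entirely *)
  else
    match Hashtbl.find_opt cache key with
    | Some e when same_code e.src code ->
        e.opt                                        (* valid hit: reuse the cached result *)
    | _ ->
        let opt = optimize code in
        (* store a copy of the unoptimized code so later in-place edits can't invalidate the entry *)
        Hashtbl.replace cache key { src = Array.copy code; opt };
        opt
```

Whether the compiler is running under a compilation server would then only determine the value of cache_enabled.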