Ideas on how to improve compile-time #987
Thanks for bringing up ideas for a way forward on reducing compile times, it's very much appreciated and I think it's good to start this sooner rather than later.

Monomorphization

First of all, I agree that in most cases monomorphization can be considered a convenience rather than a necessity (i.e. performance). Thus I believe that unless a function or closure is going to be called millions of times per second, there is no need to monomorphize it. Most methods and functions provided here fulfil this criterion. This affects most places where callbacks and the about…
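For illustration, this is a minimal sketch of the de-monomorphization pattern being discussed (the function name is invented, not actual gitoxide API): a thin generic shim converts its argument and forwards to a non-generic inner function, so callers keep the convenience while the real body is compiled only once.

```rust
use std::path::PathBuf;

// De-monomorphization via an inner function: only the tiny conversion
// shim is duplicated per caller type, while the body of `inner` is
// monomorphized exactly once. `prepare` is a hypothetical example.
pub fn prepare(path: impl Into<PathBuf>) -> PathBuf {
    fn inner(path: PathBuf) -> PathBuf {
        // Imagine the large, expensive-to-compile body here.
        path
    }
    inner(path.into())
}
```

Callers can still pass `&str`, `String`, or `PathBuf` without any change at the call site.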
Thanks, I've opened Byron/prodash#22
That can also be done easily,
Once… For sharing across threads, I think…
It already does it, but it's using shared ownership. Doing so actually makes it easier to use, as those who don't need ownership over the AtomicUsize which lies underneath can always borrow it to scoped threads if they wish. Changing this is probably more like a rewrite of… I couldn't resist getting an overview for myself of what's possible here, and I think it's a lot. For one, I already started the compartmentalization in…
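As a small illustration of what "borrowing it to scoped threads" could look like (my own sketch with invented names, not prodash's actual API), an `AtomicUsize` counter can stay owned in one place and be lent to worker threads by reference:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// A progress counter owned on the stack; scoped threads only borrow
// it, so no Arc or other shared ownership is needed.
fn count_in_parallel(items: &[u64]) -> usize {
    let progress = AtomicUsize::new(0);
    thread::scope(|s| {
        for chunk in items.chunks(2) {
            let progress = &progress;
            s.spawn(move || {
                for _item in chunk {
                    progress.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    progress.load(Ordering::Relaxed)
}
```

`thread::scope` (stable since Rust 1.63) guarantees the threads finish before `progress` is dropped, which is what makes the plain borrow sound.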
It may look overwhelming, layout is hard 😅. These big crates at the bottom are basically the foundation, they are always pulled in one way or another. Everything prefixed with…
I thought about monomorphization a little more and after consulting with @joshtriplett (thank you 🙏) came to a conclusion that I find exciting enough to share. First of all, what are the costs of monomorphization? For the Rust compiler, at least in the simple way they are used here, there is no significant overhead, so… With this in mind, and knowing that monomorphization is best for performance in this very performance-minded crate ecosystem, and also good for convenience, I arrived at the following conclusions.
Since… I invite everyone involved to let me know their thoughts or point out mistakes or other flaws that ought to be improved. After all, I am treading on new ground here. Thank you 🙏.
@Byron Another piece of advice for speeding up compilation of gitoxide itself: I think it might be a good idea to support a multi-call binary that can do both… This would reduce:
That's a good idea! It would check what its name is and then act differently depending on that. I think from a setup and installation perspective, it's easiest to declare two binaries with the same…
I spiked it and think that doing so only benefits installation if links are used; overall compile time should then be lower, as well as the overall size on disk. However, for CI it would then take longer as it rebuilds…
This increases installation size, but should decrease build time as symbols and optimizations can be shared more. Once deduplication of generics has happened, these wins should increase even more. However, it probably increases compile times for CI, as it will take longer to build each binary with different compile-time flags. Thus I just leave the `uni.rs` file for reference.
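The multi-call idea can be sketched like this (a hypothetical stand-in, not the actual `uni.rs`): the single binary is installed under several names via hard- or symlinks, inspects the name it was invoked as (argv[0]), and selects the matching entrypoint.

```rust
use std::path::Path;

// Hypothetical multi-call dispatch. Returning a &str here stands in
// for invoking the real `gix`/`ein` mains.
fn dispatch(argv0: &str) -> &'static str {
    // Strip directory and extension so "C:\\bin\\ein.exe" and
    // "/usr/local/bin/ein" both resolve to "ein".
    let name = Path::new(argv0)
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("");
    match name {
        "ein" => "ein", // would run the `ein` entrypoint
        _ => "gix",     // default to the `gix` entrypoint
    }
}
```

In `main`, one would call `dispatch(&std::env::args().next().unwrap_or_default())` and hand control to the chosen entrypoint; BusyBox uses the same trick to pack many tools into one binary.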
New idea: I will open a separate PR to de-… I also found several functions using… Originally posted in #992 (comment). P.S. I'm really busy recently, but I would definitely get around to gix-macros.
Maybe it's OK to just save the time, as these functions could also be de-momo'ed by hand. What's much more important is to make…
I thought the same and have done a lot of work in that regard in this PR. The intermediate results are in, and in terms of compile time there is no change at all 😩.

dynified many plumbing crates
main - no change
Only the debug-binary sizes show some promise at least, but everything else is still unchanged. It seems like a lot of work for nothing more than a few scraps :/. I will continue with the… This probably also means that compartmentalizing…
That's exactly my thought. I want to keep momo simple and maintainable and de-momo TryInto by hand, since it is a bit too complex to do in a proc-macro.
I think we should do the benchmark on GHA, since this is the main blocker when creating and updating a PR, especially on Windows and MacOS, which are much slower than the Linux machine and have fewer cores. Not to mention MacOS has security checking turned on by default, which slows down I/O and adds CPU usage; Windows might also have Defender and virus scanning turned on, and its disk I/O is also not as good as Linux's.
This could be useful on GHA, where the maximum size for a single cache is 10G after zstd compression (unless you compress the data yourself before passing it to @actions/cache).
Definitely not true, I believe our effort would add up and positively affect the GHA CI time and local dev.
Yes, it definitely will be the most impactful.
I think even object interaction can be behind feature flags, since many would just hit the checked-out worktree directly.
In the end, this is what matters. However, I think that local build times relate strongly to GHA build times, and a crate taking half the time to compile locally should take half on GHA as well. Not seeing any improvements locally is quite an indicator that nothing great can be expected on GHA either.
Let's wait for GHA - as of now the data I gathered suggests it's not worth it, at least from that point of view. From another point of view, it's valuable as the method signatures seem simpler while avoiding duplication in the final binary.
There are also practical concerns - features interact with each other, and it's simply not feasible to offer something like "want objects, but not refs; this, but not that" - it's going to make me crazy (I have been there 😅). Feature toggles must be well-chosen to optimize for typical use-cases, only the most important of them. Some features are orthogonal though, and these are mix-and-match, but basics like objects just need to be there. Feature toggles are a huge burden, and not a single one will be added without knowing that it has high-enough value.
One important difference to note is that GHA machines are much less powerful than a dev's own laptop, and they are heavily overloaded, so even a tiny improvement that can be ignored in a local build can turn out to be a noticeable win on GHA. GHA's default runner:
The CPUs for Windows & Linux are not only old, but also server CPUs, which have weaker single-core performance in exchange for better scale-up, while the MacOS x86_64 machines are known to have heating issues and were slow until the M1 came along... So I don't think running locally is a good indicator for GHA; the changes could have much more effect on GHA than locally.
If the data from GHA is insignificant, then it's probably insignificant in reality. Though the reduced binary size is a good thing, considering GHA only provides 14G of storage and 10G of cache.
That's true, I didn't consider that.
I apologize for having been so negative - this was an expression of my disappointment, having expected massive wins but getting only marginal ones. Having thought about this more, it's clear now that Rust is actually pretty good and fast when handling typical-use generics. This comes as no surprise; after all, the standard library is full of them. It's also what makes it useful and convenient while delivering optimal performance. With these learnings I am now going through it all again and taking a more differentiated approach. Small methods, especially those with just a single generic type which doesn't have too many possible source types, can stay generic, as the total cost of duplication is small - a price worth paying. However, when the methods get larger, while still wanting the extra convenience, I put in an inner function myself to limit the amount of code generated. Finally… In the end, if nothing else, it sets a nice precedent for the codebase to maintain from now on. The reason I like it in particular is that…
I'm also a bit disappointed here, though optimizing compile time is indeed a hard problem.
I think you can still use momo where it applies, since it's more readable.
👍 though FnOnce is indeed a hard problem that needs rustc to support "move-by-ref" and better Unsize support. For now, we can create a wrapper (NOTE that I wrote this on my iPad; it hasn't been run through miri and is just a proof of concept):
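The proof-of-concept code did not survive the copy here. As a stand-in, this is one common shape such a wrapper can take (my own sketch, not the original snippet): stash the `FnOnce` in an `Option` so it can be driven through an object-safe `&mut dyn FnMut`.

```rust
// Call an FnOnce through a dyn-safe FnMut reference. Taking the
// closure out of the Option enforces at runtime that it is only
// invoked once, which is what makes the FnMut wrapper sound.
fn call_through_dyn<R>(f: impl FnOnce() -> R) -> R {
    let mut slot = Some(f);
    let mut out = None;
    {
        let callable: &mut dyn FnMut() = &mut || {
            let f = slot.take().expect("closure may only be called once");
            out = Some(f());
        };
        // In real code, `callable` could be passed across a dyn-only
        // API boundary before being invoked.
        callable();
    }
    out.expect("closure was invoked exactly once")
}
```

This avoids monomorphizing the consuming side of the API over the closure type, at the cost of one `Option` check per call.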
Yeah, it's unfortunate that generics have a cost, but there's no free lunch (unless rustc implements the parallel frontend or other improvements).
You obviously know a lot about these things, and maybe there are solutions to the problems I couldn't resolve yet. The bigger picture is… An annoying, but probably surmountable problem is the error type. By now I am avoiding generic errors, and instead just box the errors returned by these closures. Unfortunately, these… The next problem, for which I have absolutely no feasible workaround, is that those who use threads to parallelize object access need the… One solution I am contemplating, to at least solve the… But I don't like it very much and will probably just keep it generic for now, for lack of indication that it actually makes a meaningful difference. Next up is to try more granular feature toggles in…
Is it reasonable to have a gix-error crate which has a newtype over Box<dyn Error + Send + Sync> and provides various convenience functions and From impls? We could also add an ErrorKind for gix and a message field for more readable error messages, so that we can avoid creating new error enums using thiserror.
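A minimal sketch of what such a newtype could look like (the shape, names, and variants are invented for illustration; a real `gix-error` would differ):

```rust
use std::error::Error as StdError;
use std::fmt;

// Hypothetical error newtype: a kind for matching, a message for
// display, and a boxed source preserving the original error chain.
#[derive(Debug)]
pub struct Error {
    pub kind: ErrorKind,
    pub message: String,
    pub source: Option<Box<dyn StdError + Send + Sync>>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Io,
    Other,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl StdError for Error {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match &self.source {
            Some(b) => {
                // Coerce away the Send + Sync auto traits for the
                // standard source() signature.
                let src: &(dyn StdError + 'static) = &**b;
                Some(src)
            }
            None => None,
        }
    }
}

// One From impl per common source type replaces per-module thiserror
// enums; `?` then converts automatically at call sites.
impl From<std::io::Error> for Error {
    fn from(e: std::io::Error) -> Self {
        Error {
            kind: ErrorKind::Io,
            message: e.to_string(),
            source: Some(Box::new(e)),
        }
    }
}
```

With this, a function returning `Result<T, Error>` can use `?` on any `io::Error` while callers still match on `ErrorKind` instead of a per-module enum.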
I think it might be possible
And then we can add a new proc-macro to gix-macros:
Though I have to finish momo first, and then I will have time for this one.

Rant on dtolnay's absolutely magic yet fragile…

I was searching on lib.rs and found… I was quite happy, but I decided to read the source code to understand how this magic crate works and found it using a fragile hack that can be broken by rustc at any time. And given that the author is David Tolnay, and all his recent history and my experience with his crates, I now find it hard to continue unconditionally trusting his crates; it's probably better to avoid them unless necessary. I used his crate gh-token in cargo-binstall, and it turns out he didn't carefully review a contributed PR and introduced a bug in binstall (it executes…
I think this is a stroke of genius! But let me explain! Having a… But even with that said, I wouldn't want to fork…
This is always a point of contention, as these are two different and incompatible paradigms. One error to rule them all is easy to use if you want to forget about it, but hard to use if you actually want to check for particular errors, as any error is possible and there is no compile-time protection. But what if one could choose the error style at compile time? Truth be told, I think treating errors like… What if there was a compile-time feature in… And yes, an… And here comes another twist: you would have to be the one to do it 😅 - I really am no good at proc-macros just yet. What do you think?
Would this not cause problems for libraries that depend on gix? My thinking is that if a library does not enable the…
That's a valid point, thanks for bringing it up (I didn't think of it)! The end-user should be the one who chooses, and those in-between libraries would then not be allowed to rely on capabilities of these errors that might not exist in the final application, as there can only be one. The baseline for matching on an error would then have to be that basic one-for-all enum… Thinking further, this probably means that such an error should really be specific to the… But maybe that isn't a problem if it's made clear that the only proper usage is to enable…
My concern with the one-error approach is the same as @niklaswimmer's: it would break code, since cargo unifies features. However, I do get what you are proposing, and I think we can do it in reverse: have an "error-tree" feature which implements a function on gix_errors::Error to obtain the tree of errors:
The node name will be something like "clone.fetch.http". This would avoid all the annoying and repeated use of thiserror just to generate different kinds of errors in a lot of modules. Alternatively, a provider-like function would also be very helpful in traversing a deep tree of errors: rust-lang/rust#96024
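An illustrative sketch of the error-tree idea under the assumptions above (all type and function names are invented): each layer names its node, and a helper walks the standard `source()` chain to produce a dotted path such as "clone.fetch.http".

```rust
use std::error::Error;
use std::fmt;

// One node in the error tree; `source` points at the next layer down.
#[derive(Debug)]
struct Node {
    name: &'static str,
    source: Option<Box<dyn Error + Send + Sync>>,
}

impl fmt::Display for Node {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.name)
    }
}

impl Error for Node {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match &self.source {
            Some(b) => {
                let src: &(dyn Error + 'static) = &**b;
                Some(src)
            }
            None => None,
        }
    }
}

// Walk the source chain, joining each layer's Display output into a
// dotted path that identifies where in the call tree the error arose.
fn error_path(err: &(dyn Error + 'static)) -> String {
    let mut parts = vec![err.to_string()];
    let mut cur = err;
    while let Some(next) = cur.source() {
        parts.push(next.to_string());
        cur = next;
    }
    parts.join(".")
}
```

Since this only relies on `Error::source`, it works for any error type in the chain, not just `Node`.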
❤️ This is even more bold than I originally stated (if I understand it correctly), as it would be much like… I invited you to… Lastly, I think that using… Thinking about it, this probably means the… I think technically all this can work, but it also sounds like a lot of it 😅. Can this be done, or is it a pipe-dream? Because one thing is for sure - I wouldn't be the one to do it. PS: I am happy to transfer full ownership over…
Thanks @Byron, I have accepted the invitation, though I have to finish the gix-macros PR first. We would have to try this before knowing whether it would work. Though I do think that making ever more new enums is not sustainable for a large codebase; we hit the same limitation in cargo-binstall: https://github.com/cargo-bins/cargo-binstall/blob/b36674a1c4f14211e70998bde91b23e7d7f1d15d/crates/binstalk/src/errors.rs#L56 We use miette and thiserror for errors with help texts, and it's extra hard to remove them due to the convenience of the proc-macros. Maintaining such a large enum isn't easy, since some error types are so large that they require boxing, and we have to manually implement From impls for these types. If the error types are too large, it causes extra codegen and perf issues due to having to copy them around the stack, and it often also occupies unused stack space. Combined with async, which often has problems optimizing out copies, it makes things worse by also enlarging the auto-generated futures.
Great, thanks so much!
I agree that once these grow, it becomes unfeasible. The 'higher' the level of the function called, the more errors are contained in that tree, possibly with multiple boxes within it to reduce stack size. But I also think that up to some size it's feasible, doable, and a 'perfect' representation of what can happen, even though it's not accessible by traversal or more generic means of accessing the tree. And this brings me another possible insight that makes me feel even better about this. What if the… This is how high-level functions can opt to use the generic version of the tree right away, and the underlying methods they call may provide enum-style/thiserror-style errors, but these easily convert with… Speaking of the data model - maybe initially it is 'simple', with a path to the node in the tree along with a string that was generated as the message, for example based on the… To me all this seems very feasible to happen over time, and technically it seems more than possible and a natural addition to what… And yes, it is exciting to see how this could also be used in…
👍 that is probably how we are going to approach these problems.
We can already provide that information with Error::source.
I have added as many feature toggles as possible and will cut a new release now. The recommendation is to go with… It's a little comical that these measures improve everyone's CI times except for the ones of…
Just as a bit of a note: I've found that setting the linker to mold makes a huge difference in overall speed.
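For anyone wanting to try this, mold is typically enabled via a cargo config entry like the following (a sketch assuming Linux with clang and mold installed; the target triple may differ on your system):

```toml
# .cargo/config.toml - use mold for linking debug and release builds.
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

Linking dominates incremental rebuilds of large binaries, which is why swapping the linker alone can yield a noticeable speedup.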
Summary 💡

De-monomorphisation

Functions like `gix::clone::PrepareFetch::new` can be trivially de-monomorphised by extracting an inner function.

@Byron It's possible we can do this without sacrificing code readability by creating our own variant of `momo`.

For `gix::clone::PrepareFetch::fetch_then_checkout` it would be harder, since `Progress` is not a `dyn`-safe trait. I would like to add another provided method to `Progress`:

Features

For starters, I think we can put `gix_negotiate`, `gix_ignore` and `gix_utils` behind new feature flags.

Then we can put the ability to fetch and update a new repository into its own features, since many crates might just want to discover a local repository, e.g. `vergen`.

Motivation 🔦

As shown in cargo-bins/cargo-binstall#1304 (comment), compilation of gix (not its dependencies) on the GHA MacOS runner took more than 60s, which is a bit too long. `simple-git`, which only uses a few `gix` interfaces, also takes 34s just to compile.

I believe this is caused by two factors: `Progress`…

- Use `dyn` where possible, including `dyn`-based progress traits #1008
- `gix` feature toggles #1010
- `gix-error` crate as replacement for `thiserror`