[DRAFT] for testing : Fix 4Gb limit for large files on Git for Windows #2179

Open
wants to merge 14 commits into base: main

Conversation

PhilipOakley

This patch series should fix the large file limit on Windows where 'long' is only 32 bits.

Hopefully this PR will go through the CI/Test pipeline to check that all the patches pass the test suite
in a proper incremental manner, and also to test the MSVC=1 option.

Signed-off-by: Philip Oakley [email protected]

The series did compile at every step, and rebased cleanly on the latest shears/pu.

@PhilipOakley PhilipOakley requested a review from dscho May 3, 2019 18:15
@PhilipOakley
Author

The DCO bot is in error (at least from my viewpoint). The ref commit was provided by the signed-off-by author. It was a necessary prerequisite, though it does have some obvious conflict fixes with the upstream pu.

Commit sha: 9d0e45d, Author: Philip Oakley, Committer: Philip Oakley;
Expected "Philip Oakley [email protected]", but got "Torsten Bögershausen [email protected]".

Not sure if I should just double sign Torsten's patch?

@PhilipOakley PhilipOakley marked this pull request as ready for review May 3, 2019 19:09
@PhilipOakley
Author

I went back and rebased the series with rebase -i, amended the author of the first patch to be Torsten, and force-pushed.

Unsure how to get the CI testing started.

@dscho dscho force-pushed the shears/pu branch 2 times, most recently from c686e70 to ed36d4f on May 7, 2019 14:55
@git-for-windows-ci git-for-windows-ci force-pushed the shears/pu branch 2 times, most recently from ff345ba to 66c432b on May 8, 2019 17:23
@dscho dscho force-pushed the shears/pu branch 3 times, most recently from ed04f07 to 2e5a910 on May 9, 2019 17:53
@dscho
Member

dscho commented May 9, 2019

Not sure if I should just double sign Torsten's patch?

Probably, but you will want to make sure that the recorded author matches the first Signed-off-by: line.
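For illustration (this is the standard DCO convention, not something spelled out verbatim in this thread), that means keeping Torsten as the recorded author and putting his Signed-off-by: first, with Philip's appended:

Author: Torsten Bögershausen <[email protected]>

    Signed-off-by: Torsten Bögershausen <[email protected]>
    Signed-off-by: Philip Oakley <[email protected]>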

@dscho
Member

dscho commented May 9, 2019

Could I ask you to re-target this PR to master instead of shears/pu? The latter is a really fast-moving target, hard to catch.

@dscho
Member

dscho commented May 9, 2019

(You will of course want to run git rebase -i --onto git-for-windows/master 5539b3d26c81c622c40ea34f117a0afe4c789293 and force-push...)

@PhilipOakley
Author

@dscho I'll rebuild/rebase the series in the next day or three, applying Torsten's patch directly at the start of the series.

I'd deliberately targeted pu to at least be ahead of the game for the conflicts with the extern changes that are being considered upstream (which raises the issue of how to pass around pre-determined rerere fixups for such conflicts).

As a side issue, I'm having problems fathoming how the MSVC=1 build should work, seeing as I need to patch compat\vcbuild\MSVC-DEFS-GEN, which is apparently generated by vcpkg, but I can't find where. See my [email protected] on the googlegroups list https://groups.google.com/forum/#!topic/git-for-windows/Y99a0dzlVJY.

@PhilipOakley
Author

@dscho

I've decided against fighting the conflicts until the extern series lands in shears/master.

(What's cooking in git.git (May 2019, #1; Thu, 9))

  • dl/no-extern-in-func-decl (2019-05-05) 3 commits
    (merged to 'next' on 2019-05-09 at d165ac4)
    + *.[ch]: manually align parameter lists
    + *.[ch]: remove extern from function declarations using sed
    + *.[ch]: remove extern from function declarations using spatch
Mechanically and systematically drop "extern" from function
declarations.

Will merge to 'master'.

Once that's landed my rebase should be cleaner and easier - it was too easy to get confused as to which way all the conflicts were going, especially as they are, for the purpose of the series, incidental irrelevances.

@dscho
Member

dscho commented May 13, 2019

I've decided against fighting the conflicts until the extern series lands in shears/master.

Makes a ton of sense to me, @PhilipOakley !

@PhilipOakley
Author

I've decided against fighting the conflicts until the extern series lands in shears/master.

Makes a ton of sense to me, @PhilipOakley !

@dscho I see Junio has announced Git v2.22.0-rc0, with the extern series included. Could you ping me when shears/master gets the update? 🚥

@PhilipOakley
Author

Just seen (follow-up to the rc0 announcement):

phili@Philip-Win10 MINGW64 /usr/src/git (size_t7)
$ git fetch --all
Fetching origin
From https://github.com/git-for-windows/git
   d4e5e1ea92..898cf2707e  master     -> origin/master
Fetching my
Fetching junio
Fetching gitster
Fetching dscho-git

So I hope to get on it today.

@dscho
Member

dscho commented May 14, 2019

So I hope to get on it today.

Oh, sorry! I was so heads-down into getting Git for Windows v2.22.0-rc0 out the door (and then on trying to tie up a few loose ends in Git v2.22.0-rc0 in time for -rc1) that I missed this.

The update of master that you received does not include the changes in v2.22.0-rc0.

You may ask "why?" and it is a very legitimate question. The answer boils down to "I want to keep the door open for Git for Windows v2.21.1, if need be".

You see, I am not at all a fan of many release branches. And my automation really is centered around master, so it would make things a bit hairy if I had to publish v2.21.1 after rebasing master onto the v2.22.0-rc* train.

So the update to master that you got is a rebase onto v2.21.0, essentially, for the sole purpose of restructuring the entire branch thicket (and applying all the fixup!s/squash!s). In other words, if you call git diff d4e5e1ea92..898cf2707e, you will see that I only reworded one comment and moved it slightly. That is the tree diff. The range-diff looks quite a bit different, though! I moved tons of branches into the ready-for-upstream thicket, in preparation for upstreaming them after Git for Windows v2.22.0 is out.

However, hope is near ;-)

I do, of course, make those -rc* previews from Git commits, but those live on prepare-for-v2.22.0 (and are tested in #2192).

And once Git v2.22.0 is close enough (I won't publish a Git for Windows v2.21.1 if I expect to publish a v2.22.0 within a week, unless there are serious security concerns), I will fast-forward master to prepare-for-v2.22.0 (possibly merging master into the prepare-for-v2.22.0 before that).

Therefore, I would suggest simply re-targeting your branch to prepare-for-v2.22.0 for the moment; that'll be stable enough.

@dscho
Member

dscho commented Jun 4, 2019

Now for updating this here PR.

Updated! Let's see where the Pipeline leads us.

@dscho
Member

dscho commented Jun 4, 2019

@tboegi you were active over in #1848... did you work more on the patches? If so: I would like to have your updates in this here PR. If not, could I ask you to give this a look-over? (I will wait with my review until the build is complete, in case I need to fix anything there.)

@danilaml

danilaml commented Jun 4, 2019

I vaguely remember that it (%z) wasn't fully compatible with all the different type and compiler systems

Indeed, it's not in C89/C90 (which is what Git aims for).

I've seen quite a few C99 features in current git source. Indeed, even inttypes.h, which is used in a workaround, comes from C99. However, I understand the compatibility concern. For example, MSVC annoyingly didn't support %z specifiers until VS2015.
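For context, the workaround referred to is the kind Git generally prefers: cast to uintmax_t and print with PRIuMAX from inttypes.h instead of the C99 %zu. A minimal sketch (the helper name is hypothetical, for illustration only):

#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Portable across pre-C99 printf implementations such as old MSVC:
 * widen to uintmax_t and use the PRIuMAX format macro instead of %zu.
 */
static void report_size(size_t n)
{
	printf("object size: %" PRIuMAX " bytes\n", (uintmax_t)n);
}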

@dscho
Member

dscho commented Jun 4, 2019

For example, MSVC annoyingly didn't support %z specifiers until VS2015.

And we try to stay compatible even with setups that are ridiculously different from Linux, such as NonStop. There is even work under way to port Git to Plan 9.

So yes, while some of us (although not me) are always eager to make use of these shiny features, Git is actually quite conservative (in a good sense, of course).

Member

@dscho dscho left a comment

I left a couple of comments, but the biggest one will be to break down the huge commit (I have no doubt that it will be rejected by the Git mailing list, both because the mail server will bounce it, and reviewers will refuse to even look at it).

The real trick for this entire PR will be to draw appropriate boundaries between logically separate changes.

test_description='test large file handling on windows'
. ./test-lib.sh

test_expect_success SIZE_T_IS_64BIT 'require 64bit size_t' '
Member

The commit message claims that this is for Windows, but the prereq does not actually test that. You will want to use the prereq MINGW for that.

Besides, I think that SIZE_T_IS_64BIT is only part of the remaining prereqs: we probably want to also ensure that sizeof(long) == 4, because otherwise this test case will trivially succeed, even if nothing in Git's source code was fixed.

To that end, I would like to imitate the way we test for time_t-is64bit via the test-tool.

And when we do that, we do not even need the MINGW prereq, nor do we need to mention Windows specifically: it is no longer about Windows, it is about systems where the size of long differs from that of void *.

Finally, I think it would make sense to fold the commit that introduces the SIZE_T_IS_64BIT prereq into this commit, and probably even move it into this here file, as that would make it easier on reviewers (both present as well as future ones).
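A minimal sketch of what such a test-tool helper might look like (the command and function names here are hypothetical, merely imitating the existing 64-bit time_t check):

#include <stddef.h>
#include <string.h>

/*
 * Hypothetical test-tool command: exit 0 when the named property
 * holds, so a shell prereq can be derived from the exit status.
 */
int cmd__size_t(int argc, const char **argv)
{
	if (argc == 2 && !strcmp(argv[1], "is64bit"))
		return sizeof(size_t) == 8 ? 0 : 1;
	if (argc == 2 && !strcmp(argv[1], "long-is-narrower"))
		return sizeof(long) < sizeof(void *) ? 0 : 1;
	return 2; /* unknown subcommand */
}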

Author

Yes, the initial test was incomplete, and just sufficient to ensure I could see all the steps that include a long conversion. When I looked at the use of zlib there were 28 places! So a lot of different parts of the test could be impacted, hence checks everywhere.

I also wanted to explicitly test that the zlib compile flags were the problematic ones.

Member

Yes, that's indeed what you should do while developing the test case. Once you're done, it needs to be polished to serve the particular use case of allowing easy debugging of future regressions. And that is where I think the git log --stat and the compile flags and stuff like that won't help, so it needs to go before we merge.

Author

The compile flags should be part of the prerequisite to detect that size_t is 64-bit but zlib is only 32-bit, which is currently part of the >4Gb failing case. You're right, it's not in there yet, but it should be (not sure how to do it yet though).

@@ -0,0 +1,23 @@
#!/bin/sh
Member

After I read this commit message, I wonder how we can munge it to look more like the other commit messages in Git. I.e. if you compare the commit message e.g. of 581d2fd to this:

 Add test for large files on windows

Original test by Thomas.
Add the extra fsck to get diagnostics after the add.
Verify the pack at earliest opportunity
Slight confusion as to why index-pack vs verify-pack...
It's -v (verbose) not --verify
Specify an output file to index-pack, otherwise it clashes with
the existing index file.
Check that the sha1 matches the existing value;-)

then it is hard not to spot a stark difference. Maybe we could write something like this instead:

Add a test for files >2GB

There is nothing in the C specification that claims that `long` should
have the same size as `void *`. For that reason, Git's over-use of
`unsigned long` when it tries to store addresses or offsets is
problematic.

This issue is most prominent on Windows, where the size of `long` is
indeed 32 bit, even on 64-bit Windows.

Let's add a test that demonstrates problems with our current code base
when `sizeof(long)` is different from `sizeof(void *)`.

Thoughts?

BTW I have no idea what you mean by this:

Check that the sha1 matches the existing value;-)

I do not see this at all in this here patch.

Author

The sha1 check was one of the reported values in the test but I only manually checked it against what I'd seen via WSL.

The test (and hence message) should also cover the zlib compile flags (not sure if we can determine the crc32 compile flags).

Member

the test (and hence message) should also cover the zlib compile flags (not sure if we can determine the crc32 compile flags)

Would a change of those flags constitute a regression? If not (which I think is the case), then it should not be tested.

The sha1 check was one of the reported values in the test but I only manually checked it against what I'd seen via WSL.

Ah, okay. I am not sure that is strictly interesting information for the commit message; it is more for the cover letter.

git fsck --verbose --strict --full &&
git commit -m msg file &&
git log --stat &&
git gc &&
Member

Do we really want to git gc both here and 3 lines further down? Would this really help debugging regressions in the future?

Author

Almost every step is error-prone, and some do not repeat outside of the test environment (differing big-file-threshold effects, I think), i.e. pack vs loose object.

Member

Then we really need a more robust test. Otherwise a developer might be tempted to think that they fixed a flawed regression test, all the while a real regression crept in.

So please indulge me: what exactly is this regression test case supposed to test?

Author

This is an expansion of the 'does it perform correctly' test that Thomas B initially developed.

Each of the fsck, verify-pack, log and gc commands can barf. Plus, if run manually, it actually does a slightly different test (add to loose vs pack), which I didn't get to the bottom of!

In this case he had deliberately set the compression to zero so that the compressed file would exceed the limit; we also 'need' (if we are to be complete) to check for packed and loose encoding, etc., including breakages at various process stages (add vs commit).

I'd agree that the testing still has a long way to go to get a smooth set of tests that satisfy all. It's all low-level code issues that can pop up in many places.

It may be that we have a two-phase plan. First make the code patches visible (if the code isn't working...), then make the testing acceptable (to as-yet unknown/discussed/decided standards).

git verify-pack -s .git/objects/pack/*.pack &&
git fsck --verbose --strict --full &&
git commit -m msg file &&
git log --stat &&
Member

I'd rather leave this out. It might have made sense for debugging when this patch was developed, but the test case should be optimized for catching, diagnosing and debugging future regressions, nothing more, nothing less.

Author

Depends on the testing criteria.
Maybe we have a full-fledged 'check everything every step' commit immediately followed by a commit that trims it back to 'just detect an end-to-end fail', with a message stating that the debug version is the previous commit?

Member

I regularly investigate test failures in Git's test suite, as I am one of the very few who even looks at our CI builds.

You should try it, too. It's very educative.

I can tell you precisely how I feel about test cases like this one, but you would have to hide younger children first.

In short: no, let's not do this. Let's not make life hard on the developers who will inevitably have to investigate regressions. Let's make it easy instead. The shorter, more concise, and more meaningful we can make the test case, the better.

Remember: regression tests are not so much about finding regressions, as they are about helping regressions to be fixed. That is the criterion you should aim to optimize for. Always.

@@ -370,7 +370,7 @@ static void start_put(struct transfer_request *request)

/* Set it up */
git_deflate_init(&stream, zlib_compression_level);
size = git_deflate_bound(&stream, len + hdrlen);
size = git_deflate_bound(&stream, len + (size_t) hdrlen);
Member

Since len is already of type size_t, this cast is superfluous. The only real difference this would make was if hdrlen was negative. Which cannot be the case because xsnprintf() is explicitly not allowed to return negative values.

So I'd just drop this hunk.

Author

We can't assume the type of len because it (git_deflate_bound) may be a macro, or a function, and it is then subject to the compile flags.

Also, one of my readings of the various 'standards' suggested that it is within implementation-defined behaviour to downcast oversize variables if one (of the computation) was the normal 'right' size (e.g. long).

The whole up/down cast cascade and dual default function templates thing made me pedantic/explicit!

Member

We can't assume the type of len because it (git_deflate_bound) may be a macro, or a function, and it is then subject to the compile flags.

Yes we can. Because we declared it ourselves, a couple lines above the shown diff context, as size_t len. So we can very much assume that its type is size_t.

it is within the implementation defined definition to downcast oversize variables if one (of the computation) was the normal 'right' size (e.g. long).

Nope. Arithmetic expressions are always evaluated after upcasting narrower data types to larger ones. A compiler that downcasts is buggy.

You probably think about an example like this:

short result;
long a = 1;
int b = 2;

result = a + b;

This is a bit tricky to understand, as the evaluation of the arithmetic expression a + b really upcasts the int to a long (unless they already have the same bit size). Only after that is the result downcast in order to be assigned to the narrower result.

That example has little relationship to the code under discussion, though, except in the case where git_deflate_bound() is defined as a function whose second parameter is declared with a too-narrow datatype. In which case you simply cannot do anything about it, except replace the function by our custom macro (as you did in another patch in this PR).
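For illustration, such a macro could compute the bound entirely in size_t arithmetic. The sketch below uses a classic conservative zlib-style estimate; the macro name and formula are illustrative, not necessarily Git's exact definition from that patch:

/*
 * Hypothetical sketch: avoid funnelling the length through zlib's
 * uLong-typed deflateBound(), whose result would truncate on LLP64
 * systems; compute a conservative worst-case bound in size_t instead.
 */
#define git_deflate_bound(stream, size) \
	((size_t)(size) + ((size_t)(size) >> 12) + \
	 ((size_t)(size) >> 14) + ((size_t)(size) >> 25) + 13)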

Author

I agree about the upcast to long. But when long is meant to be the maximum arithmetic type, then size_t may be downcast (as per Microsoft's desire that 32-bit code stays unchanged on 64-bit machines, so it only (arithmetically) addresses the 32-bit range unless one specifically changes to full pointer calculation). As I understand it, it's a bit open-ended (which standard is your go-to reference document?)

It may be that the commit message needs to be long to explain the ripple-through consequences, or we make the code informative. It's one of those 'discursive' areas.

Member

We can't assume the type of len

But we can. It is declared as size_t len;. See here:

git/http-push.c, line 363 (commit a4b2bb5):

size_t len;

* On Windows, uInt/uLong are only 32 bits.
*/
extern uLong xcrc32(uLong crc, const unsigned char *buf, size_t bytes);

Member

The crc32() function is declared and defined in zlib. It would be most helpful to mention this in the code comment, along with the hint that zlib.h is already included in cache.h.

Author

Hmm, I had thought that we got it from the gnu stuff (i.e. I found its definition on the gnu ref pages, not the zlib pages).
Double check needed. It's good if you are right (you probably are..)

Author

Looks like I was misinformed. The crc32 is part of the zlib library and I had missed that fact.

Member

TBH I did not know either until I looked which C standard defines the crc32() function, and did not find any.
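For what it's worth, a minimal sketch of how an xcrc32() could satisfy the declaration quoted above, assuming the straightforward approach of feeding zlib's uInt-typed crc32() the buffer in bounded chunks (a sketch, not necessarily the PR's implementation):

#include <limits.h>
#include <stddef.h>
#include <zlib.h>

/*
 * Sketch only: zlib's crc32() takes a uInt length, which is 32 bits
 * on LLP64 systems, so hand it the buffer in chunks of at most
 * UINT_MAX bytes and chain the running CRC across calls.
 */
uLong xcrc32(uLong crc, const unsigned char *buf, size_t bytes)
{
	while (bytes) {
		uInt chunk = bytes > UINT_MAX ? UINT_MAX : (uInt)bytes;
		crc = crc32(crc, buf, chunk);
		buf += chunk;
		bytes -= chunk;
	}
	return crc;
}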

@@ -352,7 +352,7 @@ static int write_zip_entry(struct archiver_args *args,
if (!buffer)
return error(_("cannot read %s"),
oid_to_hex(oid));
crc = crc32(crc, buffer, size);
crc = xcrc32(crc, buffer, size);
Member

I think this makes sense, but the sentence "Zero length initialisations are not converted." is very cryptic.

Author

Any better suggestions? Zero length fits into both 32-bit and 64-bit longs ;-)

Member

I have no idea what you mean by that.

Member

(That sentence makes as much sense to me as "Correct horse battery staple".)

Author

Maybe "Zero length initialisations work with both 32-bit and 64-bit uInt len values"? ;-) (only slightly less cryptic)

Otherwise it's a lot of noise changes for no benefit (there are quite a few initialisations...).

unsigned long newpos, newlines;
size_t leading, trailing;
size_t oldpos, oldlines;
size_t newpos, newlines;
Member

I do think that we can, and should, break apart this huge patch. In other words, I don't believe the statement that this is the minimal patch ;-)

Author

minimal patch

Can I just say that the GitHub review interface is really poor here. I have no idea (from the web page) which patch is which, which belongs to which commit, etc.

Assuming this was Torsten's original patch, it did fit on the mailing list as a 'less than sufficient' patch: https://public-inbox.org/git/[email protected]/ (it's how I applied it, from an email on-list ;-).

If the 'minimal' comment is about something else, excuse my mistake.

Member

I have no idea (from the web page) which patch is which, which belongs to which commit etc.

That's an optimization in the web UI, with a very easy way out: click on the file name above the diff, i.e. https://github.com/git-for-windows/git/pull/2179/files/5b60f72b34f213309d5870b7df4b1038914e4fc0..9c2f8de134713f7e04fc2402567c5b0c29605737#diff-3edc96d0eefd86960d342f661171a62c

Member

Assuming this was Torsten's original patch it did fit on the mailing list as a 'less than sufficient' patch

It changes 85 files. Eighty-five! Nobody can review such a patch properly, not @tboegi, not you, not me.

And when I look at the diffstat, I immediately see e.g. that there are plenty of changes in apply.c and plenty in builtin/pack-objects.c. Those two files share precious little concern. So I am rather certain that it would be easy even to split this at the file boundary.

Which still does not make sense, as it is still not a logically-minimal patch.

Member

to split this at the file boundary.

Sorry, I meant at that file boundary.

Member

And indeed, if I apply only the builtin/pack-objects.c part, for example, it compiles Just Fine. So the commit message is maybe not quite truthful when it says

The smallest step seems to be much bigger than expected.

Member

Or when it says this:

However, when the Git code base is compiled with a compiler that complains that "unsigned long" is different from size_t, we end up in this huge patch, before the code base cleanly compiles.

As I just proved, there is a considerably smaller patch that already compiles cleanly: just the 70 changed lines of builtin/pack-objects.c. And I bet that we can do more of the same to cut this into nice mouthfuls, without breaking the build.

Member

@PhilipOakley I'd like to focus your attention on the two comments above this one.

@@ -481,7 +481,7 @@ static void *unpack_raw_entry(struct object_entry *obj,
unsigned char *p;
size_t size, c;
off_t base_offset;
unsigned shift;
size_t shift;
Member

Nicely done, but let's not state that this is for Windows. This is a data type cleanup that benefits 64-bit systems where long is 32 bit wide.

Author

OK

@@ -1057,7 +1057,7 @@ size_t unpack_object_header_buffer(const unsigned char *buf,
size = c & 15;
shift = 4;
while (c & 0x80) {
if (len <= used || bitsizeof(long) <= shift) {
if (len <= used || bitsizeof(size_t) <= shift) {
Member

This is a good change. The commit message still needs to be changed to answer the question "why?" instead of the question "what?" because the latter leaves the reader puzzled while the former will leave the reader enlightened.
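To make the "why" concrete: this loop decodes Git's variable-length object-size header seven bits per byte, so the guard caps how far the shift may advance before the accumulator overflows. A simplified sketch of the converted decoder (type extraction omitted; bitsizeof() mirrors Git's macro from git-compat-util.h):

#include <limits.h>
#include <stddef.h>

#define bitsizeof(x) (CHAR_BIT * sizeof(x))

/*
 * Simplified from unpack_object_header_buffer(): with the size held
 * in a size_t, the overflow guard must compare against
 * bitsizeof(size_t); checking bitsizeof(long) would wrongly reject
 * >4Gb sizes on LLP64 systems where long is only 32 bits.
 */
static size_t decode_size(const unsigned char *buf, size_t len, size_t *sizep)
{
	size_t used = 0, size, c;
	unsigned shift;

	c = buf[used++];
	size = c & 15;
	shift = 4;
	while (c & 0x80) {
		if (len <= used || bitsizeof(size_t) <= shift) {
			*sizep = 0;
			return 0; /* bad header: out of bytes or bits */
		}
		c = buf[used++];
		size += (c & 0x7f) << shift;
		shift += 7;
	}
	*sizep = size;
	return used;
}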

@dscho
Member

dscho commented Jun 4, 2019

I also wonder whether we should try to pour the learnings from #1848 into a test case: after all, I found a couple of issues with an earlier iteration of these patches, while the "big file" test case seemed to pass already...

@PhilipOakley
Author

I'll just respond to a few of the points that I have immediate recollection of. Many of the casts were about having an abundance of caution regarding the whole 'implementation defined' aspect of this here 'Bug' (i.e. we can't trust anything for the 64 vs 32 bit long issue ;-)

@dscho
Member

dscho commented Jun 5, 2019

Many of the casts were about having an abundance of caution regarding the whole 'implementation defined' aspect of this here 'Bug' (i.e. we can't trust anything for the 64 vs 32 bit long issue ;-)

I understand that. Yet I think some of that caution erred on the side of redundancy, and I would rather not spend the time on the mailing list trying to defend unnecessary casts...

@tboegi

tboegi commented Aug 28, 2019

@dscho:
I couldn't find a way to split up the "big patch" into smaller ones without risking
that the steps in between don't compile cleanly.
(And my personal view is that it is the same amount of work to review a patch
with 84 files changed as to review 84 patches changing 1 file each.)
@PhilipOakley I re-worded the commit message and rebased to gfw/master:

https://github.com/tboegi/git/tree/tb.190828_convert_size_t_mk_size_t

@dscho
Member

dscho commented Aug 28, 2019

my personal view is that it is the same amount of work to review a patch
with 84 file changed or to review 84 patches changing 1 file each

Having reviewed many patch series, I much prefer reviewing a dozen relatively short patch series over the course of a dozen weeks to having to review all the changes in one big patch. Remember: there are patches so simple and elegant that there is no obvious place for bugs to hide, and there are patches so gigantic and repetitive that there is no obvious place for bugs to hide (but plenty of non-obvious places).

No, I still think we need to do our best job to make this thing easy to review.

If you can augment that big patch by a small Coccinelle patch that generates it, that would make it reviewable, of course. But that enormous wall of diff? Not reviewable at all.

@PhilipOakley
Author

@dscho The first size_t commit is not going away. It has always been that big because there are compilers that do not accept less. This is part of the inconsistent behaviours of these implementation-defined behaviours (the type size_t is not a C89 feature; even though large pointer spaces are allowed, you can't name them).

It may be that someone wants to split it into a few "won't compile on my machine" patches, but I'd rather we stay with what we have. Torsten was compiling for Raspbian (gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516), not GfW, anyway. Martin Koegler's series (mk/use-size-t-in-zlib, still on pu) was started in 2017 as a >4Gb issue (found by searching the archive for LLP64).

The size_t stuff actually had compile warnings as far back as 2007, see https://public-inbox.org/git/[email protected]/, so at some point maybe we need to bite the bullet and actually do the big change.

...

My main problem is the test system, which is something I'm not that familiar with.

I now see the tests as being:
Test 0: check we are linked to the right library (the compile-flags helper reports, and prereqs match).
Test 1: 'add large file as loose object'; invokes zlib only; check it reads back OK, without invoking wider code sections.
Test 2: 'commit large file as a pack'; invokes crc32 checking (same pointer issue as zlib, but fresh code).

Maybe we need a size-graded test: 1.5Gb, 3.5Gb, 5.5Gb, to walk through the two potential barriers at 2Gb and 4Gb.

We also need an easily accessible compiler that has something equivalent to the former -Wshorten-64-to-32 (or vice versa), e.g. Clang? (untested). I just found https://public-inbox.org/git/[email protected]/ in the archives, so it's something.
...

@dscho
Member

dscho commented Sep 3, 2019

The first size_t commit is not going away.

I remember clearly that I was able to split out a part from it, earlier this year, into a self-contained commit.

Yes, it is possible to do this incrementally. It absolutely is. You can easily pass unsigned long to functions accepting size_t, that is not a problem! So you can fix an API function to accept size_t, leave its callers alone (until you take care of them, in a later patch). The trick is to break this PR down into bite-sized chunks, so that we can benefit maximally from code review (if the patches are too big, you will not benefit from that extra safety).
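A toy illustration of that point (both function names are hypothetical): the leaf now takes size_t, the caller still uses unsigned long, and both compile cleanly because the argument widens implicitly at the call site:

#include <stddef.h>

/* Leaf function already converted to take size_t... */
static size_t object_size_bound(size_t len)
{
	return len + (len >> 3) + 64;
}

/*
 * ...while this caller is not yet converted; its unsigned long
 * argument is implicitly widened, so the patch boundary between the
 * two conversions still compiles cleanly.
 */
static size_t caller_still_long(void)
{
	unsigned long len = 1024;
	return object_size_bound(len);
}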

@dscho
Member

dscho commented Sep 3, 2019

The first size_t commit is not going away.

I remember clearly that I was able to split out a part from it, earlier this year, into a self-contained commit.

I think I even mentioned this somewhere in this PR or in a related one. And note: I did not even try hard. So I don't buy the argument "this patch is big, it cannot be broken down further". I don't buy that at all.

@git-for-windows-ci git-for-windows-ci changed the base branch from master to main June 17, 2020 18:10
PhilipOakley added a commit to PhilipOakley/git that referenced this pull request Oct 21, 2021
Test whether the Git CI system allows, or fails, when LLP64 systems
need variable type coercion from size_t back to unsigned long.

All the *nix systems should pass, but compile checks for LLP64 systems
may fail. This is a crucial information spike for Git files > 4Gb,
i.e. Issues: git-lfs git-for-windows#2434, GfW git-for-windows#2179, git-for-windows#1848, git-for-windows#3306.

Signed-off-by: Philip Oakley <[email protected]>
@dscho
Member

dscho commented Oct 27, 2021

The first size_t commit is not going away. It has always been that big because there are compilers that do not accept less.

Let #3487 serve as a counter-argument: tiny, well-contained changes from unsigned long to size_t that compile and pass, and even demonstrably fix bugs.

@PhilipOakley If you are truly interested in pushing this forward, I encourage you to have a look at the end of this PR comment, where I follow the rabbit hole of the call chain of read_object_file(). This lays out a natural plan to split up that big commit in neat, compileable patches: just convert the individual functions of that call tree, leaves first.

Granted, that won't be as easily written as a semi-automated search-and-replace. But then, such a semi-automated, huge commit has not a sliver of a chance to get reviewed on the Git mailing list, let alone accepted.

@PhilipOakley
Author

a natural plan to split up that big commit in neat, compileable patches: just convert the individual functions of that call tree, leaves first.

I had a simplistic look at the call tree to see how many times each was referenced.

    read_object_file()  38 ***
        repo_read_object_file() 5 
            read_object_file_extended() 2
                read_object() 3 (2 identical fns)
                    oid_object_info_extended() 26 [if's]
                        do_oid_object_info_extended() 2 [ret]
                            loose_object_info() 1 [self]
                                parse_loose_header_extended()  3[if's]
                            packed_object_info()  5
                                cache_or_unpack_entry()  1  **
                                    unpack_entry() 1
                                        unpack_object_header() 7
                                        -> unpack_object_header_buffer() 3 + t/1
                                get_size_from_delta() 2
                                    get_delta_hdr_size() 4

following your rabbit hole of the call chain (URLs in link) of read_object_file().

My first wondering was: Depth first or Width first?
There is the potential for a bit of a combinatorial explosion there.
(I did notice how the solution there for git-lfs used casts and flipped to the fn marking for the tbd steps)

I noticed there were two identical static void *read_object() definitions (object-file.c and packfile.c), though I wasn't sure if that was supposed to happen.

I may, as a narrow starter, just do hash-object for a loose object only (behind a bigFileThreshold protection..), to try to drain-rod the process with a simple testable example (i.e. no pack files at this step!).

@dscho
Member

dscho commented Oct 28, 2021

My first wondering was: Depth first or Width first?

You could try both approaches, and then go with the one that is easier to review.
