Support of PyPy interpreter (smmap, gitdb, and GitPython) #49

zougloub · 2023-02-18T01:04:29Z

I was hoping to get this to run under PyPy but this is happening.

======================================================================
FAIL: test_basics (smmap.test.test_buf.TestBuf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "smmap-5.0.0/smmap/test/test_buf.py", line 114, in test_basics
    assert manager.collect()
AssertionError

I may take a look and suggest a fix, but maybe a someone who knows the codebase can figure this problem out easily?

Edit: the code makes a heavy use of del expecting instant behavior, but this is not automatic under pypy. See http://docs.cython.org/en/latest/src/userguide/pypy.html

The text was updated successfully, but these errors were encountered:

Byron · 2023-02-18T06:27:07Z

I wouldn't worry too much about it and rather deactivate the test under PyPy. By now, I would be surprised if smmap is working well at all, and believe that the recommended codepaths in GitPython won't be using it anyway.

Once PyPy works, please feel free to submit a PR and add it to the workflow.

zougloub · 2023-02-18T07:54:35Z

The failure occurs with SlidingWindowMapManager, not StaticWindowMapManager, the tests pass when SlidingWindowMapManager is not used.

diff --git a/smmap/test/test_buf.py b/smmap/test/test_buf.py
index f0a86fb..0a6eea8 100644
--- a/smmap/test/test_buf.py
+++ b/smmap/test/test_buf.py
@@ -82,5 +85,5 @@ class TestBuf(TestBase):
             for item in (fc.path, fd):
-                for manager, man_id in ((man_optimal, 'optimal'),
-                                        (man_worst_case, 'worst case'),
-                                        (static_man, 'static optimal')):
+                for manager, man_id in (#(man_optimal, 'optimal'),
+                                        #(man_worst_case, 'worst case'),
+                                        (static_man, 'static optimal'),):
                     buf = SlidingWindowMapBuffer(manager.make_cursor(item)

gitdb and GitPython fail their tests too, my guess is that they seem to rely on CPython-specific behavior of del working immediately (use of context managers can fix that).

zougloub · 2023-02-18T08:01:37Z

It's also use of del that causes the smmap to fail.
The following use modification makes the tests pass:

diff --git a/smmap/test/test_buf.py b/smmap/test/test_buf.py
index f0a86fb..0ab7310 100644
--- a/smmap/test/test_buf.py
+++ b/smmap/test/test_buf.py
@@ -1,126 +1,126 @@
 from .lib import TestBase, FileCreator
 
 from smmap.mman import (
     SlidingWindowMapManager,
     StaticWindowMapManager
 )
 from smmap.buf import SlidingWindowMapBuffer
 
 from random import randint
 from time import time
 import sys
 import os
 
 
 man_optimal = SlidingWindowMapManager()
 man_worst_case = SlidingWindowMapManager(
     window_size=TestBase.k_window_test_size // 100,
     max_memory_size=TestBase.k_window_test_size // 3,
     max_open_handles=15)
 static_man = StaticWindowMapManager()
 
 
 class TestBuf(TestBase):
 
     def test_basics(self):
         with FileCreator(self.k_window_test_size, "buffer_test") as fc:
 
             # invalid paths fail upon construction
             c = man_optimal.make_cursor(fc.path)
             self.assertRaises(ValueError, SlidingWindowMapBuffer, type(c)())            # invalid cursor
             self.assertRaises(ValueError, SlidingWindowMapBuffer, c, fc.size)       # offset too large
 
             buf = SlidingWindowMapBuffer()                                              # can create uninitialized buffers
-            assert buf.cursor() is None
 
-            # can call end access any time
-            buf.end_access()
-            buf.end_access()
-            assert len(buf) == 0
+            with (buf, c):
+                assert buf.cursor() is None
 
-            # begin access can revive it, if the offset is suitable
-            offset = 100
-            assert buf.begin_access(c, fc.size) == False
-            assert buf.begin_access(c, offset) == True
-            assert len(buf) == fc.size - offset
-            assert buf.cursor().is_valid()
+                # can call end access any time
+                buf.end_access()
+                buf.end_access()
+                assert len(buf) == 0
 
-            # empty begin access keeps it valid on the same path, but alters the offset
-            assert buf.begin_access() == True
-            assert len(buf) == fc.size
-            assert buf.cursor().is_valid()
+                # begin access can revive it, if the offset is suitable
+                offset = 100
+                assert buf.begin_access(c, fc.size) == False
+                assert buf.begin_access(c, offset) == True
+                assert len(buf) == fc.size - offset
+                assert buf.cursor().is_valid()
 
-            # simple access
-            with open(fc.path, 'rb') as fp:
-                data = fp.read()
-            assert data[offset] == buf[0]
-            assert data[offset:offset * 2] == buf[0:offset]
+                # empty begin access keeps it valid on the same path, but alters the offset
+                assert buf.begin_access() == True
+                assert len(buf) == fc.size
+                assert buf.cursor().is_valid()
 
-            # negative indices, partial slices
-            assert buf[-1] == buf[len(buf) - 1]
-            assert buf[-10:] == buf[len(buf) - 10:len(buf)]
+                # simple access
+                with open(fc.path, 'rb') as fp:
+                    data = fp.read()
+                assert data[offset] == buf[0]
+                assert data[offset:offset * 2] == buf[0:offset]
 
-            # end access makes its cursor invalid
-            buf.end_access()
-            assert not buf.cursor().is_valid()
-            assert buf.cursor().is_associated()         # but it remains associated
+                # negative indices, partial slices
+                assert buf[-1] == buf[len(buf) - 1]
+                assert buf[-10:] == buf[len(buf) - 10:len(buf)]
 
-            # an empty begin access fixes it up again
-            assert buf.begin_access() == True and buf.cursor().is_valid()
-            del(buf)        # ends access automatically
-            del(c)
+                # end access makes its cursor invalid
+                buf.end_access()
+                assert not buf.cursor().is_valid()
+                assert buf.cursor().is_associated()         # but it remains associated
+
+                # an empty begin access fixes it up again
+                assert buf.begin_access() == True and buf.cursor().is_valid()
 
             assert man_optimal.num_file_handles() == 1
 
             # PERFORMANCE
             # blast away with random access and a full mapping - we don't want to
             # exaggerate the manager's overhead, but measure the buffer overhead
             # We do it once with an optimal setting, and with a worse manager which
             # will produce small mappings only !
             max_num_accesses = 100
             fd = os.open(fc.path, os.O_RDONLY)
             for item in (fc.path, fd):
                 for manager, man_id in ((man_optimal, 'optimal'),
                                         (man_worst_case, 'worst case'),
-                                        (static_man, 'static optimal')):
-                    buf = SlidingWindowMapBuffer(manager.make_cursor(item))
-                    assert manager.num_file_handles() == 1
-                    for access_mode in range(2):    # single, multi
-                        num_accesses_left = max_num_accesses
-                        num_bytes = 0
-                        fsize = fc.size
+                                        (static_man, 'static optimal'),):
+                    with SlidingWindowMapBuffer(manager.make_cursor(item)) as buf:
+                        assert manager.num_file_handles() == 1
+                        for access_mode in range(2):    # single, multi
+                            num_accesses_left = max_num_accesses
+                            num_bytes = 0
+                            fsize = fc.size
 
-                        st = time()
-                        buf.begin_access()
-                        while num_accesses_left:
-                            num_accesses_left -= 1
-                            if access_mode:  # multi
-                                ofs_start = randint(0, fsize)
-                                ofs_end = randint(ofs_start, fsize)
-                                d = buf[ofs_start:ofs_end]
-                                assert len(d) == ofs_end - ofs_start
-                                assert d == data[ofs_start:ofs_end]
-                                num_bytes += len(d)
-                                del d
-                            else:
-                                pos = randint(0, fsize)
-                                assert buf[pos] == data[pos]
-                                num_bytes += 1
-                            # END handle mode
-                        # END handle num accesses
+                            st = time()
+                            buf.begin_access()
+                            while num_accesses_left:
+                                num_accesses_left -= 1
+                                if access_mode:  # multi
+                                    ofs_start = randint(0, fsize)
+                                    ofs_end = randint(ofs_start, fsize)
+                                    d = buf[ofs_start:ofs_end]
+                                    assert len(d) == ofs_end - ofs_start
+                                    assert d == data[ofs_start:ofs_end]
+                                    num_bytes += len(d)
+                                    del d
+                                else:
+                                    pos = randint(0, fsize)
+                                    assert buf[pos] == data[pos]
+                                    num_bytes += 1
+                                # END handle mode
+                            # END handle num accesses
 
-                        buf.end_access()
-                        assert manager.num_file_handles()
-                        assert manager.collect()
-                        assert manager.num_file_handles() == 0
-                        elapsed = max(time() - st, 0.001)  # prevent zero division errors on windows
-                        mb = float(1000 * 1000)
-                        mode_str = (access_mode and "slice") or "single byte"
-                        print("%s: Made %i random %s accesses to buffer created from %s reading a total of %f mb in %f s (%f mb/s)"
-                              % (man_id, max_num_accesses, mode_str, type(item), num_bytes / mb, elapsed, (num_bytes / mb) / elapsed),
-                              file=sys.stderr)
-                    # END handle access mode
-                    del buf
+                            buf.end_access()
+                            assert manager.num_file_handles()
+                            assert manager.collect()
+                            assert manager.num_file_handles() == 0
+                            assert res
+                            elapsed = max(time() - st, 0.001)  # prevent zero division errors on windows
+                            mb = float(1000 * 1000)
+                            mode_str = (access_mode and "slice") or "single byte"
+                            print("%s: Made %i random %s accesses to buffer created from %s reading a total of %f mb in %f s (%f mb/s)"
+                                  % (man_id, max_num_accesses, mode_str, type(item), num_bytes / mb, elapsed, (num_bytes / mb) / elapsed),
+                                  file=sys.stderr)
+                        # END handle access mode
                 # END for each manager
             # END for each input
             os.close(fd)

Byron · 2023-02-19T07:22:53Z

Thanks for figuring this out!

Yes, there were times when Python was reference counted, which allowed to trigger destructors/finalizers somewhat consistently, but these times are long gone.

I don't know to what extend it's worth putting time into this - it's clearly not working anymore (at least not as intended) and nobody seems to have taken issue with this. Maybe it's used nowhere anymore.

The reason it can't just be removed is the reluctance to break backwards compatibility.

If you think it's worth making some adjustments to make the tests pass and take it where it needs to be, please feel free to submit PRs for that.

zougloub · 2023-02-19T09:07:54Z

The tests pass with the change above, but it would only fix testing of smmap ; gitdb and GitPython are also broken on PyPy, for the same reasons (closing stuff in destructors, perhaps also how they use smmap, I haven't exhaustively checked). I can't promise I'll find the time to fix all of this by myself...

Byron · 2023-02-20T06:51:23Z

Understandable!

Since smmap and gitdb are not very useful and I don't think they are used that much, maybe there is another way to handle this.

What about detecting, in GitPython, that the runtime environment is PyPy, and based on that certain features won't be imported or made available at all. That way, one can focus on what matters, and probably get to run all tests successfully as well.

zougloub changed the title ~~Unit tests fail under PyPy interpreter~~ Support of PyPy interpreter (smmap, gitdb, and GitPython) Feb 18, 2023

Byron added the waiting for feedback label Feb 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support of PyPy interpreter (smmap, gitdb, and GitPython) #49

Support of PyPy interpreter (smmap, gitdb, and GitPython) #49

zougloub commented Feb 18, 2023 •

edited

Loading

Byron commented Feb 18, 2023

zougloub commented Feb 18, 2023 •

edited

Loading

zougloub commented Feb 18, 2023

Byron commented Feb 19, 2023

zougloub commented Feb 19, 2023

Byron commented Feb 20, 2023

Support of PyPy interpreter (smmap, gitdb, and GitPython) #49

Support of PyPy interpreter (smmap, gitdb, and GitPython) #49

Comments

zougloub commented Feb 18, 2023 • edited Loading

Byron commented Feb 18, 2023

zougloub commented Feb 18, 2023 • edited Loading

zougloub commented Feb 18, 2023

Byron commented Feb 19, 2023

zougloub commented Feb 19, 2023

Byron commented Feb 20, 2023

zougloub commented Feb 18, 2023 •

edited

Loading

zougloub commented Feb 18, 2023 •

edited

Loading