Autogeneration of Python bindings from manually annotated C++ headers
Genpybind is a tool based on clang that automatically generates code to expose a C++ API as a Python extension via pybind11. Say goodbye to the tedious task of writing and updating binding code by hand! Genpybind ensures that your Python bindings always stay in sync with your C++ API, complete with docstrings, parameter names, and default arguments. This is especially valuable for still-evolving APIs where manual bindings can quickly become outdated.
While genpybind does require some manual hints in the form of unobtrusive annotation macros¹, it results in a self-contained header file that concisely describes both the C++ and Python interfaces of your library. This approach keeps you in control and requires fewer heuristics in genpybind's implementation, thereby reducing complexity. However, it does require the ability to modify the original interface declarations, so code which is not under your control needs to fall back on manually written bindings.
Besides the main use case of exposing a C++ API to Python, genpybind has proven useful during C++ library development:
- It enables interactive exploration of a library's API via the Python REPL.
- This exploration can form the basis for unit tests using Python's low-boilerplate testing frameworks like pytest.
- And maybe most importantly, it enables hassle-free property-based testing via hypothesis, which still has no C++-native equivalent.
To expose a C++ interface via a Python module, GENPYBIND annotations are added to the C++ declarations:
#pragma once
#include "genpybind.h"
namespace readme GENPYBIND(visible) {
/// Describes how the output will taste.
enum class Flavor {
/// Like you would expect.
bland,
/// It tastes different.
fruity,
};
/// A contrived example.
class Example {
public:
static constexpr int GENPYBIND(hidden) not_exposed = 10;
/// Do a complicated calculation.
int calculate(Flavor flavor = Flavor::fruity) const;
GENPYBIND(getter_for(something))
int getSomething() const;
GENPYBIND(setter_for(something))
void setSomething(int value);
private:
int m_value = 0;
};
} // namespace readme
The resulting module can then be used like this:
>>> import readme as m
>>> obj = m.Example()
>>> obj.something
0
>>> obj.something = 42
>>> obj.something
42
>>> obj.calculate() # default argument
-42
>>> obj.calculate(m.Flavor.bland)
42
>>> print(m.Example.__doc__)
A contrived example.
>>> print(m.Flavor.__doc__)
Describes how the output will taste.
Members:
bland : Like you would expect.
fruity : It tastes different.
>>> help(obj.calculate)
Help on method calculate in module readme:
calculate(...) method of readme.Example instance
calculate(self: readme.Example, flavor: readme.Flavor = <Flavor.fruity: 1>) -> int
Do a complicated calculation.
For the example presented above, genpybind will generate code equivalent to the following. (Note that docstrings, argument names and default arguments work out of the box, without extra annotations.)
void expose_context_readme_Flavor(py::enum_<readme::Flavor>& context);
void expose_context_readme_Example(py::class_<readme::Example>& context);
PYBIND11_MODULE(readme, root) {
auto context_readme_Flavor = py::enum_<readme::Flavor>(
root, "Flavor", "Describes how the output will taste.");
auto context_readme_Example = py::class_<readme::Example>(
root, "Example", "A contrived example.");
expose_context_readme_Flavor(context_readme_Flavor);
expose_context_readme_Example(context_readme_Example);
}
void expose_context_readme_Flavor(py::enum_<readme::Flavor>& context) {
context.value("bland", readme::Flavor::bland, "Like you would expect.");
context.value("fruity", readme::Flavor::fruity, "It tastes different.");
}
void expose_context_readme_Example(py::class_<readme::Example>& context) {
context.def(py::init<>(), "");
context.def(py::init<const readme::Example&>(), "", py::arg(""));
context.def("calculate",
py::overload_cast<readme::Flavor>(&readme::Example::calculate, py::const_),
"Do a complicated calculation.",
py::arg("flavor") = readme::Flavor::fruity);
context.def_property(
"something",
py::overload_cast<>(&readme::Example::getSomething, py::const_),
py::overload_cast<int>(&readme::Example::setSomething));
}
The current implementation is a prototype based on clang's libtooling API.
A previous proof-of-concept Python implementation I developed at the Electronic Vision(s) Group ran into limits of the libclang bindings and required a patched LLVM/clang build. Still, it's used successfully in the experiment software stack of their neuromorphic computing platform, i.e., the described approach is viable for an existing code base.
The current iteration still needs some polish: some known shortcomings remain, the documentation is sparse, and build support (and automated testing) on different platforms is pending.
- Documentation is minimal at the moment. If you are looking for example use cases, the integration tests may provide a starting point.
- Expressions and types in default arguments, return values, or GENPYBIND_MANUAL instructions are not consistently expanded to their fully qualified form. As a workaround, use fully qualified names where necessary.
Apart from the simpler build process, the current implementation comes with many new features and improvements, for example considerably better error messages. As a small price to pay, there are several breaking changes:
- opaque is now known as expose_here.
- expose_as(__repr__) (or __str__) should be used in place of stringstream.
- tag should be replaced by only_expose_in.
- inline_base no longer supports globs/wildcards.
- accessor_for is no longer supported; use getter_for/setter_for instead.
- writeable is no longer supported; use readonly instead.
So far, the only tested platform is Fedora Workstation 40, though at least Debian has been tested in the past. You should be able to adapt the instructions to other distributions.
- Check out the repo; the following commands should be run from the repo root.
- Install dependencies:
  dnf install llvm-devel clang-devel gtest-devel gmock-devel cmake ninja-build
  Inside a virtual environment (e.g., via direnv with layout python), install the Python dependencies (used in tests):
  pip install -r requirements.txt
- Set up the build:
  cmake -B build -G Ninja .
- Build and run the tests:
  PYTHONPATH=$PWD/build/tests ninja -C build test
See genpybind_add_module in tests/CMakeLists.txt for how to integrate genpybind into your build.
Top-level declarations are only exposed via the Python bindings (“visible”) if they have a GENPYBIND(…) annotation. Nested declarations, such as member variables and member functions, inherit the visibility of their parent by default.
There are several possible modifiers that can be passed as arguments to GENPYBIND(…) to affect how and where a declaration is exposed or to make use of advanced pybind11 features.
Behind the scenes, the GENPYBIND macro expands to an attribute, currently using the older GNU extension syntax __attribute__((…)). Consequently, you can consult the GCC documentation for details regarding attribute placement. Here are some common examples for your convenience:
struct GENPYBIND(visible) Example {
void hidden_method() GENPYBIND(hidden);
GENPYBIND(readonly)
int readonly_field = 3;
};
enum class GENPYBIND(visible) Enum {};
void example() GENPYBIND(visible);
namespace readme GENPYBIND(visible) {}
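The macro only becomes an attribute while genpybind parses the header, so annotated code also compiles with an ordinary compiler. The following is a hypothetical stand-in for genpybind.h that shows this behavior. It uses a made-up GENPYBIND_SKETCH_PARSING switch; see the real genpybind.h for the actual definitions.

```cpp
#include <cassert>  // assert(), for checking the snippet

// Hypothetical stand-in for genpybind.h: GENPYBIND_SKETCH_PARSING is made up.
// When it is set (meant for clang only), each annotation becomes a clang
// `annotate` attribute carrying its arguments as a string. Otherwise the
// macro expands to nothing, so any C++ compiler accepts the annotated code.
#if defined(GENPYBIND_SKETCH_PARSING)
#define GENPYBIND(...) __attribute__((annotate("GENPYBIND: " #__VA_ARGS__)))
#else
#define GENPYBIND(...)
#endif

namespace readme GENPYBIND(visible) {
class Example {
 public:
  GENPYBIND(getter_for(something))
  int getSomething() const { return m_value; }
  GENPYBIND(setter_for(something))
  void setSomething(int value) { m_value = value; }

 private:
  int m_value = 0;
};
}  // namespace readme
```

Without the switch, readme::Example behaves like any other class. For example, setting something to 42 and reading it back yields 42.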
TODO: Describe annotation argument types and when quotes can be omitted for string arguments.
visible and hidden
visible and hidden can be used to override the default visibility of a declaration. By default, top-level declarations are hidden, and nested declarations inherit the visibility of their parent. So one has to explicitly “opt in” to exposing a declaration. Any use of GENPYBIND(…) annotations (even without arguments) implies visible, unless hidden is used explicitly.
Namespaces are a special case: by default, they have no effect on the visibility of contained declarations, and other attributes on namespaces do not imply visible. However, an explicit visible annotation on a namespace can be used to make all nested declarations visible by default. The hidden keyword can then be used to exclude individual declarations again.
struct GENPYBIND() A {
GENPYBIND(hidden)
int some_field;
};
struct GENPYBIND(visible) B {};
// This would not have been exposed anyway, but we can
// include `hidden` to document our intent explicitly.
struct GENPYBIND(hidden) C {};
namespace example GENPYBIND(visible) {
struct Example {}; // Visible, even though there is no annotation.
}
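The rules above can be summarized as a small decision procedure. This is a hypothetical model written for illustration only. The Annotations struct and both functions are made up and are not part of genpybind. The model assumes that only an explicit visible on a namespace changes the default for nested declarations, and it ignores any other namespace annotation, including hidden.

```cpp
#include <cassert>  // assert(), for checking the model

// Hypothetical model of the visibility rules above (not genpybind's code).
struct Annotations {
  bool present = false;  // any GENPYBIND(...) annotation, even GENPYBIND()
  bool visible = false;  // explicit `visible`
  bool hidden = false;   // explicit `hidden`
};

// Effective visibility of a class, function, variable, etc.; `inherited` is
// the default passed down from the enclosing scope (false at the top level).
bool is_visible(const Annotations& a, bool inherited) {
  if (a.hidden) return false;  // explicit opt-out always wins
  if (a.visible) return true;  // explicit opt-in
  if (a.present) return true;  // any other annotation also implies `visible`
  return inherited;            // unannotated: use the parent's default
}

// Default for declarations nested in a namespace. Assumption: only an
// explicit `visible` changes it; other namespace annotations are ignored.
bool namespace_default(const Annotations& a, bool inherited) {
  return a.visible || inherited;
}
```

In this model, struct A corresponds to is_visible(Annotations{true}, false), which is true. Its some_field is hidden despite inheriting that default. example::Example is visible through namespace_default.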
By default, a declaration will be exposed using the name of its C++ identifier. expose_as can be used to choose a different name in the Python bindings:
This can also be used to define special methods like __repr__ or __hash__:
GENPYBIND(expose_as(__hash__))
int hash() const;
With GENPYBIND_MANUAL, you can always fall back on hand-written binding code that is embedded in the generated bindings. This can be a convenient escape hatch for pybind11 features that are not (yet) supported by genpybind.
Inside structs and classes, parent can be used to refer to the corresponding pybind11::class_ instance. If you need to access members of the parent class, use GENPYBIND_PARENT_TYPE instead of directly referring to its name. This is necessary because the class definition is not yet complete at the point where the macro is used.
struct GENPYBIND(visible) Example {
bool values[2] GENPYBIND(hidden) = {false, false};
GENPYBIND_MANUAL({
using Example = GENPYBIND_PARENT_TYPE;
parent.def("__getitem__",
[](Example& self, bool key) { return self.values[key]; });
parent.def("__setitem__", [](Example& self, bool key, bool value) {
self.values[key] = value;
});
})
};
If GENPYBIND_MANUAL is used at the top level, the contained code is emitted before all auto-generated binding code. This can be useful to, e.g., import another module (see the only_expose_in annotation on namespaces) that is used in function signatures:
GENPYBIND_MANUAL({
::pybind11::module::import("common");
})
The postamble modifier can be used to embed code after all auto-generated binding code, e.g., to dynamically patch the generated bindings:
GENPYBIND(postamble)
GENPYBIND_MANUAL({
auto example = parent.attr("Example");
// …patch example…
})
Note that at the top level, parent refers to the corresponding pybind11::module.
In general, different GENPYBIND_MANUAL blocks are emitted in the order in which they were defined.
Across all included headers, the annotations on a particular namespace have to match, as long as the namespace contains at least one annotated declaration exposed via the bindings:
namespace example GENPYBIND(module) {
struct GENPYBIND(visible) Example {};
}
// OK: No annotated declarations
namespace example {
struct Hidden {};
}
// OK: Same annotations
namespace example GENPYBIND(module) {
struct GENPYBIND(visible) Other {};
}
Namespaces can be annotated using module to turn them into sub-modules of the generated Python module. Namespaces that do not have this annotation have no effect on the module hierarchy of the generated Python bindings. E.g., if readme is the name of the top-level module, X in the following example would be exposed as readme.nested.X:
namespace nested GENPYBIND(module) {
class GENPYBIND(visible) X {};
} // namespace nested
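Here is a minimal sketch of how the Python path follows from the enclosing namespaces. It assumes that only namespaces annotated with module add a path component. The Scope struct and the python_path function are hypothetical helpers, not genpybind API.

```cpp
#include <cassert>  // assert(), for checking the sketch
#include <string>
#include <vector>

// One enclosing namespace, outermost first (hypothetical helper type).
struct Scope {
  std::string name;
  bool is_module;  // namespace carries GENPYBIND(module)
};

// Builds the dotted Python path of a declaration: namespaces without the
// `module` annotation are skipped, as they do not affect the hierarchy.
std::string python_path(const std::string& top_level_module,
                        const std::vector<Scope>& scopes,
                        const std::string& name) {
  std::string path = top_level_module;
  for (const Scope& scope : scopes) {
    if (scope.is_module) path += "." + scope.name;
  }
  return path + "." + name;
}
```

For the example above, python_path("readme", {{"nested", true}}, "X") yields "readme.nested.X". An additional non-module namespace around nested would not change the result.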
When generating multiple Python libraries, only_expose_in should be used to only expose declarations in the corresponding module. When used on a namespace, all nested declarations are only exposed if one of the arguments to only_expose_in matches the name of the top-level module, which is derived from the basename of the header file passed to genpybind. For example:
// In common.h:
namespace common GENPYBIND(only_expose_in(common)) {
struct GENPYBIND(visible) Example {};
}
// In downstream.h:
# include <…/common.h>
namespace downstream GENPYBIND(only_expose_in(downstream)) {
void sink(common::Example input) GENPYBIND(visible);
}
Example is only available via the common module, instead of being duplicated / exposed twice:
from common import Example
from downstream import sink
sink(Example())
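The module name check can be sketched as follows. Both functions are hypothetical. Stripping only the last file extension from the basename is an assumption about how genpybind derives the module name.

```cpp
#include <cassert>  // assert(), for checking the sketch
#include <initializer_list>
#include <string>

// Hypothetical: "…/common.h" -> "common" (basename without last extension).
std::string module_name_from_header(const std::string& path) {
  const auto slash = path.find_last_of('/');
  const std::string base =
      slash == std::string::npos ? path : path.substr(slash + 1);
  const auto dot = base.find_last_of('.');
  return dot == std::string::npos ? base : base.substr(0, dot);
}

// Hypothetical: a namespace's contents are exposed if any only_expose_in
// argument matches the name of the module currently being generated.
bool exposed_in(const std::string& module,
                std::initializer_list<const char*> only_expose_in) {
  for (const char* candidate : only_expose_in) {
    if (module == candidate) return true;
  }
  return false;
}
```

When generating downstream from downstream.h, exposed_in("downstream", {"common"}) is false. The common namespace is therefore skipped there.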
The arithmetic modifier can be used to expose arithmetic operations on the generated enum by passing the pybind11::arithmetic() tag to the pybind11::enum_ constructor:
enum GENPYBIND(arithmetic) Access { READ = 4, WRITE = 2, EXECUTE = 1 };
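For comparison, here is the same enum in plain C++, with the annotation omitted so the snippet compiles without genpybind.h. Unscoped enumerators convert to int, so flags can be combined with bitwise operators. The arithmetic tag is meant to make comparable operations available on the Python side. The permissions helper is hypothetical.

```cpp
#include <cassert>  // assert(), for checking the sketch

// Same enumerators as the annotated enum above, without GENPYBIND.
enum Access { READ = 4, WRITE = 2, EXECUTE = 1 };

// Hypothetical helper: combines the requested permissions into an int mask.
int permissions(bool read, bool write, bool execute) {
  return (read ? READ : 0) | (write ? WRITE : 0) | (execute ? EXECUTE : 0);
}
```

For example, permissions(true, true, false) equals READ | WRITE, i.e., 6.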
The export_values modifier controls whether enumerators are available in the parent scope. By default, this is only the case for unscoped enums. In the following example, the defaults are overridden so that RED is only available as example.Color.RED and HIGH is available as example.HIGH:
enum GENPYBIND(export_values(false)) Color { RED, GREEN, BLUE };
enum class GENPYBIND(export_values) Level { HIGH, MEDIUM, LOW };
The dynamic_attr modifier can be used to allow additional attributes to be set at runtime, by passing the pybind11::dynamic_attr() tag to the pybind11::class_ constructor. I.e., in the following example, thing.unknown_attribute = 5 would work on an instance thing = Thing().
struct GENPYBIND(dynamic_attr) Thing {};
By default, base classes are included as template parameters of pybind11::class_, so that the inheritance relationship is represented on the Python side. If that's not what you want, you can opt out using hide_base:
struct GENPYBIND(hide_base) HideAll : common::Base, Base2, Base3 {};
struct GENPYBIND(hide_base("common::Base")) HideOne : common::Base, Base2, Base3 {};
struct GENPYBIND(hide_base("Base2", "Base3")) HideTwo : common::Base, Base2, Base3 {};
The holder_type modifier can be used to set the holder type used to manage references to objects (defaults to std::unique_ptr<…>; see the pybind11 documentation on smart pointers).
struct GENPYBIND(holder_type("std::shared_ptr<Example>")) Example
: public std::enable_shared_from_this<Example> {
std::shared_ptr<Example> clone();
};
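A class like this one needs a std::shared_ptr holder because shared_from_this() requires the object to already be owned by a std::shared_ptr. Since C++17, calling it otherwise throws std::bad_weak_ptr, which is presumably why the default std::unique_ptr holder does not fit here. The sketch uses a hypothetical self() member in place of clone() to show the shared ownership the holder has to preserve.

```cpp
#include <cassert>  // assert(), for checking the sketch
#include <memory>

struct Example : public std::enable_shared_from_this<Example> {
  // Hypothetical member: returns another owning reference to *this. Only
  // valid if *this is already managed by a std::shared_ptr.
  std::shared_ptr<Example> self() { return shared_from_this(); }
};
```

With auto owner = std::make_shared<Example>(), the call owner->self() returns a pointer that shares ownership with owner, so the use count becomes 2.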
The implicit_conversion modifier can be added to converting constructors to denote that the corresponding conversion should be registered as an implicit conversion via pybind11::implicitly_convertible<…>:
struct GENPYBIND(visible) Implicit {
explicit Implicit(int value) GENPYBIND(implicit_conversion);
Implicit(Example example) GENPYBIND(implicit_conversion);
};
Similar to hide_base described above, inline_base has the effect that the inheritance relationship is not represented on the Python side. In addition, declarations nested in the base class are pulled in, as if they were defined in the current class. This is useful for mixins / CRTP code.
struct GENPYBIND(inline_base) InlineAll : common::Base, Base2, Base3 {};
struct GENPYBIND(inline_base("common::Base")) InlineOne : common::Base, Base2, Base3 {};
struct GENPYBIND(inline_base("Base2", "Base3")) InlineTwo : common::Base, Base2, Base3 {};
Explicit template instantiations have the same visibility as the corresponding template by default. They can be selectively exposed by adding any GENPYBIND annotation. expose_as can be used to rename individual instantiations. Otherwise, a fallback name is generated by replacing special characters with underscores. E.g., Some<int> is exposed as Some_int_.
template <typename T> struct ExposeSome {};
extern template struct GENPYBIND(expose_as(IntSomething))
ExposeSome<int>; // selectively exposed
extern template struct ExposeSome<double>; // not exposed
template <typename T> struct GENPYBIND(visible) ExposeAll {};
extern template struct ExposeAll<int>;
extern template struct GENPYBIND(expose_as(BoolEx)) ExposeAll<bool>;
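Here is a rough sketch of the fallback naming rule. It assumes every character other than ASCII letters, digits and underscores is replaced. fallback_name is a hypothetical helper, and genpybind's exact rule (e.g., for spaces or ::) may differ.

```cpp
#include <cassert>  // assert(), for checking the sketch
#include <cctype>
#include <string>

// Hypothetical: replace characters that cannot appear in a Python identifier
// (approximated as anything but ASCII alphanumerics and '_') with '_'.
std::string fallback_name(const std::string& spelling) {
  std::string result = spelling;
  for (char& c : result) {
    if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_') c = '_';
  }
  return result;
}
```

Under this rule, fallback_name("Some<int>") returns "Some_int_", which matches the example above.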
Type aliases are hidden (i.e., not exposed) by default and they do not inherit the default visibility. If they are marked as visible, a simple alias is created in the Python bindings by assigning a reference to the alias target to an attribute. I.e., the using alias in the following example is equivalent to the assignment X.Alias = Y in Python.
struct GENPYBIND(visible) X {
using Alias GENPYBIND(visible) = Y;
};
Note: Using declarations (https://en.cppreference.com/w/cpp/language/using_declaration) are not type aliases.
The expose_here modifier can be used to influence where the alias target is exposed. This can be useful to, e.g., pull in / “transplant” declarations from another module or a nested scope, or to selectively expose single-purpose template instances in a particular scope. The corresponding declarations are then no longer exposed in their original declaration context.
struct GENPYBIND(visible) Example {
using tag_type GENPYBIND(expose_here) = common::Tag<Example>;
};
The encourage modifier can be used to make the target of a type alias visible in its original scope. This can be useful to selectively instantiate templates. (This implies an “assignment”-style alias on the Python side, as described above.)
struct GENPYBIND(visible) Example {
using value_type GENPYBIND(encourage) =
common::Ranged<int, common::Gt<0>, common::Lt<5>>;
};
The keep_alive modifier corresponds to pybind11's call policy of the same name. It can be used to indicate the intended lifetime of objects passed to or returned from (member) functions: keep_alive(<bound>, <who>) means that <who> should be kept alive at least as long as <bound>. Both <who> and <bound> can be the name of a function parameter, return (the function's return value), or this (the instance a member function is called on). Behind the scenes, this is translated into the index-based notation used by pybind11.
struct GENPYBIND(visible) Container {
GENPYBIND(keep_alive(this, resource))
Container(Resource *resource);
};
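The translation into pybind11's index-based notation can be illustrated with a hypothetical helper, which is not genpybind's code. It follows pybind11's convention for keep_alive<Nurse, Patient>: index 0 is the return value, index 1 is the implicit this of member functions and constructors, and regular arguments follow. For the Container constructor above, keep_alive(this, resource) thus corresponds to pybind11::keep_alive<1, 2>().

```cpp
#include <cassert>  // assert(), for checking the sketch
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: maps a keep_alive argument name to a pybind11 index.
// `has_this` is true for member functions and constructors.
int keep_alive_index(const std::string& name,
                     const std::vector<std::string>& params, bool has_this) {
  if (name == "return") return 0;
  if (name == "this") {
    if (!has_this) throw std::invalid_argument("`this` in a free function");
    return 1;
  }
  const int offset = has_this ? 2 : 1;  // index of the first parameter
  for (std::size_t i = 0; i < params.size(); ++i) {
    if (params[i] == name) return static_cast<int>(i) + offset;
  }
  throw std::invalid_argument("unknown parameter: " + name);
}
```

The bound object becomes the Nurse and <who> becomes the Patient.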
noconvert can be used to disable implicit conversion for arguments passed via certain function parameters (multiple parameter names can be specified):
GENPYBIND(noconvert(value))
double no_ints_please(double value);
The required modifier can be used to prohibit None arguments for certain function parameters (multiple parameter names can be specified). It is equivalent to calling .none(false) on the corresponding pybind11::arg object.
GENPYBIND(required(example))
void required(Example *example);
The return_value_policy modifier can be used to set any return value policy supported by pybind11:
struct GENPYBIND(visible) Example {
GENPYBIND(return_value_policy(reference_internal))
Thing& thing();
};
getter_for and setter_for can be used to expose member functions as Python properties:
struct GENPYBIND(visible) Example {
GENPYBIND(getter_for(value))
int getValue() const;
GENPYBIND(setter_for(value))
void setValue(int value);
GENPYBIND(getter_for(readonly))
bool getReadonly() const;
};
Special methods like __eq__ are emitted for unary (+, -, !) and binary (+, -, *, /, %, ^, &, |, <, >, <<, >>, ==, !=, <=, >=) operators defined on classes. Operators can be either member functions or free functions in the associated namespace of the class (found via ADL). Where necessary, operators and parameters are swapped: e.g., operator<(int, T) cannot be exposed as int.__lt__, so it is exposed as T.__gt__ instead.
struct GENPYBIND(visible) Number {
bool operator==(Number other) const { return value == other.value; }
friend bool operator<(const Number &lhs, const Number &rhs) {
return lhs.value < rhs.value;
}
friend bool operator>(int lhs, Number rhs) { return lhs > rhs.value; }
};
TODO: Support for the spaceship operator is pending.
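The swapping rule for comparisons can be sketched as a lookup of the mirrored operator. mirrored_comparison is hypothetical. Applied to the Number example, operator>(int, Number) would end up as Number.__lt__, assuming genpybind applies the same rule it uses for operator<(int, T).

```cpp
#include <cassert>  // assert(), for checking the sketch
#include <string>

// Hypothetical: when the class is only the right-hand operand of `op`,
// returns the special method to define on the class instead, since
// `lhs op rhs` is equivalent to `rhs mirrored(op) lhs`.
std::string mirrored_comparison(const std::string& op) {
  if (op == "<") return "__gt__";
  if (op == ">") return "__lt__";
  if (op == "<=") return "__ge__";
  if (op == ">=") return "__le__";
  if (op == "==") return "__eq__";
  if (op == "!=") return "__ne__";
  return "";  // not a comparison operator
}
```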
std::ostream operators are only exposed when opted in via, e.g., expose_as(__repr__):
struct GENPYBIND(visible) Example {
GENPYBIND(expose_as(__str__))
friend std::ostream& operator<<(std::ostream& os, const Example& value);
};
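Here is a plain C++ sketch of such an operator, with the annotation omitted. Assuming the generated __str__ streams the object into a std::ostringstream, str(obj) in Python would return the same text as the hypothetical to_string helper below.

```cpp
#include <cassert>  // assert(), for checking the sketch
#include <ostream>
#include <sstream>
#include <string>

struct Example {
  int value = 0;
  friend std::ostream& operator<<(std::ostream& os, const Example& e) {
    return os << "Example(" << e.value << ")";
  }
};

// Hypothetical helper mirroring what the binding presumably does.
std::string to_string(const Example& e) {
  std::ostringstream stream;
  stream << e;
  return stream.str();
}
```

With this operator, to_string(Example{3}) returns "Example(3)".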
Variables are exposed using def_readonly and def_readwrite (and their _static variants) according to their constness.
The readonly modifier can be used if a non-const variable should be exposed as read-only:
struct GENPYBIND(visible) Example {
GENPYBIND(readonly)
int readonly_field = 0;
};
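The constness rule can be sketched with a type trait on the member pointer. def_method_for is a hypothetical helper that only returns the name of the pybind11 method that would be used. Static members (the _static variants) are not covered.

```cpp
#include <cassert>  // assert(), for checking the sketch
#include <string>
#include <type_traits>

// Hypothetical: for a pointer to data member of type `T Class::*`, T keeps
// the member's constness, so std::is_const<T> tells const from non-const.
template <typename Class, typename T>
std::string def_method_for(T Class::*, bool readonly_annotation = false) {
  return (std::is_const<T>::value || readonly_annotation) ? "def_readonly"
                                                          : "def_readwrite";
}

struct Example {
  const int constant_field = 1;
  int mutable_field = 2;
  int readonly_field = 0;  // would carry GENPYBIND(readonly)
};
```

For example, def_method_for(&Example::mutable_field) gives "def_readwrite". Passing true for the readonly annotation forces "def_readonly".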
Footnotes
¹ During normal compilation these macros have no effect on the generated code, as they are defined to be empty. The annotation system is implemented using the annotate attribute specifier, which is available as a GNU language extension via __attribute__((...)). As the annotation macros only have to be parsed by clang and are empty during normal compilation, the annotated code can still be compiled by any C++ compiler. See genpybind.h for the definition of the macros.