-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Embedding a skeleton / object #1017
Comments
Thanks for opening this issue! This is something I also bumped into. My program also relies on access to rodata as well as various maps during the lifetime of the BPF program. We store the Skeleton in a field of the main structure of this application. This is still a WIP, and not sure it's totally safe. I need to make sure Rust semantics can't cause any issues, so please don't use this code in production. This is how it roughly looks so far. Can update it once I get a more refined version, if this is useful. I ran it with ASAN and it didn't report any issues, but it's not a 100% guarantee that the code is correct. // Warning!! The code below is quite brittle. Do *not* use it as-is.
// [...]
impl for Profiler {
fn new() -> Self {
// [...]
let mut profiler_open_object = ManuallyDrop::new(Box::new(MaybeUninit::uninit()));
let mut open_skel = skel_builder
.open(&mut native_profiler_object)
.expect("open skel");
let native_unwinder_skel = ManuallyDrop::new(open_skel.load().expect("load skel"));
// Safety: this has to be dropped before `profiler_open_object`, as long as this is guaranteed it should
// be fine.
let native_unwinder_skel = unsafe {
std::mem::transmute::<ManuallyDrop<NativeProfilerSkel<'_>>,
ManuallyDrop<NativeProfilerSkel<'static>>>(native_unwinder_skel)
};
Profiler {
// Rust guarantees drop order, and custom `drop` impls run first, so make sure to free
// `profiler_open_object` after `native_unwinder_skel`.
profiler_open_object,
native_unwinder_skel,
}
}
}
// TODO: drop implementation in `Profiler` to free the `ManuallyDrop`s. Make sure the order
// does not do anything unsafe, |
I think a dance like what @javierhonduco suggests may be necessary at the moment. We should check whether we can improve, but it would be helpful to have a minimal example reproducing the issue. |
I wrote one at https://github.com/atenart/libbpf-rs-skel-embed. The code is pretty minimal. There's 3 commits:
Speaking of reproducers, is there any example reproducing the issues fixed by the introduction of the storage ( |
Sorry for the delayed response, I was on vacation. As c01e9b9 states, the main driver for the rework is the desire to access multiple maps/programs mutably. Here is a trivial refactoring that fails to build due to one such borrowing conflict on top of --- examples/tcp_ca/src/main.rs
+++ examples/tcp_ca/src/main.rs
@@ -137,13 +137,9 @@ fn test(name_to_register: Option<&CStr>, name_to_use: &CStr, verbose: bool) -> R
let () = ca_update.name[len..].fill(0);
}
- let ca_update_cong_control2 = open_skel
- .progs()
- .ca_update_cong_control2()
- .as_libbpf_object()
- .as_ptr();
+ let ca_update_cong_control2 = open_skel.progs().ca_update_cong_control2();
let ca_update = open_skel.struct_ops.ca_update_mut();
- ca_update.cong_control = ca_update_cong_control2;
+ ca_update.cong_control = ca_update_cong_control2.as_libbpf_object().as_ptr();
let mut skel = open_skel.load()?;
let mut maps = skel.maps_mut(); The storage logic is necessary to get around all the self-referentiality issues that we otherwise run into when objects are no longer hidden behind method calls. Real world example of ergonomic hit: the state before sched-ext/scx#445 I don't have every detail paged in anymore, but I believe the use-after-free may have been solvable without this rework -- it certainly wasn't the driving factor. |
Many thanks for that!
Can you pass in the object storage instead or where does the requirement of storing it alongside the skeleton come from? Given that diff --git src/main.rs src/main.rs
index 15f048..e07a90 100644
--- src/main.rs
+++ src/main.rs
@@ -1,10 +1,13 @@
+use std::mem::MaybeUninit;
+
use anyhow::Result;
mod manager;
use manager::*;
fn main() -> Result<()> {
- let mut mgr = KprobeManager::new()?;
+ let mut obj = MaybeUninit::uninit();
+ let mut mgr = KprobeManager::new(&mut obj)?;
mgr.attach("pskb_expand_head")?;
mgr.attach("kfree_skb_partial")?;
diff --git src/manager.rs src/manager.rs
index 3034b2..db8522 100644
--- src/manager.rs
+++ src/manager.rs
@@ -1,7 +1,7 @@
use std::mem::MaybeUninit;
use anyhow::{anyhow, Result};
-use libbpf_rs::skel::{OpenSkel, Skel, SkelBuilder};
+use libbpf_rs::{skel::{OpenSkel, Skel, SkelBuilder}, OpenObject};
mod bpf {
include!("bpf/kprobe.skel.rs");
@@ -9,21 +9,18 @@ mod bpf {
use bpf::*;
pub(crate) struct KprobeManager<'a> {
- storage: Box<MaybeUninit<libbpf_rs::OpenObject>>,
skel: KprobeSkel<'a>,
links: Vec<libbpf_rs::Link>,
}
-impl KprobeManager<'_> {
- pub(crate) fn new() -> Result<Self> {
- let mut storage = Box::new(MaybeUninit::uninit());
- let skel = KprobeSkelBuilder::default().open(&mut storage)?;
+impl<'a> KprobeManager<'a> {
+ pub(crate) fn new(storage: &'a mut MaybeUninit<OpenObject>) -> Result<Self> {
+ let skel = KprobeSkelBuilder::default().open(storage)?;
// Set rodata.
skel.maps.rodata_data.log_level = 0;
Ok(Self {
- storage,
skel: skel.load()?,
links: Vec::new(),
}) |
I thought about this but although it works and does not look to bad in the small reproducer, in our real world application things are more tricky. The main issue is those objects building probes ( |
Okay, short of that you can just fake a --- src/manager.rs
+++ src/manager.rs
@@ -1,4 +1,4 @@
-use std::mem::MaybeUninit;
+use std::mem::{transmute, MaybeUninit};
use anyhow::{anyhow, Result};
use libbpf_rs::skel::{OpenSkel, Skel, SkelBuilder};
@@ -8,23 +8,34 @@ mod bpf {
}
use bpf::*;
-pub(crate) struct KprobeManager<'a> {
- storage: Box<MaybeUninit<libbpf_rs::OpenObject>>,
- skel: KprobeSkel<'a>,
+pub(crate) struct KprobeManager {
+ // SAFETY: We must not hand out references with a 'static lifetime to
+ // this member. Rather, they should never outlive `self`.
+ // Furthermore, this member has to be listed before
+ // `_storage` to make sure we never end up with a dangling
+ // reference.
+ skel: KprobeSkel<'static>,
+ _storage: Box<MaybeUninit<libbpf_rs::OpenObject>>,
links: Vec<libbpf_rs::Link>,
}
-impl KprobeManager<'_> {
+impl KprobeManager {
pub(crate) fn new() -> Result<Self> {
let mut storage = Box::new(MaybeUninit::uninit());
- let skel = KprobeSkelBuilder::default().open(&mut storage)?;
+ // SAFETY: We keep a reference to the box around for the
+ // lifetime of self and we never hand out 'static
+ // references to this object.
+ let obj_ref = unsafe {
+ transmute::<&mut MaybeUninit<_>, &'static mut MaybeUninit<_>>(storage.as_mut())
+ };
+ let skel = KprobeSkelBuilder::default().open(obj_ref)?;
// Set rodata.
skel.maps.rodata_data.log_level = 0;
Ok(Self {
- storage,
skel: skel.load()?,
+ _storage: storage,
links: Vec::new(),
})
} This seems to be standard for self-referential stuff. |
Right, and control the order the storage and object are dropped. I used a third party object in the reproducer to hide the trick from the user, but your example should work too.
I don't know if it's standard as we're tricking the compiler, but that's the common way to make self-referential working atm :) |
Yeah, so that would be my recommendation. How you package it is really up to you. I suppose it may be possible to put all of that into the generated skeletons themselves and I think that would be fine. Not sure if I'll have the cycles to work on that any time soon, though, so if you need it in the near future it would be best to open a pull request. |
Do you mean embedding the storage in the skeleton and hiding it from the user? Something like the workaround from the reproducer but integrated.
Yeah, no worries, this was about raising awareness and discussing a potential way forward (and also making sure I wasn't missing something obvious). I might as well look into this at some point. |
Yep, that's what I meant. I don't know how involved it is, to be honest, but perhaps with a hidden intermediate type similar to what you have it won't be too bad. |
Thanks for this discussion @atenart and @d-e-s-o . I'm also in the process of trying to upgrade to the latest libbpf-rs and the bump broke my code. I tried the suggested transmute fix which allows me to "package" both the skeleton and its storage within my own structure. Cheers for that:
Another breakage from this change is that I am no longer able to have a global static (behind LazyLock) of Bpf kind. This is because of |
If the skeleton (or some embedded type) was |
Embedding a skeleton (
Skel
) in an internal structure is quite useful, eg. for loading a program once and attaching it to functions on-demand later on (attach_kprobe
). Up until 0.22 it was possible to either embed aSkel
or anObject
(taking it out of theSkel
).With the addition of commit c01e9b9 ("libbpf-cargo: Define maps as publicly accessible members") and follow-ups I'm now not sure how to have the same behavior, or if it is even possible. There are a few things preventing this:
MaybeUninit<libbpf_rs::OpenObject>
) needs to be provided to the skeleton builder to hold the (open)object and obviously have to live long enough. The internal structure now needs to embed both theSkel
and the storage, but this is impossible due to self-referencial.Skel
(after setting rodata). But internally it uses anOwnedRef
which will drop the storage if we don't keep theSkel
around. Or we needmem::forget(skel)
but that leaks memory (and does not look nice).Then I thought maybe the skeleton isn't meant for such usage and we could build an
Object
manually. This is honestly fine... until we need to access rodata. Then basically we go full circle and re-implement a skeleton (as we need the Rust rodata definition to be automatically generated to stay in-sync).I'm not really sure how to proceed. Hopefully I'm missing something :)
Thanks!
(The actual code is there, https://github.com/retis-org/retis/blob/59948504aa2f0da99cefb53057f7995baf8eed0b/retis/src/core/probe/kernel/kprobe.rs, but it's a bit complex to grab everything w/o reading quite a lot of code so hopefully the above explanation describes the generic issue).
The text was updated successfully, but these errors were encountered: