Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embedding a skeleton / object #1017

Open
atenart opened this issue Nov 18, 2024 · 14 comments
Open

Embedding a skeleton / object #1017

atenart opened this issue Nov 18, 2024 · 14 comments

Comments

@atenart
Copy link
Contributor

atenart commented Nov 18, 2024

Embedding a skeleton (Skel) in an internal structure is quite useful, eg. for loading a program once and attaching it to functions on-demand later on (attach_kprobe). Up until 0.22 it was possible to either embed a Skel or an Object (taking it out of the Skel).

With the addition of commit c01e9b9 ("libbpf-cargo: Define maps as publicly accessible members") and follow-ups I'm now not sure how to have the same behavior, or if it is even possible. There are a few things preventing this:

  1. A storage (MaybeUninit<libbpf_rs::OpenObject>) needs to be provided to the skeleton builder to hold the (open)object and obviously have to live long enough. The internal structure now needs to embed both the Skel and the storage, but this is impossible due to self-referencial.
  2. To workaround the above we could try directly using the storage, without the Skel (after setting rodata). But internally it uses an OwnedRef which will drop the storage if we don't keep the Skel around. Or we need mem::forget(skel) but that leaks memory (and does not look nice).

Then I thought maybe the skeleton isn't meant for such usage and we could build an Object manually. This is honestly fine... until we need to access rodata. Then basically we go full circle and re-implement a skeleton (as we need the Rust rodata definition to be automatically generated to stay in-sync).

I'm not really sure how to proceed. Hopefully I'm missing something :)

Thanks!

(The actual code is there, https://github.com/retis-org/retis/blob/59948504aa2f0da99cefb53057f7995baf8eed0b/retis/src/core/probe/kernel/kprobe.rs, but it's a bit complex to grab everything w/o reading quite a lot of code so hopefully the above explanation describes the generic issue).

@javierhonduco
Copy link
Contributor

javierhonduco commented Nov 19, 2024

Thanks for opening this issue! This is something I also bumped into. My program also relies on access to rodata as well as various maps during the lifetime of the BPF program. We store the Skeleton in a field of the main structure of this application.

This is still a WIP, and not sure it's totally safe. I need to make sure Rust semantics can't cause any issues, so please don't use this code in production. This is how it roughly looks so far. Can update it once I get a more refined version, if this is useful. I ran it with ASAN and it didn't report any issues, but it's not a 100% guarantee that the code is correct.

// Warning!! The code below is quite brittle. Do *not* use it as-is.

// [...]

impl for Profiler {
  fn new() -> Self {
    // [...]

    let mut profiler_open_object = ManuallyDrop::new(Box::new(MaybeUninit::uninit()));
    let mut open_skel = skel_builder
        .open(&mut native_profiler_object)
        .expect("open skel");
    let native_unwinder_skel = ManuallyDrop::new(open_skel.load().expect("load skel"));
    // Safety: this has to be dropped before `profiler_open_object`, as long as this is guaranteed it should
    // be fine.
    let native_unwinder_skel = unsafe {
        std::mem::transmute::<ManuallyDrop<NativeProfilerSkel<'_>>, 
        ManuallyDrop<NativeProfilerSkel<'static>>>(native_unwinder_skel)
    };
    Profiler {
        // Rust guarantees drop order, and custom `drop` impls run first, so make sure to free
        // `profiler_open_object` after `native_unwinder_skel`.
        profiler_open_object,
        native_unwinder_skel,
    }
  }
}

// TODO: drop implementation in `Profiler` to free the `ManuallyDrop`s. Make sure the order
// does not do anything unsafe,

@danielocfb
Copy link
Collaborator

I think a dance like what @javierhonduco suggests may be necessary at the moment. We should check whether we can improve, but it would be helpful to have a minimal example reproducing the issue.

@atenart
Copy link
Contributor Author

atenart commented Nov 28, 2024

it would be helpful to have a minimal example reproducing the issue.

I wrote one at https://github.com/atenart/libbpf-rs-skel-embed. The code is pretty minimal. There's 3 commits:

  1. Using libbpf-rs 0.22 and showing what was possible to write (embedding an Object in an internal structure).
  2. Upgrading to libbpf-rs 0.24, this does not compile.
  3. Adding a workaround to make things to work. The idea would be not to need this, but might be interesting to investigate possible solutions (to add to libbpf-rs).

Speaking of reproducers, is there any example reproducing the issues fixed by the introduction of the storage (MaybeUninit<OpenObject>)? That would help finding a solution that fits both needs.

@javierhonduco
Copy link
Contributor

@atenart I believe this was the repro for the UAF #852

@danielocfb
Copy link
Collaborator

danielocfb commented Dec 2, 2024

Sorry for the delayed response, I was on vacation. As c01e9b9 states, the main driver for the rework is the desire to access multiple maps/programs mutably.

Here is a trivial refactoring that fails to build due to one such borrowing conflict on top of v0.23.3:

--- examples/tcp_ca/src/main.rs
+++ examples/tcp_ca/src/main.rs
@@ -137,13 +137,9 @@ fn test(name_to_register: Option<&CStr>, name_to_use: &CStr, verbose: bool) -> R
         let () = ca_update.name[len..].fill(0);
     }

-    let ca_update_cong_control2 = open_skel
-        .progs()
-        .ca_update_cong_control2()
-        .as_libbpf_object()
-        .as_ptr();
+    let ca_update_cong_control2 = open_skel.progs().ca_update_cong_control2();
     let ca_update = open_skel.struct_ops.ca_update_mut();
-    ca_update.cong_control = ca_update_cong_control2;
+    ca_update.cong_control = ca_update_cong_control2.as_libbpf_object().as_ptr();

     let mut skel = open_skel.load()?;
     let mut maps = skel.maps_mut();

The storage logic is necessary to get around all the self-referentiality issues that we otherwise run into when objects are no longer hidden behind method calls.

Real world example of ergonomic hit: the state before sched-ext/scx#445

I don't have every detail paged in anymore, but I believe the use-after-free may have been solvable without this rework -- it certainly wasn't the driving factor.

@d-e-s-o
Copy link
Collaborator

d-e-s-o commented Dec 2, 2024

it would be helpful to have a minimal example reproducing the issue.

I wrote one at https://github.com/atenart/libbpf-rs-skel-embed.

Many thanks for that!

The code is pretty minimal. There's 3 commits:

  1. Using libbpf-rs 0.22 and showing what was possible to write (embedding an Object in an internal structure).
  2. Upgrading to libbpf-rs 0.24, this does not compile.

Can you pass in the object storage instead or where does the requirement of storing it alongside the skeleton come from? Given that KprobeManager already accounts for a lifetime, we may as well actually use it, would be my guess:

diff --git src/main.rs src/main.rs
index 15f048..e07a90 100644
--- src/main.rs
+++ src/main.rs
@@ -1,10 +1,13 @@
+use std::mem::MaybeUninit;
+
 use anyhow::Result;

 mod manager;
 use manager::*;

 fn main() -> Result<()> {
-    let mut mgr = KprobeManager::new()?;
+    let mut obj = MaybeUninit::uninit();
+    let mut mgr = KprobeManager::new(&mut obj)?;

     mgr.attach("pskb_expand_head")?;
     mgr.attach("kfree_skb_partial")?;
diff --git src/manager.rs src/manager.rs
index 3034b2..db8522 100644
--- src/manager.rs
+++ src/manager.rs
@@ -1,7 +1,7 @@
 use std::mem::MaybeUninit;

 use anyhow::{anyhow, Result};
-use libbpf_rs::skel::{OpenSkel, Skel, SkelBuilder};
+use libbpf_rs::{skel::{OpenSkel, Skel, SkelBuilder}, OpenObject};

 mod bpf {
     include!("bpf/kprobe.skel.rs");
@@ -9,21 +9,18 @@ mod bpf {
 use bpf::*;

 pub(crate) struct KprobeManager<'a> {
-    storage: Box<MaybeUninit<libbpf_rs::OpenObject>>,
     skel: KprobeSkel<'a>,
     links: Vec<libbpf_rs::Link>,
 }

-impl KprobeManager<'_> {
-    pub(crate) fn new() -> Result<Self> {
-        let mut storage = Box::new(MaybeUninit::uninit());
-        let skel = KprobeSkelBuilder::default().open(&mut storage)?;
+impl<'a> KprobeManager<'a> {
+    pub(crate) fn new(storage: &'a mut MaybeUninit<OpenObject>) -> Result<Self> {
+        let skel = KprobeSkelBuilder::default().open(storage)?;

         // Set rodata.
         skel.maps.rodata_data.log_level = 0;

         Ok(Self {
-            storage,
             skel: skel.load()?,
             links: Vec::new(),
         })

@atenart
Copy link
Contributor Author

atenart commented Dec 3, 2024

The code is pretty minimal. There's 3 commits:

  1. Using libbpf-rs 0.22 and showing what was possible to write (embedding an Object in an internal structure).
  2. Upgrading to libbpf-rs 0.24, this does not compile.

Can you pass in the object storage instead or where does the requirement of storing it alongside the skeleton come from? Given that KprobeManager already accounts for a lifetime, we may as well actually use it, would be my guess:

I thought about this but although it works and does not look to bad in the small reproducer, in our real world application things are more tricky. The main issue is those objects building probes (KprobeManager in the reproducer) can be added dynamically at runtime. Another issue would be only cosmetic but IMO wouldn't look very nice: those managers are built quite far away from main.

@danielocfb
Copy link
Collaborator

Okay, short of that you can just fake a 'static lifetime, as per my understanding:

--- src/manager.rs
+++ src/manager.rs
@@ -1,4 +1,4 @@
-use std::mem::MaybeUninit;
+use std::mem::{transmute, MaybeUninit};

 use anyhow::{anyhow, Result};
 use libbpf_rs::skel::{OpenSkel, Skel, SkelBuilder};
@@ -8,23 +8,34 @@ mod bpf {
 }
 use bpf::*;

-pub(crate) struct KprobeManager<'a> {
-    storage: Box<MaybeUninit<libbpf_rs::OpenObject>>,
-    skel: KprobeSkel<'a>,
+pub(crate) struct KprobeManager {
+    // SAFETY: We must not hand out references with a 'static lifetime to
+    //         this member. Rather, they should never outlive `self`.
+    //         Furthermore, this member has to be listed before
+    //         `_storage` to make sure we never end up with a dangling
+    //         reference.
+    skel: KprobeSkel<'static>,
+    _storage: Box<MaybeUninit<libbpf_rs::OpenObject>>,
     links: Vec<libbpf_rs::Link>,
 }

-impl KprobeManager<'_> {
+impl KprobeManager {
     pub(crate) fn new() -> Result<Self> {
         let mut storage = Box::new(MaybeUninit::uninit());
-        let skel = KprobeSkelBuilder::default().open(&mut storage)?;
+        // SAFETY: We keep a reference to the box around for the
+        //         lifetime of self and we never hand out 'static
+        //         references to this object.
+        let obj_ref = unsafe {
+            transmute::<&mut MaybeUninit<_>, &'static mut MaybeUninit<_>>(storage.as_mut())
+        };
+        let skel = KprobeSkelBuilder::default().open(obj_ref)?;

         // Set rodata.
         skel.maps.rodata_data.log_level = 0;

         Ok(Self {
-            storage,
             skel: skel.load()?,
+            _storage: storage,
             links: Vec::new(),
         })
     }

This seems to be standard for self-referential stuff.

@atenart
Copy link
Contributor Author

atenart commented Dec 4, 2024

Okay, short of that you can just fake a 'static lifetime, as per my understanding:

Right, and control the order the storage and object are dropped. I used a third party object in the reproducer to hide the trick from the user, but your example should work too.

This seems to be standard for self-referential stuff.

I don't know if it's standard as we're tricking the compiler, but that's the common way to make self-referential working atm :)

@d-e-s-o
Copy link
Collaborator

d-e-s-o commented Dec 4, 2024

Yeah, so that would be my recommendation. How you package it is really up to you. I suppose it may be possible to put all of that into the generated skeletons themselves and I think that would be fine. Not sure if I'll have the cycles to work on that any time soon, though, so if you need it in the near future it would be best to open a pull request.

@atenart
Copy link
Contributor Author

atenart commented Dec 4, 2024

I suppose it may be possible to put all of that into the generated skeletons themselves and I think that would be fine.

Do you mean embedding the storage in the skeleton and hiding it from the user? Something like the workaround from the reproducer but integrated.

Not sure if I'll have the cycles to work on that any time soon, though, so if you need it in the near future it would be best to open a pull request.

Yeah, no worries, this was about raising awareness and discussing a potential way forward (and also making sure I wasn't missing something obvious). I might as well look into this at some point.

@d-e-s-o
Copy link
Collaborator

d-e-s-o commented Dec 4, 2024

Do you mean embedding the storage in the skeleton and hiding it from the user? Something like the workaround from the reproducer but integrated.

Yep, that's what I meant. I don't know how involved it is, to be honest, but perhaps with a hidden intermediate type similar to what you have it won't be too bad.

@telbizov
Copy link

telbizov commented Dec 4, 2024

Thanks for this discussion @atenart and @d-e-s-o . I'm also in the process of trying to upgrade to the latest libbpf-rs and the bump broke my code. I tried the suggested transmute fix which allows me to "package" both the skeleton and its storage within my own structure. Cheers for that:

pub struct Bpf {
    pub skel: skel::MyOwnSkel<'static>,
    storage: Box<MaybeUninit<OpenObject>>,
}

Another breakage from this change is that I am no longer able to have a global static (behind LazyLock) of Bpf kind. This is because of std::ptr::NonNull not Sync, which is needed. I am still looking into this, but wanted to bring this up and get your input and suggestions on that matter.

@danielocfb
Copy link
Collaborator

Another breakage from this change is that I am no longer able to have a global static (behind LazyLock) of Bpf kind. This is because of std::ptr::NonNull not Sync, which is needed. I am still looking into this, but wanted to bring this up and get your input and suggestions on that matter.

If the skeleton (or some embedded type) was Sync beforehand and no longer is we should fix that by adding the implementation. Feel free to open a PR or spin off a separate issue if more discussion is necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants