
Four limitations of Rust’s borrow checker

I’ve been using Rust for hobby projects since 2016 and working professionally in Rust since 2021, so I consider myself pretty knowledgeable about Rust. I’m familiar with all the common limitations of Rust’s type system and how to work around them, so I rarely have to “fight the borrow checker” the way new Rust users often do. However, it still happens sometimes.

In this post, I’ll cover four surprising limitations of the borrow checker that I’ve run into in the course of my work.

Also note that when I say “impossible”, I mean that it can’t be done the way Rust’s type system currently works, i.e. with static type checking. You can get around any of these problems by using unsafe code or by using runtime checks (e.g. “just wrap everything in Arc<Mutex<_>>”). However, having to resort to that still represents a limitation of the type system. It’s not that you literally can’t solve the problem, because there are always escape hatches (and I’ll show examples of the escape hatches I used below), but you can’t solve it in a way that makes Rust Rust.

Limitation #1: Returning a mutable reference from a conditional lookup

In this case, I actually had someone come to me for help with this problem. Then I forgot about it, only to run into the exact same wall later in my own work, so it seems to be a common problem.

This problem usually appears when you want to look up a value in a hashmap and do something different if it doesn’t exist. For example, suppose you want to look up a key in a hashmap, and if it isn’t present, fall back to looking up a second key. With shared references, you can do it easily like this:

fn double_lookup(map: &HashMap<String, String>, mut k: String) -> Option<&String> {
    if let Some(v) = map.get(&k) {
        return Some(v);
    }

    k.push_str("-default");
    map.get(&k)
}

Normally you would return &str rather than &String, but I’m using String here for simplicity and clarity.

One nice thing about Rust is that it helps you avoid unnecessary work like redundant hashmap lookups. Instead of first checking whether a value is in the map and then accessing it (with an unnecessary second lookup), you call get(), which returns an Option, allowing you to do everything in one call.
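As a minimal illustration of that pattern (the counter map here is invented for the example), a single get() call covers both the hit and miss cases with one hash lookup:

```rust
use std::collections::HashMap;

// One hash lookup: get() returns an Option covering both cases.
// The alternative, `if map.contains_key(k) { map[k] } else { 0 }`,
// would hash `k` twice.
fn count_for(map: &HashMap<String, u32>, k: &str) -> u32 {
    match map.get(k) {
        Some(v) => *v,
        None => 0,
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert("apples".to_string(), 3);
    assert_eq!(count_for(&map, "apples"), 3);
    assert_eq!(count_for(&map, "pears"), 0);
    println!("ok");
}
```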

At least, you can do that most of the time. Unfortunately, sometimes borrow checker limitations get in the way. Specifically, let’s say we want to do the same thing as above, but return an exclusive (&mut) reference rather than a shared (&) reference:

fn double_lookup_mut(map: &mut HashMap<String, String>, mut k: String) -> Option<&mut String> {
    if let Some(v) = map.get_mut(&k) {
        return Some(v);
    }

    k.push_str("-default");
    map.get_mut(&k)
}

Try to compile it, and the compiler complains:

error[E0499]: cannot borrow `*map` as mutable more than once at a time
  --> src/main.rs:46:5
   |
40 | fn double_lookup_mut(map: &mut HashMap, mut k: String) -> Option<&mut String> {
   |                        - let's call the lifetime of this reference `'1`
41 |    if let Some(v) = map.get_mut(&k) {
   |                    --- first mutable borrow occurs here
42 |        return Some(v);
   |                ------- returning this value requires that `*map` is borrowed for `'1`
...
46 |    map.get_mut(&k)
   |    ^^^ second mutable borrow occurs here

The first get_mut call borrows map and returns an Option that may contain a borrowed reference. If it does, we immediately return that value, and in the branch where it doesn’t, we don’t use the borrow at all. However, the borrow checker’s ability to analyze control flow is limited, and it currently can’t reason about cases like this.

So, from the borrow checker’s perspective, the first get_mut call leaves map borrowed for the entire rest of the function, which makes it impossible to do anything else with it.

To work around this limitation, we have to do a redundant check-then-lookup like this:

fn double_lookup_mut2(map: &mut HashMap<String, String>, mut k: String) -> Option<&mut String> {
    // We look up k here:
    if map.contains_key(&k) {
        // and then look it up again here for no reason.
        return map.get_mut(&k);
    }

    k.push_str("-default");
    map.get_mut(&k)
}
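Here is the workaround as a self-contained program exercising both branches (the key names are invented for illustration):

```rust
use std::collections::HashMap;

fn double_lookup_mut2(map: &mut HashMap<String, String>, mut k: String) -> Option<&mut String> {
    // We look up k here:
    if map.contains_key(&k) {
        // and then look it up again here for no reason.
        return map.get_mut(&k);
    }

    k.push_str("-default");
    map.get_mut(&k)
}

fn main() {
    let mut map = HashMap::new();
    map.insert("color".to_string(), "red".to_string());
    map.insert("size-default".to_string(), "medium".to_string());

    // Direct hit on "color": we get a mutable reference to its value.
    double_lookup_mut2(&mut map, "color".to_string()).unwrap().push_str("dish");
    assert_eq!(map["color"], "reddish");

    // "size" misses, so we fall back to "size-default".
    assert_eq!(
        double_lookup_mut2(&mut map, "size".to_string()).unwrap().as_str(),
        "medium"
    );

    // A key with no fallback entry returns None.
    assert!(double_lookup_mut2(&mut map, "missing".to_string()).is_none());
    println!("ok");
}
```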

Limitation #2: Async callbacks that borrow their argument

Suppose you have a vec, and you want to use encapsulation so that users don’t have to worry about the internal implementation details. So instead of exposing the vec directly, you provide a method that takes a user-supplied callback and calls it on every element.

struct MyVec<T>(Vec<T>);
impl<T> MyVec<T> {
    pub fn for_all(&self, mut f: impl FnMut(&T)) {
        for v in self.0.iter() {
            f(v);
        }
    }
}

You can use it like this:

let mv = MyVec(vec!(1,2,3));
mv.for_all(|v| println!("{}", v));

let mut sum = 0;
// Can also capture values in the callback
mv.for_all(|v| sum += v);    

Pretty simple, right? Now suppose that you want to allow async callbacks. We want to be able to do something like this:

mv.async_for_all(|v| async move {println!("{}", v)}).await;

… yes, good luck with that. I spent a while trying everything I could think of, but as far as I can tell, there is no way to express the required type signature in Rust today. Rust recently added the use<'a> syntax, and somewhat less recently, generic associated types, but even those don’t help. The problem is that the future type returned by the callback must depend on the lifetime of the argument, and Rust doesn’t let you be generic over a type that is itself parameterized by a lifetime.

I may be wrong about this, in which case, feel free to speak up. If there is a way to do this, I would like to know.

Ok, so it’s not possible to take an async callback that borrows its argument. However, in the toy example above, we’re only dealing with simple integers. Let’s drop the generics and also pass everything by value instead of by reference:

struct MyVec(Vec<u32>);
impl MyVec {
    pub fn for_all(&self, mut f: impl FnMut(u32)) {
        for v in self.0.iter().copied() {
            f(v);
        }
    }

    pub async fn async_for_all<Fut>(&self, mut f: impl FnMut(u32) -> Fut)
        where Fut: Future<Output = ()>,
    {
        for v in self.0.iter().copied() {
            f(v).await;
        }
    }
}

This actually works for our first example. The following compiles fine:

mv.async_for_all(|v| async move {println!("{}", v);}).await;

Unfortunately, it still doesn’t work when we try to pass a callback that actually captures a mutable reference:

let mut sum = 0;
let r = &mut sum;
mv.async_for_all(|v| async move {*r += v}).await;

error[E0507]: cannot move out of `r`, a captured variable in an `FnMut` closure
   --> src/main.rs:137:26
    |
136 |   let r = &mut sum;
    |       - captured outer variable
137 |   mv.async_for_all(|v| async move {*r += v}).await;
    |                   --- ^^^^^^^^^^  --
    |                   |   |           |
    |                   |   |           variable moved due to use in coroutine
    |                   |   |           move occurs because `r` has type `&mut u32`, which does not implement the `Copy` trait
    |                   |   `r` is moved here
    |                   captured by this `FnMut` closure

The problem here is that the signature of async_for_all above is not general enough.

Limitation #3: FnMut closures can’t lend out their captures

What kind of closure do we have, anyway? To understand the problem, let’s try writing the closure by hand using explicit types.

First, we need to create a type for the future that we will return. Normally, it is impossible to write your own future in safe Rust, but in a simple case like this with no borrowing across polls, it actually is possible:

struct MyFut<'a>{
    r: &'a mut u32,
    v: u32,
}
impl<'a> Future for MyFut<'a> {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        *self.r += self.v;
        Poll::Ready(())
    }    
}

Now we need a type that represents the closure itself:

struct SumCallback<'a> {
    r: &'a mut u32,
}
impl<'a> SumCallback<'a> {
    fn call_mut<'s>(&'s mut self, v: u32) -> MyFut<'s> {
        MyFut{r: &mut self.r, v}
    }
}

Note: The 's lifetime could be elided, but I write it out explicitly here for clarity.

This code compiles. The problem is that the signature of the call_mut method we wrote is not quite the same as the signature of the real FnMut trait. The real FnMut trait forces the output type to be independent of the self lifetime.

FnMut was probably designed this way because a) Rust didn’t have generic associated types at launch and b) it’s not clear what shorthand syntax you would use. You could imagine a magic 'self lifetime syntax, which would let you write the type as impl FnMut(u32) -> MyFut<'self>, but that’s a bit of a hack, and it wouldn’t work if you nest impl FnMuts. In any case, that’s not how FnMut works today, and so we’re stuck again.

Incidentally, Rust has three function traits, Fn, FnMut, and FnOnce, whose call methods take the receiver by &self, &mut self, and self respectively. However, FnMut is the only one where the lack of a self lifetime is an issue. In the case of Fn, any references to captured values must be shared references, which are Copy, so there is no problem returning a reference with the lifetime of the whole closure type. For FnOnce, the closure is consumed by the call, so the question of borrowing the captured values doesn’t even arise.

The reason FnMut is the odd one out is that &mut references are the only case where reborrowing is relevant. In our call_mut method, we don’t return the captured r reference directly (with lifetime 'a). Instead, we return a temporary reborrow of that reference with the lifetime 's. If r were a &u32 rather than a &mut u32, it would be Copy, and we could return the full 'a reference with no problem.
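One practical consequence of the Copy observation: if interior mutability is acceptable for your use case, you can sidestep the FnMut limitation by capturing a &Cell<u32> (which is Copy) instead of a &mut u32. The sketch below shows this workaround; the hand-rolled no-op waker is just so the example can poll the future to completion without pulling in a runtime, and all the names here are made up for illustration:

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

struct MyVec(Vec<u32>);
impl MyVec {
    pub async fn async_for_all<Fut>(&self, mut f: impl FnMut(u32) -> Fut)
    where
        Fut: Future<Output = ()>,
    {
        for v in self.0.iter().copied() {
            f(v).await;
        }
    }
}

// A no-op waker, so we can poll the future without an executor.
fn noop_waker() -> Waker {
    fn noop(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn run() -> u32 {
    let mv = MyVec(vec![1, 2, 3]);
    let sum = Cell::new(0u32);
    let sum_ref = &sum; // &Cell<u32> is Copy, unlike &mut u32

    // The async move block copies `sum_ref`, so the FnMut closure never
    // has to move a non-Copy capture out of itself.
    let fut = mv.async_for_all(|v| async move { sum_ref.set(sum_ref.get() + v) });

    // None of the inner futures ever suspend, so a single poll completes it.
    let mut fut = pin!(fut);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Ready(())));
    sum.get()
}

fn main() {
    println!("sum = {}", run());
}
```

Of course, Cell only works for simple Copy payloads like this; the underlying FnMut limitation is still there for anything that genuinely needs an exclusive borrow.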

Limitation #4: Send checking ignores control flow

Here’s a simplified version of some code I wrote earlier at work:

async fn update_value(foo: Arc<std::sync::Mutex<Foo>>, new_val: u32) {
    let mut locked_foo = foo.lock().unwrap();

    let old_val = locked_foo.val;
    if new_val == old_val {
        locked_foo.send_no_changes();
    } else {
        // Release the mutex so we don't hold it across an await point.
        std::mem::drop(locked_foo);
        // Now do some expensive work
        let changes = get_changes(old_val, new_val).await;
        // And send the result
        foo.lock().unwrap().send_changes(changes);
    }
}

We lock the object, and if the field hasn’t changed, we take the fast path; otherwise, we drop the lock, do some expensive processing, and then lock it again to send the update.

As an aside, I’m sure someone will ask what happens if foo.val changes while the lock is dropped. In this case, this is the only task that writes to the field, so that can’t happen (the only reason we need the mutex at all is that there’s another task that reads the field). Also, since we don’t do anything expensive while holding the lock and don’t expect any real contention, we just use a regular std::sync::Mutex rather than the more typical async-aware tokio Mutex, but that’s not relevant to the problem discussed here.

So what’s the problem? There is no problem, as long as this only runs in the root task. With the standard multithreaded Tokio runtime, you can run a future on the main thread using block_on, and that future doesn’t need to be Send. However, any other tasks you spawn require their futures to be Send.

To increase parallelism and avoid blocking the main thread, I wanted to move this code out of the main thread and into a separate task. Unfortunately, the future above is not Send and therefore cannot be spawned as a task.

note: future is not `Send` as this value is used across an await
   --> src/main.rs:183:53
    |
175 |   let mut locked_foo = foo.lock().unwrap();
    |       -------------- has type `MutexGuard<'_, Foo>` which is not `Send`
...
183 |       let changes = get_changes(old_val, new_val).await;
    |                                                   ^^^^^ await occurs here, with `mut locked_foo` maybe used later

Now, this code should be Send. After all, it never actually holds the lock across an await point (which would risk deadlocks). However, the compiler currently does not perform any control flow analysis when deciding whether a future is Send, and so we get a false positive.

As a workaround, I had to move the lock into an explicit scope, and then duplicate the if condition and move the else branch outside the scope:

async fn update_value(foo: Arc<std::sync::Mutex<Foo>>, new_val: u32) {
    let old_val = {
        let mut locked_foo = foo.lock().unwrap();

        let old_val = locked_foo.val;
        if new_val == old_val {
            locked_foo.send_no_changes();
        }
        old_val
        // Drop the lock here, so the compiler understands this is Send
    };

    if new_val != old_val {
        let changes = get_changes(old_val, new_val).await;
        foo.lock().unwrap().send_changes(changes);
    }
}
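To sanity-check a refactor like this, a small compile-time probe is handy. The sketch below only type-checks if the future really is Send; note that Foo, its send methods, and get_changes are stand-ins invented for illustration, not the real code:

```rust
use std::sync::{Arc, Mutex};

struct Foo {
    val: u32,
}
impl Foo {
    fn send_no_changes(&mut self) {}
    fn send_changes(&mut self, _changes: u32) {}
}

// Stand-in for the expensive async work.
async fn get_changes(old_val: u32, new_val: u32) -> u32 {
    new_val.wrapping_sub(old_val)
}

async fn update_value(foo: Arc<Mutex<Foo>>, new_val: u32) {
    let old_val = {
        let mut locked_foo = foo.lock().unwrap();

        let old_val = locked_foo.val;
        if new_val == old_val {
            locked_foo.send_no_changes();
        }
        old_val
        // The guard is dropped here, so the compiler can see that the
        // await below never holds it.
    };

    if new_val != old_val {
        let changes = get_changes(old_val, new_val).await;
        foo.lock().unwrap().send_changes(changes);
    }
}

// Compile-time probe: this only type-checks if `T: Send`.
fn assert_send<T: Send>(t: T) -> T {
    t
}

fn main() {
    let foo = Arc::new(Mutex::new(Foo { val: 1 }));
    // With the original version of update_value, this line fails to compile.
    let _fut = assert_send(update_value(foo, 2));
    println!("future is Send");
}
```

The same trick works for any future you intend to hand to a spawn function: wrap it in assert_send at the call site and the compiler points straight at whatever non-Send value crosses an await.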

Conclusion

Rust’s type system already works well in most cases, but there are occasional surprises. No static type system can allow all valid programs, due to undecidability issues, but programming languages can get good enough that this is rarely a practical problem. One of the challenges of programming language design is figuring out how to allow as many reasonable programs as you can within your complexity and performance budget (this includes not only the compiler implementation but the complexity of the language itself, and especially the type system).

Of the issues I’ve highlighted here, #1 and #4 in particular seem like obvious things to fix that would provide a lot of value for relatively little cost. #2 and #3 are trickier, because they require changes to the type syntax and carry a high complexity cost. Still, it’s unfortunate how much worse async Rust is compared to classic non-async Rust.

2024-12-22 10:33:00
