Rust for cpp devs - Ownership

2021-04-05 找不到工作

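Programming languages generally manage memory in one of two ways:

Garbage collection, as in Java and Go: the runtime detects memory that is no longer in use and reclaims it, at some cost in program speed.
Manual allocation and deallocation, as in C++: memory leaks are easy to introduce.

Rust takes a third approach: it manages memory through a set of ownership rules. These rules are checked at compile time, so they do not slow the program down at run time.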

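Ownership rules

Rust has three ownership rules:

Each value in Rust has a variable that is its owner
At any given time, each value has exactly one owner
When the owner goes out of scope, the value is dropped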

    {                      // s is not valid here, it’s not yet declared
        let s = "hello";   // s is valid from this point forward

        // do stuff with s
    }                      // this scope is now over, and s is no longer valid
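The third rule can be observed directly. The following is a minimal sketch, assuming a hypothetical helper type Noisy (not part of the original article) that records its name in a shared log when it is dropped:

```rust
use std::cell::RefCell;

// Hypothetical helper type: records its name in a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the order in which the two values were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Noisy { name: "outer", log: &log };
        {
            let _inner = Noisy { name: "inner", log: &log };
        } // _inner's scope ends here, so it is dropped first
    } // _outer's scope ends here
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```

Each value is dropped at the closing brace of the scope that owns it, so the inner value is logged before the outer one.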

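Memory and allocation

We use the String type as an example of how Rust manages ownership.

A String is represented in memory by:

A block of raw data, allocated on the heap, holding the contents of the string.
A struct, allocated on the stack (for a local variable), containing a pointer ptr to the data, the string length len, and the string capacity capacity.

This is essentially the same as the representation of a slice in Go.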

Figure: How a String is stored in memory
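The three fields can be inspected through the standard library. This is only a sketch: len_and_capacity is a hypothetical helper name, and the size printed by size_of reflects the current implementation rather than a language guarantee.

```rust
use std::mem::size_of;

// Returns (len, capacity) of a String built with room for at least 16 bytes.
fn len_and_capacity() -> (usize, usize) {
    let mut s = String::with_capacity(16); // reserve heap space up front
    s.push_str("hello");                   // 5 bytes of UTF-8 data
    (s.len(), s.capacity())
}

fn main() {
    let s = String::from("hello");
    println!("ptr = {:p}", s.as_ptr()); // address of the heap data
    println!("len = {}, capacity = {}", s.len(), s.capacity());
    println!("size_of::<String>() = {} bytes", size_of::<String>());
    println!("{:?}", len_and_capacity());
}
```

The length counts the bytes in use, while the capacity counts the bytes allocated, so the capacity is never smaller than the length.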

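Several memory operations can be applied to it:

Move
Clone
Copy, supported only for data stored entirely on the stack

One of Rust's design principles is that Rust never automatically makes a deep copy of data. Any implicit copy is therefore cheap.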

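Move

A move happens when one variable is assigned to another. It is very similar to std::move in C++, except that in Rust it is the default behavior of assignment (for types that do not implement Copy). This is what upholds the ownership rule:

At any given time, each value has exactly one owner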

fn main() {
    let s1 = String::from("hello");
    let s2 = s1;

    println!("{}, world!", s1);
}

This program fails to compile:

error[E0382]: borrow of moved value: `s1`

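The reason is that s1 is invalidated by the assignment and can no longer be used. Rust also does not free the memory s1 pointed to when s1 goes out of scope, because ownership of it has been transferred to s2.

Figure: Memory operations in a Move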
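To make the move concrete, here is a small sketch (move_keeps_buffer is a hypothetical name) that checks that the heap buffer stays where it is and only the stack struct changes owner:

```rust
// Returns true if the heap buffer has the same address before and after the move.
fn move_keeps_buffer() -> bool {
    let s1 = String::from("hello");
    let before = s1.as_ptr(); // raw address of the heap data
    let s2 = s1;              // move: ptr, len and capacity are copied to s2
    let after = s2.as_ptr();
    before == after           // no heap data was copied
}

fn main() {
    println!("same buffer after move: {}", move_keeps_buffer());
}
```

The raw pointer is only compared, never dereferenced, so no unsafe code is needed.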

Clone

If we really do need to copy the contents on the heap, and not just the data on the stack, we can use the clone method.

fn main() {
    let s1 = String::from("hello");
    let s2 = s1.clone();

    println!("s1 = {}, s2 = {}", s1, s2);
}

The program compiles, and s1 and s2 now each hold their own copy of the heap data, as the figure below shows:

Figure: Memory operations in a Clone
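A short sketch of what the separate copies mean in practice (clone_is_independent is a hypothetical name): the clone has its own buffer, so changing it leaves the original untouched.

```rust
// Returns the original, the modified clone, and whether their buffers differ.
fn clone_is_independent() -> (String, String, bool) {
    let s1 = String::from("hello");
    let mut s2 = s1.clone();                   // deep copy of the heap data
    let separate = s1.as_ptr() != s2.as_ptr(); // two distinct allocations
    s2.push_str(", world");                    // only s2's data changes
    (s1, s2, separate)
}

fn main() {
    let (s1, s2, separate) = clone_is_independent();
    println!("s1 = {}, s2 = {}, separate buffers: {}", s1, s2, separate);
}
```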

Copy

Data that does not need heap space can still be used after assignment, with no clone required:

fn main() {
    let x = 5;
    let y = x;

    println!("x = {}, y = {}", x, y);
}

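This is because there is no heap memory to free afterwards, and for such a value a deep copy and a shallow copy are the same thing, so clone is unnecessary. Such types implement the Copy trait.

These types include:

All integer types, such as u32
The Boolean type, bool
All floating-point types, such as f64
The character type, char
Tuples made up only of the types above.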
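The list above can be checked at compile time. The sketch below assumes a hypothetical helper assert_copy, whose only job is to compile when its type parameter implements the Copy trait:

```rust
// Compiles only if T implements Copy; does nothing at run time.
fn assert_copy<T: Copy>() {}

// A tuple of Copy types is itself Copy, so `a` stays usable after `let b = a`.
fn copy_a_tuple() -> ((i32, char), (i32, char)) {
    let a = (1, 'x');
    let b = a; // copy, not move
    (a, b)
}

fn main() {
    assert_copy::<u32>();
    assert_copy::<bool>();
    assert_copy::<f64>();
    assert_copy::<char>();
    assert_copy::<(i32, f64)>();
    // assert_copy::<String>();        // would not compile: String is not Copy
    // assert_copy::<(i32, String)>(); // would not compile either
    println!("{:?}", copy_a_tuple());
}
```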

Ownership and Functions

When we pass an argument to a function, either a Move or a Copy takes place.

fn main() {
    let s = String::from("hello");  // s comes into scope

    takes_ownership(s);             // s's value moves into the function...
                                    // ... and so is no longer valid here

    let x = 5;                      // x comes into scope

    makes_copy(x);                  // x would move into the function,
                                    // but i32 is Copy, so it’s okay to still
                                    // use x afterward

} // Here, x goes out of scope, then s. But because s's value was moved, nothing
  // special happens.

fn takes_ownership(some_string: String) { // some_string comes into scope
    println!("{}", some_string);
} // Here, some_string goes out of scope and `drop` is called. The backing
  // memory is freed.

fn makes_copy(some_integer: i32) { // some_integer comes into scope
    println!("{}", some_integer);
} // Here, some_integer goes out of scope. Nothing special happens.

If we use s after calling takes_ownership, compilation fails, because ownership is checked statically. x, by contrast, is passed by Copy, so the problem does not arise.
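One way to keep using a String after such a call is to pass a clone, at the cost of a heap copy. A sketch, with consume and call_and_keep as hypothetical names:

```rust
// Takes ownership of its argument; the String is dropped when the function returns.
fn consume(s: String) -> usize {
    s.len()
}

// Passes a clone into `consume`, so the original is never moved.
fn call_and_keep() -> (String, usize) {
    let s = String::from("hello");
    let n = consume(s.clone()); // the clone moves into the function
    (s, n)                      // s is still valid here
}

fn main() {
    let (s, n) = call_and_keep();
    println!("'{}' has length {}", s, n);
}
```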

References and Borrowing

Under this mechanism, a function that computes the length of a string has to take ownership of the string and then hand it back through its return value:

fn calculate_length(s: String) -> (String, usize) {
    let length = s.len();
    return (s, length);
}

fn main() {
    let s1 = String::from("hello");

    let (s2, len) = calculate_length(s1);

    println!("The length of '{}' is {}.", s2, len);
}

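This is clearly inconvenient, so Rust provides references, which let us use a value without taking ownership of it. & denotes an immutable (shared) reference, and &mut denotes a mutable reference.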

fn calculate_length(s: &String) -> usize {
    return s.len();
}

fn main() {
    let s1 = String::from("hello");

    let len = calculate_length(&s1);

    println!("The length of '{}' is {}.", s1, len);
}

The function now reads much more naturally. Note that we use & at the call site to pass a reference:

    let len = calculate_length(&s1);

In the function signature, the parameter is likewise explicitly declared as a reference:

fn calculate_length(s: &String) -> usize { // s is a reference to a String
    return s.len();
} // Here, s goes out of scope. But because it does not have ownership of what
  // it refers to, nothing happens.

Figure: A reference
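Because a reference does not take ownership, the same value can be borrowed several times and still be used, or even moved, afterwards. A sketch, with borrow_twice as a hypothetical name and calculate_length repeated from above:

```rust
fn calculate_length(s: &String) -> usize {
    s.len()
}

// Borrows s1 through two shared references, then moves s1 out.
fn borrow_twice() -> (usize, usize, String) {
    let s1 = String::from("hello");
    let r1 = &s1; // any number of immutable references may coexist
    let r2 = &s1;
    let a = calculate_length(r1);
    let b = calculate_length(r2);
    (a, b, s1) // r1 and r2 are no longer used, so s1 can be moved
}

fn main() {
    println!("{:?}", borrow_twice());
}
```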

Mutable references

We can also declare a mutable reference with &mut. For example:

fn change(s: &mut String) {
    s.push_str(", world")
}

fn main() {
    let mut s1 = String::from("hello");

    change(&mut s1);

    println!("s1 = {}", s1);
}

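In Rust, mutable references are subject to the following restrictions:

For any one piece of data, only one mutable reference can be live at a time.
While a mutable reference is live, immutable references to the same data cannot be used.

The following code fails to compile (error E0499) because it creates two mutable references to s, r1 and r2.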

fn main() {
    let mut s = String::from("hello");

    let r1 = &mut s;
    let r2 = &mut s;

    println!("{}, {}", r1, r2);
}

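The following code also fails to compile (error E0502) because it creates both an immutable reference and a mutable reference to s.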

fn main() {
    let mut s = String::from("hello");

    let r1 = & s;
    let r2 = &mut s;

    println!("r1 = {}, r2 = {}", r1, r2);
}

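These restrictions exist to prevent data races. Whoever uses r1 expects the data not to change while it is in use, yet r2 is a mutable reference.

If the earlier immutable references r1 and r2 are no longer used once the mutable reference r3 has been created, the code compiles without problems.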

fn main() {
    let mut s = String::from("hello");

    let r1 = & s;  // no problem
    let r2 = & s;  // no problem
    println!("r1 = {}, r2 = {}", r1, r2);
    // r1 and r2 are not used after this point, so their borrows end here

    let r3 = &mut s;  // no problem
    println!("r3 = {}", r3);
}

These restrictions mainly serve to guarantee run-time data safety at compile time.
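The same idea applies to two mutable references: they are fine as long as they are never live at the same time. A sketch, with append_twice as a hypothetical name:

```rust
// Uses two mutable references to the same String, one after the other.
fn append_twice() -> String {
    let mut s = String::from("hello");
    {
        let r1 = &mut s;
        r1.push_str(", world");
    } // r1 goes out of scope here
    let r2 = &mut s; // allowed: r1 is no longer live
    r2.push('!');
    s
}

fn main() {
    println!("{}", append_twice());
}
```

The inner block is there only to make the end of r1's lifetime explicit. The code would also compile without it, because r1 is not used after its push_str call.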

Dangling References

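C++ has the problem of dangling pointers: a pointer whose target memory is no longer valid, which leads to memory errors or undefined behavior. In safe Rust, the compiler guarantees that references never dangle: no data can be freed before every reference to it has gone out of scope.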
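As an illustration, the sketch below shows a function the compiler rejects, kept in comments, next to one that compiles. The names dangle and no_dangle are only illustrative:

```rust
// Rejected by the compiler (error[E0106]: missing lifetime specifier):
// the returned reference would point at `s`, which is dropped on return.
//
// fn dangle() -> &String {
//     let s = String::from("hello");
//     &s
// }

// Returning the String itself moves ownership out to the caller,
// so nothing is freed while it is still in use.
fn no_dangle() -> String {
    let s = String::from("hello");
    s
}

fn main() {
    println!("{}", no_dangle());
}
```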