Lessons Learned from Porting a C/C++ Project to Rust, Parts 6-10

2018-10-21 熊皮皮

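6. Global static variables
6.1. Limitations of lazy_static
7. Global mutable singletons
7.1. How libstd implements a global mutable singleton
8. Macros and macro development tips
8.1. Compiler command for macro expansion
8.2. Drawbacks of the macro-expansion compiler command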

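6. Global static variables

Suppose several projects cooperate at runtime: project 1 stores a resource in a global static variable global_var, and project 2 later reads and writes global_var. There is one constraint: project 2's functions do not take any parameter supplied by project 1 when they read or write. A typical case is eglMakeCurrent followed by later OpenGL calls. A C++ implementation of such a global static variable looks like this: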

// C++
static Engine *CURRENT_ENGINE = nullptr;

Engine *GetCurrentEngine()
{
    return CURRENT_ENGINE;
}

void SetCurrentEngine(Engine *engine)
{
    CURRENT_ENGINE = engine;
}

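// All external calls go through this function, e.g. GetCurrentEngine()->GetAssetManager()...

Because I knew little Rust at the time, implementing this scenario in safe Rust proved very difficult, so I fell back to unsafe code.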

static mut CURRENT_ENGINE: *mut Engine = std::ptr::null_mut();

fn set_current_engine(engine: *mut Engine) {
    unsafe {
        CURRENT_ENGINE = engine;
    }
}

fn current_engine<'a>() -> &'a Engine {
    unsafe { &*CURRENT_ENGINE }
}

fn current_engine_mut<'a>() -> &'a mut Engine {
    unsafe { &mut *CURRENT_ENGINE }
}
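The accessors above dereference the pointer unconditionally. As a slightly more defensive sketch (not the code used in the project), the same pointer can be kept in a std::sync::atomic::AtomicPtr, which avoids `static mut` and lets the getter return None when nothing has been set. The Engine struct and its id field here are hypothetical stand-ins, and the getter is still unsafe because the caller must keep the pointed-to engine alive.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical stand-in for the real Engine type.
pub struct Engine {
    pub id: u32,
}

// AtomicPtr::new is a const fn, so no `static mut` is needed.
static CURRENT_ENGINE: AtomicPtr<Engine> = AtomicPtr::new(ptr::null_mut());

pub fn set_current_engine(engine: *mut Engine) {
    CURRENT_ENGINE.store(engine, Ordering::Release);
}

/// Returns None when no engine has been set.
///
/// Safety: the pointer last passed to `set_current_engine` must still be
/// valid, and nobody may hold a mutable reference to the same engine.
pub unsafe fn current_engine<'a>() -> Option<&'a Engine> {
    // `as_ref` on a raw pointer maps null to None instead of dereferencing it.
    unsafe { CURRENT_ENGINE.load(Ordering::Acquire).as_ref() }
}
```

This only removes the null dereference. Lifetime and aliasing of the engine remain the caller's responsibility, just as in the C++ version.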

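Some will ask: why not use lazy_static? lazy_static does support allocating memory at runtime and interior mutability of the object, but it does not support replacing the declared object itself.

6.1. Limitations of lazy_static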

lazy_static! {
    pub(crate) static ref CURRENT_ENGINE: Arc<Mutex<Engine>> = Arc::new(Mutex::new(Engine::new()));
}

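Unlike the pointer in the C++ code above, CURRENT_ENGINE itself cannot be rebound to another instance. After lock().unwrap(), however, the data inside it can be modified, for example: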

lazy_static! {
    static ref REGISTRY: Arc<Mutex<FastHashMap<usize, &'static str>>> = Arc::new(Mutex::new(FastHashMap::default()));
}

pub fn report_leaks() {
    println!("Leaked handles:");
    let mut map = REGISTRY.lock().unwrap();
    for (_, type_id) in map.drain() {
        println!("\t{:?}", type_id);
    }
}

REGISTRY.lock().unwrap().insert(ptr as _, name);
REGISTRY.lock().unwrap().remove(&(self.0 as _)).unwrap();
REGISTRY.lock().unwrap().contains_key(&(self.0 as _));
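One possible way to get closer to the C++ SetCurrentEngine in safe code is to make the thing inside the Mutex replaceable, for instance an Option<Arc<Engine>>. The sketch below is an assumption-laden illustration, not what the project did: Engine and its name field are hypothetical, and it relies on std::sync::Mutex::new being usable in a static, which requires Rust 1.63 or later (with older compilers the same shape can be wrapped in lazy_static).

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the real Engine type.
pub struct Engine {
    pub name: String,
}

// The static binding never changes; only the Option stored inside it does.
static CURRENT_ENGINE: Mutex<Option<Arc<Engine>>> = Mutex::new(None);

/// Installs a new current engine (or clears it) and returns the previous one.
pub fn set_current_engine(engine: Option<Arc<Engine>>) -> Option<Arc<Engine>> {
    let mut slot = CURRENT_ENGINE.lock().unwrap();
    std::mem::replace(&mut *slot, engine)
}

/// Returns a shared handle to the current engine, if any.
pub fn current_engine() -> Option<Arc<Engine>> {
    CURRENT_ENGINE.lock().unwrap().clone()
}
```

Callers receive a shared Arc, so mutating the engine through this handle would need a further Mutex or similar inside the Arc. The lock is held only while the slot is read or swapped.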

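7. Global mutable singletons

This extends problem 6. Graphics projects written in C++ often rely on global mutable singletons. Rust does not support this pattern directly, so it is usually built with lazy_static, which hides the complicated machinery behind it. Using lazy_static was covered in problem 6; below we look at how the standard library does it.

7.1. How libstd implements a global mutable singleton

The following is taken from the source of rustc 1.30.0-beta.14 (1320d2145 2018-10-09). See libstd/io/lazy.rs for the full code.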

use cell::Cell;
use ptr;
use sync::Arc;
use sys_common;
use sys_common::mutex::Mutex;

pub struct Lazy<T> {
    // We never call `lock.init()`, so it is UB to attempt to acquire this mutex reentrantly!
    lock: Mutex,
    ptr: Cell<*mut Arc<T>>,
}

#[inline]
const fn done<T>() -> *mut Arc<T> { 1_usize as *mut _ }

unsafe impl<T> Sync for Lazy<T> {}

impl<T> Lazy<T> {
    pub const fn new() -> Lazy<T> {
        Lazy {
            lock: Mutex::new(),
            ptr: Cell::new(ptr::null_mut()),
        }
    }
}

impl<T: Send + Sync + 'static> Lazy<T> {
    /// Safety: `init` must not call `get` on the variable that is being
    /// initialized.
    pub unsafe fn get(&'static self, init: fn() -> Arc<T>) -> Option<Arc<T>> {
        let _guard = self.lock.lock();
        let ptr = self.ptr.get();
        if ptr.is_null() {
            Some(self.init(init))
        } else if ptr == done() {
            None
        } else {
            Some((*ptr).clone())
        }
    }

    // Must only be called with `lock` held
    unsafe fn init(&'static self, init: fn() -> Arc<T>) -> Arc<T> {
        // If we successfully register an at exit handler, then we cache the
        // `Arc` allocation in our own internal box (it will get deallocated by
        // the at exit handler). Otherwise we just return the freshly allocated
        // `Arc`.
        let registered = sys_common::at_exit(move || {
            let ptr = {
                let _guard = self.lock.lock();
                self.ptr.replace(done())
            };
            drop(Box::from_raw(ptr))
        });
        // This could reentrantly call `init` again, which is a problem
        // because our `lock` allows reentrancy!
        // That's why `get` is unsafe and requires the caller to ensure no reentrancy happens.
        let ret = init();
        if registered.is_ok() {
            self.ptr.set(Box::into_raw(Box::new(ret.clone())));
        }
        ret
    }
}

stdin, stdout, and stderr are all initialized through the Lazy type above.

pub fn stdin() -> Stdin {
    static INSTANCE: Lazy<Mutex<BufReader<Maybe<StdinRaw>>>> = Lazy::new();
    return Stdin {
        inner: unsafe {
            INSTANCE.get(stdin_init).expect("cannot access stdin during shutdown")
        },
    };

    fn stdin_init() -> Arc<Mutex<BufReader<Maybe<StdinRaw>>>> {
        // This must not reentrantly access `INSTANCE`
        let stdin = match stdin_raw() {
            Ok(stdin) => Maybe::Real(stdin),
            _ => Maybe::Fake
        };

        Arc::new(Mutex::new(BufReader::with_capacity(stdio::STDIN_BUF_SIZE, stdin)))
    }
}
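The Lazy type above is private to libstd. For user code on a newer toolchain, a comparable lazily initialized global can be sketched with std::sync::OnceLock, assuming Rust 1.70 or later. The Registry type and the register function are hypothetical names chosen for illustration. Unlike libstd's Lazy, this sketch has no at-exit cleanup and no "done" state.

```rust
use std::sync::{Mutex, OnceLock};

// Hypothetical registry type standing in for any global mutable state.
#[derive(Default)]
pub struct Registry {
    pub names: Vec<String>,
}

/// Returns the process-wide registry, creating it on first use.
pub fn registry() -> &'static Mutex<Registry> {
    static INSTANCE: OnceLock<Mutex<Registry>> = OnceLock::new();
    // The closure runs at most once, even if several threads race here.
    INSTANCE.get_or_init(|| Mutex::new(Registry::default()))
}

/// Adds a name and returns how many names are registered afterwards.
pub fn register(name: &str) -> usize {
    let mut guard = registry().lock().unwrap();
    guard.names.push(name.to_owned());
    guard.names.len()
}
```

As with stdin(), the static lives inside the accessor function, so the only way to reach the singleton is through registry().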

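8. Macros and macro development tips

The earlier code that fetches CURRENT_ENGINE dereferences the pointer without checking whether it is null. If it is called before any engine has been set, that is a null or wild pointer dereference and will most likely crash the program. The example below again comes from our own use case.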

#[macro_export]
macro_rules! validate_current_context {
    () => {{
        #[allow(unsafe_code)]
        unsafe {
            if CURRENT_CONTEXT.is_null() {
                println!("No valid egl context, return directly");
                return;
            }
        };
    }};
}
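Because the macro above expands to a bare `return;`, it only fits functions that return (). A possible variation, sketched here under assumptions, takes an optional expression to return instead. The Context struct, the AtomicPtr-based CURRENT_CONTEXT, and the clear and create_texture functions are all hypothetical stand-ins, and the `$(...)?` repetition needs Rust 1.32 or later.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical stand-in for the real EGL context type.
pub struct Context {
    pub id: u32,
}

static CURRENT_CONTEXT: AtomicPtr<Context> = AtomicPtr::new(ptr::null_mut());

macro_rules! validate_current_context {
    // Zero or one expression: expands to `return;` or `return <expr>;`.
    ($($ret:expr)?) => {{
        if CURRENT_CONTEXT.load(Ordering::Acquire).is_null() {
            eprintln!("No valid egl context, return directly");
            return $($ret)?;
        }
    }};
}

pub fn clear() {
    validate_current_context!();
    // Real rendering work would go here.
}

pub fn create_texture() -> Option<u32> {
    validate_current_context!(None);
    Some(1) // hypothetical texture handle
}
```

The early return still happens in the calling function, not inside a helper, which is the reason this check is written as a macro rather than as a function.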

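C/C++ macros are expanded by plain text substitution during preprocessing, with no validation at all. Rust macros instead match and produce syntax fragments such as expressions, which makes them fundamentally different from C/C++ macros.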
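A small sketch of that difference, using a hypothetical square macro: the C macro `#define SQUARE(x) x * x` turns `SQUARE(1 + 2)` into the text `1 + 2 * 1 + 2`, while a macro_rules! macro captures `1 + 2` as a single expression.

```rust
// Rust counterpart of the C macro `#define SQUARE(x) x * x`.
macro_rules! square {
    ($x:expr) => {
        $x * $x
    };
}

pub fn squared_sum() -> i32 {
    // `1 + 2` is captured as one expression, so this is (1 + 2) * (1 + 2) = 9.
    // The C macro would expand textually to 1 + 2 * 1 + 2, which is 5.
    square!(1 + 2)
}
```

The argument is still evaluated twice in the expansion, so an expression with side effects would run twice, as it would in C.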
Macros are still easy to get wrong. To check that a macro does what you intend, you can ask the compiler to print the expanded code.

8.1. Compiler command for macro expansion

rustc +nightly -Z unstable-options --pretty=expanded path/your_file.rs

Invoking the validate_current_context macro defined earlier expands to the following:

// -------- Original code
fn main() {
    validate_current_context!();
}
// -------- Expanded code
fn main() {
    {
        #[allow(unsafe_code)]
        unsafe {
            if CURRENT_CONTEXT.is_null() {
                {
                    ::io::_print(::std::fmt::Arguments::new_v1(
                        &["No valid egl context, return directly\n"],
                        &match () {
                            () => [],
                        },
                    ));
                };
                return;
            }
        };
    };
}

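8.2. Drawbacks of the macro-expansion compiler command

The problem I have run into so far is that rustc does not accept the --features=xxx syntax. --features is a Cargo option rather than a rustc one; rustc itself receives features as --cfg 'feature="xxx"'.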

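Acknowledgments

During development, friends in the main Rust programming language community group (303838735) were very generous in answering questions. Thanks to 黑化的齿轮, 我是傻逼我自豪, λCrLF·º⁷¹º, KiChjang, { Chaos Bot}, DCjanus, Solmyr, and others.

References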

How do I create a global, mutable singleton? (Stack Overflow)