The Pitfalls of Double-Checked Locking (DCL) [Translation]

2018-04-10, by 木驴的天空

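Double-checked locking (DCL) is widely cited and used as an efficient way to implement lazy initialization in a multithreaded environment. Unfortunately, without additional synchronization it does not work reliably, even when implemented in platform-independent Java. In other languages such as C++, whether it works depends on the processor's memory model, the reorderings performed by the compiler, and the interaction between the compiler and the synchronization library. C++ did not specify any of these when the original article was written, so little could be said about the situations in which it would work. Explicit memory barriers can make it work in C++, but those barriers were not available in Java.
In fact, according to the discussion of DCL in section 16.2.4 of Java Concurrency in Practice, DCL is a pattern that has effectively been abandoned. Other thread-safe lazy initialization idioms offer the same benefits.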

Consider the following code:

// Single threaded version
class Foo { 
  private Helper helper = null;
  public Helper getHelper() {
    if (helper == null) 
        helper = new Helper();
    return helper;
    }
  // other functions and members...
  }

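Using this code in a multithreaded context can go wrong in several ways. Most obviously, two or more Helper objects could be allocated (other problems are covered later). The simplest fix is to make the getHelper() method synchronized: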

// Correct multithreaded version
class Foo { 
  private Helper helper = null;
  public synchronized Helper getHelper() {
    if (helper == null) 
        helper = new Helper();
    return helper;
    }
  // other functions and members...
  }

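The code above performs synchronization on every call to getHelper(). The double-checked locking idiom tries to avoid synchronization once the helper has been allocated: it checks the field without holding the lock first, and takes the lock only if the field is still null. However, the idiom does not work reliably in the presence of either optimizing compilers or shared-memory multiprocessors.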
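For reference, the idiom described above is usually written roughly as follows. This is only a sketch of the broken pattern, not a recommendation. `Helper` is a hypothetical placeholder class with a single field, and the name `BrokenFoo` is chosen here purely for illustration.

```java
// Hypothetical placeholder for whatever object is being lazily created.
class Helper {
    int value = 42;
}

// Broken multithreaded version: the "double-checked locking" idiom.
// Shown only to illustrate the pattern under discussion; do not use it.
class BrokenFoo {
    private Helper helper = null; // not volatile: this is the flaw

    public Helper getHelper() {
        if (helper == null) {              // first check, no lock held
            synchronized (this) {
                if (helper == null) {      // second check, lock held
                    helper = new Helper(); // publication may be reordered
                }
            }
        }
        return helper;
    }
}
```

Within a single thread this behaves exactly like the simple version. The problems only appear when several threads race on `getHelper()`.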

It Doesn't Work

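There are many reasons why it doesn't work. We describe a couple of the more obvious ones first. Once you understand those, you may be tempted to devise a fix for the double-checked locking idiom. Your fix will not work, for more subtle reasons. Understand those reasons, come up with a better fix, and it still won't work, because there are even more subtle reasons behind them.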

The First Reason It Doesn't Work

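The most obvious reason is that the writes that initialize the Helper object and the write to the helper field can be performed, or perceived, out of order. As a result, a thread that invokes getHelper() could see a non-null reference to a Helper object, yet see the default values for that object's fields rather than the values set in the constructor.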

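If the compiler inlines the call to the constructor, it is free to reorder the writes that initialize the Helper object and the write to the helper field, provided it can prove that the constructor cannot throw an exception or perform synchronization. Even if the compiler does not reorder those writes, on a multiprocessor the processor or the memory system may reorder them as perceived by a thread running on another processor. Doug Lea has written a more detailed treatment of this in Synchronization and the Java Memory Model.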

A Test Case Showing That It Doesn't Work

Paul Jakubik found an example of a use of double-checked locking that did not work correctly.

public class DoubleCheckTest
  {
 // static data to aid in creating N singletons
  static final Object dummyObject = new Object(); // for reference init
  static final int A_VALUE = 256; // value to initialize 'a' to
  static final int B_VALUE = 512; // value to initialize 'b' to
  static final int C_VALUE = 1024;
  static ObjectHolder[] singletons;  // array of static references
  static Thread[] threads; // array of racing threads
  static int threadCount; // number of threads to create
  static int singletonCount; // number of singletons to create
  
  static volatile int recentSingleton;

  // I am going to set a couple of threads racing,
  // trying to create N singletons. Basically the
  // race is to initialize a single array of 
  // singleton references. The threads will use
  // double checked locking to control who 
  // initializes what. Any thread that does not
  // initialize a particular singleton will check 
  // to see if it sees a partially initialized view.
  // To keep from getting accidental synchronization,
  // each singleton is stored in an ObjectHolder 
  // and the ObjectHolder is used for 
  // synchronization. In the end the structure
  // is not exactly a singleton, but should be a
  // close enough approximation.
  // 

  // This class contains data and simulates a 
  // singleton. The static reference is stored in
  // a static array in DoubleCheckTest.
  static class Singleton
    {
    public int a;
    public int b;
    public int c;
    public Object dummy;

    public Singleton()
      {
      a = A_VALUE;
      b = B_VALUE;
      c = C_VALUE;
      dummy = dummyObject;
      }
    }

  static void checkSingleton(Singleton s, int index)
    {
    int s_a = s.a;
    int s_b = s.b;
    int s_c = s.c;
    Object s_d = s.dummy;
    if(s_a != A_VALUE)
      System.out.println("[" + index 
                         + "] Singleton.a not initialized " + s_a);
    if(s_b != B_VALUE)
      System.out.println("[" + index 
                         + "] Singleton.b not initialized " + s_b);
    
    if(s_c != C_VALUE)
      System.out.println("[" + index 
                         + "] Singleton.c not initialized " + s_c);
    
    if(s_d != dummyObject)
      if(s_d == null)
        System.out.println("[" + index 
                           + "] Singleton.dummy not initialized," 
                           + " value is null");
      else
        System.out.println("[" + index 
                           + "] Singleton.dummy not initialized," 
                           + " value is garbage");
    }

  // Holder used for synchronization of 
  // singleton initialization. 
  static class ObjectHolder
    {
    public Singleton reference;
    }

  static class TestThread implements Runnable
    {
    public void run()
      {
      for(int i = 0; i < singletonCount; ++i)
        {
        ObjectHolder o = singletons[i];
        if(o.reference == null)
          {
          synchronized(o)
            {
            if (o.reference == null) {
              o.reference = new Singleton();
              recentSingleton = i;
              }
            // shouldn't have to check singleton here
            // mutex should provide consistent view
            }
          }
        else {
          checkSingleton(o.reference, i);
          int j = recentSingleton-1;
          if (j > i) i = j;
          }
        } 
      }
    }

  public static void main(String[] args)
    {
    if( args.length != 2 )
      {
      System.err.println("usage: java DoubleCheckTest" +
                         " <numThreads> <numSingletons>");
      return; // nothing to parse, so stop instead of failing below
      }
    // read values from args
    threadCount = Integer.parseInt(args[0]);
    singletonCount = Integer.parseInt(args[1]);
    
    // create arrays
    threads = new Thread[threadCount];
    singletons = new ObjectHolder[singletonCount];

    // fill singleton array
    for(int i = 0; i < singletonCount; ++i)
      singletons[i] = new ObjectHolder();

    // fill thread array
    for(int i = 0; i < threadCount; ++i)
      threads[i] = new Thread( new TestThread() );

    // start threads
    for(int i = 0; i < threadCount; ++i)
      threads[i].start();

    // wait for threads to finish
    for(int i = 0; i < threadCount; ++i)
      {
      try
        {
        System.out.println("waiting to join " + i);
        threads[i].join();
        }
      catch(InterruptedException ex)
        {
        System.out.println("interrupted");
        }
      }
    System.out.println("done");
    }
  }

When run on a system using the Symantec JIT, it does not work. In particular, the Symantec JIT compiles

singletons[i].reference = new Singleton();

into the following instructions:

0206106A   mov         eax,0F97E78h
0206106F   call        01F6B210                  ; allocate space for
                                                 ; Singleton, return result in eax
02061074   mov         dword ptr [ebp],eax       ; EBP is &singletons[i].reference 
                                                ; store the unconstructed object here.
02061077   mov         ecx,dword ptr [eax]       ; dereference the handle to
                                                 ; get the raw pointer
02061079   mov         dword ptr [ecx],100h      ; Next 4 lines are
0206107F   mov         dword ptr [ecx+4],200h    ; Singleton's inlined constructor
02061086   mov         dword ptr [ecx+8],400h
0206108D   mov         dword ptr [ecx+0Ch],0F84030h

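As you can see, the assignment to singletons[i].reference is performed before the constructor for Singleton runs. This was completely legal under the Java memory model in effect at the time (before JDK 5), and it was equally legal in C and C++, which had no memory model at all back then.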

A Fix That Still Doesn't Work

Given the explanation above, a number of people have suggested the following code:

// (Still) Broken multithreaded version
// "Double-Checked Locking" idiom
class Foo { 
  private Helper helper = null;
  public Helper getHelper() {
    if (helper == null) {
      Helper h;
      synchronized(this) {
        h = helper;
        if (h == null) 
            synchronized (this) {
              h = new Helper();
            } // release inner synchronization lock
        helper = h;
        } 
      }    
    return helper;
    }
  // other functions and members...
  }

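This code puts the construction of the Helper object inside an inner synchronized block. The intuition is that there should be a memory barrier at the point where the inner synchronization is released, and that this barrier should prevent the initialization of the Helper object from being reordered with the assignment to the helper field.
Unfortunately, that intuition is absolutely wrong. The rule for a monitorexit (that is, releasing synchronization) only says that actions before the monitorexit must be performed before the monitor is released. No rule says that actions after the monitorexit may not be performed before the monitor is released. It is perfectly reasonable and legal for the compiler to move the assignment helper = h; inside the inner synchronized block, at which point we are back to the previous problem. Many processors offer instructions that perform exactly this kind of one-way memory barrier. Requiring a lock release to act as a full two-way memory barrier would carry a performance penalty.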

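You can force the writer to perform a full bidirectional memory barrier, but this is ugly, inefficient, and an approach the Java memory model has left behind. Do not use it. If you are curious, see BidirectionalMemoryBarrier.
However, even when the thread that initializes the Helper object performs a full memory barrier, it still may not work. The problem is that on some systems, the thread that sees a non-null value in the helper field also needs to perform a memory barrier. Why? Because processors have their own locally cached copies of memory. On some processors, unless the processor executes a cache coherence instruction (such as a memory barrier), reads can be served from stale locally cached copies, even if other processors used memory barriers to force their writes out to global memory. The Alpha processor is one example.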

Static Singletons

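If the singleton you are creating is static, there is a simple and elegant way to make it work. Define the singleton as a static field in a separate class. The semantics of Java guarantee that the field is not initialized until it is first referenced (lazy initialization), and that any thread that accesses the field sees the result of that initialization correctly. The code looks like this: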

class HelperSingleton {
  static Helper singleton = new Helper();
  }
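To make the laziness of this approach concrete, here is a minimal sketch. The construction counter `created` and the `Foo` wrapper are assumptions added for illustration; they are not part of the original code. The sketch relies on the Java rule that a class is initialized on first use of one of its non-constant static fields.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Helper that counts how many times it has been constructed.
class Helper {
    static final AtomicInteger created = new AtomicInteger();
    Helper() { created.incrementAndGet(); }
}

class HelperSingleton {
    // Assigned while HelperSingleton is being initialized, which the JVM
    // does on first use and guards with the class-initialization lock.
    static final Helper singleton = new Helper();
}

class Foo {
    // No synchronization in user code: the first access to
    // HelperSingleton.singleton triggers the initialization.
    public Helper getHelper() {
        return HelperSingleton.singleton;
    }
}
```

Until `getHelper()` is called for the first time, no Helper exists. After that, every caller gets the same instance.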

It Works for 32-Bit Primitive Values

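Although double-checked locking does not work for object references, it does work for 32-bit primitive values such as int or float. Note that it does not work for long or double, because unsynchronized reads and writes of 64-bit primitives are not guaranteed to be atomic.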

// Correct Double-Checked Locking for 32-bit primitives
class Foo { 
  private int cachedHashCode = 0;
  public int hashCode() {
    int h = cachedHashCode;
    if (h == 0) 
    synchronized(this) {
      if (cachedHashCode != 0) return cachedHashCode;
      h = computeHashCode();
      cachedHashCode = h;
      }
    return h;
    }
  // other functions and members...
  }

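In fact, assuming that computeHashCode() always returns the same result and has no side effects (that is, it is idempotent), you can remove the synchronization altogether: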

// Lazy initialization 32-bit primitives
// Thread-safe if computeHashCode is idempotent
class Foo { 
  private int cachedHashCode = 0;
  public int hashCode() {
    int h = cachedHashCode;
    if (h == 0) {
      h = computeHashCode();
      cachedHashCode = h;
      }
    return h;
    }
  // other functions and members...
  }

Making It Work with Explicit Memory Barriers

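Double-checked locking can be made to work if you have explicit memory barrier instructions. If you are programming in C++, for example, you can use the following code from Doug Schmidt's book: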

// C++ implementation with explicit memory barriers
// Should work on any platform, including DEC Alphas
// From "Patterns for Concurrent and Distributed Objects",
// by Doug Schmidt
template <class TYPE, class LOCK> TYPE *
Singleton<TYPE, LOCK>::instance (void) {
    // First check
    TYPE* tmp = instance_;
    // Insert the CPU-specific memory barrier instruction
    // to synchronize the cache lines on multi-processor.
    asm ("memoryBarrier");
    if (tmp == 0) {
        // Ensure serialization (guard
        // constructor acquires lock_).
        Guard<LOCK> guard (lock_);
        // Double check.
        tmp = instance_;
        if (tmp == 0) {
                tmp = new TYPE;
                // Insert the CPU-specific memory barrier instruction
                // to synchronize the cache lines on multi-processor.
                asm ("memoryBarrier");
                instance_ = tmp;
        }
    }
    return tmp;
}

Fixing Double-Checked Locking Using ThreadLocal

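Alexander Terekhov came up with a very clever implementation based on ThreadLocal. Each thread keeps a thread-local flag that records whether that thread has already performed the required synchronization.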

 class Foo {
     /** If perThreadInstance.get() returns a non-null value, this thread
        has done synchronization needed to see initialization
        of helper */
         private final ThreadLocal perThreadInstance = new ThreadLocal();
         private Helper helper = null;
         public Helper getHelper() {
             if (perThreadInstance.get() == null) createHelper();
             return helper;
         }
         private final void createHelper() {
             synchronized(this) {
                 if (helper == null)
                     helper = new Helper();
             }
         // Any non-null value would do as the argument here
             perThreadInstance.set(perThreadInstance);
         }
  }

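The performance of this technique depends on which JDK version you use. In Sun's JDK 1.2 implementation, ThreadLocal was very slow. It became significantly faster in JDK 1.3 and faster still in JDK 1.4. See Doug Lea's Performance of techniques for correctly implementing lazy initialization.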

Using volatile

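JDK 5 and later use a new memory model that extends the semantics of volatile: a write to a volatile field cannot be reordered with any earlier read or write, and a read of a volatile field cannot be reordered with any later read or write. With that change, declaring the helper field volatile is enough to make double-checked locking work.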

// Works with the acquire/release semantics for volatile (JDK 5 and later)
// Broken under the older, pre-JDK 5 semantics for volatile
  class Foo {
        private volatile Helper helper = null;
        public Helper getHelper() {
            if (helper == null) {
                synchronized(this) {
                    if (helper == null)
                        helper = new Helper();
                }
            }
            return helper;
        }
    }
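A common variation on the code above copies the field into a local variable, so that the fast path performs only one volatile read. The following is a minimal sketch of that variation. The `Helper` class with its `value` field and `created` counter is a hypothetical stand-in added for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Helper with one field and a construction counter.
class Helper {
    static final AtomicInteger created = new AtomicInteger();
    final int value;
    Helper() {
        value = 42;
        created.incrementAndGet();
    }
}

class Foo {
    private volatile Helper helper = null;

    public Helper getHelper() {
        Helper h = helper;            // single volatile read on the fast path
        if (h == null) {
            synchronized (this) {
                h = helper;           // re-check while holding the lock
                if (h == null) {
                    h = new Helper();
                    helper = h;       // volatile write publishes the object
                }
            }
        }
        return h;
    }
}
```

The behavior is the same as the version that reads the field directly. Whether the local variable brings any measurable benefit depends on the JVM and the hardware.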

Double-Checked Locking with Immutable Objects

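If Helper is an immutable object, meaning that all of its fields are final, then double-checked locking works without declaring the helper field volatile. The idea is that a reference to an immutable object (such as a String or an Integer) behaves much like a 32-bit primitive value.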
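A minimal sketch of this case might look as follows. The `Helper` fields and constructor arguments are hypothetical. Reading the helper field once into a local variable is a precaution added here, not something the original text prescribes: without volatile, two separate unsynchronized reads of the same field are not guaranteed to agree.

```java
// Hypothetical immutable Helper: every field is final and set in the constructor.
final class Helper {
    private final int a;
    private final String name;

    Helper(int a, String name) {
        this.a = a;
        this.name = name;
    }

    int a() { return a; }
    String name() { return name; }
}

class Foo {
    private Helper helper; // deliberately not volatile

    public Helper getHelper() {
        Helper h = helper;     // read the unsynchronized field exactly once
        if (h == null) {
            synchronized (this) {
                h = helper;    // re-check while holding the lock
                if (h == null) {
                    h = new Helper(256, "helper");
                    helper = h;
                }
            }
        }
        return h;
    }
}
```

This only holds as long as every field of `Helper` stays final and the constructor does not leak `this`. Adding a single mutable field brings the original problem back.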

Original article: The "Double-Checked Locking is Broken" Declaration