线程安全的数据结构
相信大家都了解或者至少听说过Vector和ArrayList的区别。
Vector是线程安全的,但是ArrayList的效率更高。道理很容易理解,效率和安全肯定只能关注一边。但是Vector是如何实现线程安全的呢?
/**
* The {@code Vector} class implements a growable array of
* objects. Like an array, it contains components that can be
* accessed using an integer index. However, the size of a
* {@code Vector} can grow or shrink as needed to accommodate
* adding and removing items after the {@code Vector} has been created.
*
* <p>Each vector tries to optimize storage management by maintaining a
* {@code capacity} and a {@code capacityIncrement}. The
* {@code capacity} is always at least as large as the vector
* size; it is usually larger because as components are added to the
* vector, the vector's storage increases in chunks the size of
* {@code capacityIncrement}. An application can increase the
* capacity of a vector before inserting a large number of
* components; this reduces the amount of incremental reallocation.
*
* <p><a name="fail-fast">
* The iterators returned by this class's {@link #iterator() iterator} and
* {@link #listIterator(int) listIterator} methods are <em>fail-fast</em></a>:
* if the vector is structurally modified at any time after the iterator is
* created, in any way except through the iterator's own
* {@link ListIterator#remove() remove} or
* {@link ListIterator#add(Object) add} methods, the iterator will throw a
* {@link ConcurrentModificationException}. Thus, in the face of
* concurrent modification, the iterator fails quickly and cleanly, rather
* than risking arbitrary, non-deterministic behavior at an undetermined
* time in the future. The {@link Enumeration Enumerations} returned by
* the {@link #elements() elements} method are <em>not</em> fail-fast.
*
* <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification. Fail-fast iterators
* throw {@code ConcurrentModificationException} on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>
*
* <p>As of the Java 2 platform v1.2, this class was retrofitted to
* implement the {@link List} interface, making it a member of the
* <a href="{@docRoot}/../technotes/guides/collections/index.html">
* Java Collections Framework</a>. Unlike the new collection
* implementations, {@code Vector} is synchronized. If a thread-safe
* implementation is not needed, it is recommended to use {@link
* ArrayList} in place of {@code Vector}.
*/
public class Vector<E>
extends AbstractList<E>
implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
}
通过类注释我们就能清楚的看到,Vector是通过synchronized关键字来实现线程安全的,如果不需要线程安全,推荐我们使用ArrayList。
让我们来查看一下Vector的add和get方法。
/**
* Inserts the specified element at the specified position in this Vector.
* Shifts the element currently at that position (if any) and any
* subsequent elements to the right (adds one to their indices).
*
* @param index index at which the specified element is to be inserted
* @param element element to be inserted
* @throws ArrayIndexOutOfBoundsException if the index is out of range
* ({@code index < 0 || index > size()})
* @since 1.2
*/
public void add(int index, E element) {
insertElementAt(element, index);
}
/**
* Inserts the specified object as a component in this vector at the
* specified {@code index}. Each component in this vector with
* an index greater or equal to the specified {@code index} is
* shifted upward to have an index one greater than the value it had
* previously.
*
* <p>The index must be a value greater than or equal to {@code 0}
* and less than or equal to the current size of the vector. (If the
* index is equal to the current size of the vector, the new element
* is appended to the Vector.)
*
* <p>This method is identical in functionality to the
* {@link #add(int, Object) add(int, E)}
* method (which is part of the {@link List} interface). Note that the
* {@code add} method reverses the order of the parameters, to more closely
* match array usage.
*
* @param obj the component to insert
* @param index where to insert the new component
* @throws ArrayIndexOutOfBoundsException if the index is out of range
* ({@code index < 0 || index > size()})
*/
public synchronized void insertElementAt(E obj, int index) {
modCount++;
if (index > elementCount) {
throw new ArrayIndexOutOfBoundsException(index
+ " > " + elementCount);
}
ensureCapacityHelper(elementCount + 1);
System.arraycopy(elementData, index, elementData, index + 1, elementCount - index);
elementData[index] = obj;
elementCount++;
}
/**
* Appends the specified element to the end of this Vector.
*
* @param e element to be appended to this Vector
* @return {@code true} (as specified by {@link Collection#add})
* @since 1.2
*/
public synchronized boolean add(E e) {
modCount++;
ensureCapacityHelper(elementCount + 1);
elementData[elementCount++] = e;
return true;
}
/**
* Adds the specified component to the end of this vector,
* increasing its size by one. The capacity of this vector is
* increased if its size becomes greater than its capacity.
*
* <p>This method is identical in functionality to the
* {@link #add(Object) add(E)}
* method (which is part of the {@link List} interface).
*
* @param obj the component to be added
*/
public synchronized void addElement(E obj) {
modCount++;
ensureCapacityHelper(elementCount + 1);
elementData[elementCount++] = obj;
}
/**
* Returns the element at the specified position in this Vector.
*
* @param index index of the element to return
* @return object at the specified index
* @throws ArrayIndexOutOfBoundsException if the index is out of range
* ({@code index < 0 || index >= size()})
* @since 1.2
*/
public synchronized E get(int index) {
if (index >= elementCount)
throw new ArrayIndexOutOfBoundsException(index);
return elementData(index);
}
我们可以看到被调用的方法都是有synchronized 来修饰的。
好了,接着让我们查看其它的线程安全类。最重要是后来推出的在java.util.concurrent包下的一些类,我们以ConcurrentHashMap为例。
/**
* A hash table supporting full concurrency of retrievals and
* high expected concurrency for updates. This class obeys the
* same functional specification as {@link java.util.Hashtable}, and
* includes versions of methods corresponding to each method of
* {@code Hashtable}. However, even though all operations are
* thread-safe, retrieval operations do <em>not</em> entail locking,
* and there is <em>not</em> any support for locking the entire table
* in a way that prevents all access. This class is fully
* interoperable with {@code Hashtable} in programs that rely on its
* thread safety but not on its synchronization details.
*
* <p>Retrieval operations (including {@code get}) generally do not
* block, so may overlap with update operations (including {@code put}
* and {@code remove}). Retrievals reflect the results of the most
* recently <em>completed</em> update operations holding upon their
* onset. (More formally, an update operation for a given key bears a
* <em>happens-before</em> relation with any (non-null) retrieval for
* that key reporting the updated value.) For aggregate operations
* such as {@code putAll} and {@code clear}, concurrent retrievals may
* reflect insertion or removal of only some entries. Similarly,
* Iterators, Spliterators and Enumerations return elements reflecting the
* state of the hash table at some point at or since the creation of the
* iterator/enumeration. They do <em>not</em> throw {@link
* java.util.ConcurrentModificationException ConcurrentModificationException}.
* However, iterators are designed to be used by only one thread at a time.
* Bear in mind that the results of aggregate status methods including
* {@code size}, {@code isEmpty}, and {@code containsValue} are typically
* useful only when a map is not undergoing concurrent updates in other threads.
* Otherwise the results of these methods reflect transient states
* that may be adequate for monitoring or estimation purposes, but not
* for program control.
*
* <p>The table is dynamically expanded when there are too many
* collisions (i.e., keys that have distinct hash codes but fall into
* the same slot modulo the table size), with the expected average
* effect of maintaining roughly two bins per mapping (corresponding
* to a 0.75 load factor threshold for resizing). There may be much
* variance around this average as mappings are added and removed, but
* overall, this maintains a commonly accepted time/space tradeoff for
* hash tables. However, resizing this or any other kind of hash
* table may be a relatively slow operation. When possible, it is a
* good idea to provide a size estimate as an optional {@code
* initialCapacity} constructor argument. An additional optional
* {@code loadFactor} constructor argument provides a further means of
* customizing initial table capacity by specifying the table density
* to be used in calculating the amount of space to allocate for the
* given number of elements. Also, for compatibility with previous
* versions of this class, constructors may optionally specify an
* expected {@code concurrencyLevel} as an additional hint for
* internal sizing. Note that using many keys with exactly the same
* {@code hashCode()} is a sure way to slow down performance of any
* hash table. To ameliorate impact, when keys are {@link Comparable},
* this class may use comparison order among keys to help break ties.
*
* <p>A {@link Set} projection of a ConcurrentHashMap may be created
* (using {@link #newKeySet()} or {@link #newKeySet(int)}), or viewed
* (using {@link #keySet(Object)} when only keys are of interest, and the
* mapped values are (perhaps transiently) not used or all take the
* same mapping value.
*
* <p>A ConcurrentHashMap can be used as scalable frequency map (a
* form of histogram or multiset) by using {@link
* java.util.concurrent.atomic.LongAdder} values and initializing via
* {@link #computeIfAbsent computeIfAbsent}. For example, to add a count
* to a {@code ConcurrentHashMap<String,LongAdder> freqs}, you can use
* {@code freqs.computeIfAbsent(k -> new LongAdder()).increment();}
*
* <p>This class and its views and iterators implement all of the
* <em>optional</em> methods of the {@link Map} and {@link Iterator}
* interfaces.
*
* <p>Like {@link Hashtable} but unlike {@link HashMap}, this class
* does <em>not</em> allow {@code null} to be used as a key or value.
*
* <p>ConcurrentHashMaps support a set of sequential and parallel bulk
* operations that, unlike most {@link Stream} methods, are designed
* to be safely, and often sensibly, applied even with maps that are
* being concurrently updated by other threads; for example, when
* computing a snapshot summary of the values in a shared registry.
* There are three kinds of operation, each with four forms, accepting
* functions with Keys, Values, Entries, and (Key, Value) arguments
* and/or return values. Because the elements of a ConcurrentHashMap
* are not ordered in any particular way, and may be processed in
* different orders in different parallel executions, the correctness
* of supplied functions should not depend on any ordering, or on any
* other objects or values that may transiently change while
* computation is in progress; and except for forEach actions, should
* ideally be side-effect-free. Bulk operations on {@link java.util.Map.Entry}
* objects do not support method {@code setValue}.
*
* <ul>
* <li> forEach: Perform a given action on each element.
* A variant form applies a given transformation on each element
* before performing the action.</li>
*
* <li> search: Return the first available non-null result of
* applying a given function on each element; skipping further
* search when a result is found.</li>
*
* <li> reduce: Accumulate each element. The supplied reduction
* function cannot rely on ordering (more formally, it should be
* both associative and commutative). There are five variants:
*
* <ul>
*
* <li> Plain reductions. (There is not a form of this method for
* (key, value) function arguments since there is no corresponding
* return type.)</li>
*
* <li> Mapped reductions that accumulate the results of a given
* function applied to each element.</li>
*
* <li> Reductions to scalar doubles, longs, and ints, using a
* given basis value.</li>
*
* </ul>
* </li>
* </ul>
*
* <p>These bulk operations accept a {@code parallelismThreshold}
* argument. Methods proceed sequentially if the current map size is
* estimated to be less than the given threshold. Using a value of
* {@code Long.MAX_VALUE} suppresses all parallelism. Using a value
* of {@code 1} results in maximal parallelism by partitioning into
* enough subtasks to fully utilize the {@link
* ForkJoinPool#commonPool()} that is used for all parallel
* computations. Normally, you would initially choose one of these
* extreme values, and then measure performance of using in-between
* values that trade off overhead versus throughput.
*
* <p>The concurrency properties of bulk operations follow
* from those of ConcurrentHashMap: Any non-null result returned
* from {@code get(key)} and related access methods bears a
* happens-before relation with the associated insertion or
* update. The result of any bulk operation reflects the
* composition of these per-element relations (but is not
* necessarily atomic with respect to the map as a whole unless it
* is somehow known to be quiescent). Conversely, because keys
* and values in the map are never null, null serves as a reliable
* atomic indicator of the current lack of any result. To
* maintain this property, null serves as an implicit basis for
* all non-scalar reduction operations. For the double, long, and
* int versions, the basis should be one that, when combined with
* any other value, returns that other value (more formally, it
* should be the identity element for the reduction). Most common
* reductions have these properties; for example, computing a sum
* with basis 0 or a minimum with basis MAX_VALUE.
*
* <p>Search and transformation functions provided as arguments
* should similarly return null to indicate the lack of any result
* (in which case it is not used). In the case of mapped
* reductions, this also enables transformations to serve as
* filters, returning null (or, in the case of primitive
* specializations, the identity basis) if the element should not
* be combined. You can create compound transformations and
* filterings by composing them yourself under this "null means
* there is nothing there now" rule before using them in search or
* reduce operations.
*
* <p>Methods accepting and/or returning Entry arguments maintain
* key-value associations. They may be useful for example when
* finding the key for the greatest value. Note that "plain" Entry
* arguments can be supplied using {@code new
* AbstractMap.SimpleEntry(k,v)}.
*
* <p>Bulk operations may complete abruptly, throwing an
* exception encountered in the application of a supplied
* function. Bear in mind when handling such exceptions that other
* concurrently executing functions could also have thrown
* exceptions, or would have done so if the first exception had
* not occurred.
*
* <p>Speedups for parallel compared to sequential forms are common
* but not guaranteed. Parallel operations involving brief functions
* on small maps may execute more slowly than sequential forms if the
* underlying work to parallelize the computation is more expensive
* than the computation itself. Similarly, parallelization may not
* lead to much actual parallelism if all processors are busy
* performing unrelated tasks.
*
* <p>All arguments to all task methods must be non-null.
*
* <p>This class is a member of the
* <a href="{@docRoot}/../technotes/guides/collections/index.html">
* Java Collections Framework</a>.
*
* @since 1.5
* @author Doug Lea
* @param <K> the type of keys maintained by this map
* @param <V> the type of mapped values
*/
public class ConcurrentHashMap<K,V> extends AbstractMap<K,V>
implements ConcurrentMap<K,V>, Serializable {
}
我们可以看到,类注释上说的很清楚,get方法不是阻塞式的,所以可能和put,remove方法重叠。Iterators, Spliterators 和Enumerations这些方法都最好同时只有一个线程访问。还有size(),isEmpty()和containsValue()都是好在别的线程没有更新的时候再去访问。
/**
* Maps the specified key to the specified value in this table.
* Neither the key nor the value can be null.
*
* <p>The value can be retrieved by calling the {@code get} method
* with a key that is equal to the original key.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with {@code key}, or
* {@code null} if there was no mapping for {@code key}
* @throws NullPointerException if the specified key or value is null
*/
public V put(K key, V value) {
return putVal(key, value, false);
}
/** Implementation for put and putIfAbsent */
final V putVal(K key, V value, boolean onlyIfAbsent) {
if (key == null || value == null) throw new NullPointerException();
int hash = spread(key.hashCode());
int binCount = 0;
for (Node<K,V>[] tab = table;;) {
Node<K,V> f; int n, i, fh;
if (tab == null || (n = tab.length) == 0)
tab = initTable();
else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
if (casTabAt(tab, i, null,
new Node<K,V>(hash, key, value, null)))
break; // no lock when adding to empty bin
}
else if ((fh = f.hash) == MOVED)
tab = helpTransfer(tab, f);
else {
V oldVal = null;
synchronized (f) {
if (tabAt(tab, i) == f) {
if (fh >= 0) {
binCount = 1;
for (Node<K,V> e = f;; ++binCount) {
K ek;
if (e.hash == hash &&
((ek = e.key) == key ||
(ek != null && key.equals(ek)))) {
oldVal = e.val;
if (!onlyIfAbsent)
e.val = value;
break;
}
Node<K,V> pred = e;
if ((e = e.next) == null) {
pred.next = new Node<K,V>(hash, key,
value, null);
break;
}
}
}
else if (f instanceof TreeBin) {
Node<K,V> p;
binCount = 2;
if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
value)) != null) {
oldVal = p.val;
if (!onlyIfAbsent)
p.val = value;
}
}
}
}
if (binCount != 0) {
if (binCount >= TREEIFY_THRESHOLD)
treeifyBin(tab, i);
if (oldVal != null)
return oldVal;
break;
}
}
}
addCount(1L, binCount);
return null;
}
putVal中加了锁。
/**
* Returns the value to which the specified key is mapped,
* or {@code null} if this map contains no mapping for the key.
*
* <p>More formally, if this map contains a mapping from a key
* {@code k} to a value {@code v} such that {@code key.equals(k)},
* then this method returns {@code v}; otherwise it returns
* {@code null}. (There can be at most one such mapping.)
*
* @throws NullPointerException if the specified key is null
*/
public V get(Object key) {
Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
int h = spread(key.hashCode());
if ((tab = table) != null && (n = tab.length) > 0 &&
(e = tabAt(tab, (n - 1) & h)) != null) {
if ((eh = e.hash) == h) {
if ((ek = e.key) == key || (ek != null && key.equals(ek)))
return e.val;
}
else if (eh < 0)
return (p = e.find(h, key)) != null ? p.val : null;
while ((e = e.next) != null) {
if (e.hash == h &&
((ek = e.key) == key || (ek != null && key.equals(ek))))
return e.val;
}
}
return null;
}
/**
* Tests if the specified object is a key in this table.
*
* @param key possible key
* @return {@code true} if and only if the specified object
* is a key in this table, as determined by the
* {@code equals} method; {@code false} otherwise
* @throws NullPointerException if the specified key is null
*/
public boolean containsKey(Object key) {
return get(key) != null;
}
/**
* Returns {@code true} if this map maps one or more keys to the
* specified value. Note: This method may require a full traversal
* of the map, and is much slower than method {@code containsKey}.
*
* @param value value whose presence in this map is to be tested
* @return {@code true} if this map maps one or more keys to the
* specified value
* @throws NullPointerException if the specified value is null
*/
public boolean containsValue(Object value) {
if (value == null)
throw new NullPointerException();
Node<K,V>[] t;
if ((t = table) != null) {
Traverser<K,V> it = new Traverser<K,V>(t, t.length, 0, t.length);
for (Node<K,V> p; (p = it.advance()) != null; ) {
V v;
if ((v = p.val) == value || (v != null && value.equals(v)))
return true;
}
}
return false;
}
/**
* {@inheritDoc}
*/
public int size() {
long n = sumCount();
return ((n < 0L) ? 0 :
(n > (long)Integer.MAX_VALUE) ? Integer.MAX_VALUE :
(int)n);
}
/**
* {@inheritDoc}
*/
public boolean isEmpty() {
return sumCount() <= 0L; // ignore transient negative values
}
其它的方法都没有加锁。但是ConcurrentHashMap不会抛出ConcurrentModificationException。
最后我们还有一种保证线程安全的方法,就是
List<Object> synchronizedList = Collections.synchronizedList(new ArrayList<>());
首先让我们来看下,它是如何保证线程安全的。
/**
* Returns a synchronized (thread-safe) list backed by the specified
* list. In order to guarantee serial access, it is critical that
* <strong>all</strong> access to the backing list is accomplished
* through the returned list.<p>
*
* It is imperative that the user manually synchronize on the returned
* list when iterating over it:
* <pre>
* List list = Collections.synchronizedList(new ArrayList());
* ...
* synchronized (list) {
* Iterator i = list.iterator(); // Must be in synchronized block
* while (i.hasNext())
* foo(i.next());
* }
* </pre>
* Failure to follow this advice may result in non-deterministic behavior.
*
* <p>The returned list will be serializable if the specified list is
* serializable.
*
* @param <T> the class of the objects in the list
* @param list the list to be "wrapped" in a synchronized list.
* @return a synchronized view of the specified list.
*/
public static <T> List<T> synchronizedList(List<T> list) {
return (list instanceof RandomAccess ?
new SynchronizedRandomAccessList<>(list) :
new SynchronizedList<>(list));
}
首先我们得搞清楚,几个类的关系图。
这个是最重要的,我们的List有的实现了RandomAccess 接口,比如ArrayList,有的没有,比如LinkedList。但是通过这个图片我们能清晰地看到到,SynchronizedRandomAccessList是SynchronizedList的子类。
我们来查看一下构造方法。
SynchronizedRandomAccessList(List<E> list) {
super(list);
}
SynchronizedRandomAccessList(List<E> list, Object mutex) {
super(list, mutex);
}
SynchronizedRandomAccessList是直接调用super的,也就是SynchronizedList的构造。
final List<E> list;
SynchronizedList(List<E> list) {
super(list);
this.list = list;
}
SynchronizedList(List<E> list, Object mutex) {
super(list, mutex);
this.list = list;
}
而在SynchronizedList中,将传入的List赋值给自己的成员变量list,然后又调用了super。
final Collection<E> c; // Backing Collection
final Object mutex; // Object on which to synchronize
SynchronizedCollection(Collection<E> c) {
this.c = Objects.requireNonNull(c);
mutex = this;
}
SynchronizedCollection(Collection<E> c, Object mutex) {
this.c = Objects.requireNonNull(c);
this.mutex = Objects.requireNonNull(mutex);
}
在SynchronizedCollection中有两个成员变量进行了赋值,分别是mutex和c。他们具体的作用是什么呢。
现在让我们回来看看他们是如何实现线程安全的,我们还以常用的方法为例。
public boolean equals(Object o) {
if (this == o)
return true;
synchronized (mutex) {return list.equals(o);}
}
public int hashCode() {
synchronized (mutex) {return list.hashCode();}
}
public E get(int index) {
synchronized (mutex) {return list.get(index);}
}
public E set(int index, E element) {
synchronized (mutex) {return list.set(index, element);}
}
public void add(int index, E element) {
synchronized (mutex) {list.add(index, element);}
}
public E remove(int index) {
synchronized (mutex) {return list.remove(index);}
}
public int indexOf(Object o) {
synchronized (mutex) {return list.indexOf(o);}
}
public int lastIndexOf(Object o) {
synchronized (mutex) {return list.lastIndexOf(o);}
}
public boolean addAll(int index, Collection<? extends E> c) {
synchronized (mutex) {return list.addAll(index, c);}
}
public ListIterator<E> listIterator() {
return list.listIterator(); // Must be manually synched by user
}
public ListIterator<E> listIterator(int index) {
return list.listIterator(index); // Must be manually synched by user
}
public List<E> subList(int fromIndex, int toIndex) {
synchronized (mutex) {
return new SynchronizedList<>(list.subList(fromIndex, toIndex),
mutex);
}
}
@Override
public void replaceAll(UnaryOperator<E> operator) {
synchronized (mutex) {list.replaceAll(operator);}
}
@Override
public void sort(Comparator<? super E> c) {
synchronized (mutex) {list.sort(c);}
}
非常简单,全部使用了mutex对象来加锁实现。实际操作的list也是我们传入的list。这个mutex如果我们传入就使用我们传入的,不传入就直接使用list。
Collections.synchronizedMap()
Collections.synchronizedSet()
原理完全相同,在这里就不再分析了。
针对这几种处理多线程的方法,个人推荐最后一种,但是也要注意在使用listIterator时,也是不支持多线程同时访问的。