JVM源码分析之线程局部缓存TLAB
简书 占小狼
转载请注明原创出处,谢谢!
背景
介绍TLAB之前先思考一个问题:
创建对象时,需要在堆上申请指定大小的内存,如果同时有大量线程申请内存的话,可以通过锁机制或者指针碰撞的方式确保不会申请到同一块内存,在JVM运行中,内存分配是一个极其频繁的动作,这种方式势必会降低性能。
因此,在Hotspot 1.6的实现中引入了TLAB技术。
什么是TLAB
TLAB全称ThreadLocalAllocBuffer
,是线程的一块私有内存,如果设置了虚拟机参数 -XX:UseTLAB
,在线程初始化时,同时也会申请一块指定大小的内存,只给当前线程使用,这样每个线程都单独拥有一个Buffer,如果需要分配内存,就在自己的Buffer上分配,这样就不存在竞争的情况,可以大大提升分配效率,当Buffer容量不够的时候,再重新从Eden区域申请一块继续使用,这个申请动作还是需要原子操作的。
TLAB的目的是在为新对象分配内存空间时,让每个Java应用线程能在使用自己专属的分配指针来分配空间,均摊对GC堆(eden区)里共享的分配指针做更新而带来的同步开销。
TLAB只是让每个线程有私有的分配指针,但底下存对象的内存空间还是给所有线程访问的,只是其它线程无法在这个区域分配而已。当一个TLAB用满(分配指针top撞上分配极限end了),就新申请一个TLAB,而在老TLAB里的对象还留在原地什么都不用管——它们无法感知自己是否是曾经从TLAB分配出来的,而只关心自己是在eden里分配的。
TLAB实现
实现位于/Users/zhanjun/openjdk/hotspot/src/share/vm/memory/threadLocalAllocBuffer.hpp
// ThreadLocalAllocBuffer: a descriptor for thread-local storage used by
// the threads for allocation.
// It is thread-private at any time, but maybe multiplexed over
// time across multiple threads. The park()/unpark() pair is
// used to make it avaiable for such multiplexing.
class ThreadLocalAllocBuffer: public CHeapObj<mtThread> {
friend class VMStructs;
private:
HeapWord* _start; // address of TLAB
HeapWord* _top; // address after last allocation
HeapWord* _pf_top; // allocation prefetch watermark
HeapWord* _end; // allocation end (excluding alignment_reserve)
size_t _desired_size; // desired size (including alignment_reserve)
size_t _refill_waste_limit; // hold onto tlab if free() is larger than this
TLAB简单来说本质上就是三个指针:start,top 和 end,每个线程都会从Eden分配一大块空间,例如说100KB,作为自己的TLAB,其中 start 和 end 是占位用的,标识出 eden 里被这个 TLAB 所管理的区域,卡住eden里的一块空间不让其它线程来这里分配。而 top 就是里面的分配指针,一开始指向跟 start 同样的位置,然后逐渐分配,直到再要分配下一个对象就会撞上 end 的时候就会触发一次 TLAB refill,refill过程后续会解释。
_desired_size 是指TLAB的内存大小。
_refill_waste_limit 是指最大的浪费空间,假设为5KB,通俗一点讲就是:
1、假如当前TLAB已经分配96KB,还剩下4KB,但是现在new了一个对象需要6KB的空间,显然TLAB的内存不够了,这时可以简单的重新申请一个TLAB,原先的TLAB交给Eden管理,这时只浪费4KB的空间,在_refill_waste_limit 之内。
2、假如当前TLAB已经分配90KB,还剩下10KB,现在new了一个对象需要11KB,显然TLAB的内存不够了,这时就不能简单的抛弃当前TLAB,这11KB会被安排到Eden区进行申请。
在Java代码中执行new Thread()的时候,会触发以下代码
// The first routine called by a new Java thread
void JavaThread::run() {
// initialize thread-local alloc buffer related fields
this->initialize_tlab();
// used to test validitity of stack trace backs
this->record_base_of_stack_pointer();
// Record real stack base and size.
this->record_stack_base_and_size();
// Initialize thread local storage; set before calling MutexLocker
this->initialize_thread_local_storage();
this->create_stack_guard_pages();
this->cache_global_variables();
JavaThread的run方法中,第一步就是调用this->initialize_tlab();
方法初始化TLAB,initialize_tlab
实现如下:
void initialize_tlab() {
if (UseTLAB) {
tlab().initialize();
}
}
其中tlab()
返回的就是一个ThreadLocalAllocBuffer对象,调用initialize()
初始化TLAB,实现如下:
void ThreadLocalAllocBuffer::initialize() {
initialize(NULL, // start
NULL, // top
NULL); // end
set_desired_size(initial_desired_size());
// Following check is needed because at startup the main (primordial)
// thread is initialized before the heap is. The initialization for
// this thread is redone in startup_initialization below.
if (Universe::heap() != NULL) {
size_t capacity = Universe::heap()->tlab_capacity(myThread()) / HeapWordSize;
double alloc_frac = desired_size() * target_refills() / (double) capacity;
_allocation_fraction.sample(alloc_frac);
}
set_refill_waste_limit(initial_refill_waste_limit());
initialize_statistics();
}
1、设置当前TLAB的_desired_size,该值通过initial_desired_size()
方法计算;
2、设置当前TLAB的_refill_waste_limit,该值通过initial_refill_waste_limit()
方法计算;
3、初始化一些统计字段,如_number_of_refills、_fast_refill_waste、_slow_refill_waste、_gc_waste和_slow_allocations;
字段_desired_size的计算过程分析
size_t ThreadLocalAllocBuffer::initial_desired_size() {
size_t init_sz;
if (TLABSize > 0) {
init_sz = MIN2(TLABSize / HeapWordSize, max_size());
} else if (global_stats() == NULL) {
// Startup issue - main thread initialized before heap initialized.
init_sz = min_size();
} else {
// Initial size is a function of the average number of allocating threads.
unsigned nof_threads = global_stats()->allocating_threads_avg();
init_sz = (Universe::heap()->tlab_capacity(myThread()) / HeapWordSize) /
(nof_threads * target_refills());
init_sz = align_object_size(init_sz);
init_sz = MIN2(MAX2(init_sz, min_size()), max_size());
}
return init_sz;
}
TLABSize在argument模块中默认会设置大小为 256 * K,也可以通过JVM参数选择进行设置,不过即使设置了也会和一个最大值max_size进行比较,然后取一个较小值,其中max_size计算如下:
const size_t ThreadLocalAllocBuffer::max_size() {
// TLABs can't be bigger than we can fill with a int[Integer.MAX_VALUE].
// This restriction could be removed by enabling filling with multiple arrays.
// If we compute that the reasonable way as
// header_size + ((sizeof(jint) * max_jint) / HeapWordSize)
// we'll overflow on the multiply, so we do the divide first.
// We actually lose a little by dividing first,
// but that just makes the TLAB somewhat smaller than the biggest array,
// which is fine, since we'll be able to fill that.
size_t unaligned_max_size = typeArrayOopDesc::header_size(T_INT) +
sizeof(jint) *
((juint) max_jint / (size_t) HeapWordSize);
return align_size_down(unaligned_max_size, MinObjAlignment);
}
这里明确说明了TLAB的大小不能超过可以容纳 int[Integer.MAX_VALUE],有点疑惑,why?
字段_refill_waste_limit计算分析
size_t initial_refill_waste_limit() {
return desired_size() / TLABRefillWasteFraction;
}
计算逻辑很简单,其中TLABRefillWasteFraction
默认 64
内存分配
new一个对象,假设需要1K的大小,我们一步一步看看是如何分配的。
instanceOop instanceKlass::allocate_instance(TRAPS) {
assert(!oop_is_instanceMirror(), "wrong allocation path");
bool has_finalizer_flag = has_finalizer(); // Query before possible GC
int size = size_helper(); // Query before forming handle.
KlassHandle h_k(THREAD, as_klassOop());
instanceOop i;
i = (instanceOop)CollectedHeap::obj_allocate(h_k, size, CHECK_NULL);
if (has_finalizer_flag && !RegisterFinalizersAtInit) {
i = register_finalizer(i, CHECK_NULL);
}
return i;
}
对象的内存分配入口为instanceKlass::allocate_instance()
,通过CollectedHeap::obj_allocate()
方法在堆内存上进行分配
oop CollectedHeap::obj_allocate(KlassHandle klass, int size, TRAPS) {
debug_only(check_for_valid_allocation_state());
assert(!Universe::heap()->is_gc_active(), "Allocation during gc not allowed");
assert(size >= 0, "int won't convert to size_t");
HeapWord* obj = common_mem_allocate_init(klass, size, CHECK_NULL);
post_allocation_setup_obj(klass, obj);
NOT_PRODUCT(Universe::heap()->check_for_bad_heap_word_value(obj, size));
return (oop)obj;
}
其中common_mem_allocate_init()
方法最终会调用CollectedHeap::common_mem_allocate_noinit()
方法,实现如下:
HeapWord* CollectedHeap::common_mem_allocate_noinit(KlassHandle klass, size_t size, TRAPS) {
// Clear unhandled oops for memory allocation. Memory allocation might
// not take out a lock if from tlab, so clear here.
CHECK_UNHANDLED_OOPS_ONLY(THREAD->clear_unhandled_oops();)
if (HAS_PENDING_EXCEPTION) {
NOT_PRODUCT(guarantee(false, "Should not allocate with exception pending"));
return NULL; // caller does a CHECK_0 too
}
HeapWord* result = NULL;
if (UseTLAB) {
result = allocate_from_tlab(klass, THREAD, size);
if (result != NULL) {
assert(!HAS_PENDING_EXCEPTION,
"Unexpected exception, will result in uninitialized storage");
return result;
}
}
bool gc_overhead_limit_was_exceeded = false;
result = Universe::heap()->mem_allocate(size,
&gc_overhead_limit_was_exceeded);
根据UseTLAB的值,决定是否在TLAB上进行内存分配,如果JVM参数中没有手动取消UseTLAB,会调用allocate_from_tlab()
在TLAB上尝试分配,因为可能存在分配失败的情况,比如TLAB容量不足,看下allocate_from_tlab()
的实现:
HeapWord* CollectedHeap::allocate_from_tlab(KlassHandle klass, Thread* thread, size_t size) {
assert(UseTLAB, "should use UseTLAB");
HeapWord* obj = thread->tlab().allocate(size);
if (obj != NULL) {
return obj;
}
// Otherwise...
return allocate_from_tlab_slow(klass, thread, size);
从上述实现可以看出,先会尝试调用ThreadLocalAllocBuffer
的 allocate 方法,如果返回为空,再执行allocate_from_tlab_slow()
进行分配,从这个方法命名可以看出这是比较慢的分配路径。
ThreadLocalAllocBuffer
的 allocate 方法实现如下:
inline HeapWord* ThreadLocalAllocBuffer::allocate(size_t size) {
invariants();
HeapWord* obj = top();
if (pointer_delta(end(), obj) >= size) {
// successful thread-local allocation
#ifdef ASSERT
// Skip mangling the space corresponding to the object header to
// ensure that the returned space is not considered parsable by
// any concurrent GC thread.
size_t hdr_size = oopDesc::header_size();
Copy::fill_to_words(obj + hdr_size, size - hdr_size, badHeapWordVal);
#endif // ASSERT
// This addition is safe because we know that top is
// at least size below end, so the add can't wrap.
set_top(obj + size);
invariants();
return obj;
}
return NULL;
}
通过判断当前TLAB的剩余容量是否大于需要分配的大小,来决定分配结果,如果当前剩余容量不够,就返回NULL,表示分配失败。
慢分配allocate_from_tlab_slow()
实现如下:
HeapWord* CollectedHeap::allocate_from_tlab_slow(KlassHandle klass, Thread* thread, size_t size) {
// Retain tlab and allocate object in shared space if
// the amount free in the tlab is too large to discard.
if (thread->tlab().free() > thread->tlab().refill_waste_limit()) {
thread->tlab().record_slow_allocation(size);
return NULL;
}
// Discard tlab and allocate a new one.
// To minimize fragmentation, the last TLAB may be smaller than the rest.
size_t new_tlab_size = thread->tlab().compute_size(size);
thread->tlab().clear_before_allocation();
if (new_tlab_size == 0) {
return NULL;
}
// Allocate a new TLAB...
HeapWord* obj = Universe::heap()->allocate_new_tlab(new_tlab_size);
if (obj == NULL) {
return NULL;
}
// 删除了一些代码
thread->tlab().fill(obj, obj + size, new_tlab_size);
return obj;
}
1、如果当前TLAB的剩余容量大于浪费阈值,就不在当前TLAB分配,直接在共享的Eden区进行分配,并且记录慢分配的内存大小;
2、如果剩余容量小于浪费阈值,说明可以丢弃当前TLAB了;
3、通过allocate_new_tlab()
方法,从eden新分配一块裸的空间出来(这一步可能会失败),如果失败说明eden没有足够空间来分配这个新TLAB,就会触发YGC。
申请好新的TLAB内存之后,执行TLAB的fill()
方法,实现如下:
void ThreadLocalAllocBuffer::fill(HeapWord* start,
HeapWord* top,
size_t new_size) {
_number_of_refills++;
if (PrintTLAB && Verbose) {
print_stats("fill");
}
assert(top <= start + new_size - alignment_reserve(), "size too small");
initialize(start, top, start + new_size - alignment_reserve());
// Reset amount of internal fragmentation
set_refill_waste_limit(initial_refill_waste_limit());
}
包括下述几个动作:
1、统计refill的次数
2、初始化重新申请到的内存块
3、将当前TLAB抛弃(retire)掉,这个过程中最重要的动作是将TLAB末尾尚未分配给Java对象的空间(浪费掉的空间)分配成一个假的“filler object”(目前是用int[]作为filler object)。这是为了保持GC堆可以线性parse(heap parseability)用的。