About IFunc

2015-01-07  太麻烦了不取了

[
** Updated 02/03/2015 **
The patch implementing IFUNC for arm has been submitted here: https://sourceware.org/ml/binutils/2015-01/msg00258.html
]

Scenario

A nasty bug turned up in the IFUNC implementation, so I am writing down what I understand about IFUNC for future reference.

IFunc is nothing advanced. It is merely a trick for choosing one of several versions of a function, usually depending on CPU features. The decision is not made on every function invocation, but just once, right before the binary starts executing.

A typical use is to select, for the hardware at hand, one of the following memcpy implementations.

The naive way

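<pre><code>
/* size_t comes from stddef.h; get_cpu_features, cpu_has_neon, cpu_has_vfp
   and the memcpy_* variants are placeholders. */
void *memcpy(void *dest, const void *src, size_t size)
{
    /* Feature detection runs again on every single call. */
    unsigned long cpu_features = get_cpu_features();
    if (cpu_has_neon(cpu_features))
        return memcpy_neon(dest, src, size);
    else if (cpu_has_vfp(cpu_features))
        return memcpy_vfp(dest, src, size);
    return memcpy_generic_arm(dest, src, size);
}
</code></pre>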

This clearly incurs a big performance penalty: the same selection logic executes on every memcpy invocation, even though its answer never changes while the program runs.

The ifunc way

IFunc comes to the rescue in this scenario. You define a memcpy resolver function which, instead of doing the actual work, returns a pointer to the function that will do it, chosen by some logic (typically CPU features). Then you mark memcpy as an ifunc whose resolver is that “memcpy resolver”, like below.

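<pre><code>
/* size_t comes from stddef.h; cpu_has_neon, cpu_has_vfp and the
   memcpy_* variants are placeholders. */
void *memcpy (void *, const void *, size_t)
    __attribute__ ((ifunc ("resolve_memcpy")));

typedef void *memcpy_fn (void *, const void *, size_t);

// Returns a function pointer. On arm, glibc passes the CPU feature
// word (hwcap) as the first argument, i.e. r0 is preset to it.
static memcpy_fn *resolve_memcpy (unsigned long cpu_features)
{
    if (cpu_has_neon (cpu_features))
        return &memcpy_neon;
    else if (cpu_has_vfp (cpu_features))
        return &memcpy_vfp;
    return &memcpy_generic_arm;
}
</code></pre>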

The big difference from “the naive way” is that resolve_memcpy is called exactly once, before main executes (usually from the startup code around __start), instead of on every memcpy call.
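A minimal, runnable sketch of the same pattern, assuming GCC or Clang on an ELF system with glibc. The function is called my_copy rather than memcpy so it does not clash with the C library, and my_copy, resolve_my_copy, copy_bytewise, copy_unrolled, cpu_has_fast_path and resolver_calls are all made-up names. The resolver uses the argument-less form from the GCC documentation, so it also builds on x86-64. The counter makes the "called once" property observable. Where the `#if` test fails (for example with a C library other than glibc), the `#else` branch substitutes a lazily cached function pointer, which resolves on the first call rather than at load time and is not IFUNC.

```c
#include <stddef.h>
#include <stdio.h>  /* on glibc this also defines __GLIBC__ */

typedef void *copy_fn(void *dst, const void *src, size_t n);

static int resolver_calls;  /* how many times the resolver has run */

/* Hypothetical feature check; a real resolver would test hwcap/CPUID bits. */
static int cpu_has_fast_path(void) { return 1; }

/* Two interchangeable implementations with identical signatures. */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

static void *copy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (; n >= 4; n -= 4, d += 4, s += 4) {
        d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
    }
    while (n--)  /* copy the 0-3 leftover bytes */
        *d++ = *s++;
    return dst;
}

/* The resolver: picks an implementation and returns a pointer to it. */
static copy_fn *resolve_my_copy(void)
{
    resolver_calls++;
    return cpu_has_fast_path() ? copy_unrolled : copy_bytewise;
}

#if defined(__GLIBC__) && defined(__ELF__)
/* Real IFUNC: my_copy is bound once, while the program is being loaded. */
void *my_copy(void *dst, const void *src, size_t n)
    __attribute__((ifunc("resolve_my_copy")));
#else
/* Fallback, not IFUNC: resolve on the first call and cache the pointer.
   Not thread-safe; for illustration only. */
void *my_copy(void *dst, const void *src, size_t n)
{
    static copy_fn *impl;
    if (!impl)
        impl = resolve_my_copy();
    return impl(dst, src, n);
}
#endif
```

However many times my_copy is called, resolver_calls stays at 1.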

Implementation

Compiler side

Whenever it sees “__attribute__ ((ifunc (...)))”, the compiler marks the function symbol as “IFUNC” (STT_GNU_IFUNC) in the symbol table. That’s it, simple enough.
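At the ELF level, marking the symbol as IFUNC means storing STT_GNU_IFUNC (value 10) instead of STT_FUNC (value 2) in the type nibble of the symbol's st_info byte. GCC does this by emitting a `.type` directive with `%gnu_indirect_function` (`@gnu_indirect_function` on x86), and the symbol's value is the resolver's address. The helpers below use hypothetical names and copy the constants in, so elf.h is not needed; they reproduce the packing done by the ELF32_ST_INFO macro.

```c
/* ELF binding/type values (gABI plus the GNU extension):
   STB_GLOBAL = 1, STT_FUNC = 2, STT_GNU_IFUNC = 10 (== STT_LOOS). */
enum { SYM_STB_GLOBAL = 1, SYM_STT_FUNC = 2, SYM_STT_GNU_IFUNC = 10 };

/* Pack binding and type the same way as ELF32_ST_INFO. */
unsigned char sym_st_info(unsigned bind, unsigned type)
{
    return (unsigned char)((bind << 4) + (type & 0xfu));
}

/* Unpack them again (ELF32_ST_BIND / ELF32_ST_TYPE). */
unsigned sym_st_bind(unsigned char info) { return info >> 4; }
unsigned sym_st_type(unsigned char info) { return info & 0xfu; }

/* An IFUNC symbol differs from a plain function symbol only in its type. */
int sym_is_ifunc(unsigned char info)
{
    return sym_st_type(info) == SYM_STT_GNU_IFUNC;
}
```

A global IFUNC symbol therefore has st_info 0x1a, where an ordinary global function has 0x12.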

Static linker side

[
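** Updated 02/03/2015 ** Note that arm and aarch64 implement this slightly differently. For aarch64, the resolver function address is encoded in the addend field of the relocation, while for arm the address is written into the GOT entry. This follows from the relocation formats: aarch64 uses RELA relocations, which carry an explicit r_addend, whereas arm uses REL relocations, which have no addend field, so the value has to live in the relocated word itself.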

<pre><code>
// This is the aarch64 implementation - aarch64/dl-irel.h
if (__glibc_likely (r_type == R_AARCH64_IRELATIVE))
{
    // the resolver function address is encoded in the addend field.
    ElfW(Addr) value = elf_ifunc_invoke (reloc->r_addend);
    *reloc_addr = value;
}

// This is the arm implementation - arm/dl-irel.h
if (__builtin_expect (r_type == R_ARM_IRELATIVE, 1))
{
    // the resolver function address is written into the relocation address (the GOT entry)
    Elf32_Addr value = elf_ifunc_invoke (*reloc_addr);
    *reloc_addr = value;
}
</code></pre>

The example below is based on the arm implementation.
]
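The two glibc code paths above can be modeled in portable C, treating addresses as uintptr_t values the way glibc treats ElfW(Addr). This is a sketch: the names elf_addr, apply_irelative_arm, apply_irelative_aarch64, impl_generic and resolve_impl are hypothetical, the resolver takes no hwcap argument, and no load-address adjustment is applied.

```c
#include <stdint.h>

/* Stand-in for ElfW(Addr) / Elf32_Addr: an address-sized integer. */
typedef uintptr_t elf_addr;

/* Resolver type in this sketch. Real arm/aarch64 resolvers also receive
   hwcap; that argument is left out here for brevity. */
typedef elf_addr (*resolver_fn)(void);

/* Counterpart of glibc's elf_ifunc_invoke: call the resolver at addr. */
static elf_addr ifunc_invoke(elf_addr addr)
{
    return ((resolver_fn)addr)();
}

/* arm style (REL): the GOT slot itself holds the resolver address. */
void apply_irelative_arm(elf_addr *reloc_addr)
{
    elf_addr value = ifunc_invoke(*reloc_addr);
    *reloc_addr = value;
}

/* aarch64 style (RELA): the resolver address is the relocation's addend;
   whatever the slot held before is simply overwritten. */
void apply_irelative_aarch64(elf_addr *reloc_addr, elf_addr r_addend)
{
    elf_addr value = ifunc_invoke(r_addend);
    *reloc_addr = value;
}

/* Hypothetical implementation and resolver to exercise both paths. */
void impl_generic(void) {}
elf_addr resolve_impl(void) { return (elf_addr)&impl_generic; }
```

Both paths end with the slot holding the implementation's address; they differ only in where the resolver address is read from.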

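Whenever the linker sees a call to an ifunc, it does these 3 things:

1. Creates a PLT entry and a GOT entry for the function.
2. Fills the GOT entry with the address of the resolver and attaches an IRELATIVE relocation to it.
3. Redirects the call to the PLT entry.

For example: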
<pre><code>
memcpy_pltentry:
0 add r12, pc, #4
4 add r12, r12, #0
8 ldr pc, [r12, #0] // transfer pc to 2000, the content of [12]

memcpy_gotentry:
12 2000 // Attach an IRELATIVE relocation here.

a_routine:
1000 b 0 // call memcpy via plt,
// 0 is the address of memcpy_pltentry
...

memcpy_resolver:
2000 mov r0, 3000
bx lr

memcpy_neon:
3000 ...

memcpy_vfp:
4000 ...

memcpy_generic_arm:
5000 ...
</code></pre>
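Why the stub lands on address 12: in ARM state, an instruction that reads pc sees its own address plus 8. So `add r12, pc, #4` at address 0 computes 0 + 8 + 4 = 12, the second `add` adds 0, and `ldr pc, [r12, #0]` loads the word at 12, which is memcpy_gotentry. The helper below (hypothetical name plt_got_slot) reproduces that arithmetic for this simplified three-instruction stub, not for the exact encoding binutils emits.

```c
#include <stdint.h>

/* In ARM (A32) state, an instruction that reads pc sees its own address + 8. */
#define ARM_PC_READ_OFFSET 8u

/* GOT-slot address read by a stub of the simplified form
       plt+0: add r12, pc, #imm1
       plt+4: add r12, r12, #imm2
       plt+8: ldr pc, [r12, #imm3]
   The first add executes at plt+0, so it reads pc as plt + 8. */
uint32_t plt_got_slot(uint32_t plt_addr, uint32_t imm1, uint32_t imm2,
                      uint32_t imm3)
{
    uint32_t r12 = plt_addr + ARM_PC_READ_OFFSET + imm1; /* add r12, pc, #imm1   */
    r12 += imm2;                                          /* add r12, r12, #imm2  */
    return r12 + imm3;                                    /* ldr pc, [r12, #imm3] */
}
```

With the values from the layout, `plt_got_slot(0, 4, 0, 0)` gives 12.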

Right before executing main

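glibc iterates over all IRELATIVE relocations, and for each one it:

1. Calls the resolver whose address is currently stored in the relocated GOT entry (memcpy_resolver at 2000), which returns 3000, the address of memcpy_neon.
2. Writes that returned value back into the GOT entry.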

All later invocations of memcpy go to memcpy_neon, and memcpy_resolver will **never be called again**.

After steps 1 and 2, the memory layout above becomes:

<pre><code>
memcpy_pltentry:
0 add r12, pc, #4
4 add r12, r12, #0
8 ldr pc, [r12, #0] // transfer pc to 3000 now,
// the content of [12]

memcpy_gotentry:
12 3000 // 3000 is the value returned by memcpy_resolver.

a_routine:
1000 b 0 // call memcpy via plt,
// 0 is the address of memcpy_pltentry
...

memcpy_resolver:
2000 mov r0, 3000
bx lr

memcpy_neon:
3000 ...

memcpy_vfp:
4000 ...

memcpy_generic_arm:
5000 ...
</code></pre>
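The whole sequence (the linker fills the GOT slot with the resolver address, glibc rewrites it once before main, later calls go straight to the implementation) can be simulated in ordinary C. This is an emulation under stated assumptions, not how glibc is written: GOT slots are plain uintptr_t variables, the PLT stub is a C function, and memcpy_gotentry, link_image, apply_irelative_relocs, memcpy_via_plt, copy_neon, copy_generic and HYPOTHETICAL_HWCAP_NEON are made-up stand-ins for the pieces in the layouts above.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t elf_addr;                      /* stand-in for Elf32_Addr */
typedef void *copy_fn(void *, const void *, size_t);
typedef elf_addr (*resolver_fn)(unsigned long);  /* arm: hwcap arrives in r0 */

#define HYPOTHETICAL_HWCAP_NEON 0x1ul  /* made-up feature bit for this sketch */

static int resolver_calls;

static void *byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-ins for memcpy_neon and memcpy_generic_arm (plain C here). */
void *copy_neon(void *dst, const void *src, size_t n)    { return byte_copy(dst, src, n); }
void *copy_generic(void *dst, const void *src, size_t n) { return byte_copy(dst, src, n); }

/* Plays the role of memcpy_resolver. */
elf_addr memcpy_resolver(unsigned long hwcap)
{
    resolver_calls++;
    return (hwcap & HYPOTHETICAL_HWCAP_NEON) ? (elf_addr)&copy_neon
                                             : (elf_addr)&copy_generic;
}

static elf_addr memcpy_gotentry;       /* the GOT slot */
static elf_addr *irelative_relocs[1];  /* r_offset of each IRELATIVE reloc */
static size_t n_irelative;

/* Static-linker step: store the resolver address in the GOT slot and
   record an IRELATIVE relocation against that slot. */
void link_image(void)
{
    memcpy_gotentry = (elf_addr)&memcpy_resolver;
    irelative_relocs[0] = &memcpy_gotentry;
    n_irelative = 1;
}

/* Startup step (glibc, before main): call each slot's resolver once and
   overwrite the slot with the returned implementation address. */
void apply_irelative_relocs(unsigned long hwcap)
{
    for (size_t i = 0; i < n_irelative; i++) {
        elf_addr *reloc_addr = irelative_relocs[i];
        *reloc_addr = ((resolver_fn)*reloc_addr)(hwcap);
    }
}

/* The PLT stub: jump to whatever address the GOT slot holds. */
void *memcpy_via_plt(void *dst, const void *src, size_t n)
{
    return ((copy_fn *)memcpy_gotentry)(dst, src, n);
}
```

After apply_irelative_relocs runs, every memcpy_via_plt call goes through the slot straight to copy_neon, and resolver_calls stays at 1 however many copies follow.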
