13 Mar 2006 (Mon)

07:41:11 # Life

How costly is a shared library function call compared to a static library function call?

Linux/glibc/ELF shared library function calls are said to be slower than static library function calls, because the calls are redirected through the plt/got sections, which jump to the address stored at an offset in the GOT. The background for this indirection is that the ELF specification implies that the load address of a shared library is not pre-determined; addresses are decided after load. Making this work by allowing writes to the whole text segment (where all the code is, including jump addresses and other things) would rule out a read-only mmap of the executable binary, which means backing store would be required in terms of swap space. To minimise the effect of modified jump instructions, the modifiable jump instructions are split off into another section, which is writable all the time, called GOT. (This is my understanding; if you find any corrections, please send me a mail, so I can secretly update this page.)

Anyway, there's an extra jump instruction involved for shared library function calls. I wrote a short micro-benchmark and measured the time to make sure that was the case. I called the function 1G times on a 500MHz powerpc. Apparently, the overhead is about 2 clock cycles per call.

time (s)    shared library    static link
real            17.646          13.746
user            17.005          13.233
system           0.024           0.018

The branch instruction written to the plt section looked like this (powerpc):

0x10011b70 <function_in_lib@plt+0>:	b       0xffdf590 <function_in_lib>
	

I've done the same test on an Athlon 64 2.2GHz, making the call 2.2G times (2200000000). The overhead was again about 2 clocks per call.

time (s)    shared library    static link
real             8.185           6.142
user             8.168           6.125
system           0.000           0.002

Happy with the results, I measured the profile information with oprofile. The result was a bit unexpected. Why? To be continued... (maybe)

                   shared library    static link
function_in_lib         61%             25%
main                    28%             66%
.plt                     4%              -

Ian Wienand pointed out to me that GOT is the writable part, not PLT. Apparently on my powerpc system, PLT seems to be modified, but on amd64, PLT is just a table of jumps, and GOT is the table of addresses. He pointed me to this excellent article. Of course, I should have looked at my own article which talked about GOTs.

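I've checked, and it looks like the plt section is modified on powerpc, though I'm not too sure right now. Disassembling the same plt entry before and after continuing to a later breakpoint shows the first instruction changing: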

(gdb) disassemble 0x10011b70
Dump of assembler code for function function_in_lib@plt:
0x10011b70 <function_in_lib@plt+0>:	li      r11,4
0x10011b74 <function_in_lib@plt+4>:	b       0x10011b40 <_GLOBAL_OFFSET_TABLE_+44>
End of assembler dump.
(gdb) cont
Continuing.

Breakpoint 2, main (ac=1, av=<value optimized out>) at main.c:50
(gdb) disassemble 0x10011b70
Dump of assembler code for function function_in_lib@plt:
0x10011b70 <function_in_lib@plt+0>:	b       0xffdf590 <function_in_lib>
0x10011b74 <function_in_lib@plt+4>:	b       0x10011b40 <_GLOBAL_OFFSET_TABLE_+44>
End of assembler dump.
	
Junichi Uekawa
