Contact
- Hagen Paul Pfeifer
- http://jauu.net
- hagen@jauu.net (encrypted preferred)
- KeyId: 0x98350C22
- Telephone: +49 174 5455209
CPU Cycles versus Cacheline Miss
- Published in: programming
- Time: 22:44:24 CEST
- SHA1: ed5270619a581e39dfa90db4432100871b564bca
Optimization has changed over time: today it is less about saving CPU cycles
or instructions and more about reducing memory transactions. Keep the code
small, keep it in the cache, and reduce memory loads. A cache miss that has to
go to main memory stalls the CPU for roughly as long as it would take to
execute 100 instructions. Therefore keeping the .text footprint small can speed
up your application more than some sophisticated CPU tweaks. Pahole gives a
feeling for how data structures are laid out: how structs are aligned in
memory, whether they contain holes, and where their members fall relative to
cacheline boundaries. This only scratches the surface of optimization
techniques; the kernel uses highly sophisticated techniques to optimize for
memory transactions: moving mostly read-only data into a separate .data
section (__read_mostly) so it does not falsely share cachelines with
frequently written data, aligning data on different cachelines to avoid false
sharing, un-inlining functions to reduce the memory footprint, and so on.
The best book for understanding this kind of optimization is, by far, “UNIX Systems for Modern Architectures – Symmetric Multiprocessing and Caching for Kernel Programmers” by Curt Schimmel. Besides this, the optimization manuals from AMD and Intel are worth reading too, because they provide an in-depth understanding of processor-specific tweaks.