Archives by Category
Contact
- Hagen Paul Pfeifer
- http://jauu.net
- hagen@jauu.net (encrypted preferred)
- KeyId: 0x98350C22
- Telephone: +49 174 5455209
Follow this blog
CRC 16 CCITT Performance
- Published in: programming
- | Time: 00:45:23 CEST
- | SHA1: 4df1db1489ab9239be48873421c81ab91b50bda9
The CRC-16-CCITT is a cyclic redundancy check – one among many – defined and derived from polynomial x16 + x12 + x5 + 1. It is used by X.25, HDLC, XMODEM, bluetooth, phy header verification and many others. During some analysis I stumbled across the following implementation:
unsigned char ser_data; static unsigned int crc; crc = (unsigned char)(crc >> 8) | (crc << 8); crc ^= ser_data; crc ^= (unsigned char)(crc & 0xff) >> 4; crc ^= (crc << 8) << 4; crc ^= ((crc & 0xff) << 4) << 1;
The expression (crc << 8) << 4 quizzicaled me a little bit because crc<<12
seems more efficient and is mainly equivalent. The advice on the
site is to implement the above
algorithm with the two shift operation because the latter generates much more
code and executes slower! I compiled the above code fragment, disabled any
optimization and voila:
[...] shrb $4, %al movzbl %al, %eax xorw %ax, -2(%rbp) movzwl -2(%rbp), %eax sall $12, %eax movl %eax, %edx movzwl -2(%rbp), %eax xorl %edx, %eax movw %ax, -2(%rbp) movzwl -2(%rbp), %eax andl $255, %eax sall $5, %eax [...]
Independent of the compiler optimization level (-O0 .. -O7) the compiler
always generates a 12 bit shift. It is even not possible to instruct the
compiler not to do so. But why does the compiler does not optimize the
construct more aggressively? It does, if the user tell him to do so. ;-) But
these optimization are more cosmetic and restricted to reduce the instruction
set size.
[...] shrb $4, %dl movzbl %dl, %edx xorl %eax, %edx movl %edx, %eax sall $12, %eax xorl %edx, %eax movzbl %al, %edx sall $5, %edx [...]
In average the processing time on my x86_64 take 2.06076e^-05^ us per byte:
![]()