Archives by Category
Contact
- Hagen Paul Pfeifer
- http://jauu.net
- hagen@jauu.net (encrypted preferred)
- KeyId: 0x98350C22
- Telephone: +49 174 5455209
Follow this blog
SIMD++, SSE4 and SIMD16
- Published in: programming
- | Time: 00:23:11 CEST
- | SHA1: fa2a11e548db85dba39e9f0179b17669f56a98c8
The SSE4 programming reference is out – a opportunity to study the improvements.
Besides 47 (plus 7 additional for the nehalem microarchitecture) new instructions which mainly focus on multimedia acceleration. MPSADBW (Sum of Absolute Differences), PHMINPOSUW Minimum Search (find minimum uint16_t from eight elements) (if you invite the source you had an fast max() ;-), ROUND (round floating point types) and other instructions too.
Dot product matrix calculation, load hint instruction (MOVNTDQA) to store aligned data in a small data-set, packed integer format conversions (convert in wider data types), IEEE 754 Compliance operations.
A nice giveaway are also the string instructions:
- max 16byte
- explicit length declaration or NULL termination
- string compare function support PCMPxSTRx)
- strlen()
The nehalem microarchitecture also support:
- CRC32
- POPCNT – for searching bit pattern
Recently I read a paper about the increased register width of 512bit for SIMD instructions (called SIMD16). More direct addressable register are also planned. Intel will introduce themin Larrabee.