Different data types can be packed into the 128-bit XMM vector registers as follows:
Packed data type | Vector length (elements) | Bits per element | Data type range |
signed bytes | 16 | 8 | -2**7 to 2**7-1 |
unsigned bytes | 16 | 8 | 0 to 2**8-1 |
signed words | 8 | 16 | -2**15 to 2**15-1 |
unsigned words | 8 | 16 | 0 to 2**16-1 |
signed doublewords | 4 | 32 | -2**31 to 2**31-1 |
unsigned doublewords | 4 | 32 | 0 to 2**32-1 |
signed quadwords | 2 | 64 | -2**63 to 2**63-1 |
unsigned quadwords | 2 | 64 | 0 to 2**64-1 |
single-precision floats | 4 | 32 | 2**-126 to just under 2**128 (normalized magnitude) |
double-precision floats | 2 | 64 | 2**-1022 to just under 2**1024 (normalized magnitude) |
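Selecting the appropriate data type for calculations can significantly affect application performance: the narrower the element type, the more elements one 128-bit instruction processes (16 bytes, but only 4 doublewords or 2 quadwords).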
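A minimal sketch of that trade-off, assuming an x86-64 compiler that provides the SSE2 intrinsics in `<emmintrin.h>`; the function names are hypothetical. The same single instruction adds 16 byte pairs (`paddb`, via `_mm_add_epi8`) or only 4 doubleword pairs (`paddd`, via `_mm_add_epi32`).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: adds 16 pairs of bytes with one paddb (wraps modulo 2**8). */
void add_bytes16(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi8(va, vb));
}

/* Hypothetical helper: adds only 4 pairs of 32-bit integers with one paddd. */
void add_int32x4(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

Both functions execute one vector addition; the byte version simply does four times as much useful work per instruction, provided the values fit in 8 bits.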
Fig. 5.5. Optimization with Switches
SIMD – SSE, SSE2, SSE3, SSE4.2 Support
Instruction groups
Data movement instructions
Instruction | Suffix | Description |
movdqa | | move double quadword aligned |
movdqu | | move double quadword unaligned |
mova | [ ps, pd ] | move packed floating-point aligned |
movu | [ ps, pd ] | move packed floating-point unaligned |
movhl | [ ps ] | move packed floating-point high to low |
movlh | [ ps ] | move packed floating-point low to high |
movh | [ ps, pd ] | move high packed floating-point |
movl | [ ps, pd ] | move low packed floating-point |
mov | [ d, q, ss, sd ] | move scalar data |
lddqu | | load double quadword unaligned |
mov<d/sh/sl>dup | | move and duplicate |
pextr | [ w ] | extract word |
pinsr | [ w ] | insert word |
pmovmsk | [ b ] | move byte mask (most significant bit of each byte) to a general-purpose register |
movmsk | [ ps, pd ] | move sign-bit mask of packed floating-point values to a general-purpose register |
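Aligned data movement instructions (movdqa, movaps, movapd) require a memory operand aligned on a 16-byte boundary; applying them to an unaligned address raises a general-protection exception (#GP). For such addresses use the unaligned forms (movdqu, movups, movupd, lddqu).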
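A minimal sketch, assuming SSE2 intrinsics from `<emmintrin.h>`; `copy16` is a hypothetical name. It uses the aligned forms (`_mm_load_si128`/`_mm_store_si128`, i.e. movdqa) only when both pointers are 16-byte aligned, and otherwise falls back to the unaligned forms (`_mm_loadu_si128`/`_mm_storeu_si128`, i.e. movdqu).

```c
#include <emmintrin.h>
#include <stdint.h>
#include <stddef.h>

/* Copies n bytes, n assumed to be a multiple of 16 (hypothetical helper). */
void copy16(uint8_t *dst, const uint8_t *src, size_t n)
{
    /* Both addresses must have their low 4 bits clear for movdqa. */
    int aligned = (((uintptr_t)dst | (uintptr_t)src) & 15) == 0;
    for (size_t i = 0; i < n; i += 16) {
        if (aligned) {
            __m128i v = _mm_load_si128((const __m128i *)(src + i));   /* movdqa load  */
            _mm_store_si128((__m128i *)(dst + i), v);                 /* movdqa store */
        } else {
            __m128i v = _mm_loadu_si128((const __m128i *)(src + i));  /* movdqu load  */
            _mm_storeu_si128((__m128i *)(dst + i), v);                /* movdqu store */
        }
    }
}
```

Passing a misaligned pointer to the aligned branch would fault, which is why the alignment is checked once before the loop.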
Integer arithmetic instructions
Instruction | Suffix | Description |
padd | [ b, w, d, q ] | packed addition with wraparound (signed and unsigned) |
psub | [ b, w, d, q ] | packed subtraction with wraparound (signed and unsigned) |
padds | [ b, w ] | packed addition with saturation (signed) |
paddus | [ b, w ] | packed addition with saturation (unsigned) |
psubs | [ b, w ] | packed subtraction with saturation (signed) |
psubus | [ b, w ] | packed subtraction with saturation (unsigned) |
pmins | [ w ] | packed minimum (signed) |
pminu | [ b ] | packed minimum (unsigned) |
pmaxs | [ w ] | packed maximum (signed) |
pmaxu | [ b ] | packed maximum (unsigned) |
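A minimal sketch contrasting wraparound and saturating addition, assuming SSE2 intrinsics; `add_bytes_3ways` is a hypothetical name. `_mm_add_epi8` is paddb, `_mm_adds_epu8` is paddusb and `_mm_adds_epi8` is paddsb.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Adds every byte lane of a and b three ways (hypothetical helper). */
void add_bytes_3ways(const uint8_t a[16], const uint8_t b[16],
                     uint8_t wrap[16], uint8_t usat[16], int8_t ssat[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)wrap, _mm_add_epi8(va, vb));   /* paddb: result mod 2**8        */
    _mm_storeu_si128((__m128i *)usat, _mm_adds_epu8(va, vb));  /* paddusb: clamps to 0..255     */
    _mm_storeu_si128((__m128i *)ssat, _mm_adds_epi8(va, vb));  /* paddsb: clamps to -128..127   */
}
```

For example, 200 + 100 gives 44 with wraparound but 255 with unsigned saturation, while 100 + 100 gives 127 with signed saturation. Saturation is what image and audio code usually wants, since a bright pixel should not wrap around to a dark one.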
Floating-point arithmetic instructions
Instruction | Suffix | Description |
add | [ ss, ps, sd, pd ] | addition |
div | [ ss, ps, sd, pd ] | division |
min | [ ss, ps, sd, pd ] | minimum |
max | [ ss, ps, sd, pd ] | maximum |
mul | [ ss, ps, sd, pd ] | multiplication |
sqrt | [ ss, ps, sd, pd ] | square root |
sub | [ ss, ps, sd, pd ] | subtraction |
rcp | [ ss, ps ] | approximated reciprocal |
rsqrt | [ ss, ps ] | approximated reciprocal square root |
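A minimal sketch, assuming SSE intrinsics from `<xmmintrin.h>`; the function names are hypothetical. It shows that the `ss` form touches only the lowest element (the upper three are copied from the first operand), and that the approximate `rcpps` result is commonly refined with one Newton-Raphson step, x' = x(2 - d x).

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* addps adds all four lanes; addss adds lane 0 only (hypothetical helper). */
void add_ps_vs_ss(const float a[4], const float b[4], float packed[4], float scalar[4])
{
    __m128 va = _mm_loadu_ps(a), vb = _mm_loadu_ps(b);
    _mm_storeu_ps(packed, _mm_add_ps(va, vb));
    _mm_storeu_ps(scalar, _mm_add_ss(va, vb));  /* lanes 1..3 come from a */
}

/* rcpps is only about 12 bits accurate; one Newton-Raphson step refines it. */
void reciprocal_ps(const float d[4], float out[4])
{
    __m128 vd = _mm_loadu_ps(d);
    __m128 x  = _mm_rcp_ps(vd);                                      /* approximation    */
    x = _mm_mul_ps(x, _mm_sub_ps(_mm_set1_ps(2.0f), _mm_mul_ps(vd, x))); /* x * (2 - d*x) */
    _mm_storeu_ps(out, x);
}
```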
Idiomatic arithmetic instructions
Instruction | Suffix | Description |
pavg | [ b, w ] | packed average with rounding (unsigned) |
pmulh/pmulhu/pmull | [ w ] | packed multiplication, keeping the high (signed), high (unsigned) or low 16 bits of each product |
psad | [ bw ] | packed sum of absolute differences (unsigned) |
pmadd | [ wd ] | packed multiplication and addition (signed): adjacent word products summed into doublewords |
addsub | [ ps, pd ] | floating-point addition/subtraction (subtract in even elements, add in odd elements) |
hadd | [ ps, pd ] | floating-point horizontal addition |
hsub | [ ps, pd ] | floating-point horizontal subtraction |
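A minimal sketch of two of these instructions, assuming SSE2 intrinsics; the function names are hypothetical. `_mm_sad_epu8` (psadbw) leaves two partial sums, one in the low word of each 64-bit half, and `_mm_madd_epi16` (pmaddwd) turns 8 word products into 4 doubleword sums, which makes a short integer dot product.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Sum of |a[i] - b[i]| over 16 bytes (hypothetical helper). */
unsigned sad16(const uint8_t a[16], const uint8_t b[16])
{
    __m128i s = _mm_sad_epu8(_mm_loadu_si128((const __m128i *)a),
                             _mm_loadu_si128((const __m128i *)b));
    /* word 0 holds the sum of bytes 0..7, word 4 the sum of bytes 8..15 */
    return (unsigned)_mm_cvtsi128_si32(s) + (unsigned)_mm_extract_epi16(s, 4);
}

/* Dot product of 8 int16 pairs using pmaddwd (hypothetical helper). */
int32_t dot8_i16(const int16_t a[8], const int16_t b[8])
{
    __m128i p = _mm_madd_epi16(_mm_loadu_si128((const __m128i *)a),
                               _mm_loadu_si128((const __m128i *)b));
    /* horizontal sum of the four 32-bit lanes using SSE2 shuffles */
    p = _mm_add_epi32(p, _mm_shuffle_epi32(p, _MM_SHUFFLE(1, 0, 3, 2)));
    p = _mm_add_epi32(p, _mm_shuffle_epi32(p, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(p);
}
```

psadbw is the typical building block of block matching in video encoders; pmaddwd is the typical building block of fixed-point filters.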
Logical instructions
Instruction | Suffix | Description |
pand | | bitwise logical AND |
pandn | | bitwise logical AND-NOT |
por | | bitwise logical OR |
pxor | | bitwise logical XOR |
and | [ ps, pd ] | bitwise logical AND |
andn | [ ps, pd ] | bitwise logical AND-NOT |
or | [ ps, pd ] | bitwise logical OR |
xor | [ ps, pd ] | bitwise logical XOR |
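A minimal sketch of a common use of the floating-point logical instructions, assuming SSE intrinsics; the function names are hypothetical. The mask `-0.0f` has only the sign bit set, so `andnps` (which computes NOT mask AND x) clears the sign bit and `xorps` flips it.

```c
#include <xmmintrin.h>

/* |x| for four floats: andnps clears bit 31 of each element (hypothetical helper). */
__m128 abs_ps(__m128 x)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f);  /* 0x80000000 in every lane */
    return _mm_andnot_ps(sign_mask, x);           /* (~sign_mask) & x */
}

/* -x for four floats: xorps toggles bit 31 of each element (hypothetical helper). */
__m128 neg_ps(__m128 x)
{
    return _mm_xor_ps(x, _mm_set1_ps(-0.0f));
}
```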
Comparison instructions
Instruction | Suffix | Description |
pcmp<cc> | [ b, w, d ] | packed compare |
cmp<cc> | [ ss, ps, sd, pd ] | floating-point compare |
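<cc> defines the comparison predicate, e.g. lt (less than), gt (greater than), eq (equal).
Packed integer compares exist only as eq and gt (pcmpeq, pcmpgt); floating-point compares support eq, lt, le, unord, neq, nlt, nle and ord. Each result element is set to all ones if the comparison is true and to all zeros otherwise, so the result can be used directly as a bit mask.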
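A minimal sketch of how a comparison mask combines with the logical instructions into a branchless select, assuming SSE2 intrinsics; `max_i8` is a hypothetical name. SSE2 has no signed byte maximum (only pmaxsw and pmaxub), so it is built from `pcmpgtb`, `pand`, `pandn` and `por`.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Per-lane signed byte maximum without branches (hypothetical helper). */
void max_i8(const int8_t a[16], const int8_t b[16], int8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i gt = _mm_cmpgt_epi8(va, vb);                 /* pcmpgtb: 0xFF where a > b, else 0x00 */
    __m128i r  = _mm_or_si128(_mm_and_si128(gt, va),     /* pand: keep a where a > b            */
                              _mm_andnot_si128(gt, vb)); /* pandn: keep b everywhere else       */
    _mm_storeu_si128((__m128i *)out, r);
}
```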
Conversion instructions
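Instruction | Suffix | Description |
packss | [ wb, dw ] | pack with saturation (signed) |
packus | [ wb ] | pack signed words into unsigned bytes with saturation (unsigned) |
cvt<s>2<d> | | conversion from source type <s> to destination type <d>, e.g. cvtdq2ps, cvtps2dq |
cvtt<s>2<d> | | conversion with truncation toward zero, e.g. cvttps2dq |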
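A minimal sketch contrasting rounding and truncating conversion and showing saturating packing, assuming SSE2 intrinsics and the default MXCSR rounding mode (round to nearest even); the function names are hypothetical.

```c
#include <emmintrin.h>
#include <stdint.h>

/* cvtps2dq rounds with the current mode; cvttps2dq truncates (hypothetical helper). */
void cvt_vs_cvtt(const float in[4], int32_t rounded[4], int32_t truncated[4])
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)rounded,   _mm_cvtps_epi32(v));   /* 2.5 -> 2 under nearest-even */
    _mm_storeu_si128((__m128i *)truncated, _mm_cvttps_epi32(v));  /* toward zero: -1.7 -> -1     */
}

/* packsswb narrows 16 int16 to 16 int8, clamping to -128..127 (hypothetical helper). */
void pack_i16_to_i8(const int16_t in[16], int8_t out[16])
{
    __m128i lo = _mm_loadu_si128((const __m128i *)in);        /* becomes bytes 0..7  */
    __m128i hi = _mm_loadu_si128((const __m128i *)(in + 8));  /* becomes bytes 8..15 */
    _mm_storeu_si128((__m128i *)out, _mm_packs_epi16(lo, hi));
}
```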
Shift instructions
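Instruction | Suffix | Description |
psll | [ w, d, q, dq ] | shift left logical (zeros shifted in); the dq form shifts the whole register by bytes |
psra | [ w, d ] | shift right arithmetic (sign bit shifted in) |
psrl | [ w, d, q, dq ] | shift right logical (zeros shifted in); the dq form shifts the whole register by bytes |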
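A minimal sketch of the difference between arithmetic and logical right shifts, and of the byte-wise dq shift, assuming SSE2 intrinsics; the function names are hypothetical.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Shifts eight int16 values right by 2 both ways (hypothetical helper). */
void shifts_i16(const int16_t in[8], int16_t arith[8], uint16_t logical[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)arith,   _mm_srai_epi16(v, 2));  /* psraw: sign bit shifted in */
    _mm_storeu_si128((__m128i *)logical, _mm_srli_epi16(v, 2));  /* psrlw: zeros shifted in    */
}

/* pslldq: moves the whole 128-bit register left by 4 bytes, not bits (hypothetical helper). */
__m128i shift_left_4_bytes(__m128i v)
{
    return _mm_slli_si128(v, 4);
}
```

With psraw, -8 >> 2 stays -2; with psrlw the same bit pattern 0xFFF8 becomes 0x3FFE (16382).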
Shuffle instructions
Instruction | Suffix | Description |
pshuf | [ w, d ] | packed shuffle |
pshufh | [ w ] | packed shuffle of the high four words |
pshufl | [ w ] | packed shuffle of the low four words |
shuf | [ ps, pd ] | floating-point shuffle |
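A minimal sketch of how the shuffle immediate selects elements, assuming SSE2 intrinsics; the function names are hypothetical. Each 2-bit field of the immediate built by `_MM_SHUFFLE(z, y, x, w)` picks the source element for one destination element, lowest field first. For `shufps`, the two low result elements come from the first operand and the two high ones from the second.

```c
#include <emmintrin.h>

/* pshufd: reverses the four 32-bit elements (hypothetical helper). */
__m128i reverse_i32(__m128i v)
{
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));
}

/* shufps: result = { a[0], a[1], b[0], b[1] } (hypothetical helper). */
__m128 low_a_high_b(__m128 a, __m128 b)
{
    return _mm_shuffle_ps(a, b, _MM_SHUFFLE(1, 0, 1, 0));
}
```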
Unpack instructions
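Instruction | Suffix | Description |
punpckh | [ bw, wd, dq, qdq ] | unpack high (interleave the high-order elements of both operands) |
punpckl | [ bw, wd, dq, qdq ] | unpack low (interleave the low-order elements of both operands) |
unpckh | [ ps, pd ] | unpack high (interleave the high-order elements of both operands) |
unpckl | [ ps, pd ] | unpack low (interleave the low-order elements of both operands) |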
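A minimal sketch of the classic unpack idiom, assuming SSE2 intrinsics; `widen_u8_to_u16` is a hypothetical name. Interleaving bytes with a zero register zero-extends them to words: `punpcklbw` handles bytes 0..7 and `punpckhbw` bytes 8..15.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Zero-extends 16 uint8 values to 16 uint16 values (hypothetical helper). */
void widen_u8_to_u16(const uint8_t in[16], uint16_t out[16])
{
    __m128i v    = _mm_loadu_si128((const __m128i *)in);
    __m128i zero = _mm_setzero_si128();
    _mm_storeu_si128((__m128i *)out,       _mm_unpacklo_epi8(v, zero)); /* bytes 0..7  -> words */
    _mm_storeu_si128((__m128i *)(out + 8), _mm_unpackhi_epi8(v, zero)); /* bytes 8..15 -> words */
}
```

This is typically done before arithmetic that would overflow 8 bits, after which packus narrows the results back to bytes.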
Cacheability control and prefetch instructions
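Instruction | Suffix | Description |
movnt | [ ps, pd, q, dq ] | non-temporal store that minimizes cache pollution (the ps, pd and dq forms require 16-byte alignment) |
prefetch<hint> | | prefetch a cache line with a locality hint: t0, t1, t2 or nta |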
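A minimal sketch, assuming SSE2 intrinsics; the function names and the 256-byte prefetch distance are hypothetical tuning choices, not recommended values. `_mm_stream_si128` emits movntdq, and since non-temporal stores are weakly ordered, `_mm_sfence` is issued afterwards; `_mm_prefetch` with `_MM_HINT_T0` emits prefetcht0.

```c
#include <emmintrin.h>
#include <stdint.h>
#include <stddef.h>

/* Fills dst with non-temporal stores; dst must be 16-byte aligned and
   n a multiple of 16 (hypothetical helper). */
void stream_fill(uint8_t *dst, uint8_t value, size_t n)
{
    __m128i v = _mm_set1_epi8((char)value);
    for (size_t i = 0; i < n; i += 16)
        _mm_stream_si128((__m128i *)(dst + i), v);  /* movntdq: bypasses the caches */
    _mm_sfence();  /* order the streaming stores before later stores */
}

/* Sums bytes, prefetching an assumed 256 bytes ahead once per 64-byte line. */
uint32_t sum_with_prefetch(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) {
        if ((i & 63) == 0 && i + 256 < n)
            _mm_prefetch((const char *)(p + i + 256), _MM_HINT_T0);  /* prefetcht0 */
        s += p[i];
    }
    return s;
}
```

Streaming stores pay off for large buffers that will not be read again soon; for small buffers ordinary stores are usually faster.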
State management instructions
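These instructions are used mainly by the operating system: fxsave/fxrstor save and restore the SSE register state (for example on a context switch), and ldmxcsr/stmxcsr load and store the MXCSR control and status register.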