Different data types can be packed in vector registers as follows:
| Packed data type | Vector length | Bits per element | Data type range |
|---|---|---|---|
| signed bytes | 16 | 8 | -2**7 to 2**7-1 |
| unsigned bytes | 16 | 8 | 0 to 2**8-1 |
| signed words | 8 | 16 | -2**15 to 2**15-1 |
| unsigned words | 8 | 16 | 0 to 2**16-1 |
| signed doublewords | 4 | 32 | -2**31 to 2**31-1 |
| unsigned doublewords | 4 | 32 | 0 to 2**32-1 |
| signed quadwords | 2 | 64 | -2**63 to 2**63-1 |
| unsigned quadwords | 2 | 64 | 0 to 2**64-1 |
| single-precision floating-point values | 4 | 32 | 2**-126 to 2**127 |
| double-precision floating-point values | 2 | 64 | 2**-1022 to 2**1023 |
Selecting the appropriate data type for calculations can significantly affect application performance.
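One reason the data type matters is that a narrower element type puts more elements into each 128-bit register, so a single instruction processes more data. The sketch below only does this arithmetic; the name `lanes_per_xmm` is a hypothetical helper, not part of any library.

```c
#include <stddef.h>

/* Width of one SSE (XMM) vector register in bytes: 128 bits. */
enum { XMM_BYTES = 16 };

/* Number of elements of the given size that fit in one XMM register. */
static inline size_t lanes_per_xmm(size_t elem_bytes)
{
    return XMM_BYTES / elem_bytes;
}
```

For example, `lanes_per_xmm(sizeof(float))` gives 4 and `lanes_per_xmm(sizeof(double))` gives 2, matching the "Vector length" column of the table.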
Optimization with Switches
Fig. 5.5. SIMD – SSE, SSE2, SSE3, SSE4.2 Support
Instruction groups
Data movement instructions
| Instruction | Suffix | Description |
|---|---|---|
| movdqa | | move double quadword aligned |
| movdqu | | move double quadword unaligned |
| mova | [ ps, pd ] | move floating-point aligned |
| movu | [ ps, pd ] | move floating-point unaligned |
| movhl | [ ps ] | move packed floating-point high to low |
| movlh | [ ps ] | move packed floating-point low to high |
| movh | [ ps, pd ] | move high packed floating-point |
| movl | [ ps, pd ] | move low packed floating-point |
| mov | [ d, q, ss, sd ] | move scalar data |
| lddqu | | load double quadword unaligned |
| mov<d/sh/sl>dup | | move and duplicate |
| pextr | [ w ] | extract word |
| pinsr | [ w ] | insert word |
| pmovmsk | [ b ] | move mask |
| movmsk | [ ps, pd ] | move mask |
An aligned data movement instruction cannot be applied to a memory location that is not aligned on a 16-byte boundary; doing so raises a general-protection exception.
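The alignment rule can be illustrated with compiler intrinsics. This is a minimal sketch, assuming an x86 compiler that provides `<xmmintrin.h>`; `is_aligned16`, `load4` and `sum4` are hypothetical helper names. `_mm_load_ps` corresponds to the aligned form (movaps) and `_mm_loadu_ps` to the unaligned form (movups).

```c
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_loadu_ps, _mm_storeu_ps */

/* Returns nonzero if p sits on a 16-byte boundary. */
static inline int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Loads four floats, using the aligned form only when it is safe. */
static inline __m128 load4(const float *p)
{
    return is_aligned16(p) ? _mm_load_ps(p)    /* aligned load (movaps)   */
                           : _mm_loadu_ps(p);  /* unaligned load (movups) */
}

/* Sums the four floats starting at p, whatever its alignment. */
static inline float sum4(const float *p)
{
    float out[4];
    _mm_storeu_ps(out, load4(p));
    return out[0] + out[1] + out[2] + out[3];
}
```

In real code the branch is usually avoided by allocating data with 16-byte alignment up front, or by always using the unaligned form.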
Integer arithmetic instructions
| Instruction | Suffix | Description |
|---|---|---|
| padd | [ b, w, d, q ] | packed addition (signed and unsigned) |
| psub | [ b, w, d, q ] | packed subtraction (signed and unsigned) |
| padds | [ b, w ] | packed addition with saturation (signed) |
| paddus | [ b, w ] | packed addition with saturation (unsigned) |
| psubs | [ b, w ] | packed subtraction with saturation (signed) |
| psubus | [ b, w ] | packed subtraction with saturation (unsigned) |
| pmins | [ w ] | packed minimum (signed) |
| pminu | [ b ] | packed minimum (unsigned) |
| pmaxs | [ w ] | packed maximum (signed) |
| pmaxu | [ b ] | packed maximum (unsigned) |
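The difference between plain and saturating addition is easiest to see on unsigned bytes. The following sketch assumes an x86 compiler with SSE2 (`<emmintrin.h>`); the function names `add_wrap_u8` and `add_sat_u8` are hypothetical. `_mm_add_epi8` corresponds to paddb and `_mm_adds_epu8` to paddusb.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 integer intrinsics */

/* Adds 16 unsigned bytes; results wrap around modulo 256 (paddb). */
static inline void add_wrap_u8(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi8(va, vb));
}

/* Adds 16 unsigned bytes; results are clamped at 255 (paddusb). */
static inline void add_sat_u8(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epu8(va, vb));
}
```

With every input byte set to 200 and 100, the wrapping version yields 44 in each lane, while the saturating version yields 255, which is usually what image and audio code wants.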
Floating-point arithmetic instructions
| Instruction | Suffix | Description |
|---|---|---|
| add | [ ss, ps, sd, pd ] | addition |
| div | [ ss, ps, sd, pd ] | division |
| min | [ ss, ps, sd, pd ] | minimum |
| max | [ ss, ps, sd, pd ] | maximum |
| mul | [ ss, ps, sd, pd ] | multiplication |
| sqrt | [ ss, ps, sd, pd ] | square root |
| sub | [ ss, ps, sd, pd ] | subtraction |
| rcp | [ ss, ps ] | approximate reciprocal |
| rsqrt | [ ss, ps ] | approximate reciprocal square root |
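The approximate instructions trade accuracy for speed. As a rough sketch (assuming an x86 compiler with `<xmmintrin.h>`; `inv_sqrt_full` and `inv_sqrt_approx` are hypothetical names), the same quantity 1/sqrt(x) can be computed at full precision with sqrtps and divps, or approximately with rsqrtps:

```c
#include <xmmintrin.h>  /* SSE: _mm_sqrt_ps, _mm_div_ps, _mm_rsqrt_ps */

/* 1/sqrt(x) for four floats at full precision: sqrtps followed by divps. */
static inline void inv_sqrt_full(const float *x, float *out)
{
    __m128 v = _mm_loadu_ps(x);
    _mm_storeu_ps(out, _mm_div_ps(_mm_set1_ps(1.0f), _mm_sqrt_ps(v)));
}

/* 1/sqrt(x) for four floats using the fast approximation (rsqrtps). */
static inline void inv_sqrt_approx(const float *x, float *out)
{
    _mm_storeu_ps(out, _mm_rsqrt_ps(_mm_loadu_ps(x)));
}
```

The approximate result carries roughly 12 bits of precision, so it suits cases such as vector normalization in graphics, where a small relative error is acceptable.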
Idiomatic arithmetic instructions
| Instruction | Suffix | Description |
|---|---|---|
| pavg | [ b, w ] | packed average with rounding (unsigned) |
| pmulh/pmulhu/pmull | [ w ] | packed multiplication |
| psad | [ bw ] | packed sum of absolute differences (unsigned) |
| pmadd | [ wd ] | packed multiplication and addition (signed) |
| addsub | [ ps, pd ] | floating-point addition/subtraction |
| hadd | [ ps, pd ] | floating-point horizontal addition |
| hsub | [ ps, pd ] | floating-point horizontal subtraction |
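These instructions implement common computational idioms in one step. As an illustration, pmaddwd multiplies pairs of signed words and adds adjacent products, which is most of a dot product. The sketch below assumes an x86 compiler with SSE2; `dot8_i16` is a hypothetical helper name, and `_mm_madd_epi16` is the intrinsic for pmaddwd.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_madd_epi16 */

/* Dot product of two vectors of eight signed 16-bit values. */
static inline int32_t dot8_i16(const int16_t *a, const int16_t *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* pmaddwd: eight 16-bit products, adjacent pairs summed into four 32-bit lanes. */
    __m128i sums = _mm_madd_epi16(va, vb);

    /* Add the four partial sums in scalar code to keep the sketch short. */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, sums);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```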
Logical instructions
| Instruction | Suffix | Description |
|---|---|---|
| pand | | bitwise logical AND |
| pandn | | bitwise logical AND-NOT |
| por | | bitwise logical OR |
| pxor | | bitwise logical XOR |
| and | [ ps, pd ] | bitwise logical AND |
| andn | [ ps, pd ] | bitwise logical AND-NOT |
| or | [ ps, pd ] | bitwise logical OR |
| xor | [ ps, pd ] | bitwise logical XOR |
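Logical instructions on floating-point registers operate on the raw bit patterns, which allows tricks such as clearing the sign bit to obtain an absolute value. A minimal sketch, assuming an x86 compiler with `<xmmintrin.h>` and IEEE 754 floats; `abs4` is a hypothetical name. Note that `_mm_andnot_ps(a, b)` computes `(~a) & b`.

```c
#include <xmmintrin.h>  /* SSE: _mm_andnot_ps */

/* Absolute value of four floats, computed by clearing each sign bit (andnps). */
static inline void abs4(const float *x, float *out)
{
    __m128 sign = _mm_set1_ps(-0.0f);            /* only the sign bit is set in each lane */
    __m128 v = _mm_loadu_ps(x);
    _mm_storeu_ps(out, _mm_andnot_ps(sign, v));  /* (~sign) & v */
}
```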
Comparison instructions
| Instruction | Suffix | Description |
|---|---|---|
| pcmp<cc> | [ b, w, d ] | packed compare |
| cmp<cc> | [ ss, ps, sd, pd ] | floating-point compare |
<cc> defines the comparison operation: lt (less than), gt (greater than), eq (equal).
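A packed compare does not set flags; it produces a mask with all bits set in each lane where the condition holds and zero elsewhere. That mask is then typically combined with logical instructions to select values without branching. The sketch below assumes an x86 compiler with SSE2; `max4_i32` is a hypothetical name, and `_mm_cmpgt_epi32` is the intrinsic for pcmpgtd.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_cmpgt_epi32 and bitwise operations */

/* Lane-wise maximum of four signed 32-bit integers, built from a compare mask. */
static inline void max4_i32(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* Each lane becomes all ones where a > b, and zero otherwise. */
    __m128i gt = _mm_cmpgt_epi32(va, vb);

    /* Select a where the mask is set, b elsewhere: (gt & a) | (~gt & b). */
    __m128i r = _mm_or_si128(_mm_and_si128(gt, va), _mm_andnot_si128(gt, vb));
    _mm_storeu_si128((__m128i *)out, r);
}
```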
Conversion instructions
| Instruction | Suffix | Description |
|---|---|---|
| packss | [ wb, dw ] | pack with saturation (signed) |
| packus | [ wb ] | pack with saturation (unsigned) |
| cvt<s2d> | | conversion |
| cvtt<s2d> | | conversion with truncation |
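The two conversion families differ in how they round, and the pack instructions clamp values that do not fit the narrower type. A sketch under the assumption of an x86 compiler with SSE2 and the default rounding mode (round to nearest even); `round4`, `trunc4` and `pack8_i32_to_i16` are hypothetical names. The intrinsics used correspond to cvtps2dq, cvttps2dq and packssdw.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: conversions and packs */

/* Converts four floats to int32 using the current rounding mode (cvtps2dq). */
static inline void round4(const float *x, int32_t *out)
{
    _mm_storeu_si128((__m128i *)out, _mm_cvtps_epi32(_mm_loadu_ps(x)));
}

/* Converts four floats to int32, truncating toward zero (cvttps2dq). */
static inline void trunc4(const float *x, int32_t *out)
{
    _mm_storeu_si128((__m128i *)out, _mm_cvttps_epi32(_mm_loadu_ps(x)));
}

/* Narrows eight int32 values to int16 with signed saturation (packssdw). */
static inline void pack8_i32_to_i16(const int32_t *x, int16_t *out)
{
    __m128i lo = _mm_loadu_si128((const __m128i *)x);
    __m128i hi = _mm_loadu_si128((const __m128i *)(x + 4));
    _mm_storeu_si128((__m128i *)out, _mm_packs_epi32(lo, hi));
}
```

For the input 2.5, `round4` gives 2 under the default rounding mode, while for 1.7 `trunc4` gives 1 and `round4` gives 2.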
Shift instructions
| Instruction | Suffix | Description |
|---|---|---|
| psll | [ w, d, q, dq ] | shift left logical (zero in) |
| psra | [ w, d ] | shift right arithmetic (sign in) |
| psrl | [ w, d, q, dq ] | shift right logical (zero in) |
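The "sign in" and "zero in" remarks matter for negative values. The sketch below, assuming an x86 compiler with SSE2, shifts the same doublewords both ways; `shr_arith4` and `shr_logic4` are hypothetical names. `_mm_sra_epi32` corresponds to psrad and `_mm_srl_epi32` to psrld, both taking the shift count from a vector register.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_sra_epi32, _mm_srl_epi32 */

/* Arithmetic right shift of four int32 lanes: copies of the sign bit shift in. */
static inline void shr_arith4(const int32_t *x, int count, int32_t *out)
{
    __m128i v = _mm_loadu_si128((const __m128i *)x);
    _mm_storeu_si128((__m128i *)out, _mm_sra_epi32(v, _mm_cvtsi32_si128(count)));
}

/* Logical right shift of four int32 lanes: zeros shift in. */
static inline void shr_logic4(const int32_t *x, int count, int32_t *out)
{
    __m128i v = _mm_loadu_si128((const __m128i *)x);
    _mm_storeu_si128((__m128i *)out, _mm_srl_epi32(v, _mm_cvtsi32_si128(count)));
}
```

Shifting -16 right by 2 gives -4 with the arithmetic form and 0x3FFFFFFC with the logical form.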
Shuffle instructions
| Instruction | Suffix | Description |
|---|---|---|
| pshuf | [ w, d ] | packed shuffle |
| pshufh | [ w ] | packed shuffle high |
| pshufl | [ w ] | packed shuffle low |
| shuf | [ ps, pd ] | shuffle |
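A shuffle rearranges elements within a register according to an immediate selector. As a small sketch (assuming an x86 compiler with SSE2; `reverse4_i32` is a hypothetical name), pshufd can reverse the order of four doublewords. The `_MM_SHUFFLE(z, y, x, w)` macro builds the selector so that result lane 0 takes source lane `w`, lane 1 takes `x`, lane 2 takes `y`, and lane 3 takes `z`.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_shuffle_epi32; _MM_SHUFFLE comes from xmmintrin.h */

/* Reverses the order of four 32-bit integers with a single pshufd. */
static inline void reverse4_i32(const int32_t *x, int32_t *out)
{
    __m128i v = _mm_loadu_si128((const __m128i *)x);
    /* Result lanes 0..3 take source lanes 3, 2, 1, 0 respectively. */
    __m128i r = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));
    _mm_storeu_si128((__m128i *)out, r);
}
```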
Unpack instructions
| Instruction | Suffix | Description |
|---|---|---|
| punpckh | [ bw, wd, dq, qdq ] | unpack high |
| punpckl | [ bw, wd, dq, qdq ] | unpack low |
| unpckh | [ ps, pd ] | unpack high |
| unpckl | [ ps, pd ] | unpack low |
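Unpack instructions interleave elements from two registers. A common use is widening: interleaving bytes with a register of zeros turns each unsigned byte into an unsigned word. This is a minimal sketch, assuming an x86 compiler with SSE2; `widen_u8_to_u16` is a hypothetical name, and the intrinsics correspond to punpcklbw and punpckhbw.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_unpacklo_epi8, _mm_unpackhi_epi8 */

/* Zero-extends 16 unsigned bytes to 16 unsigned 16-bit words. */
static inline void widen_u8_to_u16(const uint8_t *in, uint16_t *out)
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    __m128i zero = _mm_setzero_si128();

    /* Interleave each data byte with a zero byte: (b0, 0, b1, 0, ...). */
    __m128i lo = _mm_unpacklo_epi8(v, zero);  /* bytes 0..7  -> words 0..7  */
    __m128i hi = _mm_unpackhi_epi8(v, zero);  /* bytes 8..15 -> words 8..15 */

    _mm_storeu_si128((__m128i *)out, lo);
    _mm_storeu_si128((__m128i *)(out + 8), hi);
}
```

Because x86 is little-endian, the data byte lands in the low half of each word and the zero in the high half, so the numeric value is preserved.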
Cacheability control and prefetch instructions
| Instruction | Suffix | Description |
|---|---|---|
| movnt | [ ps, pd, q, dq ] | move aligned non-temporal |
| prefetch<hint> | | prefetch with hint |
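Both kinds of instruction are hints about cache usage: a non-temporal store writes data that will not be read again soon, and a prefetch asks for data before it is needed. The sketch below assumes an x86 compiler with SSE2; `fill_stream` and `sum_prefetch` are hypothetical names, and the prefetch distance of 64 elements is an arbitrary illustrative choice, not a tuned value. `_mm_stream_si128` corresponds to movntdq and requires a 16-byte-aligned destination.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_stream_si128; also provides _mm_prefetch, _mm_sfence */

/* Fills n_vectors 16-byte blocks with value using non-temporal stores.
   dst must be 16-byte aligned. */
static inline void fill_stream(int32_t *dst, size_t n_vectors, int32_t value)
{
    __m128i v = _mm_set1_epi32(value);
    for (size_t i = 0; i < n_vectors; i++)
        _mm_stream_si128((__m128i *)(dst + 4 * i), v);  /* movntdq */
    _mm_sfence();  /* order the non-temporal stores before later stores */
}

/* Sums n integers, requesting data 64 elements ahead of the current position. */
static inline int64_t sum_prefetch(const int32_t *src, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Issue one prefetch per 16 elements (64 bytes), staying inside the array. */
        if (i % 16 == 0 && i + 64 < n)
            _mm_prefetch((const char *)(src + i + 64), _MM_HINT_T0);
        total += src[i];
    }
    return total;
}
```

Whether either hint helps depends on the access pattern and the processor, so such code should be measured rather than assumed to be faster.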
State management instructions
These instructions are commonly used by the operating system.