SSE2 Instruction cycles?

(L) [2006/09/13] [Michael77] [SSE2 Instruction cycles?]

Hi,


Does anyone know of a reference listing how many CPU cycles each SSE instruction needs (in the best case, of course)? I am currently wondering whether it makes any difference to use _mm_load1_ps or _mm_set_ps1 to set a single float value in all four elements of a register.


Michael
(L) [2006/09/13] [tbp] [SSE2 Instruction cycles?]

A full load is faster, if only because of reduced dependencies and load-execute mechanisms. But then if that read is postponed for some reason...

Plus you can't say exactly what will be generated for a given _mm_set_ps1, because it's a composite and compilers can do as they fancy (e.g. writing a full vector to memory beforehand and loading it).

Sooooo... if you want to scrutinize what's happening at this level, disassemble.
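
For anyone who wants to try that, here is a minimal sketch of the three variants discussed above, each in its own function so they are easy to find in the assembly listing. The function names (splat_load1, splat_set1, splat_full_load, all_lanes_equal) are made up for illustration and are not from any library; only the _mm_* intrinsics come from xmmintrin.h.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_load1_ps, _mm_set_ps1, ... */

/* Variant 1: load one float from memory and replicate it to all four lanes. */
__m128 splat_load1(const float *p)
{
    return _mm_load1_ps(p);
}

/* Variant 2: broadcast a scalar value. This is a composite intrinsic,
   so the compiler is free to pick whatever instruction sequence it likes. */
__m128 splat_set1(float x)
{
    return _mm_set_ps1(x);
}

/* Variant 3: the "full load". Assumes the four copies were already written
   to a 16-byte-aligned location; _mm_load_ps requires that alignment. */
__m128 splat_full_load(const float *v4)
{
    return _mm_load_ps(v4);
}

/* Hypothetical helper: returns 1 if every lane of v equals x, 0 otherwise. */
int all_lanes_equal(__m128 v, float x)
{
    float out[4];
    _mm_storeu_ps(out, v);  /* unaligned store, so out needs no special alignment */
    return out[0] == x && out[1] == x && out[2] == x && out[3] == x;
}
```

Compiling this with something like gcc -O2 -S (add -msse on 32-bit x86) and reading the output shows what your compiler actually emits for each variant. The result will likely differ between compilers, versions and flags, so treat any single listing as a data point rather than a rule.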


[LINK http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF]

[LINK http://www.agner.org/optimize/]

No direct link to Intel's doc, they keep moving it around.
_________________
May you live in interesting times.

[LINK https://gna.org/projects/radius/ radius] | [LINK http://ompf.org/ ompf] | [LINK http://ompf.org/wiki/ WompfKi]
