Neoload yaml nlp

There is a function call inside the timed region, callq 403c0b (as well as the _kmpc_end_serialized_parallel stuff). There's no symbol associated with that call target, so I guess you didn't compile with debug info enabled.

See "FLOPS per cycle for sandy-bridge and haswell SSE2/AVX/AVX2". Your Skylake-derived CPU can actually do 2x 4-wide SIMD double-precision FMA operations per core per clock, and each FMA counts as two FLOPs, so the theoretical max is 2 * 4 * 2 = 16 double-precision FLOPs per core clock, which gives 24 * 16 = 384 GFLOP/s.
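As a rough sketch of that arithmetic (not code from the original post), the peak can be computed as below. The function names are made up for illustration. The 24 is read here as the aggregate core clock in GHz (core count times frequency), which is an assumption: the text only gives the product.

```python
def flops_per_core_clock(fma_units=2, simd_lanes=4, flops_per_fma=2):
    """Peak double-precision FLOPs one core can retire per clock cycle.

    Defaults follow the figures in the text: 2 FMA units,
    4 doubles per 256-bit vector, 2 FLOPs (multiply + add) per FMA.
    """
    return fma_units * simd_lanes * flops_per_fma


def peak_gflops(aggregate_core_ghz, per_core_clock=None):
    """Theoretical peak in GFLOP/s.

    aggregate_core_ghz is assumed to be cores * clock in GHz
    (the text's "24"); it is not broken down further there.
    """
    if per_core_clock is None:
        per_core_clock = flops_per_core_clock()
    return aggregate_core_ghz * per_core_clock


if __name__ == "__main__":
    print(flops_per_core_clock())  # 16
    print(peak_gflops(24))         # 384
```

Passing per_core_clock=1 shows the contrast with a 1-FLOP-per-clock estimate: the same 24 aggregate GHz would give only 24 GFLOP/s, a factor of 16 below the real theoretical peak.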

1 FP operation per core clock cycle would be pathetic for a modern superscalar CPU.

_tag_value_Z12do_timed_runRKmRd.281:
_tag_value_Z12do_timed_runRKmRd.279:
        call      _kmpc_end_serialized_parallel    #123.5