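Unrolling indiscriminately grows the code size, which lowers the CPU (instruction) cache hit rate and can actually make things slower, so there's a threshold for that built in. In this Rust case, though, I think it pays off because the unrolled code then gets constant-folded, and that in turn lets dead code elimination kick in.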
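To illustrate the mechanism described above, here is a hypothetical Rust function (not the benchmark code from this thread). The loop has a constant trip count of 4, so LLVM is free to fully unroll it. After unrolling, `i` is a literal in each copy, `i % 2 == 0` folds to `true` or `false`, and the untaken arm becomes dead code that can be removed. Whether unrolling actually happens depends on LLVM's cost thresholds, so treat this as a sketch and check the emitted assembly (for example with `cargo rustc --release -- --emit asm`) rather than assuming the result.

```rust
/// Hypothetical example: a loop with a compile-time trip count.
/// In release builds LLVM may fully unroll it; each unrolled copy then
/// sees a constant `i`, so `i % 2 == 0` can be constant-folded and the
/// untaken branch removed by dead code elimination.
#[inline(never)] // keep it as a separate symbol so it is easy to find in the asm
fn mix_fixed(xs: &[u32; 4]) -> u32 {
    let mut acc = 0u32;
    for i in 0..4 {
        if i % 2 == 0 {
            acc = acc.wrapping_add(xs[i]); // even positions: add
        } else {
            acc ^= xs[i]; // odd positions: xor
        }
    }
    acc
}

fn main() {
    let xs = [1, 2, 3, 4];
    // Step by step: 0 + 1 = 1, 1 ^ 2 = 3, 3 + 3 = 6, 6 ^ 4 = 2
    println!("{}", mix_fixed(&xs)); // prints 2
}
```

If unrolling does kick in, the optimized function body should contain no loop and no `i % 2` test, just a fixed sequence of adds and xors.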
6:04 AM
perf stat results:
```
$ perf stat -d -d -- ./target.rust-run 5000000

 Performance counter stats for './target.rust-run 5000000':

            187.63 msec task-clock                #    0.994 CPUs utilized
                20      context-switches          #    0.107 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                85      page-faults               #    0.453 K/sec
         864088142      cycles                    #    4.605 GHz                        (32.07%)
            385088      stalled-cycles-frontend   #    0.04% frontend cycles idle       (34.18%)
            421948      stalled-cycles-backend    #    0.05% backend cycles idle        (36.31%)
        1486120161      instructions              #    1.72  insn per cycle
                                                  #    0.00  stalled cycles per insn    (38.44%)
          18768109      branches                  #  100.027 M/sec                      (40.55%)
             29184      branch-misses             #    0.16% of all branches            (42.36%)
         543690464      L1-dcache-loads           # 2897.670 M/sec                      (40.26%)
             19859      L1-dcache-load-misses     #    0.00% of all L1-dcache hits      (38.12%)
   <not supported>      LLC-loads
   <not supported>      LLC-load-misses
            141810      L1-icache-loads           #    0.756 M/sec                      (35.99%)
              2004      L1-icache-load-misses     #    1.41% of all L1-icache hits      (33.87%)
                78      dTLB-loads                #    0.416 K/sec                      (31.97%)
                21      dTLB-load-misses          #   26.92% of all dTLB cache hits     (31.96%)
                 0      iTLB-loads                #    0.000 K/sec                      (31.97%)
                18      iTLB-load-misses          #    0.00% of all iTLB cache hits     (31.96%)

       0.188802738 seconds time elapsed

       0.188263000 seconds user
       0.000000000 seconds sys
```
6:04 AM
```
$ perf stat -d -d -- ./head.swift-run 5000000

 Performance counter stats for './head.swift-run 5000000':

            260.12 msec task-clock                #    0.996 CPUs utilized
                26      context-switches          #    0.100 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                71      page-faults               #    0.273 K/sec
        1200493971      cycles                    #    4.615 GHz                        (30.86%)
            451884      stalled-cycles-frontend   #    0.04% frontend cycles idle       (32.41%)
               132      stalled-cycles-backend    #    0.00% backend cycles idle        (33.94%)
        2904262641      instructions              #    2.42  insn per cycle
                                                  #    0.00  stalled cycles per insn    (35.46%)
         217096810      branches                  #  834.616 M/sec                      (37.00%)
             10523      branch-misses             #    0.00% of all branches            (38.42%)
         675633481      L1-dcache-loads           # 2597.435 M/sec                      (38.42%)
             10807      L1-dcache-load-misses     #    0.00% of all L1-dcache hits      (38.42%)
   <not supported>      LLC-loads
   <not supported>      LLC-load-misses
            119631      L1-icache-loads           #    0.460 M/sec                      (38.43%)
              3361      L1-icache-load-misses     #    2.81% of all L1-icache hits      (38.41%)
              6382      dTLB-loads                #    0.025 M/sec                      (36.86%)
               116      dTLB-load-misses          #    1.82% of all dTLB cache hits     (35.33%)
               417      iTLB-loads                #    0.002 M/sec                      (33.80%)
               176      iTLB-load-misses          #   42.21% of all iTLB cache hits     (32.25%)

       0.261201137 seconds time elapsed

       0.260805000 seconds user
       0.000000000 seconds sys
```
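The derived numbers in perf's comment column can be recomputed from the raw counts, which makes it easier to compare the two runs side by side. Below is a minimal Rust sketch doing that for a few of the counters above; `Counters`, `ipc`, `branch_miss_pct` and `ratio` are illustrative names, not part of any perf API. Keep in mind that perf multiplexed these events (the percentages in parentheses are the share of time each counter was actually running), so the counts are scaled estimates.

```rust
/// A few raw counts copied from the two `perf stat` runs above.
/// (`Counters` and its methods are illustrative names, not a perf API.)
struct Counters {
    cycles: u64,
    instructions: u64,
    branches: u64,
    branch_misses: u64,
}

impl Counters {
    /// Instructions retired per cycle: perf's "insn per cycle" column.
    fn ipc(&self) -> f64 {
        self.instructions as f64 / self.cycles as f64
    }

    /// Mispredicted branches as a percentage of all branches.
    fn branch_miss_pct(&self) -> f64 {
        100.0 * self.branch_misses as f64 / self.branches as f64
    }
}

const RUST: Counters = Counters {
    cycles: 864_088_142,
    instructions: 1_486_120_161,
    branches: 18_768_109,
    branch_misses: 29_184,
};

const SWIFT: Counters = Counters {
    cycles: 1_200_493_971,
    instructions: 2_904_262_641,
    branches: 217_096_810,
    branch_misses: 10_523,
};

/// Ratio of a Swift counter to the matching Rust counter.
fn ratio(swift: u64, rust: u64) -> f64 {
    swift as f64 / rust as f64
}

fn main() {
    for (name, c) in [("target.rust-run", &RUST), ("head.swift-run", &SWIFT)] {
        println!(
            "{:>16}: IPC {:.2}, branch misses {:.3}%",
            name,
            c.ipc(),
            c.branch_miss_pct()
        );
    }
    println!("Swift/Rust instructions: {:.2}", ratio(SWIFT.instructions, RUST.instructions));
    println!("Swift/Rust branches:     {:.2}", ratio(SWIFT.branches, RUST.branches));
    println!("Swift/Rust cycles:       {:.2}", ratio(SWIFT.cycles, RUST.cycles));
}
```

With these counts the IPC values come out to 1.72 and 2.42, matching what perf printed, and the ratios show the Swift build retiring roughly twice as many instructions and more than ten times as many branches as the Rust build, for about 1.39x the cycles.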