Description
See the assembly for https://is.gd/0Owhgw . Note that you'll need to select Release mode explicitly, because I can't link directly to the Release build for some reason.
Note this bit here:
    movl $400, %edx
    movq %rbx, %rsi
    callq memcpy@PLT
That looks like a large and entirely unnecessary copy.
I ran into this while working on rust-selectors for stylo. In the parser, we have a very large stack-allocated SmallVec, under the assumption that it is never moved. I added some inline functions that take the buffer as |self| (that is, by value), and parsing got several percent slower. I rewrote the code to pass things as &mut, and the overhead went away.
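To make the two shapes concrete, here is a minimal sketch using a hypothetical `BigBuf` type as a stand-in for the large SmallVec (it is not the real rust-selectors code, and every name in it is made up for illustration). The inline array is 400 bytes, which matches the size in the memcpy above. Whether a copy is actually emitted for the by-value method depends on the compiler version and optimization level, so this only shows the two signatures side by side.

```rust
// Hypothetical stand-in for a large stack-allocated SmallVec:
// 100 u32 values stored inline, i.e. 400 bytes of payload.
struct BigBuf {
    len: usize,
    data: [u32; 100],
}

impl BigBuf {
    fn new() -> Self {
        BigBuf { len: 0, data: [0; 100] }
    }

    fn push(&mut self, v: u32) {
        self.data[self.len] = v;
        self.len += 1;
    }

    // Takes the buffer by value (|self|). Semantically this moves the
    // whole struct, which the compiler may lower to a memcpy.
    #[inline]
    fn sum_by_value(self) -> u32 {
        self.data[..self.len].iter().sum()
    }

    // Takes the buffer by mutable reference. Only a pointer is passed,
    // so the buffer itself is never moved.
    #[inline]
    fn sum_by_ref(&mut self) -> u32 {
        self.data[..self.len].iter().sum()
    }
}

fn main() {
    let mut buf = BigBuf::new();
    for i in 1..=10 {
        buf.push(i);
    }
    let by_ref = buf.sum_by_ref();
    let by_value = buf.sum_by_value(); // consumes `buf`
    assert_eq!(by_ref, by_value);
    println!("{} {}", by_ref, by_value);
}
```

Both methods compute the same result; the only difference is how the buffer reaches the callee, which is the variable that changed between the slow and fast versions of the parser code.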
Interestingly, doing:
    for x in some_small_vec.into_iter() {
        ...
    }
doesn't produce any memcpy in the disassembly, despite the fact that into_iter() takes self. Presumably some iterator optimization is going on.
CC @SimonSapin @mbrubeck @Manishearth @gankro @jonathandturner