Its unlikely you want to do short -> int, int -> float etc, conversion during swapping (if its needed we could have a non type checking macro). Double that the optimized assembler outbut using SWAP() remains unchanged from before. This exposed quite a few places where redundant type conversion was going on. Also remove curve.c's swapdata() and replace its use with swap_v3_v3()