Recent trends in SIMD architecture have tended toward longer vector lengths. However, legacy applications compiled for a short-SIMD ISA cannot exploit the increased parallelism of long-SIMD architectures and therefore realize only a small fraction of the potential performance. This paper presents a dynamic binary translation technique that enables short-SIMD binaries to benefit from new SIMD architectures by rewriting their short-SIMD code. We propose a general approach that translates short-SIMD loops into a machine-independent IR, performs SIMD transformations and optimizations at the IR level, and finally emits long-SIMD instructions. Benchmark results show average speedups of 1.59X and 2.82X for NEON-to-AVX2 and NEON-to-AVX-512 loop transformation, respectively.
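The lift-then-lower idea in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's translator. It does not decode real NEON instructions, and it does not model the authors' IR or their optimizations. A "short-SIMD loop" is modeled as an element-wise operation applied in fixed-width chunks: 4 lanes of 32-bit floats for 128-bit NEON. Lifting drops the source vector width so that only per-element semantics remain. Lowering re-vectorizes for a wider target: 8 lanes for 256-bit AVX2 or 16 lanes for 512-bit AVX-512. All names (`SimdLoop`, `LoopIR`, `lift`, `lower`, `run`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical toy model, not the paper's IR: a vectorized loop is a
# per-lane operation plus the number of lanes each vector instruction covers.
@dataclass
class SimdLoop:
    op: Callable[[float, float], float]  # per-lane operation, e.g. addition
    width: int                           # lanes per vector instruction

@dataclass
class LoopIR:
    """Machine-independent form: element-wise semantics, no vector width."""
    op: Callable[[float, float], float]

def lift(loop: SimdLoop) -> LoopIR:
    # Discard the source ISA's vector width; keep only what each element does.
    return LoopIR(op=loop.op)

def lower(ir: LoopIR, width: int) -> SimdLoop:
    # Re-vectorize for the target width (e.g. 8 x float32 for 256-bit AVX2).
    return SimdLoop(op=ir.op, width=width)

def run(loop: SimdLoop, a: List[float], b: List[float]) -> Tuple[List[float], int]:
    """Simulate the loop; return the results and the vector-iteration count."""
    n = len(a)
    out = [0.0] * n
    vec_iters = 0
    main = n - n % loop.width                  # elements covered by full vectors
    for i in range(0, main, loop.width):       # vector body: one "instruction" per chunk
        for lane in range(loop.width):
            out[i + lane] = loop.op(a[i + lane], b[i + lane])
        vec_iters += 1
    for i in range(main, n):                   # scalar epilogue for leftover elements
        out[i] = loop.op(a[i], b[i])
    return out, vec_iters

# Usage: translate a 4-lane loop to 8- and 16-lane versions.
neon = SimdLoop(op=lambda x, y: x + y, width=4)
ir = lift(neon)
avx2, avx512 = lower(ir, 8), lower(ir, 16)
a = [float(i) for i in range(32)]
b = [1.0] * 32
for name, loop in [("NEON", neon), ("AVX2", avx2), ("AVX-512", avx512)]:
    _, iters = run(loop, a, b)
    print(name, "vector iterations:", iters)
```

On 32 elements, the sketch issues 8, 4, and 2 vector iterations, and all three versions produce identical results. For 32-bit elements, the ideal lane-count ratios are therefore 2x and 4x. The measured averages of 1.59X and 2.82X fall below those ideals, which is what one would expect once memory traffic, remainder handling, and translation overhead are taken into account.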