Exploiting Longer SIMD Lanes in Dynamic Binary Translation.
Ding-Yong Hong, Sheng-Yu Fu, Yu-Ping Liu, Jan-Jan Wu, Wei-Chung Hsu

Conference
Best paper award

Venue

ICPADS 2016

Abstract

Recent SIMD architectures have trended toward longer vector lengths, and newer vector instruction sets have introduced enhanced SIMD features. However, legacy or proprietary applications compiled for a short-SIMD ISA cannot benefit from a long-SIMD architecture, which offers greater parallelism and enhanced vector primitives, and thus achieve only a small fraction of the potential peak performance. This paper presents a dynamic binary translation technique that enables short-SIMD binaries to exploit the new SIMD architecture by rewriting short-SIMD loop code. We propose a general approach that translates loops consisting of short-SIMD instructions to a machine-independent IR, performs SIMD loop transformation and optimization at the IR level, and finally translates the result to long-SIMD instructions. Two solutions are presented to enforce SIMD load/store alignment: one for the problem caused by the binary translator's internal translation condition, and one general approach using loop peeling. Benchmark results show an average speedup of 1.45X for NEON-to-AVX2 loop transformation.
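
The loop peeling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's translator code: the function name add_peeled, the 32-byte vector width (chosen to match a 256-bit AVX2 register), and the element-wise add loop are all assumptions made for illustration. The inner scalar loop over VEC_LANES elements stands in for a single aligned vector operation so that the sketch runs on any machine. It also assumes dst is naturally aligned for float.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 32u                          /* assumed 256-bit vector width */
#define VEC_LANES (VEC_BYTES / sizeof(float))  /* floats per assumed vector    */

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n).
 * Scalar iterations are peeled off the front of the loop until dst + i
 * sits on a VEC_BYTES boundary, so every main-loop store is aligned.
 * Returns the number of peeled iterations. */
size_t add_peeled(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Prologue: how many elements until the next VEC_BYTES boundary? */
    size_t misalign = (size_t)((uintptr_t)dst % VEC_BYTES);
    size_t peel = misalign ? (VEC_BYTES - misalign) / sizeof(float) : 0;
    if (peel > n)
        peel = n;
    for (; i < peel; ++i)
        dst[i] = a[i] + b[i];

    /* Main loop: one full vector per iteration, stores to dst are aligned. */
    for (; i + VEC_LANES <= n; i += VEC_LANES) {
        assert((uintptr_t)(dst + i) % VEC_BYTES == 0);
        for (size_t k = 0; k < VEC_LANES; ++k)  /* stand-in for one vector op */
            dst[i + k] = a[i + k] + b[i + k];
    }

    /* Epilogue: leftover elements that do not fill a whole vector. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];

    return peel;
}
```

Only the destination is aligned here; in a real vectorized loop the source operands may still need unaligned loads, since peeling can align at most one pointer unless all pointers share the same misalignment.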

Author Links

2. Sheng-Yu Fu (傅勝余)
PhD student
3. Yu-Ping Liu (劉聿平)
Master student
5. Wei-Chung Hsu (徐慰中)
Advisor

Cite This Paper

Ding-Yong Hong, Sheng-Yu Fu, Yu-Ping Liu, Jan-Jan Wu, Wei-Chung Hsu:
Exploiting Longer SIMD Lanes in Dynamic Binary Translation. ICPADS 2016

Room 404, Der Tian Hall, No. 1, Sec. 4, Roosevelt Rd., Da'an Dist., Taipei City
02-33664888 ext. 404