Adaptive Runtime Exploiting Sparsity in Tensor of Deep Learning Neural Network on Heterogeneous Systems
Kuo-You Peng, Sheng-Yu Fu, Yu-Ping Liu, Wei-Chung Hsu

Conference

SAMOS 2017

Abstract

Deep neural networks have been widely applied in many areas, such as computer vision, natural language processing, and information retrieval. However, due to their high computation and memory demands, deep learning applications have not been adopted for learning at the edge. In this paper, we exploit the sparsity in tensors to reduce computation overhead and memory demand. Unlike other approaches, which rely on hardware accelerator designs or sacrifice model accuracy for performance by pruning parameters, we adaptively partition the workload and deploy it to heterogeneous devices to reduce computation and memory requirements and increase computing efficiency. We implemented our partitioning algorithms in Google's TensorFlow and evaluated them on an AMD Kaveri system, an HSA-based heterogeneous computing system. Our method effectively reduces computation time, cache accesses, and cache miss rates without impacting the accuracy of the learning models. Our approach achieves 66% and 88% speedups for the LeNet-5 and LeNet-1024-1024 models, respectively. For memory traffic, it reduces instruction cache references by 71% and data cache references by 32%. Our system also lowers the cache miss rate from 1.6% to 0.5% during training of the LeNet-1024-1024 model.
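The abstract describes the idea only at a high level. As a rough illustration (not the authors' implementation), the Python sketch below shows one way a runtime could estimate the sparsity of a weight tensor and use a threshold to decide whether the corresponding operation should run on the CPU or the GPU. The sparsity and choose_device helpers, the 0.7 threshold, and the device labels are assumptions made purely for illustration.

import numpy as np

def sparsity(tensor, tol=1e-6):
    # Fraction of elements whose magnitude is below tol (treated as zeros).
    return float(np.mean(np.abs(tensor) < tol))

def choose_device(tensor, threshold=0.7):
    # Heuristic placement: a highly sparse tensor gains little from dense GPU
    # arithmetic, so route its operation to the CPU; keep dense tensors on the GPU.
    # The 0.7 cutoff is an assumed value, not one taken from the paper.
    return "cpu" if sparsity(tensor) >= threshold else "gpu"

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense_w = rng.standard_normal((1024, 1024))             # fully dense weights
    sparse_w = dense_w * (rng.random((1024, 1024)) < 0.1)   # roughly 90% zeros
    print("dense  ->", choose_device(dense_w))   # expected: gpu
    print("sparse ->", choose_device(sparse_w))  # expected: cpu

In the paper's setting, such a decision would be made adaptively at runtime as part of workload partitioning and placement inside TensorFlow; the sketch isolates only the threshold logic.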

Author Links

1. 彭國祐 (Kuo-You Peng)
Master student
2. 傅勝余 (Sheng-Yu Fu)
PhD student
3. 劉聿平 (Yu-Ping Liu)
Master student
4. 徐慰中 (Wei-Chung Hsu)
Advisor


Cite This Paper

Kuo-You Peng, Sheng-Yu Fu, Yu-Ping Liu, Wei-Chung Hsu:
Adaptive Runtime Exploiting Sparsity in Tensor of Deep Learning Neural Network on Heterogeneous Systems. SAMOS 2017
