Ray tracing has received great attention over the years due to the high demand for global illumination applications. Because it is embarrassingly parallel, the ray-tracing algorithm has been ported to graphics processing units (GPUs) in heterogeneous systems, which run thousands of threads in a single-instruction, multiple-thread (SIMT) fashion. However, the irregularity of ray tracing incurs a performance penalty on the GPU. Control-flow divergence and early ray termination severely degrade hardware utilization, making GPU computation inefficient in each iteration of the algorithm. Furthermore, the additional overheads of data marshalling and load imbalance negate the benefits of using heterogeneous systems. To tackle these issues, we designed a pipeline-based runtime methodology that leverages features of heterogeneous system architecture (HSA)-compliant frameworks, such as shared virtual memory and fast kernel dispatching. The method merges the workloads from different iteration stages and dispatches them simultaneously. The merged workload is then assigned to a heterogeneous queue to improve load balancing and scalability. With the proposed techniques, ray-tracing performance improves significantly and the utilization of HSA-compliant heterogeneous systems increases. In our experiments, throughput in single-GPU mode is on average 4.37 times that of the original setup, and using a heterogeneous queue across multiple cores consistently yields higher throughput.
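As a rough illustration of why merging workloads from different iteration stages can raise utilization, the following toy simulation compares two dispatch policies. It is a minimal sketch under stated assumptions, not the runtime described above: the batch width `BATCH_SIZE`, the lifetime function `bounces_for`, and both `run_*` functions are hypothetical names introduced here, and ray lifetimes are a synthetic pattern rather than measurements from a real scene.

```python
from collections import deque

BATCH_SIZE = 8  # hypothetical number of lanes filled by one dispatch


def bounces_for(ray_id, max_depth=4):
    """Synthetic stand-in for a scene-dependent ray lifetime (1..max_depth)."""
    return ray_id % max_depth + 1


def run_unmerged(num_rays):
    """Baseline: each batch of primary rays iterates alone until all its
    rays terminate, so later iterations run with partly empty lanes."""
    dispatches = 0
    busy_lanes = 0
    for start in range(0, num_rays, BATCH_SIZE):
        stop = min(start + BATCH_SIZE, num_rays)
        remaining = [bounces_for(r) for r in range(start, stop)]
        while remaining:
            dispatches += 1
            busy_lanes += len(remaining)
            # Rays on their last bounce terminate early; the rest continue.
            remaining = [b - 1 for b in remaining if b > 1]
    return dispatches, busy_lanes / (dispatches * BATCH_SIZE)


def run_merged(num_rays):
    """Merged: surviving rays from earlier iterations are topped up with
    fresh primary rays, so one dispatch mixes several iteration stages."""
    pending = deque(range(num_rays))
    active = []
    dispatches = 0
    busy_lanes = 0
    while active or pending:
        # Refill lanes freed by terminated rays before dispatching.
        while len(active) < BATCH_SIZE and pending:
            active.append(bounces_for(pending.popleft()))
        dispatches += 1
        busy_lanes += len(active)
        active = [b - 1 for b in active if b > 1]
    return dispatches, busy_lanes / (dispatches * BATCH_SIZE)


if __name__ == "__main__":
    print("unmerged:", run_unmerged(32))
    print("merged:  ", run_merged(32))
```

Under these synthetic lifetimes, both policies perform the same total number of ray bounces, but the merged policy needs fewer dispatches because lanes vacated by terminated rays are refilled. The sketch does not model divergence within a dispatch, data marshalling, or distribution across a heterogeneous queue.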