Research Highlights 2012-2013

1) Title: “A SPICE-Compatible Model of Graphene Nano-Ribbon Field-Effect Transistors Enabling Circuit-Level Delay and Power Analysis under Process Variation”.

This study provides the first SPICE-compatible device model for evaluating graphene nano-ribbon FETs (GNRFETs) under process variation, along with important insights into how GNRFET-based circuits compare with circuits built in CMOS technology. The results give a realistic outlook for GNRFETs and indicate how much GNRFET manufacturing must improve for the technology to become more competitive. We are releasing this new model into the public domain. The paper was published at the IEEE/ACM Design, Automation & Test in Europe conference in March 2013.

2) Title: “TIGER: Tiled Iterative Genome Assembler”.

The basic idea behind TIGER is to decompose the assembly problem into smaller sub-problems so that each one can be handled more efficiently in terms of resource usage and more effectively in terms of result quality. The decomposition is designed as an iterative refinement process, analogous to conjugate-gradient iterative solvers for large systems of linear equations. Our results show that TIGER not only scales well to large numbers of next-generation sequencing (NGS) reads, but also outperforms state-of-the-art assemblers on several large genomes. This is collaborative work with Wen-Mei Hwu and Jian Ma. Several groups are already using TIGER for their assembly tasks. It was published in the journal BMC Bioinformatics in December 2012.

3) Title: “Improving High Level Synthesis Optimization Opportunity through Polyhedral Transformations”.

This work targets a critical challenge in high-level synthesis (HLS): data streaming across multiple communicating modules (blocks) cannot be generated easily because each module has complicated data access patterns. Without efficient communication between blocks, HLS cannot optimize large designs effectively. We developed an integrated framework to model and enable both intra- and inter-block optimizations. This integrated technique substantially increases the opportunities to apply powerful HLS optimizations that implement parallelism, pipelining, and fine-grained communication. It was published at the ACM International Symposium on Field Programmable Gate Arrays in February 2013.

4) Title: “Throughput-Oriented Kernel Porting onto FPGAs”.

This is a follow-up to the very successful CUDA-to-FPGA compilation flow we have developed over the past three years (in collaboration with Wen-Mei Hwu's group, UCLA, and ADSC). On top of the existing FCUDA framework, this new work proposes a code optimization engine that analyzes and restructures CUDA kernels (which are optimized for GPU devices) to facilitate synthesis of high-throughput custom accelerators on FPGAs. The proposed framework enables efficient performance porting without manual code tweaking or annotation by the user, which is a major improvement over the previous FCUDA framework. It was published at the IEEE/ACM Design Automation Conference in June 2013.

5) Title: “An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization”.

Graphics processing units (GPUs) are increasingly critical for general-purpose parallel processing performance. Because GPUs execute in SIMD (single-instruction multiple-data) style, performance can degrade significantly when computation threads take diverging control paths. In this work, we proposed effective GPU performance modeling methods and thread regrouping heuristics to optimize against control flow divergence. For certain applications, such optimizations can provide up to a 3X speedup. It was published at the IEEE International Parallel & Distributed Processing Symposium in May 2012.
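
To make the divergence penalty and the benefit of regrouping concrete, the following is a minimal toy sketch and not the performance model or heuristics from the paper. It assumes a warp of 32 threads, assumes that a warp pays the full cost of every distinct control path taken by any of its threads, and uses hypothetical path names and costs ("A" = 10, "B" = 20). The helper names `warp_cost`, `total_cost`, and `regroup_by_path` are illustrative only.

```python
from typing import Dict, List, Sequence

WARP_SIZE = 32  # assumed number of threads executing in lockstep


def warp_cost(paths: Sequence[str], path_cost: Dict[str, int]) -> int:
    """Toy model: a warp runs each distinct control path of its threads serially."""
    return sum(path_cost[p] for p in set(paths))


def total_cost(thread_paths: Sequence[str], path_cost: Dict[str, int],
               warp_size: int = WARP_SIZE) -> int:
    """Sum the toy cost over consecutive warps of `warp_size` threads."""
    return sum(
        warp_cost(thread_paths[i:i + warp_size], path_cost)
        for i in range(0, len(thread_paths), warp_size)
    )


def regroup_by_path(thread_paths: Sequence[str]) -> List[str]:
    """Simplest possible regrouping: place threads with the same path together."""
    return sorted(thread_paths)


# Hypothetical workload: 64 threads alternating between two control paths.
path_cost = {"A": 10, "B": 20}
interleaved = ["A", "B"] * 32

baseline = total_cost(interleaved, path_cost)                    # every warp diverges
regrouped = total_cost(regroup_by_path(interleaved), path_cost)  # no warp diverges

print(baseline, regrouped, baseline / regrouped)  # prints "60 30 2.0"
```

In this toy setting, regrouping removes divergence entirely and halves the modeled cost. Real GPUs add effects this sketch ignores, such as memory coalescing and the cost of the regrouping itself, which is why an accurate performance model is needed to decide when regrouping pays off.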