Research Highlights 2013-2014

1) Title: “BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads”.

Rapid advances in next-generation sequencing (NGS) technology have led to an exponential increase in the amount of genomic information. However, NGS reads contain far more errors than data from traditional sequencing methods. We developed a novel algorithm, named BLoom-filter-based Error correction Solution for high-throughput Sequencing reads (BLESS), which uses a single minimum-sized Bloom filter, allowing us to correct more errors than previous methods while using 40× less memory on average. After errors were corrected using BLESS, 69% of initially unaligned reads could be aligned correctly. Additionally, de novo assembly results became 50% longer with 66% fewer assembly errors. This work was published in February 2014 in the journal Bioinformatics, which has an impact factor of 5.5. We are releasing this tool online to benefit the research community.
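
To illustrate the general idea behind Bloom-filter-based error detection, the following is a minimal sketch and not the BLESS implementation. It assumes that trusted k-mers (short substrings of length k) are stored in a Bloom filter and that any k-mer of a read missing from the filter marks a possible error. The class and function names, the SHA-256 double-hashing scheme, and the parameter values are all hypothetical choices made for this example.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array probed at several hashed positions per item."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all probe positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd stride
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, but never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def suspicious_positions(read, k, solid):
    """Start indices of k-mers in the read that are absent from the filter."""
    return [i for i, kmer in enumerate(kmers(read, k)) if kmer not in solid]


# Hypothetical usage: treat the k-mers of one trusted read as "solid".
solid = BloomFilter(num_bits=1 << 20, num_hashes=3)
for kmer in kmers("ACGTACGTTAGC", 4):
    solid.add(kmer)

# The second read differs from the trusted one by a single base.
print(suspicious_positions("ACGTACCTTAGC", 4, solid))
```

Because a Bloom filter stores only bits rather than the k-mers themselves, its memory footprint is set by the chosen filter size and not by the k-mer length, at the cost of occasional false positives. A run of consecutive flagged k-mers, as in the usage above, points to the base they share as the likely error site.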

2) Title: “Efficient GPU Spatial-Temporal Multitasking”.

The emergence of GPUs has led to a proliferation of GPU-accelerated applications. As a result, many applications may compete for access to GPU resources, and efficient utilization of those resources is critical to system performance. In this work, we propose a software-hardware solution for efficient spatial-temporal multitasking that pairs an efficient software heuristic with hardware thread-block interleaving. Our experiments on a Fermi GTX480 demonstrate performance improvements of up to 46% (average 26%) over sequential GPU task execution and up to 37% (average 18%) over default concurrent multitasking. Compared with the state-of-the-art Kepler K20 using Hyper-Q technology, our technique achieves a performance improvement of up to 40% (average 17%) over default concurrent multitasking. This work has been accepted by IEEE Transactions on Parallel and Distributed Systems, which has an impact factor of 1.8.

3) Title: “Improving polyhedral code generation for high-level synthesis”.

High-level synthesis (HLS) tools are now capable of generating high-quality RTL code for a number of programs. The polyhedral compilation framework has shown great promise in this area with the development of HLS-specific polyhedral transformation techniques and tools. In this work, we study changes to the state-of-the-art polyhedral code generator CLooG that tailor it for HLS purposes. In particular, we develop various techniques to significantly improve resource utilization on the FPGA. We also develop techniques geared towards effective code generation of rectangularly tiled code, leading to further improvements in resource utilization. We demonstrate a reduction in area resources of 2× on average (up to 10×) with high-level synthesis. This paper was published at the IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis in September 2013 and won the Best Paper Award.

4) Title: “Fast and Effective Placement and Routing Directed High-Level Synthesis for FPGAs”.

Achievable frequency (fmax) is a widely used input constraint for designs targeting Field-Programmable Gate Arrays (FPGAs). However, for high-level synthesis (HLS) design flows, it is challenging to evaluate the real critical delay at the behavioral level. In this paper, we introduce a new HLS flow that integrates with Altera's Quartus II synthesis and fast placement and routing (PAR) tool to obtain realistic post-PAR delay estimates. This integration enables an iterative flow that improves the performance of the design with both behavioral-level and circuit-level optimizations using realistic delay information. We demonstrate that our HLS flow produces up to 24% (on average 20%) improvement in fmax and up to 22% (on average 20%) improvement in execution latency. Furthermore, our results demonstrate that the flow achieves 65% to 91% of the theoretical fmax on Stratix IV devices (550 MHz). This paper was published at the ACM/SIGDA International Symposium on Field Programmable Gate Arrays in February 2014.

5) Title: “CNPUF: A Carbon Nanotube-based Physically Unclonable Function for Secure Low-Energy Hardware Design”.

Physically Unclonable Functions (PUFs) are used to provide identification, authentication and secret key generation based on unique and unpredictable physical characteristics. Carbon Nanotube Field Effect Transistors (CNFETs) have been shown to have excellent electrical and unique physical characteristics and are promising candidates to replace silicon transistors in future VLSI designs. We present Carbon Nanotube PUF (CNPUF), the first PUF design that takes advantage of unique CNFET characteristics. CNPUF achieves higher reliability against environmental variations and increased resistance against modeling attacks. Furthermore, compared with previous ultra-low-power PUF designs, CNPUF reduces power and energy by 89.6% and 98%, respectively. Additionally, CNPUF allows a power-security tradeoff. This work was done in collaboration with Prof. Martin Wong and was published at the IEEE/ACM Asia and South Pacific Design Automation Conference in January 2014.