NEW! HELLO: Open Source


HELLO is a new DNA variant calling tool, where we use novel DNN (Deep Neural Network) architectures and customized variant inference functions that account for the underlying nature of sequencing data. Our method allows vastly smaller DNNs to outperform the Inception-v3 architecture used in DeepVariant for indel and substitution-type variant calls. Our improved accuracy and problem-specific customization of DNN models could enable more accurate pipelines and further method development in the field. Available since 2021.

Download HELLO

  1. Anand Ramachandran, Steven Lumetta, Eric Klee, and Deming Chen, “HELLO: Improved Neural Network Architectures and Methodologies for Small Variant Calling”, BMC Bioinformatics, 2021.


NEW! ScaleHLS: Open Source


ScaleHLS is a next-generation HLS compilation flow, on top of a multi-level compiler infrastructure called MLIR. ScaleHLS is able to represent and optimize HLS designs at multiple levels of abstraction and provides an HLS-dedicated transform and analysis library to solve the optimization problems at the suitable representation levels. We also build an automated DSE engine to explore the multi-dimensional design space efficiently. ScaleHLS shows amazing quality-of-results – up to 768.1× better on computation kernel level programs and up to 3825.0× better on neural network models, compared to the unoptimized designs. Available since 2021.

Download ScaleHLS



NEW! WinoCNN: Open Source


WinoCNN combines systolic array and fast Winograd algorithm for CNN acceleration. This system supports flexible convolution kernel sizes without sacrificing DSP efficiency through various algorithmic, architecture and on-chip memory subsystem designs and optimizations. Overall, our accelerator delivers high throughput and state-of-the-art DSP efficiency compared to previous accelerator implementations. Available since 2021.

Download WinoCNN

  1. Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, and Deming Chen, “WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs”, Proceedings of IEEE International Conference on Application-specific Systems, Architectures and Processors, July 2021.


NEW! TwinDNN: Open Source


TwinDNN system pairs a high-accuracy heavy-duty network with a low-latency light-weight (e.g., highly compressed) network using a hierarchical inference logic that will infer high-accuracy network when the prediction of low-latency network is not considered confident. TwinDNN can recover up to 94% of accuracy drop caused by extreme network compression, with more than 90% speedup. Available since 2021.

Download TwinDNN

  1. Hyunmin Jeong and Deming Chen, “TwinDNN: A Tale of Two Deep Neural Networks”, Proceedings of IEEE International Conference on Application-specific Systems, Architectures and Processors, July 2021.


SkyNet: Open Source


SkyNet is a new hardware-efficient DNN model specialized in object detection and tracking. SkyNet was developed based on the SkyNet Design Methodology to facilitate edge AI solutions, and demonstrated in the 56th IEEE/ACM Design Automation Conference System Design Contest (DAC-SDC), a low power object detection challenge for real-life unmanned aerial vehicle (UAV) applications. SkyNet won the First Place Award for both GPU and FPGA tracks of the contest in 2019. Available since 2019.

Download SkyNet

  1. Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, Jinjun Xiong, Thomas Huang, Honghui Shi, Wen-Mei Hwu, and Deming Chen, “SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems,” Proceedings of Machine Learning and Systems (MLSys 2020), March 2020.


DNNBuilder: Open Source


This package provides a novel solution that can automatically convert the Caffe trained DNN to the FPGA RTL level implementation without involving any hardware programming effort. It also provides uniform APIs to the users for their AI recognition task. The developers, without any FPGA programming experience, can deploy their FPGA accelerated deep learning services for both cloud and edge computing, only providing their trained Caffe model. The paper for DNNBuilder has won the IEEE/ACM William J. McCalla ICCAD Best Paper Award in 2018. Available since 2019.

Download DNNBuilder

  1. Xiaofan Zhang, Junsong Wang, Chao Zhu, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, and Deming Chen, “DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs”, Proceedings of IEEE/ACM International Conference on Computer-Aided Design, November 2018. (Best Paper Award)


uL2Q: Open Source


This open-source package introduces an ultra-low loss quantization (μL2Q) method that provides DNN quantization schemes based on comprehensive quantitative data analysis. μL2Q builds the transformation of the original data to a data space with standard normal distribution, and then finds the optimal parameters to minimize the loss of the quantization of a target bitwidth. Our method can deliver consistent accuracy improvements compared to the state-of-the-art quantization solutions with the same compression ratio. Available since 2019.

Download μL2Q

  1. Cheng Gong, Ye Lu, Cong Hao, Xiaofan Zhang, Tao Li, Deming Chen, and Yao Chen, “μL2Q: An Ultra-Low Loss Quantization Method for DNN Compression,” Proceedings of International Joint Conference on Neural Networks (IJCNN), July 2019.


T-DLA: Open Source


T-DLA (Ternarized Deep Learning Accelerator) is an open-source microprocessor designed specifically for accelerating DNN models trained with ternarized weights. This is the first instruction-based DLA design targeting ternary-quantized weights. The T-DLA system delivers up to 0.4 TOPS with 2.58 W power consumption. It is 873.6× and 5.1× faster on ImageNet for Resnet-18 model comparing to Xeon E5-2630 CPU and Nvidia 1080 Ti GPU respectively. Available since 2019.

Download T-DLA

  1. Yao Chen, Kai Zhang, Cheng Gong, Cong Hao, Xiaofan Zhang, Tao Li, and Deming Chen, “T-DLA: An Open-source Deep Learning Accelerator for Ternarized DNN Models on Embedded FPGA,” Proceedings of IEEE Computer Society Annual Symposium on VLSI, July 2019.


DNN IP: Open Source


This IP Package includes an open-source IP repository specifically designed for machine learning applications. The IPs include: Standard convolution IPs, Depth-wise separable convolution IPs, Pooling IPs, Bounding box regression IP, and Long-term Recurrent Convolutional Network IP. Each IP is provided with: introduction, interface description, inputs and outputs description, parameter configuration, and resource and performance. The IPs are developed in C/C++. The source code is synthesizable and RTL code can be generated conveniently using Xilinx Vivado HLS. Available since 2019.

Download DNN IPs



Thanos: Open Source


This open-source package introduces Thanos, a fast graph partitioning tool which uses the cross-decomposition algorithm that iteratively partitions a graph. It also produces balanced loads of partitions. The algorithm is well suited for parallel GPU programming which leads to fast and high-quality graph partitioning solutions. Experimental results show that we have achieved a 30x speedup and 35% better edge cut reduction compared to the CPU version of the popular graph partitioning tool METIS on average. Available since 2019.

Download Thanos

  1. Dae Hee Kim, Rakesh Nagi, and Deming Chen, “Thanos: High-Performance CPU-GPU Based Graph Partitioning Using Cross-Decomposition,” Proceedings of IEEE/ACM Asia and South Pacific Design Automation Conference, January 2020.


CLOUD-DNN Open Source


Cloud-DNN is an open-source framework that maps DNN (deep neural network) models trained by Caffe to FPGAs in the cloud for inference acceleration. It takes the input *.prototxt DNN description, generates corresponding C++ network description, and then produces the final hardware accelerator IPs through high-level synthesis. The goal of Cloud-DNN is to provide more flexible and user-friendly DNN acceleration on cloud-FPGAs (e.g., AWS F1).

Download CLOUD-DNN

  1. Yao Chen, Jiong He, Xiaofan Zhang, Cong Hao, and Deming Chen, “Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs”, Proceedings of ACM/SIGDA International Symposium on Field Programmable Gate Arrays, February 2019.


RIP Open Source


This open source project contains three inter-related software packages (fast software modeling, fast hardware modeling and design space exploration, and hardware/software co-design), for the ultimate task of automated near-optimal hardware/software partitioning targeting either sophisticated SoC designs or computing on heterogeneous systems.

Download RIP

  1. W. Zuo, W. Kemmerer, J. B. Lim, L.-N. Pochet, A. Ayupoy, T. Kim, K. Han, and D. Chen, “A polyhedral-based SystemC modeling and generation framework for effective low-power design space exploration,” Proceedings of IEEE /ACM International Conference on Computer-Aided Design, November 2015. (Best Paper Award)
  2. W. Kemmerer, W. Zuo, and D. Chen, "Parallel Code-Specific CPU Simulation with Dynamic Phase Convergence Modeling for HW/SW Co-Design", Proceedings of IEEE/ACM International Conference on Computer-Aided Design, November 2016.
  3. W. Zuo, L.-N. Pochet, A. Ayupov, T. Kim, C.-W. Lin, S. Shiraishi, and D. Chen, “Accurate High-level Modeling and Automated Hardware/Software Co-design for Effective SoC Design Space Exploration" Proceedings of IEEE/ACM Design Automation Conference, June 2017.


FCUDA Open Source


A source-to-source transformation framework that can take CUDA code, generate functionally equivalent synthesizable C code, and map to an FPGA implementation using high-level synthesis for high performance and energy-efficient reconfigurable computation.

Download FCUDA

  1. T. Nguyen, Y. Chen, K. Rupnow, S. Gurumani, and D. Chen, "SoC, NoC and Hierarchical Bus Implementations of Applications on FPGAs Using the FCUDA Flow", Proceedings of IEEE Computer Society Annual Symposium on VLSI, July 2016.
  2. Y. Chen, T. Nguyen, Y. Chen, S. T. Gurumani, Y. Liang, K. Rupnow, J. Cong, W.M. Hwu, and D. Chen, “FCUDA-HB: Hierarchical and Scalable Bus Architecture Generation on FPGAs with the FCUDA Flow,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2016.
  3. T. Nguyen, S. Gurumani, K. Rupnow, and D. Chen, “FCUDA-SoC: Platform Integration for Field-Programmable SoC with the CUDA-to-FPGA Compiler,” Proceedings of ACM/SIGDA International Symposium on Field Programmable Gate Arrays, February 2016.
  4. Y. Chen, S. T. Gurumani, Y. Liang, G. Li, D. Guo, K. Rupnow, and D. Chen, “FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2015.
  5. A. Papakonstantinou, K. Gururaj, J. Stratton, D. Chen, J. Cong, and W.M. Hwu, "Efficient Compilation of CUDA Kernels for High-Performance Computing on FPGAs," ACM Transactions on Embedded Computing Systems, Special Issue on Application-Specific Processors, Vol. 13, Issue 2, September 2013.
  6. A. Papakonstantinou, D. Chen, W.M. Hwu, J. Cong, and Y. Liang, "Throughput-oriented Kernel Porting onto FPGAs," Proceedings of IEEE/ACM Design Automation Conference, June 2013.
  7. S. Gurumani, K. Rupnow, Y. Liang, H. Cholakkail, and D. Chen, "High Level Synthesis of Multiple Dependent CUDA Kernels for FPGAs," Proceedings of IEEE/ACM Asia and South Pacific Design Automation Conference, January 2013. (Invited)
  8. S. Gurumani, J. Tolar, Y. Chen, Y. Liang, K. Rupnow, and D. Chen, "Integrated CUDA-to-FPGA Synthesis with Network-on-Chip," Proceedings of IEEE International Symposium on Field-Programmable Custom Computing Machines, May 2014.
  9. A. Papakonstantinou, K. Gururaj, J. Stratton, D. Chen, J. Cong, and W.M. Hwu, "FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs," Proceedings of IEEE Symposium on Application Specific Processors, July 2009. (Best Paper Award)
  10. A. Papakonstantinou, Y. Liang, J. Stratton, K. Gururaj, D. Chen, W.M. Hwu and J. Cong, "Multilevel Granularity Parallelism Synthesis on FPGAs," Proceedings of IEEE International Symposium on Field-Programmable Custom Computing Machines, May 2011. (Best Paper Award)


H.264 High Level Synthesis Benchmark


Fully synthesizable H.264 Video Decoder code, which can be synthesized into RTL with high-level synthesis for FPGA implementation and achieve real-time decoding.

Download H.264 Benchmark

  1. X. Liu, Y. Chen, T. Nguyen, S. Gurumani, K. Rupnow, and D. Chen, “High Level Synthesis of Complex Applications: An H.264 Video Decoder”, Proceedings of ACM/SIGDA International Symposium on Field Programmable Gate Arrays, February 2016.


TMDFET SPICE Models


SPICE transistor models of flexible Transition Metal Dichalcogenide Field-Effect Transistors, TMDFET.

Download TMDFET HSPICE Models

Download TMDFET Verilog-A Models

  1. Y-Y Chen, M. Gholipour, and D. Chen, "Flexible Transition Metal Dichalcogenide Field-Effect Transistors: A Circuit-Level Simulation Study of Delay and Power under Bending, Process Variation, and Scaling," Proceedings of IEEE/ACM Asia and South Pacific Design Automation Conference, Jan. 2016.
  2. M. Gholipour, Y.Y. Chen, and D. Chen, “Compact Modeling to Device- and Circuit-Level Evaluation of Flexible TMD Field-Effect Transistors,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume: 37, Issue: 4, Page(s): 820 - 831, April 2018.


GNRFET HSPICE Models


HSPICE transistor models of two types of Graphene Nano-Ribbon Field-Effect Transistors, MOS-GNRFET and SB-GNRFET.

Download GNRFET HSPICE Models

  1. Y-Y. Chen, A. Rogachev, A. Sangai, G. Iannaccone, G. Fiori, and D. Chen, "A SPICE-Compatible Model of Graphene Nano-Ribbon Field-Effect Transistors Enabling Circuit-Level Delay and Power Analysis Under Process Variation," Proceedings of IEEE/ACM Design, Automation & Test in Europe, March 2013.
  2. Y-Y. Chen, A. Sangai, M. Gholipour, and D. Chen, "Schottky-Barrier-Type Graphene Nano-Ribbon Field-Effect Transistors: A Study on Compact Modeling, Process Variation, and Circuit Performance," Proceedings of IEEE/ACM International Symposium on Nanoscale Architectures, July 2013.
  3. Y-Y. Chen, A. Sangai, M. Gholipour, and D. Chen, "Graphene Nano-Ribbon Field-Effect Transistors as Future Low-Power Devices," Proceedings of IEEE/ACM International Symposium on Low Power Electronics and Design, September 2013. (Invited)
  4. Y-Y Chen, A. Sangai, M. Gholipour, and D. Chen, "Effects of Process Variation on the Circuit-Level Performance of Graphene Nano-Ribbon Field-Effect Transistors," Workshop on Variability Modeling and Characterization, November 2013.
  5. M. Gholipour, Y-Y, Chen, A. Sangai, and D. Chen, "Highly Accurate SPICE-Compatible Modeling for Single- and Double-Gate GNRFETs with Studies on Technology Scaling," Proceedings of IEEE/ACM Design, Automation & Test in Europe, March 2014.
  6. M. Gholipour, Y.Y. Chen, A. Sangai, N. Masoumi, and D. Chen, “Analytical SPICE-Compatible Model of Schottky-Barrier-Type GNRFETs With Performance Analysis”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, March 2015.
  7. Y.Y. Chen, A. Sangai, A. Rogachev, M. Gholipour, G. Iannaccone, G. Fiori, and D. Chen, “A SPICE-Compatible Model of MOS-Type Graphene Nano-ribbon Field-Effect Transistors enabling Gate- and Circuit-level Delay and Power Analysis under Process Variation,” IEEE Transactions on Nanotechnology, Volume 14, Issue 6, pp. 1068-1082, November 2015.


BLESS


Bloom-filter-based Error Correction Tool for NGS DNA reads.

Download BLESS

  1. Y. Heo, X-L. Wu, D. Chen, J. Ma, and W-M Hwu, "BLESS: Bloom-filter-based Error Correction Solution for High throughput Sequencing Reads," Bioinformatics, 2014, doi: 10.1093/bioinformatics/btu030.
  2. Yun Heo, Anand Ramachandran, Wen-Mei Hwu, Jian Ma, Deming Chen, "BLESS 2: Accurate, memory-efficient, and fast error correction method," Bioinformatics, Volume 32, Issue 15, Pages 2369–2371, 1 August 2016. https://doi.org/10.1093/bioinformatics/btw146.