NEW! DNNBuilder: Open Source


This package provides a novel solution that automatically converts a trained Caffe DNN model into an FPGA RTL-level implementation without any hardware programming effort. It also provides uniform APIs for users' AI recognition tasks. Developers with no FPGA programming experience can deploy FPGA-accelerated deep learning services for both cloud and edge computing by providing only their trained Caffe models. The DNNBuilder paper won the IEEE/ACM William J. McCalla ICCAD Best Paper Award in 2018. Available since 2019.

Download DNNBuilder

  1. Xiaofan Zhang, Junsong Wang, Chao Zhu, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, and Deming Chen, “DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs”, Proceedings of IEEE/ACM International Conference on Computer-Aided Design, November 2018. (Best Paper Award)


NEW! uL2Q: Open Source


This open-source package introduces the ultra-low loss quantization (μL2Q) method, which provides DNN quantization schemes based on comprehensive quantitative data analysis. μL2Q transforms the original data into a space with a standard normal distribution and then finds the optimal parameters that minimize the quantization loss for a target bitwidth. Our method delivers consistent accuracy improvements over state-of-the-art quantization solutions at the same compression ratio. Available since 2019.
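
As a rough illustration of the idea (this is not the released μL2Q code), the sketch below normalizes the data toward a standard normal and then picks a uniform quantization step that minimizes the L2 loss for the target bitwidth. Here the step is found by a simple grid search, whereas the paper derives optimal steps analytically:

```python
import math
import random

def ul2q_like_quantize(data, bits, lam_grid=None):
    """Illustrative sketch of a mu-L2Q-style quantizer (not the released code).

    1. Shift/scale the data to approximately standard normal form.
    2. Pick a uniform step 'lam' that minimizes the L2 quantization loss
       for the target bitwidth (here by grid search; the paper derives
       optimal steps analytically for a standard normal distribution).
    3. Map the quantized values back to the original data space.
    """
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / n) or 1.0
    norm = [(x - mean) / std for x in data]

    levels = 2 ** bits                        # number of quantization levels
    lam_grid = lam_grid or [0.05 * i for i in range(1, 80)]

    def quantize(vals, lam):
        half = levels // 2
        out = []
        for v in vals:
            q = math.floor(v / lam + 0.5)       # nearest level index
            q = max(-half, min(half - 1, q))    # clip to representable range
            out.append(q * lam)
        return out

    best_lam, best_loss, best_q = None, float("inf"), None
    for lam in lam_grid:
        q = quantize(norm, lam)
        loss = sum((a - b) ** 2 for a, b in zip(norm, q))
        if loss < best_loss:
            best_lam, best_loss, best_q = lam, loss, q

    # Undo the normalization so the result lives in the original data space.
    return [v * std + mean for v in best_q], best_lam

random.seed(0)
weights = [random.gauss(0.0, 0.3) for _ in range(1000)]
qw, lam = ul2q_like_quantize(weights, bits=4)
mse = sum((a - b) ** 2 for a, b in zip(weights, qw)) / len(weights)
```

The key point is that the step is chosen from the (normalized) data distribution rather than fixed a priori, which is what keeps the quantization loss low at a given bitwidth.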

Download μL2Q

  1. Cheng Gong, Ye Lu, Cong Hao, Xiaofan Zhang, Tao Li, Deming Chen, and Yao Chen, “μL2Q: An Ultra-Low Loss Quantization Method for DNN Compression,” Proceedings of International Joint Conference on Neural Networks (IJCNN), July 2019.


NEW! T-DLA: Open Source


T-DLA (Ternarized Deep Learning Accelerator) is an open-source accelerator designed specifically for DNN models trained with ternarized weights, and the first instruction-based DLA design targeting ternary-quantized weights. The T-DLA system delivers up to 0.4 TOPS at 2.58 W power consumption. On ImageNet with the ResNet-18 model, it is 873.6× faster than a Xeon E5-2630 CPU and 5.1× faster than an Nvidia 1080 Ti GPU. Available since 2019.
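
For readers unfamiliar with ternarization, the sketch below shows the kind of weight quantization T-DLA executes: each weight becomes one of {-1, 0, +1} times a per-layer scale, so a 2-bit datapath suffices. The threshold/scale heuristic follows the common Ternary Weight Networks rule (delta = 0.7 × mean|w|); the actual training recipe used for T-DLA models may differ.

```python
def ternarize(weights):
    """Sketch of ternary weight quantization (TWN-style heuristic).

    Weights near zero are pruned to 0; the rest keep only their sign.
    A single per-layer scaling factor recovers the magnitude on average.
    """
    n = len(weights)
    delta = 0.7 * sum(abs(w) for w in weights) / n   # pruning threshold
    signs = [0 if abs(w) < delta else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, s in zip(weights, signs) if s != 0]
    scale = sum(kept) / len(kept) if kept else 0.0   # per-layer scaling factor
    return signs, scale

w = [0.8, -0.05, 0.3, -0.9, 0.02, -0.4]
signs, scale = ternarize(w)
approx = [s * scale for s in signs]   # the values the accelerator computes with
```

Storing only the 2-bit signs plus one scale per layer is what lets the hardware replace multipliers with adders/subtractors.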

Download T-DLA

  1. Yao Chen, Kai Zhang, Cheng Gong, Cong Hao, Xiaofan Zhang, Tao Li, and Deming Chen, “T-DLA: An Open-source Deep Learning Accelerator for Ternarized DNN Models on Embedded FPGA,” Proceedings of IEEE Computer Society Annual Symposium on VLSI, July 2019.


NEW! DNN IP: Open Source


This package includes an open-source IP repository designed specifically for machine learning applications. The IPs include standard convolution IPs, depth-wise separable convolution IPs, pooling IPs, a bounding-box regression IP, and a Long-term Recurrent Convolutional Network IP. Each IP comes with an introduction, interface description, input and output descriptions, parameter configuration, and resource and performance numbers. The IPs are developed in C/C++; the source code is synthesizable, and RTL can be generated conveniently using Xilinx Vivado HLS. Available since 2019.
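
As a reference model for one of the listed operations, the Python sketch below computes a depthwise-separable convolution (a per-channel depthwise stage followed by a 1×1 pointwise stage). The IPs themselves are C/C++ for Vivado HLS; this version only illustrates the computation they implement.

```python
def depthwise_separable_conv(x, dw_k, pw_k):
    """Reference model of a depthwise-separable convolution.

    x:    input feature map, shape [C][H][W] as nested lists
    dw_k: per-channel depthwise kernels, shape [C][K][K]
    pw_k: pointwise (1x1) kernels, shape [F][C]
    Uses 'valid' padding and stride 1 for simplicity.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K = len(dw_k[0])
    Ho, Wo = H - K + 1, W - K + 1

    # Depthwise stage: each channel convolved with its own KxK kernel.
    dw = [[[sum(x[c][i + u][j + v] * dw_k[c][u][v]
                for u in range(K) for v in range(K))
            for j in range(Wo)] for i in range(Ho)] for c in range(C)]

    # Pointwise stage: 1x1 convolution mixes channels into F outputs.
    F = len(pw_k)
    return [[[sum(dw[c][i][j] * pw_k[f][c] for c in range(C))
              for j in range(Wo)] for i in range(Ho)] for f in range(F)]

# Tiny example: 2 input channels of 3x3 ones, 2x2 all-ones depthwise
# kernels, and a single pointwise filter that sums the two channels.
x = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
dw_k = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)]
pw_k = [[1.0, 1.0]]
y = depthwise_separable_conv(x, dw_k, pw_k)
```

Splitting the convolution into these two stages is what makes the operation cheap in hardware: the depthwise stage needs far fewer multiplications than a full standard convolution.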

Download DNN IPs



NEW! Thanos: Open Source


This open-source package introduces Thanos, a fast graph partitioning tool based on a cross-decomposition algorithm that iteratively partitions a graph while producing balanced partition loads. The algorithm is well suited to parallel GPU programming, which leads to fast, high-quality graph partitioning solutions. Experimental results show an average 30× speedup and 35% smaller edge cuts compared to the CPU version of the popular graph partitioning tool METIS. Available since 2019.
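
To give a feel for iterative, balance-aware partitioning, here is a toy CPU sketch in that spirit: vertices repeatedly move to the side holding most of their neighbors, subject to a balance cap. This is a deliberately simplified illustration; the real tool parallelizes the per-vertex updates on a GPU and supports more than two partitions.

```python
def iterative_bipartition(adj, iters=10):
    """Toy sketch of iterative, balance-aware graph bipartitioning.

    adj: adjacency list {vertex: [neighbors]} (undirected, symmetric)
    Returns {vertex: 0 or 1}.
    """
    verts = sorted(adj)
    part = {v: i % 2 for i, v in enumerate(verts)}  # balanced initial split
    cap = len(verts) // 2 + 1                       # allow slack of one vertex

    for _ in range(iters):
        moved = False
        for v in verts:
            # Gain of moving v: neighbors on the other side minus own side.
            same = sum(1 for u in adj[v] if part[u] == part[v])
            other = len(adj[v]) - same
            target = 1 - part[v]
            target_size = sum(1 for u in verts if part[u] == target)
            if other > same and target_size < cap:
                part[v] = target
                moved = True
        if not moved:   # converged: no vertex wants to (or can) move
            break
    return part

def edge_cut(adj, part):
    # Each undirected edge appears twice in adj, hence the division by 2.
    return sum(1 for v in adj for u in adj[v] if part[v] != part[u]) // 2

# Two triangles joined by a single bridge edge (2-3): the ideal cut is 1.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
part = iterative_bipartition(adj)
```

The local per-vertex moves are independent enough to map naturally onto GPU threads, which is the intuition behind the speedups reported above.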

Download Thanos

  1. Dae Hee Kim, Rakesh Nagi, and Deming Chen, “Thanos: High-Performance CPU-GPU Based Graph Partitioning Using Cross-Decomposition,” Proceedings of IEEE/ACM Asia and South Pacific Design Automation Conference, January 2020.


CLOUD-DNN Open Source


Cloud-DNN is an open-source framework that maps DNN (deep neural network) models trained with Caffe to FPGAs in the cloud for inference acceleration. It takes an input *.prototxt DNN description, generates a corresponding C++ network description, and then produces the final hardware accelerator IPs through high-level synthesis. The goal of Cloud-DNN is to provide more flexible and user-friendly DNN acceleration on cloud FPGAs (e.g., AWS F1).
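
The first step of this flow, turning a layer-by-layer network description into C++ that an HLS tool can synthesize, can be sketched as a tiny code generator. The layer and function names below are made up for illustration; the real framework parses Caffe *.prototxt files and emits far richer, pragma-annotated code.

```python
def emit_cpp(layers):
    """Toy code generator: network description -> C++ network function.

    layers: list of (name, type, params) tuples describing the network
    in topological order. Purely illustrative of the Cloud-DNN idea.
    """
    body = []
    prev = "input"
    for name, ltype, params in layers:
        args = ", ".join(str(p) for p in params)
        # Each layer becomes a call to a synthesizable C++ layer function.
        body.append(f"  {ltype}_layer({prev}, {name}, {args});")
        prev = name
    lines = ["void network(float* input, float* output) {"]
    lines += body
    lines.append(f"  copy_out({prev}, output);")
    lines.append("}")
    return "\n".join(lines)

# Hypothetical three-layer network description.
layers = [
    ("conv1", "conv", [32, 3, 3]),   # 32 filters, 3x3
    ("pool1", "pool", [2]),          # 2x2 pooling
    ("fc1", "fc", [10]),             # 10-way classifier
]
cpp = emit_cpp(layers)
```

The generated C++ is then fed to high-level synthesis, which turns each layer function into a hardware IP.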

Download CLOUD-DNN

  1. Yao Chen, Jiong He, Xiaofan Zhang, Cong Hao, and Deming Chen, “Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs”, Proceedings of ACM/SIGDA International Symposium on Field Programmable Gate Arrays, February 2019.


RIP Open Source


This open-source project contains three inter-related software packages (fast software modeling; fast hardware modeling and design space exploration; and hardware/software co-design) for the ultimate task of automated, near-optimal hardware/software partitioning, targeting either sophisticated SoC designs or computing on heterogeneous systems.

Download RIP

  1. W. Zuo, W. Kemmerer, J. B. Lim, L.-N. Pochet, A. Ayupoy, T. Kim, K. Han, and D. Chen, “A polyhedral-based SystemC modeling and generation framework for effective low-power design space exploration,” Proceedings of IEEE/ACM International Conference on Computer-Aided Design, November 2015. (Best Paper Award)
  2. W. Kemmerer, W. Zuo, and D. Chen, "Parallel Code-Specific CPU Simulation with Dynamic Phase Convergence Modeling for HW/SW Co-Design", Proceedings of IEEE/ACM International Conference on Computer-Aided Design, November 2016.
  3. W. Zuo, L.-N. Pochet, A. Ayupov, T. Kim, C.-W. Lin, S. Shiraishi, and D. Chen, “Accurate High-level Modeling and Automated Hardware/Software Co-design for Effective SoC Design Space Exploration,” Proceedings of IEEE/ACM Design Automation Conference, June 2017.


FCUDA Open Source


FCUDA is a source-to-source transformation framework that takes CUDA code, generates functionally equivalent synthesizable C code, and maps it to an FPGA implementation using high-level synthesis for high-performance, energy-efficient reconfigurable computation.
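
Conceptually, the transformation makes CUDA's implicit thread grid explicit: the kernel body is wrapped in loops over block and thread indices, which an HLS tool can then unroll and pipeline into parallel hardware. The Python below is only an analogy of that idea (FCUDA itself rewrites CUDA C into C source).

```python
def vec_add_kernel(tid, bid, bdim, a, b, c):
    """Body of a CUDA-style kernel: each 'thread' handles one element."""
    i = bid * bdim + tid          # global index, as in CUDA
    if i < len(c):                # guard against overshoot, as in CUDA
        c[i] = a[i] + b[i]

def fcuda_style_launch(kernel, grid, block, *args):
    """The implicit CUDA thread grid becomes explicit loops.

    In the generated C, HLS pragmas would unroll/pipeline these loops
    into parallel hardware; here they simply run sequentially.
    """
    for bid in range(grid):        # loop over thread blocks
        for tid in range(block):   # loop over threads within a block
            kernel(tid, bid, block, *args)

a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
c = [0] * 5
fcuda_style_launch(vec_add_kernel, 2, 4, a, b, c)  # 2 blocks of 4 threads
```

Because the loops carry no dependences across iterations, the synthesis tool is free to execute many "threads" per cycle, recovering the parallelism the CUDA source expressed.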

Download FCUDA

  1. T. Nguyen, Y. Chen, K. Rupnow, S. Gurumani, and D. Chen, "SoC, NoC and Hierarchical Bus Implementations of Applications on FPGAs Using the FCUDA Flow", Proceedings of IEEE Computer Society Annual Symposium on VLSI, July 2016.
  2. Y. Chen, T. Nguyen, Y. Chen, S. T. Gurumani, Y. Liang, K. Rupnow, J. Cong, W.M. Hwu, and D. Chen, “FCUDA-HB: Hierarchical and Scalable Bus Architecture Generation on FPGAs with the FCUDA Flow,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2016.
  3. T. Nguyen, S. Gurumani, K. Rupnow, and D. Chen, “FCUDA-SoC: Platform Integration for Field-Programmable SoC with the CUDA-to-FPGA Compiler,” Proceedings of ACM/SIGDA International Symposium on Field Programmable Gate Arrays, February 2016.
  4. Y. Chen, S. T. Gurumani, Y. Liang, G. Li, D. Guo, K. Rupnow, and D. Chen, “FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2015.
  5. A. Papakonstantinou, K. Gururaj, J. Stratton, D. Chen, J. Cong, and W.M. Hwu, "Efficient Compilation of CUDA Kernels for High-Performance Computing on FPGAs," ACM Transactions on Embedded Computing Systems, Special Issue on Application-Specific Processors, Vol. 13, Issue 2, September 2013.
  6. A. Papakonstantinou, D. Chen, W.M. Hwu, J. Cong, and Y. Liang, "Throughput-oriented Kernel Porting onto FPGAs," Proceedings of IEEE/ACM Design Automation Conference, June 2013.
  7. S. Gurumani, K. Rupnow, Y. Liang, H. Cholakkail, and D. Chen, "High Level Synthesis of Multiple Dependent CUDA Kernels for FPGAs," Proceedings of IEEE/ACM Asia and South Pacific Design Automation Conference, January 2013. (Invited)
  8. S. Gurumani, J. Tolar, Y. Chen, Y. Liang, K. Rupnow, and D. Chen, "Integrated CUDA-to-FPGA Synthesis with Network-on-Chip," Proceedings of IEEE International Symposium on Field-Programmable Custom Computing Machines, May 2014.
  9. A. Papakonstantinou, K. Gururaj, J. Stratton, D. Chen, J. Cong, and W.M. Hwu, "FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs," Proceedings of IEEE Symposium on Application Specific Processors, July 2009. (Best Paper Award)
  10. A. Papakonstantinou, Y. Liang, J. Stratton, K. Gururaj, D. Chen, W.M. Hwu and J. Cong, "Multilevel Granularity Parallelism Synthesis on FPGAs," Proceedings of IEEE International Symposium on Field-Programmable Custom Computing Machines, May 2011. (Best Paper Award)


H.264 High Level Synthesis Benchmark


Fully synthesizable H.264 video decoder code that can be synthesized into RTL through high-level synthesis for FPGA implementation and achieves real-time decoding.

Download H.264 Benchmark

  1. X. Liu, Y. Chen, T. Nguyen, S. Gurumani, K. Rupnow, and D. Chen, “High Level Synthesis of Complex Applications: An H.264 Video Decoder”, Proceedings of ACM/SIGDA International Symposium on Field Programmable Gate Arrays, February 2016.


TMDFET SPICE Models


SPICE transistor models of flexible Transition Metal Dichalcogenide Field-Effect Transistors (TMDFETs).

Download TMDFET HSPICE Models

Download TMDFET Verilog-A Models

  1. Y-Y Chen, M. Gholipour, and D. Chen, "Flexible Transition Metal Dichalcogenide Field-Effect Transistors: A Circuit-Level Simulation Study of Delay and Power under Bending, Process Variation, and Scaling," Proceedings of IEEE/ACM Asia and South Pacific Design Automation Conference, Jan. 2016.
  2. M. Gholipour, Y.Y. Chen, and D. Chen, “Compact Modeling to Device- and Circuit-Level Evaluation of Flexible TMD Field-Effect Transistors,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. To appear.


GNRFET HSPICE Models


HSPICE transistor models of two types of Graphene Nano-Ribbon Field-Effect Transistors: MOS-GNRFET and SB-GNRFET.

Download GNRFET HSPICE Models

  1. Y-Y. Chen, A. Rogachev, A. Sangai, G. Iannaccone, G. Fiori, and D. Chen, "A SPICE-Compatible Model of Graphene Nano-Ribbon Field-Effect Transistors Enabling Circuit-Level Delay and Power Analysis Under Process Variation," Proceedings of IEEE/ACM Design, Automation & Test in Europe, March 2013.
  2. Y-Y. Chen, A. Sangai, M. Gholipour, and D. Chen, "Schottky-Barrier-Type Graphene Nano-Ribbon Field-Effect Transistors: A Study on Compact Modeling, Process Variation, and Circuit Performance," Proceedings of IEEE/ACM International Symposium on Nanoscale Architectures, July 2013.
  3. Y-Y. Chen, A. Sangai, M. Gholipour, and D. Chen, "Graphene Nano-Ribbon Field-Effect Transistors as Future Low-Power Devices," Proceedings of IEEE/ACM International Symposium on Low Power Electronics and Design, September 2013. (Invited)
  4. Y-Y Chen, A. Sangai, M. Gholipour, and D. Chen, "Effects of Process Variation on the Circuit-Level Performance of Graphene Nano-Ribbon Field-Effect Transistors," Workshop on Variability Modeling and Characterization, November 2013.
  5. M. Gholipour, Y-Y. Chen, A. Sangai, and D. Chen, "Highly Accurate SPICE-Compatible Modeling for Single- and Double-Gate GNRFETs with Studies on Technology Scaling," Proceedings of IEEE/ACM Design, Automation & Test in Europe, March 2014.
  6. M. Gholipour, Y.Y. Chen, A. Sangai, N. Masoumi, and D. Chen, “Analytical SPICE-Compatible Model of Schottky-Barrier-Type GNRFETs With Performance Analysis”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, March 2015.
  7. Y.Y. Chen, A. Sangai, A. Rogachev, M. Gholipour, G. Iannaccone, G. Fiori, and D. Chen, “A SPICE-Compatible Model of MOS-Type Graphene Nano-ribbon Field-Effect Transistors enabling Gate- and Circuit-level Delay and Power Analysis under Process Variation,” IEEE Transactions on Nanotechnology, Volume 14, Issue 6, pp. 1068-1082, November 2015.


BLESS


A Bloom-filter-based error correction tool for next-generation sequencing (NGS) DNA reads.
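
The core idea is to store the set of trusted ("solid") k-mers in a Bloom filter, then flag read positions whose k-mers are absent as candidate errors. The sketch below is a minimal illustration of that mechanism, not BLESS's implementation, which is far more memory- and cache-efficient and also performs the actual base correction.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter over strings (illustrative sketch only)."""

    def __init__(self, m_bits=1 << 16, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # May return rare false positives, never false negatives.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

def weak_kmer_positions(read, k, solid):
    """Positions whose k-mer is absent from the filter: candidate errors
    that an error corrector would then try to fix by substituting bases."""
    return [i for i in range(len(read) - k + 1) if read[i:i + k] not in solid]

# Trusted k-mers from the (hypothetical) true sequence "ACGTAC", k = 4.
solid = BloomFilter()
for km in ("ACGT", "CGTA", "GTAC"):
    solid.add(km)

# A read whose last base is wrong: its final k-mer is not solid.
weak = weak_kmer_positions("ACGTAG", 4, solid)
```

A Bloom filter is the natural choice here because the k-mer set is huge and only approximate membership is needed; false positives merely leave an occasional error uncorrected.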

Download BLESS

  1. Y. Heo, X-L. Wu, D. Chen, J. Ma, and W-M. Hwu, "BLESS: Bloom-filter-based Error Correction Solution for High-throughput Sequencing Reads," Bioinformatics, 2014, doi: 10.1093/bioinformatics/btu030.
  2. Y. Heo, A. Ramachandran, W-M. Hwu, J. Ma, and D. Chen, "BLESS 2: Accurate, Memory-Efficient, and Fast Error Correction Method," Bioinformatics, 2016, doi: 10.1093/bioinformatics/btw146.