New AI Tools

Chitu


Introduction:

Chitu is a high-performance large language model inference framework that emphasizes efficiency, flexibility, and usability, supporting models such as DeepSeek and LLaMA.
Professor Jidong Zhai's team at Tsinghua University's High Performance Computing Institute and Qingcheng Jizhi have jointly open-sourced a high-performance large-model inference engine named "Chitu". Its core breakthrough is the ability to run native FP8-precision models on GPUs outside NVIDIA's Hopper architecture as well as on a range of domestic AI chips. The goal is to free large-model deployment on domestic AI chips from hardware constraints, reduce deployment costs, and promote the growth of the domestic AI ecosystem.

Breaking the Hardware Binding Dilemma:

  • Today's leading FP8 models mainly target NVIDIA's high-end Hopper (H-series) GPUs. Domestic enterprises deploying large models therefore face a double bind: chip import restrictions on one side, and domestic chips that lack FP8 support on the other, which drives deployment costs up.

  • Through low-level technical innovation, "Chitu" achieves efficient deployment of native FP8 models on non-H-series devices, including pre-Hopper NVIDIA GPUs and various domestic chips, freeing deployments from dependence on specific hardware.

  • When serving the full-capacity DeepSeek-R1-671B model on an A800 cluster, "Chitu" delivers a 3.15x increase in inference speed over some foreign open-source frameworks while reducing GPU usage by 50%.

  • With the Chitu engine, running the FP8 model on 3 nodes reaches roughly 75% to 90% of the output speed of running the BF16 model on 6 nodes; since this uses half the hardware, it works out to a 1.5x to 1.8x improvement in output per unit of computing power (0.75 × 2 = 1.5, 0.90 × 2 = 1.8).
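To illustrate the core idea of running FP8 models on hardware without native FP8 support, here is a minimal Python sketch (not Chitu's actual kernels, which are GPU-side and far more sophisticated): weights are rounded to the FP8 e4m3 grid (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, max normal value 448), but stored and computed in an ordinary higher-precision float type that the hardware does support.

```python
import numpy as np

def quantize_to_e4m3(x: np.ndarray) -> np.ndarray:
    """Round float values to the nearest FP8 e4m3-representable value.

    The result is still a float32/float64 array; this simulates how a
    runtime can honor FP8 weight precision on hardware whose compute
    units only understand wider float formats.
    """
    x = np.asarray(x, dtype=np.float64)
    x = np.clip(x, -448.0, 448.0)          # e4m3 max normal magnitude
    sign = np.sign(x)
    mag = np.abs(x)
    # Exponent of each value, floored at -6 (the minimum normal
    # exponent); values below that land on the subnormal grid.
    exp = np.floor(np.log2(np.maximum(mag, 2.0 ** -6)))
    exp = np.clip(exp, -6, 8)
    # 3 mantissa bits -> spacing of 2^(exp-3) between representable values.
    step = 2.0 ** (exp - 3)
    return sign * np.round(mag / step) * step

# Example: quantize a small weight tensor and measure the rounding error.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(4, 4))
w_fp8 = quantize_to_e4m3(w)
rel_err = np.abs(w - w_fp8).max() / np.abs(w).max()
```

A real engine would pack the rounded values into 8-bit storage (halving weight memory versus BF16, which is where the per-node savings come from) and dequantize tiles on the fly inside its matmul kernels; the sketch above only captures the numerical rounding behavior.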