A retargetable MLIR-based machine learning compiler and runtime toolkit.
CUDA® is a parallel computing platform and programming model created by NVIDIA (first released June 23, 2007) for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
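As a minimal illustration of the CUDA programming model, the sketch below compiles and launches a small elementwise kernel from Python. CuPy is an assumption here (it is not one of the listed projects); the kernel body itself is plain CUDA C.

```python
# Hedged sketch: launching a CUDA kernel from Python via CuPy.
import cupy as cp

square = cp.RawKernel(r'''
extern "C" __global__
void square(const float* x, float* y, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;  // one thread per element
    if (i < n) y[i] = x[i] * x[i];
}
''', 'square')

n = 1 << 20
x = cp.arange(n, dtype=cp.float32)
y = cp.empty_like(x)
threads = 256
blocks = (n + threads - 1) // threads          # enough blocks to cover n
square((blocks,), (threads,), (x, y, n))       # grid dims, block dims, args
```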
Implementations of various simulations for integrate-and-fire models, as well as conductance-based models with synaptic neurotransmission
Open Voice OS Status Page
A high-throughput and memory-efficient inference and serving engine for LLMs
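For a flavor of what serving with vLLM looks like, here is a minimal offline-inference sketch; the model ID is a placeholder assumption, and any Hugging Face-compatible model would do.

```python
# Hedged sketch of offline batch inference with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model ID
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What makes GPU inference fast?"], params)
for out in outputs:
    print(out.outputs[0].text)
```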
CEED Library: Code for Efficient Extensible Discretizations
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
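To show what "written in Triton" means in practice, below is a self-contained Triton vector-add kernel. It is purely illustrative and not taken from the repository itself.

```python
# Illustrative Triton kernel: elementwise addition of two vectors.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements           # guard the tail of the vector
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(4096, 1024),)         # one program per 1024-element block
add_kernel[grid](x, y, out, 4096, BLOCK_SIZE=1024)
```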
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance with lower memory utilization in both training and inference.
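As a brief sketch of the FP8 usage pattern with Transformer Engine's PyTorch API (shapes and recipe settings are arbitrary assumptions; a GPU with FP8 support, e.g. Hopper, is required):

```python
# Hedged sketch of an FP8 forward/backward pass with Transformer Engine.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(768, 768, bias=True).cuda()
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

x = torch.randn(16, 768, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)   # matmul runs in FP8 under the autocast context
y.sum().backward()
```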
High-performance CUDA/Python library for computing quantum chemistry density-based descriptors for larger systems using GPUs.
NVIDIA GPU Operator creates, configures, and manages GPUs atop Kubernetes
PygmalionAI's large-scale inference engine
A fast, scalable, high-performance gradient boosting on decision trees library, used for ranking, classification, regression, and other machine learning tasks in Python, R, Java, and C++. Supports computation on CPU and GPU.
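Switching CatBoost training to the GPU is a matter of passing `task_type="GPU"`. A minimal sketch with synthetic data:

```python
# Hedged sketch of GPU-accelerated CatBoost training on synthetic data.
import numpy as np
from catboost import CatBoostClassifier

X = np.random.rand(1000, 10)
y = (X[:, 0] > 0.5).astype(int)

model = CatBoostClassifier(
    iterations=200,
    task_type="GPU",   # run boosting on the GPU
    devices="0",       # first CUDA device
    verbose=False,
)
model.fit(X, y)
print(model.predict(X[:5]))
```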
A high-performance inference system for large language models, designed for production environments.
CUDA C++ Core Libraries
(in progress) Implementation of a parallel construction algorithm for SAH kd-trees
Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.
HPC solver for nonlinear optimization problems