Hi! I am Ashwin Nanjappa. Welcome to my corner of the web.
I accelerate DL inference at NVIDIA with TensorRT. Prior to that, I got a PhD in GPU algorithms, did a postdoc in Computer Vision, and worked at an AI startup. More info can be found at my old personal website.
Rest of my stuff:
Efficient hand pose estimation from single depth images
X-periment!, Singapore Science Festival, 2014
Delaunay mesh generation using the GPU
Ashwin Nanjappa, Thanh-Tung Cao, Mingcen Gao, Meng Qi, Tiow-Seng Tan, Zhiyong Huang
Merit Award, NVIDIA Poster Contest, GPU Technology Conference 2014 South East Asia
A GPU accelerated algorithm for 3D Delaunay triangulation
Ashwin Nanjappa, Thanh-Tung Cao, Mingcen Gao, Tiow-Seng Tan
ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), 2014
Paper, Video, Code, BibTeX, DOI
🎙️ Talks / Articles
Full-Stack Innovation Fuels Highest MLPerf Inference 2.1 Results for NVIDIA (2022-08-08)
NVIDIA Developer Blog
Getting the Best Performance on MLPerf Inference 2.0 (2022-04-06)
NVIDIA Developer Blog
GTC Connect with the Experts (2022-03-23)
Optimize Deep Learning Inference Workloads using NVIDIA TensorRT and Deploying AI Models in Production with NVIDIA Triton Inference Server
GTC Connect with the Experts session (2020-03-23)
NVIDIA TensorRT Applications: Conversational AI, Recommenders, and Object Detection
Visual Search as a Cloud Service by Large-Scale Commodity GPU Adoption (2017-03-13)
SuperComputing Frontiers 2017, Singapore
Developer stories - Ashwin Nanjappa from Singapore (2017-02-08)
Interview by Workshape.io
Hand Pose Estimation Demo Booth
Best Booth Award, A*STAR Scientific Conference (ASC) 2014
The gStar4D algorithm computes the 3D Delaunay triangulation on the GPU. The CUDA implementation of gStar4D is robust and achieves a speedup of up to 5 times over the 3D Delaunay triangulator of CGAL.
The gDel3D algorithm constructs the Delaunay triangulation of a set of points in 3D using the GPU. The algorithm uses a novel combination of incremental insertion, flipping and star splaying to construct the triangulation. The CUDA implementation is robust and runs 10 times faster than the Delaunay triangulator of CGAL.
The gReg3D algorithm computes the 3D regular (weighted Delaunay) triangulation on the GPU. Our CUDA implementation of gReg3D is robust and achieves a speedup of up to 4 times over the 3D regular triangulator of CGAL.
I created this library of code to work offline on the assignments of Heterogeneous Parallel Programming, a GPU/CUDA course offered by Coursera. Many folks chipped in and have turned it into an easy-to-use library for the course.