Hi! I am Ashwin Nanjappa. Welcome to my corner of the web.
I accelerate DL inference at NVIDIA with TensorRT. Prior to that I got a PhD in GPU algorithms, did a postdoc in Computer Vision and worked at an AI startup. More info can be found at my old personal website.
I write regularly, maintaining both a ✍ tech blog and a ✍ personal blog.
I 🐘 toot regularly about interesting stuff in science and technology.
Rest of my stuff:
📚 Books
Caffe2 Quick Start Guide (🛒Buy)
Packt Publishing (May 31, 2019)
Instant GLEW (🛒Buy)
Packt Publishing (July 25, 2013)
📃 Papers
Mouse pose estimation from depth images
Ashwin Nanjappa, Li Cheng, Wei Gao, Chi Xu, Adam Claridge-Chang, Zoe Bichler
Paper, arXiv
GHand: A GPU algorithm for realtime hand pose estimation using depth camera
Ashwin Nanjappa, Chi Xu, Li Cheng
Eurographics, 2015
Paper, Video, DOI
Estimate Hand Poses Efficiently from Single Depth Images
Chi Xu, Ashwin Nanjappa, Xiaowei Zhang, Li Cheng
International Journal of Computer Vision (IJCV), 2015
Paper, DOI
Real-time hand pose estimation from depth camera using GPU
Ashwin Nanjappa, Chi Xu, Li Cheng
GPU Technology Conference 2014 (South East Asia)
Poster, BibTeX
Efficient hand pose estimation from single depth images
X-periment!, Singapore Science Festival, 2014
Poster
Delaunay mesh generation using the GPU
Ashwin Nanjappa, Thanh-Tung Cao, Mingcen Gao, Meng Qi, Tiow-Seng Tan, Zhiyong Huang
Merit Award, NVIDIA Poster Contest, GPU Technology Conference 2014 (South East Asia)
Poster, BibTeX
A GPU accelerated algorithm for 3D Delaunay triangulation
Ashwin Nanjappa, Thanh-Tung Cao, Mingcen Gao, Tiow-Seng Tan
ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), 2014
Paper, Video, Code, BibTeX, DOI
gHull: A GPU algorithm for 3D Convex Hull
Mingcen Gao, Thanh-Tung Cao, Ashwin Nanjappa, Tiow-Seng Tan
ACM Transactions on Mathematical Software (TOMS), 2013
Paper, Video, BibTeX, DOI
Delaunay triangulation in R³ on the GPU
PhD Thesis, National University of Singapore, 2012
Thesis, Code [1, 2], BibTeX
🎙️ Talks / Articles
Setting New Records in MLPerf Inference v3.0 with Full-Stack Optimizations for AI (2023-04-05)
NVIDIA Developer Blog
Full-Stack Innovation Fuels Highest MLPerf Inference 2.1 Results for NVIDIA (2022-08-08)
NVIDIA Developer Blog
Getting the Best Performance on MLPerf Inference 2.0 (2022-04-06)
NVIDIA Developer Blog
GTC Connect with the Experts (2022-03-23)
Optimize Deep Learning Inference Workloads using NVIDIA TensorRT and Deploying AI Models in Production with NVIDIA Triton Inference Server
GTC Connect with the Experts session (2020-03-23)
NVIDIA TensorRT Applications: Conversational AI, Recommenders, and Object Detection
Visual Search as a Cloud Service by Large-Scale Commodity GPU Adoption (2017-03-13)
SuperComputing Frontiers 2017, Singapore
Developer stories - Ashwin Nanjappa from Singapore (2017-02-08)
Interview by Workshape.io
Hand Pose Estimation Demo Booth
Best Booth Award, A*STAR Scientific Conference (ASC) 2014
💾 Code
gStar4D
The gStar4D algorithm computes the 3D Delaunay triangulation on the GPU. The CUDA implementation of gStar4D is robust and achieves a speedup of up to 5 times over the 3D Delaunay triangulator of CGAL.
gDel3D
The gDel3D algorithm constructs the Delaunay triangulation of a set of points in 3D using the GPU. The algorithm uses a novel combination of incremental insertion, flipping and star splaying to construct the triangulation. The CUDA implementation is robust and runs 10 times faster than the Delaunay triangulator of CGAL.
gReg3D
The gReg3D algorithm computes the 3D regular (weighted Delaunay) triangulation on the GPU. Our CUDA implementation of gReg3D is robust and achieves a speedup of up to 4 times over the 3D regular triangulator of CGAL.
GPU Coursera
I created this library of code to work offline on the assignments of Heterogeneous Parallel Programming, a GPU/CUDA course offered by Coursera. Many folks chipped in and turned it into an easy-to-use library for the course.