Ashwin Nanjappa

I am a post-doctoral research fellow at the A*STAR Bioinformatics Institute in Singapore. I work with Prof. Cheng Li on GPU-accelerated machine learning algorithms for solving computer vision problems using commodity depth cameras such as the Xbox Kinect.

I obtained my Ph.D. at the National University of Singapore (NUS) under the guidance of Prof. Tan Tiow Seng. For my PhD thesis, I created the very first GPU-accelerated computational geometry algorithms for solving 3D Delaunay triangulation and regular triangulation problems. My CUDA implementations are highly optimized, achieve speedups of up to 10x over CGAL, and deliver stable and robust results.


My research focuses on designing GPU-accelerated algorithms for solving problems in machine vision and 3D computational geometry. I particularly like solutions that have elegance in both algorithm and source code. My source code implementations are highly optimized to utilize the complex massively parallel GPU architecture to the fullest.

Mouse pose estimation

Demo of 3D mouse pose estimation shows input depth image and estimated pose in top and side views

I am developing a realtime algorithm for full-body pose estimation of mice using depth images. It estimates the pose of a simplified 24-joint mouse model in realtime, including the spine, limbs and paws. It works with different depth cameras and types of rodents, thus enabling neuroscientists to perform behavioral phenotyping.


  • Mouse pose estimation from depth images
    (joint work with Li Cheng, Wei Gao, Chi Xu, Adam Claridge-Chang and Zoe Bichler)

Hand pose estimation

Demo of GHand system
GHand booth at Singapore Science Festival 2014. Our system was tried by hundreds of visitors.

GHand is a GPU-accelerated algorithm developed for realtime hand pose estimation from depth cameras like Kinect, Primesense or SoftKinetic. It can estimate the full 3D hand pose with an average joint error of 20 mm. It runs fully on the GPU with a realtime performance of 64 FPS.

GHand has been demoed successfully to researchers at conferences and to the public at science festivals. It has been found to work well in different camera setups for hands of all shapes, colors and sizes with no prior calibration.


  • GHand: A GPU algorithm for realtime hand pose estimation using depth camera
    (joint work with Chi Xu and Li Cheng)
    Eurographics, 2015

  • Estimate Hand Poses Efficiently from Single Depth Images
    (joint work with Chi Xu, Xiaowei Zhang and Li Cheng)
    International Journal of Computer Vision (IJCV), 2015

  • Real-time hand pose estimation from depth camera using GPU
    (joint work with Chi Xu and Li Cheng)
    GPU Technology Conference 2014 (South East Asia)


  • Hand Pose Estimation Demo Booth
    Best Booth Award, A*STAR Scientific Conference (ASC) 2014

  • Efficient hand pose estimation from single depth images
    X-periment!, Singapore Science Festival, 2014

Delaunay triangulation


For my PhD thesis, I developed GPU-accelerated algorithms for 3D Delaunay triangulation and 3D regular triangulation. The overarching idea is to maximize utilization of the GPU's massively parallel resources through three techniques: dualizing the discrete Voronoi diagram, fixing the 4D convex hull using star splaying, and inserting points and fixing the triangulation in parallel. The CUDA implementations of these algorithms are highly optimized and run 5-10x faster than the venerable CGAL.
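To illustrate what dualizing a discrete Voronoi diagram means, here is a minimal CPU sketch in Python. It is not the CUDA code from the thesis: the brute-force nearest-site labelling, the function names `digital_voronoi` and `dual_edges`, and the tiny 8x8x8 grid are all assumptions made for illustration. Each voxel is labelled with its nearest site, and two sites are joined by a candidate Delaunay edge whenever their regions touch across a voxel face.

```python
from itertools import product

def digital_voronoi(sites, n):
    """Label every voxel of an n x n x n grid with the index of its nearest site.

    Brute force for clarity (an assumption of this sketch); ties go to the lower index.
    """
    def nearest(v):
        return min(range(len(sites)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(v, sites[i])))
    return {v: nearest(v) for v in product(range(n), repeat=3)}

def dual_edges(labels):
    """Collect pairs of sites whose digital Voronoi regions share a voxel face."""
    edges = set()
    for (x, y, z), a in labels.items():
        # Checking only the +x, +y, +z neighbours visits each face once.
        for nb in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
            if nb in labels and labels[nb] != a:
                b = labels[nb]
                edges.add((min(a, b), max(a, b)))
    return edges

# Four hypothetical sites forming a tetrahedron inside an 8 x 8 x 8 grid.
sites = [(1, 1, 1), (6, 1, 1), (1, 6, 1), (1, 1, 6)]
labels = digital_voronoi(sites, 8)
edges = dual_edges(labels)
print(sorted(edges))
```

Because the grid is discrete, the edges found this way only approximate the true Delaunay neighbours, which is why a later fixing step is needed.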


  • GeomGPU: Algorithms of computational geometry on the GPU
    (joint work with Thanh-Tung Cao, Mingcen Gao, Meng Qi and Tiow-Seng Tan)
    Book website
    (Work in progress)

  • Delaunay mesh generation using the GPU
    (joint work with Thanh-Tung Cao, Mingcen Gao, Meng Qi, Tiow-Seng Tan and Zhiyong Huang)
    Merit Award, NVIDIA Poster Contest,
    GPU Technology Conference 2014 (South East Asia)

  • A GPU accelerated algorithm for 3D Delaunay triangulation
    (joint work with Thanh-Tung Cao, Mingcen Gao and Tiow-Seng Tan)
    ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), 2014

  • gHull: A GPU algorithm for 3D Convex Hull
    (joint work with Mingcen Gao, Thanh-Tung Cao and Tiow-Seng Tan)
    ACM Transactions on Mathematical Software (TOMS), 2013

  • Delaunay triangulation in R³ on the GPU
    PhD Thesis, National University of Singapore, 2012
    Thesis • Code [1, 2] • BibTeX


Source code from my research, my PhD and other projects can be found on GitHub here. Some of the popular ones are listed here:


The gStar4D algorithm computes the 3D Delaunay triangulation on the GPU. The CUDA implementation of gStar4D is robust and achieves a speedup of up to 5 times over the 3D Delaunay triangulator of CGAL.

The gStar4D algorithm uses neighbourhood information in the 3D digital Voronoi diagram as an approximation of the 3D Delaunay triangulation. It uses this to create, in a massively parallel fashion, the star of each input point lifted to 4D, and employs a unique star splaying approach to splay these 4D stars in parallel and make them consistent. The result is the 3D Delaunay triangulation of the input constructed fully on the GPU.
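The lifting to 4D mentioned above rests on a standard fact: if each point (x, y, z) is mapped to (x, y, z, x² + y² + z²), the 3D Delaunay triangulation corresponds to the lower convex hull of the lifted points. The sketch below, in plain Python with exact integer arithmetic, shows how the circumsphere test becomes an orientation test on lifted points. The helper names (`lift`, `orient3d`, `in_circumsphere`) are assumptions of this sketch and are not taken from the gStar4D source.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for 3x3 and 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total

def lift(p):
    """Map a 3D point onto the paraboloid in 4D."""
    x, y, z = p
    return (x, y, z, x * x + y * y + z * z)

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def orient3d(a, b, c, d):
    """Signed volume (times 6) of tetrahedron abcd."""
    return det([sub(p, d) for p in (a, b, c)])

def in_circumsphere(a, b, c, d, e):
    """Return +1 if e is inside the circumsphere of abcd, -1 if outside, 0 if on it."""
    le = lift(e)
    lifted = det([sub(lift(p), le) for p in (a, b, c, d)])
    # Multiplying by the orientation makes the result independent of vertex order.
    s = lifted * orient3d(a, b, c, d)
    return (s > 0) - (s < 0)

# Hypothetical tetrahedron; its circumsphere has centre (2, 2, 2) and radius^2 = 12.
a, b, c, d = (0, 0, 0), (4, 0, 0), (0, 4, 0), (0, 0, 4)
print(in_circumsphere(a, b, c, d, (1, 1, 1)))  # inside
print(in_circumsphere(a, b, c, d, (5, 5, 5)))  # outside
print(in_circumsphere(a, b, c, d, (4, 4, 0)))  # on the sphere
```

A tetrahedron is Delaunay exactly when no other input point tests as inside, which is the same as saying no lifted point lies below the hyperplane through its four lifted vertices.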


The gDel3D algorithm constructs the Delaunay triangulation of a set of points in 3D using the GPU. The algorithm utilizes a novel combination of incremental insertion, flipping and star splaying to construct the Delaunay triangulation. The CUDA implementation is robust and runs up to 10 times faster than the Delaunay triangulator of CGAL.


The gReg3D algorithm computes the 3D regular (weighted Delaunay) triangulation on the GPU. Our CUDA implementation of gReg3D is robust and achieves a speedup of up to 4 times over the 3D regular triangulator of CGAL.

The gReg3D algorithm extends the star splaying concepts of the gStar4D and gDel3D algorithms to construct the 3D regular (weighted Delaunay) triangulation on the GPU. This algorithm allows stars to die, finds their death certificates and propagates this information to other stars efficiently. The result is the 3D regular triangulation of the input computed fully on the GPU.
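One way to see why a star can die in the weighted setting is through the weighted lifting map, which sends a point (x, y, z) with weight w to (x, y, z, x² + y² + z² - w). A point whose lifted image lies above the lower hull of the other lifted points is redundant and does not appear in the regular triangulation. The Python sketch below only illustrates that geometric fact; it does not reproduce the death-certificate mechanism of gReg3D, and the names `lift_weighted` and `power_test` are assumptions.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for 3x3 and 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total

def lift_weighted(p):
    """Lift a weighted point (x, y, z, w) to 4D; a larger weight lowers the point."""
    x, y, z, w = p
    return (x, y, z, x * x + y * y + z * z - w)

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def power_test(a, b, c, d, e):
    """Return +1 if lifted e is below the hyperplane through lifted a, b, c, d
    (abcd is not regular with respect to e), -1 if above, 0 if on it."""
    le = lift_weighted(e)
    lifted = det([sub(lift_weighted(p), le) for p in (a, b, c, d)])
    orient = det([sub(p[:3], d[:3]) for p in (a, b, c)])
    s = lifted * orient
    return (s > 0) - (s < 0)

# Hypothetical tetrahedron with zero weights on all four vertices.
a, b, c, d = (0, 0, 0, 0), (4, 0, 0, 0), (0, 4, 0, 0), (0, 0, 4, 0)

# With zero weight the test reduces to the ordinary circumsphere test.
print(power_test(a, b, c, d, (5, 5, 5, 0)))    # outside: abcd stays regular
# A large enough weight pulls the same point below the hyperplane.
print(power_test(a, b, c, d, (5, 5, 5, 20)))   # abcd is no longer regular
# A point inside abcd with a very negative weight is lifted above the facet,
# so among these five points it is redundant.
print(power_test(a, b, c, d, (1, 1, 1, -10)))
```

With all weights set to zero this test is exactly the Delaunay circumsphere test, so the regular triangulation generalizes the Delaunay case rather than replacing it.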

GPU Coursera

I created this library of code to work offline on the assignments of Heterogeneous Parallel Programming, a GPU/CUDA course offered by Coursera. Many folks have chipped in and turned it into an easy-to-use library for the course.