I am a post-doctoral research fellow at the A*STAR Bioinformatics Institute in Singapore. I work with Prof. Cheng Li on GPU-accelerated machine learning algorithms for solving computer vision problems using commodity depth cameras such as the Xbox Kinect.
I obtained my PhD at the National University of Singapore (NUS) under the guidance of Prof. Tan Tiow Seng. For my PhD thesis, I created the first GPU-accelerated computational geometry algorithms for solving 3D Delaunay triangulation and regular triangulation problems. My CUDA implementations are highly optimized, are currently the fastest available (up to 10x faster than CGAL) and have proven to be stable and robust.
My research focuses on designing GPU-accelerated algorithms for solving problems in machine vision and 3D computational geometry. I particularly like solutions that are elegant in both algorithm and source code. My implementations are highly optimized to use the complex, massively parallel GPU architecture to the fullest.
Mouse pose estimation
I am developing a realtime algorithm for full-body pose estimation of mice from depth images. It estimates the pose of a simplified 24-joint mouse model in realtime, including the spine, limbs and paws. It works with different depth cameras and different types of rodents, enabling neuroscientists to carry out behavioral phenotyping.
Hand pose estimation
GHand is a GPU-accelerated algorithm for realtime hand pose estimation from depth cameras such as Kinect, PrimeSense or SoftKinetic. It estimates the full 3D hand pose with an average joint error of 20 mm. It runs entirely on the GPU at a realtime rate of 64 FPS.
GHand has been demoed successfully to researchers at conferences and to the public at science festivals. It has been found to work well in different camera setups for hands of all shapes, colors and sizes, with no prior calibration.
For my PhD thesis, I developed GPU-accelerated algorithms for 3D Delaunay triangulation and 3D regular triangulation. The overarching ideas are to maximize utilization of the massively parallel resources of the GPU by dualizing the discrete Voronoi diagram, fixing the 4D convex hull using star splaying, and inserting points and fixing the triangulation in parallel. The CUDA implementations of these algorithms are highly optimized and run 5-10x faster than the venerable CGAL.
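To make the idea of dualizing a discrete Voronoi diagram concrete, here is a minimal sketch in Python. It is an illustration only, not the thesis code. It works in 2D rather than 3D for brevity, uses a brute-force nearest-site search where a GPU method would label all cells in parallel, and the function names `discrete_voronoi` and `dual_edges` are hypothetical.

```python
def discrete_voronoi(sites, size):
    """Label each cell of a size x size grid with the index of its nearest site.

    Brute force for clarity; every cell is independent, which is what makes
    this step a natural fit for massively parallel hardware.
    """
    grid = []
    for y in range(size):
        row = []
        for x in range(size):
            nearest = min(
                range(len(sites)),
                key=lambda i: (sites[i][0] - x) ** 2 + (sites[i][1] - y) ** 2,
            )
            row.append(nearest)
        grid.append(row)
    return grid


def dual_edges(grid):
    """Collect pairs of sites whose grid regions touch (4-connectivity).

    Each such pair is a candidate edge of the triangulation. Because the
    diagram is discrete, the result is only an approximation of the true dual.
    """
    edges = set()
    size = len(grid)
    for y in range(size):
        for x in range(size):
            s = grid[y][x]
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < size and ny < size and grid[ny][nx] != s:
                    edges.add(tuple(sorted((s, grid[ny][nx]))))
    return edges


sites = [(1, 1), (6, 1), (1, 6), (6, 6)]
candidate_edges = dual_edges(discrete_voronoi(sites, 8))
```

The candidate edges obtained this way can miss or distort some neighbour relations because of the grid resolution, which is why a later fixing step is needed to reach the exact triangulation.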
GeomGPU: Algorithms of computational geometry on the GPU
(joint work with Thanh-Tung Cao, Mingcen Gao, Meng Qi and Tiow-Seng Tan)
(Work in progress)
Delaunay mesh generation using the GPU
(joint work with Thanh-Tung Cao, Mingcen Gao, Meng Qi, Tiow-Seng Tan and Zhiyong Huang)
Merit Award, NVIDIA Poster Contest,
GPU Technology Conference 2014 (South East Asia)
Poster • BibTeX
A GPU accelerated algorithm for 3D Delaunay triangulation
(joint work with Thanh-Tung Cao, Mingcen Gao and Tiow-Seng Tan)
ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), 2014
Paper • Video • Code • BibTeX • DOI
Source code from my research, my PhD and other projects can be found on GitHub. Some of the popular projects are listed here:
The gStar4D algorithm computes the 3D Delaunay triangulation on the GPU. The CUDA implementation of gStar4D is robust and achieves a speedup of up to 5 times over the 3D Delaunay triangulator of CGAL.
The gStar4D algorithm uses neighbourhood information in the 3D digital Voronoi diagram as an approximation of the 3D Delaunay triangulation. It uses this information to create, in a massively parallel fashion, the star of each input point lifted to 4D, and employs a unique star splaying approach to splay these 4D stars in parallel and make them consistent. The result is the 3D Delaunay triangulation of the input, constructed fully on the GPU.
The gDel3D algorithm constructs the Delaunay triangulation of a set of points in 3D using the GPU. The algorithm uses a novel combination of incremental insertion, flipping and star splaying to construct the Delaunay triangulation. The CUDA implementation is robust and runs 10 times faster than the Delaunay triangulator of CGAL.
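The lifting to 4D mentioned above rests on a standard fact: if each point (x, y, z) is lifted to (x, y, z, x² + y² + z²), a point lies inside the circumsphere of a tetrahedron exactly when its lifted image lies below the hyperplane through the four lifted vertices. The following Python sketch shows that test. It is an illustration, not the gStar4D code. The function names are hypothetical, and it uses plain arithmetic, whereas a robust implementation would need exact predicates.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for 5x5)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def lift(p):
    """Lift a 3D point onto the paraboloid in 4D."""
    x, y, z = p
    return [x, y, z, x * x + y * y + z * z]


def orient3d(a, b, c, d):
    """Signed orientation of tetrahedron abcd (zero if the points are coplanar)."""
    return det([list(p) + [1] for p in (a, b, c, d)])


def in_sphere(a, b, c, d, e):
    """Positive if e is inside the circumsphere of abcd, negative if outside,
    zero if on it. Multiplying by orient3d makes the sign independent of the
    order in which the tetrahedron's vertices are given."""
    lifted = det([lift(p) + [1] for p in (a, b, c, d, e)])
    return lifted * orient3d(a, b, c, d)


a, b, c, d = (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(in_sphere(a, b, c, d, (0.5, 0.5, 0.5)) > 0)  # True: the circumcentre is inside
print(in_sphere(a, b, c, d, (2, 2, 2)) < 0)        # True: far outside the sphere
```

A tetrahedron belongs to the Delaunay triangulation when no input point gives a positive `in_sphere` value for it, which is the local condition that making the stars consistent ultimately has to satisfy.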
The gReg3D algorithm computes the 3D regular (weighted Delaunay) triangulation on the GPU. Our CUDA implementation of gReg3D is robust and achieves a speedup of up to 4 times over the 3D regular triangulator of CGAL.
The gReg3D algorithm extends the star splaying concepts of the gStar4D and gDel3D algorithms to construct the 3D regular (weighted Delaunay) triangulation on the GPU. This algorithm allows stars to die, finds their death certificates and propagates this information to other stars efficiently. The result is the 3D regular triangulation of the input, computed fully on the GPU.
I created this library of code to work offline on the assignments of Heterogeneous Parallel Programming, a GPU/CUDA course offered on Coursera. Many folks chipped in and have turned it into an easy-to-use library for the course.
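In a regular triangulation a weighted point (x, y, z, w) is lifted to (x, y, z, x² + y² + z² − w), and a point whose weight is too small can end up in no tetrahedron at all. The Python sketch below shows one way to certify this for a single point. It assumes that a star that "dies" corresponds to such a redundant point and that four other points can serve as the certificate; that reading, and the function names, are assumptions for illustration and not the gReg3D code.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for 5x5)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def lift_weighted(p):
    """Lift a weighted point (x, y, z, w) to 4D; a larger weight lowers the point."""
    x, y, z, w = p
    return [x, y, z, x * x + y * y + z * z - w]


def orient3d(a, b, c, d):
    """Signed orientation of abcd, using only the spatial coordinates."""
    return det([list(p[:3]) + [1] for p in (a, b, c, d)])


def in_orthosphere(a, b, c, d, e):
    """Positive if lifted e lies below the hyperplane through lifted a, b, c, d
    (e conflicts with the tetrahedron), negative if it lies above."""
    lifted = det([lift_weighted(p) + [1] for p in (a, b, c, d, e)])
    return lifted * orient3d(a, b, c, d)


def inside_tet(a, b, c, d, e):
    """True if e lies in the non-degenerate tetrahedron abcd (boundary included)."""
    ref = orient3d(a, b, c, d)
    swapped = (orient3d(e, b, c, d), orient3d(a, e, c, d),
               orient3d(a, b, e, d), orient3d(a, b, c, e))
    return all(s * ref >= 0 for s in swapped)


def is_certified_redundant(a, b, c, d, e):
    """abcd certifies that e is redundant: e projects into the tetrahedron and
    its lifted image lies strictly above the hyperplane of the lifted vertices."""
    return inside_tet(a, b, c, d, e) and in_orthosphere(a, b, c, d, e) < 0


a, b, c, d = (0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)
light = (0.25, 0.25, 0.25, -1.0)   # small weight: certified redundant
plain = (0.25, 0.25, 0.25, 0.0)    # zero weight: conflicts with abcd instead
```

With all weights equal to zero the test reduces to the ordinary Delaunay in-sphere test, so no point is ever redundant in that case unless it duplicates another.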