Back in February 2013, I successfully completed the Heterogeneous Parallel Programming course offered by Coursera. The course was taught by Prof. Wen-mei Hwu of UIUC and based on the UIUC course that he and David Kirk of NVIDIA taught when CUDA was first introduced. I had used the videos and lecture notes of that UIUC course to teach myself CUDA a few years ago. In 2010, they turned their lecture notes into the book Programming Massively Parallel Processors, which was also the very first book on CUDA programming.
Since then, CUDA has evolved rapidly with each release of its SDK and hardware. A few other parallel programming frameworks, like OpenCL and C++ AMP, have also sprung up to address the reality of heterogeneous computing: systems with both CPU and GPU processors. In light of all these changes, I decided to take this Coursera course, since it appeared it would cover the modern features of CUDA and other heterogeneous frameworks. Prof. Hwu announced that the course would be based on the upcoming second edition of their book.
As is typical of Coursera, the course had lectures, quizzes and assignments. The lectures were 10-20 minutes long, with a ghost image of the professor standing behind the slides. This looked a bit irritating, but made sense since he could point to parts of the slides; a simpler alternative would have been a mouse pointer on the slides. The professor lectures slowly and in a monotone, so I typically watched at 1.5x or 2x speed. At the beginning of the course, his slides were a bit sloppy, with quite a few small mistakes, but around halfway through they became noticeably more polished and professional. Quizzes were interspersed between the lectures and had deadlines; I missed quite a few and could not retake them.
The meat of the course was the assignments. An online code editing and submission system built for the course could test your code on Amazon's GPU cloud instances. For each assignment, skeleton code was provided and the student had to solve the problem by filling in the missing CUDA kernels and their invocations. The skeleton code called functions from an internal library that was used to check and grade the solution. The problems progressed from very simple at the start to moderately difficult towards the end of the course. The lectures leading up to each assignment provided just enough information to solve the problem. As always with CUDA, debugging the solution took most of the time. Sadly, the course offered no really challenging assignments.
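To give a flavor of what filling in such a skeleton involved, here is a minimal sketch of a vector-addition assignment in that style. This is not the actual course code; the names `vecAdd` and `solve` are hypothetical, and the checking library is omitted. The student would typically write the kernel body and its launch configuration.

```cuda
#include <cuda_runtime.h>

// Kernel the student fills in: element-wise vector addition.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // guard threads past the end of the array
        c[i] = a[i] + b[i];
}

// Host-side glue: allocate device memory, copy data, launch, copy back.
void solve(const float *hostA, const float *hostB, float *hostC, int n) {
    float *dA, *dB, *dC;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hostA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hostB, bytes, cudaMemcpyHostToDevice);

    // Launch configuration, also filled in by the student:
    // enough blocks of 256 threads to cover n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);

    cudaMemcpy(hostC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}
```

The bounds check inside the kernel is the classic first lesson: the grid usually launches more threads than there are elements, and unguarded threads would write out of bounds.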
From the very first assignment, I found that this online assignment system was not great to work with. On top of that, the Amazon system restricted how frequently you could run your code. Frustrated by this, I decided to recreate the setup on my own computer by mimicking their internal library. I shared this code on GitHub as the coursera-heterogeneous project, and pretty soon many students joined in to improve it. They helped me make it work on all platforms: Windows, Linux and Mac. They also kept the project updated so that every new assignment could be tested with it. Thanks to their help, I was able to work on all the assignments offline and only do the final submission online.
Most of the course concentrated on CUDA; towards the end there were a few lectures covering OpenCL, MPI, OpenACC and C++ AMP. The lectures gave a pretty good understanding of the CUDA hardware, programming techniques and performance tricks. The assignments did not go deep enough, though, so you would need a bit more experience to tackle any real-world problem. The course, which started in November 2012, had a big break over New Year. I was running behind schedule, and it was this break that allowed me to catch up! I finally finished everything and obtained a certificate of completion with distinction.
I believe that taking this course was time well spent. That said, a lot of trouble could have been avoided if the lectures, slides and assignment submission system had been better prepared. Given that this was the first time such a course was run online, I can forgive these issues. I am also happy that the course gave me a chance to run a real GitHub project, with bugs, issues and feature requests; that experience was quite enlightening. A few months after this course finished, Udacity offered a similar CUDA course called Intro to Parallel Programming. I have not taken it, but it does look interesting.