Yeah. On the other hand, "implement Boruvka's MST algorithm in CUDA such that only the while(numcomponents > 1) loop runs on the CPU, and everything else runs on the GPU. Memcpy everything onto the GPU first and only transfer back the count each iteration, or keep it in pinned memory"
It never gets it right, even after many reattempts in Cursor. And even when the output is correct, the parallelization isn't effective enough; it's a hard problem to parallelize.
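For concreteness, here is a minimal sketch of the structure that prompt asks for: the host only drives the loop and reads back one counter per round. Every name in it (`boruvka_mst_weight`, `find_min`, `hook`, `flatten`) is hypothetical, and it assumes an edge list with 32-bit unsigned weights, fewer than 2^32 edges, and a GPU with 64-bit `atomicMin` (compute capability 5.0 or newer). It is deliberately naive (one thread per edge or vertex, global atomics, no edge compaction), so it shows the shape of the problem rather than an effective parallelization.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

typedef unsigned long long u64;
constexpr u64 NONE = ~0ull;  // marker: component has no outgoing edge

__device__ int tid() { return blockIdx.x * blockDim.x + threadIdx.x; }

// Every vertex starts as the root of its own component.
__global__ void init_comp(int* comp, int n) {
    int v = tid();
    if (v < n) comp[v] = v;
}

__global__ void reset_best(u64* best, int n) {
    int v = tid();
    if (v < n) best[v] = NONE;
}

// One thread per edge: offer the edge to both endpoint components.
// The key packs (weight, edge index), so ties break on a total order.
__global__ void find_min(const int* src, const int* dst, const unsigned* w,
                         const int* comp, u64* best, int m) {
    int e = tid();
    if (e >= m) return;
    int cu = comp[src[e]], cv = comp[dst[e]];
    if (cu == cv) return;  // edge is internal to one component
    u64 key = ((u64)w[e] << 32) | (unsigned)e;
    atomicMin(&best[cu], key);
    atomicMin(&best[cv], key);
}

// One thread per vertex, only roots act: hook the root onto the component
// at the other end of its cheapest edge.
__global__ void hook(const int* src, const int* dst, const unsigned* w,
                     const int* comp, const u64* best, int* next,
                     int* count, u64* total, int n) {
    int c = tid();
    if (c >= n || comp[c] != c) return;
    next[c] = c;  // default: remain a root
    u64 key = best[c];
    if (key == NONE) return;
    int e = (int)(key & 0xffffffffu);
    int a = comp[src[e]], b = comp[dst[e]];
    int other = (a == c) ? b : a;
    if (best[other] == key && c < other) return;  // mutual pick: lower id stays root
    next[c] = other;
    atomicAdd(total, (u64)w[e]);  // this edge joins the MST exactly once
    atomicSub(count, 1);          // one fewer component
}

// One thread per vertex: follow the hooks to the new root and relabel.
__global__ void flatten(int* comp, const int* next, int n) {
    int v = tid();
    if (v >= n) return;
    int r = comp[v];
    while (next[r] != r) r = next[r];
    comp[v] = r;
}

// Host driver: returns the total weight of the minimum spanning forest.
u64 boruvka_mst_weight(int n, const std::vector<int>& src,
                       const std::vector<int>& dst,
                       const std::vector<unsigned>& w) {
    assert(src.size() == dst.size() && dst.size() == w.size());
    int m = (int)src.size();
    int *dSrc, *dDst, *dComp, *dNext, *dCount, *hCount;
    unsigned* dW;
    u64 *dBest, *dTotal, total = 0;
    cudaMalloc(&dSrc, m * sizeof(int));
    cudaMalloc(&dDst, m * sizeof(int));
    cudaMalloc(&dW, m * sizeof(unsigned));
    cudaMalloc(&dComp, n * sizeof(int));
    cudaMalloc(&dNext, n * sizeof(int));
    cudaMalloc(&dBest, n * sizeof(u64));
    cudaMalloc(&dCount, sizeof(int));
    cudaMalloc(&dTotal, sizeof(u64));
    cudaMallocHost(&hCount, sizeof(int));  // pinned host copy of the counter
    *hCount = n;
    // Upload everything once, before the loop starts.
    cudaMemcpy(dSrc, src.data(), m * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dDst, dst.data(), m * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dW, w.data(), m * sizeof(unsigned), cudaMemcpyHostToDevice);
    cudaMemcpy(dCount, hCount, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(u64));
    int T = 256, gv = n / T + 1, ge = m / T + 1;  // always at least one block
    init_comp<<<gv, T>>>(dComp, n);
    int prev = n + 1;
    // The only CPU-side logic; the progress check stops on disconnected graphs.
    while (*hCount > 1 && *hCount < prev) {
        prev = *hCount;
        reset_best<<<gv, T>>>(dBest, n);
        find_min<<<ge, T>>>(dSrc, dDst, dW, dComp, dBest, m);
        hook<<<gv, T>>>(dSrc, dDst, dW, dComp, dBest, dNext, dCount, dTotal, n);
        flatten<<<gv, T>>>(dComp, dNext, n);
        cudaMemcpy(hCount, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaMemcpy(&total, dTotal, sizeof(u64), cudaMemcpyDeviceToHost);
    cudaFree(dSrc); cudaFree(dDst); cudaFree(dW); cudaFree(dComp);
    cudaFree(dNext); cudaFree(dBest); cudaFree(dCount); cudaFree(dTotal);
    cudaFreeHost(hCount);
    return total;
}
```

Calling `boruvka_mst_weight(4, {0,1,2,0,0}, {1,2,3,3,2}, {1,2,3,4,5})` should give 6 (edges of weight 1, 2 and 3). The hard parts the sketch skips are where the real performance lives: contention on the atomics, the serial pointer chase in `flatten`, and rescanning edges that are already internal to a component every round.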
The quality of CS/Software Engineering programs varies that much.