I did this once: a compacting collector for C. It was a really trivial mark/sweep, and as I recall you had to write a function for each type to perform the object graph traversal. For artificial memory-intensive workloads it was around 8x the performance of Boehm.
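To give a rough idea of what "a function for each type" means, here is a minimal sketch. It is not the original code: every name in it (`Obj`, `TraceFn`, `gc_alloc`, `gc_mark`, `gc_sweep`, `Pair`, `gc_demo`) is made up for illustration, it assumes each managed struct embeds a common header as its first member, and it leaves out compaction entirely, so it is only the mark/sweep half.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names here are hypothetical; this is not the original collector. */
typedef struct Obj Obj;
typedef void (*TraceFn)(Obj *self);  /* per-type traversal callback */

struct Obj {
    TraceFn trace;  /* visits the outgoing pointers of this type */
    int marked;     /* set during the mark phase */
    Obj *next;      /* intrusive list of every allocation */
};

static Obj *all_objects = NULL;

/* Allocate a zeroed object and register it with the collector. */
static void *gc_alloc(size_t size, TraceFn trace) {
    assert(size >= sizeof(Obj));
    Obj *o = calloc(1, size);
    if (o == NULL) abort();
    o->trace = trace;
    o->next = all_objects;
    all_objects = o;
    return o;
}

/* Mark phase: recursive for brevity, so very deep graphs could
   overflow the C stack; a real collector would use a worklist. */
static void gc_mark(Obj *o) {
    if (o == NULL || o->marked) return;
    o->marked = 1;
    if (o->trace) o->trace(o);
}

/* Sweep phase: free unmarked objects, clear marks on survivors,
   and return the number of survivors. */
static size_t gc_sweep(void) {
    size_t live = 0;
    Obj **link = &all_objects;
    while (*link != NULL) {
        Obj *o = *link;
        if (o->marked) {
            o->marked = 0;
            live++;
            link = &o->next;
        } else {
            *link = o->next;
            free(o);
        }
    }
    return live;
}

/* Example user type. The header must be the first member so that a
   Pair* can be converted to an Obj* and back. */
typedef struct Pair {
    Obj header;
    struct Pair *left;
    struct Pair *right;
} Pair;

/* The hand-written traversal function for Pair. */
static void pair_trace(Obj *self) {
    Pair *p = (Pair *)self;
    gc_mark((Obj *)p->left);
    gc_mark((Obj *)p->right);
}

/* Builds a two-object cycle reachable from a root plus one
   unreachable object, collects, and returns the survivor count. */
size_t gc_demo(void) {
    Pair *root = gc_alloc(sizeof *root, pair_trace);
    Pair *child = gc_alloc(sizeof *child, pair_trace);
    Pair *garbage = gc_alloc(sizeof *garbage, pair_trace);
    root->left = child;
    child->right = root;  /* cycles are fine thanks to the mark bit */
    (void)garbage;        /* never linked in, so it gets swept */
    gc_mark((Obj *)root); /* the root set is passed in by hand here */
    return gc_sweep();
}
```

Calling `gc_demo()` leaves the root and its child alive and frees the unreachable object. The annoying part in practice is the same as in this sketch: you write one `pair_trace`-style function per type and keep it in sync with the struct by hand.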