A loop that calls nanosleep() to wait for the next interval, with the process running under a real-time scheduling policy (SCHED_FIFO or SCHED_RR), should be stable enough.
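Here is a rough sketch of what I mean, assuming Linux with glibc. Note that I swapped plain nanosleep() for clock_nanosleep() with TIMER_ABSTIME on CLOCK_MONOTONIC, so the wake-up times are absolute and the time spent in the loop body does not accumulate as drift. The names try_realtime, run_periodic and count_tick are made up for this example, and count_tick is only a placeholder for whatever actually reads the port.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance *t by ns nanoseconds (ns >= 0), keeping tv_nsec below one second. */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_sec += ns / NSEC_PER_SEC;
    t->tv_nsec += ns % NSEC_PER_SEC;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec += 1;
    }
}

/* Ask for SCHED_FIFO on the calling process. This normally needs root or
 * CAP_SYS_NICE. Returns 0 on success, -1 on failure (errno is set). */
static int try_realtime(int priority)
{
    struct sched_param sp;
    memset(&sp, 0, sizeof sp);
    sp.sched_priority = priority;
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/* Placeholder work: a real program would sample the port here. */
static int ticks_seen;
static void count_tick(void)
{
    ticks_seen++;
}

/* Call tick() once per period_ns, count times. Each deadline is computed
 * from the previous deadline, not from "now", so jitter does not add up.
 * Returns 0 on success, -1 on a clock or sleep error. */
static int run_periodic(long period_ns, int count, void (*tick)(void))
{
    struct timespec next;
    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < count; i++) {
        int rc;
        timespec_add_ns(&next, period_ns);
        do {
            /* clock_nanosleep returns the error number directly. */
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR); /* interrupted by a signal: sleep again */
        if (rc != 0)
            return -1;
        tick();
    }
    return 0;
}
```

You would call try_realtime(10) once at startup (carrying on with the default policy if it fails), then something like run_periodic(1000000, 1000, count_tick) for 1000 ticks at 1 ms. The priority and the period are arbitrary values for illustration.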
I wonder, could DMA transfers be used with one of the newer parallel port modes? Or is there no predefined signaling speed, which would make the sample rate unpredictable?