It's interesting to note that the algorithm can be rewritten with 2 nested loops instead of 3; for each z and x, check that there exists an integer y. At 10,000 triples found, this is the difference between (on my system) 700ms and 42000ms, a factor of 60.