EDIT: Nvidia's advertised TFLOPS are:
GPU    FP16   FP32   FP64
V100   30     15     8.5
P100   21.2   10.6   5.3
K40    4.29   4.29   1.43
The V100 also has Tensor Cores. That's a core that does a 4x4 FP16 matrix multiplication plus a 4x4 FP32 accumulation in one go.
That's where the V100 gets its boost, up to 120 TFLOPS.
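To make the "4x4 FP16 multiply + FP32 accumulate" operation concrete, here is a rough pure-Python emulation of D = A*B + C. This is only a sketch of the arithmetic, not how the hardware or any Nvidia API works. The function names (to_fp16, to_fp32, mma_4x4, peak_tflops) are made up for this post, and the 640 Tensor Cores and 1.455 GHz clock used in the estimate are my assumptions about the V100, not numbers from the table above.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision (FP16) value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision (FP32) value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def mma_4x4(a, b, c):
    """Emulate d = a @ b + c on 4x4 lists of floats.

    Inputs a and b are rounded to FP16; the running sum is kept in FP32.
    """
    n = 4
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])
            for k in range(n):
                # The product of two FP16 values fits exactly in FP32.
                prod = to_fp16(a[i][k]) * to_fp16(b[k][j])
                acc = to_fp32(acc + prod)  # accumulate in FP32
            d[i][j] = acc
    return d

# One 4x4x4 operation is 64 multiply-adds, i.e. 128 floating-point operations.
FLOPS_PER_OP = 2 * 4 * 4 * 4

def peak_tflops(num_cores, clock_ghz):
    """Peak rate if every core finishes one 4x4 operation per clock (assumption)."""
    return num_cores * FLOPS_PER_OP * clock_ghz * 1e9 / 1e12

if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    b = [[0.5 * (i + j) for j in range(4)] for i in range(4)]
    zeros = [[0.0] * 4 for _ in range(4)]
    print(mma_4x4(identity, b, zeros))
    # Assumed V100 figures: 640 Tensor Cores at 1.455 GHz.
    print(peak_tflops(640, 1.455))
```

Under those assumed figures the estimate comes out just under 120 TFLOPS, which would be consistent with the advertised number.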