Ideally they know exactly how it will perform: every part of the chip, including the caches, memory controller, and DRAM, is implemented in a cycle-accurate simulator. There are often multiple versions of that simulator: one written in C/C++ that matches the overall structure of the eventual hardware, and then simulations of the actual RTL (the hardware source code, describing networks of gates).
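To make that concrete, here's a minimal sketch of the C-model idea: every block exposes a tick() that advances it by exactly one clock, and the top level steps all blocks in lockstep. The Dram component and its 40-cycle latency are made-up illustrations, not any real design.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Every hardware block advances by exactly one clock edge per tick().
    struct Component {
        virtual void tick() = 0;
        virtual ~Component() = default;
    };

    // Hypothetical DRAM model: a request occupies the device for a fixed
    // number of cycles (the 40 is made up).
    struct Dram : Component {
        static constexpr uint64_t kLatency = 40;
        uint64_t now = 0, busy_until = 0;
        bool request() {
            if (now < busy_until) return false;  // busy; caller must retry
            busy_until = now + kLatency;
            return true;
        }
        void tick() override { ++now; }
    };

    int main() {
        Dram dram;
        std::vector<Component*> chip = {&dram};
        for (uint64_t cycle = 0; cycle < 1000; ++cycle) {
            if (cycle % 100 == 0) dram.request();  // synthetic traffic
            for (Component* c : chip) c->tick();   // lockstep clock edge
        }
        std::puts("simulated 1000 cycles");
    }

A real C-model has thousands of such blocks wired together, but the lockstep tick() loop is the part that makes it cycle-accurate.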
The C-model and RTL outputs are often also compared with each other as a correctness-validation step, as they should ideally never diverge (i.e., implement twice, by two teams, and cross-check the results).
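A hedged sketch of that cross-check: both models emit a per-cycle trace of architecturally visible state, and a small tool reports the first line where they disagree. The file names and trace format here are assumptions for illustration.

    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <string>

    // Diff the C-model and RTL traces line by line and stop at the first
    // divergence. "cmodel.trace" and "rtl.trace" are made-up names.
    int main() {
        std::ifstream cmodel("cmodel.trace"), rtl("rtl.trace");
        std::string a, b;
        for (uint64_t line = 1;
             std::getline(cmodel, a) && std::getline(rtl, b); ++line) {
            if (a != b) {
                std::cout << "first divergence at trace line " << line << "\n"
                          << "  cmodel: " << a << "\n"
                          << "  rtl:    " << b << "\n";
                return 1;
            }
        }
        std::cout << "traces match\n";
    }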
Those simulations are terrifically slow for larger chips, so only a surprisingly small number of workloads can be run through them in reasonable time. So there tend to be even more simulator implementations that sacrifice perfect performance emulation for 'good enough' performance correlation (which is where surprises can happen). Being able to come up with a non-exact simulator that perf-correlates with real hardware is an art in itself.
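One way to put a number on 'perf-correlates' (my assumption of how you'd score it, with made-up data): run the same workloads on silicon and on the fast model, then compute the Pearson correlation of the cycle counts.

    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Pearson correlation between two equal-length series.
    double pearson(const std::vector<double>& x, const std::vector<double>& y) {
        double n = x.size(), sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (size_t i = 0; i < x.size(); ++i) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; syy += y[i] * y[i]; sxy += x[i] * y[i];
        }
        return (n * sxy - sx * sy) /
               std::sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
    }

    int main() {
        // Made-up per-workload cycle counts: hardware vs. fast model.
        std::vector<double> hw    = {1.00e9, 2.31e9, 0.87e9, 4.40e9};
        std::vector<double> model = {1.05e9, 2.20e9, 0.90e9, 4.10e9};
        std::printf("correlation = %.3f\n", pearson(hw, model));
    }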
Are the C simulators hand-crafted each time by the chip designer? It seems like the kind of thing that needs to be custom-built, but I’m wondering if there is a common toolset or platform.
The performance team usually thinks in terms of cycles. At runtime the frequency varies depending on various factors, as you said, but this is mostly ignored.
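A toy illustration of why cycles are the stable unit (numbers made up): the cycle count is a property of the microarchitecture, and wall-clock time is just cycles divided by whatever frequency the part happens to run at.

    #include <cstdio>
    #include <initializer_list>

    int main() {
        const double cycles = 1.0e9;          // made-up workload cost
        for (double ghz : {2.0, 2.5, 3.0}) {  // frequency varies at runtime
            std::printf("1e9 cycles @ %.1f GHz = %.3f s\n",
                        ghz, cycles / (ghz * 1e9));
        }
    }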