Our primary criterion for deciding whether to work on a benchmark is whether it will improve the general public’s understanding of AI’s trajectory. We chose to work on FrontierMath because we believed that a demanding math benchmark would clarify the degree to which AI is capable of solving novel and difficult reasoning problems.