OA’s GPT-f work on using GPT for MetaMath formal theorem proving notes that they use the standard GPT-2 BPE but that "preliminary experimental results demonstrate possible gains with specialized tokenization techniques."
My blog -
dptotti.fic.edu.Uy