First, beware: this can only be done meaningfully within a fixed "language of proof". If you try to compare across different systems, you can get wildly different results. There is a whole domain of Proof Complexity which has lots of results about "expressivity". Adding features to your language (like `let x = <big-expression> in <other-expression>`

) can make a huge difference. Other subtler changes can still result in exponential differences in proof length.

Similarly, which compressor you use matters. Kolmogorov Complexity relies crucially on Turing-completeness to obtain a 'stable' measure of length. Normal compressors can be used to approximate it, and give surprising good results in practice. Paul Vitanyi has written a variety of papers on that topic.

Regarding a specific example, like the prime number theorem, an obvious question becomes: what results do you consider to be 'background' and which are part of the proof? The stable answer, again relying on Kolmogorov Complexity, which I explain in Understanding expression simplification [ISSAC 2004], is to in fact consider the proof length to contain **all** the previous results needed to express the proof as part of its length. Then, to compare two proofs, you subtract their lengths, which takes care of 'quotienting out' all of the material that is common. However, if the two proofs take wildly different routes through mathematics, you might need to use finer concepts such as relative complexity, i.e. you compress the proofs on their own (but with the full library), and concatenated. See the works of Vitanyi for details and variants.

I would say that we currently don't really too much about this topic. We're still in the stage where we're finding uniform ways of writing down mathematical proofs. While the existing libraries are getting large, they are still not quite large enough to do convincing data mining, although that hasn't stopped many from doing so already. But they are aware that their results are very system-relative.

Some interesting early results are available on decently large databases: some use the whole of the arxiv as their data source. I quite like Discovering Mathematical Objects of Interest—A Study of Mathematical Notations for example.