This is getting out of hand!!

I think it is important to look at 2 things we have learnt.
1) There is no appreciable performance difference between the P-M and the A64. Each has its own strong suit. The P-M seems to be slightly faster, though I firmly believe the A64 will trounce it if and when XP64 finally rears its ugly head.
2) Benchmarking is not an exact science. Benchmarks work by running a bunch of workloads (rendering, floating point, whatever) many times, then converting the results into a score using some points system. There are some obvious flaws with this approach.
i) How can anyone know whether the benchmark's scoring system means anything beyond comparing one run of that benchmark with another? In other words, does a 10% better score in Aquamark necessarily translate into a 10% better framerate, or is the score only meaningful within Aquamark itself?
ii) Testing CPUs is not straightforward. CPUs are optimized in different ways, and so are the programs that test them. I have said this many times... but once more... the A64 destroys any other CPU at compiling. I *think* that is because it has more registers. The Dothan does very well in heavy-load user apps (Photoshop, Cinema4D, etc.), which benefit from its 2MB L2 cache. Thus, no single CPU benchmark can EVER show that one CPU is always better than another.
iii) Consider the software platform you are testing the hardware on. I guarantee that neither Windows XP nor any other modern OS provides a stable environment for testing. At any given time the OS can decide to swap the benchmarking thread off the CPU to do something else (defrag system memory, check page table consistency, evaluate other threads for execution, etc.). Worse than that, the complexity of modern OSes means you cannot assume that these random actions will be the same, or even equivalent, each time the machine is booted and the test is run. When supercomputers are tested, they are usually benchmarked on an OS that can guarantee no outside interference, and thus a valid result.
iv) Even if i through iii had no effect, we would still have trouble keeping the software test environment consistent. Different drivers and OS patches can change the results.
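To make point iii a bit more concrete, here is a minimal Python sketch (not taken from any real benchmark suite) that times the same workload several times and reports the spread between the fastest and slowest run. The names `busy_work`, `time_runs` and `summarize` are hypothetical, made up for illustration. On a typical desktop OS the runs usually won't all take exactly the same time, though how much they differ depends entirely on the machine and what else it is doing.

```python
import statistics
import time


def busy_work(n: int = 200_000) -> int:
    # Hypothetical stand-in for a benchmark kernel: sum of squares 0..n-1.
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_runs(func, runs: int = 15) -> list[float]:
    """Call `func` `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: list[float]) -> dict[str, float]:
    # min ~ least-disturbed run, median ~ typical run,
    # spread_pct = (slowest - fastest) as a percentage of the median.
    median = statistics.median(durations)
    return {
        "min": min(durations),
        "median": median,
        "max": max(durations),
        "spread_pct": 100.0 * (max(durations) - min(durations)) / median,
    }


if __name__ == "__main__":
    stats = summarize(time_runs(busy_work))
    for key, value in stats.items():
        print(f"{key}: {value:.4f}")
```

Reporting the median and the spread, rather than a single number, is one simple way to at least see how much the OS is getting in the way.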
So, all that to say: quit this benchmarking hard-on and be satisfied with the imperfect world of FPS testing. Sure, the software is not the same, the GPUs might be clocked differently, etc... but for all intents and purposes it is as close to perfect as you will get. And at the end of the day, if there were a large performance delta between the systems, it would show up as a noticeable (at play time) FPS difference.
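As a rough sketch of that last point, assuming you have per-frame times (in milliseconds) or per-run FPS figures from a game, the hypothetical Python helpers below compute average FPS and check whether the gap between two systems is bigger than their run-to-run spread. The "gap bigger than combined spread" rule is a crude heuristic picked for illustration, not a proper statistical test.

```python
import statistics


def average_fps(frame_times_ms: list[float]) -> float:
    # Average FPS = frames rendered / total elapsed seconds.
    return len(frame_times_ms) / (sum(frame_times_ms) / 1000.0)


def delta_exceeds_noise(fps_a: list[float], fps_b: list[float]) -> bool:
    """Hypothetical heuristic: True if the gap between the two systems' mean FPS
    is larger than their combined run-to-run spread (max - min of each)."""
    gap = abs(statistics.mean(fps_a) - statistics.mean(fps_b))
    noise = (max(fps_a) - min(fps_a)) + (max(fps_b) - min(fps_b))
    return gap > noise


if __name__ == "__main__":
    # Made-up FPS numbers from three runs on each of two systems.
    system_a = [60.0, 61.0, 60.0]
    system_b = [80.0, 81.0, 80.0]
    print(delta_exceeds_noise(system_a, system_b))
```

If the difference between two systems can't beat that kind of spread, it is unlikely you would notice it while actually playing.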