Unless Cnet has very strict testing rules, differences are likely to occur. And it's unlikely that they have such strict rules, since systems can be so different. For example, some vendors provide utilities to fine-tune power settings (e.g., Sony has a utility that senses the foreground application and tunes the settings accordingly, such as setting the screen to its highest brightness when playing a movie); other vendors rely exclusively on the basic Windows power settings.
So, it would be difficult in such cases to ensure that the systems are set at _exactly_ the same levels. Indeed, it wouldn't even be desirable: a vendor that provides more granular settings, allowing finer control over power and thus potentially better performance, should be rewarded for providing such capabilities, I think.
So, it seems entirely possible that one Cnet group had the 8204 at one setting when running the benchmark, while the other group had it at another. Even a slight difference, say in brightness, could easily account for the delta.
Or, it could be gremlins...