I'll try to explain what Madzombie was saying, because he's right.
CAS latency is the number of RAM clock cycles between the RAM receiving a read command (the column address strobe) and the first data coming back. CAS4 = 4 clock cycles to complete that operation.
With DDR2-533 (which runs at 266.66MHz), each clock cycle takes 1/266,666,666 seconds. With DDR2-667 (which runs at 333.33MHz), each clock cycle takes 1/333,333,333 seconds.
An operation requiring four clock cycles with DDR2-533 RAM will take 4/266,666,666 seconds = 1/66,666,666 seconds, or 15 nanoseconds. An operation requiring five clock cycles with DDR2-667 RAM will take 5/333,333,333 seconds = 1/66,666,666 seconds, also 15 nanoseconds. So both operations take the same time. It takes more clock cycles on DDR2-667, but those clock cycles happen faster.
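If you want to check the arithmetic above yourself, here is a rough Python sketch. The function name and the constants are mine, purely for illustration; it assumes the "533" and "667" ratings are really 533.33 and 666.67 million transfers per second, and that DDR moves data twice per clock, so the clock is half the rated number.

```python
from fractions import Fraction

def cas_latency_ns(cas_cycles, data_rate_mts):
    """Time taken by `cas_cycles` clock cycles, in nanoseconds.

    data_rate_mts is the DDR rating in millions of transfers per second.
    DDR transfers twice per clock, so the clock in MHz is half of that.
    """
    clock_mhz = Fraction(data_rate_mts) / 2
    # cycles / MHz gives microseconds; multiply by 1000 for nanoseconds
    return float(Fraction(cas_cycles) * 1000 / clock_mhz)

# Assumed exact ratings: "533" is 533.33 MT/s, "667" is 666.67 MT/s
DDR2_533 = Fraction(1600, 3)
DDR2_667 = Fraction(2000, 3)

print(cas_latency_ns(4, DDR2_533))  # 15.0
print(cas_latency_ns(5, DDR2_667))  # 15.0
```

Both lines come out at 15 ns, which is the whole point: the higher CAS number on the faster RAM doesn't cost you anything in real time.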
It's a lot like CPUs. A P4 3GHz and an A64 3000+ take about the same amount of time to do most things. Each operation takes far more clock cycles on the P4 - but it's running 1.2GHz faster. The overall speed is the same as the A64 (fewer clock cycles per operation, but each clock cycle takes longer to complete).
That said, it's a bit irrelevant because of another technical detail: Merom's FSB can transfer 5333.33MB/s of data. Dual-channel DDR2-533 can easily provide 8533.33MB/s of data. The bottleneck here is Merom's FSB. Even if you put in DDR2-10000 (80GB/s), data can still only get to and from the CPU at 5.33GB/s.
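The bottleneck argument can be sketched the same way. This is only a back-of-envelope model with names I made up: it assumes a 64-bit (8-byte) FSB running at 666.67 million transfers per second, which is what the 5333.33MB/s figure works out to, and 64-bit memory channels. The usable bandwidth is simply whichever side is slower.

```python
from fractions import Fraction

BUS_WIDTH_BYTES = 8  # assumption: 64-bit FSB and 64-bit memory channels

def peak_mb_per_s(data_rate_mts, channels=1):
    """Theoretical peak bandwidth in MB/s for a given transfer rate."""
    return float(Fraction(data_rate_mts) * BUS_WIDTH_BYTES * channels)

def usable_mb_per_s(ram_mts, fsb_mts, channels=2):
    """What the CPU can actually see: the slower of RAM and FSB."""
    return min(peak_mb_per_s(ram_mts, channels), peak_mb_per_s(fsb_mts))

FSB_667 = Fraction(2000, 3)   # assumed Merom FSB rate, MT/s
DDR2_533 = Fraction(1600, 3)
DDR2_667 = Fraction(2000, 3)

print(round(peak_mb_per_s(DDR2_533, channels=2), 2))              # 8533.33
print(round(usable_mb_per_s(DDR2_533, FSB_667), 2))               # 5333.33
print(round(usable_mb_per_s(DDR2_667, FSB_667), 2))               # 5333.33
print(round(usable_mb_per_s(DDR2_533, FSB_667, channels=1), 2))   # 4266.67
```

In dual-channel both RAM speeds end up capped at the same 5333.33MB/s by the FSB. Only in the single-channel case does DDR2-533 fall below what the FSB can carry.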
The only times you're likely to see benefits from DDR2-667 are:
(1) If you're running in single-channel.
(2) With a HyperMemory or TurboCache video card. GMA950 doesn't count because it's too slow to make full use of system RAM anyway.
(3) Response times. I'm not sure why this is, but my desktop system definitely felt much more responsive when the RAM was running in sync with the FSB. It's a Pentium-D 805, so the FSB was 533MHz. Using DDR266 RAM in single-channel felt quicker than DDR333 in single-channel, even though the system really could have used the extra bandwidth there.