
Cache coherency

Posted: Wed Jan 31, 2007 7:48 am
by ob1
As said before (Mask of Destiny), the SH2 doesn't have any mechanism for cache coherency. When one CPU writes to memory, the other CPU has no means of knowing that data it might hold in its cache must be invalidated.
I've thought about a mechanism:
each time a CPU writes to memory, it puts the address on a stack and raises an interrupt on the other CPU. The other CPU, in turn, receives the interrupt, looks at the stack, and invalidates the address. And vice versa.
"OMG," you would say, "that's damn slow!!!" It is. I'm afraid it's the price we have to pay. Either we get coherency, and on each write you interrupt the other CPU, or we get independence, as Mask states, but then every access is slow (access time, anyone?).

But damn... how interesting this CPU and this architecture are!!!
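The write-then-notify scheme above can be modeled on a host machine to see what each side has to do. This is a toy sketch in C, not SH-2 code: a ring buffer stands in for the "stack" (the order of invalidations doesn't matter), the cache is a tiny direct-mapped one rather than the SH-2's real 4-way cache, and `notify_write` / `handle_invalidate_irq` are hypothetical names for the writing CPU's side and the other CPU's interrupt handler. On real hardware the handler would presumably invalidate each line through the SH-2's associative purge area.

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model only (hypothetical names): a tiny direct-mapped cache of
   16-byte lines, NOT the SH-2's real 4-way cache. */
#define LINE_SIZE   16u
#define NUM_LINES   8u
#define QUEUE_DEPTH 32u

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];   /* line-aligned address held in each slot */
} toy_cache;

typedef struct {
    uint32_t addr[QUEUE_DEPTH];
    unsigned head, tail;       /* ring buffer: writer pushes at tail */
} inval_queue;

static unsigned slot_of(uint32_t addr) { return (addr / LINE_SIZE) % NUM_LINES; }

/* A read miss fills the slot (the data itself is not modeled). */
static void cache_fill(toy_cache *c, uint32_t addr) {
    unsigned s = slot_of(addr);
    c->valid[s] = true;
    c->tag[s] = addr & ~(LINE_SIZE - 1u);
}

static bool cache_holds(const toy_cache *c, uint32_t addr) {
    unsigned s = slot_of(addr);
    return c->valid[s] && c->tag[s] == (addr & ~(LINE_SIZE - 1u));
}

/* Writer side: record the written address. On hardware this is where the
   other CPU would be interrupted. Returns false if the queue is full. */
static bool notify_write(inval_queue *q, uint32_t addr) {
    unsigned next = (q->tail + 1u) % QUEUE_DEPTH;
    if (next == q->head) return false;
    q->addr[q->tail] = addr;
    q->tail = next;
    return true;
}

/* Receiver side: what the interrupt handler would do. Drains the queue and
   invalidates every matching line; returns the number of entries handled. */
static unsigned handle_invalidate_irq(inval_queue *q, toy_cache *c) {
    unsigned n = 0;
    while (q->head != q->tail) {
        uint32_t a = q->addr[q->head];
        if (cache_holds(c, a)) c->valid[slot_of(a)] = false;
        q->head = (q->head + 1u) % QUEUE_DEPTH;
        n++;
    }
    return n;
}
```

Every single write costs one queue entry plus one interrupt round trip on the other CPU, which is exactly the overhead debated in the replies.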

Posted: Wed Jan 31, 2007 9:53 am
by Stef
I think the best solution is still to avoid working on the same piece of memory when possible ;)

Posted: Wed Jan 31, 2007 10:33 am
by ob1
For sure!
And by the way, use the private RAM if the data fits in 2 KB.
Enable 2-way cache mode (which gives you 2 KB of private RAM):

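	mov.l	CCR,r0		; r0 = address of the cache control register
	mov	#$19,r1		; CE (bit 0) | TW (bit 3) | CP (bit 4)
	mov.b	r1,@r0		; CCR is 8-bit: byte store. Enable cache, set 2-way, and purge

CCR:	dc.l	$FFFFFE92	; PC-relative literal: longword-aligned, outside the code path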
Then, private RAM is from C000 0000h to C000 07FFh.
I don't know the cache access time, but it's certainly under the 12 cycles the SDRAM needs (OK, that's for 8 words). I assume a cache access takes 2 cycles.
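For reference, the value $19 decodes as below, sketched in C. The bit positions follow the SH7604 hardware manual as I read it (CE = bit 0, TW = bit 3, CP = bit 4); `in_onchip_ram` and the `ON_SH2_TARGET` guard are hypothetical helpers, and only the guarded function touches hardware.

```c
#include <stdint.h>
#include <stdbool.h>

/* SH7604 (SH-2) cache control register and the bits written above. */
#define SH2_CCR_ADDR  0xFFFFFE92u
#define CCR_CE  0x01u   /* bit 0: cache enable */
#define CCR_TW  0x08u   /* bit 3: two-way mode, half the data array becomes RAM */
#define CCR_CP  0x10u   /* bit 4: purge all cache entries */

/* In two-way mode the freed 2 KB of the data array is addressable here. */
#define ONCHIP_RAM_BASE 0xC0000000u
#define ONCHIP_RAM_SIZE 0x800u   /* 2 KB: C000 0000h .. C000 07FFh */

static uint8_t ccr_enable_2way_purge(void) {
    return (uint8_t)(CCR_CE | CCR_TW | CCR_CP);   /* == 0x19 */
}

/* Hypothetical helper: does [addr, addr+len) fit inside the on-chip RAM? */
static bool in_onchip_ram(uint32_t addr, uint32_t len) {
    return addr >= ONCHIP_RAM_BASE &&
           len <= ONCHIP_RAM_SIZE &&
           addr - ONCHIP_RAM_BASE <= ONCHIP_RAM_SIZE - len;
}

#ifdef ON_SH2_TARGET   /* hypothetical build flag for the real hardware */
static void cache_enable_2way(void) {
    /* CCR is an 8-bit register, so use a byte store. */
    *(volatile uint8_t *)SH2_CCR_ADDR = ccr_enable_2way_purge();
}
#endif
```

A buffer such as a transform matrix can then be placed with `in_onchip_ram(addr, size)` checked at startup.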

Posted: Wed Jan 31, 2007 11:33 am
by Stef
I believe cache latency is 2 or 3 cycles. In Gens I got many 32X speed inaccuracies because of the different latencies (RAM, ROM, ...).
Virtua Racing uses the internal cache a lot. It uses it for all the 3D transformations: MAC instructions combined with the internal cache. The game appears to be nicely optimised on this point ;)
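A minimal C sketch of the kind of MAC-heavy inner loop described above; this is not Virtua Racing's code. It assumes 16.16 fixed point, with an `int64_t` accumulator standing in for MACH:MACL, the 64-bit register pair that the SH-2's MAC.L instruction accumulates into. `fix16`, `FIX_ONE` and `transform3` are hypothetical names. On the 32X the matrix and vertex data would presumably sit in the 2 KB on-chip RAM so every operand fetch is fast.

```c
#include <stdint.h>

typedef int32_t fix16;              /* 16.16 fixed point (assumed format) */
#define FIX_ONE ((fix16)1 << 16)

/* out = m * v. Each component is a sum of three 32x32->64-bit products,
   i.e. three MAC.L steps into one 64-bit accumulator. */
static void transform3(const fix16 m[3][3], const fix16 v[3], fix16 out[3]) {
    for (int r = 0; r < 3; r++) {
        int64_t acc = 0;                       /* plays the role of MACH:MACL */
        for (int c = 0; c < 3; c++)
            acc += (int64_t)m[r][c] * v[c];    /* one multiply-accumulate */
        /* back to 16.16; assumes arithmetic right shift for negatives,
           which GCC and Clang provide */
        out[r] = (fix16)(acc >> 16);
    }
}
```

Keeping the full 64-bit sum until the final shift is what avoids overflow when several 16.16 products are added together.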

Posted: Thu Feb 01, 2007 8:24 pm
by Mask of Destiny
I think triggering an interrupt on each write would be a lot slower than just accessing all data with the cache turned off. Interrupt handling on the SH-2 takes a lot of cycles. What would probably make more sense is to set the processors up as a rendering pipeline: the first processor does the first few stages of rendering, then the second processor flushes the cache for the area of memory the first processor rendered to and does the last few stages. That way there is only one clearly defined point where the two processors need to pass data between them.
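The handoff step (the second processor flushing its cache for the area the first one rendered to) mostly comes down to visiting every 16-byte cache line in that area. A minimal C sketch, assuming the SH-2's associative purge area at 0x40000000, where a 32-bit store to 0x40000000 | address invalidates the matching line, as the SH7604 manual describes it. `purge_addresses` and the macro names are hypothetical; on the target you would store any value to each returned address.

```c
#include <stdint.h>
#include <stddef.h>

#define SH2_CACHE_LINE  16u           /* SH-2 cache line size in bytes */
#define SH2_PURGE_AREA  0x40000000u   /* associative purge area (assumed) */

/* Computes one purge address per 16-byte line covering [start, start+len).
   Cached (0x0...) and cache-through (0x2...) aliases map to the same purge
   address because the top three address bits are cleared first.
   Writes up to max entries into out; returns the number of lines spanned. */
static size_t purge_addresses(uint32_t start, uint32_t len,
                              uint32_t *out, size_t max) {
    if (len == 0) return 0;
    uint32_t first = start & ~(SH2_CACHE_LINE - 1u);
    uint32_t last  = (start + len - 1u) & ~(SH2_CACHE_LINE - 1u);
    size_t n = 0;
    for (uint32_t a = first; ; a += SH2_CACHE_LINE) {
        if (n < max) out[n] = SH2_PURGE_AREA | (a & 0x1FFFFFFFu);
        n++;
        if (a == last) break;
    }
    return n;
}
```

Because the range is rounded out to whole lines, a buffer that doesn't start on a 16-byte boundary still gets its first and last partial lines invalidated.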

Re: Cache coherency

Posted: Tue Jan 20, 2009 2:59 pm
by ob1
ob1 wrote: each time a CPU writes to memory, it puts the address on a stack and raises an interrupt on the other CPU. The other CPU, in turn, receives the interrupt, looks at the stack, and invalidates the address. And vice versa.
... or I could use SCI ...

Posted: Wed Jan 21, 2009 9:00 am
by Chilly Willy
It depends on how much data you want to share. If you just have a few words, just use the COMM registers - it's what they're there for. :D
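A minimal sketch of a one-word mailbox over two COMM registers, in C. The register assignment and flag protocol are hypothetical; the SH-2-side COMM registers are assumed to start at 0x20004020 (eight 16-bit registers in the cache-through area, so caching is not an issue). The functions take a pointer so the protocol can be tried on a host with a plain array.

```c
#include <stdint.h>
#include <stdbool.h>

/* On the target (assumed): comm_regs *comm = (comm_regs *)0x20004020; */
typedef volatile uint16_t comm_regs;

#define MBOX_DATA 0   /* hypothetical: COMM0 carries the value */
#define MBOX_FLAG 1   /* hypothetical: COMM1, 0 = empty, 1 = full */

/* Sender: only writes when the previous value has been taken. */
static bool mbox_send(comm_regs *comm, uint16_t value) {
    if (comm[MBOX_FLAG] != 0) return false;
    comm[MBOX_DATA] = value;   /* write the data first ... */
    comm[MBOX_FLAG] = 1;       /* ... then publish it */
    return true;
}

/* Receiver: only reads when a value has been published. */
static bool mbox_receive(comm_regs *comm, uint16_t *value) {
    if (comm[MBOX_FLAG] == 0) return false;
    *value = comm[MBOX_DATA];
    comm[MBOX_FLAG] = 0;       /* hand the slot back to the sender */
    return true;
}
```

This is a single-producer, single-consumer handshake: each side writes the flag only in its own state, so the two CPUs never race on the same transition.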

If you have more data, but it's not changed very often, SCI might be better. Any significant amount of data might be better with uncached memory - both to avoid coherency issues AND to avoid flooding the caches.

For example, Doom uses uncached access to the frame buffer (on all platforms) because it's MUCH faster due to cache issues. An uncached frame buffer can be 5 to 10 times as fast as a cached one.

Posted: Thu Jan 22, 2009 8:29 pm
by TMorita
Mask of Destiny wrote: I think triggering an interrupt on each write would be a lot slower than just accessing all data with the cache turned off. Interrupt handling on the SH-2 takes a lot of cycles. ...
Yes.

It's faster to do this the correct way, e.g. by just using the cache-through address space on both SH2s for shared data.
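Using the cache-through address space amounts to forcing address bits 31..29 to 001: on the SH-2, 0x06000000 (SDRAM, cached) and 0x26000000 (the same SDRAM, cache-through) alias the same memory. A minimal C sketch; `shared_state`, its fields and `shared_view` are hypothetical.

```c
#include <stdint.h>

#define SH2_CACHE_THROUGH 0x20000000u   /* 0x2xxxxxxx: cache-through view */

/* Map any cached (or already cache-through) address to its
   cache-through alias. */
static inline uint32_t cache_through(uint32_t addr) {
    return (addr & 0x1FFFFFFFu) | SH2_CACHE_THROUGH;
}

/* Hypothetical shared block in SDRAM that both CPUs only ever access
   through the cache-through alias, so neither cache holds stale copies. */
typedef struct {
    volatile uint32_t frame;       /* bumped by the master when a frame is ready */
    volatile int16_t  camera[3];
} shared_state;

static inline shared_state *shared_view(uint32_t cached_addr) {
    return (shared_state *)(uintptr_t)cache_through(cached_addr);
}
```

The same mapping gives the uncached frame buffer mentioned earlier in the thread: 0x04000000 becomes 0x24000000.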

Toshi

Posted: Fri Jan 23, 2009 9:16 am
by ob1
MoD and T. Morita have spoken.
I'll do as you say.