Cache coherency

Ask anything you want about 32X Mushroom programming.

Moderator: BigEvilCorporation

ob1
Very interested
Posts: 468
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Cache coherency

Post by ob1 »

As Mask of Destiny said before, the SH2 doesn't have any mechanism for cache coherency. When one CPU writes to memory, the other CPU has no means of knowing that the data it may hold in its cache has to be invalidated.
I've thought about a mechanism:
each time a CPU writes to memory, it pushes the address onto a stack and raises an interrupt on the other CPU. The other CPU, in turn, receives the interrupt, looks at the stack, and invalidates the address. And vice versa.
"OMG," you would say, "it's damn slow!!!" It is. I'm afraid it's the price we have to pay. Either we get coherency, and on each write you interrupt the other CPU, or we get independence, as Mask states, but it's slow (access time, anyone?).

But damn... how interesting this CPU and this architecture are!!!
Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Post by Stef »

I think the best solution is still to avoid working on the same piece of memory when possible ;)
ob1
Very interested
Posts: 468
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 »

For sure!
And by the way, use private RAM if the data fits in 2 KB.
Enable the 2-way cache (which turns 2 KB of it into private RAM):

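	mov.l	CCR,r0		; r0 = address of CCR (PC-relative literal load)
	mov	#$19,r1		; $19 = cache enable | two-way mode | purge
	mov.b	r1,@r0		; CCR is 8 bits wide: enable cache, set 2-way, and purge

; literal pool: keep it longword-aligned and out of the execution path
CCR:	dc.l	$FFFFFE92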
Then, private RAM runs from C000 0000h to C000 07FFh.
I don't know how fast the cache access time is, but it is certainly under the 12 cycles required for SDRAM (OK, that is for 8 words). I assume cache access time is 2 cycles.
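
For readers working in C, here is a small sketch that spells out what the $19 written to CCR is made of and where the private RAM ends up. It assumes the SH7604 CCR bit layout (CE in bit 0, TW in bit 3, CP in bit 4); the macro and helper names are invented for this illustration and do not come from any SDK.

```c
#include <stdint.h>

/* Assumption: SH7604 cache control register layout. */
#define SH2_CCR_ADDR 0xFFFFFE92u  /* 8-bit register, same address as in the post */
#define CCR_CE       0x01u        /* cache enable */
#define CCR_TW       0x08u        /* two-way mode: ways 0 and 1 become RAM */
#define CCR_CP       0x10u        /* cache purge: invalidate every line */

/* The value written in the assembly snippet above. */
#define CCR_ENABLE_2WAY_PURGE (CCR_CE | CCR_TW | CCR_CP)

/* Private RAM window available once two-way mode is on. */
#define SH2_RAM_BASE 0xC0000000u
#define SH2_RAM_SIZE 0x800u       /* 2 KB */
#define SH2_RAM_END  (SH2_RAM_BASE + SH2_RAM_SIZE - 1u)

/* Hypothetical helper: does [addr, addr + len) fit in the private RAM? */
static inline int sh2_fits_private_ram(uint32_t addr, uint32_t len)
{
    return addr >= SH2_RAM_BASE
        && len <= SH2_RAM_SIZE
        && addr - SH2_RAM_BASE <= SH2_RAM_SIZE - len;
}
```

On the target, the write itself would be a single byte store, something like `*(volatile uint8_t *)SH2_CCR_ADDR = CCR_ENABLE_2WAY_PURGE;`.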
Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Post by Stef »

I believe cache latency is 2 or 3 cycles. In Gens I got many 32X speed inaccuracies because of the different latencies (RAM / ROM ...)
Virtua Racing uses the internal cache a lot. It uses it for all the 3D transformations: MAC instructions combined with the internal cache. The game appears to be nicely optimised on this point ;)
Mask of Destiny
Very interested
Posts: 628
Joined: Thu Nov 30, 2006 6:30 am

Post by Mask of Destiny »

I think triggering an interrupt on each write would be a lot slower than just accessing all data with the cache turned off. Interrupt handling on the SH-2 takes a lot of cycles. What would probably make more sense is to set the processors up as part of a rendering pipeline, in which the first processor does the first few stages of rendering, and then the second processor flushes the cache for the area of memory that the first processor rendered to and does the last few stages of rendering. That way there is only one clearly defined point where the two processors need to pass data between them.
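
The hand-off step described above (invalidating only the region the other CPU wrote) could be sketched in C roughly as follows. This is an illustration under assumptions, not a tested 32X routine: it assumes the SH7604's 16-byte cache lines and its associative purge space at 0x40000000, and every name in it is hypothetical. The actual purge access is passed in as a callback so the address arithmetic can be checked off-target.

```c
#include <stdint.h>

#define SH2_LINE_SIZE   16u          /* SH7604 cache line size, in bytes */
#define SH2_ASSOC_PURGE 0x40000000u  /* assumed associative purge space */
#define SH2_PHYS_MASK   0x1FFFFFFFu  /* strips the area-select bits 31..29 */

/* Purge-space address for the cache line that contains addr. */
static inline uint32_t sh2_purge_addr(uint32_t addr)
{
    return ((addr & SH2_PHYS_MASK) & ~(SH2_LINE_SIZE - 1u)) | SH2_ASSOC_PURGE;
}

/* Number of cache lines touched by the byte range [addr, addr + size). */
static inline uint32_t sh2_lines_in_range(uint32_t addr, uint32_t size)
{
    uint32_t first, last;
    if (size == 0u)
        return 0u;
    first = addr & ~(SH2_LINE_SIZE - 1u);
    last  = (addr + size - 1u) & ~(SH2_LINE_SIZE - 1u);
    return (last - first) / SH2_LINE_SIZE + 1u;
}

/* Walk the range one line at a time and hand each purge address to touch().
   On the 32X, touch() would perform a longword write to that address. */
static inline void sh2_purge_range(uint32_t addr, uint32_t size,
                                   void (*touch)(uint32_t purge_addr))
{
    uint32_t n    = sh2_lines_in_range(addr, size);
    uint32_t line = addr & ~(SH2_LINE_SIZE - 1u);
    while (n--) {
        touch(sh2_purge_addr(line));
        line += SH2_LINE_SIZE;
    }
}
```

The consumer would call `sh2_purge_range()` once per hand-off, before its first cached read of the buffer, which keeps the cost at one pass per frame instead of one interrupt per write.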
ob1
Very interested
Posts: 468
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Re: Cache coherency

Post by ob1 »

ob1 wrote: each time a CPU writes to memory, it pushes the address onto a stack and raises an interrupt on the other CPU. The other CPU, in turn, receives the interrupt, looks at the stack, and invalidates the address. And vice versa.
... or I could use SCI ...
Chilly Willy
Very interested
Posts: 2993
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy »

It depends on how much data you want to share. If you just have a few words, just use the COMM registers - it's what they're there for. :D
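
As a rough idea of what passing a few words through the COMM registers can look like, here is a one-slot mailbox sketch in C. The register roles (word 0 as a full/empty flag, word 1 as the payload) are a convention invented for this example, and the base address is an assumption about the SH-2 side of the 32X memory map. The functions take the register block as a pointer so they can be exercised on a plain array.

```c
#include <stdint.h>

/* Assumption: SH-2 side address of COMM0 on the 32X; 8 words follow. */
#define COMM_BASE ((volatile uint16_t *)(uintptr_t)0x20004020u)

/* Hypothetical register roles for this sketch. */
enum { MBOX_FLAG = 0, MBOX_DATA = 1 };

/* Producer: returns 1 if the word was posted, 0 if the slot is still full. */
static inline int mbox_try_send(volatile uint16_t *comm, uint16_t value)
{
    if (comm[MBOX_FLAG] != 0u)
        return 0;
    comm[MBOX_DATA] = value;  /* payload first ...                  */
    comm[MBOX_FLAG] = 1u;     /* ... then mark the slot as full     */
    return 1;
}

/* Consumer: returns 1 and stores the word in *out, or 0 if nothing waits. */
static inline int mbox_try_recv(volatile uint16_t *comm, uint16_t *out)
{
    if (comm[MBOX_FLAG] == 0u)
        return 0;
    *out = comm[MBOX_DATA];   /* read the payload ...               */
    comm[MBOX_FLAG] = 0u;     /* ... then release the slot          */
    return 1;
}
```

On hardware both CPUs would pass `COMM_BASE`; since the registers are not cached, no invalidation is needed. Each side only writes the flag when it owns the slot, so the two CPUs never write the same register at the same moment in this scheme.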

If you have more data, but it's not changed very often, SCI might be better. Any significant amount of data might be better with uncached memory - both to avoid coherency issues AND to avoid flooding the caches.

For example, Doom uses uncached access to the frame buffer (on all platforms) as it's MUCH faster due to cache issues. The difference can be from 5 to 10 times as fast using an uncached frame buffer vs a cached frame buffer.
TMorita
Interested
Posts: 17
Joined: Thu May 29, 2008 8:07 am

Post by TMorita »

Mask of Destiny wrote: I think triggering an interrupt on each write would be a lot slower than just accessing all data with the cache turned off. Interrupt handling on the SH-2 takes a lot of cycles. ...
Yes.

It's faster to do this the correct way, e.g. by just using the cache-through address space on both SH2s for shared data.
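
A minimal C sketch of what that looks like in practice, assuming the SH7604 address map in which OR-ing 0x20000000 into a cached address yields the cache-through alias of the same location (so 32X SDRAM at 0x06000000 is also visible at 0x26000000). The helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: SH7604 address map, where the 0x20000000 mirror of the
   cached area bypasses the cache on every access. */
#define SH2_CACHE_THROUGH 0x20000000u

/* Hypothetical helper: cache-through alias of a cached address. */
static inline uint32_t sh2_uncached_addr(uint32_t cached_addr)
{
    return cached_addr | SH2_CACHE_THROUGH;
}

/* Hypothetical helper: nonzero if addr already lies in the cache-through area. */
static inline int sh2_is_cache_through(uint32_t addr)
{
    return (addr & 0xE0000000u) == SH2_CACHE_THROUGH;
}
```

Both CPUs would then reach the shared variable through a pointer such as `(volatile uint16_t *)sh2_uncached_addr((uint32_t)&shared_var)`, so neither ever holds a stale cached copy of it.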

Toshi
ob1
Very interested
Posts: 468
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 »

MoD and TMorita have spoken.
I'll do as they say.