Cache coherency

Ask anything you want about 32X Mushroom programming.

Moderator: BigEvilCorporation

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Cache coherency

Post by ob1 » Wed Jan 31, 2007 7:48 am

As said before (by Mask of Destiny), the SH2 doesn't have any mechanism for cache coherency. When one CPU writes to memory, the other CPU has no means of knowing that the data it may hold in its cache has to be invalidated.
I've thought about a mechanism:
Each time a CPU writes to memory, it puts the address on a stack and raises an interrupt on the other CPU. The other CPU, in turn, receives the interrupt, looks at the stack, and invalidates the address. And vice versa.
"OMG," you would say, "that's damn slow!!!" It is. I'm afraid it's the price we have to pay. Either we get coherency, and every write interrupts the other CPU, or we get independence, as Mask states, but that is slow too (access time, anyone?).

But damn... how interesting this CPU and this architecture are!!!
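The bookkeeping half of this idea can be sketched in C as a small single-producer, single-consumer queue of written addresses. This is only an illustration: `dirty_queue`, `dirty_push`, `dirty_pop` and `QUEUE_SLOTS` are made-up names, the interrupt itself is not modeled, and on real hardware the queue would have to live in uncached memory, otherwise it has the very coherency problem it is meant to solve.

```c
#include <stdint.h>

#define QUEUE_SLOTS 16u   /* hypothetical capacity, chosen for the example */

/* Addresses written by one CPU, waiting to be invalidated by the other. */
typedef struct {
    volatile uint32_t addr[QUEUE_SLOTS];
    volatile uint32_t head;   /* advanced only by the writing CPU   */
    volatile uint32_t tail;   /* advanced only by the receiving CPU */
} dirty_queue;

/* Writer side: record an address that was just written.
   Returns 0 when the queue is full, 1 otherwise. */
static int dirty_push(dirty_queue *q, uint32_t address)
{
    uint32_t head = q->head;
    if (head - q->tail == QUEUE_SLOTS)
        return 0;
    q->addr[head % QUEUE_SLOTS] = address;
    q->head = head + 1u;      /* publish only after the slot is filled */
    return 1;
}

/* Receiver side (what the interrupt handler would call in a loop):
   fetch the next address to invalidate. Returns 0 when the queue is empty. */
static int dirty_pop(dirty_queue *q, uint32_t *address)
{
    uint32_t tail = q->tail;
    if (tail == q->head)
        return 0;
    *address = q->addr[tail % QUEUE_SLOTS];
    q->tail = tail + 1u;
    return 1;
}
```

Even in this stripped-down form, every shared write costs a queue update on one side and a handler run on the other, which is the overhead being worried about above.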

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Post by Stef » Wed Jan 31, 2007 9:53 am

I think the best solution is still to avoid working on the same piece of memory when possible ;)

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Wed Jan 31, 2007 10:33 am

For sure!
And by the way, use private RAM if the data is smaller than 2KB.
Enable the 2-way cache (and thus 2KB of private RAM):

Code:

	mov.l	CCR,r0		; r0 = address of the cache control register
	mov	#$19,r1		; $19 = CP | TW | CE
	mov.b	r1,@r0		; CCR is 8 bits wide: enable cache, set 2-way, and purge

CCR:	dc.l	$FFFFFE92
Then, private RAM is from C000 0000h to C000 07FFh.
I don't know how fast the cache access time is, but it is certainly under the 12 cycles required for SDRAM (OK, that's for 8 words). I assume cache access time is 2 cycles.
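For anyone working in C rather than assembly, here is a rough sketch of where the $19 value comes from. The bit names follow the SH7604 hardware manual as far as I remember them. The helper names (`ccr_two_way_value`, `ccr_write`, `in_private_ram`) are invented for the example, and `ccr_write` takes the register as a pointer so it can be tried on a PC with a dummy variable.

```c
#include <stdint.h>

/* SH7604 cache control register: 8 bits wide, at 0xFFFFFE92. */
#define SH2_CCR_ADDR 0xFFFFFE92u

enum {
    CCR_CE = 0x01,  /* cache enable                        */
    CCR_ID = 0x02,  /* instruction replacement disable     */
    CCR_OD = 0x04,  /* data replacement disable            */
    CCR_TW = 0x08,  /* two-way mode: 2KB cache + 2KB RAM   */
    CCR_CP = 0x10   /* cache purge: invalidate every line  */
};

/* Private RAM window available in two-way mode (range quoted above). */
#define PRIVATE_RAM_FIRST 0xC0000000u
#define PRIVATE_RAM_LAST  0xC00007FFu

/* Value to write for "enable cache, two-way, purge". */
static uint8_t ccr_two_way_value(void)
{
    return (uint8_t)(CCR_CE | CCR_TW | CCR_CP);
}

/* Byte-sized store to the register. On hardware the pointer would be
   (volatile uint8_t *)SH2_CCR_ADDR; on a PC, pass a dummy variable. */
static void ccr_write(volatile uint8_t *ccr, uint8_t value)
{
    *ccr = value;
}

/* 1 if the address lies inside the 2KB private RAM window. */
static int in_private_ram(uint32_t address)
{
    return address >= PRIVATE_RAM_FIRST && address <= PRIVATE_RAM_LAST;
}
```

The store is a byte access on purpose: the register is 8 bits wide and sits at an address that is not longword-aligned.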

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Post by Stef » Wed Jan 31, 2007 11:33 am

I believe cache latency is 2 or 3 cycles. In Gens I got many 32X speed inaccuracies because of the different latencies (RAM / ROM ...).
Virtua Racing uses the internal cache a lot. It uses it for all the 3D transformations: MAC instructions combined with the internal cache. The game appears to be nicely optimised on this point ;)

Mask of Destiny
Very interested
Posts: 616
Joined: Thu Nov 30, 2006 6:30 am

Post by Mask of Destiny » Thu Feb 01, 2007 8:24 pm

I think triggering an interrupt on each write would be a lot slower than just accessing all data with the cache turned off. Interrupt handling on the SH-2 takes a lot of cycles. What would probably make more sense is to set the processors up as part of a rendering pipeline in which the first processor does the first few stages of rendering, and then the second processor flushes the cache for the area of memory that the first processor rendered to and does the last few stages of rendering. That way there is only one clearly defined point where the two processors need to pass data between them.
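As a rough sketch of the "flush the cache for that area" step, the loop below walks a region one 16-byte cache line at a time and produces the matching address in the SH7604 associative purge space (0x40000000 and up). The names `purge_region`, `purge_fn` and `record_purge` are invented for the example. `record_purge` is only a PC-side stand-in: on the real chip I would expect a store such as `*(volatile uint32_t *)purge_address = 0;` in its place, but treat that as an assumption to check against the hardware manual.

```c
#include <stdint.h>
#include <stddef.h>

#define SH2_LINE_BYTES   16u          /* SH7604 cache line size        */
#define SH2_ASSOC_PURGE  0x40000000u  /* associative purge space base  */
#define SH2_ADDR_MASK    0x07FFFFFFu  /* address bits kept in that space */

typedef void (*purge_fn)(uint32_t purge_address);

/* Calls `purge` once for every cache line that overlaps
   [start, start + len). Returns the number of lines visited. */
static size_t purge_region(uint32_t start, uint32_t len, purge_fn purge)
{
    size_t lines = 0;
    if (len == 0)
        return 0;

    uint32_t line = start & ~(SH2_LINE_BYTES - 1u);
    uint32_t last = (start + len - 1u) & ~(SH2_LINE_BYTES - 1u);
    for (;;) {
        purge((line & SH2_ADDR_MASK) | SH2_ASSOC_PURGE);
        lines++;
        if (line == last)
            break;
        line += SH2_LINE_BYTES;
    }
    return lines;
}

/* PC-side stand-in for the hardware store: remembers the last address. */
static uint32_t last_purged;
static void record_purge(uint32_t purge_address)
{
    last_purged = purge_address;
}
```

With a single hand-off point per frame, this loop runs once over the shared buffer instead of once per write.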

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Re: Cache coherency

Post by ob1 » Tue Jan 20, 2009 2:59 pm

ob1 wrote: Each time a CPU writes to memory, it puts the address on a stack and raises an interrupt on the other CPU. The other CPU, in turn, receives the interrupt, looks at the stack, and invalidates the address. And vice versa.
... or I could use SCI ...

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Wed Jan 21, 2009 9:00 am

It depends on how much data you want to share. If you just have a few words, just use the COMM registers - it's what they're there for. :D
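A minimal sketch of a one-word mailbox over two COMM registers, assuming one word carries the data and the next one a "full" flag. The protocol and the names `comm_send` and `comm_receive` are made up for the example. The registers are passed in as a pointer so the code also runs on a PC against a plain array. On the SH-2 side COMM0 sits at 0x20004020, if I recall the 32X memory map correctly.

```c
#include <stdint.h>

/* SH-2 side address of COMM0 (assumed from the 32X memory map);
   only meaningful on real hardware. */
#define MARS_SYS_COMM0 0x20004020u

/* comm[0] = data word, comm[1] = flag (0 = empty, 1 = full). */

/* Sender: returns 0 if the previous word has not been consumed yet. */
static int comm_send(volatile uint16_t *comm, uint16_t value)
{
    if (comm[1] != 0)
        return 0;
    comm[0] = value;
    comm[1] = 1;      /* set the flag only after the data is in place */
    return 1;
}

/* Receiver: returns 0 if nothing is waiting. */
static int comm_receive(volatile uint16_t *comm, uint16_t *value)
{
    if (comm[1] == 0)
        return 0;
    *value = comm[0];
    comm[1] = 0;      /* hand the slot back to the sender */
    return 1;
}
```

On hardware the pointer would be `(volatile uint16_t *)MARS_SYS_COMM0`. Since the COMM registers are not cached, no purge is needed on either side.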

If you have more data, but it's not changed very often, SCI might be better. Any significant amount of data might be better with uncached memory - both to avoid coherency issues AND to avoid flooding the caches.

For example, Doom uses uncached access to the frame buffer (on all platforms) as it's MUCH faster due to cache issues. The difference can be from 5 to 10 times as fast using an uncached frame buffer vs a cached frame buffer.

TMorita
Interested
Posts: 17
Joined: Thu May 29, 2008 8:07 am

Post by TMorita » Thu Jan 22, 2009 8:29 pm

Mask of Destiny wrote: I think triggering an interrupt on each write would be a lot slower than just accessing all data with the cache turned off. Interrupt handling on the SH-2 takes a lot of cycles. ...
Yes.

It's faster to do this the correct way, e.g. by just using the cache-through address space on both SH2s for shared data.
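In address terms this just means using the 0x20000000 mirror of the same memory. A small sketch follows (the helper names `cache_through` and `is_cache_through` are made up, and the example address is an arbitrary SDRAM location):

```c
#include <stdint.h>

#define SH2_CACHE_THROUGH 0x20000000u  /* SH7604 cache-through mirror */

/* Cache-through alias of a normal (cached) address. */
static uint32_t cache_through(uint32_t cached_address)
{
    return cached_address | SH2_CACHE_THROUGH;
}

/* 1 if the address already lies in the cache-through mirror
   (0x20000000 to 0x27FFFFFF). */
static int is_cache_through(uint32_t address)
{
    return (address & 0xF8000000u) == SH2_CACHE_THROUGH;
}
```

For example, a shared variable at 0x06000100 in SDRAM would be accessed by both SH2s at 0x26000100, so neither of them ever holds a stale copy of it in its cache.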

Toshi

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Fri Jan 23, 2009 9:16 am

MoD and TMorita have spoken.
I'll do as they say.
