Cache coherency
As Mask of Destiny said before, the SH2 doesn't have any mechanism for cache coherency. When one CPU writes to memory, the other CPU has no means of knowing that the data it might have in its cache has to be invalidated.
I've thought about a mechanism:
each time a CPU writes to memory, it puts the address on a stack and raises an interrupt on the other CPU. The other CPU, in turn, receives the interrupt, looks at the stack, and invalidates the address. And vice versa.
"OMG," you would say, "that's damn slow!!!" It is. I'm afraid it's the price we have to pay. Either we get coherency, and every write interrupts the other CPU, or we get independence, as mask states, but that's slow too (access time, anyone?).
But damn... How interesting this CPU and this architecture are!!!
For sure !
And by the way, use private RAM if the data is smaller than 2KB.
Enable the 2-way cache (and thus, 2KB of private RAM):
Code:
mov.l CCR,r0 ; r0 = address of the cache control register (PC-relative literal load)
mov #$19,r1 ; $19 = cache enable + 2-way mode + purge
mov.b r1,@r0 ; CCR is a byte register: enable cache, set 2-way, and purge
CCR: dc.l $FFFFFE92 ; literal: keep it longword-aligned and out of the execution path
Then, private RAM is from C000 0000h to C000 07FFh.
I don't know how fast the cache access time is, but it is certainly under the 12 cycles required for SDRAM (OK, that's for 8 words). I assume cache access time is 2 cycles.
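For reference, the $19 written to CCR breaks down into three bits. Here is a minimal C sketch, assuming the SH7604 CCR bit layout (CE = bit 0, TW = bit 3, CP = bit 4); the macro and function names are made up for illustration and do not come from any official header:

```c
#include <stdint.h>

/* Bit masks for the SH-2 cache control register (CCR, 8 bits wide).
   Names are illustrative only. */
#define CCR_CE 0x01u /* cache enable */
#define CCR_TW 0x08u /* two-way mode: ways 0 and 1 become 2KB of RAM */
#define CCR_CP 0x10u /* cache purge: invalidate every line */

/* Value to store in CCR to enable the cache in two-way mode and purge it. */
static uint8_t ccr_two_way_value(void)
{
    return (uint8_t)(CCR_CE | CCR_TW | CCR_CP);
}
```

On real hardware this byte would be stored through a `volatile uint8_t *` pointing at $FFFFFE92; that access is left out so the sketch also compiles on a PC.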
I believe cache latency is 2 or 3 cycles. In Gens I got many 32X speed inaccuracies because of the different latencies (RAM / ROM ...).
Virtua Racing uses the internal cache a lot. It uses it for all the 3D transformations: MAC instructions combined with the internal cache. The game appears to be nicely optimised on this point.
I think triggering an interrupt on each write would be a lot slower than just accessing all data with the cache turned off. Interrupt handling on the SH-2 takes a lot of cycles. What would probably make more sense is to set the processors up as stages of a rendering pipeline: the first processor does the first few stages of rendering, then the second processor purges its cache for the area of memory that the first processor rendered to and does the last few stages of rendering. That way there is only one clearly defined point where the two processors need to pass data between them.
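The hand-off point described above could look like this on the consumer side. This is a rough C sketch, assuming 16-byte cache lines and the SH7604 associative purge area (the address ORed with $40000000). The helper names are hypothetical, and the actual store to the purge address is left out so the code can run on a PC:

```c
#include <stdint.h>

#define SH2_LINE_SIZE  16u          /* bytes per cache line */
#define SH2_PURGE_AREA 0x40000000u  /* associative purge area (assumption) */

/* Address to write to in order to invalidate the line holding 'addr'. */
static uint32_t sh2_purge_address(uint32_t addr)
{
    return (addr & ~(SH2_LINE_SIZE - 1u)) | SH2_PURGE_AREA;
}

/* Number of cache lines touched by the byte range [addr, addr + len). */
static uint32_t sh2_lines_in_range(uint32_t addr, uint32_t len)
{
    if (len == 0u)
        return 0u;
    uint32_t first = addr & ~(SH2_LINE_SIZE - 1u);
    uint32_t last  = (addr + len - 1u) & ~(SH2_LINE_SIZE - 1u);
    return (last - first) / SH2_LINE_SIZE + 1u;
}
```

The second CPU would loop over those lines, storing a longword to each purge address, before it reads the buffer the first CPU produced.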
Re: Cache coherency
ob1 wrote: each time a CPU write to memory, it puts the address on a stack and throws an interrupt to the other CPU. The other CPU, in turn, receives the interrupt, look at the stack, and invalidate the address. And vice versa.
... or I could use SCI ...
It depends on how much data you want to share. If you just have a few words, just use the COMM registers - it's what they're there for.
If you have more data, but it's not changed very often, SCI might be better. Any significant amount of data might be better with uncached memory - both to avoid coherency issues AND to avoid flooding the caches.
For example, Doom uses uncached access to the frame buffer (on all platforms) as it's MUCH faster due to cache issues. The difference can be from 5 to 10 times as fast using an uncached frame buffer vs a cached frame buffer.
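For the "few words" case, a COMM-register handshake might be sketched as follows. This is only a model: a plain array stands in for the eight 16-bit COMM registers (on the SH-2 side they sit at $20004020), and the flag/payload protocol and function names are assumptions invented for illustration, not something from this thread:

```c
#include <stdint.h>

/* Stand-in for COMM0..COMM7; on hardware this would be a volatile
   pointer to the real registers instead of an array. */
static volatile uint16_t comm[8];

#define COMM_FLAG 0  /* hypothetical: 0 = empty, 1 = value waiting */
#define COMM_DATA 1  /* hypothetical: the 16-bit payload */

/* Sender: returns 1 if the value was posted, 0 if the mailbox was still full. */
static int comm_post(uint16_t value)
{
    if (comm[COMM_FLAG] != 0)
        return 0;
    comm[COMM_DATA] = value;  /* write the payload first ... */
    comm[COMM_FLAG] = 1;      /* ... then raise the flag */
    return 1;
}

/* Receiver: returns 1 and stores the value in *out if one was waiting. */
static int comm_take(uint16_t *out)
{
    if (comm[COMM_FLAG] == 0)
        return 0;
    *out = comm[COMM_DATA];
    comm[COMM_FLAG] = 0;      /* release the mailbox */
    return 1;
}
```

Because the registers are not cached, neither side needs to purge anything; each CPU just polls the flag word.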
Mask of Destiny wrote: I think triggering an interrupt on each write would be a lot slower than just accessing all data with the cache turned off. Interrupt handling on the SH-2 takes a lot of cycles. ...
Yes.
It's faster to do this the correct way, e.g. by just using the cache-through address space on both SH2s for shared data.
Toshi
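The cache-through approach comes down to one address bit. Here is a small C sketch, assuming the usual SH-2 layout where ORing $20000000 into a cached address gives its cache-through mirror (so 32X SDRAM at $06000000 is also visible at $26000000); the function name is hypothetical:

```c
#include <stdint.h>

#define SH2_CACHE_THROUGH 0x20000000u

/* Cache-through (uncached) alias of a cached SH-2 address. */
static uint32_t sh2_cache_through(uint32_t cached_addr)
{
    return cached_addr | SH2_CACHE_THROUGH;
}
```

Both CPUs would then access the shared structure only through the aliased address, so neither ever holds a stale cached copy of it.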