So I'm thinking about porting a multi-threaded program I'm working on (an interpreter for a dataflow language, if you're curious) to the 32X (or perhaps the Saturn), and I've been trying to think of a sane way to use both processors reasonably efficiently. I think I have a workable solution, but I'd like some feedback to make sure I haven't missed anything obvious. So here goes:
Threads locked to the processor they started on
Code and stack accessed through cached memory region
Globals and heap accessed through non-cached memory region
The logic is that code is read-only, so there are no coherency problems there. Each stack will only be touched by the single thread it belongs to, and since threads can't move between processors, only one processor will ever look at a given stack. Globals and heap, on the other hand, are potentially shared by all threads, and manually flushing everything would be difficult to do properly and probably not very performant (apart from some special cases where the data is mostly read-only, but I can handle those as exceptions if there's enough performance to be gained).
Will this approach work? Does it sound like a good compromise between performance and complexity?
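To make the cached/non-cached split concrete, here is a minimal sketch of the address aliasing the plan relies on. It assumes the SH-2 convention that setting bit 29 of an address selects the cache-through alias of the same memory (so, on the 32X, SDRAM at 0x06000000 would also be visible uncached at 0x26000000). The macro and function names are made up for illustration and don't come from any SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: on the SH-2, address bits 31-29 select how the cache treats an
   access. All three bits clear means "cached"; bit 29 set means
   "cache-through" (the same memory, bypassing the cache). */
#define SH2_CACHE_THROUGH 0x20000000u
#define SH2_REGION_MASK   0xE0000000u

/* Hypothetical helper: cached address -> cache-through alias.
   Used for globals and heap, which may be shared between the two CPUs. */
static inline uint32_t to_uncached(uint32_t addr)
{
    assert((addr & SH2_REGION_MASK) == 0); /* input must be a cached address */
    return addr | SH2_CACHE_THROUGH;
}

/* Hypothetical helper: cache-through alias -> cached address.
   Used for code and per-thread stacks, which stay private to one CPU. */
static inline uint32_t to_cached(uint32_t addr)
{
    return addr & ~SH2_CACHE_THROUGH;
}
```

In practice the split would more likely be done in the linker script (code and stack sections placed at cached addresses, data/bss and the heap at the cache-through alias) rather than by converting pointers at run time, but the arithmetic is the same.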
Cache Coherency Sanity Check
Moderator: BigEvilCorporation
Re: Cache Coherency Sanity Check
Mask of Destiny wrote: Will this approach work? Does it sound like a good compromise between performance and complexity?
In fact, because of the cache problem, it's really recommended to avoid sharing data between the main and slave SH2 CPUs as much as possible. So having them work on different threads sounds like the best (maybe the only?) way to use them to their full potential. Avoid globals and heap accesses as much as you can. I've never coded on the 32X, but I've often heard that cache coherency causes a lot of trouble when you want to use both CPUs at the same time!