Multi-processor program structure

Ask anything you want about 32X Mushroom programming.

Moderator: BigEvilCorporation

ammianus
Very interested
Posts: 124
Joined: Sun Jan 29, 2012 2:10 pm
Location: North America

Post by ammianus » Sun Nov 25, 2012 1:09 pm

So far my experimentation has been using a single 32X SH2 CPU for running my entire demo. Based on the example code I have, the MD and slave SH2 are idling in loops in asm code.

I want to get to the point where I have simple C programs running in the MD and slave CPU. I just don't really have a good sense of how that is done.

Ignoring MD for now...

If I just want to have the slave SH2 start in its own main(), how should I structure my C code? How do I make the initialization point to the C function it should start in?

I am using Chilly Willy's toolchain. Here is the code that starts slave(), but I don't know where exactly that is defined, or whether I can change it to my own C code.




Code:

scont:
! clear interrupt flags
        mov.l   _slave_int_clr,r1
        mov.w   r0,@-r1     /* PWM INT clear */
        mov.w   r0,@r1
        mov.w   r0,@-r1     /* CMD INT clear */
        mov.w   r0,@r1
        mov.w   r0,@-r1     /* H INT clear */
        mov.w   r0,@r1
        mov.w   r0,@-r1     /* V INT clear */
        mov.w   r0,@r1
        mov.w   r0,@-r1     /* VRES INT clear */
        mov.w   r0,@r1

        mov.l   _slave_stk,r15
! wait for Master SH2 and 68000 to finish init
        mov.l   _slave_sts,r0
        mov.l   _slave_ok,r1
1:
        mov.l   @r0,r2
        nop
        nop
        cmp/eq  r1,r2
        bt      1b

        mov.l   _slave_adapter,r1
        mov     #0x00,r0
        mov.b   r0,@(1,r1)  /* set int enables (different from master despite same address!) */
        mov     #0x20,r0
        ldc     r0,sr       /* allow ints */

! purge cache, turn it on, and run slave()
        mov.l   _slave_cctl,r0
        mov     #0x11,r1
        mov.b   r1,@r0
        mov.l   _slave_go,r0
        jmp     @r0
        nop

        .align   2
_slave_int_clr:
        .long   0x2000401E  /* one word past last int clr reg */
_slave_stk:
        .long   0x06040000  /* Cold Start SP */
_slave_sts:
        .long   0x20004024
_slave_ok:
        .ascii  "S_OK"
_slave_adapter:
        .long   0x20004000
_slave_cctl:
        .long   0xFFFFFE92
_slave_go:
        .long   _slave

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Sun Nov 25, 2012 5:39 pm

The SH2 compiler adds "_" to all labels, so "_slave" would be "slave" in C. In fact, _slave is just "void slave(void)" as you can see in many of my examples... let's look at the latest, my XM player. The slave function is in hw_32x.c:

Code:

void slave(void)
{
    uint16_t sample, ix;

    // init DMA
    SH2_DMA_SAR0 = 0;
    SH2_DMA_DAR0 = 0;
    SH2_DMA_TCR0 = 0;
    SH2_DMA_CHCR0 = 0;
    SH2_DMA_DRCR0 = 0;
    SH2_DMA_SAR1 = 0;
    SH2_DMA_DAR1 = 0x20004034; // storing a long here will set left and right
    SH2_DMA_TCR1 = 0;
    SH2_DMA_CHCR1 = 0;
    SH2_DMA_DRCR1 = 0;
    SH2_DMA_DMAOR = 1; // enable DMA

    SH2_DMA_VCR1 = 72; // set exception vector for DMA channel 1
    SH2_INT_IPRA = (SH2_INT_IPRA & 0xF0FF) | 0x0F00; // set DMA INT to priority 15

    // init the sound hardware
    MARS_PWM_MONO = 1;
    MARS_PWM_MONO = 1;
    MARS_PWM_MONO = 1;
    if (MARS_VDP_DISPMODE & MARS_NTSC_FORMAT)
        MARS_PWM_CYCLE = (((23011361 << 1)/SAMPLE_RATE + 1) >> 1) + 1; // for NTSC clock
    else
        MARS_PWM_CYCLE = (((22801467 << 1)/SAMPLE_RATE + 1) >> 1) + 1; // for PAL clock
    MARS_PWM_CTRL = 0x0185; // TM = 1, RTP, RMD = right, LMD = left

    sample = SAMPLE_MIN;
    /* ramp up to SAMPLE_CENTER to avoid click in audio (real 32X) */
    while (sample < SAMPLE_CENTER)
    {
        for (ix=0; ix<(SAMPLE_RATE*2)/(SAMPLE_CENTER - SAMPLE_MIN); ix++)
        {
            while (MARS_PWM_MONO & 0x8000) ; // wait while full
            MARS_PWM_MONO = sample;
        }
        sample++;
    }

    // initialize mixer
    MARS_SYS_COMM6 = MIXER_UNLOCKED; // sound subsystem running
    fill_buffer(&snd_buffer[0]); // fill first buffer
    slave_dma1_handler(); // start DMA

    SetSH2SR(2);
    while (1)
    {
        if (MARS_SYS_COMM4 == SSH2_WAITING)
            continue; // wait for command

        // do command in COMM4

        // done
        MARS_SYS_COMM4 = SSH2_WAITING;
    }
}
You can see it inits the DMA hardware inside the SH2, sets up the interrupt handling for DMA TE, inits the PWM audio inside the 32X, does a ramp up on the value of the PWM reg to the center value for samples, fills the first buffer, and starts the first DMA. It then enters a loop where it watches a comm register for a command to do something asynchronous to the audio.
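
The command loop at the end of that function can be tried out on a host machine. What follows is only a rough sketch of the handshake, not the XM player code: the COMM register is simulated by an ordinary variable, and CMD_WAITING, CMD_ADD_ONE, slave_poll_once() and master_send() are made-up names standing in for SSH2_WAITING, a real command code, one pass of the while (1) loop, and whatever the master does to post a command.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical command codes; the real values live in the example's headers. */
#define CMD_WAITING 0x0000
#define CMD_ADD_ONE 0x0001

/* Stand-in for a 16-bit COMM register visible to both CPUs. */
static volatile uint16_t comm4 = CMD_WAITING;
static int counter = 0;

/* One pass of the slave loop: returns 1 if a command was handled. */
static int slave_poll_once(void)
{
    uint16_t cmd = comm4;
    if (cmd == CMD_WAITING)
        return 0;               /* nothing to do yet */
    if (cmd == CMD_ADD_ONE)
        counter++;              /* do the command */
    comm4 = CMD_WAITING;        /* done: tell the master we are idle again */
    return 1;
}

/* Master side: post a command only when the slave is idle. */
static int master_send(uint16_t cmd)
{
    assert(cmd != CMD_WAITING); /* the idle marker is not a command */
    if (comm4 != CMD_WAITING)
        return 0;               /* slave still busy, try again later */
    comm4 = cmd;
    return 1;
}
```

On real hardware the variable would be the COMM register itself, and the master would have to check the return value of something like master_send() and retry, since the slave may still be busy with the previous command.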

If you just want to experiment with using the Slave SH2, just do something like this:

Code:

void slave(void)
{
// any local vars here... watch the slave sh2 stack size!

// code goes here

}
I normally set the stack size for the slave to be pretty small. In the last XM player example, the master SH2 stack starts at 0x0603FF00, which only leaves 0x100 bytes for the slave stack, which starts at 0x06040000. If you use lots of local variables, which go on the stack, leave more space by making the master SH2 stack start at a lower address. Look for all occurrences of 0x0603FF00 and change them all. This will be in the SH2 crt0.s file.
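
The arithmetic behind those numbers is easy to check on a host machine. This is just a sketch using the two addresses from the post; slave_stack_bytes() and master_top_for() are hypothetical helper names, and it assumes both stacks grow downward, so the slave's stack runs from 0x06040000 down towards the master's starting address.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses quoted in the post; both stacks are assumed to grow downward. */
#define SLAVE_STACK_TOP  0x06040000u
#define MASTER_STACK_TOP 0x0603FF00u

/* Bytes the slave can use before it runs into the master's stack start. */
static uint32_t slave_stack_bytes(uint32_t slave_top, uint32_t master_top)
{
    assert(slave_top >= master_top);
    return slave_top - master_top;
}

/* Hypothetical helper: where the master stack should start
   to leave wanted_bytes for the slave. */
static uint32_t master_top_for(uint32_t slave_top, uint32_t wanted_bytes)
{
    return slave_top - wanted_bytes;
}
```

With the values above, slave_stack_bytes() gives 0x100, and asking for a 0x400 byte slave stack gives 0x0603FC00 as the new master stack start, which is the value that would replace 0x0603FF00 in crt0.s.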

Post by ammianus » Tue Nov 27, 2012 12:06 am

Thanks again for the clear explanation. I haven't decided which work I would split out to the slave, but thinking about ob1's SuperVDP method :) .

For now I've found that slave function and moved it out into its own slave.c just for my own memory's sake.

I assume there is something similar on the MD side too?

Post by Chilly Willy » Tue Nov 27, 2012 2:57 am

ammianus wrote: Thanks again for the clear explanation. I haven't decided which work I would split out to the slave, but thinking about ob1's SuperVDP method :) .
For now I've found that slave function and moved it out into its own slave.c just for my own memory's sake.

Yeah, that would be one way to remember it. :lol:

ammianus wrote: I assume there is something similar on the MD side too?

Well, in some examples there is, and others there isn't. In most of my 32X stuff, I just copy an assembly language loop from ROM into 68000 work RAM since that keeps the 68000 off the bus. You want to do that for best speed. Since work RAM is small and the 68000 won't be doing much with most 32X programs, I figured assembly was fine. So you'll find the 68000 code in the m68k_crt1.s file... it's pretty straightforward as long as you know assembly. If you wish to use C on the 68K side as well, you want to look at some of my other examples, like the XM player. You'll find a src-md directory in that example, and the 68K side has its own crt0.s and main.c for the primary running of 68K code.

Post by ammianus » Tue Nov 27, 2012 12:57 pm

By the way does the crt#.s naming convention mean something?

I wasn't sure if that was an abbreviation of "cart"? What do the numbers indicate?

Post by Chilly Willy » Tue Nov 27, 2012 5:58 pm

ammianus wrote: By the way does the crt#.s naming convention mean something?
I wasn't sure if that was an abbreviation of "cart"? What do the numbers indicate?

C runtime - it's from gcc. They call any initial assembly needed to start an app crt#.s, where 0 is the initial code, and if you need any further code for things, the next is 1, etc. I'm just used to the terms, so I call my initial code files the same. You can call it anything you want... boot.s, startup.s, init.s, fancypants.s... whatever. :lol:

In a similar manner, most people call the file with main() in it main.c. It doesn't NEED to be called that, it's just a habit most people have made.

Post by ammianus » Sat Dec 08, 2012 1:29 pm

Had a follow-up question. I separated out the slave function into slave.c. Everything compiled and seemed to run.

I tried putting some actual logic in the slave() to print something out using the same methods I was using in main() to debug stuff using MD.

Code:

// Slave SH2 support code ----------------------------------------------
#include <stdio.h>   // sprintf
#include <stdlib.h>
#include "32x.h"
#include "hw_32x.h"

/*
 * Function called from sh2_crt0.s to start the slave CPU
 */
void slave(void)
{
    // any local vars here... watch the slave sh2 stack size!
    int spStart = 0;
    int spEnd = 0;
    char debugMem[60];

    spStart = get_stack_pointer();

    // code goes here
    while (1) {
        spEnd = get_stack_pointer();
        sprintf(debugMem, "spStart: %x. spEnd: %x.", spStart, spEnd);
        HwMdPuts(debugMem, 0x2000, 0, 0);
    }
}
This does...nothing.

What is the interaction like when I have initialized and am writing to the framebuffer in main(), and I want slave() to add something as well? Do I need to flip the framebuffer in the slave too? How do I avoid conflicts?

Post by Chilly Willy » Sat Dec 08, 2012 5:26 pm

The 32X string functions flip the buffer, draw the text, then flip it back. If you call the text routines in a loop like you do without waiting at least a few ticks, you'll see nothing because the buffer is constantly flipping. In loops where you know the print will happen rapidly, put in a delay of a couple of ticks, which will make the text readable (but still flickery).

EDIT: If you also try to print at the same time as the master, you can wind up with one or the other printing in the other framebuffer, and hence not see its output. Try to avoid printing from both at the same time... one or the other.
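
One way to get that "couple of ticks" delay without blocking the loop is to check a tick counter before each print. This is a minimal sketch, assuming some counter that goes up once per vertical blank is readable from the SH2 side; print_due() and its parameters are hypothetical names, not part of hw_32x.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when at least `interval` ticks have passed since *last_tick
   and records the new time; returns 0 otherwise.
   Unsigned subtraction keeps this correct when the counter wraps around. */
static int print_due(uint32_t now, uint32_t *last_tick, uint32_t interval)
{
    assert(last_tick != 0);
    if ((uint32_t)(now - *last_tick) < interval)
        return 0;               /* too soon, skip this print */
    *last_tick = now;
    return 1;
}
```

Inside the loop you would read the tick counter, call print_due() with an interval of a few ticks, and only call the text routine when it returns 1. That only spaces out the prints from one CPU; it does not stop the master and slave from printing at the same time.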

Post by ammianus » Sun Dec 09, 2012 11:23 pm

Isn't HwMdPuts() a Megadrive string function? If so, it shouldn't matter when or how either 32X CPU puts something in the COMM registers; the MD would then do its thing and output the text?

I'm starting to suspect something else is wrong. I've also tried putting in a delay in the slave's while loop, to no effect.

Thanks though.

Post by Chilly Willy » Sun Dec 09, 2012 11:33 pm

ammianus wrote: Isn't HwMdPuts() a Megadrive string function? If so, it shouldn't matter when or how either 32X CPU puts something in the COMM registers; the MD would then do its thing and output the text?
I'm starting to suspect something else is wrong. I've also tried putting in a delay in the slave's while loop, to no effect.
Thanks though.

Oh, right! It is an MD function... which means you can't call it from the slave since MD functions go through the master only. You want to use the 32X puts function to print from the slave.

Post by ammianus » Mon Dec 10, 2012 12:58 am

Chilly Willy wrote: Oh, right! It is an MD function... which means you can't call it from the slave since MD functions go through the master only. You want to use the 32X puts function to print from the slave.

Ah, that would explain it then. Thanks. A lot of limitations on the secondary CPU!

I wonder what kind of work divisions would be practical and efficient?

Main CPU: Game logic/ai/state, coordination of other CPUs
Slave CPU: drawing 32X graphics/sprite plane?

MD: Drawing background planes?

Assuming you didn't care too much about the music, could we offload that processing to the MD's Z80?


I wonder what could be done by trying to use 32X as you guys described in the thread about a new SVP.

Post by Chilly Willy » Mon Dec 10, 2012 1:47 am

ammianus wrote: Ah that would explain it then. Thanks. A lot of limitations on the secondary cpu!

The limitation is in my code... I really never thought to make the MD code respond to the slave as well as the master. It should be easy enough, I just figured the slave had better things to do, and since the master was running the game code, it would be the one to communicate with the MD side.

ammianus wrote: I wonder what kind of work divisions would be practical and efficient?

That's the $1,000,000 question when working on the 32X - getting as much out of ALL the processors as you can. My current 32X example code tends to be geared more towards "easier" than "more efficient" - the 68000 does very little most of the time, only reading the pads and updating the ticks in the vertical blank. You can have it do more with the MD side commands related to MD video, but I rarely tend to do that. I mostly use the slave SH2 for sound/music. My interrupt-driven DMA PWM should make it easier to run more on the slave, so you might look at that more closely.

ammianus wrote:
Main CPU: Game logic/ai/state, coordination of other CPUs
Slave CPU: drawing 32X graphics/sprite plane?
MD: Drawing background planes?
Assuming you didn't care too much about the music, could we offload that processing to the MD's Z80?

Yes, if you are doing plain FM music, using the Z80 would be good for that. Or the 68000, given it's doing little else. My first examples for the 32X kinda assumed the 68000 would do the music, and the 68000 loop checked the FM timers, calling music routines when the timers expired. I didn't actually have any music code - the routines did nothing, but it was just example code.

ammianus wrote: I wonder what could be done by trying to use 32X as you guys described in the thread about a new SVP.

Using the 32X as a coprocessor to the main game running on the 68000 is how most of the 32X games out there work. The game mostly runs on the 68000, while small snippets of code in the SH2 SDRAM do things like rotate sprites. One of the few games that is not written that way is Doom - the code mostly runs on the SH2 like my examples.

Post by ammianus » Tue Dec 11, 2012 4:07 am

Chilly Willy wrote: The limitation is in my code... I really never thought to make the MD code respond to the slave as well as the master. It should be easy enough, I just figured the slave had better things to do, and since the master was running the game code, it would be the one to communicate with the MD side.

Thanks for the explanation. I'm trying to draw to the framebuffer from the slave now. I am using your hw_32x.h/c functions to init the VDP from the slave with MARS_VDP_MODE_256.

Of course I don't see anything. I don't draw anything from the master, but I do use the MD print functions from that side, so I know it is running at least. The VDP debug view in some emulators shows nothing in the framebuffer.

Is there anything else you need to "do" to enable drawing from the slave? I haven't detected anything in the docs.
Last edited by ammianus on Tue Dec 11, 2012 7:58 pm, edited 1 time in total.

Post by Chilly Willy » Tue Dec 11, 2012 7:48 pm

If you mean on the 32X side, no, the slave has full access to the frame buffers. You DID init the 32X frame buffers with Hw32xInit(), right? Passing the proper values like my examples?

Are you setting the FM bit somewhere (either the MD code or the SH2 code)? Because the Hw32xInit() waits on FM (to make sure the 68000 isn't using the frame buffer). If neither side is setting FM, calling Hw32xInit() will hang whichever processor calls it.
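
If you suspect that kind of hang, one way to test it is to poll the bit yourself with a spin limit before calling the init, so a missing FM shows up as a timeout instead of a freeze. This is only a sketch run against a fake register: wait_for_bit() is a hypothetical helper, and the 0x8000 mask for FM is an assumption you should check against the definitions in your own 32x.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mask for the FM (frame buffer access) bit; verify against 32x.h. */
#define FM_BIT 0x8000u

/* Poll *reg until (*reg & mask) is non-zero or max_spins reads have been made.
   Returns 1 if the bit was seen, 0 on timeout. */
static int wait_for_bit(const volatile uint16_t *reg, uint16_t mask,
                        uint32_t max_spins)
{
    assert(reg != 0);
    while (max_spins-- > 0) {
        if (*reg & mask)
            return 1;
    }
    return 0;
}
```

On the 32X you would pass the address of the register that holds FM instead of a test variable. A return of 0 would suggest nobody has handed the frame buffer to the SH2 side yet, which matches the hang described above.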

Post by ammianus » Tue Dec 11, 2012 9:02 pm

Chilly Willy wrote: If you mean on the 32X side, no, the slave has full access to the frame buffers. You DID init the 32X frame buffers with Hw32xInit(), right? Passing the proper values like my examples?

Yes, I believe so. I do it from slave(), however. Should I do it from the master before the slave starts going?

Chilly Willy wrote: Are you setting the FM bit somewhere (either the MD code or the SH2 code)? Because the Hw32xInit() waits on FM (to make sure the 68000 isn't using the frame buffer). If neither side is setting FM, calling Hw32xInit() will hang whichever processor calls it.

I don't think so, but I can look when I get home. Basically I removed all Hw*() functions from the master code, except for where it makes MD print function calls. I guess I could remove those as well.

Otherwise the 68000 should just be running the ASM loop that you have in your examples.
