DMA to PWM

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Thu Dec 10, 2009 8:35 am

I've been working on a test app to see if I can get DMA sound working. The code I have works on Fusion, but not Gens/GS, and not on real hardware. The MARS Check rom works on my 32X, and the Master and Slave DMA PWM tests pass just fine, but I can't see what magic they're doing to get it working.

From what I can see, it's waiting on TE - forever. The DMA is enabled, and I'm setting RTP, so it should be getting DREQs. I'm puzzled, to say the least.

Snake
Very interested
Posts: 206
Joined: Sat Sep 13, 2008 1:01 am

Post by Snake » Tue Jan 12, 2010 3:29 am

Here's some (crappy) code cut-and-pasted from a test program I wrote:

	mov.l	#$ffffff90,r0		; set DMAC src
	mov.l	#$22000000,r1
	mov.l	r1,@r0

	mov.l	#$ffffff94,r0		; set DMAC dst
	mov.l	#$20004038,r1		; (PWM mono register)
	mov.l	r1,@r0

	mov.l	#$ffffff98,r0		; set DMAC len
	mov.l	#$20000,r1
	mov.l	r1,@r0

	mov.l	#$ffffff9c,r0		; DMAC control:
	mov.l	@r0,r1			; read (make sure TE is clear)
	mov.l	#$14e1,r1		; and set the various mode bits.
	mov.l	r1,@r0

	mov.l	#$ffffffb0,r0		; DMAC operation:
	mov.l	@r0,r1			; read (to clear various bits)
	mov.l	#$1,r1			; and enable.
	mov.l	r1,@r0

	mov.l	#$20004032,r0		; set PWM frequency
	mov.w	#$0400,r1
	mov.w	r1,@r0

	mov.l	#$20004038,r0		; shove some crap in the PWM fifo
	mov.w	#0,r1			; to make sure it starts requesting new data
	mov.w	r1,@r0
	mov.w	r1,@r0
	mov.w	r1,@r0

	mov.l	#$20004030,r0		; start PWM
	mov.w	#$0185,r1
	mov.w	r1,@r0
Should work...
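
For readers wondering what "the various mode bits" are: the sketch below rebuilds the two control values seen in this thread ($14E1 above for word transfers to the mono register, and 0x18E1 for longword transfers) from named fields. The field positions are an assumption based on the usual SH-2 DMAC channel control register layout and on the comments in the code posted in this thread, so check them against the hardware manual before relying on them. The macro names and chcr_pwm() are made up for illustration.

```c
#include <stdint.h>

/* Assumed CHCR field positions (verify against the SH-2 hardware manual). */
#define CHCR_DM(x)  ((uint32_t)(x) << 14)  /* destination mode: 0 = fixed        */
#define CHCR_SM(x)  ((uint32_t)(x) << 12)  /* source mode: 1 = increment         */
#define CHCR_TS(x)  ((uint32_t)(x) << 10)  /* size: 0 = byte, 1 = word, 2 = long */
#define CHCR_AR     (1u << 9)              /* auto-request (clear = use DREQ)    */
#define CHCR_AL     (1u << 7)              /* DACK level                         */
#define CHCR_DS     (1u << 6)              /* DREQ detected by edge              */
#define CHCR_DL     (1u << 5)              /* DREQ rising edge when DS is set    */
#define CHCR_TB     (1u << 4)              /* burst (clear = cycle-steal)        */
#define CHCR_TA     (1u << 3)              /* single address (clear = dual)      */
#define CHCR_IE     (1u << 2)              /* interrupt enable                   */
#define CHCR_TE     (1u << 1)              /* transfer end flag                  */
#define CHCR_DE     (1u << 0)              /* DMA enable                         */

/* Hypothetical helper: control word for a PWM feed with the given size code. */
static uint32_t chcr_pwm(unsigned size_code)
{
    return CHCR_DM(0) | CHCR_SM(1) | CHCR_TS(size_code)
         | CHCR_AL | CHCR_DS | CHCR_DL | CHCR_DE;
}
```

With these definitions, chcr_pwm(1) gives $14E1 and chcr_pwm(2) gives 0x18E1, so the only difference between the two values is the transfer size.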

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Tue Jan 12, 2010 6:35 am

This is what I'm doing in my test - modified slightly given Snake's last post, but it still doesn't work on real hardware. It works in FUSION, but on real hardware, it acts like it isn't getting any DMA requests.

void slave(void)
{
    // init DMA
    SH2_DMA_SAR0 = 0;
    SH2_DMA_DAR0 = 0;
    SH2_DMA_TCR0 = 0;
    SH2_DMA_CHCR0 = 0;
    SH2_DMA_DRCR0 = 0;
    SH2_DMA_SAR1 = 0;
    SH2_DMA_DAR1 = 0;
    SH2_DMA_TCR1 = 0;
    SH2_DMA_CHCR1 = 0;
    SH2_DMA_DRCR1 = 0;
    SH2_DMA_DMAOR = 0; // disable DMA

    // init the sound hardware
    MARS_PWM_CTRL = 0x0185; // TM = 1, RTP, RMD = right, LMD = left
    if (MARS_VDP_DISPMODE & MARS_NTSC_FORMAT)
        MARS_PWM_CYCLE = 23011361/44100 + 1; // 44.1kHz for NTSC clock
    else
        MARS_PWM_CYCLE = 22801467/44100 + 1; // 44.1kHz for PAL clock

    while (1)
    {
        // only do sound when sound subsystem initialized
        while (MARS_SYS_COMM4 != 0)
        {
            unsigned long tmp;

            if (MARS_SYS_COMM4 == 1)
            {
                // prime the pwm channel to get it requesting data
                MARS_SYS_COMM4 = 2;
                MARS_PWM_MONO = 1;
                MARS_PWM_MONO = 1;
                MARS_PWM_MONO = 1;
            }

            // start DMA on first buffer and fill second
            SH2_DMA_SAR1 = (unsigned long)sndbuf | 0x20000000;
            SH2_DMA_DAR1 = 0x20004034; // storing a long here will set left and right
            SH2_DMA_TCR1 = NUM_SAMPS; // number longs
            tmp = SH2_DMA_CHCR1; // read to make sure TE clear
            SH2_DMA_CHCR1 = 0x18E1; // dest fixed, src incr, size long, ext req, dack mem to dev, dack hi, dack edge, dreq rising edge, cycle-steal, dual addr, intr disabled, clear TE, dma enabled
            tmp = SH2_DMA_DMAOR; // read to clear various bits
            SH2_DMA_DMAOR = 1; // enable DMA

            FillSoundBuff(NUM_SAMPS);

            // wait on DMA
            while (!(SH2_DMA_CHCR1 & 2)) ; // wait on TE
            SH2_DMA_CHCR1 = 0x18E0; // clear TE, dma disabled
            SH2_DMA_DMAOR = 0; // disable DMA

            // start DMA on second buffer and fill first
            SH2_DMA_SAR1 = ((unsigned long)sndbuf + NUM_SAMPS * 2 * 2) | 0x20000000;
            SH2_DMA_DAR1 = 0x20004034; // storing a long here will set left and right
            SH2_DMA_TCR1 = NUM_SAMPS; // number longs
            tmp = SH2_DMA_CHCR1; // read to make sure TE clear
            SH2_DMA_CHCR1 = 0x18E1; // dest fixed, src incr, size long, ext req, dack mem to dev, dack hi, dack edge, dreq rising edge, cycle-steal, dual addr, intr disabled, clear TE, dma enabled
            tmp = SH2_DMA_DMAOR; // read to clear various bits
            SH2_DMA_DMAOR = 1; // enable DMA

            FillSoundBuff(0);

            // wait on DMA
            while (!(SH2_DMA_CHCR1 & 2)) ; // wait on TE
            SH2_DMA_CHCR1 = 0x18E0; // clear TE, dma disabled
            SH2_DMA_DMAOR = 0; // disable DMA
        }
    }
}

Snake
Very interested
Posts: 206
Joined: Sat Sep 13, 2008 1:01 am

Post by Snake » Tue Jan 12, 2010 5:01 pm

Hmm. Well, my test code was for real hardware and did work. Try filling your buffer with random crap before you start - maybe the first dma is happening and it's just getting stuck in your end detection code. I can't see why that wouldn't work, but I've only tried this with interrupts, so I'm not 100% sure.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Tue Jan 12, 2010 5:11 pm

Snake wrote:Hmm. Well, my test code was for real hardware and did work. Try filling your buffer with random crap before you start - maybe the first dma is happening and it's just getting stuck in your end detection code. I can't see why that wouldn't work, but I've only tried this with interrupts, so I'm not 100% sure.
Thanks! I'll give it a try. Maybe try making the DMA int driven as well to see if that changes something. One thing I noticed in SEGA's test code they use in the diagnostic cart is they actually look for a timeout in the loop to check the DMA done. Maybe there's a bug in these SH2s that doesn't always set TE on the end of the transfer. In my code, if they miss just one, it's stuck forever.
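
A bounded wait along the lines of the timeout idea above could look like the sketch below. This is not SEGA's diagnostic code; wait_te_timeout() and the poll limit are hypothetical, and the register is passed in as a pointer so the same function can be tried on a host machine with an ordinary variable standing in for SH2_DMA_CHCR1.

```c
#include <stdint.h>

#define CHCR_TE 0x2u  /* transfer-end bit, the same bit the loops in this thread test */

/* Poll *chcr until TE is set or max_polls reads have been made.
   Returns 1 if TE was seen, 0 if the poll budget ran out first. */
static int wait_te_timeout(volatile uint32_t *chcr, uint32_t max_polls)
{
    while (max_polls--) {
        if (*chcr & CHCR_TE)
            return 1;
    }
    return 0;
}
```

On a timeout the caller can reprogram the channel instead of hanging, which avoids the "miss just one and it's stuck forever" failure mode. A sensible poll limit depends on buffer size and CPU speed and would need to be measured.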

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Tue Jan 12, 2010 7:05 pm

I tried interrupt driven DMA... same thing. Never ends. To be sure I wasn't getting a transfer error, I added a check for that.

            while (!(SH2_DMA_CHCR1 & 2) && !(SH2_DMA_DMAOR & 6)) ; // wait on TE, AE, or NMIF
Note - when interrupt driven, it used a check for a flag set in one of the comm registers by the dma interrupt.
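
The combined test in that loop can also be split so the caller knows why the wait ended. The sketch below is only an illustration: dma_check() and the enum are hypothetical names, and the masks are the ones used in the line above (TE = 2 in CHCR, AE/NMIF = 6 in DMAOR). It takes plain values, so on target you would pass in the two register reads.

```c
#include <stdint.h>

enum dma_result { DMA_BUSY, DMA_DONE, DMA_ERROR };

/* Classify a snapshot of the channel control and operation registers. */
static enum dma_result dma_check(uint32_t chcr, uint32_t dmaor)
{
    if (dmaor & 0x6u)   /* AE or NMIF set: the transfer was aborted */
        return DMA_ERROR;
    if (chcr & 0x2u)    /* TE set: the transfer count reached zero */
        return DMA_DONE;
    return DMA_BUSY;
}
```

A polling loop would then spin while dma_check() returns DMA_BUSY and branch on the final result.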

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Tue Jan 12, 2010 8:48 pm

Okay, my bad... my code was working fine. The problem was that I forgot that "volatile" does nothing for caches. :oops:

The slave was never getting the sample to play. That doesn't affect emulators, but it's crucial on real hardware. Accessing the shared variables through volatile pointers ORed with 0x20000000 cleared things up. :D
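
As a rough sketch of that fix: "volatile" only stops the compiler from keeping a value in a register, while ORing the address with 0x20000000 makes the access go through the SH-2's cache-through alias so each CPU sees what the other actually wrote. The helper and macro names below are hypothetical, and the example address is just an illustrative SDRAM location.

```c
#include <stdint.h>

#define CACHE_THROUGH 0x20000000u

/* Address arithmetic only, so it can be checked on any host. */
static uint32_t cache_through_addr(uint32_t addr)
{
    return addr | CACHE_THROUGH;
}

/* Target-side access macro (assumes the 32-bit SH-2 address space):
   reads and writes 'var' through the uncached alias. */
#define UNCACHED_INT(var) \
    (*(volatile int *)((uintptr_t)&(var) | CACHE_THROUGH))
```

For example, a variable at 0x06000100 would be accessed at 0x26000100. On target, something like UNCACHED_INT(gAudioLen) would then replace a plain read of gAudioLen on both CPUs.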

Here's my little test app - just press A or B to play a sound.

DMAAudioTest32X.zip

EDIT: the slave() in the above has one bit of test code that isn't needed that I forgot about. You don't need to worry about setting the control reg after priming the PWM. You can do that before. Here's the way I do it (tested to be sure it still works on real hardware):

void slave(void)
{
    // init DMA
    SH2_DMA_SAR0 = 0;
    SH2_DMA_DAR0 = 0;
    SH2_DMA_TCR0 = 0;
    SH2_DMA_CHCR0 = 0;
    SH2_DMA_DRCR0 = 0;
    SH2_DMA_SAR1 = 0;
    SH2_DMA_DAR1 = 0;
    SH2_DMA_TCR1 = 0;
    SH2_DMA_CHCR1 = 0;
    SH2_DMA_DRCR1 = 0;

    // init the sound hardware
    MARS_PWM_CTRL = 0x0185; // TM = 1, RTP, RMD = right, LMD = left
    if (MARS_VDP_DISPMODE & MARS_NTSC_FORMAT)
        MARS_PWM_CYCLE = 23011361/44100 + 1; // 44.1kHz for NTSC clock
    else
        MARS_PWM_CYCLE = 22801467/44100 + 1; // 44.1kHz for PAL clock

    while (1)
    {
        // only do sound when sound subsystem initialized
        while (MARS_SYS_COMM4 != 0)
        {
            unsigned long tmp;

            if (MARS_SYS_COMM4 == 1)
            {
                MARS_SYS_COMM4 = 2;
                MARS_PWM_MONO = 1;
                MARS_PWM_MONO = 1;
                MARS_PWM_MONO = 1;
            }

            // start DMA on first buffer and fill second
            SH2_DMA_SAR1 = (unsigned long)sndbuf | 0x20000000;
            SH2_DMA_DAR1 = 0x20004034; // storing a long here will set left and right
            SH2_DMA_TCR1 = NUM_SAMPS; // number longs
            tmp = SH2_DMA_CHCR1; // read to make sure TE clear
            SH2_DMA_CHCR1 = 0x18E1; // dest fixed, src incr, size long, ext req, dack mem to dev, dack hi, dack edge, dreq rising edge, cycle-steal, dual addr, intr disabled, clear TE, dma enabled
            tmp = SH2_DMA_DMAOR; // read to clear various bits
            SH2_DMA_DMAOR = 1; // enable DMA

            FillSoundBuff(NUM_SAMPS);

            // wait on DMA
            while (!(SH2_DMA_CHCR1 & 2) && !(SH2_DMA_DMAOR & 6)) ; // wait on TE, AE, or NMIF
            SH2_DMA_CHCR1 = 0x18E0; // clear TE, dma disabled
            SH2_DMA_DMAOR = 0; // disable DMA

            // start DMA on second buffer and fill first
            SH2_DMA_SAR1 = ((unsigned long)sndbuf + NUM_SAMPS * 2 * 2) | 0x20000000;
            SH2_DMA_DAR1 = 0x20004034; // storing a long here will set left and right
            SH2_DMA_TCR1 = NUM_SAMPS; // number longs
            tmp = SH2_DMA_CHCR1; // read to make sure TE clear
            SH2_DMA_CHCR1 = 0x18E1; // dest fixed, src incr, size long, ext req, dack mem to dev, dack hi, dack edge, dreq rising edge, cycle-steal, dual addr, intr disabled, clear TE, dma enabled
            tmp = SH2_DMA_DMAOR; // read to clear various bits
            SH2_DMA_DMAOR = 1; // enable DMA

            FillSoundBuff(0);

            // wait on DMA
            while (!(SH2_DMA_CHCR1 & 2) && !(SH2_DMA_DMAOR & 6)) ; // wait on TE, AE, or NMIF
            SH2_DMA_CHCR1 = 0x18E0; // clear TE, dma disabled
            SH2_DMA_DMAOR = 0; // disable DMA
        }
    }
}
I should probably check if some of the "extra" stuff added (like reading certain regs to clear bits) is really needed. Also, FillSoundBuff() could be made more efficient. This was merely a test to figure out how to do DMA audio. Have to say, it sounds nice - I'm playing 9-bit stereo samples at 44100 Hz. The source sounds are 16-bit LE stereo samples at 44100 Hz.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Tue Jan 12, 2010 9:26 pm

A little bit of trimming plus a more efficient FillSoundBuff():

void FillSoundBuff(int offset)
{
	int ix;
	short samp;
	short *buf = (short *)(((unsigned long)sndbuf + offset * 4) | 0x20000000);
	volatile int *aud_len = (int *)((unsigned int)&gAudioLen | 0x20000000);
	volatile int *aud_buf = (int *)((unsigned int)&gAudioBuf | 0x20000000);
	unsigned char *buffer = (unsigned char *)*aud_buf;
	int iy = *aud_len;

	if (iy >= NUM_SAMPS)
	{
		for (ix=0; ix<NUM_SAMPS; ix++)
		{
			samp = buffer[0] | (buffer[1] << 8);
			buf[0] = (samp >> 7) + 258;
			samp = buffer[2] | (buffer[3] << 8);
			buf[1] = (samp >> 7) + 258;
			buffer += 4;
			buf += 2;
		}
		*aud_buf += NUM_SAMPS * 4;
		*aud_len -= NUM_SAMPS;
	}
	else
	{
		for (ix=0; ix<iy; ix++)
		{
			samp = buffer[0] | (buffer[1] << 8);
			buf[0] = (samp >> 7) + 258;
			samp = buffer[2] | (buffer[3] << 8);
			buf[1] = (samp >> 7) + 258;
			buffer += 4;
			buf += 2;
		}
		for (ix=iy; ix<NUM_SAMPS; ix++)
		{
			buf[0] = 258;
			buf[1] = 258;
			buf += 2;
		}
		*aud_len = 0;
	}
}

void slave(void)
{
    // init DMA
    SH2_DMA_SAR0 = 0;
    SH2_DMA_DAR0 = 0;
    SH2_DMA_TCR0 = 0;
    SH2_DMA_CHCR0 = 0;
    SH2_DMA_DRCR0 = 0;
    SH2_DMA_SAR1 = 0;
    SH2_DMA_DAR1 = 0;
    SH2_DMA_TCR1 = 0;
    SH2_DMA_CHCR1 = 0;
    SH2_DMA_DRCR1 = 0;
    SH2_DMA_DMAOR = 1; // enable DMA

    // init the sound hardware
    MARS_PWM_CTRL = 0x0185; // TM = 1, RTP, RMD = right, LMD = left
    if (MARS_VDP_DISPMODE & MARS_NTSC_FORMAT)
        MARS_PWM_CYCLE = 23011361/44100 + 1; // 44.1kHz for NTSC clock
    else
        MARS_PWM_CYCLE = 22801467/44100 + 1; // 44.1kHz for PAL clock

    while (1)
    {
        // only do sound when sound subsystem initialized
        while (MARS_SYS_COMM4 != 0)
        {
            if (MARS_SYS_COMM4 == 1)
            {
                MARS_SYS_COMM4 = 2;
                MARS_PWM_MONO = 1;
                MARS_PWM_MONO = 1;
                MARS_PWM_MONO = 1;
            }

            // start DMA on first buffer and fill second
            SH2_DMA_SAR1 = (unsigned long)sndbuf | 0x20000000;
            SH2_DMA_DAR1 = 0x20004034; // storing a long here will set left and right
            SH2_DMA_TCR1 = NUM_SAMPS; // number longs
            SH2_DMA_CHCR1 = 0x18E1; // dest fixed, src incr, size long, ext req, dack mem to dev, dack hi, dack edge, dreq rising edge, cycle-steal, dual addr, intr disabled, clear TE, dma enabled

            FillSoundBuff(NUM_SAMPS);

            // wait on DMA
            while (!(SH2_DMA_CHCR1 & 2)) ; // wait on TE
            SH2_DMA_CHCR1 = 0x18E0; // clear TE, dma disabled

            // start DMA on second buffer and fill first
            SH2_DMA_SAR1 = ((unsigned long)sndbuf + NUM_SAMPS * 4) | 0x20000000;
            SH2_DMA_DAR1 = 0x20004034; // storing a long here will set left and right
            SH2_DMA_TCR1 = NUM_SAMPS; // number longs
            SH2_DMA_CHCR1 = 0x18E1; // dest fixed, src incr, size long, ext req, dack mem to dev, dack hi, dack edge, dreq rising edge, cycle-steal, dual addr, intr disabled, clear TE, dma enabled

            FillSoundBuff(0);

            // wait on DMA
            while (!(SH2_DMA_CHCR1 & 2)) ; // wait on TE
            SH2_DMA_CHCR1 = 0x18E0; // clear TE, dma disabled
        }
    }
}
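
The arithmetic in the code above can be checked on a host machine. The sketch below pulls out the two calculations: the PWM cycle value from slave() and the 16-bit to PWM conversion from FillSoundBuff(). The function names are hypothetical, and the conversion assumes that right-shifting a negative value is an arithmetic shift, which is implementation-defined in C but is what the code above relies on.

```c
#include <stdint.h>

/* Cycle register value for a given PWM clock and sample rate,
   written the same way as in slave(): clock / rate + 1. */
static uint16_t pwm_cycle(uint32_t clock_hz, uint32_t rate_hz)
{
    return (uint16_t)(clock_hz / rate_hz + 1);
}

/* Signed 16-bit sample -> PWM value, as in FillSoundBuff():
   keep the top 9 bits, then add the 258 offset used there. */
static uint16_t pwm_sample(int16_t samp)
{
    return (uint16_t)((samp >> 7) + 258);
}
```

With the clock constants from slave(), this gives a cycle of 522 for NTSC and 518 for PAL, and converted samples range from 2 (for -32768) through 258 (silence) to 513 (for 32767), which stays below either cycle value.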

Snake
Very interested
Posts: 206
Joined: Sat Sep 13, 2008 1:01 am

Post by Snake » Tue Jan 12, 2010 9:59 pm

Bah!

Well I'm glad you got it working, but I just wrote you a test app and I guess you don't need it now :lol: :cry: I thought it must be something simple because this really isn't that hard.

I did discover that I've broken support for stereo dma in Fusion at some point (it works but sounds rough). Dunno when I did that but I shall investigate and fix.

[edit] you can trim your code even further. Once you get the signal that the DMA has finished, all you need to do is write source address, length and do the $18E1 thing again. No need to write $18E0 or set the dest address again.
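
A sketch of that trimmed sequence, using a plain struct in place of the real registers so it can be run anywhere: the struct mirrors the order of the channel 1 registers in the assembly earlier in the thread (source, destination, length, control), but struct dma_chan, dma_setup() and dma_restart() are hypothetical names, not part of any SDK.

```c
#include <stdint.h>

/* Stand-in for one DMAC channel's registers. */
struct dma_chan {
    volatile uint32_t sar;   /* source address      */
    volatile uint32_t dar;   /* destination address */
    volatile uint32_t tcr;   /* transfer count      */
    volatile uint32_t chcr;  /* channel control     */
};

/* Done once, outside the loop: the destination never changes. */
static void dma_setup(struct dma_chan *ch, uint32_t dest)
{
    ch->dar = dest;
}

/* Done per buffer, after the end-of-transfer signal has been seen:
   source, length, then the 0x18E1 write that clears TE and sets DE together. */
static void dma_restart(struct dma_chan *ch, uint32_t src, uint32_t count)
{
    ch->sar = src;
    ch->tcr = count;
    ch->chcr = 0x18E1u;
}
```

On hardware, the single 0x18E1 write works as a restart only after TE has been read as set, which the wait loop already does.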

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Tue Jan 12, 2010 10:35 pm

Snake wrote:Bah!

Well I'm glad you got it working, but I just wrote you a test app and I guess you don't need it now :lol: :cry: I thought it must be something simple because this really isn't that hard.

I did discover that I've broken support for stereo dma in Fusion at some point (it works but sounds rough). Dunno when I did that but I shall investigate and fix.

[edit] you can trim your code even further. Once you get the signal that the DMA has finished, all you need to do is write source address, length and do the $18E1 thing again. No need to write $18E0 or set the dest address again.
Yeah, I can see that the dest address should be pulled outside the loop as it never changes. As to the 0x18E0, I guess when I write 0x18E1, that clears TE at the same time. I wasn't sure if you were allowed to clear TE and set DE in the same write.

It was funny - I've been doing DMA for quite some time and not realizing it. Usually, I'm good with remembering about the caches, but this time I totally blanked on it.

I noticed that Fusion was a bit rough on the sound, but wasn't sure where the issue came from. It may have to do with the fact that I'm doing LONGWORD DMA to the left channel, counting on the fact that the right channel immediately follows it in the map. So I'd check long stores to the 32X IO to see if it's properly doing a word to left and a word to right.
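
To make the longword trick concrete: the SH-2 is big-endian, so a 32-bit store to the left channel register puts the upper 16 bits in the left register and the lower 16 bits in the right register that follows it. The sketch below shows the packing as plain arithmetic; pack_stereo() is a hypothetical helper, and the thread's code gets the same layout by storing two shorts back to back in the sound buffer.

```c
#include <stdint.h>

/* Build the longword that one DMA transfer writes to the left/right pair:
   left sample in the high half, right sample in the low half. */
static uint32_t pack_stereo(uint16_t left, uint16_t right)
{
    return ((uint32_t)left << 16) | right;
}
```

An emulator handling a long store to that address therefore needs to split it into a word for the left channel followed by a word for the right channel.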

Snake
Very interested
Posts: 206
Joined: Sat Sep 13, 2008 1:01 am

Post by Snake » Wed Jan 13, 2010 11:56 pm

Yeah, it'll be something silly that I've done by accident. It worked when I originally wrote it, because that's what my test program was for ;) But I guess since nothing ever uses this I didn't notice when it got broken.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Thu Jan 14, 2010 12:19 am

Yeah, it's odd that nothing uses DMA audio when that's clearly what you want. If you look at my double-buffer routine for the DMA audio, it's going to use less CPU time than interrupt-driven audio, and it gives more time to generate samples than polling the PWM FIFO. While polling seems to have less overhead, that overhead is paid for every sample. The overhead on the DMA double-buffer is per buffer, not per sample.

You have at least 1 audio sample period to respond to the end of the DMA transfer (and perhaps as many as 3), and you have at least #samples-3 sample periods to fill the buffer. It does introduce a latency equal to the #samples * sample period, but for reasonably sized buffers, it's not going to be noticeable.
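
Those figures are easy to put numbers on. The sketch below computes the added latency and the worst-case fill budget in microseconds; the function names are hypothetical, and the buffer size of 1024 in the note after the code is only an example, since NUM_SAMPS is not given in the thread.

```c
#include <stdint.h>

/* Time covered by one buffer of num_samps samples, in microseconds
   (rounded down). This is the latency the double buffer adds. */
static uint32_t buffer_latency_us(uint32_t num_samps, uint32_t rate_hz)
{
    return (uint32_t)(((uint64_t)num_samps * 1000000u) / rate_hz);
}

/* Worst-case time available to fill the other buffer: the "#samples - 3"
   sample periods mentioned above. Requires num_samps >= 3. */
static uint32_t fill_budget_us(uint32_t num_samps, uint32_t rate_hz)
{
    return buffer_latency_us(num_samps - 3, rate_hz);
}
```

At 44100 Hz one sample period is about 22 microseconds, so with a 1024-sample buffer the latency comes out at roughly 23.2 ms and the fill budget at roughly 23.15 ms.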

I'm going to work this into my next update to Wolf32X now that I've got all the kinks worked out in my example. I wanted it in there originally, but had trouble getting it to work, so I went with polling the PWM FIFO. That left enough time for adding several sound effects together, but nothing else. Double-buffered DMA will give me a lot more time, so I'll probably add MIDI playing to it.

Snake
Very interested
Posts: 206
Joined: Sat Sep 13, 2008 1:01 am

Post by Snake » Thu Jan 14, 2010 12:26 am

It's really annoying. Given that none of the RAM in the 32X is really fast enough, DMA is a complete waste of time - except for this purpose, where it's - well - awesome. The perfect solution, and as you say, the PWM quality is outstanding. But instead we get low quality audio because nobody wanted to waste the CPU time doing this any other way.

I believe the reason it's not used in pretty much every game is because it wasn't made at all obvious that you could do this. I'd definitely have used it myself had I known it could be done back then.

Real shame...

Graz
Very interested
Posts: 81
Joined: Thu Aug 23, 2007 12:36 am
Location: Orlando, FL

Post by Graz » Thu Jan 14, 2010 12:45 am

I started with the leaked official docs, and it seemed pretty much obvious to me. I thought it was the recommended way to do things.

Snake
Very interested
Posts: 206
Joined: Sat Sep 13, 2008 1:01 am

Post by Snake » Thu Jan 14, 2010 12:47 am

Graz wrote:I started with the leaked official docs, and it seemed pretty much obvious to me. I thought it was the recommended way to do things.
Well, I can tell you that the docs available at the time didn't make it at all clear :(
