DMA to PWM
-
- Very interested
- Posts: 2984
- Joined: Fri Aug 17, 2007 9:33 pm
I've been working on a test app to see if I can get DMA sound working. The code I have works on Fusion, but not Gens/GS, and not on real hardware. The MARS Check rom works on my 32X, and the Master and Slave DMA PWM tests pass just fine, but I can't see what magic they're doing to get it working.
From what I can see, it's waiting on TE - forever. The DMA is enabled, and I'm setting RTP, so it should be getting DREQs. I'm puzzled, to say the least.
Here's some (crappy) code, cut and pasted from a test program I wrote:
Should work...
Code:
mov.l #$ffffff90,r0 ; set DMAC src
mov.l #$22000000,r1
mov.l r1,@r0
mov.l #$ffffff94,r0 ; set DMAC dst
mov.l #$20004038,r1 ; (PWM mono register)
mov.l r1,@r0
mov.l #$ffffff98,r0 ; set DMAC len
mov.l #$20000,r1
mov.l r1,@r0
mov.l #$ffffff9c,r0 ; DMAC control:
mov.l @r0,r1 ; read (make sure TE is clear)
mov.l #$14e1,r1 ; and set the various mode bits.
mov.l r1,@r0
mov.l #$ffffffb0,r0 ; DMAC operation:
mov.l @r0,r1 ; read (to clear various bits)
mov.l #$1,r1 ; and enable.
mov.l r1,@r0
mov.l #$20004032,r0 ; set PWM frequency
mov.w #$0400,r1
mov.w r1,@r0
mov.l #$20004038,r0 ; shove some crap in the PWM fifo
mov.w #0,r1 ; to make sure it starts requesting new data
mov.w r1,@r0
mov.w r1,@r0
mov.w r1,@r0
mov.l #$20004030,r0 ; start PWM
mov.w #$0185,r1
mov.w r1,@r0
-
This is what I'm doing in my test - modified slightly given Snake's last post, but it still doesn't work on real hardware. It works in FUSION, but on real hardware, it acts like it isn't getting any DMA requests.
Code:
void slave(void)
{
// init DMA
SH2_DMA_SAR0 = 0;
SH2_DMA_DAR0 = 0;
SH2_DMA_TCR0 = 0;
SH2_DMA_CHCR0 = 0;
SH2_DMA_DRCR0 = 0;
SH2_DMA_SAR1 = 0;
SH2_DMA_DAR1 = 0;
SH2_DMA_TCR1 = 0;
SH2_DMA_CHCR1 = 0;
SH2_DMA_DRCR1 = 0;
SH2_DMA_DMAOR = 0; // disable DMA
// init the sound hardware
MARS_PWM_CTRL = 0x0185; // TM = 1, RTP, RMD = right, LMD = left
if (MARS_VDP_DISPMODE & MARS_NTSC_FORMAT)
MARS_PWM_CYCLE = 23011361/44100 + 1; // 44.1kHz for NTSC clock
else
MARS_PWM_CYCLE = 22801467/44100 + 1; // 44.1kHz for PAL clock
while (1)
{
// only do sound when sound subsystem initialized
while (MARS_SYS_COMM4 != 0)
{
unsigned long tmp;
if (MARS_SYS_COMM4 == 1)
{
// prime the pwm channel to get it requesting data
MARS_SYS_COMM4 = 2;
MARS_PWM_MONO = 1;
MARS_PWM_MONO = 1;
MARS_PWM_MONO = 1;
}
// start DMA on first buffer and fill second
SH2_DMA_SAR1 = (unsigned long)sndbuf | 0x20000000;
SH2_DMA_DAR1 = 0x20004034; // storing a long here will set left and right
SH2_DMA_TCR1 = NUM_SAMPS; // number longs
tmp = SH2_DMA_CHCR1; // read to make sure TE clear
SH2_DMA_CHCR1 = 0x18E1; // dest fixed, src incr, size long, ext req, dack mem to dev, dack hi, dack edge, dreq rising edge, cycle-steal, dual addr, intr disabled, clear TE, dma enabled
tmp = SH2_DMA_DMAOR; // read to clear various bits
SH2_DMA_DMAOR = 1; // enable DMA
FillSoundBuff(NUM_SAMPS);
// wait on DMA
while (!(SH2_DMA_CHCR1 & 2)) ; // wait on TE
SH2_DMA_CHCR1 = 0x18E0; // clear TE, dma disabled
SH2_DMA_DMAOR = 0; // disable DMA
// start DMA on second buffer and fill first
SH2_DMA_SAR1 = ((unsigned long)sndbuf + NUM_SAMPS * 2 * 2) | 0x20000000;
SH2_DMA_DAR1 = 0x20004034; // storing a long here will set left and right
SH2_DMA_TCR1 = NUM_SAMPS; // number longs
tmp = SH2_DMA_CHCR1; // read to make sure TE clear
SH2_DMA_CHCR1 = 0x18E1; // dest fixed, src incr, size long, ext req, dack mem to dev, dack hi, dack edge, dreq rising edge, cycle-steal, dual addr, intr disabled, clear TE, dma enabled
tmp = SH2_DMA_DMAOR; // read to clear various bits
SH2_DMA_DMAOR = 1; // enable DMA
FillSoundBuff(0);
// wait on DMA
while (!(SH2_DMA_CHCR1 & 2)) ; // wait on TE
SH2_DMA_CHCR1 = 0x18E0; // clear TE, dma disabled
SH2_DMA_DMAOR = 0; // disable DMA
}
}
}
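For anyone trying to follow the long comment on the 0x18E1 write, here is a rough host-side sketch that splits a CHCR value into its fields. It is not part of the test app. The struct and function names are made up for illustration, and the bit positions are an assumption taken from the field order in the comment (DM, SM, TS, AR, AM, AL, DS, DL, TB, TA, IE, TE, DE, from bit 15 down), so check them against the SH2 hardware manual before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical breakdown of an SH2 DMAC CHCR value (layout assumed, see above). */
typedef struct {
    unsigned dm;  /* bits 15-14: destination address mode (0 = fixed)      */
    unsigned sm;  /* bits 13-12: source address mode (1 = increment)       */
    unsigned ts;  /* bits 11-10: transfer size (1 = word, 2 = long)        */
    unsigned ar;  /* bit 9: request mode                                   */
    unsigned am;  /* bit 8: DACK output mode                               */
    unsigned al;  /* bit 7: DACK level (1 = active high)                   */
    unsigned ds;  /* bit 6: DREQ detection (1 = edge)                      */
    unsigned dl;  /* bit 5: DREQ level/edge polarity (1 = rising)          */
    unsigned tb;  /* bit 4: bus mode (0 = cycle-steal)                     */
    unsigned ta;  /* bit 3: address mode (0 = dual)                        */
    unsigned ie;  /* bit 2: interrupt enable                               */
    unsigned te;  /* bit 1: transfer end flag                              */
    unsigned de;  /* bit 0: DMA enable                                     */
} chcr_fields;

static chcr_fields chcr_decode(uint32_t v)
{
    chcr_fields f;
    assert((v >> 16) == 0);  /* this sketch only looks at the low 16 bits */
    f.dm = (v >> 14) & 3;
    f.sm = (v >> 12) & 3;
    f.ts = (v >> 10) & 3;
    f.ar = (v >> 9) & 1;
    f.am = (v >> 8) & 1;
    f.al = (v >> 7) & 1;
    f.ds = (v >> 6) & 1;
    f.dl = (v >> 5) & 1;
    f.tb = (v >> 4) & 1;
    f.ta = (v >> 3) & 1;
    f.ie = (v >> 2) & 1;
    f.te = (v >> 1) & 1;
    f.de = v & 1;
    return f;
}
```

Decoding 0x18E1 this way gives size = long with DE set, while the 0x14E1 used in the assembly snippet earlier differs only in the size field (word), which fits a transfer to the mono register.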
Hmm. Well, my test code was for real hardware and did work. Try filling your buffer with random crap before you start - maybe the first dma is happening and it's just getting stuck in your end detection code. I can't see why that wouldn't work, but I've only tried this with interrupts, so I'm not 100% sure.
-
Thanks! I'll give it a try. I may also try making the DMA interrupt-driven to see if that changes anything. One thing I noticed in the test code SEGA uses in the diagnostic cart is that the loop that checks for DMA done actually looks for a timeout. Maybe there's a bug in these SH2s where TE isn't always set at the end of the transfer. In my code, if it misses just one, it's stuck forever.
-
I tried interrupt driven DMA... same thing. Never ends. To be sure I wasn't getting a transfer error, I added a check for that.
Note - in the interrupt-driven version, the wait loop instead checked a flag that the DMA interrupt handler set in one of the comm registers.
Code:
while (!(SH2_DMA_CHCR1 & 2) && !(SH2_DMA_DMAOR & 6)) ; // wait on TE, AE, or NMIF
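Combining that check with the timeout idea from SEGA's diagnostic code mentioned earlier, a bounded wait might look roughly like the sketch below. This is only an illustration under assumptions: the function name, the result enum, and the poll limit are invented here, and the registers are passed in as pointers so the logic can be tried on a PC with plain variables standing in for CHCR1 and DMAOR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result codes for the wait. */
enum dma_wait_result { DMA_DONE = 0, DMA_ERROR = 1, DMA_TIMEOUT = 2 };

/*
 * Poll until TE (bit 1 of the CHCR-style register) is set, an error flag
 * (AE or NMIF, mask 6 of the DMAOR-style register) is set, or max_polls
 * reads have gone by. Same masks as the one-line loop above.
 */
static enum dma_wait_result dma_wait(volatile const uint32_t *chcr,
                                     volatile const uint32_t *dmaor,
                                     uint32_t max_polls)
{
    uint32_t i;
    assert(chcr != 0 && dmaor != 0);
    for (i = 0; i < max_polls; i++) {
        if (*chcr & 2)
            return DMA_DONE;   /* transfer ended normally */
        if (*dmaor & 6)
            return DMA_ERROR;  /* address error or NMI stopped the DMA */
    }
    return DMA_TIMEOUT;        /* never saw TE: report it instead of hanging */
}
```

On the 32X the call would be something like dma_wait(&SH2_DMA_CHCR1, &SH2_DMA_DMAOR, limit), assuming those macros expand to volatile 32-bit lvalues. A timeout result at least tells you the loop would otherwise have hung.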
-
Okay, my bad... my code was working fine. The problem was that I forgot "volatile" does nothing for caches.
The slave wasn't ever getting the sample to play. That doesn't affect emulators, but it is crucial on real hardware. So some volatile pointers to the variables, ORed with 0x20000000, cleared things up.
Here's my little test app - just press A or B to play a sound.
DMAAudioTest32X.zip
I should probably check if some of the "extra" stuff added (like reading certain regs to clear bits) is really needed. Also, the fillbuffer() could be made more efficient. This was merely a test to figure out how to do DMA audio. Have to say, it sounds nice - I'm playing 9 bit stereo samples at 44100 Hz. The source sounds are 16 bit LE stereo samples at 44100 Hz.
EDIT: the slave() in the above has one bit of test code that isn't needed that I forgot about. You don't need to worry about setting the control reg after priming the PWM. You can do that before. Here's the way I do it (tested to be sure it still works on real hardware):
Code:
void slave(void)
{
// init DMA
SH2_DMA_SAR0 = 0;
SH2_DMA_DAR0 = 0;
SH2_DMA_TCR0 = 0;
SH2_DMA_CHCR0 = 0;
SH2_DMA_DRCR0 = 0;
SH2_DMA_SAR1 = 0;
SH2_DMA_DAR1 = 0;
SH2_DMA_TCR1 = 0;
SH2_DMA_CHCR1 = 0;
SH2_DMA_DRCR1 = 0;
// init the sound hardware
MARS_PWM_CTRL = 0x0185; // TM = 1, RTP, RMD = right, LMD = left
if (MARS_VDP_DISPMODE & MARS_NTSC_FORMAT)
MARS_PWM_CYCLE = 23011361/44100 + 1; // 44.1kHz for NTSC clock
else
MARS_PWM_CYCLE = 22801467/44100 + 1; // 44.1kHz for PAL clock
while (1)
{
// only do sound when sound subsystem initialized
while (MARS_SYS_COMM4 != 0)
{
unsigned long tmp;
if (MARS_SYS_COMM4 == 1)
{
MARS_SYS_COMM4 = 2;
MARS_PWM_MONO = 1;
MARS_PWM_MONO = 1;
MARS_PWM_MONO = 1;
}
// start DMA on first buffer and fill second
SH2_DMA_SAR1 = (unsigned long)sndbuf | 0x20000000;
SH2_DMA_DAR1 = 0x20004034; // storing a long here will set left and right
SH2_DMA_TCR1 = NUM_SAMPS; // number longs
tmp = SH2_DMA_CHCR1; // read to make sure TE clear
SH2_DMA_CHCR1 = 0x18E1; // dest fixed, src incr, size long, ext req, dack mem to dev, dack hi, dack edge, dreq rising edge, cycle-steal, dual addr, intr disabled, clear TE, dma enabled
tmp = SH2_DMA_DMAOR; // read to clear various bits
SH2_DMA_DMAOR = 1; // enable DMA
FillSoundBuff(NUM_SAMPS);
// wait on DMA
while (!(SH2_DMA_CHCR1 & 2) && !(SH2_DMA_DMAOR & 6)) ; // wait on TE, AE, or NMIF
SH2_DMA_CHCR1 = 0x18E0; // clear TE, dma disabled
SH2_DMA_DMAOR = 0; // disable DMA
// start DMA on second buffer and fill first
SH2_DMA_SAR1 = ((unsigned long)sndbuf + NUM_SAMPS * 2 * 2) | 0x20000000;
SH2_DMA_DAR1 = 0x20004034; // storing a long here will set left and right
SH2_DMA_TCR1 = NUM_SAMPS; // number longs
tmp = SH2_DMA_CHCR1; // read to make sure TE clear
SH2_DMA_CHCR1 = 0x18E1; // dest fixed, src incr, size long, ext req, dack mem to dev, dack hi, dack edge, dreq rising edge, cycle-steal, dual addr, intr disabled, clear TE, dma enabled
tmp = SH2_DMA_DMAOR; // read to clear various bits
SH2_DMA_DMAOR = 1; // enable DMA
FillSoundBuff(0);
// wait on DMA
while (!(SH2_DMA_CHCR1 & 2) && !(SH2_DMA_DMAOR & 6)) ; // wait on TE, AE, or NMIF
SH2_DMA_CHCR1 = 0x18E0; // clear TE, dma disabled
SH2_DMA_DMAOR = 0; // disable DMA
}
}
}
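The cache fix described above comes down to reading shared variables through the cache-through alias of their address. Here is a minimal sketch of just the address arithmetic, written so it can be checked on a PC. The helper name and the constant name are made up, and the 0x06000000 example address in the comment is only an assumed SDRAM location.

```c
#include <assert.h>
#include <stdint.h>

/* OR-ing this into a cached SH2 address selects the cache-through alias. */
#define SH2_CACHE_THROUGH 0x20000000u

/* Hypothetical helper: map a cached address to its uncached alias. */
static uint32_t uncached_alias(uint32_t addr)
{
    /* sketch only handles the cached / cache-through regions */
    assert((addr & 0xC0000000u) == 0);
    return addr | SH2_CACHE_THROUGH;
}

/*
 * On the real target the result would be cast to a volatile pointer, e.g.
 *   volatile int *p = (volatile int *)uncached_alias((uint32_t)&shared_var);
 * "volatile" stops the compiler from caching the value in a register;
 * the alias stops the CPU from serving a stale copy out of its cache.
 * Example: a variable at 0x06000000 would be read through 0x26000000.
 */
```

Both halves matter: volatile alone still goes through the cache, which is why the slave never saw the values the master wrote.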
-
A little bit of trimming plus a more efficient fillbuffer():
Code:
void FillSoundBuff(int offset)
{
int ix;
short samp;
short *buf = (short *)(((unsigned long)sndbuf + offset * 4) | 0x20000000);
volatile int *aud_len = (int *)((unsigned int)&gAudioLen | 0x20000000);
volatile int *aud_buf = (int *)((unsigned int)&gAudioBuf | 0x20000000);
unsigned char *buffer = (unsigned char *)*aud_buf;
int iy = *aud_len;
if (iy >= NUM_SAMPS)
{
for (ix=0; ix<NUM_SAMPS; ix++)
{
samp = buffer[0] | (buffer[1] << 8);
buf[0] = (samp >> 7) + 258;
samp = buffer[2] | (buffer[3] << 8);
buf[1] = (samp >> 7) + 258;
buffer += 4;
buf += 2;
}
*aud_buf += NUM_SAMPS * 4;
*aud_len -= NUM_SAMPS;
}
else
{
for (ix=0; ix<iy; ix++)
{
samp = buffer[0] | (buffer[1] << 8);
buf[0] = (samp >> 7) + 258;
samp = buffer[2] | (buffer[3] << 8);
buf[1] = (samp >> 7) + 258;
buffer += 4;
buf += 2;
}
for (ix=iy; ix<NUM_SAMPS; ix++)
{
buf[0] = 258;
buf[1] = 258;
buf += 2;
}
*aud_len = 0;
}
}
void slave(void)
{
// init DMA
SH2_DMA_SAR0 = 0;
SH2_DMA_DAR0 = 0;
SH2_DMA_TCR0 = 0;
SH2_DMA_CHCR0 = 0;
SH2_DMA_DRCR0 = 0;
SH2_DMA_SAR1 = 0;
SH2_DMA_DAR1 = 0;
SH2_DMA_TCR1 = 0;
SH2_DMA_CHCR1 = 0;
SH2_DMA_DRCR1 = 0;
SH2_DMA_DMAOR = 1; // enable DMA
// init the sound hardware
MARS_PWM_CTRL = 0x0185; // TM = 1, RTP, RMD = right, LMD = left
if (MARS_VDP_DISPMODE & MARS_NTSC_FORMAT)
MARS_PWM_CYCLE = 23011361/44100 + 1; // 44.1kHz for NTSC clock
else
MARS_PWM_CYCLE = 22801467/44100 + 1; // 44.1kHz for PAL clock
while (1)
{
// only do sound when sound subsystem initialized
while (MARS_SYS_COMM4 != 0)
{
if (MARS_SYS_COMM4 == 1)
{
MARS_SYS_COMM4 = 2;
MARS_PWM_MONO = 1;
MARS_PWM_MONO = 1;
MARS_PWM_MONO = 1;
}
// start DMA on first buffer and fill second
SH2_DMA_SAR1 = (unsigned long)sndbuf | 0x20000000;
SH2_DMA_DAR1 = 0x20004034; // storing a long here will set left and right
SH2_DMA_TCR1 = NUM_SAMPS; // number longs
SH2_DMA_CHCR1 = 0x18E1; // dest fixed, src incr, size long, ext req, dack mem to dev, dack hi, dack edge, dreq rising edge, cycle-steal, dual addr, intr disabled, clear TE, dma enabled
FillSoundBuff(NUM_SAMPS);
// wait on DMA
while (!(SH2_DMA_CHCR1 & 2)) ; // wait on TE
SH2_DMA_CHCR1 = 0x18E0; // clear TE, dma disabled
// start DMA on second buffer and fill first
SH2_DMA_SAR1 = ((unsigned long)sndbuf + NUM_SAMPS * 4) | 0x20000000;
SH2_DMA_DAR1 = 0x20004034; // storing a long here will set left and right
SH2_DMA_TCR1 = NUM_SAMPS; // number longs
SH2_DMA_CHCR1 = 0x18E1; // dest fixed, src incr, size long, ext req, dack mem to dev, dack hi, dack edge, dreq rising edge, cycle-steal, dual addr, intr disabled, clear TE, dma enabled
FillSoundBuff(0);
// wait on DMA
while (!(SH2_DMA_CHCR1 & 2)) ; // wait on TE
SH2_DMA_CHCR1 = 0x18E0; // clear TE, dma disabled
}
}
}
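To sanity-check the numbers in the code above, here is a small host-side sketch of the two calculations: the PWM cycle value and the 16-bit to 9-bit sample conversion. The function names are invented for illustration. Like the original, it assumes that right-shifting a negative value is an arithmetic shift and that storing an out-of-range value into a 16-bit signed type wraps, which is how common compilers behave but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdint.h>

/* Cycle register value for a given clock and sample rate, as in slave(). */
static int pwm_cycle(long clock_hz, long sample_rate)
{
    assert(sample_rate > 0);
    return (int)(clock_hz / sample_rate) + 1;
}

/*
 * Convert one little-endian signed 16-bit sample to the value written to
 * the PWM buffer, as in FillSoundBuff(): keep the top 9 bits, then add
 * the 258 bias used in the original code.
 */
static int16_t pwm_sample(uint8_t lo, uint8_t hi)
{
    int16_t samp = (int16_t)(lo | (hi << 8)); /* assumes wrap to negative */
    return (int16_t)((samp >> 7) + 258);      /* assumes arithmetic shift */
}
```

With the clocks from the code, the cycle works out to 522 for NTSC and 518 for PAL, and the converted samples span 2 to 513 with silence at 258, so every value stays below either cycle length.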
Bah!
Well, I'm glad you got it working, but I just wrote you a test app and I guess you don't need it now. I thought it must be something simple, because this really isn't that hard.
I did discover that I've broken support for stereo dma in Fusion at some point (it works but sounds rough). Dunno when I did that but I shall investigate and fix.
[edit] you can trim your code even further. Once you get the signal that the DMA has finished, all you need to do is write source address, length and do the $18E1 thing again. No need to write $18E0 or set the dest address again.
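As a rough illustration of that trimmed restart, the sketch below uses a plain struct as a stand-in for the four channel registers so it can be run on a PC. The struct and function names are hypothetical. The field order follows the addresses in the assembly snippet earlier in the thread (src at $ffffff90, dst at $ffffff94, len at $ffffff98, control at $ffffff9c). On a real SH2 the TE flag is cleared by the hardware when 0 is written to it after it has been read as 1, which the polling loop already does; the struct here simply stores whatever is written.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMAC channel's registers. */
typedef struct {
    volatile uint32_t sar;   /* source address      */
    volatile uint32_t dar;   /* destination address */
    volatile uint32_t tcr;   /* transfer count      */
    volatile uint32_t chcr;  /* channel control     */
} dma_channel;

/*
 * Restart a finished transfer: only source, length and control are
 * rewritten. The destination is left alone because it is fixed and was
 * set once before the loop.
 */
static void dma_restart(dma_channel *ch, uint32_t src, uint32_t count)
{
    assert(ch != 0 && count > 0);
    ch->sar  = src | 0x20000000u;  /* read the buffer through the uncached alias */
    ch->tcr  = count;
    ch->chcr = 0x18E1u;            /* TE written as 0, DE written as 1 */
}
```

This drops two register writes per buffer compared with the loop above; whether the separate 0x18E0 write is ever needed on real hardware is something to confirm by testing.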
-
Yeah, I can see that the dest address should be pulled outside the loop as it never changes. As to the 0x18E0, I guess when I write 0x18E1, that clears TE at the same time. I wasn't sure if you were allowed to clear TE and set DE in the same write.
It was funny - I've been doing DMA for quite some time and not realizing it. Usually, I'm good with remembering about the caches, but this time I totally blanked on it.
I noticed that Fusion was a bit rough on the sound, but wasn't sure where the issue came from. It may have to do with the fact that I'm doing LONGWORD DMA to the left channel, counting on the fact that the right channel immediately follows it in the map. So I'd check long stores to the 32X IO to see if it's properly doing a word to left and a word to right.
-
Yeah, it's odd that nothing uses DMA audio when that's clearly what you want. If you look at my double-buffer routine for the DMA audio, it's going to use less CPU time than interrupt-driven audio, and it gives more time to generate samples than polling the PWM FIFO. While polling seems to have less overhead, that overhead is paid for every sample. The overhead on the DMA double-buffer is per buffer, not per sample.
You have at least 1 audio sample period to respond to the end of the DMA transfer (and perhaps as many as 3), and you have at least #samples-3 sample periods to fill the buffer. It does introduce a latency equal to the #samples * sample period, but for reasonably sized buffers, it's not going to be noticeable.
I'm going to work this into my next update to Wolf32X now that I got all the kinks worked out on my example. I wanted it in there originally, but had trouble getting it to work, so I went with polling the pwm fifo. It was enough time for adding several sound effects together, but nothing else. Double-buffered DMA will give me a lot more time, so I'll probably add MIDI playing to it.
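To put rough numbers on the latency and fill-time figures above, here is a small sketch. The function names are made up, and the buffer size of 1024 samples used in the note below is purely an assumed example, since NUM_SAMPS isn't given in the thread.

```c
#include <assert.h>

/* Latency added by one buffer: #samples * sample period, in milliseconds. */
static double buffer_latency_ms(int num_samps, int sample_rate)
{
    assert(sample_rate > 0);
    return 1000.0 * num_samps / sample_rate;
}

/* Worst-case number of sample periods available to refill the other buffer. */
static int fill_budget_periods(int num_samps)
{
    assert(num_samps > 3);
    return num_samps - 3;
}
```

For an assumed 1024-sample buffer at 44100 Hz that is about 23.2 ms of latency, with at least 1021 sample periods to fill the next buffer.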
It's really annoying. Given that none of the RAM in the 32X is really fast enough, DMA is a complete waste of time - except for this purpose, where it's - well - awesome. The perfect solution, and as you say, the PWM quality is outstanding. But instead we get low quality audio because nobody wanted to waste the CPU time doing this any other way.
I believe the reason it's not used in pretty much every game is because it wasn't made at all obvious that you could do this. I'd definitely have used it myself had I known it could be done back then.
Real shame...