68K to SH2 DMA FIFO
Moderator: BigEvilCorporation
-
- Interested
- Posts: 24
- Joined: Wed Feb 03, 2010 12:53 am
- Location: Grimsby, England
So, I'm working on a 32X project and have been for a little while now, and I need a method to transfer a large amount of data (too large for the communication registers, possibly up to around 600 bytes) from the 68k to the SH2.
I read that there is a DMA channel for 68k-SH2 transfers, but I have no idea how it works. Could this be explained? I've read the 32X hardware documentation, but it hasn't really helped me that much.
-
- Very interested
- Posts: 2984
- Joined: Fri Aug 17, 2007 9:33 pm
The "DMA" channel is built into the interface chip in the 32X. The MD side stores to a FIFO register, the interface chip asserts DREQ0 to the SH2 (either one), and DMA channel 0 then fetches the data and stores it to RAM.
At least, that's the way it's SUPPOSED to work. I've not found a way to use the MD-to-32X DMA that doesn't lose data. It just doesn't work the way it's supposed to.
In the end, I used the COMM registers to pass all the data from the MD to the 32X. It's pretty fast. I pass up to 128KB this way. You can find my code in the CD32X MOD player. It also has the DMA code, but it's disabled due to it not working.
Another thing you could do - pass the data using the frame buffer. Switch the VDP to the MD side, store the data in the frame buffer, then switch it back to the 32X side. Don't forget that if you write the frame buffer using bytes, you cannot write 0x00. If you try, it leaves the data already in the frame buffer as it is. Basically, the frame buffer works like the overwrite buffer when stored as bytes - the overwrite buffer works on words.
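To make the byte-write caveat concrete, here is a small host-side model in C. It is only a sketch: the array stands in for the frame buffer, the function names are made up for illustration, and it assumes 16-bit stores to the normal frame buffer are not filtered, which is how the description above reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model only: a byte store of 0x00 is ignored, as described above.
   Real hardware is reached through memory-mapped addresses, not this array. */
static void fb_store_byte(uint8_t *fb, size_t offset, uint8_t value)
{
    assert(fb != NULL);
    if (value != 0x00)
        fb[offset] = value;   /* non-zero bytes land normally */
    /* a zero byte leaves the old contents in place */
}

/* Assumption: a 16-bit store writes both bytes, zero or not. */
static void fb_store_word(uint8_t *fb, size_t offset, uint16_t value)
{
    assert(fb != NULL && (offset & 1) == 0);
    fb[offset]     = (uint8_t)(value >> 8);     /* big-endian, like the 68000 */
    fb[offset + 1] = (uint8_t)(value & 0xFF);
}

/* Copy a buffer word by word so embedded zero bytes survive; len must be even. */
static void fb_copy_words(uint8_t *fb, const uint8_t *src, size_t len)
{
    assert(src != NULL && (len & 1) == 0);
    for (size_t i = 0; i < len; i += 2)
        fb_store_word(fb, i, (uint16_t)((src[i] << 8) | src[i + 1]));
}
```

In this model a byte-wise copy of data containing 0x00 leaves stale bytes behind, while the word-wise copy does not.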
In my SuperVDP, I upload the palette from the Genesis to the 32X. Here's how it runs: the 68k routine comes first, then the 32X interrupt routine. On the Slave side, I've only done some cache purging, but I guess that's not the point here ;)
68k side :
Code: Select all
move.l #$FF4000,a0 ; Source address (Genny side)
move.l #$0603BE00,d0 ; Destination address (32x memory map)
move #256,d1 ; 256 16-bit words to write
bsr DMAupload
...
DMAupload:
move.l a1,-(a7) ; put a1 on the stack
move #0,$A15106 ; Abort DMA
move.l d0,$A1510C ; Set 68K to SH DREQ Destination Address Register
move.w d1,$A15110 ; Set 68K to SH DREQ Length Register
move.w #$3,$A15102 ; Interrupt both CPU : they need to purge cache
moveq #5,d0
waitIntAck:
dbra d0,waitIntAck ; short fixed delay (dbra runs 6 times); the acknowledge is not actually checked
lsr.w #2,d1 ; each DMA loop sends 4 values
subq #1,d1 ; DBRA exits when -1
move.l #$A15112,a1 ; a1 holds FIFO
move.b #4,$A15107 ; Set CPU Write (68k writes data in FIFO)
DMAloop:
move.w (a0)+,(a1)
move.w (a0)+,(a1)
move.w (a0)+,(a1)
move.w (a0)+,(a1)
dbra d1,DMAloop ; this instruction takes 10 cycles. It's enough for the FIFO to empty
move.l (a7)+,a1 ; retrieve a1 from the stack
rts
32x side :
Code: Select all
CMD_M:
MOV.L R0,@-R15 ; put registers on the stack
MOV.L R5,@-R15
MOV.L R6,@-R15
MOV.L R7,@-R15
MOV.L R8,@-R15
* Init DMAC Registers - I use DMA channel 0 (FFFFFF80h)
MOV #0,R0
MOV #$80,R8 ; CHCR0 = FFFFFF8Ch
MOV.L R0,@($C,R8) ; Abort current DMA
STC GBR,R0
ADD #$12,R0 ; DREQ_FIFO = 20004012h
MOV.L R0,@R8 ; Set Source Address Register
STC GBR,R0
MOV.L @($C,R0),R0 ; DREQ_DEST = 2000400Ch - Indirection
MOV.L SDRAM,R5 ; Read Destination address register the Genny has said
ADD R5,R0 ; higher byte is cleared. we have to manually add 0x06000000
MOV R0,R6 ; Save for later cache purge
MOV.L R0,@(4,R8) ; Set actual Dest Address Register
STC GBR,R0
MOV.W @($10,R0),R0 ; DREQ_LEN = 20004010h - Indirection. Read number of writes the Genny asked
MOV R0,R7 ; Save for later purge
MOV.L R0,@(8,R8) ; Set Transfer Count Register
MOV.W cmdDREQ,R0
MOV.L R0,@($C,R8) ; Start DMA
MOV #0,R0
MOV.W R0,@($1A,GBR) ; Clears CMD Int - no more INT pending
* All right, DMA is started, now we have to purge the addresses written by the DMAC
* To purge an address, you simply write #0 to (0x40000000 | address), see PURGE below
; R7 = DREQ LEN = number of 16-bit words to transfer
SHLL R7 ; R7 = number of bytes to write
SHLR2 R7
SHLR2 R7 ; R7 /= 16, we purge 16 bytes each
EXTU.W R7,R7
MOV #$F0,R0 ; mask off the 4 low bits (sign-extended to $FFFFFFF0) ...
AND R0,R6 ; ... because we purge 16 bytes each
MOV.L PURGE,R0
OR R0,R6 ; Dest |= PURGE
MOV #0,R0
MOV.L R0,@R6 ; you purge even if you've transferred less than 16 bytes
CMP/PL R7
BF noDMAM ; if (R7<=0), DT will loop. We avoid this situation
purgeLoopM:
ADD #16,R6
DT R7
BF/S purgeLoopM
MOV.L R0,@R6
noDMAM:
MOV.L @R15+,R8
MOV.L @R15+,R7 ; pop registers from stack
MOV.L @R15+,R6
MOV.L @R15+,R5
MOV.L @R15+,R0
RTE
NOP
.align 4
SDRAM: DC.L $06000000
PURGE: DC.L $40000000
cmdDREQ: DC.W $44E1
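The purge arithmetic in the 32x listing can be checked on a PC with a short C sketch. The helper names are hypothetical; the only values taken from the listing are the PURGE constant ($40000000) and the 16-byte step. It computes the exact number of lines covering the transfer, whereas the listing purges one line up front plus one per 16 bytes, so the listing can touch one line more.

```c
#include <assert.h>
#include <stdint.h>

#define SH2_CACHE_PURGE 0x40000000u  /* same value as the PURGE constant above */
#define SH2_LINE_BYTES  16u          /* the listing purges 16 bytes per write */

/* First purge address for a DMA destination: line-aligned, OR'ed with PURGE. */
static uint32_t purge_first(uint32_t dest)
{
    return (dest & ~(SH2_LINE_BYTES - 1u)) | SH2_CACHE_PURGE;
}

/* Number of 16-byte lines touched by `words` 16-bit words starting at dest. */
static uint32_t purge_line_count(uint32_t dest, uint32_t words)
{
    assert(words > 0);
    uint32_t first = dest & ~(SH2_LINE_BYTES - 1u);
    uint32_t last  = (dest + words * 2u - 1u) & ~(SH2_LINE_BYTES - 1u);
    return (last - first) / SH2_LINE_BYTES + 1u;
}
```

For the palette upload above (destination $0603BE00, 256 words) this gives a first purge address of $4603BE00 and 32 lines.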
-
- Very interested
- Posts: 2984
- Joined: Fri Aug 17, 2007 9:33 pm
I need to disassemble a number of 32X games to look for things actual working programs do. This is one of those things to check into more closely.
We know from the tech bulletins that the SCD word ram <> 32X DMA is buggy, but it doesn't say anything about the 68K <> 32X DMA other than the original 1.0 version dev chipset doesn't support arbitrary lengths - you need to use less than 256 words. That was supposedly fixed by v1.1, and the production models all use 2.0a rev chips (or later).
There's something going on here that's not in the example code we have and not documented, rather like the SCD PCM chip handling - all the sample code is wrong. I had to disassemble working SCD code to see how you REALLY handle the PCM chip. So even when we have example code from SEGA, it isn't always working code.
-
- Interested
- Posts: 24
- Joined: Wed Feb 03, 2010 12:53 am
- Location: Grimsby, England
Opened Knuckles Chaotix in IDA Pro and searched for accesses to the FIFO register, trying to find examples of how games use it.
Found this. The three listings below are, in order, the 68K routine, the SH2 routine, and the interrupt handler that calls the SH2 routine. I haven't looked into other games yet.
68K:
Code: Select all
ROM:00883202 sub_883202:
ROM:00883202 movea.w ($FFFFD01E).w,a1
ROM:00883206 lea ($FFFFD45E).w,a0
ROM:0088320A move.w a0,($FFFFD01E).w
ROM:0088320E clr.w (a0)+
ROM:00883210 move.l a0,d7
ROM:00883212 sub.l a1,d7
ROM:00883214 addq.w #7,d7
ROM:00883216 andi.w #$FFF8,d7
ROM:0088321A lsr.w #1,d7
ROM:0088321C lea ($A15112).l,a0
ROM:00883222 move sr,d6
ROM:00883224 move #$2700,sr
ROM:00883228 move.w d7,-2(a0)
ROM:0088322C move.b #4,-$B(a0)
ROM:00883232 move.b #1,($A15103).l
ROM:0088323A
ROM:0088323A loc_88323A: ; CODE XREF: sub_883202+40j
ROM:0088323A btst #0,($A15103).l
ROM:00883242 bne.s loc_88323A
ROM:00883244 lsr.w #2,d7
ROM:00883246 subq.w #1,d7
ROM:00883248
ROM:00883248 loc_883248: ; CODE XREF: sub_883202+54j
ROM:00883248 move.w (a1)+,(a0)
ROM:0088324A move.w (a1)+,(a0)
ROM:0088324C move.w (a1)+,(a0)
ROM:0088324E move.w (a1)+,(a0)
ROM:00883250
ROM:00883250 loc_883250: ; CODE XREF: sub_883202+52j
ROM:00883250 tst.b -$B(a0)
ROM:00883254 bmi.s loc_883250
ROM:00883256 dbf d7,loc_883248
ROM:0088325A st ($FFFFFCE7).w
ROM:0088325E move d6,sr
ROM:00883260 rts
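The length handling at the top of the 68K routine (addq #7, andi #$FFF8, lsr #1, then lsr #2 before the loop) can be restated in C. This is a sketch with made-up function names, meant only to show the rounding, not the register accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte length up to a whole number of 4-word FIFO bursts and
   return the 16-bit word count the routine stores at -2(a0). */
static uint16_t dreq_word_count(uint16_t bytes)
{
    assert(bytes <= 0xFFF8u);   /* keep the +7 from overflowing 16 bits */
    return (uint16_t)(((bytes + 7u) & 0xFFF8u) >> 1);
}

/* Number of 4-word bursts the copy loop performs for that word count. */
static uint16_t dreq_burst_count(uint16_t words)
{
    assert((words & 3u) == 0);
    return (uint16_t)(words >> 2);
}
```

For the 600 bytes mentioned in the first post this gives 300 words, sent as 75 bursts of 4.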
SH2:
Code: Select all
ROM:06001334 sub_6001334: ; DATA XREF: ROM:060001E8o
ROM:06001334 mov.w word_6001364, r1 ; h'FFFFFE10
ROM:06001336 mov.b @(7,r1), r0
ROM:06001338 xor #2, r0
ROM:0600133A mov.b r0, @(7,r1)
ROM:0600133C mov.l dword_600136C, r0 ; h'2000401A
ROM:0600133E mov.w r0, @r0
ROM:06001340 mov #-h'80, r1
ROM:06001342 mov.w word_6001366, r0 ; h'44E0
ROM:06001344 mov.l r0, @(h'C,r1)
ROM:06001346 mov.l dword_6001370, r0 ; h'20004012
ROM:06001348 mov.l r0, @(0,r1)
ROM:0600134A mov.l off_6001374, r0 ; off_6003814
ROM:0600134C mov.l @r0, r0
ROM:0600134E mov.l r0, @(4,r1)
ROM:06001350 mov.l dword_6001378, r0 ; h'20004010
ROM:06001352 mov.w @r0, r0
ROM:06001354 mov.l r0, @(8,r1)
ROM:06001356 mov.l @(h'C,r1), r0
ROM:06001358 mov.w word_6001368, r0 ; h'44E1
ROM:0600135A mov.l r0, @(h'C,r1)
ROM:0600135C mov.l @(h'30,r1), r0
ROM:0600135E mov.l dword_600137C, r0 ; 1
ROM:06001360 rts
ROM:06001362 mov.l r0, @(h'30,r1)
ROM:06001362 ; End of function sub_6001334
ROM:06001362
ROM:06001362 ; ---------------------------------------------------------------------------
ROM:06001364 word_6001364: .data.w h'FE10 ; DATA XREF: sub_6001334r
ROM:06001366 word_6001366: .data.w h'44E0 ; DATA XREF: sub_6001334+Er
ROM:06001368 word_6001368: .data.w h'44E1 ; DATA XREF: sub_6001334+24r
ROM:0600136A .data.b 0
ROM:0600136B .data.b 0
ROM:0600136C dword_600136C: .data.l h'2000401A ; DATA XREF: sub_6001334+8r
ROM:06001370 dword_6001370: .data.l h'20004012 ; DATA XREF: sub_6001334+12r
ROM:06001374 off_6001374: .data.l off_6003814 ; DATA XREF: sub_6001334+16r
ROM:06001378 dword_6001378: .data.l h'20004010 ; DATA XREF: sub_6001334+1Cr
ROM:0600137C dword_600137C: .data.l 1 ; DATA XREF: sub_6001334+2Ar
Which is called by this interrupt handler:
Code: Select all
ROM:060001B0 sub_60001B0: ; DATA XREF: ROM:06000100o
ROM:060001B0 ; ROM:06000104o ...
ROM:060001B0 mov.l r0, @-r15
ROM:060001B2 stc sr, r0
ROM:060001B4 mov.l r1, @-r15
ROM:060001B6 shlr2 r0
ROM:060001B8 mov.l off_60001D4, r1 ; off_60001D8
ROM:060001BA shlr r0
ROM:060001BC and #h'1C, r0
ROM:060001BE mov.l @(r0,r1), r1
ROM:060001C0 sts.l pr, @-r15
ROM:060001C2 mov.w word_60001D2, r0 ; h'F0
ROM:060001C4 jsr @r1
ROM:060001C6 ldc r0, sr
ROM:060001C8 lds.l @r15+, pr
ROM:060001CA mov.l @r15+, r1
ROM:060001CC mov.l @r15+, r0
ROM:060001CE rte
ROM:060001D0 nop
ROM:060001D0 ; End of function sub_60001B0
ROM:060001D0
ROM:060001D0 ; ---------------------------------------------------------------------------
ROM:060001D2 word_60001D2: .data.w h'F0 ; DATA XREF: sub_60001B0+12r
ROM:060001D4 off_60001D4: .data.l off_60001D8 ; DATA XREF: sub_60001B0+8r
ROM:060001D8 off_60001D8: .data.l sub_6000240 ; DATA XREF: sub_60001B0+8o
ROM:060001D8 ; ROM:off_60001D4o
ROM:060001DC .data.l sub_6000240
ROM:060001E0 .data.l sub_6000240
ROM:060001E4 .data.l sub_6000240
ROM:060001E8 .data.l sub_6001334
ROM:060001EC .data.l Error
ROM:060001F0 off_60001F0: .data.l unk_6001280
ROM:060001F0
ROM:060001F4 .data.l unk_6001250
ROM:060001F8
EDIT: I just did a quick search for $A15112 across my 32X ROM set in Hex Workshop. The only games with hits are Chaotix, Primal Rage, and Virtua Racing, so it doesn't seem to have been used much at all. Maybe because the documentation is incorrect?
-
- Very interested
- Posts: 2984
- Joined: Fri Aug 17, 2007 9:33 pm
There's probably some EXACT procedure you have to use to make it work properly. As I found in my experiments, simply following the docs isn't enough to get it to work right. I'll look over this code for differences from what I do and see if I can figure out the crucial difference that makes it work.
Saw this and had some free time, so I put my Dreamcast skills to use. Here is a rather direct pseudo-C translation of the SH2 code, followed by a somewhat cleaned up version.
Hopefully, I haven't made any mistakes.
Code: Select all
void sub_6001334()
{
static int dword_600137C = 1; //might be externally modified? or written funny for debugging
volatile char *r1a = (volatile char*)0xFFFFFE10;
r1a[7] = r1a[7] ^ 2; //write to timer output compare control reg (toggle timer a match output level)
*(volatile short*)0x2000401A = 0x401a; //does write to cmd irq clear reg (any value works)
int r0;
volatile int *r1 = (volatile int*)0xFFFFFF80;
r1[3] = 0x000044E0; //write to dma ch0 control reg
r1[0] = 0x20004012; //write to dma ch0 src addr reg
r1[1] = *(int*)off_6003814; //write to dma ch0 dst addr reg
r1[2] = *(short*)0x20004010; //write to dma ch0 xfr length reg
r0 = r1[3]; //dummy read from dma ch0 control reg
r1[3] = 0x000044E1; //write to dma ch0 control reg
r0 = r1[12]; //dummy read
r1[12] = dword_600137C; //write to shared dma op reg (dma enable, fixed priority, no address error)
}
typedef void (*funcpntr_t)(void); //table entries must be function pointers
void sub_60001B0() //irq
{
static const funcpntr_t table[] = {
sub_6000240, sub_6000240, sub_6000240, sub_6000240,
sub_6001334, Error, unk_6001280, unk_6001250 };
unsigned int temp = (get_sr() >> 3) & 0x1c;
set_sr(0xf0);
table[temp >> 2]();
}
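A quick way to sanity-check the table lookup is to run the index arithmetic on a PC. The sketch below uses hypothetical helper names and assumes the SH2 interrupt mask sits in SR bits 4-7; with a mask of 8 it lands on entry 4, the slot that holds sub_6001334.

```c
#include <assert.h>
#include <stdint.h>

/* Index computed by the handler: ((SR >> 3) & 0x1C) is a byte offset into
   a table of 4-byte pointers, so dividing by 4 gives the entry number. */
static unsigned handler_index_from_sr(uint32_t sr)
{
    return ((sr >> 3) & 0x1Cu) >> 2;
}

/* Same thing expressed on the 4-bit interrupt mask taken from SR bits 4-7:
   the lowest mask bit is dropped, leaving 8 possible entries. */
static unsigned handler_index_from_imask(unsigned imask)
{
    assert(imask <= 15u);
    return imask >> 1;
}
```

Both forms agree for every mask value from 0 to 15, so the shorter one can stand in for the shift-and-mask sequence.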
Cleaned up version:
Code: Select all
void sub_6001334_clean()
{
*tocr ^= 2; //write to timer output compare control reg (toggle timer a match output level)
cmd_irq_clear();
/* 16-bit (accessed as 32-bit) dma ch0 control reg value explained, from highest bit to lowest:
destination address incremented (dm = binary 01)
source address fixed (sm = binary 00)
2-byte units transferred (ts = binary 01)
module request mode (ar = 0)
send dack during read cycle (am = 0)
dack is active-high (al = 1)
dreq is edge sensitive (ds = 1)
dreq is detected by edge rise (dl = 1)
cycle-steal mode (tb = 0)
dual address mode (ta = 0)
dma irq disabled (ie = 0)
transfer end bit clear, (te = 0) signals dma as successfully completed.
to clear, first read chcr0 when te = 1, then write 0 to te bit
cannot start new transfers when bit is set
transfer enable bit (de = 0/1)
*/
*chcr0 = 0x000044E0; //write to dma ch0 control reg /w dma disabled
*sar0 = fifo_addr; //set dma ch0 src addr reg to fifo
*dar0 = *(int*)off_6003814; //write global pointer to dma ch0 dst addr reg
*tcr0 = *dreqlength; //write 68k-to-sh dma request length to dma ch0 xfr count reg
(void)*chcr0; //dummy read from dma ch0 control reg to clear te bit
*chcr0 = 0x000044E1; //write to dma ch0 control reg /w dma enabled
(void)*dmacr; //dummy read from shared dma op reg to clear address error and nmi bits
*dmacr = 1; //write to shared dma op reg (dma enable, fixed priority, clear address error and nmi)
}
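typedef void (*funcpntr_t)(void); //table entries must be function pointers
void sub_60001B0_clean() //irq
{
static const funcpntr_t table[] = {
sub_6000240, sub_6000240, sub_6000240, sub_6000240,
sub_6001334, Error, unk_6001280, unk_6001250 };
unsigned int temp = get_irq_mask() >> 1; //irq mask is 0-15 (sr bits 4-7); the asm drops its low bit to index 8 entries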
set_sr(0xf0); //irq mask = 15
table[temp]();
}
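The bit breakdown in the comment can be turned into a small decoder, which makes it easy to check a control value such as 0x44E1 by hand. A sketch only: the struct and function names are invented here, and the bit positions simply follow the order given in the comment (two bits each for dm, sm and ts, then one bit per remaining field).

```c
#include <assert.h>
#include <stdint.h>

/* Field layout as listed in the comment above, highest bit first. */
struct chcr_fields {
    unsigned dm, sm, ts;           /* 2-bit fields */
    unsigned ar, am, al, ds, dl;   /* 1-bit fields */
    unsigned tb, ta, ie, te, de;   /* 1-bit fields */
};

static struct chcr_fields chcr_decode(uint32_t chcr)
{
    struct chcr_fields f;
    assert((chcr & 0xFFFF0000u) == 0);   /* only the low 16 bits are described */
    f.dm = (chcr >> 14) & 3u;   /* destination address mode */
    f.sm = (chcr >> 12) & 3u;   /* source address mode */
    f.ts = (chcr >> 10) & 3u;   /* transfer size */
    f.ar = (chcr >> 9) & 1u;
    f.am = (chcr >> 8) & 1u;
    f.al = (chcr >> 7) & 1u;
    f.ds = (chcr >> 6) & 1u;
    f.dl = (chcr >> 5) & 1u;
    f.tb = (chcr >> 4) & 1u;
    f.ta = (chcr >> 3) & 1u;
    f.ie = (chcr >> 2) & 1u;
    f.te = (chcr >> 1) & 1u;    /* transfer end flag */
    f.de = chcr & 1u;           /* transfer enable */
    return f;
}
```

Decoding 0x44E0 and 0x44E1 shows that the two writes in the routine differ only in the de bit.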
-
- Very interested
- Posts: 2984
- Joined: Fri Aug 17, 2007 9:33 pm
I looked at Night Trap 32X and found the xfer code in the nt.bin file. It rounds the length up to a multiple of 4 words (same as the Chaotix code, and the max number of words in the FIFO), and then goes into a loop where it sends a max of $400 bytes at a time. It sends all $400 bytes (or fewer if less data remains) at once, and then waits for the SH2 to set the COMM register to "GOOD" if it got all the data or "AGIN" if it failed; if neither arrives, it times out and returns an error. So apparently, there's a max amount of data you can send before it fails completely, but even then, it may fail anyway. I imagine $400 was their compromise between seeing a repeat command and sending enough data for good xfer rates.
If I had to guess, I'd think Chaotix perhaps does the limiting and repeat check in the code that calls the xfer routine documented in the previous post. Few games use this feature, so I'm guessing it's really pretty buggy.
I'm going to try using the SH2 to read the FIFO and store the data and see if that's reliable. Using the DMA certainly isn't.
EDIT: There doesn't seem to be a way to tell if there's any data ready at the FIFO on the SH2 side. Oh well. Guess I'll try to alter my code to do limited lengths and automatic retries and see how well it works.
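The chunk-and-retry scheme described for Night Trap could look roughly like this in C. Everything here is an assumption for illustration: the callback type, the retry limit, and the fake link used to exercise the loop are not taken from any game; only the $400-byte chunk size and the GOOD/AGIN/timeout outcomes come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_MAX   0x400u   /* bytes per attempt, the figure quoted above */
#define MAX_RETRIES 8u       /* hypothetical limit, not taken from any game */

enum xfer_reply { REPLY_GOOD, REPLY_AGIN, REPLY_TIMEOUT };

/* Hypothetical transport hook: pushes one chunk through the FIFO and
   reports what the SH2 answered in the COMM register. */
typedef enum xfer_reply (*send_chunk_fn)(const uint8_t *data, size_t len, void *ctx);

/* Returns 0 on success, -1 on timeout or when a chunk keeps failing. */
static int send_with_retries(const uint8_t *data, size_t len,
                             send_chunk_fn send, void *ctx)
{
    assert(data != NULL && send != NULL);
    while (len > 0) {
        size_t n = len < CHUNK_MAX ? len : CHUNK_MAX;
        unsigned tries = 0;
        enum xfer_reply r;
        do {
            r = send(data, n, ctx);          /* resend the same chunk on AGIN */
            if (r == REPLY_TIMEOUT)
                return -1;
        } while (r == REPLY_AGIN && ++tries < MAX_RETRIES);
        if (r != REPLY_GOOD)
            return -1;                       /* retry budget used up */
        data += n;
        len  -= n;
    }
    return 0;
}

/* Stand-in for the real link, for host testing: answers AGIN a set number
   of times, then GOOD; or always times out when `dead` is non-zero. */
struct fake_link { unsigned calls, agin_left; int dead; };

static enum xfer_reply fake_send(const uint8_t *data, size_t len, void *ctx)
{
    struct fake_link *link = ctx;
    (void)data; (void)len;
    link->calls++;
    if (link->dead) return REPLY_TIMEOUT;
    if (link->agin_left > 0) { link->agin_left--; return REPLY_AGIN; }
    return REPLY_GOOD;
}
```

On real hardware the callback would wrap the FIFO write loop and the COMM register poll; the retry limit is a tuning choice.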