68K to SH2 DMA FIFO

SoullessSentinel
Interested
Posts: 24
Joined: Wed Feb 03, 2010 12:53 am
Location: Grimsby, England

Post by SoullessSentinel » Thu May 23, 2013 11:45 pm

So, I'm working on a 32X project and have been for a little while now, and I need a method to transfer a large amount of data (too large for the communication registers, possibly up to around 600 bytes) from the 68k to the SH2.

I read that there is a DMA channel for 68k-to-SH2 transfers, but I have no idea how it works. Could someone explain it? I've read the 32X hardware documentation, but it hasn't really helped me that much.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Fri May 24, 2013 3:31 am

The "DMA" channel is built into the interface chip in the 32X. The MD side stores to a FIFO register, the interface chip asserts DREQ0 to the SH2 (either one), and DMA channel 0 then fetches the data and stores it to RAM.

At least, that's the way it's SUPPOSED to work. I've not found a way to use the MD-to-32X DMA that doesn't lose data. It just doesn't work the way it's supposed to.

In the end, I used the COMM registers to pass all the data from the MD to the 32X. It's pretty fast. I pass up to 128KB this way. You can find my code in the CD32X MOD player. It also has the DMA code, but it's disabled due to it not working.

Another thing you could do - pass the data using the frame buffer. Switch the VDP to the MD side, store the data in the frame buffer, then switch it back to the 32X side. Don't forget that if you write the frame buffer using bytes, you cannot write 0x00. If you try, it leaves the data already in the frame buffer as it is. Basically, the frame buffer works like the overwrite buffer when stored as bytes - the overwrite buffer works on words.
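A minimal host-side C model of the byte-write rule described above. This is a simulation only: the `fb` array and the `fb_write_byte` and `fb_copy_bytes` functions are hypothetical names, not real 32X register access. It shows why a payload that contains 0x00 bytes would not survive a byte-wise copy into the frame buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-side model only: a byte store of 0x00 leaves the old contents in place,
   as described for byte writes to the 32X frame buffer. */
static void fb_write_byte(uint8_t *fb, size_t offset, uint8_t value)
{
    if (value != 0x00)
        fb[offset] = value;
}

/* Copy a payload one byte at a time.
   Returns the number of bytes that were silently skipped. */
static size_t fb_copy_bytes(uint8_t *fb, const uint8_t *src, size_t len)
{
    size_t skipped = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == 0x00)
            skipped++;
        fb_write_byte(fb, i, src[i]);
    }
    return skipped;
}
```

Since the post ties the rule to byte stores, copying the data as 16-bit words looks like the obvious way to sidestep it.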

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Fri May 24, 2013 12:46 pm

In my SuperVDP, I upload the palette from Genesis to 32X. Here's how it runs.

68k side :

Code:

	move.l	#$FF4000,a0          ; Source address (Genny side)
	move.l	#$0603BE00,d0             ; Destination address (32x memory map)
	move	#256,d1                    ; 256 16-bit words to write
	bsr	DMAupload
...
DMAupload:
	move.l	a1,-(a7)            ; put a1 on the stack

	move	#0,$A15106		; Abort DMA
	move.l	d0,$A1510C		; Set 68K to SH DREQ Destination Address Register
	move.w	d1,$A15110		; Set 68K to SH DREQ Length Register

 	move.w	#$3,$A15102		; Interrupt both CPU : they need to purge cache

	moveq	#5,d0
waitIntAck:
	dbra	d0,waitIntAck		; short delay: dbra executes 6 times (d0 = 5 down to -1)

	lsr.w	#2,d1			; each DMA loop sends 4 values
	subq	#1,d1			; DBRA exits when -1
	move.l	#$A15112,a1		; a1 holds FIFO
	move.b	#4,$A15107		; Set CPU Write (68k writes data in FIFO)
DMAloop:
	move.w	(a0)+,(a1)
	move.w	(a0)+,(a1)
	move.w	(a0)+,(a1)
	move.w	(a0)+,(a1)
	dbra	d1,DMAloop		; this instruction takes 10 cycles. It's enough for the FIFO to empty

	move.l	(a7)+,a1         ; retrieve a1 from the stack
	rts
32x side :

Code:

CMD_M:
	MOV.L	R0,@-R15     ; put registers on the stack
	MOV.L	R5,@-R15
	MOV.L	R6,@-R15
	MOV.L	R7,@-R15
	MOV.L	R8,@-R15

* Init DMAC Registers - I use DMA channel 0 (FFFFFF80h)
	MOV	#0,R0

	MOV	#$80,R8		; CHCR0 = FFFFFF8Ch
	MOV.L	R0,@($C,R8)	; Abort current DMA

	STC	GBR,R0
	ADD	#$12,R0		; DREQ_FIFO = 20004012h
	MOV.L	R0,@R8		; Set Source Address Register

	STC	GBR,R0
	MOV.L	@($C,R0),R0	; DREQ_DEST = 2000400Ch - Indirection
	MOV.L	SDRAM,R5         ; Read Destination address register the Genny has said
	ADD	R5,R0                       ; higher byte is cleared. we have to manually add 0x06000000
	MOV	R0,R6		; Save for later cache purge
	MOV.L	R0,@(4,R8)	; Set actual Dest Address Register

	STC	GBR,R0
	MOV.W	@($10,R0),R0	; DREQ_LEN = 20004010h - Indirection. Read number of writes the Genny asked
	MOV	R0,R7		; Save for later purge
	MOV.L	R0,@(8,R8)	; Set Transfer Count Register

	 MOV.W	cmdDREQ,R0
	 MOV.L	R0,@($C,R8)	; Start DMA

	MOV	#0,R0
	MOV.W	R0,@($1A,GBR)	; Clears CMD Int - no more INT pending

* All right, DMA is set up, now we have to purge the addresses written by the DMAC
* To purge an address, you simply write #0 to 0x40000000 | address (see PURGE below)

				; R7 = DREQ LEN = number of 16-bit words to transfer
	SHLL	R7		; R7 = number of bytes to write
	SHLR2	R7
	SHLR2	R7		; R7 /= 16, we purge 16 bytes each
	EXTU.W	R7,R7
	MOV	#$F0,R0		; mask off the 4 low bits ...
	AND	R0,R6		; ... because we purge 16 bytes each
	MOV.L	PURGE,R0
	OR	R0,R6		; Dest |= PURGE


	MOV	#0,R0
	MOV.L	R0,@R6		; you purge even if you've transferred less than 16 bytes
	CMP/PL	R7
	BF	noDMAM		; if (R7<=0), DT will loop. We avoid this situation
purgeLoopM:
	ADD	#16,R6
	DT	R7
	BF/S	purgeLoopM
	MOV.L	R0,@R6
noDMAM:


	MOV.L	@R15+,R8
	MOV.L	@R15+,R7       ; pop registers from stack
	MOV.L	@R15+,R6
	MOV.L	@R15+,R5
	MOV.L	@R15+,R0
	RTE
	NOP

	.align	4
SDRAM:		DC.L	$06000000
PURGE:		DC.L	$40000000
cmdDREQ:	DC.W	$44E1
On the Slave side, I've only done some cache purge, but I guess that's not the point here ;)
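The purge arithmetic in the SH2 listing above can be restated in C. This is only a sketch that mirrors the listing's own calculation, and it has not been checked on hardware. The function names and the `CACHE_LINE` macro are made up for illustration; `0x40000000` is the value of the `PURGE` constant in the listing.

```c
#include <stdint.h>

#define SH2_PURGE_BASE 0x40000000u  /* value of PURGE in the listing */
#define CACHE_LINE     16u          /* bytes covered by one purge write */

/* First purge address: destination aligned down to a 16-byte line, OR'ed with PURGE. */
static uint32_t purge_first_addr(uint32_t dest)
{
    return (dest & ~(CACHE_LINE - 1u)) | SH2_PURGE_BASE;
}

/* Number of purge writes the listing performs for a transfer of `words`
   16-bit words: one unconditional write, then one per full 16 bytes. */
static uint32_t purge_write_count(uint32_t words)
{
    return 1u + (words * 2u) / CACHE_LINE;
}
```

For the 256-word palette upload in the 68k example, this gives 33 purge writes starting at 0x4603BE00.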

SoullessSentinel
Interested
Posts: 24
Joined: Wed Feb 03, 2010 12:53 am
Location: Grimsby, England

Post by SoullessSentinel » Fri May 24, 2013 3:37 pm

ob1, has this been verified on real hardware, as Chilly Willy's post above states that he couldn't get the FIFO DMA to function without some data loss?

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Fri May 24, 2013 6:04 pm

I have not been able to get the SuperVDP demo to work on real hardware - it just sits on a black screen. The issue is the DMA - while the DMA works on an emulator, it loses data on a real 32X and hence winds up stuck waiting for the DMA to finish.

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Sun May 26, 2013 11:08 am

Unfortunately, I guess Chilly is right: I've never tested all this stuff on real hardware, and I'm afraid it won't run on an actual 32X.

I can't say which part doesn't run, though, since there are a lot of DMA operations here.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Sun May 26, 2013 5:53 pm

I need to disassemble a number of 32X games to look for things actual working programs do. This is one of those things to check into more closely.

We know from the tech bulletins that the SCD word ram <> 32X DMA is buggy, but it doesn't say anything about the 68K <> 32X DMA other than the original 1.0 version dev chipset doesn't support arbitrary lengths - you need to use less than 256 words. That was supposedly fixed by v1.1, and the production models all use 2.0a rev chips (or later).

There's something going on here that's not in the example code we have and not documented, rather like the SCD PCM chip handling - all the sample code is wrong. I had to disassemble working SCD code to see how you REALLY handle the PCM chip. So even when we have example code from SEGA, it isn't always working code.

SoullessSentinel
Interested
Posts: 24
Joined: Wed Feb 03, 2010 12:53 am
Location: Grimsby, England

Post by SoullessSentinel » Mon Jul 01, 2013 11:27 am

Opened Knuckles Chaotix in IDA PRO and searched for accesses to the FIFO Register. (Trying to find examples of how games use it)

Found this

68K:

Code:

ROM:00883202 sub_883202:
ROM:00883202                 movea.w ($FFFFD01E).w,a1
ROM:00883206                 lea     ($FFFFD45E).w,a0
ROM:0088320A                 move.w  a0,($FFFFD01E).w
ROM:0088320E                 clr.w   (a0)+
ROM:00883210                 move.l  a0,d7
ROM:00883212                 sub.l   a1,d7
ROM:00883214                 addq.w  #7,d7
ROM:00883216                 andi.w  #$FFF8,d7
ROM:0088321A                 lsr.w   #1,d7
ROM:0088321C                 lea     ($A15112).l,a0
ROM:00883222                 move    sr,d6
ROM:00883224                 move    #$2700,sr     
ROM:00883228                 move.w  d7,-2(a0)
ROM:0088322C                 move.b  #4,-$B(a0)
ROM:00883232                 move.b  #1,($A15103).l
ROM:0088323A
ROM:0088323A loc_88323A:                             ; CODE XREF: sub_883202+40j
ROM:0088323A                 btst    #0,($A15103).l
ROM:00883242                 bne.s   loc_88323A
ROM:00883244                 lsr.w   #2,d7
ROM:00883246                 subq.w  #1,d7
ROM:00883248
ROM:00883248 loc_883248:                             ; CODE XREF: sub_883202+54j
ROM:00883248                 move.w  (a1)+,(a0)
ROM:0088324A                 move.w  (a1)+,(a0)
ROM:0088324C                 move.w  (a1)+,(a0)
ROM:0088324E                 move.w  (a1)+,(a0)
ROM:00883250
ROM:00883250 loc_883250:                             ; CODE XREF: sub_883202+52j
ROM:00883250                 tst.b   -$B(a0)
ROM:00883254                 bmi.s   loc_883250
ROM:00883256                 dbf     d7,loc_883248
ROM:0088325A                 st      ($FFFFFCE7).w
ROM:0088325E                 move    d6,sr
ROM:00883260                 rts
SH2

Code:

ROM:06001334 sub_6001334:                            ; DATA XREF: ROM:060001E8o
ROM:06001334                 mov.w   word_6001364, r1 ; h'FFFFFE10
ROM:06001336                 mov.b   @(7,r1), r0
ROM:06001338                 xor     #2, r0
ROM:0600133A                 mov.b   r0, @(7,r1)
ROM:0600133C                 mov.l   dword_600136C, r0 ; h'2000401A
ROM:0600133E                 mov.w   r0, @r0
ROM:06001340                 mov     #-h'80, r1
ROM:06001342                 mov.w   word_6001366, r0 ; h'44E0
ROM:06001344                 mov.l   r0, @(h'C,r1)
ROM:06001346                 mov.l   dword_6001370, r0 ; h'20004012
ROM:06001348                 mov.l   r0, @(0,r1)
ROM:0600134A                 mov.l   off_6001374, r0 ; off_6003814
ROM:0600134C                 mov.l   @r0, r0
ROM:0600134E                 mov.l   r0, @(4,r1)
ROM:06001350                 mov.l   dword_6001378, r0 ; h'20004010
ROM:06001352                 mov.w   @r0, r0
ROM:06001354                 mov.l   r0, @(8,r1)
ROM:06001356                 mov.l   @(h'C,r1), r0
ROM:06001358                 mov.w   word_6001368, r0 ; h'44E1
ROM:0600135A                 mov.l   r0, @(h'C,r1)
ROM:0600135C                 mov.l   @(h'30,r1), r0
ROM:0600135E                 mov.l   dword_600137C, r0 ; 1
ROM:06001360                 rts
ROM:06001362                 mov.l   r0, @(h'30,r1)
ROM:06001362 ; End of function sub_6001334
ROM:06001362
ROM:06001362 ; ---------------------------------------------------------------------------
ROM:06001364 word_6001364:   .data.w h'FE10          ; DATA XREF: sub_6001334r
ROM:06001366 word_6001366:   .data.w h'44E0          ; DATA XREF: sub_6001334+Er
ROM:06001368 word_6001368:   .data.w h'44E1          ; DATA XREF: sub_6001334+24r
ROM:0600136A                 .data.b    0
ROM:0600136B                 .data.b    0
ROM:0600136C dword_600136C:  .data.l h'2000401A      ; DATA XREF: sub_6001334+8r
ROM:06001370 dword_6001370:  .data.l h'20004012      ; DATA XREF: sub_6001334+12r
ROM:06001374 off_6001374:    .data.l off_6003814     ; DATA XREF: sub_6001334+16r
ROM:06001378 dword_6001378:  .data.l h'20004010      ; DATA XREF: sub_6001334+1Cr
ROM:0600137C dword_600137C:  .data.l 1               ; DATA XREF: sub_6001334+2Ar
Which is called by this interrupt handler

Code:

ROM:060001B0 sub_60001B0:                            ; DATA XREF: ROM:06000100o
ROM:060001B0                                         ; ROM:06000104o ...
ROM:060001B0                 mov.l   r0, @-r15
ROM:060001B2                 stc     sr, r0
ROM:060001B4                 mov.l   r1, @-r15
ROM:060001B6                 shlr2   r0
ROM:060001B8                 mov.l   off_60001D4, r1 ; off_60001D8
ROM:060001BA                 shlr    r0
ROM:060001BC                 and     #h'1C, r0
ROM:060001BE                 mov.l   @(r0,r1), r1
ROM:060001C0                 sts.l   pr, @-r15
ROM:060001C2                 mov.w   word_60001D2, r0 ; h'F0
ROM:060001C4                 jsr     @r1  
ROM:060001C6                 ldc     r0, sr
ROM:060001C8                 lds.l   @r15+, pr
ROM:060001CA                 mov.l   @r15+, r1
ROM:060001CC                 mov.l   @r15+, r0
ROM:060001CE                 rte
ROM:060001D0                 nop
ROM:060001D0 ; End of function sub_60001B0
ROM:060001D0
ROM:060001D0 ; ---------------------------------------------------------------------------
ROM:060001D2 word_60001D2:   .data.w h'F0            ; DATA XREF: sub_60001B0+12r
ROM:060001D4 off_60001D4:    .data.l off_60001D8     ; DATA XREF: sub_60001B0+8r
ROM:060001D8 off_60001D8:    .data.l sub_6000240     ; DATA XREF: sub_60001B0+8o
ROM:060001D8                                         ; ROM:off_60001D4o
ROM:060001DC                 .data.l sub_6000240
ROM:060001E0                 .data.l sub_6000240
ROM:060001E4                 .data.l sub_6000240
ROM:060001E8                 .data.l sub_6001334
ROM:060001EC                 .data.l Error
ROM:060001F0 off_60001F0:    .data.l unk_6001280
ROM:060001F0        
ROM:060001F4                 .data.l unk_6001250
ROM:060001F8
I haven't looked into other games yet.

EDIT: I just did a quick search for $A15112 across my 32X rom set in Hex Workshop, the only games I had hits in are Chaotix, Primal Rage, and Virtua Racing, so it doesn't seem that it was used much at all. Maybe because of documentation being incorrect?
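The length handling at the top of the Chaotix 68K routine (`addq.w #7`, `andi.w #$FFF8`, `lsr.w #1`, then `lsr.w #2`) can be read as the following C. This is an interpretation of the disassembly, not known-good source, and the function names are hypothetical.

```c
#include <stdint.h>

/* Round a byte count up to a multiple of 8 bytes (4 FIFO words) and convert
   it to 16-bit words. This is the value the routine stores in the DREQ
   length register at $A15110. */
static uint16_t dreq_len_words(uint16_t nbytes)
{
    return (uint16_t)(((nbytes + 7u) & 0xFFF8u) >> 1);
}

/* Number of 4-word bursts written to the FIFO. The 68K loop counter is this
   value minus one, because dbf exits when the counter reaches -1. */
static uint16_t fifo_bursts(uint16_t nbytes)
{
    return (uint16_t)(dreq_len_words(nbytes) >> 2);
}
```

A 600-byte payload, the size mentioned in the first post, is already a multiple of 8 and comes out as 300 words in 75 bursts. Note that the routine also polls bit 7 of $A15107 (`tst.b -$B(a0)` / `bmi.s`) after every burst before it writes the next four words.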

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Tue Jul 02, 2013 4:57 am

There's probably some EXACT procedure you have to use to make it work properly. As I found in my experiments, simply following the docs isn't enough to get it to work right. I'll look over this code for the differences from what I do and see if I can figure out the crucial difference that makes it work.

TapamN
Interested
Posts: 15
Joined: Mon Apr 25, 2011 1:05 am

Post by TapamN » Fri Jul 05, 2013 1:12 am

Saw this and had some free time, so I put my Dreamcast skills to use. Here is a rather direct pseudo-C translation of the SH2 code:

Code:

void sub_6001334()
{
	static int dword_600137C = 1; //might be externally modified? or written funny for debugging
	
	volatile char *r1a = (volatile char *)0xFFFFFE10;
	r1a[7] = r1a[7] ^ 2; //write to timer compare control reg (toggle timer a match output level)

	*(volatile short *)0x2000401A = 0x401a;	//write to cmd irq clear reg (any value works)

	int r0;
	volatile int *r1 = (volatile int *)0xFFFFFF80;
	r1[3] = 0x000044E0; //write to dma ch0 control reg
	r1[0] = 0x20004012; //write to dma ch0 src addr reg
	r1[1] = *(int*)off_6003814; //write to dma ch0 dst addr reg
	r1[2] = *(short*)0x20004010; //write to dma ch0 xfr length reg
	r0 = r1[3];	//dummy read from dma ch0 control reg
	r1[3] = 0x000044E1; //write to dma ch0 control reg
	r0 = r1[12];	//dummy read
	r1[12] = dword_600137C; //write to shared dma op reg (dma enable, fixed priority, no address error)
}

typedef void (*funcpntr_t)(void);	//pointer to a handler taking no arguments

void sub_60001B0()	//irq
{
	static const funcpntr_t table[] = {
		sub_6000240, sub_6000240, sub_6000240, sub_6000240,
		sub_6001334, Error, unk_6001280, unk_6001250 };
	unsigned int temp = (get_sr() >> 3) & 0x1c;
	set_sr(0xf0);
	table[temp >> 2]();
}
And here is a somewhat cleaned up version.

Code:

void sub_6001334_clean()
{
	*tocr ^= 2; //write to timer output compare control reg (toggle timer a match output level)

	cmd_irq_clear();

	/* 16-bit (accessed as 32-bit) dma ch0 control reg value explained, from highest bit to lowest:
		destination address incremented (dm = binary 01)
		source address fixed (sm = binary 00)
		2-byte units transferred (ts = binary 01)
		module request mode (ar = 0)
		send dack during read cycle (am = 0)
		dack is active-high (al = 1)
		dreq is edge sensitive (ds = 1)
		dreq is detected by edge rise (dl = 1)
		cycle-steal mode (tb = 0)
		dual address mode (ta = 0)
		dma irq disabled (ie = 0)
		transfer end flag written as clear (te = 0). hardware sets te = 1 when a dma completes successfully.
			to clear, first read chcr0 when te = 1, then write 0 to te bit
			cannot start new transfers when bit is set
		transfer enable bit (de = 0/1)
	*/
	*chcr0 = 0x000044E0; //write to dma ch0 control reg /w dma disabled
	*sar0 = fifo_addr; //set dma ch0 src addr reg to fifo
	*dar0 = *(int*)off_6003814; //write global pointer to dma ch0 dst addr reg
	*tcr0 = *dreqlength; //write 68k-to-sh dma request length to dma ch0 xfr length reg

	(void)*chcr0;	//dummy read from dma ch0 control reg to clear te bit
	*chcr0 = 0x000044E1; //write to dma ch0 control reg /w dma enabled

	(void)*dmacr;	//dummy read from shared dma op reg to clear address error and nmi bits
	*dmacr = 1; //write to shared dma op reg (dma enable, fixed priority, clear address error and nmi)
}

typedef void (*funcpntr_t)(void);	//pointer to a handler taking no arguments

void sub_60001B0_clean()	//irq
{
	static const funcpntr_t table[] = {
		sub_6000240, sub_6000240, sub_6000240, sub_6000240,
		sub_6001334, Error, unk_6001280, unk_6001250 };

	unsigned int temp = get_irq_mask() >> 1;	//irq level 0-15 -> table index 0-7 (same as ((sr >> 3) & 0x1c) >> 2)
	set_sr(0xf0);	//irq mask = 15
	table[temp]();
}
Hopefully, I haven't made any mistakes.
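As a cross-check of the bit breakdown above, here is a small host-side C sketch that splits a CHCR value into the fields listed in the comment, plus the table-index calculation from the interrupt dispatcher. The struct and function names are hypothetical. The bit positions follow the comment's highest-to-lowest order. The dispatcher check assumes the SH2 has loaded the accepted interrupt's level into SR bits 7-4, and that the 32X CMD interrupt arrives at level 8.

```c
#include <stdint.h>

/* Fields of the low 16 bits of the DMA channel control register,
   in the order given in the comment above. */
typedef struct {
    unsigned dm, sm, ts, ar, am, al, ds, dl, tb, ta, ie, te, de;
} chcr_fields;

static chcr_fields chcr_decode(uint32_t v)
{
    chcr_fields f;
    f.dm = (v >> 14) & 3u;  /* destination address mode */
    f.sm = (v >> 12) & 3u;  /* source address mode */
    f.ts = (v >> 10) & 3u;  /* transfer size */
    f.ar = (v >> 9) & 1u;   /* auto request */
    f.am = (v >> 8) & 1u;   /* acknowledge mode */
    f.al = (v >> 7) & 1u;   /* acknowledge level */
    f.ds = (v >> 6) & 1u;   /* dreq select (edge/level) */
    f.dl = (v >> 5) & 1u;   /* dreq level */
    f.tb = (v >> 4) & 1u;   /* transfer bus mode */
    f.ta = (v >> 3) & 1u;   /* transfer address mode */
    f.ie = (v >> 2) & 1u;   /* interrupt enable */
    f.te = (v >> 1) & 1u;   /* transfer end flag */
    f.de = v & 1u;          /* dma enable */
    return f;
}

/* Table index used by the dispatcher: SR interrupt level divided by two. */
static unsigned irq_table_index(uint32_t sr)
{
    return ((sr >> 3) & 0x1Cu) >> 2;
}
```

With SR = 0x80 (level 8) the index is 4, which is the slot holding `sub_6001334` in the table.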

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Sun Aug 10, 2014 11:11 pm

I looked at Night Trap 32X and found the xfer code in the nt.bin file. It rounds the length up to a multiple of 4 words (same as the Chaotix code, and the max number of words in the FIFO), and then goes into a loop where it sends a max of $400 bytes at a time. It sends all $400 (or less if less data) at once, and then waits for the SH2 to set the COMM register to "GOOD" if it got all the data, "AGIN" if it failed, or times out and returns an error. So apparently, there's a max amount of data you can send before it fails completely, but even then, it may fail anyway. I imagine $400 was their compromise between seeing a repeat command and sending enough data for good xfer rates.

If I had to guess, I'd think Chaotix perhaps does the limiting and repeat check in the code that calls the xfer routine documented in the previous posts. There are few games that use this feature, so I'm guessing it's really pretty buggy.

I'm going to try using the SH2 to read the FIFO and store the data and see if that's reliable. Using the DMA certainly isn't.

EDIT: There doesn't seem to be a way to tell if there's any data ready at the FIFO on the SH2 side. Oh well. Guess I'll try to alter my code to do limited lengths and automatic retries and see how well it works.
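The limited-length, retry-on-failure scheme described for Night Trap could be structured roughly as below. This is a host-side sketch under stated assumptions, not the game's code. The `send_chunk_fn` callback stands in for "push one chunk through the FIFO and wait for the COMM reply", and the three status values model the "GOOD", "AGIN" and timeout outcomes. `MAX_RETRIES` is an arbitrary choice, and `fake_link` / `fake_send` are a test double so the logic can be exercised without hardware. Every name here is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BYTES 0x400u  /* max bytes per burst, per the post */
#define MAX_RETRIES 3       /* assumption: the real limit is not known */

typedef enum { XFER_GOOD, XFER_AGAIN, XFER_TIMEOUT } xfer_status;
typedef xfer_status (*send_chunk_fn)(const uint8_t *data, size_t len, void *ctx);

/* Length a real sender would pad a chunk to: a multiple of 4 FIFO words. */
static size_t padded_len(size_t nbytes)
{
    return (nbytes + 7u) & ~(size_t)7u;
}

/* Send `len` bytes in chunks of at most CHUNK_BYTES, repeating a chunk when
   the receiver answers "again". Returns 0 on success, -1 on failure. */
static int send_all(const uint8_t *data, size_t len, send_chunk_fn send, void *ctx)
{
    size_t off = 0;
    while (off < len) {
        size_t n = len - off;
        int tries = 0;
        if (n > CHUNK_BYTES)
            n = CHUNK_BYTES;
        for (;;) {
            xfer_status st = send(data + off, n, ctx);
            if (st == XFER_GOOD)
                break;
            if (st == XFER_TIMEOUT || ++tries > MAX_RETRIES)
                return -1;
        }
        off += n;
    }
    return 0;
}

/* Test double: reports "again" on every fail_every-th call (0 = never). */
typedef struct { int fail_every; int calls; size_t bytes_ok; } fake_link;

static xfer_status fake_send(const uint8_t *data, size_t len, void *ctx)
{
    fake_link *link = ctx;
    (void)data;
    link->calls++;
    if (link->fail_every && link->calls % link->fail_every == 0)
        return XFER_AGAIN;
    link->bytes_ok += len;
    return XFER_GOOD;
}
```

A real `send_chunk_fn` would pad the chunk to `padded_len(len)` before writing it to the FIFO, since both disassembled routines round the length up that way.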
