I did this on the 32X by simply polling bit 15 (the VBLK flag) of FB_CTRL_REG (2000 410Ah).
Here's the first routine I used to do so:
waitForVBLK:
* mov #0,R11
mov.l REG_VDP,R1
mov.w VALUE_8000h,R2
whileDISP:
* add #1,R11
mov.w @($A,R1),R0 ; while ((FB_CTRL_REG & 8000h) == 0) ;
and R2,R0
cmp/eq #0,R0
bt whileDISP
* mov R11,R10
rts
nop ; Delay slot: NOP executes before RTS actually returns
In an empty main, this routine loops about 23k times per frame (nearly 1 Mops, which seems quite slow to me, by the way). I then remembered the delayed branch, which keeps the pipeline fed by executing the instruction in the delay slot before the branch takes effect (see the SH2 programming manual, chapter 7). Here is the result:
waitForVBLK:
mov.l REG_VDP,R1
mov.w VALUE_8000h,R2
mov.w @($A,R1),R0 ; First read, before entering: while ((FB_CTRL_REG & 8000h) == 0) ;
whileDISP:
and R2,R0
cmp/eq #0,R0
bt/s whileDISP
mov.w @($A,R1),R0 ; Executes MOV before branching - Feed pipeline
rts
nop ; Delay slot: NOP executes before RTS actually returns
By the way, the compiler you can find on segaxtreme.net does not seem to use delayed branches.
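For comparison, here is roughly what the same polling loop looks like in C. This is only a sketch: wait_for_vblk and FB_CTRL_VBLK are names I made up for illustration, and the function takes the register address as a parameter so it can also be tried against a plain variable on a PC. The returned counter plays the same role as R11 in the starred lines of the first routine.

```c
#include <stdint.h>

/* Bit 15 of FB_CTRL_REG, the flag polled by the assembly routines.
   FB_CTRL_VBLK is a hypothetical name, not taken from any SDK header. */
#define FB_CTRL_VBLK 0x8000u

/* Hypothetical helper: spins until bit 15 is set in the 16-bit register
   pointed to by fb_ctrl, and returns how many polls found it clear. */
static unsigned long wait_for_vblk(volatile const uint16_t *fb_ctrl)
{
    unsigned long polls = 0;

    /* volatile forces a fresh read of the register on every iteration */
    while ((*fb_ctrl & FB_CTRL_VBLK) == 0) {
        polls++;
    }
    return polls;
}
```

On the 32X itself the call would presumably be wait_for_vblk((volatile const uint16_t *)0x2000410A), assuming the address given at the top of this post. Whether a given compiler puts the register read into the delay slot of the loop branch is something you would have to check in its assembly output.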