```
01 02 03 04 05
06 07 08 09 10
11 12 13 14 15
```
This subroutine seems to be very fast. Under Fusion, I managed to load a 320x224 bitmap into VRAM in about two frames. The function is very sensitive to size changes: a 256x192 bitmap loads about twice as fast. Again, this is under Fusion. I have no idea about real hardware, but it shouldn't differ much, since Fusion emulates the VDP FIFO.
Also, before you ask: I didn't use (a0,a2) or similar because I'm not sure whether that addressing mode is valid on the 68000. Could somebody please confirm this? If it is valid, dealing with the precalculated table will be a lot easier. (Just so you know, I'm building a table and storing it in registers, which is extremely crazy.)
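The "table in registers" boils down to the byte offsets of a tile's eight pixel rows inside the bitmap: 0, S, 2S ... 7S, where S is the bitmap's row stride. Below is a minimal C sketch of that table, assuming a packed 4bpp bitmap (so S = 4 bytes per tile of width). `row_offsets` is a hypothetical name. The sketch builds the entries with the same add/double chain the routine below uses, and its comments name the register that holds each entry there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte offsets of the eight pixel rows of a tile in a
 * packed 4bpp bitmap (assumption), i.e. 0, S, 2S ... 7S with
 * S = 4 * width_in_tiles. Each entry costs one addition, mirroring the
 * register setup in the assembly routine. */
static void row_offsets(unsigned width_tiles, uint16_t out[8])
{
    assert(width_tiles >= 1 && width_tiles <= 255);
    uint16_t s = (uint16_t)(width_tiles * 4);  /* one long (8 pixels) per tile row */
    out[0] = 0;                                /* (a0), no index register */
    out[1] = s;                                /* d0 = S  */
    out[2] = (uint16_t)(s + s);                /* d2 = 2S */
    out[3] = (uint16_t)(s + out[2]);           /* d4 = 3S */
    out[4] = (uint16_t)(out[2] + out[2]);      /* d5 = 4S */
    out[5] = (uint16_t)(out[4] + s);           /* d6 = 5S */
    out[6] = (uint16_t)(out[4] + out[2]);      /* d7 = 6S */
    out[7] = (uint16_t)(out[6] + s);           /* high word of d1 = 7S */
}
```

For a 320-pixel-wide bitmap (40 tiles, 160 bytes per row), the table comes out as 0, 160, 320 ... 1120. Every entry fits comfortably in a signed 16-bit index, which is what the `.w` index registers in the routine require.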
Parameters:
- d0.b: bitmap width (in tiles)
- d1.b: bitmap height (in tiles)
- d2.w: index of first tile in VRAM (0..2047)
- a0.l: pointer to the bitmap (68k address, even).
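To check the output against something, here is a hedged C reference model of what the routine appears to send to the VDP data port. It assumes the bitmap is packed 4bpp (two pixels per byte) with pixel rows stored back to back, so each 8-pixel tile row is exactly one longword. The model emits tiles left to right, top to bottom, 32 bytes each. `bitmap_to_tiles` is a hypothetical name, and the model covers only byte order, not timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model: cut a packed 4bpp bitmap into 8x8 tiles in
 * row-major tile order, writing 4 bytes per tile row (one move.l each).
 * out must hold width_tiles * height_tiles * 32 bytes. */
static void bitmap_to_tiles(const uint8_t *bmp, unsigned width_tiles,
                            unsigned height_tiles, uint8_t *out)
{
    assert(width_tiles >= 1 && width_tiles <= 255);
    assert(height_tiles >= 1 && height_tiles <= 255);
    size_t stride = (size_t)width_tiles * 4;     /* bytes per pixel row */
    for (unsigned ty = 0; ty < height_tiles; ty++)
        for (unsigned tx = 0; tx < width_tiles; tx++)
            for (unsigned row = 0; row < 8; row++) {
                /* start of this tile's pixel row inside the bitmap */
                const uint8_t *src = bmp + (size_t)(ty * 8 + row) * stride + tx * 4;
                for (unsigned b = 0; b < 4; b++)
                    *out++ = src[b];
            }
}
```

With a 16x8 bitmap (2x1 tiles), the first 32 output bytes are the left 4 bytes of each of the 8 pixel rows, and the next 32 are the right 4 bytes.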
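```asm
VLBitmap2Scroll:
        movem.l d0-a1, -(sp)            ; save d0-d7/a0-a1

        ; VRAM write command for tile d2 (VRAM address = d2 * 32)
        andi.l  #2047, d2               ; tile index 0..2047
        lsl.l   #7, d2                  ; address * 4
        lsr.w   #2, d2                  ; low word = address & $3FFF, bits 16-17 = address >> 14
        ori.w   #$4000, d2              ; CD0 = 1: VRAM write
        swap    d2
        move.l  d2, ($C00004).l         ; VDP control port

        andi.l  #$FF, d0                ; width in tiles, upper bits cleared
        lsl.w   #2, d0                  ; d0 = S = bytes per pixel row (4bpp: 4 bytes per tile)
        andi.w  #$FF, d1                ; height in tiles
        subq.w  #1, d1                  ; dbf counter for tile rows
        lea     ($C00000).l, a1         ; VDP data port

        ; Offsets of a tile's 8 pixel rows: 0, S, 2S ... 7S
        move.w  d0, d2
        add.w   d2, d2                  ; d2 = 2S
        move.w  d0, d4
        add.w   d2, d4                  ; d4 = 3S
        move.w  d2, d5
        add.w   d5, d5                  ; d5 = 4S
        move.w  d5, d6
        add.w   d0, d6                  ; d6 = 5S
        move.w  d5, d7
        add.w   d2, d7                  ; d7 = 6S
        swap    d1
        move.w  d7, d1
        add.w   d0, d1
        swap    d1                      ; d1 high word = 7S, low word = row counter

VLBitmap2ScrollVLoop:
        move.w  d0, d3
        lsr.w   #2, d3
        subq.w  #1, d3                  ; d3 = width in tiles - 1
        swap    d1                      ; d1.w = 7S while copying this tile row
VLBitmap2ScrollHLoop:
        move.l  (a0), (a1)              ; pixel row 0 of the tile
        move.l  (a0,d0.w), (a1)         ; row 1
        move.l  (a0,d2.w), (a1)         ; row 2
        move.l  (a0,d4.w), (a1)         ; row 3
        move.l  (a0,d5.w), (a1)         ; row 4
        move.l  (a0,d6.w), (a1)         ; row 5
        move.l  (a0,d7.w), (a1)         ; row 6
        move.l  (a0,d1.w), (a1)         ; row 7
        addq.l  #4, a0                  ; next tile (8 pixels to the right)
        dbf     d3, VLBitmap2ScrollHLoop
        swap    d1                      ; d1.w = tile row counter again
        adda.w  d7, a0
        adda.w  d0, a0                  ; a0 moved S during the row; add 7S to reach the next tile row
        dbf     d1, VLBitmap2ScrollVLoop

        movem.l (sp)+, d0-a1
        rts
```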
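The first instructions of the routine turn the tile index into a VDP control-port command. Below is a small C sketch of the same computation, useful for checking the bit layout: CD1..CD0 = 01 in bits 31-30, A13..A0 in bits 29-16, A15..A14 in bits 1-0. `vram_write_cmd` is a hypothetical name. Where the assembly masks the index with `andi.l #2047`, this sketch asserts the range instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: VDP control-port command for a VRAM write starting
 * at tile `tile` (address = tile * 32 bytes for a 4bpp 8x8 tile). */
static uint32_t vram_write_cmd(uint16_t tile)
{
    assert(tile <= 2047);
    uint32_t addr = (uint32_t)tile * 32;
    return 0x40000000u                 /* CD0 = 1: VRAM write         */
         | ((addr & 0x3FFFu) << 16)    /* A13..A0 in the high word    */
         | (addr >> 14);               /* A15..A14 in the low bits    */
}
```

For example, tile 256 (VRAM $2000) gives $60000000, and tile 512 (VRAM $4000) gives $40000001, because its address bit 14 moves into the low word.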
PS: loading a 320x224 bitmap in about two frames means loading about 560 tiles per frame, right? How many of them get loaded inside VBlank? Even so, it's a lot, and I'm not even using DMA, which in theory doubles the number of tiles that can be loaded during VBlank and transfers at about the same speed as a 68k loop during active scan.
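The arithmetic in the PS can be spelled out. Here is a minimal C sketch, assuming 4bpp 8x8 tiles (32 bytes, eight long writes each) and an upload spread evenly over the given number of frames. `estimate_upload` and `struct upload_cost` are hypothetical names. The sketch says nothing about how much of each frame falls inside VBlank.

```c
#include <assert.h>

/* Back-of-the-envelope cost of uploading a whole bitmap as tiles. */
struct upload_cost {
    unsigned tiles;            /* total 8x8 tiles in the bitmap            */
    unsigned tiles_per_frame;  /* assuming an even spread over the frames  */
    unsigned bytes_per_frame;  /* 32 bytes per 4bpp tile                   */
    unsigned writes_per_frame; /* move.l writes to the VDP data port       */
};

static struct upload_cost estimate_upload(unsigned width_px, unsigned height_px,
                                          unsigned frames)
{
    assert(width_px % 8 == 0 && height_px % 8 == 0 && frames > 0);
    struct upload_cost c;
    c.tiles = (width_px / 8) * (height_px / 8);
    c.tiles_per_frame = c.tiles / frames;
    c.bytes_per_frame = c.tiles_per_frame * 32;
    c.writes_per_frame = c.tiles_per_frame * 8;
    return c;
}
```

For 320x224 over two frames this gives 1120 tiles in total: 560 tiles, 17,920 bytes and 4,480 long writes per frame. For comparison, a 256x192 bitmap has 768 tiles.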
EDIT: before I forget, this subroutine does not use RAM at all. The only memory accesses it makes are reads from the bitmap and writes to the VDP, so I guess that helps performance?