Aggregating Code Snippets

Ask anything you want about Megadrive/Genesis programming.

Moderator: BigEvilCorporation

walker7
Interested
Posts: 45
Joined: Tue Jul 24, 2012 6:27 am

Aggregating Code Snippets

Post by walker7 » Sat Mar 04, 2017 4:16 am

In response to Mask of Destiny's post about aggregating community research, I'd like to aggregate all the interesting code snippets I've posted.


This one takes a compressed set of colors (packed 9-bit strings, in bbbgggrrr format) and decompresses them into 16-bit palette words (%0000bbb0ggg0rrr0).

Code: Select all

Load_CompColors:
;==============================================================================
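; INPUT:	a0	= POINTS TO: List of packed colors (9 bits each)
;		a1	= POINTS TO: Destination (one word per color)
;		d0.w	= # of Colors - 1
;		d1.w	= Offset (index of the first color to unpack)
; OUTPUT:	a1	= POINTS TO: First word after the unpacked colors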
;==============================================================================
	;----------------------------------------------------------------------
	; Initialize Registers.
	;----------------------------------------------------------------------
	movem.l	d0-d5/a0,-(a7)		; -- Save Registers.
	mulu	#9,d1
	moveq	#7,d2			; Initialize Bit Count.
	and.w	d1,d2
	neg.w	d2
	addq.w	#8,d2
	lsr.w	#3,d1
	add.w	d1,a0
	moveq	#0,d3			; Clear the Accumulator.
	move.b	(a0)+,d3		; Read the first byte.
	;----------------------------------------------------------------------
	; If there are at least 9 bits in the Accumulator, unpack the color.
	; Otherwise, read the next byte.
	;----------------------------------------------------------------------
.2:	cmpi.w	#9,d2
	bcc.b	.1
	lsl.l	#8,d3
	move.b	(a0)+,d3
	addq.w	#8,d2
	bra.b	.2
.1:	;----------------------------------------------------------------------
	; Unpack a color.
	;----------------------------------------------------------------------
	move.l	d3,d4
	move.w	d2,d5
	subi.w	#9,d5
	move.w	d5,d2
	lsr.l	d5,d4			; d4.w = %-------bbbgggrrr
	lsl.w	#2,d4			; d4.w = %-----bbbgggrrr..
	lsr.b	#1,d4			; d4.w = %-----bbb.gggrrr.
	lsl.w	#4,d4			; d4.w = %-bbb.gggrrr.....
	lsr.b	#1,d4			; d4.w = %-bbb.ggg.rrr....
	lsr.w	#3,d4			; d4.w = %...-bbb.ggg.rrr.
	andi.w	#$EEE,d4		; d4.w = %....bbb.ggg.rrr.
	move.w	d4,(a1)+		; Store color.
	dbra	d0,.2			; Loop for next color.
	movem.l	(a7)+,d0-d5/a0		; ++ Restore Registers.
	rts
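
As a rough usage sketch (not part of the original routine; PackedColors and PaletteBuffer are made-up names, and the RAM address is an assumption), two colors could be packed and then unpacked like this:

```asm
PaletteBuffer	equ	$FF0000			; ASSUMPTION: any free RAM will do.

	lea	PackedColors(pc),a0	; POINT TO: Packed colors.
	lea	PaletteBuffer,a1	; POINT TO: Destination.
	moveq	#2-1,d0			; Two colors.
	moveq	#0,d1			; Start at color #0.
	bsr.w	Load_CompColors		; PaletteBuffer = $000E, $0E0E
	; ...

PackedColors:
	; %000000111 (red) and %111000111 (magenta), packed back to back
	; and padded with zero bits to a whole number of bytes.
	dc.b	%00000011, %11110001, %11000000
	even
```

The 18 bits of color data take up three bytes; the last six bits are padding and are never unpacked.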
---------------------------------------------------------------------------

This piece of code draws a rectangle of VRAM tiles, all of which are the same. It assumes you have a 64*64-tile plane size and that the VDP auto-increment is set to 2.

Code: Select all

Rectangle:
;==============================================================================
; INPUT:	d0.5-0	= X Size - 1 (0-63, for 1-64 tiles)
;		d1.5-0	= Y Size - 1 (0-63, for 1-64 tiles)
;		d2.5-0	= X Starting Position
;		d3.5-0	= Y Starting Position
;		d4.2-0	= VRAM Bloc ($2000 boundaries)
;		d5.w	= Tile to fill the rectangle with
; OUTPUT:	A rectangle of a certain tile onscreen.
; NOTES:	This is best used for rectangles whose parameters may be
; variable.
;==============================================================================
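	movem.l	d0-d6/a0-a1,-(a7)	; -- Save Registers.
	lea	$C00004,a0		; POINT TO: VDP Control Port.
	lea	-4(a0),a1		; POINT TO: VDP Data Port ($C00000).
	moveq	#$3F,d6
	and.w	d6,d0			; Limit sizes and positions to 0-63.
	and.w	d6,d1
	and.w	d6,d2
	and.w	d6,d3
	moveq	#$07,d6
	and.l	d6,d4			; Limit VRAM Bloc to 0-7.
	lsl.w	#6,d4
	or.w	d3,d4			; d4 = Bloc*64 + Y
	lsl.w	#6,d4
	or.w	d2,d4			; d4 = (Bloc*64 + Y)*64 + X = VRAM address / 2
	lsl.l	#3,d4			; d4 = VRAM address * 4 (A15-A14 now in high word)
	lsr.w	#2,d4			; Low word = A13-A0
	swap	d4			; High word = A13-A0, low word = A15-A14
	bset	#30,d4			; Make it a VRAM write command.
.2:	move.w	d0,-(a7)		; Save X Size for the next row.
	move.l	d4,(a0)			; Set VRAM address for this row.
.1:	move.w	d5,(a1)			; Write tile.
	dbra	d0,.1			; Loop for next column.
	move.w	(a7)+,d0		; Restore X Size.
	addi.l	#1<<23,d4		; Next row: add $80 to the VRAM address.
	dbra	d1,.2			; Loop for next row.
	movem.l	(a7)+,d0-d6/a0-a1	; ++ Restore Registers.
	rts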
---------------------------------------------------------------------------

Same as above, but the data is immediate (placed directly after the call instruction) as opposed to being in registers. Note that registers d0-d5 are overwritten. The immediate data is 8 bytes long:
  • One byte for number of tiles across, minus 1
  • One byte for number of tiles down, minus 1
  • One byte for starting X position (0-63)
  • One byte for starting Y position (0-63)
  • Two bytes for VRAM bloc. It can range from 0-7 (0 = $0000, 1 = $2000, etc.). Even with this range, it's still two bytes because the next value needs to be aligned to an even address.
  • Two bytes for VRAM tile.

Code: Select all

Rectangle_Imm:
;==============================================================================
; INPUT:	Immediate Data
; OUTPUT:	A rectangle of a certain tile onscreen.
; NOTES:	The immediate data is in this format:
;	Byte  #1	X Size (0-63, for 1-64 tiles)
;	Byte  #2	Y Size (0-63, for 1-64 tiles)
;	Byte  #3	X Position
;	Byte  #4	Y Position
;	Bytes #5-#6	VRAM Bloc (0-7; is 2 bytes for byte-alignment reasons)
;	Bytes #7-#8	VRAM Tile
; NOTES:	This is best used for rectangles whose parameters are constant.
;==============================================================================
	move.l	a0,-(a7)	; -- Save Register.
	move.l	4(a7),a0	; Load JSR address.
	move.b	(a0)+,d0	; Load X Size in d0.
	move.b	(a0)+,d1	; Load Y Size in d1.
	move.b	(a0)+,d2	; Load X Position in d2.
	move.b	(a0)+,d3	; Load Y Position in d3.
	move.w	(a0)+,d4	; Load VRAM Bloc in d4.
	move.w	(a0)+,d5	; Load VRAM Tile in d5.
	bsr.b	Rectangle	; Draw Rectangle.
	move.l	(a7)+,a0	; ++ Restore Register.
	addq.l	#8,(a7)		; Adjust return address.
	rts
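
A call might look something like this. This is only a sketch: the values are arbitrary, and it assumes your plane's name table really sits at $C000 (bloc 6).

```asm
	jsr	Rectangle_Imm
	dc.b	10-1, 5-1		; X Size, Y Size (10 tiles by 5 tiles).
	dc.b	2, 3			; X Position, Y Position.
	dc.w	6			; VRAM Bloc 6 = $C000 (ASSUMED plane address).
	dc.w	$0001			; VRAM Tile.
	; Execution resumes here. d0-d5 were overwritten by the call.
```

Since the data is exactly 8 bytes, the instruction after it stays on an even address.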
---------------------------------------------------------------------------

Another rectangle-drawing snippet, except this time, the VRAM tile increments by 1 after every tile.

Code: Select all

Rectangle_Inc:
;==============================================================================
; INPUT:	d0.5-0	= X Size - 1 (0-63, for 1-64 tiles)
;		d1.5-0	= Y Size - 1 (0-63, for 1-64 tiles)
;		d2.5-0	= X Starting Position
;		d3.5-0	= Y Starting Position
;		d4.2-0	= VRAM Section ($2000 boundaries)
;		d5.w	= VRAM Tile Offset
; OUTPUT:	A rectangle of a certain tile onscreen, with the tile
;		incrementing.
; NOTES:	This is best used for rectangles whose parameters may be
; variable.
;==============================================================================
	movem.l	d0-d6/a0-a1,-(a7)
	lea	$C00004,a0
	lea	-4(a0),a1
	moveq	#$3F,d6
	and.w	d6,d0
	and.w	d6,d1
	and.w	d6,d2
	and.w	d6,d3
	moveq	#$07,d6
	and.l	d6,d4
	lsl.w	#6,d4
	or.w	d3,d4
	lsl.w	#6,d4
	or.w	d2,d4
	lsl.l	#3,d4
	lsr.w	#2,d4
	swap	d4
	bset	#30,d4	
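.2:	move.w	d0,-(a7)		; Save X Size for the next row.
	move.l	d4,(a0)			; Set VRAM address for this row.
.1:	move.w	d5,(a1)			; Write tile.
	addq.w	#1,d5			; Next tile.
	dbra	d0,.1			; Loop for next column.
	move.w	(a7)+,d0		; Restore X Size.
	addi.l	#1<<23,d4		; Next row: add $80 to the VRAM address.
	dbra	d1,.2			; Loop for next row.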
	movem.l	(a7)+,d0-d6/a0-a1
	rts
---------------------------------------------------------------------------

Same as above, but with immediate data:

Code: Select all

Rectangle_IncI:
;==============================================================================
; INPUT:	Immediate Data
; OUTPUT:	A rectangle of a certain tile onscreen, with the tile
;		incrementing.
; NOTES:	The immediate data is in this format:
;	Byte  #1	X Size (0-63, for 1-64 tiles)
;	Byte  #2	Y Size (0-63, for 1-64 tiles)
;	Byte  #3	X Position
;	Byte  #4	Y Position
;	Bytes #5-#6	VRAM Bloc (0-7; is 2 bytes for byte-alignment reasons)
;	Bytes #7-#8	VRAM Tile
; This is best used for rectangles whose parameters are constant.
;==============================================================================
	move.l	a0,-(a7)	; -- Save Register.
	move.l	4(a7),a0	; Load JSR address.
	move.b	(a0)+,d0	; Load X Size in d0.
	move.b	(a0)+,d1	; Load Y Size in d1.
	move.b	(a0)+,d2	; Load X Position in d2.
	move.b	(a0)+,d3	; Load Y Position in d3.
	move.w	(a0)+,d4	; Load VRAM Bloc in d4.
	move.w	(a0)+,d5	; Load VRAM Tile in d5.
	bsr.b	Rectangle_Inc	; Draw Rectangle.
	move.l	(a7)+,a0	; ++ Restore Register.
	addq.l	#8,(a7)		; Adjust return address.
	rts
---------------------------------------------------------------------------

Here's another rectangle-drawing method, but this time, a pointer in a2 supplies the tile mappings (one word per tile, row by row). In this case, the input in d5 is added to each mapping word before it is written.

Code: Select all

Rectangle_Mapping:
;==============================================================================
; INPUT:	a2	= POINTER: Mappings
;		d0.5-0	= X Size - 1 (0-63, for 1-64 tiles)
;		d1.5-0	= Y Size - 1 (0-63, for 1-64 tiles)
;		d2.5-0	= X Starting Position
;		d3.5-0	= Y Starting Position
;		d4.2-0	= VRAM Section ($2000 boundaries)
;		d5.w	= VRAM Tile Offset (added to every mapping word)
; OUTPUT:	A rectangle of a certain tile onscreen, with the tiles based
;		on 16-bit mappings.
;==============================================================================
	movem.l	d0-d6/a0-a2,-(a7)
	lea	$C00004,a0
	lea	-4(a0),a1
	moveq	#$3F,d6
	and.w	d6,d0
	and.w	d6,d1
	and.w	d6,d2
	and.w	d6,d3
	moveq	#$07,d6
	and.l	d6,d4
	lsl.w	#6,d4
	or.w	d3,d4
	lsl.w	#6,d4
	or.w	d2,d4
	lsl.l	#3,d4
	lsr.w	#2,d4
	swap	d4
	bset	#30,d4	
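.2:	move.w	d0,-(a7)		; Save X Size for the next row.
	move.l	d4,(a0)			; Set VRAM address for this row.
.1:	move.w	(a2)+,d6		; Load next mapping word.
	add.w	d5,d6			; Add VRAM Tile Offset.
	move.w	d6,(a1)			; Write tile.
	dbra	d0,.1			; Loop for next column.
	move.w	(a7)+,d0		; Restore X Size.
	addi.l	#1<<23,d4		; Next row: add $80 to the VRAM address.
	dbra	d1,.2			; Loop for next row.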
	movem.l	(a7)+,d0-d6/a0-a2
	rts
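
For illustration, here is a sketch of a 2*2 box drawn from a small mapping table. Box_Map is a made-up label, and the position, bloc and tile offset are arbitrary values chosen for the example.

```asm
	lea	Box_Map(pc),a2		; POINT TO: Mappings.
	moveq	#2-1,d0			; 2 tiles across.
	moveq	#2-1,d1			; 2 tiles down.
	moveq	#10,d2			; X Position.
	moveq	#5,d3			; Y Position.
	moveq	#6,d4			; VRAM Bloc 6 = $C000 (ASSUMED plane address).
	move.w	#$0100,d5		; Added to every mapping word.
	bsr.w	Rectangle_Mapping	; Writes $0100,$0101 then $0102,$0103.
	; ...

Box_Map:
	dc.w	0, 1			; Top row.
	dc.w	2, 3			; Bottom row.
```

The mapping words are read in order, one row after another, so the table needs (X Size) * (Y Size) words.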
---------------------------------------------------------------------------

Same as above, but the data is immediate. In addition to the usual 8 bytes of data, there are 4 more bytes that give the address of the mappings.

Code: Select all

Rectangle_MapI:
;==============================================================================
; INPUT:	Immediate Data
; OUTPUT:	A rectangle of a certain tile onscreen, with the tiles based
;		on 16-bit mappings.
; NOTES:	The immediate data is in this format:
;	Byte  #1	X Size (0-63, for 1-64 tiles)
;	Byte  #2	Y Size (0-63, for 1-64 tiles)
;	Byte  #3	X Position
;	Byte  #4	Y Position
;	Bytes #5-#6	VRAM Bloc (0-7; is 2 bytes for byte-alignment reasons)
;	Bytes #7-#8	VRAM Tile Offset
; 	Bytes #9-#12	Mapping Address	
; This is best used for rectangles whose parameters are constant, and the
; mappings are uncompressed.
;==============================================================================
	move.l	a0,-(a7)		; -- Save Register.
	move.l	4(a7),a0		; Load JSR address.
	move.b	(a0)+,d0		; Load X Size in d0.
	move.b	(a0)+,d1		; Load Y Size in d1.
	move.b	(a0)+,d2		; Load X Position in d2.
	move.b	(a0)+,d3		; Load Y Position in d3.
	move.w	(a0)+,d4		; Load VRAM Bloc in d4.
	move.w	(a0)+,d5		; Load VRAM Tile Offset in d5.
	move.l	(a0)+,a2		; Load Mapping Address in a2.
	bsr.b	Rectangle_Mapping	; Draw Rectangle.
	move.l	(a7)+,a0		; ++ Restore Register.
	addq.l	#8,(a7)			; Adjust return address.
	addq.l	#4,(a7)			;
	rts
---------------------------------------------------------------------------

Still the same as above, this piece of code uses immediate data, but instead of uncompressed data, it uses a pointer to Enigma-compressed mapping data. It relies on an EniDec routine and a __TileBuffer RAM buffer, which are not shown here.

Code: Select all

Rectangle_MapEI:
;==============================================================================
; INPUT:	Immediate Data
; OUTPUT:	A rectangle of a certain tile onscreen, with the tiles based
;		on 16-bit mappings.
; NOTES:	The immediate data is in this format:
;	Byte  #1	X Size (0-63, for 1-64 tiles)
;	Byte  #2	Y Size (0-63, for 1-64 tiles)
;	Byte  #3	X Position
;	Byte  #4	Y Position
;	Bytes #5-#6	VRAM Bloc (0-7; is 2 bytes for byte-alignment reasons)
;	Bytes #7-#8	VRAM Tile Offset
; 	Bytes #9-#12	Enigma-Compressed Mapping Address	
; This is best used for rectangles whose parameters are constant, and the
; mappings are Enigma-compressed.
;==============================================================================
	movem.l	a0-a1/a3,-(a7)		; -- Save Register.
	move.l	12(a7),a3		; Load JSR address.	
	move.l	8(a3),a0		; Load Mapping Address.
	lea	__TileBuffer,a1		; POINT TO: Tile Buffer.
	move.l	a1,a2			; a2 = Decompressed mappings (for Rectangle_Mapping).
	move.w	6(a3),d0		; Load VRAM Tile for Enigma Decompression.
	jsr	EniDec			; Decompress the Mappings.		
	move.b	(a3)+,d0		; Load X Size in d0.
	move.b	(a3)+,d1		; Load Y Size in d1.
	move.b	(a3)+,d2		; Load X Position in d2.
	move.b	(a3)+,d3		; Load Y Position in d3.
	move.w	(a3)+,d4		; Load VRAM Bloc in d4.
	moveq	#0,d5			; Set VRAM Tile Offset to 0.
	bsr.w	Rectangle_Mapping	; Draw Rectangle (may be out of bsr.b range).
	movem.l	(a7)+,a0-a1/a3		; ++ Restore Register.
	addq.l	#8,(a7)			; Adjust RTS address.
	addq.l	#4,(a7)			;
	rts
---------------------------------------------------------------------------

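This piece of code converts a number to a range index: it counts how many thresholds, starting from the first, the number is greater than or equal to (unsigned), and stops at the first threshold that is larger. It uses immediate data for input.
  • The first two bytes tell how many bytes follow.
  • The remaining bytes are simply the data: one threshold per byte, which should be in ascending order.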

Code: Select all

Range_Check:
;==============================================================================
; INPUT:	d0.b	= Number
; OUTPUT:	d0.l	= Range index (0 to Range count)
;==============================================================================
	movem.l	d1-d2/a0,-(a7)		; -- Save Registers.
	move.l	12(a7),a0		; Load JSR address from stack.
	move.w	(a0)+,d1		; Load Range count as Loop Counter.
	moveq	#0,d2			; Initialize Output.
	bra.b	.3			; Skip loop if Loop Counter is 0.
.2:	cmp.b	(a0,d2.w),d0		; Compare to next Range.
	bcs.b	.1			; If the Input is >= Range...
	addq.w	#1,d2			; ...increment Output by 1.
.3:	dbra	d1,.2			; Loop for next Range.
.1:	move.l	d2,d0			; Arrange Output.
	add.w	-2(a0),a0		; Add Range Count to JSR address.
	move.w	a0,d1			; Load bit 0 of JSR address into a register.
	lsr.w	#1,d1			; Shift it right one bit.
	bcc.b	.4			; If C is set (JSR address was odd)...
	addq.l	#1,a0			; ...increment JSR address by 1.
.4:	move.l	a0,12(a7)		; Update JSR address on stack.
	movem.l	(a7)+,d1-d2/a0		; ++ Restore Registers.
	rts
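
Here is a small usage sketch with made-up thresholds. The `even` directive is an assumption about your assembler; the routine rounds the return address up to an even address, so the caller has to pad the data the same way.

```asm
	moveq	#25,d0			; Number to classify.
	jsr	Range_Check
	dc.w	3			; Three thresholds follow.
	dc.b	10, 20, 50		; 0-9 -> 0, 10-19 -> 1, 20-49 -> 2, 50+ -> 3
	even				; Pad to an even address.
	; Execution resumes here with d0 = 2.
```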
---------------------------------------------------------------------------

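This one does an immediate table lookup. The index goes in d0.w; the immediate data is a word holding the element count, followed by that many one-byte elements. An index past the end of the table is clamped to the last element, and the element is returned in d0, zero-extended to 32 bits.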

Code: Select all

Indexed_Mapping:
	movem.l	d1/a0,-(a7)		; -- Save Registers.
	move.l	8(a7),a0		; Load JSR address from stack.
	move.w	(a0)+,d1		; Load Element count.
	cmp.w	d1,d0			; Compare Index to Element count.
	bcs.b	.1			; If Index >= Element count:
	move.w	d1,d0			; Make Index = Element count...
	subq.w	#1,d0			; ...and subtract 1 from it.
.1:	move.b	0(a0,d0.w),d0		; Load the appropriate Element from the list.
	andi.l	#$FF,d0			; CONVERT: 8-bit Output --> 32-bit Output.
	add.w	d1,a0			; Add Element count to JSR Address.
	move.w	a0,d1			; Load bit 0 of JSR address into a register.
	lsr.w	#1,d1			; Shift it right one bit.
	bcc.b	.2			; If C is set (JSR address was odd)...
	addq.l	#1,a0			; ...increment JSR address by 1.
.2:	move.l	a0,8(a7)		; Update JSR address on stack.
	movem.l	(a7)+,d1/a0		; ++ Restore Registers.
	rts
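
A quick usage sketch, with an arbitrary table made up for the example:

```asm
	moveq	#2,d0			; Index.
	jsr	Indexed_Mapping
	dc.w	4			; Four elements follow.
	dc.b	$10, $20, $40, $80
	; Execution resumes here with d0 = $00000040.
	; Any index of 4 or more would return the last element ($80).
```

With an odd element count, the data would need one pad byte, since the routine rounds the return address up to an even address.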
---------------------------------------------------------------------------

This one takes a 32-bit number and converts it to a 10-digit string. Each digit in the string is represented using one byte which ranges from $00-$09. The caller passes a pointer to a 10-byte output buffer in a0.

Code: Select all

Num_to_UBCD:
;==============================================================================
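; INPUT:	d0	= Number
;		a0	= POINTS TO: Output buffer (10 bytes)
; OUTPUT:	10 digits written to the buffer, most significant first
; NOTES:	This takes a 32-bit number and converts it to a 10-digit
; number in unpacked BCD. Each digit is one byte, and each byte ranges from
; $00-$09. d0 is destroyed (it ends up as 0).
;==============================================================================
	movem.l	d1-d3/a0-a1,-(a7)		; -- Save registers.
	lea	Log10_TABLE(pc),a1		; Point to power-of-10 table.
	moveq	#9,d3				; Initialize Loop Counter.
.2:	move.l	(a1)+,d2			; Load next power of 10.
	moveq	#-1,d1				; Digit = -1 (first ADDQ makes it 0).
.1:	addq.b	#1,d1				; Count one subtraction.
	sub.l	d2,d0				; Subtract the power of 10.
	bcc.b	.1				; Repeat until it borrows.
	move.b	d1,(a0)+			; Store the digit.
	add.l	d2,d0				; Undo the last subtraction.
	dbra	d3,.2				; Loop for next digit.
	movem.l	(a7)+,d1-d3/a0-a1		; ++ Restore registers.
	rts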

Log10_TABLE:
;==============================================================================
; This is a power-of-10 table.
;==============================================================================
	dc.l	1000000000, 100000000, 10000000, 1000000, 100000
	dc.l	10000, 1000, 100, 10, 1
---------------------------------------------------------------------------

This one does the same thing, except it produces an ASCII string (bytes range from $30-$39):

Code: Select all

Num_to_ASCII:
;==============================================================================
; INPUT:	d0	= Number
;		a0	= POINTS TO: Output buffer (10 bytes)
; OUTPUT:	10 ASCII digits written to the buffer
; NOTES:	This takes a 32-bit number and converts it to a 10-digit
; number in ASCII. Each digit is one byte, and each byte ranges from $30-$39.
;==============================================================================
	move.l	d3,-(a7)		; -- Save register.
	bsr.b	Num_to_UBCD		; Convert number to unpacked BCD.
	moveq	#9,d3			; Initialize Loop Counter.
.1:	addi.b	#'0',0(a0,d3.w)		; Adjust digit for ASCII.
	dbra	d3,.1			; Loop for next Digit.
	move.l	(a7)+,d3		; ++ Restore register.
	rts
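
As a usage sketch (NumBuffer and its address are assumptions; any 10 free bytes of RAM will do):

```asm
NumBuffer	equ	$FF0000			; ASSUMPTION: free RAM.

	move.l	#12345,d0		; Number to convert.
	lea	NumBuffer,a0		; POINT TO: 10-byte output buffer.
	bsr.w	Num_to_ASCII
	; NumBuffer now holds the 10 characters "0000012345".
```

Leading zeros are kept, so the output is always 10 characters wide; a0 still points to the start of the buffer afterwards.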
---------------------------------------------------------------------------

Multiply a 32-bit number by 10:

Code: Select all

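add.l	d0,d0		; d0 = 2x
move.l	d0,d1		; d1 = 2x (d1 is overwritten)
lsl.l	#2,d0		; d0 = 8x
add.l	d1,d0		; d0 = 8x + 2x = 10x
NOTE: LSL.L #2 is faster than two ADD.L's (12 cycles versus 16 on the 68000).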

---------------------------------------------------------------------------

Copy the C flag to the X flag:

Code: Select all

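scs	d0		; d0.b = $FF if C is set, $00 if C is clear
add.b	d0,d0		; $FF+$FF carries (X = C = 1); $00+$00 does not (X = C = 0)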
---------------------------------------------------------------------------

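Signum of a 32-bit integer (posted by ehaliewicz). The input is in d0; the result (-1, 0, or +1) ends up in d1, and d0 is destroyed:

Code: Select all

add.l	d0,d0		; X = sign bit of the input
subx.l	d1,d1		; d1 = -1 if negative, 0 otherwise (X is unchanged)
negx.l	d0		; X = 1 if the input was non-zero
addx.l	d1,d1		; d1 = 2*d1 + X = -1, 0, or +1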

I might post some more later on. Coming up:
  • A routine that acts like several MOVEQ's.
  • A routine that does an LFSR iteration (several sizes).
  • A routine that uses the CCR to index into a jump table.
Last edited by walker7 on Mon Mar 06, 2017 2:45 am, edited 1 time in total.
When programming, you can do it if you put your mind to it.

Re: Aggregating Code Snippets

Post by walker7 » Mon Mar 06, 2017 1:28 am

This piece of code takes the XNZVC bits of the SR and uses them as an index into a jump table. The jump table must directly follow the call, and each of its 32 entries is a BRA.B instruction.

Code: Select all

Jump_SR:
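	move.w	sr,-(a7)	; Save SR (includes XNZVC).
	move.l	d0,-(a7)	; -- Save Register.
	move.w	4(a7),d0	; Load saved SR.
	andi.w	#$1F,d0		; Keep XNZVC only (0-31).
	ext.l	d0		; Clear the upper word.
	add.w	d0,d0		; 2 bytes per BRA.B entry.
	add.l	d0,6(a7)	; Add to return address.
	move.l	(a7)+,d0	; ++ Restore Register.
	rtr			; Restore CCR and jump to the table entry.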
If you want to use BRA.W, the code would become:

Code: Select all

Jump_SR:
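	move.w	sr,-(a7)	; Save SR (includes XNZVC).
	move.l	d0,-(a7)	; -- Save Register.
	move.w	4(a7),d0	; Load saved SR.
	andi.w	#$1F,d0		; Keep XNZVC only (0-31).
	ext.l	d0		; Clear the upper word.
	lsl.w	#2,d0		; 4 bytes per BRA.W entry.
	add.l	d0,6(a7)	; Add to return address.
	move.l	(a7)+,d0	; ++ Restore Register.
	rtr			; Restore CCR and jump to the table entry.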
---------------------------------------------------------------------------

This one acts like an immediate MOVEQ for multiple registers. I tell you, this requires a lot of clever stack manipulation. Just wait 'til you see this one! The parameters follow the call instruction directly:
  • The first byte represents which registers to use. The leftmost bit represents d0, and the rightmost bit represents d7.
  • Each following byte tells what value to assign to one selected register. The routine reads the values starting with the highest-numbered selected register and working down.
  • The input must be padded to an even number of bytes.
This piece of code is 80 bytes long. It is best if it is placed in the first 32K of ROM ($000000-$007FFF), since you could then call it using absolute short addressing. It's worth using if you would otherwise use lots of MOVEQ instructions at once, and many times, in your code. Note that it builds the MOVEQ instructions on the stack and runs them from below the stack pointer, so interrupts must be disabled while it runs.

Code: Select all

Multi_MOVEQ:
	lea	-18(a7),a7
	movem.l	d6-d7/a5-a6,-(a7)
	lea	$0022(a7),a5	
	move.l	(a5),a6	
	moveq	#7,d7
	moveq	#0,d6
	move.b	(a6)+,d6	
	move.l	a5,a7
	move.w	#$4EF9,-(a7)	
.1:	move.w	#$4E71,-(a7)
	lsr.b	#1,d6
	bcc.b	.2
	move.w	#$7000,(a7)
	add.w	d7,d7
	or.b	d7,(a7)
	lsr.w	#1,d7
	move.b	(a6)+,1(a7)
.2:	dbra	d7,.1
	move.w	a6,d6
	lsr.w	#1,d6
	bcc.b	.3
	addq.l	#1,a6
.3:	move.l	a6,$0012(a7)
	lea	-$0010(a7),a7
	movem.l	(a7)+,d6-d7/a5-a6
	lea	$0016(a7),a7	
	jmp	-$0016(a7)
For example, say that the input is:
1C 00 FF 33
  • The 1C at the beginning means that registers d3, d4, and d5 will be filled.
  • The values are read from the highest selected register down, so the 00 gives d5 a value of $00000000.
  • The FF gives d4 a value of $FFFFFFFF.
  • The 33 gives d3 a value of $00000033.

MintyTheCat
Very interested
Posts: 484
Joined: Sat Mar 05, 2011 11:11 pm
Location: Berlin, Germany

Re: Aggregating Code Snippets

Post by MintyTheCat » Wed Mar 08, 2017 8:03 am

This site desperately needs a Wiki.
UMDK Fanboy

Re: Aggregating Code Snippets

Post by walker7 » Wed Mar 08, 2017 3:14 pm

MintyTheCat wrote:This site desperately needs a Wiki.
Good idea. After all, there is an nesdev wiki, so maybe we can call it gendev or mddev (md for Mega Drive).

We could put all our YM2612 knowledge, code snippets, and other Genesis research stuff on there.

Flygon
Very interested
Posts: 60
Joined: Mon Sep 28, 2009 11:26 am

Re: Aggregating Code Snippets

Post by Flygon » Thu Mar 09, 2017 5:12 am

There was a Mega Drive Wiki being made, but interest seemed to tail off, somewhat.

ComradeOj
Interested
Posts: 27
Joined: Sun Jun 28, 2015 4:18 pm

Re: Aggregating Code Snippets

Post by ComradeOj » Mon Mar 20, 2017 10:36 pm

Here is a little snippet I wrote some time ago, and use a lot.

It calculates the command to send to the VDP to do a VRAM write.
For example, you would write $4A800001 to the VDP control port to access $4A80 in VRAM. With this code, just set the target address into D0, call the routine, and the VDP command will be ready to go in D0 when it returns.

It's faster to just pre-calculate your VDP writes, rather than have the 68k do it every time. I get really lazy sometimes though. It can also be tweaked to calculate reads instead of writes.

Code: Select all

calc_vram:
		move.l d1,-(sp)
		move.l d0,d1
		andi.w #$C000,d1 ;get top two bits only (A15-A14)
		lsr.w #$7,d1     ;shift right 14 places (7 + 7) to move them to bits 1-0
		lsr.w #$7,d1     ;ditto
		andi.w #$3FFF,d0 ;clear the top two bits (keep A13-A0)
		eor.w #$4000,d0  ;attach vram write bit
		swap d0          ;move d0 to high word
		eor.w d1,d0      ;smash the two halves together (upper word of d0 must be 0 on entry)
		move.l (sp)+,d1
		rts
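
As a rough usage sketch (the target address and the data word are just example values), a write to $C000 would go something like this:

```asm
		move.l #$0000C000,d0  ;target VRAM address (upper word must be 0)
		bsr.w calc_vram       ;d0 = $40000003
		move.l d0,$C00004     ;send the command to the VDP control port
		move.w #$0000,$C00000 ;write one word of data to the VDP data port
```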
Visit my web site at http://www.mode5.net/!

flamewing
Very interested
Posts: 56
Joined: Tue Sep 23, 2014 2:39 pm
Location: France

Re: Aggregating Code Snippets

Post by flamewing » Tue Mar 21, 2017 12:54 am

Here are some highly optimized routines for building VRAM command words:

Code: Select all

; Input: d0 = target VRAM address
; Output: d0 = VDP command longword for write to said address
VRAM_Write:
    lsl.l   #2,d0   ; Move high bits into (word-swapped) position, accidentally moving everything else
    addq.w  #1,d0   ; Add upper access type bits
    ror.w   #2,d0   ; Put upper access type bits into place, also moving all other bits into their correct (word-swapped) places
    swap    d0      ; Put all bits in proper places
    andi.w  #3,d0   ; Strip whatever junk was in upper word of reg
    rts

; Input: d0 = target VRAM address
; Output: d0 = VDP command longword for read from said address
VRAM_Read:
    lsl.l   #2,d0   ; Move high bits into (word-swapped) position, accidentally moving everything else
    ror.w   #2,d0   ; Put upper access type bits into place, also moving all other bits into their correct (word-swapped) places
    swap    d0      ; Put all bits in proper places
    andi.w  #3,d0   ; Strip whatever junk was in upper word of reg
    rts

; Input: d0 = target VRAM address
; Output: d0 = VDP command longword for DMA to said address
VRAM_DMA:
    lsl.l   #2,d0   ; Move high bits into (word-swapped) position, accidentally moving everything else
    addq.w  #1,d0   ; Add upper access type bits
    ror.w   #2,d0   ; Put upper access type bits into place, also moving all other bits into their correct (word-swapped) places
    swap    d0      ; Put all bits in proper places
    andi.w  #3,d0   ; Strip whatever junk was in upper word of reg
    tas.b   d0      ; Add in the DMA flag -- tas fails on memory, but works on registers
    rts
It is easy to combine them all into a macro with parameters and use it in place (without a bsr/jsr and the rts), as well as to add support for CRAM and VSRAM; but since I don't want to deal with the hassle of figuring out the macro syntax for other assemblers, I will leave that as an exercise :D

The andi instructions are not needed if you know that the top half of d0 is zero, by the way. Removing the andi, and inlining the code as a macro to avoid the bsr/jsr and rts, are the only ways to make these faster.
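
To illustrate (a sketch only, assuming the top half of d0 is zero and that the command goes straight to the control port at $C00004), an inlined write setup for $C000 would go step by step like this:

```asm
    move.l  #$0000C000,d0   ; Target VRAM address, top half zero
    lsl.l   #2,d0           ; d0 = $00030000
    addq.w  #1,d0           ; d0 = $00030001
    ror.w   #2,d0           ; d0 = $00034000
    swap    d0              ; d0 = $40000003
    move.l  d0,$C00004      ; Send the command to the VDP control port
```

Of course, for a constant address like this one you would just write $40000003 directly; the inline form only pays off when the address is computed at run time.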
