Optimizing background tiles drawn to framebuffer

Ask anything you want about the 32X Mushroom programming.

Moderator: BigEvilCorporation

TapamN
Interested
Posts: 15
Joined: Mon Apr 25, 2011 1:05 am

Post by TapamN » Mon Jun 04, 2012 6:48 pm

Mirroring should have negligible overhead. Here are versions of the copy function that write the data in reverse order. Use the first one if you're using 16-bit color, and the second one if you're using 8-bit color.

Code:

       ! void word_8byte_copy_wordreverse(short *dst, short *src, int count), count in 8-byte groups
        .align  4
        .global _word_8byte_copy_wordreverse
_word_8byte_copy_wordreverse:
        mov     r6,r0	!adjust the dst pointer so we start at the end
        shll2   r0
        shll    r0
        add     r0,r4	!dst = dst + count * 8
1:      mov.w   @r5+,r0
        dt      r6
        mov.w   @r5+,r1
        mov.w   @r5+,r2
        mov.w   @r5+,r3
        mov.w   r0,@-r4
        mov.w   r1,@-r4
        mov.w   r2,@-r4
        bf/s    1b
        mov.w   r3,@-r4
        rts
        nop 
	
	
        ! void word_8byte_copy_bytereverse(short *dst, short *src, int count), count in 8-byte groups
        .align  4
        .global _word_8byte_copy_bytereverse
_word_8byte_copy_bytereverse:
        mov     r6,r0	!adjust the dst pointer so we start at the end
        shll2   r0
        shll    r0
        add     r0,r4	!dst = dst + count * 8
1:      mov.w   @r5+,r0
        dt      r6
        mov.w   @r5+,r1
        mov.w   @r5+,r2
        mov.w   @r5+,r3
        swap.b  r0,r0
        mov.w   r0,@-r4
        swap.b  r1,r1
        mov.w   r1,@-r4
        swap.b  r2,r2
        mov.w   r2,@-r4
        swap.b  r3,r3
        bf/s    1b
        mov.w   r3,@-r4
        rts
        nop 
What all does your draw function actually do?
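
To check the assembly against something readable, here is a portable C sketch of what the two routines compute, based on reading the assembly above. It is a reference for testing, not a drop-in replacement, and the `ref_` names are hypothetical. As in the assembly, `count` is the number of 8-byte (4-word) groups.

```c
#include <stdint.h>

/* Reference for word_8byte_copy_wordreverse: copies count groups of
 * 4 words with the word order reversed across the whole run, so the
 * first source word lands last. Suits 16-bit color (1 pixel/word). */
void ref_copy_wordreverse(uint16_t *dst, const uint16_t *src, int count)
{
    int n = count * 4;                      /* total words */
    for (int i = 0; i < n; i++)
        dst[n - 1 - i] = src[i];
}

/* Reference for word_8byte_copy_bytereverse: same word reversal, plus
 * the two bytes in each word are swapped (what swap.b does), so the
 * run ends up reversed byte by byte. Suits 8-bit color (1 pixel/byte). */
void ref_copy_bytereverse(uint16_t *dst, const uint16_t *src, int count)
{
    int n = count * 4;
    for (int i = 0; i < n; i++) {
        uint16_t w = src[i];
        dst[n - 1 - i] = (uint16_t)((w << 8) | (w >> 8));
    }
}
```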

ammianus
Very interested
Posts: 124
Joined: Sun Jan 29, 2012 2:10 pm
Location: North America

Post by ammianus » Mon Jun 04, 2012 9:13 pm

TapamN wrote:Mirroring should have negligible overhead. Here are versions of the copy function that write the data in reverse order. Use the first one if you're using 16-bit color, and the second one if you're using 8-bit color.
...
What all does your draw function actually do?
Cool, thanks.

So what I am doing is drawing a rectangle from a source image of arbitrary size to any location on the framebuffer (8-bit color).

I have a couple of for loops like this:

Code:

for (y = 0; y < image_height; y++)
{
    for (x = 0; x < image_width; x += 8)
    {
         //read 8 bytes from src memory
         //manipulate 8 bytes if mirrored
         //if < 8 bytes to draw to FB in this row, zero out remaining bytes of "overflow"
         //write 8 bytes to *FB
    }
    //increment *FB to start of next row
}
What I have to figure out is the border cases. Say I only need 4 pixels from the source in the current row: then I really need to reverse those 4 pixels and zero out the remaining 4 bytes before writing to the FB.
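
The per-chunk logic described above (take up to 8 pixels, mirror them, zero the tail) can be sketched in plain C as follows. `build_chunk` and `CHUNK` are hypothetical names, and going byte by byte like this gives up the speed of the 8-byte copy routines; the point is only to pin down the indexing, assuming 8-bit color and a source row of `width` pixels.

```c
#include <stdint.h>

#define CHUNK 8  /* bytes per word_8byte_copy group in 8-bit color */

/* Fills out[] with the pixels for screen columns [x, x + CHUNK) of a
 * source row `width` pixels wide. When `mirrored` is nonzero the row
 * is read right to left. Columns past the end of the row are zeroed
 * (transparent). Returns how many real pixels the chunk holds. */
int build_chunk(uint8_t out[CHUNK], const uint8_t *row, int width, int x, int mirrored)
{
    int n = width - x;
    if (n > CHUNK) n = CHUNK;
    if (n < 0) n = 0;
    for (int j = 0; j < CHUNK; j++) {
        if (j < n)
            out[j] = mirrored ? row[width - 1 - (x + j)] : row[x + j];
        else
            out[j] = 0;   /* pad the tail of a partial chunk */
    }
    return n;
}
```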


Post by TapamN » Sun Jun 10, 2012 5:53 pm

Is the only reason you copy the pixels to a temporary buffer to do clipping? You set the pixels to transparent to prevent them from appearing on the other side of the screen? You can probably just copy straight to the framebuffer. For the parts that need clipping, adjust the source, destination, and length of the copies.

For left edge clipping, the destination always starts at the left edge of the screen. You need to figure out how much of the tile is hanging off the left edge; this is -X, the negated X coordinate of the tile's left edge. Add that as an offset to the start of the source to shift over where it gets its pixels from, and subtract the same amount from the length of the copy. You can skip the negation and just flip the signs: source = source - X, length = tile width + X.

For right edge clipping, just subtract the number of off screen pixels from the length. The number of off screen pixels is (X + Tile width - Screen width). Since screen and tile width don't change, you can combine them into a right clipping edge value (Screen width - Tile width) and do (X - Right clipping edge). Source and destination are as normal.
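
The edge arithmetic above can be written as a small C sketch that works in pixels and returns the source skip, destination X, and copy length. `clip_span` and `ClipSpan` are hypothetical names. It also handles a span hanging off both edges at once, which a 16-pixel tile on a 320-pixel screen never does. In 8-bit color, X and the clip amounts need to be even, and the pixel length is halved before being passed to a word copy.

```c
/* Result of clipping a horizontal span against [0, screen_width). */
typedef struct {
    int src_skip;   /* pixels to skip at the start of the source */
    int dst_x;      /* first screen column written */
    int length;     /* pixels to copy; 0 if fully off screen */
} ClipSpan;

ClipSpan clip_span(int x, int width, int screen_width)
{
    ClipSpan c = { 0, x, width };
    if (x < 0) {                    /* hanging off the left edge */
        c.src_skip = -x;            /* source = source - X */
        c.dst_x = 0;                /* destination pinned to the edge */
        c.length += x;              /* length = width + X */
    }
    if (x + width > screen_width)   /* hanging off the right edge */
        c.length -= x + width - screen_width;
    if (c.length < 0)
        c.length = 0;
    return c;
}
```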

Here's an example left and right edge X clipper in C. It draws one row of pixels, passed in with tilerowstart, at a given X and Y. X has to be even (word aligned) or the copy functions will do unaligned accesses, and at least part of the tile has to be on screen; don't call the function at all if the tile is completely off screen.

Code:

#define tilewidth       (16)                    /* tile width in pixels */
#define fbwidth         (320)                   /* framebuffer width in pixels */
#define fbheight        (224)                   /* framebuffer height in pixels */
#define fbrightclip     (fbwidth - tilewidth)   /* right side clipping edge in pixels */
#define framebuffer     ((short*)XXXXXXX)       /* pointer to framebuffer */

/* Assumed prototypes for the assembly copy routines: fast_wmemcpy counts
   16-bit words, word_8byte_copy counts 8-byte groups. */
void fast_wmemcpy(short *dst, short *src, int count);
void word_8byte_copy(short *dst, short *src, int count);

/* 8-bit color: one byte per pixel, so pixel counts are halved for word copies. */
void draw_tile_row(int fbx, int fby, short *tilerowstart)
{
        char *linestart = (char*)framebuffer + fby*fbwidth;
        if (fbx < 0) {
                // Left edge of tile goes off the left edge of the screen:
                // skip -fbx source pixels and copy tilewidth + fbx pixels.
                fast_wmemcpy((short*)linestart, (short*)((char*)tilerowstart - fbx), (tilewidth + fbx) / 2);
        } else if (fbx > fbrightclip) {
                // Right edge of tile goes off the right edge of the screen.
                int pixelsoffscreen = fbx - fbrightclip;
                fast_wmemcpy((short*)(linestart + fbx), tilerowstart, (tilewidth - pixelsoffscreen) / 2);
        } else {
                // Tile is completely on screen
                word_8byte_copy((short*)(linestart + fbx), tilerowstart, tilewidth/8);
        }
}
I haven't tried compiling it or anything, so I don't know if it works right; but I think it should show the general idea.

You'll also want to keep the clipping detection outside of the innermost loops, rather than doing it for every tile. You can eliminate most of the checks by processing your tile map like this:

Code:

draw top row with clipping
draw bottom row with clipping
for each row not touching the top and bottom edges {
        draw left edge tile with clipping
        fast block draw middle unclipped tiles
        draw right edge tile with clipping
}
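
Here is a C sketch of that traversal, assuming a view of cols x rows tiles with at least two of each. `draw_tile_clipped` and `draw_tile_fast` are hypothetical stand-ins that only count calls, so the structure can be checked off-hardware.

```c
/* Hypothetical per-tile drawers; here they only count calls. */
static int clipped_calls, fast_calls;
static void draw_tile_clipped(int tx, int ty) { (void)tx; (void)ty; clipped_calls++; }
static void draw_tile_fast(int tx, int ty)    { (void)tx; (void)ty; fast_calls++; }

/* Visits a cols x rows tile view (cols >= 2, rows >= 2), using the
 * clipping path only for tiles that touch a screen edge. */
void draw_tilemap(int cols, int rows)
{
    for (int tx = 0; tx < cols; tx++) {
        draw_tile_clipped(tx, 0);             /* top row */
        draw_tile_clipped(tx, rows - 1);      /* bottom row */
    }
    for (int ty = 1; ty < rows - 1; ty++) {
        draw_tile_clipped(0, ty);             /* left edge tile */
        for (int tx = 1; tx < cols - 1; tx++)
            draw_tile_fast(tx, ty);           /* unclipped middle tiles */
        draw_tile_clipped(cols - 1, ty);      /* right edge tile */
    }
}
```

For a 21 x 15 tile view (16-pixel tiles over a scrolled 320x224 screen), 68 of the 315 tiles take the clipped path and 247 take the fast one.
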
If you really need to get the most speed possible, you can write special versions of the clipping code for each edge and corner of the screen, to minimize the number of conditionals. You can process all possible clipping conditions for the entire screen in two checks, instead of doing multiple checks per tile, and reuse clipping calculations with multiple tiles, at the cost of greatly complicating the code. Don't bother with this method unless the above isn't fast enough. Here's what it would look like, though:

Code:

if left and right edges need clipping {
        if top and bottom rows need clipping {
                draw top left tile with top left clipping
                draw top right tile with top right clipping
                fast block draw top tiles with top clipping
                draw bottom left tile with bottom left clipping
                draw bottom right tile with bottom right clipping
                fast block draw bottom tiles with bottom clipping
        }
        for each row without y clipping {
                draw left edge tile with left clipping
                fast block draw middle unclipped tiles
                draw right edge tile with right clipping
        }
} else {
        if top and bottom rows need clipping {
                fast block draw top tiles with top clipping
                fast block draw bottom tiles with bottom clipping
        }
        for each row without y clipping {
                fast block draw middle unclipped tiles
        }
}
Here's a byte-reversing, one-word-at-a-time pixel copy routine (8-bit color) for clipping flipped tiles, since one hasn't been posted yet.

Code:

        ! void fast_wmemcpy_bytereverse(short *dst, short *src, int count), count in words
        .align  4
        .global _fast_wmemcpy_bytereverse
_fast_wmemcpy_bytereverse:
        add     r6,r4
        add     r6,r4      !dst = dst + count * 2
1:      mov.w   @r5+,r3
        dt      r6
        swap.b  r3,r3
        bf/s    1b
        mov.w   r3,@-r4
        rts
        nop
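
One detail worth spelling out for flipped tiles: the pixels hidden off the left edge come from the end of the source row, so the two edges swap roles on the source side. Left clipping only shortens the copy, while right clipping skips pixels at the start of the source. Here is a C sketch assuming 8-bit pixels, with a byte-at-a-time `copy_reverse` standing in for the word routine (`copy_reverse` and `draw_flipped_row` are hypothetical names). With the real routine, x, the clip amounts, and the length must all be even, and the length is halved to get a word count.

```c
#include <stdint.h>

/* Byte-at-a-time stand-in for a reversing copy: src[0] lands in the
 * LAST destination pixel. len is in pixels (bytes), not words. */
static void copy_reverse(uint8_t *dst, const uint8_t *src, int len)
{
    for (int i = 0; i < len; i++)
        dst[len - 1 - i] = src[i];
}

/* Draws one row of a horizontally flipped tile at screen column x,
 * clipped against [0, screen_w). */
void draw_flipped_row(uint8_t *line, int screen_w, int x,
                      const uint8_t *tilerow, int tile_w)
{
    int left = x < 0 ? -x : 0;                                   /* hidden on the left */
    int right = x + tile_w > screen_w ? x + tile_w - screen_w : 0; /* hidden on the right */
    int len = tile_w - left - right;
    if (len <= 0)
        return;                                                  /* fully off screen */
    /* right clip skips the start of the source; left clip just shortens */
    copy_reverse(line + (x + left), tilerow + right, len);
}
```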
Actually, it might be possible to go without clipping if the line table format is flexible enough to let you use a 336 by 240 pixel frame buffer and use the line table and screen shift to adjust a 320 by 224 window. If each word in the line table is converted into a pointer into the frame buffer RAM by a left shift, it should be no problem to do so, but with the docs I have I can't figure out if it's really possible.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Sun Jun 10, 2012 7:07 pm

Yes, the line table would allow you to "overdraw" the lines to avoid clipping on the sides. There's enough ram in the frame buffer that in 256 color mode you can make the buffer 640x224 and use the line table to scroll around in that buffer. Clearly, if all you want is to avoid clipping, you just need an extra 4 words on each side of the screen, making the buffer 336x224.

The extra ram (in 256 color mode) and the line table are specifically there to make the 32X layer easy/fast to scroll for shmups or platformers.


Post by ammianus » Sun Jun 10, 2012 8:47 pm

Oh, thanks guys. I could have sworn that when I started working this morning (my usual Sunday hobby) I didn't see your replies :? In the meantime I got everything "working" in a sort of "good enough" way.

See video here:
http://youtu.be/XehJassXCm8

I got the mirroring, edge clipping, and the ability to scroll the floor working pretty easily (I've been experimenting with using the other parts of the FB and moving the line scroll around for some cool effects).
Is the only reason you copy the pixels to a temporary buffer to do clipping? You set the pixels to transparent to prevent them from appearing on the other side of the screen?
Yes: clipping at the screen edge, or clipping the image if its pixel width is not divisible by 8. That, and originally I was mirroring by writing each byte backwards.
You can probably just copy straight to the framebuffer. For the parts that need clipping, adjust the source, destination, and length of the copies.
Since the functions we were discussing in this forum were about optimizing the reads and writes, I think we settled on that 4-word / 8-byte function. It doesn't seem possible to use it as-is when I need lengths shorter than 8 bytes, i.e. when clipping at the edge of the image or the edge of the screen.

I guess I could just enforce that all of the images in my game have dimensions that are multiples of 8, to simplify my life so I only have to worry about screen clipping. When I started this I didn't think I would have to fight so hard to draw a bunch of rectangles, so I was making it flexible and generic :)
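
Widths that aren't multiples of 8 don't necessarily need a temporary buffer: the bulk of a row can go through the 8-byte routine and the last few words through a word copy. Here is a C sketch with `memcpy` stand-ins for the two routines. The names `copy8`, `copyw`, and `copy_row` are hypothetical, and it assumes 8-bit color, even widths, and word-aligned pointers.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins: copy8 mimics word_8byte_copy (count = 8-byte groups),
 * copyw mimics a word copy such as fast_wmemcpy (count = words). */
static void copy8(uint8_t *dst, const uint8_t *src, int count)
{
    memcpy(dst, src, (size_t)count * 8);
}
static void copyw(uint8_t *dst, const uint8_t *src, int count)
{
    memcpy(dst, src, (size_t)count * 2);
}

/* Copies one row of `bytes` pixels: whole 8-byte groups first,
 * then the 0-3 leftover words. `bytes` must be even. */
void copy_row(uint8_t *dst, const uint8_t *src, int bytes)
{
    int groups = bytes / 8;
    int tail_words = (bytes % 8) / 2;
    if (groups)
        copy8(dst, src, groups);
    if (tail_words)
        copyw(dst + groups * 8, src + groups * 8, tail_words);
}
```

For a mirrored row the same split works with the reversing routines, except the tail words come from the start of the source row rather than the end.
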

Here is my "good enough" method which I'll probably rethink based on your advice.

Code:

/*
* Draws an image to position on MARS framebuffer allowing you to flip the image using mirror param.
* 
* @param spriteBuffer - pointer to starting position of image data
* @param x - x pixel coordinate of top-left corner of the image 
* @param y - y pixel coordinate of top-left corner of the image
* @param xWidth - horizontal size (width) of image to be drawn in pixels
* @param yWidth - vertical size (height) of image to be drawn in pixels
* @param mirror - 0 for normal 1 for flipped along y-axis
* @param checkTransparency - 0 for not checked ie overwrite every pixel, including with zero, 1 for checking
* @param screenWrap - 0 for no screenWrap, 1 for screen wrapping  
*/
void drawSpriteMaster(const vu8 *spriteBuffer, const vu16 x, const vu16 y, const int xWidth, const int yWidth, const int mirror, const int checkTransparency, const int screenWrap)
{

	//MARS_VDP_MODE_256
	//each byte represents the color in CRAM for each pixel.
	vu8 *frameBuffer8 = NULL;//(vu8*) &MARS_FRAMEBUFFER;
	vu16 xOff;
	int bufCnt=0;
	int rowPos=0;
	int xCount=0;
	int xOverflow=0;
	
	const uint16 lineTableEnd = 0x100;
	int fbOff;
	int p=0;
	//TODO this is always 1 (for 8 byte segments using the word_8byte copy functions
	const int pixelWriteBufferSizeWords =  1;//PIXEL_WRITE_BUFFER_SIZE_B/2;
	
	//overwrite buffer - ie zero is not written - what you need if you want transparency on sprites
	//TODO might have some problems if over screen edges?
	if(checkTransparency == IS_TRANSPARENT){
		frameBuffer8 = (vu8* ) &MARS_OVERWRITE_IMG;
	}else{
		frameBuffer8 = (vu8* ) &MARS_FRAMEBUFFER;
	}

	//offset the number of pixels in each line to start to draw the image
	xOff = x;
	//move the framebuffer offset to start of the visible framebuffer?? 
	//Line table is 256 words ie 256 * 2 bytes
	fbOff = lineTableEnd * 2;// - ( PIXEL_WRITE_BUFFER_SIZE_B - 1 );
	//y-offset for top of sprite to correct line in framebuffer
	fbOff = fbOff + (y * SCREEN_WIDTH);
	//x-offset from start of first line
	fbOff = fbOff + xOff;
	//draw spriteBuffer to the framebuffer
	//drawWidth = 0;
	bufCnt = 0;
	//colPos = 0;
	//yCount = 0;
	xCount = 0;
	rowPos = 0;
	//loop for all the rows
	for (rowPos = 0; rowPos < yWidth; rowPos++)
	{
		p = 0;
		
		if(mirror == IS_MIRRORED){
			//increment a row
			bufCnt = bufCnt + xWidth;
		}
		
		//for the row iterate over the columns
		for(xCount = 0; xCount < xWidth; xCount+=PIXEL_WRITE_BUFFER_SIZE_B)
		{		
			xOverflow = 0;
			//if mirror is 1 that tells us to flip the column
			if(mirror == IS_MIRRORED){
				//copy the next 8 bytes in reverse
				word_8byte_copy_bytereverse((void *)&pixelWords, (void *)(&spriteBuffer[bufCnt-(xCount+PIXEL_WRITE_BUFFER_SIZE_B)]), pixelWriteBufferSizeWords);
				p = PIXEL_WRITE_BUFFER_SIZE_B-1;
				
			}else{
				//copy the next 8 bytes
				word_8byte_copy((void *)&pixelWords, (void *)(&spriteBuffer[bufCnt+xCount]), pixelWriteBufferSizeWords);
				p = PIXEL_WRITE_BUFFER_SIZE_B-1;
				
			}
			
			//don't draw on this line if past the width of the image
			if(xCount + PIXEL_WRITE_BUFFER_SIZE_B > xWidth){
					xOverflow = xWidth - xCount;
					//zero out any other pixel data
					for(p = xOverflow; p < PIXEL_WRITE_BUFFER_SIZE_B; p++){
						pixelWords[p] = 0;
					}
			}
			
			//don't draw if you've gone over the screenwidth
			if(screenWrap == 0){
				if(xOff + xCount + PIXEL_WRITE_BUFFER_SIZE_B > SCREEN_WIDTH){
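					xOverflow = SCREEN_WIDTH - (xOff+xCount);
					//the whole chunk may be past the edge; clamp so pixelWords is never indexed with a negative p
					if(xOverflow < 0){
						xOverflow = 0;
					}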
					//zero out any other pixel data
					for(p = xOverflow; p < PIXEL_WRITE_BUFFER_SIZE_B; p++){
						pixelWords[p] = 0;
					}
					//advance up to the end of this row
					//xCount = xWidth;
				}
			}
			
			//write to framebuffer four words at a time
			word_8byte_copy((void *)(frameBuffer8+fbOff), (void *)&pixelWords, pixelWriteBufferSizeWords);
			
			p = 0;

			//increment to next position in FrameBuffer
			fbOff += PIXEL_WRITE_BUFFER_SIZE_B;//- xOverflow;

		}//end for xCount
		
		//increment a row if not "reversed" 
		if(mirror != IS_MIRRORED){
			bufCnt = bufCnt + xWidth;
		}

		//move fbOff to the same x position on the next line; the loop above
		//advanced it by xCount (xWidth rounded up to a multiple of 8)
		fbOff = fbOff + (SCREEN_WIDTH - xCount);
		
	}//end for rowPos
	//write any "leftover pixels? shouldn't happen
	
}
Thanks again for the tips, guys.


Post by ammianus » Sun Jun 10, 2012 8:56 pm

Chilly Willy wrote:Clearly, if all you want is to avoid clipping, you just need an extra 4 words on each side of the screen, making the buffer 336x224.
Good idea

So the line table I populate would reference my "line" + 4 words. I don't need to put anything in those boundary areas unless I go over the limit of the screen edge, and therefore I don't have to worry about clipping.


Post by Chilly Willy » Sun Jun 10, 2012 11:20 pm

ammianus wrote:
Chilly Willy wrote:Clearly, if all you want is to avoid clipping, you just need an extra 4 words on each side of the screen, making the buffer 336x224.
Good idea

So the line table I populate would reference my "line" + 4 words. I don't need to put anything in those boundary areas unless I go over the limit of the screen edge, and therefore I don't have to worry about clipping.
Each entry in the line table is the offset, in words, from the start of the framebuffer VRAM to the start of a 320-pixel line. So for 4 words on each side of each line, the entries would be

256 + 4
256 + 4 + 168
256 + 4 + 168 + 168
256 + 4 + 168 + 168 + 168
etc.

That's using the "standard" starting offset in the frame buffer of 512 bytes. You can add an offset to each line for scrolling, and combine the overdraw clipping with the horizontal scrolling.
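
The entries listed above can be generated with a short C sketch. It fills a plain array rather than writing to hardware; the macro names and `fill_line_table` are hypothetical, and it assumes 8-bit color with a 336-byte buffer line (8 border pixels, i.e. 4 words, on each side) and the 256-word line table at the start of the framebuffer.

```c
#include <stdint.h>

#define LINES        224   /* visible lines */
#define LINE_BYTES   336   /* 320 visible + 8 border pixels each side */
#define TABLE_WORDS  256   /* the line table itself: 256 words = 512 bytes */
#define BORDER_WORDS 4     /* 8 pixels of left border = 4 words */

/* Entry n = word offset of line n's first visible pixel. scroll_words
 * moves the visible window that many words (2 pixels each) to the
 * right within each buffer line; with a 4-word border it must stay
 * within -4..4 or lines start reading from their neighbours. */
void fill_line_table(uint16_t table[LINES], int scroll_words)
{
    for (int n = 0; n < LINES; n++)
        table[n] = (uint16_t)(TABLE_WORDS + BORDER_WORDS + scroll_words
                              + n * (LINE_BYTES / 2));
}
```

Since entries are word offsets, this only scrolls in 2-pixel steps in 8-bit mode; single-pixel steps would need the screen shift TapamN mentions.
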


Post by ammianus » Sun Jul 15, 2012 2:48 pm

Ok so I went on a really long vacation / road trip. I had some time to work on my game but not enough time to reply to this thread.

Your tips worked great.

I did the extra 8 bytes on either side thing.
256 + 4
256 + 4 + 168
256 + 4 + 168 + 168
256 + 4 + 168 + 168 + 168
etc.

I also started writing everything directly to the FB, with no intermediate array or processing. I am still using the word_8byte_copy function, writing to the cached FB address and reading from the cached SDRAM addresses.

This all resulted in a huge jump in performance for drawing and scrolling the floor tiles. Note that I am still redrawing the entire background on each iteration of the loop; I haven't even tried redrawing only the part of the background where the character was.

Resulting video:
http://youtu.be/VeM5PqBVr8g


I am working on the glitch where any sprite that is partially off the left side of the screen doesn't get drawn at all. Probably some logic error from when I didn't want to draw things off the screen.


Post by Chilly Willy » Sun Jul 15, 2012 6:32 pm

You've come a long way from your first demo. You're doing more and it's still much faster. :D
