Genny and 3D

Ask anything you want about Megadrive/Genesis programming.

Moderator: BigEvilCorporation

peekpoke
Interested
Posts: 37
Joined: Fri Feb 01, 2013 1:11 am

Post by peekpoke » Fri Feb 22, 2013 11:53 am

Yoohoo!! Thanks! Your method works! Got it working in 3.4.6. Dunno why it didn't work yesterday, but I think it just shows how important a good night's sleep is xD

I just did the first test run: for now it seems to give a stable +1 fps, but I'm sure it's only the beginning.

update: after a few minor "tweaks", it's even +2 fps, and movement is much smoother than before! Here is the binary:

raytest4.bin
raytest4fs.bin "fullscreen" version (using SGDK bitmap routines - seems even smoother!!)


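That's how it works for me:

Code:

        /* Assuming 64-texel rows and 6 fractional bits in textureY:
           (textureY & ~63) == (textureY >> 6) * 64, i.e. the row offset,
           obtained without a shift or multiply. */
        while( heightY )
        {
            register u16 c = texturePtr[ textureY & ~63 ];
            setBMP( screenX, offsetY, c );
            offsetY++;
            textureY += textureInc;   /* advance by one screen pixel */
            heightY--;
        }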
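The `textureY & ~63` index only works under a couple of assumptions the post doesn't spell out. A minimal sketch, assuming that `texturePtr` points at the start of a texture column, rows are 64 texels apart, and `textureY` is a row position with 6 fractional bits: masking off the fraction then yields the row offset directly. `texture_index` and `draw_column` are hypothetical names, and the `out` buffer stands in for `setBMP`.

```c
#include <stdint.h>

/* Hypothetical helper: array index for a fixed-point row position.
   Assumes 64-texel rows and 6 fractional bits (0 <= textureY < 64*64). */
static inline uint16_t texture_index(uint16_t textureY)
{
    /* (textureY >> 6) * 64 == textureY & ~63 for non-negative values */
    return (uint16_t)(textureY & ~63u);
}

/* Steps one screen column of `height` pixels down a texture column,
   writing each texel to `out` (a stand-in for setBMP). */
static void draw_column(const uint16_t *texturePtr, uint16_t *out,
                        uint16_t height, uint16_t textureInc)
{
    uint16_t textureY = 0;
    while (height--) {
        *out++ = texturePtr[texture_index(textureY)];
        textureY += textureInc;
    }
}
```

For a 64-row texture stretched over 128 pixels, `textureInc` is 64 * 64 / 128 = 32, so each texel is repeated twice.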
But still, with 3.4.6 I can't use a "pointer array" (sorry, dunno what it's properly called :( ), so a construction like this:

Code:

const u16 *textures[] =
{
    0,
    tex_wood,
    tex_bricks,
    tex_bird,
    tex_bricks2
};

...
blablabla
...

register u16 *texturePtr = &textures[ slice->textureId ][ slice->textureOffset ];
doesn't work. Instead, I need to point to a specific texture array, for example tex_bricks:

Code:

register u16 *texturePtr = &tex_bricks[ slice->textureOffset ];
This one works.
update: it seems to be somehow related to the data initialization done by crt0, because this construction works:

Code:

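u16 *textures[5];

/* blabla */

int main()
{
    textures[0] = 0;
    textures[1] = tex_wood;      /* the array name decays to a pointer; &tex_wood would be a pointer to the whole array */
    textures[2] = tex_bricks;
    textures[3] = tex_bird;
    textures[4] = tex_bricks2;
    /* ... */
}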
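If the cause really is that initialized RAM data isn't being set up by crt0 (the post only suspects this), a common workaround is to make the pointer table itself `const`, so it can live in read-only data (ROM) next to the textures instead of needing a RAM copy at startup. A minimal sketch with tiny hypothetical texture arrays standing in for the real 64x64 ones; `u16` is typedef'd here only to keep the block self-contained, `texel_ptr` is a hypothetical helper, and whether the table really ends up in ROM should be checked in the linker map.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;   /* stand-in for SGDK's u16 */

/* Hypothetical placeholder textures. */
static const u16 tex_wood[4]    = { 1, 2, 3, 4 };
static const u16 tex_bricks[4]  = { 5, 6, 7, 8 };
static const u16 tex_bird[4]    = { 9, 10, 11, 12 };
static const u16 tex_bricks2[4] = { 13, 14, 15, 16 };

/* Both the pointers and the data are const, so in a non-PIC ROM build the
   table can normally be placed in read-only data and does not depend on
   crt0 copying initialized data to RAM. */
static const u16 * const textures[] =
{
    NULL,           /* texture id 0 = empty map cell */
    tex_wood,
    tex_bricks,
    tex_bird,
    tex_bricks2
};

static const u16 *texel_ptr(u16 textureId, u16 textureOffset)
{
    return &textures[textureId][textureOffset];
}
```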
Btw, the next candidate for tweaking is this ugly modulo I used to keep the angle in 0-359:
costab[ ((angle % 360) + 360) % 360 ]; // a single "% 360" isn't enough: C's % keeps the sign of the left operand, so a negative angle gives a negative index
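The double modulo is needed because C's `%` truncates toward zero (guaranteed since C99), so `-5 % 360` is `-5`, not `355`. A small sketch, assuming a 360-entry table indexed in degrees; `wrap_deg` is a hypothetical name. It replaces the second `%` with a compare:

```c
/* Wraps any int angle in degrees into 0..359 for indexing a 360-entry table. */
static int wrap_deg(int angle)
{
    int r = angle % 360;          /* in -359..359, negative if angle < 0 */
    return r < 0 ? r + 360 : r;   /* one compare instead of a second % */
}
```

This still costs one division per call, which is slow on the 68000, and that is why a power-of-two table size is attractive.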

I suppose it produces very ugly asm code (to be honest, I didn't look), and it happens three times per vertical line. So I need to transform my sincos arrays to a power-of-2 range, so that a single bitwise AND can be used instead. For example:
costab[ angle & 511 ]; // costab: precalculated array for -2pi..2pi, mapped to indices 0-511
update: did it. It gave a little boost, but also a few minor rendering artifacts:
raytest5.bin
raytest5fs.bin "fullscreen" version (5 to 12 FPS!)
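A sketch of the power-of-two lookup, assuming (hypothetically) a 512-entry table covering one full period, filled elsewhere; the post does not give the table's value format. On two's-complement machines like the 68000, the AND also wraps negative angles, so no separate sign fix-up is needed. `angle_index` and `fix_cos` are hypothetical names.

```c
#include <stdint.h>

#define ANGLE_STEPS 512                 /* one full period; must be a power of two */
#define ANGLE_MASK  (ANGLE_STEPS - 1)   /* 511 */

/* Table filled elsewhere (e.g. generated offline), one entry per angle step. */
static int16_t costab[ANGLE_STEPS];

static inline int angle_index(int angle)
{
    /* Two's complement: -1 & 511 == 511, 512 & 511 == 0. */
    return angle & ANGLE_MASK;
}

static inline int16_t fix_cos(int angle)
{
    return costab[angle_index(angle)];
}
```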

I also need to find a way to force drawSlice and my setBMP to be inlined. My other test (just drawing a simple 64x64 texture on screen) showed that doing what setBMP does directly inside the drawing loop, instead of calling setBMP, gives +4 fps. But declaring setBMP inline didn't give the same boost, so I suppose gcc ignored the inline hint...
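With GCC, plain `inline` is only a hint. A hedged sketch of the usual ways to insist: define the function `static inline` in the same file (or a header) as the loop that calls it, and add GCC's `always_inline` attribute. The framebuffer below (one byte per pixel, 256 wide) is a hypothetical stand-in that only gives `setBMP` a body; it is not SGDK's bitmap format, and `drawVLine` is a hypothetical caller.

```c
#include <stdint.h>

#define BMP_WIDTH  256
#define BMP_HEIGHT 160

/* Hypothetical one-byte-per-pixel buffer. */
static uint8_t framebuffer[BMP_WIDTH * BMP_HEIGHT];

/* static + always_inline: GCC inlines this into every caller in this file
   and complains if it cannot. */
static inline __attribute__((always_inline))
void setBMP(uint16_t x, uint16_t y, uint8_t color)
{
    framebuffer[y * BMP_WIDTH + x] = color;
}

static void drawVLine(uint16_t x, uint16_t y0, uint16_t height, uint8_t color)
{
    while (height--)
        setBMP(x, y0++, color);
}
```

Whether the call really disappeared is easiest to confirm by compiling with `-S` and looking for a `jsr` to `setBMP` in the loop.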

Thank you very much!

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Fri Feb 22, 2013 7:03 pm

Nice work... and apparently that old gcc is going to give you fits. :D

Yes, there are still optimizations to be made before you get to assembly. Since you have setBMP() in the middle of the loop, try optimizing that and making it inline.

peekpoke
Interested
Posts: 37
Joined: Fri Feb 01, 2013 1:11 am

Post by peekpoke » Sat Feb 23, 2013 9:40 am

Thank you! But to be honest, it wouldn't have been possible without your help! Now I'll try to find some other ways to optimize it and to remove the rendering bugs.

And I'm looking forward to seeing your new direct color DMA CD demo :D

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Sat Feb 23, 2013 7:08 pm

I've almost got mine working... it renders the level, but has problems when you try moving. Have to fix a few bugs...

peekpoke
Interested
Posts: 37
Joined: Fri Feb 01, 2013 1:11 am

Post by peekpoke » Sat Feb 23, 2013 8:44 pm

Something with collision detection? I didn't try to enable/test it in LCDWolf, so I can't even suggest anything on this topic :(

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Sat Feb 23, 2013 10:11 pm

peekpoke wrote:Something with collision detection? I didn't try to enable/test it in LCDWolf, so I can't even suggest anything on this topic :(
No, there's no collision detection (still commented out) - it's a bug in the raycasting - you get columns when you shouldn't. I already fixed an issue with the interpolation and slice drawing - that was messed up. Here's the fix for that:

Code:

        castRay( px, py, pa, 0, &sliceA );

        for( x=4; x<=128; x+=4 )
        {
            castRay( px, py, pa, x, &sliceB );

            if( shouldInterpolate( &sliceA , &sliceB ) )
            {
                sliceX[1].textureId     = sliceA.textureId;
                sliceX[1].textureOffset = ( sliceA.textureOffset + sliceB.textureOffset ) >> 1 ;
                sliceX[1].sliceHeight   = ( sliceA.sliceHeight   + sliceB.sliceHeight   ) >> 1 ;
            }
            else castRay( px, py, pa, x - 2, &sliceX[1] );

            sliceX[0].textureId     =   sliceA.textureId;
            sliceX[0].textureOffset = ( sliceA.textureOffset + sliceX[1].textureOffset ) >> 1 ;
            sliceX[0].sliceHeight   = ( sliceA.sliceHeight   + sliceX[1].sliceHeight   ) >> 1 ;

            sliceX[2].textureId     =   sliceX[1].textureId;
            sliceX[2].textureOffset = ( sliceX[1].textureOffset + sliceB.textureOffset ) >> 1 ;
            sliceX[2].sliceHeight   = ( sliceX[1].sliceHeight   + sliceB.sliceHeight   ) >> 1 ;

            drawSlice( x - 4, &sliceA );
            drawSlice( x - 3, &sliceX[0] );
            drawSlice( x - 2, &sliceX[1] );
            drawSlice( x - 1, &sliceX[2] );

            sliceA = sliceB;
        }
Note that rendering goes from 4 to 128 inclusive, and draws at x-4 to x-1, which corresponds to 0 to 127. Before, column 0 was skipped, as were columns 125, 126, and 127. You can see that in the demo I posted elsewhere - one column on the left missing, and three on the right. Also, he did his ray cast for sliceX[1] in the wrong place: (x - 4) or exactly the same as sliceA. I changed it to (x - 2) and now you don't get bad edges between blocks (obvious in the starting position on the left).
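The column bookkeeping of the fixed loop can be checked on its own. A small sketch with the raycasting replaced by counters (`simulate` and `drawn` are hypothetical names): it confirms that x = 4..128 in steps of 4 draws each of columns 0-127 exactly once, and that the number of castRay calls is 33 when every slice pair is interpolated and 65 when none is.

```c
#include <stdbool.h>

#define SCREEN_COLS 128

/* Counts how often each column is drawn and returns how many rays are cast
   when shouldInterpolate() always returns `interpolate`. */
static int simulate(bool interpolate, int drawn[SCREEN_COLS])
{
    int casts = 1;                      /* castRay for sliceA at column 0 */
    for (int i = 0; i < SCREEN_COLS; i++) drawn[i] = 0;

    for (int x = 4; x <= SCREEN_COLS; x += 4) {
        casts++;                        /* castRay for sliceB at column x */
        if (!interpolate) casts++;      /* extra castRay at column x - 2 */
        for (int c = x - 4; c <= x - 1; c++)
            drawn[c]++;                 /* drawSlice at x-4 .. x-1 */
    }
    return casts;
}
```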

It still has a bug where columns appear when they shouldn't (or much closer than they should be). The posted code is certainly pretty buggy.

peekpoke
Interested
Posts: 37
Joined: Fri Feb 01, 2013 1:11 am

Post by peekpoke » Sun Feb 24, 2013 8:56 am

Thank you! I can confirm that after your fix it definitely looks much better! About those "ghost" columns: my first thought is that maybe it has something to do with a wrong distance calculation, somewhere here:

dd = sqrt( dx*dx + dy*dy ) * cos( offsetAngle );

Why I think so: when I just switch between the realtime sqrt calculation and the precalculated sqrt LUT, those "ghost" columns look different (there are many more of them if I use the LUT).

I'll try to look at this right now, but don't expect me to succeed, my skills are very low xD

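Btw, there is another way to calculate the distance:

dist = ABS(dx) / ABS(cos(rayAngle)) = ABS(dy) / ABS(sin(rayAngle))

where rayAngle is the ray's absolute angle (not offsetAngle); the fisheye-corrected distance is then dd = dist * cos(offsetAngle) (found it in the Ray-Casting Tutorial by Permadi, here)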
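A quick floating-point check of the two formulas, as a sketch rather than the fixed-point code: for a hit at distance d along a ray at absolute angle a, dx = d*cos(a) and dy = d*sin(a), so dx/cos(a) and dy/sin(a) both give d (the signs cancel). Dividing by whichever of |cos| or |sin| is larger avoids dividing by a value near zero, which is exactly what threatens at 90 degree steps. `ray_distance` and `corrected_distance` are hypothetical names; cos/sin are passed in so the block needs no math library.

```c
/* Length of the ray to a hit point offset (dx, dy) from the player,
   given the ray's cos/sin. Divides by the larger component. */
static double ray_distance(double dx, double dy, double cosA, double sinA)
{
    double ac = cosA < 0 ? -cosA : cosA;
    double as = sinA < 0 ? -sinA : sinA;
    return ac >= as ? dx / cosA : dy / sinA;   /* dx/cosA == dy/sinA == d */
}

/* Fisheye correction, as in dd = dist * cos(offsetAngle). */
static double corrected_distance(double dist, double cosOffset)
{
    return dist * cosOffset;
}
```

With a 3-4-5 triangle (cos = 0.6, sin = 0.8, dx = 6, dy = 8) both formulas give a distance of 10.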

Will try to play with it now.

btw2: also, if I'm not wrong, those ghost columns appear only at ~90 degree intervals? I suppose the fix is somewhere near...

The bug is definitely related to the precalculated sincos values at the quadrant edges, and it affects this part (i.e. it affects the x,y calculations: every time a "ghost" column appears, the calculated x, y and dd values look crazy and out of line with the x, y, dd values of nearby columns):

Code:

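        do
        {
            /* distance along x to the next vertical grid line
               (blockX << 5 assumes 32-unit blocks) */
            xNext_x = blockX << 5;
            if( cosine > 0 ) xNext_x += BLOCK_SIZE;
            xNext_x -= x;
            xNext_l = xNext_x * oneOverCosine;   /* ray length to reach that line */

            /* distance along y to the next horizontal grid line */
            yNext_y = blockY << 5;
            if( sine > 0 ) yNext_y += BLOCK_SIZE;
            yNext_y -= y;
            yNext_l = yNext_y * oneOverSine;     /* ray length to reach that line */

            if( xNext_l < yNext_l )
            {
                /* vertical grid line is closer: step one block along x */
                yAxisWall = 1;
                xNext_y = (xNext_l * sine) >> 11;
                blockX += cosine > 0 ? 1 : -1;
                x += xNext_x;
                y += xNext_y;
            }
            else
            {
                /* horizontal grid line is closer: step one block along y */
                yAxisWall = 0;
                yNext_x = (yNext_l * cosine) >> 11;
                blockY += sine > 0 ? 1 : -1;
                x += yNext_x;
                y += yNext_y;
            }

            result->textureId = level[ (u16)( blockX + MAP_WIDTH * blockY ) ];
        }
        while( result->textureId == 0 );    /* 0 = empty cell, keep stepping */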
At first, I thought it had something to do with division by zero (i.e. oneOverSine and oneOverCosine), but after a quick test (I just replaced the 0s in the sincos tables), it seems not.

btw3 (totally useless): I also did a port of LCDWolf to PC/Allegro. I checked it just now, and it doesn't have any "ghost" walls like the Gen/SCD version does, but there I didn't replace the floats and the realtime sincos calculations. So maybe you're right again (as always): it's some fixed-point overflow (in the fixed-point version).

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Sun Feb 24, 2013 6:49 pm

Yes, I noticed the ghost columns are at 90 degree intervals, too. It must be related to the sin/cos and fixed point representation. I'm thinking it's related to the 1/cosine and 1/sine terms...

Other than that ghost column issue (I like that term), it works well. I like the interpolated columns - you only raycast 33 to 65 columns, then interpolate all the rest so depending on the level and how close you are to a wall, you'll only raycast 33 to 40-some columns.

EDIT: I tried using a table for 1/sine and 1/cosine, set up the way the original float code worked:

Code:

        for( i=0; i<403; i++ )
        {
            float cosine = cos( (float)i / 64.0 );
            float sine   = sin( (float)i / 64.0 );

            if( sine   <  0.0001f && sine   >= 0.0f ) sine   =  0.0001f;
            if( sine   > -0.0001f && sine   <  0.0f ) sine   = -0.0001f;
            if( cosine <  0.0001f && cosine >= 0.0f ) cosine =  0.0001f;
            if( cosine > -0.0001f && cosine <  0.0f ) cosine = -0.0001f;

            float oneOverCosine = 1.0f / cosine;
            float oneOverSine   = 1.0f / sine;

            sintab[i] = (fixed_t)(sine * 65536.0);
            costab[i] = (fixed_t)(cosine * 65536.0);
            oostab[i] = (fixed_t)(oneOverSine * 65536.0);
            ooctab[i] = (fixed_t)(oneOverCosine * 65536.0);
        }
Makes no difference, so it's not the sine/cosine and inverse.

EDIT 2: Found it! The distance is overflowing 16.16 numbers. I changed the line like this

Code:

//      fixed_t dd = FIX_MUL( FIX_SQRT( FIX_MUL( dx, dx ) + FIX_MUL( dy, dy ) ), FIX_COS( offsetAngle ) );
        fixed_t dd = FIX_MUL( I2F( iSqrt(( F2I(dx) * F2I(dx) ) + ( F2I(dy) * F2I(dy) ) ) ), FIX_COS( offsetAngle ) );
and the ghost columns go away. Of course NOW it crashes if the wall gets too far away, so now something else is overflowing.
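Why 16.16 breaks here can be worked out with plain integer arithmetic. A signed 16.16 value has an integer part of at most 32767, so `FIX_MUL(dx, dx) + FIX_MUL(dy, dy)` overflows once the squared distance passes 32767, i.e. beyond about sqrt(32768), roughly 181 units. If blocks are 32 units wide (as the `<< 5` in the raycaster suggests), that is under 6 blocks away. A sketch using 64-bit intermediates to find that point; `TO_FIX`, `fix_mul_wide` and `squared_fits_16_16` are hypothetical stand-ins, not the demo's macros.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int32_t fixed_t;                       /* signed 16.16 */
#define TO_FIX(i) ((fixed_t)(i) * 65536)

/* 16.16 multiply with a 64-bit intermediate, so the result itself can be checked. */
static int64_t fix_mul_wide(fixed_t a, fixed_t b)
{
    return ((int64_t)a * b) >> 16;
}

/* True if dx*dx + dy*dy still fits in a signed 16.16 value. */
static bool squared_fits_16_16(fixed_t dx, fixed_t dy)
{
    int64_t sum = fix_mul_wide(dx, dx) + fix_mul_wide(dy, dy);
    return sum <= INT32_MAX;
}
```

181 * 181 = 32761 still fits, 182 * 182 = 33124 does not, and a diagonal of (128, 128) already gives 32768.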

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Sun Feb 24, 2013 11:26 pm

Okay, I cut the 1/sin 1/cos to 500 max, made the distance calc 24.8, and it seems to work fine now. This is a good example of what you can go through when trying to port something from the PC to an old integer system - it's not quite as simple as changing floats to fixed point numbers...
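Rough headroom numbers for these two changes, as a sketch: a signed 24.8 value keeps only 8 fractional bits (steps of 1/256) but allows integer parts up to 8388607, so if the squared distance is also computed in 24.8 it stays representable out to about sqrt(8388608), roughly 2896 units, versus about 181 in 16.16. The clamp below assumes the 1/sin and 1/cos entries are 16.16 values; the post doesn't say how the 500 limit was applied, and `clamp_recip`/`fix16_to_fix8` are hypothetical helpers.

```c
#include <stdint.h>

#define FIX16_ONE   65536           /* 1.0 in 16.16 */
#define RECIP_LIMIT 500             /* cap for |1/sin| and |1/cos| */

/* Clamp a 16.16 reciprocal to +/-500 so later multiplies stay in range. */
static int32_t clamp_recip(int32_t r)
{
    const int32_t lim = RECIP_LIMIT * FIX16_ONE;
    if (r >  lim) return  lim;
    if (r < -lim) return -lim;
    return r;
}

/* Convert 16.16 to 24.8 by dropping 8 fractional bits
   (arithmetic shift on GCC, so negative values round toward -infinity). */
static int32_t fix16_to_fix8(int32_t v)
{
    return v >> 8;
}
```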

Anywho, the latest demo archive is in the SCD forum in my thread for DCD mode demos.

peekpoke
Interested
Posts: 37
Joined: Fri Feb 01, 2013 1:11 am

Post by peekpoke » Mon Feb 25, 2013 8:21 am

Perfect! Thank you very much! To be honest, I was sure you would find this bug. As for me, I tried many silly things yesterday, but nothing worked. At one point I replaced iSqrt() with another implementation, and at first glance it looked like the bug had disappeared; I even updated my post. But then I noticed that only the slice height became OK, while the textures for those columns were still taken from the wrong position xD

Now it's possible to create nice homebrew raycast FPS games for Genny/SCD - it's so cool! Thank you!

And it's very interesting to try to find more ways to optimise this raycaster.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Mon Feb 25, 2013 8:30 am

peekpoke wrote:Perfect! Thank you very much! To be honest, I was sure you would find this bug. As for me, I tried many silly things yesterday, but nothing worked. At one point I replaced iSqrt() with another implementation, and at first glance it looked like the bug had disappeared; I even updated my post. But then I noticed that only the slice height became OK, while the textures for those columns were still taken from the wrong position xD

Now it's possible to create nice homebrew raycast FPS games for Genny/SCD - it's so cool! Thank you!

And it's very interesting to try to find more ways to optimise this raycaster.
There's plenty of room for improvement - I only use assembly for the fixed point multiply, and only some of the C has been redone for better performance, mainly the drawSlice() routine.

You might have noticed in the other thread that I added collision detection with the walls. I replaced his code with something better that works. :D Since you can no longer walk through walls, I also made sure you could reach every area in the map.

I was showing my brother the latest demo, and he was impressed by the speed it was getting. Not bad for a measly 12.5 MHz 68000. 8)

Anywho, if you can work the changes I made into your MD demo, I'd love to see how that works. It shouldn't be too much slower with a 7.6 MHz 68000. You won't be doubling the pixels, but you do need to worry about cell order and transferring data to the vram.
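On the MD the VDP displays 8x8 cells at 4 bits per pixel: each cell is 32 bytes, 4 bytes per pixel row, with the left pixel of each pair in the high nibble. A linear bitmap therefore has to be rearranged into cell order before (or while) it goes to VRAM. A sketch of the addressing, assuming the bitmap's cells are stored left to right, top to bottom, `width_cells` per row; the real order depends on how the plane's name table is set up, and `cell_offset`/`put_pixel_4bpp` are hypothetical names.

```c
#include <stdint.h>

/* Byte offset of pixel (x, y) in a 4bpp buffer stored as 8x8 cells,
   cells laid out left to right, top to bottom, width_cells per row. */
static uint32_t cell_offset(uint16_t x, uint16_t y, uint16_t width_cells)
{
    uint32_t cell = (uint32_t)(y >> 3) * width_cells + (x >> 3);
    return cell * 32u               /* 32 bytes per cell */
         + (uint32_t)(y & 7) * 4u   /* 4 bytes per pixel row in a cell */
         + ((x & 7) >> 1);          /* 2 pixels per byte */
}

/* Write a 4-bit color: even x goes to the high nibble, odd x to the low one. */
static void put_pixel_4bpp(uint8_t *buf, uint16_t x, uint16_t y,
                           uint16_t width_cells, uint8_t color)
{
    uint8_t *p = &buf[cell_offset(x, y, width_cells)];
    if (x & 1)
        *p = (uint8_t)((*p & 0xF0) | (color & 0x0F));
    else
        *p = (uint8_t)((*p & 0x0F) | ((color & 0x0F) << 4));
}
```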

peekpoke
Interested
Posts: 37
Joined: Fri Feb 01, 2013 1:11 am

Post by peekpoke » Mon Feb 25, 2013 8:54 am

I'm trying to implement all your changes in my MD demo right now :) As for the gfx output, it seems that Stef's bitmap routines are really awesome, because just replacing my own setBMP() with Stef's setPixel() gives a noticeable boost. So my first priority now is to clean up the mess in my code and implement all your changes.

I don't know which is better: continuing with 3.4.6/SGDK, or moving to 4.6.2 with recompiled SGDK libs. Either way, I'll try to do it as fast as possible (I really hope to finish it this evening) and post the result here!

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Post by Stef » Mon Feb 25, 2013 9:07 am

If it helps, my "work in progress" version of SGDK has a faster bitmap mode (but with some restrictions, such as a fixed size and a forced extended vblank).
It currently provides a 256x160 pixel bitmap at 20 FPS on NTSC and 25 FPS on PAL :)
The active period is only 160 lines and can be used for bitmap rendering, while the blank period is used to transfer the bitmap to VRAM (with tile conversion).
In your case I don't think you reach the 20 FPS limit, so it could really suit your needs :)
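Some back-of-the-envelope numbers for that mode, assuming 4 bits per pixel as in the VDP's normal cells and nominal 60 Hz NTSC / 50 Hz PAL refresh: a 256x160 bitmap is 20480 bytes (640 cells), and 20 FPS on NTSC or 25 FPS on PAL means one new bitmap every 3 or 2 display frames. The even per-frame split is purely illustrative; the post does not say how the transfer is actually scheduled, and all names here are hypothetical.

```c
enum {
    BMP_W     = 256,
    BMP_H     = 160,
    BMP_BYTES = BMP_W * BMP_H / 2,   /* 4bpp: 20480 bytes */
    BMP_CELLS = BMP_BYTES / 32       /* 32 bytes per 8x8 cell: 640 */
};

/* Display frames per rendered bitmap, e.g. 60 Hz / 20 FPS = 3. */
static int frames_per_bitmap(int refresh_hz, int fps)
{
    return refresh_hz / fps;
}

/* Bytes per display frame if the transfer were split evenly (rounded up). */
static int bytes_per_frame(int refresh_hz, int fps)
{
    int frames = frames_per_bitmap(refresh_hz, fps);
    return (BMP_BYTES + frames - 1) / frames;
}
```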

peekpoke
Interested
Posts: 37
Joined: Fri Feb 01, 2013 1:11 am

Post by peekpoke » Mon Feb 25, 2013 1:46 pm

OK, got something worth showing. No collision detection implemented yet. No more ghost walls. Many, many fixes by Chilly Willy. Only integer math, only hardcore xD (no float, no fixed point used at all). Built with gcc 4.6.2 and recompiled SGDK (GenesisDev04) libs. FPS ranges from a minimum of 5 (close to a wall) to a maximum of 10.

raytest6fs.bin

update
Collision detection by Chilly Willy - implemented! (copy-pasted, to be more precise :P :lol: )

raytest7fs.bin



Chilly Willy, you may laugh, but I tried hard to somehow implement your ghost wall fix in my code. I even wanted to convert all my math to fixed point and copy-paste most of your code, but I thought it made no sense, because it would become the same as your Sega CD demo. So I tried, but nothing worked, and I almost gave up :lol: Then, as a last try, I just changed this:

Code:

        s16 xNext_x, xNext_y, xNext_l;
        s16 yNext_x, yNext_y, yNext_l;
to this:

Code:

        u16 xNext_x, xNext_y, xNext_l;
        u16 yNext_x, yNext_y, yNext_l;
And the ghost walls disappeared!! :shock:
I'll send you the full source code by PM.
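One plausible explanation (not confirmed in the thread): the ray lengths `xNext_l`/`yNext_l` are never negative, but as `s16` anything above 32767 wraps to a negative number, so `xNext_l < yNext_l` can pick the far grid line and a "ghost" wall appears. As `u16` the same bits compare correctly up to 65535, while additions such as `x += xNext_x` produce the same low 16 bits either way. A sketch with standard fixed-width types and hypothetical function names; converting an out-of-range value to `int16_t` is implementation-defined in C, and GCC wraps it modulo 2^16.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compare two ray lengths after truncating them to signed 16 bits. */
static bool x_line_closer_s16(int32_t xLen, int32_t yLen)
{
    int16_t xl = (int16_t)xLen, yl = (int16_t)yLen;    /* wraps above 32767 on GCC */
    return xl < yl;
}

/* Same comparison with unsigned 16-bit lengths. */
static bool x_line_closer_u16(int32_t xLen, int32_t yLen)
{
    uint16_t xl = (uint16_t)xLen, yl = (uint16_t)yLen; /* exact up to 65535 */
    return xl < yl;
}
```

For example, a length of 40000 becomes -25536 as `s16` and wrongly compares as shorter than 1000.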

Stef, thank you very much! I'd be very glad to test your WIP version. Is it publicly available?

So, what I'll try to do next: implement Chilly Willy's collision detection.

So, what I think could be done next: I removed all divisions from the code, but there are still a few multiplications, so at least those could be optimised somehow (maybe with inline asm or something, like Chilly Willy used in his version).
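The 68000 has a 16x16 to 32-bit signed multiply instruction (`muls.w`) but no 32x32 multiply, so keeping both operands 16-bit matters more than the exact code style. A hedged sketch: widening two 16-bit operands right before the multiply is the pattern GCC for m68k usually turns into a single `muls.w` instead of a library call; check the generated asm (e.g. with `-S`) to be sure. `mul16`, `mul_frac11`, and the 11-bit fraction (matching the `>> 11` in the raycaster) are assumptions.

```c
#include <stdint.h>

/* 16 x 16 -> 32 signed multiply; on m68k GCC this pattern usually maps to muls.w. */
static inline int32_t mul16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Multiply a length by a sine/cosine with 11 fractional bits,
   as in (xNext_l * sine) >> 11 in the raycaster. */
static inline int16_t mul_frac11(int16_t len, int16_t trig)
{
    return (int16_t)(mul16(len, trig) >> 11);
}
```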

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Post by Stef » Mon Feb 25, 2013 6:06 pm

Unfortunately it is not available yet, and I have some serious internet connection troubles, so I won't be able to release it soon.
But I think it would be worth giving it a shot when it's released, just to see if it helps :)
