Page 1 of 2

Game of Life demo

Posted: Tue Dec 24, 2013 1:19 am
by ehaliewicz
So, I made a game of life simulation with GCC and the SGDK.
(thanks stef!)

Nothing too fancy, but it works on real hardware :D

Image


https://dl.dropboxusercontent.com/u/358 ... yte-v1.bin

Controls
- Start - toggle simulation
- D-pad - when paused, move cursor
- A - when paused, toggle a cell
- B - when paused, randomize grid
- C - when paused, clear grid

I have a few ideas for improvement and optimization that I will hopefully be able to roll out into a new version pretty soon.

Posted: Tue Dec 24, 2013 2:03 pm
by Christuserloeser
always nice to see new software for MD! Looking forward to the update :D

Posted: Tue Dec 24, 2013 2:14 pm
by TmEE co.(TM)
Very cool !

Update - game of life v2

Posted: Fri Dec 27, 2013 12:15 am
by ehaliewicz
Update has arrived :)

https://dl.dropboxusercontent.com/u/358 ... yte-v2.bin

I've improved the controls, added a grid, and a generation counter.
The resolution is also quadrupled, so it's quite a bit slower.

I still have an different algorithm to try, which I hope will make it run fullspeed. It's tricky to implement though, so we'll see when it's done.

Controls

A: When paused, toggle cell, when simulating, toggle grid display.
B: When paused, randomize grid, when simulating, toggle generation counter.
C: Clears grid when paused
D-Pad: Moves cursor when paused

Glider gun
Image

Random cells
Image

Edit: This has not yet been tested on hardware, so let me know if it doesn't work properly.

EDIT 2: Made a performance improvement by skipping multiple empty cells at once rather than single cells (by fetching 4 1-byte cells at once in the main loop)
Also, A and B are switched while simulating.
https://dl.dropboxusercontent.com/u/358 ... e-v2.2.bin

Posted: Fri Dec 27, 2013 7:50 pm
by r57shell
too slow :(. where is vsync? :(
is there source available?

Posted: Fri Dec 27, 2013 7:55 pm
by ehaliewicz
r57shell wrote:too slow :(. where is vsync? :(
is there source available?
Did you try the updated version? It's a bit faster.

As for vsync, when the screen is really hectic updating takes longer than a VBLANK (at least, I think that's the issue) so you'll always have tearing.

I can't really do much about the speed without some fairly hairy optimization which I'm currently working on.


I'll post the source after cleaning it up a bit.

Posted: Fri Dec 27, 2013 8:46 pm
by r57shell
You can try prebuild table of tranlations, calculation of new state will be like:
C[x][y] = T[a][c];
where x, y - cell position. and a,b,c bitmasks of 3 rows.
or somehow in another way.
so, you don't need then make loop for 9 cells, and make calculations.
I don't know now how to make it in best way...
But, prebuild is definitely better, than mere calculations.

Posted: Fri Dec 27, 2013 10:28 pm
by ehaliewicz
r57shell wrote:You can try prebuild table of tranlations, calculation of new state will be like:
C[x][y] = T[a][c];
where x, y - cell position. and a,b,c bitmasks of 3 rows.
or somehow in another way.
so, you don't need then make loop for 9 cells, and make calculations.
I don't know now how to make it in best way...
But, prebuild is definitely better, than mere calculations.


A simple lookup table doesn't speed it up because the bottleneck is the fact that I do two passes over the arrays for each generation. One for a memcopy, the second for processing.

It skips over empty space quickly and keeps each cell's alive neighbor count in the same byte as the cell's state so I don't need to do more than one memory read to check each 4 cells.
I only write to memory when a cell changes, in which case I update the alive neighbor counts for all neighboring cells.

The only way I can think of speeding up this algorithm up significantly is by interleaving the memory copy with the cell processing, but it's tricky to get right, so instead I'm changing it to keep a list of all changed cells to process. That way I can avoid copying and only process cells that need to be processed.

Posted: Fri Dec 27, 2013 10:47 pm
by r57shell
Do you use lookup table to draw?
And, you don't need to do copy.
You can make something like this

Code: Select all

for (int x=1; x < width+1; ++x)
    CountsNew[x][0]=0;
for (int y=1; y < height+1; ++y)
    CountsNew[0][y]=0;
for (int y=1; y < height+1; ++y)
    for (int x=1; x < width+1; ++x)
    {
        CountsNew[x+1][y+1]=0;
        if(there_will_be_cell(x,y))
            add_count_around(x,y);
    }
and then swap Counts, and CountsNew pointers (different addreses)

Posted: Fri Dec 27, 2013 11:00 pm
by ehaliewicz
r57shell wrote:Do you use lookup table to draw?
And, you don't need to do copy.
You can make something like this

Code: Select all

for (int x=1; x < width+1; ++x)
    CountsNew[x][0]=0;
for (int y=1; y < height+1; ++y)
    CountsNew[0][y]=0;
for (int y=1; y < height+1; ++y)
    for (int x=1; x < width+1; ++x)
    {
        CountsNew[x+1][y+1]=0;
        if(there_will_be_cell(x,y))
            add_count_around(x,y);
    }
and then swap Counts, and CountsNew pointers (different addreses)
I need to copy the array because I don't reset the counts to 0. I don't fully recalculate the counts each pass, I increment and decrement them as cells turn on and off.

I do use a lookup table for drawing 4 cells per each 8x8 tile.

Posted: Fri Dec 27, 2013 11:05 pm
by r57shell
ok then only lookup table will help.
hmm! you can inc/dec by adding long.
*(long *)&Counts[x][y-1] += 0x01010100
*(long *)&Counts[x][y-1] += 0xFFFFFF00

Posted: Fri Dec 27, 2013 11:07 pm
by ehaliewicz
r57shell wrote:ok then only lookup table will help.
Haha, well, my solution already has most of the benefit of the lookup table anyway.

A lookup table requires a bunch of reads (and shifts/ands to create the bitmap) to check cells, I only require one read (and a couple of compares) to check a cell, then a bunch of writes if it needs to change.

So it's a tradeoff either way.

r57shell wrote:*(long *)&Counts[x-1][y] += 0x01010100
*(long *)&Counts[x-1][y] += 0xFEFEFE00
That's great :D thanks

Posted: Fri Dec 27, 2013 11:14 pm
by r57shell
Adding long can be not faster though. I don't really know.
Updated previous post, and I'm still thinking that bits->bits lookup is possible.
Do you use DMA to write name table into VRAM? I don't know which way is faster.

Posted: Fri Dec 27, 2013 11:23 pm
by ehaliewicz
r57shell wrote:Adding long can be not faster though. I don't really know.
Updated previous post, and I'm still thinking that bits->bits lookup is possible.
Do you use DMA to write name table into VRAM? I don't know which way is faster.
I don't do any DMA except for perhaps when loading tiles on startup..
I don't think it makes much of a difference because I'm only drawing one tile at a time anyway.

Thanks for the ideas though, I'm still fiddling around with it and I'll post again once it's faster :)

Posted: Sat Jan 04, 2014 11:42 pm
by ehaliewicz
Ok, so I've figured out what the bottleneck was.

It's not the memcopy, or even the cell processing, but the drawing.

I restructured the program to use DMA to draw the whole screen at once, so that's very quick now.

I have to speed up the algorithm (or use a different one) if I want it to be faster now.


Edit:
https://dl.dropboxusercontent.com/u/358 ... e-v2.4.bin

No more tearing!