GLide 32x
Moderator: BigEvilCorporation
GLide 32x
Although I'm having real difficulty getting my code to run on the 32X (debugger, anyone?), I have a silly idea in my head: a rendering library for the 32X. Let's call it GLide.
Link: http://www.alasir.com/software/glide/index.html
The 68k would put data on the CommPort, and each SH2 would poll it for available data, then render one primitive. So it's not exactly SLI, which I've already read about here, but I think this way would be faster.
Except for synchronization. If both SH2s are already busy and the 68k has several primitives to hand out, it has to wait for one SH2 to finish and empty the CommPort before it can post another primitive.
Moreover, if one SH2 is defining a vertex that the other SH2 needs (for a triangle, for example), the second SH2 has to wait for the first to finish its job.
By the way, how should an SH2 poll the CommPort? Here's a way to optimize the access:
- tell everyone (68k and the other SH2) I am reading the CommPort
- read cache (long) CommPort(0)
- read cache (long) CommPort(4)
...
- read cache (long) CommPort(12)
- tell everyone (68k and the other SH2) the CommPort is available
- run my job
- invalidate CommPort cache
- poll CommPort
Reading the CommPort would be even faster if it lived in SDRAM: the hardware manual states that SDRAM is optimized for cache-line reads, so each read fetches 16 bytes. It would take only one read to fill the cache line! Alas, the CommPort is not in SDRAM.
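The handoff in the list above can be sketched in C. This is only a host-side model under loud assumptions: the 16-byte CommPort is a plain `volatile` array of four longs, word 0 doubles as a "something is posted" command word, and the names `comm`, `comm_post` and `comm_take` are made up for illustration. Real 32X code would point at the actual COMM registers and deal with caching, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 16-byte CommPort: four 32-bit words.
   On real hardware this would be a pointer to the COMM registers. */
static volatile uint32_t comm[4];

#define COMM_EMPTY 0u   /* assumption: command word 0 means "nothing posted" */

/* Producer side (the 68k in the post): post one primitive if the port is free.
   Returns 1 on success, 0 if the previous primitive has not been taken yet. */
int comm_post(uint32_t cmd, uint32_t a, uint32_t b, uint32_t c)
{
    assert(cmd != COMM_EMPTY);      /* 0 is reserved for "empty" */
    if (comm[0] != COMM_EMPTY)
        return 0;
    comm[1] = a;
    comm[2] = b;
    comm[3] = c;
    comm[0] = cmd;                  /* written last: publishes the payload */
    return 1;
}

/* Consumer side (one SH2): copy the four words out, then free the port
   so the producer can post again while this CPU runs its job. */
int comm_take(uint32_t out[4])
{
    uint32_t cmd = comm[0];
    if (cmd == COMM_EMPTY)
        return 0;
    out[0] = cmd;
    out[1] = comm[1];
    out[2] = comm[2];
    out[3] = comm[3];
    comm[0] = COMM_EMPTY;           /* "the CommPort is available" */
    return 1;
}
```

With two SH2s polling at once, the check-then-clear in `comm_take` is a race: both could grab the same primitive. The "tell everyone I am reading" step from the list would need some kind of lock, which this sketch leaves out.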
Wait n' see.
I'm working hard on rasterization.
I've found this site
http://www.devmaster.net/articles/softw ... /part3.php
It is quite handy, but the subject is not as easy as I had guessed. For example, I haven't decided yet whether to go for SLI or to have each CPU draw its own primitive.
Or sorting! I have to sort the primitives from far to near.
And what about optimizing? Hidden surface removal, anyone?
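Sorting from far to near and drawing in that order is usually called the painter's algorithm. A minimal sketch, assuming each primitive carries a single representative depth where a larger `z` means farther away; the `Prim` struct and the function names are hypothetical, not part of any 32X library:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical primitive record: only the depth key matters here. */
typedef struct {
    int id;     /* which primitive this is */
    int z;      /* representative depth, larger = farther (assumption) */
} Prim;

/* qsort comparator: farthest first, so nearer primitives overdraw farther ones. */
static int prim_far_to_near(const void *pa, const void *pb)
{
    const Prim *a = (const Prim *)pa;
    const Prim *b = (const Prim *)pb;
    return (a->z < b->z) - (a->z > b->z);
}

/* Reorder prims[] in place so that drawing index 0 first gives
   back-to-front rendering. */
void sort_far_to_near(Prim *prims, size_t count)
{
    assert(prims != NULL || count == 0);
    qsort(prims, count, sizeof prims[0], prim_far_to_near);
}
```

One depth per primitive is a simplification: overlapping or intersecting polygons can still come out in the wrong order.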
But I've got some answers.
The FIFO is not the way to go. I'd use the CommPort.
I can use either Packed Color or Direct Color.
I've learned the 32X is said to draw 50k polys/s. That's huge, and at the same time very few: about 2k polys per frame at 25 fps, but only about 1k at 50 fps or 830 at 60 fps.
I haven't written a single line yet, not even an
add #1,r1
but I'm learning, and learning, and learning!!! I'm just not sure my brain will handle everything! :D
My final aim, for now (and if you know me, you know I can change my mind faster than I can say it), is to make something that looks like Dragon Quest VIII.
No play, no gain.
"I can use either Packed Color or Direct Color. "
I think packed color can be super fast. Games like Stellar Assault seem to use it, and it looks great (a bit grainy, but perfect for a space shooter).
Direct color can be super cool too... you could restrict the screen to a 16:9 area to be sure of the framebuffer space and drawing speed.
As for Dragon Quest, will you use scaled 2D for objects/characters?
One last question: will your 32X engine work autonomously? I mean, would the 68k handle the gameplay and the 32X all the silly stuff?
Mad guy
320x200... Well, I've always wondered whether a power-of-two framebuffer width (256 or 512 pixels in the 32X's case) would speed up the CPUs' vertical addressing a lot...
You can always use the Genesis display to put a border around the 32X display.
In that case, a 256*224 pixel buffer would be great.
Or it could be a 512*127 pixel buffer (half the vertical size)...
Well, it's probably just me who is nuts about powers of two.
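The reason a 256- or 512-pixel row can help is that the row length is then a power of two, so finding row y becomes a shift instead of a multiply. A minimal sketch for an 8-bit (packed color) buffer; the 256-byte pitch and the function names are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>

#define PITCH_LOG2 8                    /* 256-byte rows: assumption of this sketch */
#define PITCH      (1u << PITCH_LOG2)

/* Byte offset of pixel (x, y) when the row length is a power of two:
   the multiply by the pitch turns into a shift. */
static inline size_t offset_pow2(unsigned x, unsigned y)
{
    assert(x < PITCH);
    return ((size_t)y << PITCH_LOG2) | x;
}

/* Same computation for an arbitrary row length such as 320: needs a multiply. */
static inline size_t offset_generic(unsigned x, unsigned y, unsigned pitch)
{
    assert(x < pitch);
    return (size_t)y * pitch + x;
}
```

Stepping down one row at a time only costs an add in either layout, so the gain is for random access to a given (x, y). For 320 specifically, y*320 can also be written as (y << 8) + (y << 6).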
About the cel-shaded characters... wow, WOW. 2D sprites are still a solution if you can't reach a decent framerate.
Rasterization
Here's my approach to rasterization. Take a triangle primitive. It's made of 3 points, or vertices: A, B and C. These 3 points define 3 edges: [AB], [BC] and [CA]. First of all, let's compute the slope of these edges. But this slope is the inverse of what I learned in high school.
I learned y = ax + b, with a and b constant, and "a" being (yM-yN)/(xM-xN) for any two points M and N on the line.
Here, on a computer, what I can do very easily is draw a horizontal line. So, for a given "y", I need the starting "x" and the ending "x". Thus my slope is (xM-xN)/(yM-yN).
Step 1
For the [AB] segment, the slope is (xB-xA)/(yB-yA). Then, for every "y" between yA and yB, "x" is equal to xA + slope*(y-yA). I get a set of points.
Do the same for [BC] and [CA] (Steps 1-b and 1-c).
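Step 1 can be sketched as follows. This is one possible way to do it, not the only one: it assumes 16.16 fixed point so that no floating point is needed, non-negative screen coordinates, and a caller that skips horizontal edges (yA == yB), where the inverse slope is undefined. The name `edge_walk` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FP_SHIFT 16     /* 16.16 fixed point: an assumption of this sketch */

/* Fill xs[0 .. y1-y0] with the x of edge (x0,y0)-(x1,y1) on every
   scanline from y0 to y1 inclusive. Requires y0 < y1 and x >= 0. */
void edge_walk(int x0, int y0, int x1, int y1, int *xs)
{
    assert(y0 < y1);
    const int32_t one = (int32_t)1 << FP_SHIFT;
    /* inverse slope dx/dy, computed once per edge */
    int32_t step = (int32_t)(((int64_t)(x1 - x0) * one) / (y1 - y0));
    /* start at x0 + 0.5 so that dropping the fraction rounds to nearest */
    int32_t x = x0 * one + one / 2;
    for (int y = y0; y <= y1; ++y) {
        xs[y - y0] = (int)(x >> FP_SHIFT);
        x += step;      /* x = xA + slope * (y - yA), done incrementally */
    }
}
```

Adding the slope once per row replaces the multiply in xA + slope*(y-yA); the only division happens once per edge.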
Step 2
Then, for each segment, I take the corresponding set of points.
For every point M on this segment, I look for a point on the 2 other segments that has the same "y" and an abscissa greater than xM.
If such a point exists, I draw a line between these 2 points.
And so on with the 2 other segments (Steps 2-b and 2-c).
That's a lot of work.
Does anybody have a better way to rasterize, or a book/site to recommend?
Nevertheless, there are ways of optimizing. Drawing a horizontal line, for example, can be done in hardware with the 32X VDP FILL. But then you've got to synchronize, or implement a stack. The most notable optimization would be to use the 2 SH2s in an SLI way, the master computing even lines and the slave computing odd ones. But drawing lines (the whole step 2 thing) isn't that computationally intensive. Then again... Moreover, the point sets defined in step 1 would fit really nicely in private RAM, so one primitive per CPU would also be nice.
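The even/odd split between master and slave can be sketched like this. It is a plain software model: the 320x224 8-bit buffer, the `Span` record and the function name are assumptions for illustration, and it uses `memset` rather than the VDP fill hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FB_WIDTH = 320, FB_HEIGHT = 224 };   /* assumed packed-color buffer */

/* One horizontal span: fill [x_left, x_right] on row y with a color. */
typedef struct { int y, x_left, x_right; uint8_t color; } Span;

/* Draw only the spans whose row parity matches this CPU.
   cpu_id 0 = "master" (even rows), 1 = "slave" (odd rows).
   Both CPUs walk the same list and never write the same row. */
void draw_spans_interleaved(uint8_t *fb, const Span *spans, int count, int cpu_id)
{
    assert(cpu_id == 0 || cpu_id == 1);
    for (int i = 0; i < count; ++i) {
        const Span *s = &spans[i];
        if ((s->y & 1) != cpu_id)
            continue;                        /* the other CPU's row */
        if (s->y < 0 || s->y >= FB_HEIGHT || s->x_left > s->x_right)
            continue;                        /* off screen or empty */
        int xl = s->x_left < 0 ? 0 : s->x_left;
        int xr = s->x_right >= FB_WIDTH ? FB_WIDTH - 1 : s->x_right;
        if (xl <= xr)
            memset(fb + s->y * FB_WIDTH + xl, s->color, (size_t)(xr - xl + 1));
    }
}
```

In this model each CPU still walks the whole span list, so the split only pays off if filling pixels costs more than skipping the other CPU's rows.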
Lots of ways of thinking, lots of ways of writing...
Re: Rasterization
Imo, the best way to rasterize a triangle: rasterize horizontal lines between [AB] and [AC], then between [BC] and [AC], where:
- A is the topmost point (in Y)
- B is the middle point (in Y)
- C is the bottommost point (in Y)
Use a Bresenham-type line algorithm to determine your [AB], [AC] and [BC] segments.
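A minimal sketch of that scheme: sort the vertices so A is on top and C at the bottom, then fill spans between [AB] and [AC], then between [BC] and [AC]. To stay short it interpolates each edge with one integer division per row instead of Bresenham stepping, so it shows the structure rather than the fast version. The 320x224 8-bit buffer and all names are assumptions, and this is not the code from the dev kit mentioned in this post.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FB_W = 320, FB_H = 224 };        /* assumed 8-bit framebuffer */

typedef struct { int x, y; } Pt;

static void swap_pt(Pt *p, Pt *q) { Pt t = *p; *p = *q; *q = t; }

/* x on edge p-q at scanline y (needs p.y != q.y), integer interpolation. */
static int edge_x(Pt p, Pt q, int y)
{
    assert(p.y != q.y);
    return p.x + (q.x - p.x) * (y - p.y) / (q.y - p.y);
}

/* Fill [xl, xr] on row y, clipped to the buffer. */
static void hline(uint8_t *fb, int y, int xl, int xr, uint8_t color)
{
    if (xl > xr) { int t = xl; xl = xr; xr = t; }
    if (y < 0 || y >= FB_H || xr < 0 || xl >= FB_W) return;
    if (xl < 0) xl = 0;
    if (xr >= FB_W) xr = FB_W - 1;
    memset(fb + y * FB_W + xl, color, (size_t)(xr - xl + 1));
}

void fill_triangle(uint8_t *fb, Pt a, Pt b, Pt c, uint8_t color)
{
    /* Sort so that A is topmost, B middle, C bottommost (y grows downward). */
    if (b.y < a.y) swap_pt(&a, &b);
    if (c.y < a.y) swap_pt(&a, &c);
    if (c.y < b.y) swap_pt(&b, &c);
    if (a.y == c.y) return;             /* zero height: skipped in this sketch */

    /* Upper part: spans between [AB] and [AC]. */
    for (int y = a.y; y < b.y; ++y)
        hline(fb, y, edge_x(a, b, y), edge_x(a, c, y), color);

    /* Lower part: spans between [BC] and [AC]. */
    for (int y = b.y; y <= c.y; ++y)
        hline(fb, y, b.y == c.y ? b.x : edge_x(b, c, y), edge_x(a, c, y), color);
}
```

Because [AC] spans the full height, only the short edge changes at B, and no search for matching points is needed.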
I wrote some polygon drawing routines in C for my mini dev kit; check the vdp_bmpx.c and vdp_bmpw.h files in the lib directory, maybe they can help you:
http://www.spritesmind.net/_GenDev/foru ... 14&start=0
Last edited by Stef on Tue Feb 13, 2007 10:43 pm, edited 1 time in total.
Re: Rasterization
Stef wrote: Use bresenham line algo
Great resource!!! Thank you!!!
http://en.wikipedia.org/wiki/Bresenham's_line_algorithm
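For reference, here is a compact integer-only form of Bresenham's line algorithm that handles all directions. It is a sketch written for this thread: the function name and the choice to return the pixels in an array instead of plotting them are assumptions made so it can be tried on a PC.

```c
#include <assert.h>
#include <stdlib.h>

/* Writes the pixels of the line (x0,y0)-(x1,y1) into out[] as {x, y}
   pairs and returns how many were written (at most max_points). */
int bresenham_line(int x0, int y0, int x1, int y1, int out[][2], int max_points)
{
    assert(out != NULL && max_points >= 0);
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;                  /* error term, integers only */
    int n = 0;
    while (n < max_points) {
        out[n][0] = x0;
        out[n][1] = y0;
        ++n;
        if (x0 == x1 && y0 == y1)
            break;                      /* reached the end point */
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }  /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }  /* step in y */
    }
    return n;
}
```

There is no multiply or divide inside the loop, only adds and compares.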
Not so new: it was invented in 1962!!! I guess the developers back in 1993-94 knew it ;)
As for our skills, wait for my first demo!!! For now, you may think I have some, but it's in theory only :D
I still haven't written a single line of code ;)
http://freespace.virgin.net/hugo.elias/ ... x_main.htm
Great resource (yet another)
The Bresenham line algorithm has always been used in drawing code, as it's fast and simple. There are a lot of tricks to optimise it here and there. In fact, it was just to give ob1 a starting point, as he was looking for material on line drawing. There is already tons of good material about fast triangle rendering, and we have very little chance of inventing something new about it today.
The 32X packed color mode can help here; it's used in Virtua Racing as well as Virtua Fighter...
"http://freespace.virgin.net/hugo.elias/ ... x_main.htm Great resource (yet another)"
Very nice resource indeed!
My biggest nightmare
Hey, I've been thinking about my rasterization engine the whole weekend. It's getting bigger but, for now, I still have one nightmare. See:
Does anyone know how I could handle z data?