I failed.
Twice.
At least.
But there's no reason not to share a post mortem ^^
So,
here is how I compute the slopes of a triangle
Rasteror how to draw a triangle without grDrawTriangle, a FPU, or even a DIV!
Slopes
Code: Select all
O
+> x

 + A (160, 50)
V / \
 \
y  \
/ \
 \
 \
/ \
 \
+ B (130, 100)
\ \
\ \
\ \
\ \
+ C (260, 150)
On each next line y, I'm going to have 2 points whose xcoordinate will be
 x1, on the [AB] segment
 x2, on the [AC] segment
x1 = xA + p * (y  yA), p being the reciprocal slope of the [AB] segment
x2 = xA + q * (y  yC), q being the reciprocal slope of the [AC] segment
Thus, p = (yB  yA) / (xB  xA)
Problem: I can't compute any decimal value on the 32X (and division is slow, ~20 cycles).
Introduction to fixedpoint
Let's consider [AB]
Code: Select all
O
+> x

 + A (160, 50)
V /

y 
/


/

+ B (130, 100)
Let p = (yB  yA) / (xB  xA) = (100  50) / (130  160) = 50 / (30) = 1.666 ~ 1  .65625 =  1  21/32
So, starting from xA, I will reach xB, as long as, for each next y, I remove 1 (the floor part), AND, 21 line of 32, I also remove 1.
Question is: on which 21 lines from 32 will I also remove 1 ?
Fixedpoint arithmetic
Let's consider 5/32 = .15625.
On the first next line, I'll add nothing. So, x value will be the floor part, and that's all.
On the second line, I'll add the floor part, AND 1 * 0.15625 = 0.15625, which is rounded to 0.
On the third line, I'll add the floor part, AND 2 * 0.15625 = 0.3125, which is rounded to 0.
On the fourth line, I'll add the floor part, AND 3 * 0.15625 = 0.46875, which is rounded to 0.
On the fifth line, I'll add the floor part, AND 4 * 0.15625 = 0.625, which is rounded to 1.
...
On the 32nd line, I'll add the floor part, AND 31 * 0.15625 = 4.84375, which is rounded to 5.
What we will focus on is the rounded part, and especially WHEN did it change.
On the first line, the rounded part was 0.
On the second line, the rounded part was 0.
On the third line, the rounded part was 0.
On the fourth line, the rounded part was 0.
On the fifth line, the rounded part was 1, we increased by 1.
...
And so on.
So,
On the first line, I didn't add anything.
On the second line, I didn't add anything.
On the third line, I didn't add anything.
On the fourth line, I didn't add anything.
On the fifth line, I did add 1.
...
And so on.
Finally, for 5/32, I get
Line 0, I add 0
Line 1, I add 0
Line 2, I add 0
Line 3, I add 0
Line 4, I add 1
Line 5, I add 0
Line 6, I add 0
Line 7, I add 0
Line 8, I add 0
Line 9, I add 0
Line 10, I add 1
Line 11, I add 0
...
Line 28, I add 0
Line 29, I add 1
Line 30, I add 0
Line 31, I add 0
Let's note that for 5/32, and considering 32 lines, I added 1 exactly 5 times.
If I had just 10 lines to raster, I would have add 1 once (line 0 to line 9).
Which may be slightly different from 10 * 5/ 32, but it's the rounding error.
What if I got more than 32 lines to draw, you ask ?
Let's just start from line 0 again!
Implementation
Of course, I could have gone to 6bits precision (1/64), or even more.
And I could have computed for, eg 21/64, and the 64 times I would have incremented, or not.
But I didn't need the extraprecision.
Let's now consider 5bit precision, which tells me 32 times whether or not I have to increment.
32 times true or false.
32 times a binary value.
32 times one bit.
So, a 32 bit value.
Sticking to 5/32, it's easy to write its increments horizontally:
0 0 0 0 1 0 0 0 0 0 1 0 0 ... 0 0 1 0 0
, then wrapping it all together, binary style:
0000100000100 ... 00100b
, then put it in a 32bit register, hexastyle:
08208104h
, then rotate it multiple times to the left
Code: Select all
++ ++
 T <+ 08208104 <+
++  ++ 
 
++
the MSb is moved into T,
I then move T into a register with the MOVT instruction,
finally, I add this register to my xA register.
No test, no lookuptable, plus, 2 main advantages:
 reduced memory footprint
 I don't have to bother for the modulus, ROTL does this for me!
#Wrapping it all together
Here are the values for 5bit precision.
I add to adjust the value above 15/32 to alias some roundig errors.
If your decimal value is between 0 and 1/32 ( 0.03125), use 0.
If your decimal value is between 1/32 ( 0.03125) and 2/32 (0.0625), use $00008000.
If your decimal value is between 2/32 (0.0625) and 3/32 (0.09375), use $00800080.
If your decimal value is between 3/32 (0.09375) and 4/32 (0.125), use $08080808.
If your decimal value is between 4/32 (0.125) and 5/32 (0.15625), use $08208104.
...
And so on.
DC.L $00000000,$00008000,$00800080,$02008010
DC.L $08080808,$08208104,$10821082,$11088422
DC.L $22222222,$22448891,$24892489,$24929249
DC.L $29292929,$294A94A5,$2A952A95,$2AAA9555
DC.L $55555555,$5555AAAB,$55AAD5AB,$56B5AD6B
DC.L $5ADADADB,$5B6DB6DB,$5BB6DDB7,$5DDBBB77
DC.L $6EEEEEEF,$6F77BDEF,$6FBEF7DF,$77EFDFBF
DC.L $7BFDFEFF,$7EFFEFFF,$7FEFFFFF,$7FFFFFFF
## How to do a real DIV without, floatingpoint nor DIV
In your ROM, just put a lookup table, eg 512 x 512. Let's say that the dividend will be on "lines", and divisors on "columns".
Let's say that each value will be, eg 9 bits for the integer part, and 5 bits for the "floating" part.
The "floating" part will be expressed in 32th of 1, eg 5 for 0.15625.
Each value will use 16 bits (2 bits for padding, 9 for the integer part, and 5 for the floating part), or 2bytes.
I'll then use the "floating" part as an offset for the rotate table above.
Thus, 216 / 60 (= 3.6) is located at line 216, column 60, or, (216 * 512 + 60) * 2 = 221 304.
I will find there 0x0073
Code: Select all
+++++
 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 
++++
padd integer part how many 32th
Drawing a triangle
So, for each 2Dtriangle (we'll see the 3D > 2D projection another time), I sorted the vertices (well ... pixels), A being the uppermost, then B being the leftmost.
I compute the slopes (with the fixeddecimal part, remember?), and I figure two pixels starting from A : x1 and x2.
I increment the line, and add the left slope integer part to x1, and the right slope integer part to x2.
I rotate the left slope decimal part and if T is true, I increment (or decrement) x1 by 1 pixel.
I rotate the right slope decimal part and if T is true, I increment (or decrement) x2 by 1 pixel.
... till I'm at yB (or yC).
This is the upper part of my triangle.
Then, I compute the last slope, and to the same for the remaining of the triangle. That would be the lower part of my triangle.
Problem ?
It's sooooo damn slow.
While I'm not even projecting, rotating, or lighting, and even while I'm only drawing half of a triangle, I can barely reach 8 triangles @60fps. I'm pretty sure Virtua Racing was better :/
Anyway,
for anyone interested, and just for everyone to notice I'm still alive, here it is :
https://www.dropbox.com/sh/ojhwag0hvz2k ... Hcw1a?dl=0
As a bonus, a binary table to convert 16bits to 8bits. Could be useful for Zbuffer