SpritesMind.Net

Posted: **Fri Aug 23, 2013 2:37 am**

This is an incredible demo of raycasting on a stock MD. Great job! You did a great job of converting 286 code into 68000 code. And the texture conversion is awesome, too.

Posted: **Fri Aug 23, 2013 4:54 am**

Thanks again for all the comments.

You said you don't need to double buffer; is this because you can upload the whole screen to VRAM in a single frame? If that is the case, I assume you prepare your screen buffer so it is directly organized as tiles so you can use the full DMA speed to upload to VRAM?

Yes, the screen buffer is organized as tiles (it goes from top to bottom, then the next column, and so on), so I upload the whole screen in a single frame. First I disable the display, do the DMA and then enable the display again. When the display is disabled or during vblank, DMA can move 205 bytes/scanline, so 82 lines will be enough (82 * 205 = 16810).
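To make the arithmetic above easy to replay, here is a minimal sketch in Python. The 205 bytes/scanline figure is the one quoted in the post; the 16 KiB buffer size in the second call is purely a hypothetical example, not the demo's actual buffer size.

```python
import math

# Figure quoted in the post: bytes DMA can move per scanline while the
# display is disabled or during vblank.
BYTES_PER_BLANKED_LINE = 205

def dma_capacity(blanked_lines, bytes_per_line=BYTES_PER_BLANKED_LINE):
    """Bytes that fit into the given number of blanked scanlines."""
    return blanked_lines * bytes_per_line

def lines_needed(buffer_bytes, bytes_per_line=BYTES_PER_BLANKED_LINE):
    """Blanked scanlines required to move buffer_bytes (rounded up)."""
    return math.ceil(buffer_bytes / bytes_per_line)

print(dma_capacity(82))     # 16810, the figure from the post
print(lines_needed(16384))  # 80, for a hypothetical 16 KiB buffer
```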

About the sound: I made the song with TFM Music Maker, and the 68000 plays the music with a TFC replayer that I wrote in asm, using only 5 channels. I use the sixth channel to play two-channel samples with the Z80. The samples are compressed 2 to 1 and are played directly from Z80 RAM at about 6900 Hz; each channel can use 3584 bytes (about 1 second), and the driver uses less than 1 KB. I did it this way because with DMA the samples do not sound right, since you have to stop the Z80 during DMA.
I have not tested this on real hardware; if anyone can, that would be great.

Posted: **Fri Aug 23, 2013 6:14 am**

I posted a photo of it on real hardware.

Sound is messed up, samples play right though.

Posted: **Fri Aug 23, 2013 7:42 am**

gasega68k wrote: Yes, the screen buffer is organized as tiles (it goes from top to bottom, then the next column, and so on), so I upload the whole screen in a single frame. First I disable the display, do the DMA and then enable the display again. When the display is disabled or during vblank, DMA can move 205 bytes/scanline, so 82 lines will be enough (82 * 205 = 16810).

I guess you can somehow write directly in tiled organization because of the way the raycaster works. I tried to find a way to organize my bitmap buffer like that in the SGDK bitmap engine, but I really can't find anything that works efficiently :-/ Modifying the organization would add too much overhead to bitmap drawing functions such as line or polygon filling, so I have to manually transform the bitmap into cells during the VRAM upload (software transfer). At least this has the advantage of not screwing up the Z80.
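To illustrate why a column-by-column renderer fits a tiled buffer so well, here is a rough Python sketch of the addressing. It assumes standard 4bpp 8x8 tiles (32 bytes each, 4 bytes per tile row) and a tilemap whose consecutive tile indices run down each column; the function name and the 128-pixel height in the usage lines are hypothetical, not taken from the demo.

```python
TILE_W = 8              # pixels per tile row
BYTES_PER_TILE_ROW = 4  # 8 pixels * 4 bits per pixel

def pixel_byte_offset(x, y, screen_height):
    """Offset of the byte holding pixel (x, y) when tiles are stored
    top to bottom, then the next column of tiles.

    Vertically stacked tiles are contiguous, so each 8-pixel-wide strip
    is just screen_height rows of 4 bytes. screen_height is assumed to
    be a multiple of 8.
    """
    strip = x // TILE_W
    strip_bytes = screen_height * BYTES_PER_TILE_ROW
    return strip * strip_bytes + y * BYTES_PER_TILE_ROW + (x % TILE_W) // 2

print(pixel_byte_offset(0, 1, 128))  # 4: one pixel down is +4 bytes
print(pixel_byte_offset(8, 0, 128))  # 512: next strip of tiles
```

With this layout, stepping down a wall column is a constant +4 byte stride, while a horizontal line fill has to jump a whole strip every 8 pixels, which matches the overhead described above.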

About the sound: I made the song with TFM Music Maker, and the 68000 plays the music with a TFC replayer that I wrote in asm, using only 5 channels. I use the sixth channel to play two-channel samples with the Z80. The samples are compressed 2 to 1 and are played directly from Z80 RAM at about 6900 Hz; each channel can use 3584 bytes (about 1 second), and the driver uses less than 1 KB. I did it this way because with DMA the samples do not sound right, since you have to stop the Z80 during DMA.
I have not tested this on real hardware; if anyone can, that would be great.

Oh, I understand of course; the sound is pretty good given these sample limitations! I developed a specific driver for the Bad Apple demo which takes care of the DMA contention, but it would not work in your case as you extend the vblank (I used vint on the Z80 side to detect the start of the blank period). But now that I'm thinking about it, if you limit your frame rate to 20 FPS max (which is smooth enough, and it never seems to go higher anyway) you can split your DMA over 3 frames and then have a complete screen display and easier DMA contention with the Z80... but that would require double buffering then :-/ ok, not a good idea :p

Posted: **Fri Aug 23, 2013 3:44 pm**

Wow, this demo is great! Best raycaster on the MD? It looks way better than Duke Nukem 3D!

Posted: **Sat Aug 24, 2013 7:44 pm**

Well, I'm here again. I'm working on the pushable walls, the player's hand/weapon and some other things (objects?). Very soon I will post an update of my demo.

About Duke Nukem 3D, I think the problem was the palette that they used for the walls and enemies (too much gray).

Posted: **Sat Aug 24, 2013 8:25 pm**

I agree, the engine is not bad but the palette is really awful ! Your palette choice seems to be really good in comparison

Posted: **Sat Aug 24, 2013 10:54 pm**

Stef wrote:
kool kitty89 wrote: For 16 colors, the maximum number of unique combinations (including 2 of the same color -ie the base 16 colors) would be 136, not 128. Though depending on the palette actually chosen (and the colors trying to be approximated) you can get as few as 64 useful pseudo colors. (it's also limited to 120 colors max if none of the base 16 colors is useful -ie the max number of dithered pseudocolors)
Hey Kool Kitty, nice to see you there

I was not sure at all about my 128 number; I always impress myself with how much I can suck at simple math. How do you arrive at 136? I counted 128 as I just eliminated swapped chunks (01 is the same as 10), but since we don't eliminate 00, 11 .. FF, it gives 136?

Combination/permutation math . . . which I don't entirely know by heart, but it's easy enough to look up. Plus there's things like this:
http://www.mathsisfun.com/combinatorics ... lator.html

16 values to choose from, 2 chosen, order not important (so combinations and not all permutations), and repetition allowed. (ie the base 16 colors are included in the final result)

So it's (16+1)*(16/2)= 136 (now, for actual color use, some examples could still include redundant pseudo colors that blend to similar approximate 12-bit color values, but 136 is the ideal maximum)
Also, 16 of those colors must be 9-bit only, since they're the base colors used in the palette. (you get 120 pseudo 12-bit colors)
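For anyone who wants to check these counts without the calculator, a small Python sketch that simply enumerates the pairs with the standard `itertools` helpers (unordered with repetition, unordered without repetition, and ordered with repetition):

```python
from itertools import combinations, combinations_with_replacement, product

colors = range(16)  # the 16 palette entries

# Unordered pairs, same color allowed twice: includes the 16 base colors.
with_repeats = sum(1 for _ in combinations_with_replacement(colors, 2))
# Unordered pairs of two different colors: the dithered pseudocolors only.
without_repeats = sum(1 for _ in combinations(colors, 2))
# Ordered pairs, same color allowed twice.
ordered = sum(1 for _ in product(colors, repeat=2))

print(with_repeats, without_repeats, ordered)  # 136 120 256
```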

Raw permutations are easy here: 16 values, 2 chosen, order important, repetition allowed, so just 16^2=256.
And that's what you could get for actual composite video artifact colors where the dot clock is an integer multiple of colorburst -like 7.16, 10.74, or 14.32 MHz NTSC- and where dithering the same 2 colors will result in different artifacts depending on order -like on CGA and Apple II; that's also why you get the weird oscillating chroma artifacts in NTSC using H40 on the MD, 6.71 MHz is totally out of sync with 3.58 MHz colorburst and so pixels artifact in an inconsistent manner (5.37 MHz isn't an integer multiple, but still 1.5x colorburst, so it's more solid, plus the lower res is less demanding anyway) Atari ST and C64 -in high res- should be even worse than MD since it's using an 8 MHz dot clock, both out of sync and higher resolution (let alone the 16 MHz dot mode of the ST for 640x200).

In all cases, dark on light colors make the most dramatic chroma artifacts, white on black being the most extreme. (hence why such artifacts are really common with dithered transparent meshes like Chemical Plant Zone or the white explosion clouds in Earthworm Jim 2 over the dark BGs of many levels in that game)

Posterizing the 256x18-bit VGA palette down to 12-bit and then dithering that to 9-bit (2 9-bit colors blend/accumulate to 12-bit) is a pretty straightforward route for this. Posterizing to 12-bit might actually drop the color count to within practical 16-color dither pseudocolor limits too.
Well, I just tested the ROM and I have to admit the result is really great (and smooth)! The palette used perfectly fits the original colors, and I believe we can use it to draw sprites as well without many color issues.

Chilly Willy and I got into a discussion at one point over whether dithered color like that would be adequate for Doom's textures. (in the context of doing a Doom conversion on SVP or Sega CD)

It would probably be a bit more extreme than Wolf3D's palette, but I still think that could work OK. (similar context of truncating/posterizing the 256 color 18-bit palette down to 12-bit and then using 16-color 9-bit to approximate those colors)
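As a rough illustration of that posterize-then-dither idea, here is a Python sketch. It assumes a plain linear blend where two 3-bit channel values simply add up (0..14), and it treats that sum as if it were on the 4-bit scale (0..15); it ignores the real DAC response and composite artifacts. The function names and the tiny 3-entry palette in the usage lines are hypothetical.

```python
from itertools import combinations_with_replacement

def posterize_18_to_12(rgb6):
    """Drop each 6-bit VGA channel (0..63) to 4 bits (0..15)."""
    return tuple(c >> 2 for c in rgb6)

def blend_pair(a, b):
    """Sum two 3-bit-per-channel colors (0..7) per channel: 0..14."""
    return tuple(x + y for x, y in zip(a, b))

def best_pair(target12, palette9):
    """Pick the pair of palette colors whose blend is closest to target12."""
    def err(pair):
        mix = blend_pair(*pair)
        return sum((m - t) ** 2 for m, t in zip(mix, target12))
    return min(combinations_with_replacement(palette9, 2), key=err)

# Hypothetical palette: black, white, pure red (3 bits per channel).
palette = [(0, 0, 0), (7, 7, 7), (7, 0, 0)]
print(best_pair(posterize_18_to_12((63, 0, 0)), palette))  # red + red
print(best_pair((7, 7, 7), palette))                       # black + white
```

With a full 16-entry palette the same search runs over the 136 pairs counted earlier in the thread.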

I was thinking about it but it would make the image a bit blurry, no clean edge for walls...

IMO it looks rather nice. Using Fusion's CVBS filter makes for blur very similar to what you'd get from 1-pixel offset H-blur (it's single pass horizontal blending), and it looks very nice and solid. Using the TV-mode filter in Fusion is excessively blurry though. (it's 2-pass h-blur and doesn't help blending anyway for 2-pixel dither like this)

NTSC composite video in H40 is pretty close to that blurry anyway (depending on the TV and color/art usage), but you get chroma artifacts there that don't blend colors as evenly as H-blur. (RGB with blur would be ideal IMO, and should be close to Fusion's CVBS filter) Plus, using H-scroll blur might help make NTSC chroma artifacts more consistent too. (filtering/evening them out)

Honestly, this demo looks even better with single pass H-blur than Duke 3D or Zero Tolerance. (better color usage in general)

Posted: **Sun Aug 25, 2013 12:07 am**

gasega68k wrote: About the sound: I made the song with TFM Music Maker, and the 68000 plays the music with a TFC replayer that I wrote in asm, using only 5 channels. I use the sixth channel to play two-channel samples with the Z80. The samples are compressed 2 to 1 and are played directly from Z80 RAM at about 6900 Hz; each channel can use 3584 bytes (about 1 second), and the driver uses less than 1 KB. I did it this way because with DMA the samples do not sound right, since you have to stop the Z80 during DMA.
I have not tested this on real hardware; if anyone can, that would be great.

What sort of compression are you using? 4-bit DPCM or ADPCM? (I assume not just 4-bit linear PCM)

4-bit DPCM seems to be a common choice. Stef uses that, as does Sega's own SMPS. (a 4-bit delta sample format with fixed "palette" of 16 delta values -Stef's engine uses a considerably smaller range than Sega's, the latter just using powers of 2)
Though, honestly, at the bitrates/sample rates used here, plain linear 4-bit PCM could probably work pretty well too, if not better than DPCM. (at such low bitrates, linear 4-bit PCM should work fairly well, and from what I understand, ADPCM/DPCM doesn't tend to fare particularly well at such low rates)
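For reference, decoding 4-bit DPCM with a fixed delta table is about as simple as it gets. The Python sketch below is only an illustration: the power-of-two table, the high-nibble-first order and the 0x80 start value are all assumptions made for the example, not a confirmed dump of any particular driver.

```python
# Hypothetical 16-entry delta "palette" built from powers of two.
DELTAS = [0, 1, 2, 4, 8, 16, 32, 64, -128, -1, -2, -4, -8, -16, -32, -64]

def decode_dpcm4(data, start=0x80):
    """Expand packed 4-bit delta codes into unsigned 8-bit samples (2:1)."""
    sample = start
    out = []
    for byte in data:
        # High nibble first is an assumption for this sketch.
        for nibble in (byte >> 4, byte & 0x0F):
            # Wrap like an 8-bit accumulator would.
            sample = (sample + DELTAS[nibble]) & 0xFF
            out.append(sample)
    return out

print(decode_dpcm4(bytes([0x12, 0x90])))  # [129, 131, 130, 130]
```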

On that note, Tiido made an interesting sort of hybrid 2-bit DPCM/ADPCM scheme with 3:1 compression over 8-bit. (TADPCM) It's technically ADPCM, but should require CPU overhead closer to DPCM since it uses explicit delta values rather than algorithmically derived ones. (the "adaptive" computation is done entirely on the encoding end rather than both encoding+decoding as with typical ADPCM or CVSD)

It uses an 8-bit value that defines the following 12 2-bit deltas (the 2 bits select +/- the 8-bit value or 1/2 of it, i.e. 128 would be used as +128, -128, +64, -64), and then the next 8-bit value is provided for the next 12 deltas. So you get 12 samples in 4 bytes, or 3:1 over 8-bit PCM.
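Here is a rough Python sketch of a decoder for the block layout as described in this post (one step byte, then 12 two-bit codes in 3 bytes). The mapping of the four codes to +step, -step, +step/2, -step/2, the MSB-first bit packing, the clamping and the start value are all assumptions for illustration; the real TADPCM format may differ in those details.

```python
def decode_block(block, sample):
    """Decode one 4-byte block into 12 samples, starting from `sample`."""
    step = block[0]
    # Code-to-delta order is an assumption for this sketch.
    deltas = (step, -step, step // 2, -(step // 2))
    out = []
    for byte in block[1:4]:
        for shift in (6, 4, 2, 0):  # MSB-first packing is an assumption
            sample += deltas[(byte >> shift) & 3]
            sample = max(0, min(255, sample))  # clamp to unsigned 8-bit
            out.append(sample)
    return out, sample

def decode(data, start=128):
    """Decode a stream of 4-byte blocks; a trailing partial block is ignored."""
    out = []
    sample = start
    for i in range(0, len(data) - 3, 4):
        chunk, sample = decode_block(data[i:i + 4], sample)
        out.extend(chunk)
    return out

samples = decode(bytes([8, 0b00011011, 0x00, 0x55]))
print(len(samples), samples[:4])  # 12 [136, 128, 132, 128]
```

Note that the decoder does no adaptation of its own: it only adds table entries, which is the point made above about the CPU cost being close to plain DPCM.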

Chilly Willy has a 22 kHz demo using this format in his CVSD thread:
viewtopic.php?t=586&postdays=0&postorder=asc&start=15

Chilly Willy's 2-bit CVSD offers 4:1 compression and sounds pretty good too considering (in the demos, there's a few areas it sounds better than TADPCM, but that's likely more down to encoder limitations than actual quality of the compression). 1-bit CVSD would provide the highest compression ratio, but more quality lost relative to sample rate (still maybe a good option for certain low bitrate stuff, especially speech). But then you've got to consider the CPU overhead involved with CVSD relative to sample rate too. (2-bit is probably much more useful overall for typical MD stuff, and TADPCM probably more so)

I suppose you could also try a fully fixed palette 2-bit DPCM version too (like 4-bit DPCM but only 4 values), but I doubt that would sound very good. (at least better than 1-bit DPCM like the NES does)

With your current set-up, you're loading entire ~1s long chunks of samples into Z80 RAM and then playing them? That would work well as long as you kept samples within that short length, or segmented longer ones to load in chunks. (but you'd get small gaps between playback then, perhaps less noticeably if you specifically chopped up samples with that in mind, especially for speech)

Loading samples into Z80 RAM like that also works around the Z80 bank switching overhead for multi-channel playback. (well, at least it will once you exceed 32 kB of samples in ROM)

Stef wrote:
I guess you can somehow write directly in tiled organization because of the way the raycaster works. I tried to find a way to organize my bitmap buffer like that in the SGDK bitmap engine, but I really can't find anything that works efficiently :-/ Modifying the organization would add too much overhead to bitmap drawing functions such as line or polygon filling, so I have to manually transform the bitmap into cells during the VRAM upload (software transfer). At least this has the advantage of not screwing up the Z80.
Would it help if you made a polygon renderer that used column filling rather than line filling?

That'd be a pain to do for full-res 4-bit pixels, but more useful if you were willing to use paired pixels (bytes) like this set-up does. You'd lose resolution, but gain speed. (and could consider dithered color like this too)

Oh, I understand of course; the sound is pretty good given these sample limitations! I developed a specific driver for the Bad Apple demo which takes care of the DMA contention, but it would not work in your case as you extend the vblank (I used vint on the Z80 side to detect the start of the blank period). But now that I'm thinking about it, if you limit your frame rate to 20 FPS max (which is smooth enough, and it never seems to go higher anyway) you can split your DMA over 3 frames and then have a complete screen display and easier DMA contention with the Z80... but that would require double buffering then :-/ ok, not a good idea :p

Could you possibly use a combination of vblank interrupt and YM2612 timers for working around extended vblank?

Ie use vblank to sync up with video timing, and set a YM timer to account for the extended blanking period, and poll the timer to keep track of where active display will end. You could maybe even keep a screen counter that cycled every few frames and tie that to a framerate cap put on the renderer (and have the 68k keep a similar counter so it only ever set DMA on the right frame). That probably wouldn't help that much overall though, since you'd still have the same minimum per-frame bandwidth bottleneck to consider. (on frames asserting DMA) So probably not worth it, depending how long your mixing buffer was. (ie long enough to smooth out CPU time over several frames)

To maximize usable CPU time per frame, you'd also probably want to set different timing for 50 and 60 Hz, or set the display taller in 50 Hz so the extended vblank period is of an equal scanline count. Ie sync to v-int in either case, but make sure the actual vblank period is equal on both, so if 60 Hz has screen cut to 176 lines with 86 vblank lines, you'd want the 50 Hz screen to be set to 226 lines, and positioned such that the same timer set-up code would work. You could still leave that added space black, of course, or just tiled with a repeating 8x8 texture. (still, PAL uses slightly faster h-sync timing than NTSC, so that's a consideration too -iirc it's 15.65 vs 15.75 kHz)
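The scanline bookkeeping in that example can be replayed with a few lines of Python. The totals of 262 and 312 scanlines per frame are an assumption here: they are simply what the 176/86 and 226 figures above imply, and the names are made up for the sketch.

```python
# Assumed total scanlines per frame, implied by the figures in the post.
TOTAL_LINES = {"60Hz": 262, "50Hz": 312}

def active_lines_for(blank_lines, mode):
    """Visible lines that leave exactly blank_lines of (extended) vblank."""
    return TOTAL_LINES[mode] - blank_lines

# A 60 Hz screen cut to 176 lines leaves this many blanked lines.
blank = TOTAL_LINES["60Hz"] - 176
print(blank)                            # 86
print(active_lines_for(blank, "50Hz"))  # 226
```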

Wait . . . does the v-int signal the normal start of vblank, or end of vblank?
. . . Well, in any case, the general idea (interval timer combined with v-int) would still apply, just some of my above comments won't make sense.

Posted: **Sun Aug 25, 2013 10:06 am**

kool kitty89 wrote: Combination/permutation math . . . which I don't entirely know by heart, but it's easy enough to look up. Plus there's things like this:
http://www.mathsisfun.com/combinatorics ... lator.html

...

Also, 16 of those colors must be 9-bit only, since they're the base colors used in the palette. (you get 120 pseudo 12-bit colors)
Raw permutations are easy here: 16 values, 2 chosen, order important, repetition allowed, so just 16^2=256.

Yeah, I know these are simple combinations, but sometimes I still have a really hard time figuring them out :p
Anyway, thanks for the link, that will help me in the future.

And that's what you could get for actual composite video artifact colors where the dot clock is an integer multiple of colorburst -like 7.16, 10.74, or 14.32 MHz NTSC
....
dithered transparent meshes like Chemical Plant Zone or the white explosion clouds in Earthworm Jim 2 over the dark BGs of many levels in that game)

Good to know how much the dot clock can affect pixel blending... anyway, we can't really rely on it: depending on the signal you use (composite, S-Video or RGB) the pixels won't blend the same at all. I prefer to do the blend explicitly, for instance with the technique of changing H scroll every frame, so you know you get the effect whatever the output is.

Posterizing the 256x18-bit VGA palette down to 12-bit and then dithering that to 9-bit (2 9-bit colors blend/accumulate to 12-bit) is a pretty straightforward route for this. Posterizing to 12-bit might actually drop the color count to within practical 16-color dither pseudocolor limits too.

I agree. I don't know if that's how gasega68k made his palette, but I guess it's the way to get a very strong one.

Chilly Willy and I got into a discussion at one point over whether dithered color like that would be adequate for Doom's textures. (in the context of doing a Doom conversion on SVP or Sega CD)

It would probably be a bit more extreme than Wolf3D's palette, but I still think that could work OK. (similar context of truncating/posterizing the 256 color 18-bit palette down to 12-bit and then using 16-color 9-bit to approximate those colors)

Haha, a Doom port on Sega CD would be awesome, probably a bit laggy though :p But yeah, having a good base palette is *very* important and really makes the difference; Duke 3D could look far better with a better base palette.

IMO it looks rather nice. Using Fusion's CVBS filter makes for blur very similar to what you'd get from 1-pixel offset H-blur (it's single pass horizontal blending), and it looks very nice and solid. Using the TV-mode filter in Fusion is excessively blurry though. (it's 2-pass h-blur and doesn't help blending anyway for 2-pixel dither like this)

...

Honestly, this demo looks even better with single pass H-blur than Duke 3D or Zero Tolerance. (better color usage in general)

But don't you think that swapping pixel chunks every frame would produce a better result than a simple H pixel scroll? This way you don't blend pixels from different "objects" together. Of course, technically it is not as easy to do as the H scroll...

Would it help if you made a polygon renderer that used column filling rather than line filling?

Well, I have to keep a "linear memory" polygon fill to make the algorithm as fast as possible, but anyway, if we assume the scene is rotated 90° we get the same result as a column-filling renderer... still, I don't see how that helps with DMA. DMA always does *word* reads from source memory (even if internally it can do byte writes when the destination is VRAM), so I don't see how you can arrange the tilemap and the address increment register to transform the bitmap buffer (even rotated 90°) into tiles :-/
I found a way, but it required transferring twice the amount of initial data, which kills almost all the benefit of using DMA.

That'd be a pain to do for full-res 4-bit pixels, but more useful if you were willing to use paired pixels (bytes) like this set-up does. You'd lose resolution, but gain speed. (and could consider dithered color like this too)

I could indeed make my polygon fill faster with byte-aligned filling, but that reduces the X resolution and looks really... "low resolution" compared to full X resolution.
I can make that optional to improve performance if needed... actually you only have to take care of 4-bit pixels on polygon edges; I use byte fill for the inside, which is faster and allows color dithering.

Could you possibly use a combination of vblank interrupt and YM2612 timers for working around extended vblank?

...

To maximize usable CPU time per frame, you'd also probably want to set different timing for 50 and 60 Hz, or set the display taller in 50 Hz so the extended vblank period is of an equal scanline count. Ie sync to v-int in either case, but make sure the actual vblank period is equal on both, so if 60 Hz has screen cut to 176 lines with 86 vblank lines, you'd want the 50 Hz screen to be set to 226 lines, and positioned such that the same timer set-up code would work. You could still leave that added space black, of course, or just tiled with a repeating 8x8 texture. (still, PAL uses slightly faster h-sync timing than NTSC, so that's a consideration too -iirc it's 15.65 vs 15.75 kHz)

Wait . . . does the v-int signal the normal start of vblank, or end of vblank?
. . . Well, in any case, the general idea (interval timer combined with v-int) would still apply, just some of my above comments won't make sense.

You are right, that is totally possible

V-int occurs at the start of vblank... so we can set the YM timer so it expires when the blank area is about to start. We could imagine the Z80 code checking the YM2612 status at regular intervals (every 500 Z80 cycles, for instance), so the YM2612 timer should be set accordingly.

Then, as soon as the timer expires, the Z80 enters a new code block where it no longer accesses the 68k bus for some time. Depending on PAL or NTSC mode, this period is longer.

Posted: **Sun Aug 25, 2013 10:00 pm**

While the colors look great here for being dithered colors, it's a little easier than Doom would be because of the lack of shading. Doom maintains 32 shades of many of the colors used (look at a picture of Doom's palette and you'll see multiple runs of shades of the same colors). Doing Doom with stock MD graphics might be possible, but you'd probably have to eliminate shading (or maybe cut it back drastically). Doom would be more a SCD project than a Genesis one. The SCD 68000 is more than capable of running the game logic - that's what the Jaguar version does, and its 68000 is just barely faster. You'd have to do most of the rendering on the Genesis side. You'd REALLY need to cut down the textures a lot - basically, someone would need to edit the Doom wad file and redo all the level and things graphics. It's a lot of work... much more than W3D.

Also, the color is more complex than just considering two pixels... look at this

AB CD EF GH etc

That's how people are considering the pixels to calculate the colors. But that's rather arbitrary... it could just as easily be

A BC DE FG H etc

In truth, it's a bit of both. The Apple 2 emulator my brother and I worked on for the Amiga emulated the graphics by recomputing the color for a pixel change by covering SEVEN adjacent pixels on the line (three before, the new pixel, and three after). It gave the best emulation of Apple 2 colors seen to date. You need to consider the "dithered" color much the same way here, but we just do AB CD EF GH for ease and speed.

Posted: **Sun Aug 25, 2013 10:48 pm**

Yeah, Doom uses shade levels. We cannot reproduce that the same way, but Toy Story actually uses 3 shade levels for its 3D level and that looks great.

The palette is heavily built around gray, purple and green colors though.

https://www.youtube.com/watch?feature=p ... Md0#t=2130

I know that we can consider more than 2 adjacent pixels for color dithering, but as you said, for simplicity we only assume 2-pixel chunks.
7 adjacent pixels sounds insane :p

Posted: **Mon Aug 26, 2013 1:51 am**

Stef wrote: Yeah, Doom uses shade levels. We cannot reproduce that the same way, but Toy Story actually uses 3 shade levels for its 3D level and that looks great. The palette is heavily built around gray, purple and green colors though.

Doom uses colormaps for shading, so it's just a matter of doing new palettes (regular, hurt, rad suit, and invincible palettes), and new colormaps for the shading.

I know that we can consider more than 2 adjacent pixels for color dithering, but as you said, for simplicity we only assume 2-pixel chunks.
7 adjacent pixels sounds insane :p

It was slight overkill to make DAMN sure the color was right. It's only an Apple 2, so it was hardly a tax on the speed. If you're gonna do it, do it right!

Posted: **Mon Aug 26, 2013 8:08 am**

Chilly Willy wrote: Doom uses colormaps for shading, so it's just a matter of doing new palettes (regular, hurt, rad suit, and invincible palettes), and new colormaps for the shading.

I was talking about the depth shading; on MD we could not reproduce the same smooth shading, but I guess we could get something not too bad.

Of course the other effects are just different palettes.

It was slight overkill to make DAMN sure the color was right. It's only an Apple 2, so it was hardly a tax on the speed. If you're gonna do it, do it right!

That is the way to do it.

Posted: **Mon Aug 26, 2013 6:24 pm**

Stef wrote:
Chilly Willy wrote: Doom uses colormaps for shading, so it's just a matter of doing new palettes (regular, hurt, rad suit, and invincible palettes), and new colormaps for the shading.
I was talking about the depth shading; on MD we could not reproduce the same smooth shading, but I guess we could get something not too bad. Of course the other effects are just different palettes.

So was I. On Doom, to draw at a particular shade, they merely pick 1 of 32 colormaps, each one presenting a particular shade. The colormap takes the texture pixel value as an index, and gives the new value that points to the same color at the proper shade in the same palette (assuming there is one). At really low light levels, many of the colors are pointed to the same values, giving the unsaturated look to dim parts of the level.
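To make the lookup concrete, here is a small Python sketch of the idea. The way the maps are built (scale the color down, then pick the nearest palette entry) is an assumption for illustration, not how the original tables were necessarily generated, and the 3-entry gray palette and function names are hypothetical; only the "pick 1 of 32 maps, index it with the texel" part comes from the description above.

```python
def build_colormaps(palette, levels=32):
    """For each light level, map every palette index to the palette entry
    nearest to the darkened color. Level 0 is full bright (assumption)."""
    def nearest(rgb):
        return min(range(len(palette)),
                   key=lambda i: sum((p - c) ** 2
                                     for p, c in zip(palette[i], rgb)))
    maps = []
    for level in range(levels):
        scale = (levels - level) / levels
        maps.append([nearest(tuple(c * scale for c in rgb)) for rgb in palette])
    return maps

def shade(colormaps, level, texel):
    """The per-pixel work at draw time: one table lookup."""
    return colormaps[level][texel]

# Hypothetical 3-entry palette: black, mid gray, white.
palette = [(0, 0, 0), (128, 128, 128), (255, 255, 255)]
maps = build_colormaps(palette)
print(shade(maps, 0, 2), shade(maps, 16, 2), shade(maps, 31, 2))  # 2 1 0
```

At the darkest level both gray and white land on index 0, which is the "many colors pointed to the same values" effect described for dim areas.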

Of course, if you are outputting in 15bit color mode, you could make much more accurate colors at all the levels since you aren't limited to the 256 color palette. That's basically what the Jaguar version does, and many PC source ports when set for 15 or 24 bit mode.

wolfenstein demo for sega genesis