Disassembling the Digital Pictures media codec

Paul Jensen
Interested
Posts: 32
Joined: Mon Apr 06, 2009 4:17 pm
Location: Hiroshima, Japan

Post by Paul Jensen » Wed Jul 09, 2014 1:59 pm

How hard would it be to reverse engineer the graphics decoder from a Sega CD game?

I've been working on a decoder for the media files used in games from Digital Pictures (e.g. Night Trap, Sewer Shark, etc.). I've been able to figure out most of the encoding types (there are about 16 in total) using what little info is available out there, and with the help of TascoDLX (see this thread), but I'm having trouble figuring out the remaining encoding types. I was thinking maybe if somebody were to disassemble the game executable files from the various Digital Pictures games, we could get a 100% accurate decoder out of them.

Working on type C2 graphics

Post by Paul Jensen » Mon Aug 18, 2014 10:30 am

I've got most of the various graphics compression schemes cracked, but I'm stumped on type C2 and its derivatives. I'm sure it's compressed in some way, but I'm having trouble figuring out how the compressed data relates to the decompressed VDP data.

Below is the first frame from the Digital Pictures logo found in Slam City with Scottie Pippen. I've included both the compressed frame data found in the file DPLOGO.SGA and a dump of VRAM of the frame made via Gens KMod.

Compressed frame data
VRAM dump

I'd be extremely grateful for any ideas about this. This is the last major step in decoding all of the videos used in Digital Pictures' Sega CD/32X games.

TascoDLX
Very interested
Posts: 262
Joined: Tue Feb 06, 2007 8:18 pm

Post by TascoDLX » Thu Oct 02, 2014 2:45 am

I totally missed this post. :oops:

I don't think it would be too hard, but I never really had any desire to reverse engineer the player. It depends on how much experience you have doing that and how thorough you want to be about it.

I don't know how much progress you've made, but I'll take a look at the frame, if only for my own curiosity. I should warn you: I believe KMod VRAM dumps are byteswapped. That could be quite confusing.

r57shell
Very interested
Posts: 478
Joined: Sun Dec 23, 2012 1:30 pm
Location: Russia

Post by r57shell » Thu Oct 02, 2014 4:17 am

Why not just work out how the extraction routines operate?
Get an emulator with a good enough debugger: the gens-rr r57shell mod or MESS (I prefer the first).
Set a read breakpoint on the address where the data starts, and catch the routine there.

Post by Paul Jensen » Fri Oct 24, 2014 2:29 pm

Thank you both for replying to this thread.

@TascoDLX: You missed the post? What the hell am I paying you for? :wink:
As far as I can tell, the VRAM dump is not byteswapped. The first tiles in Corpse Killer's VRAM are a font, which is easily recognizable in a hex editor.

@r57shell: I tried the gens-rr r57shell mod, but unfortunately I couldn't get it to load Corpse Killer properly. Gens-rr, however, did work, and I was able to make a few trace logs of the intro video (OPENING.SGA on the disc), including the code that executes between frames.

The problem is that I only know a tiny bit about 68000 assembly, so even after looking at the trace logs (which I'm sure contain code for decompressing frames), I can't be sure exactly which code is responsible for handling decompression.

Figuring out the decompression seems like a trivial task for someone who already knows a lot about 68000 assembly, so if possible, I'd like somebody experienced with this sort of thing to make their own trace logs and help me figure out how the compression codes work.

What would really be cool is if somebody (not me -- I'm not good enough at this stuff) could identify all of the routines in one of the later decoders. Then it would be a simple matter of porting the routines over to C or whatever to make a "perfect" decoder.

Post by TascoDLX » Sat Oct 25, 2014 10:25 am

Paul Jensen wrote:As far as I can tell, the VRAM dump is not byteswapped. The first tiles in Corpse Killer's VRAM are a font, which is easily recognizable in a hex editor.
Well, the VRAM dump you posted from Slam City is definitely byteswapped. You can tell by looking at the name table. That's the only reason I mention it. It's an old KMod bug and I'm not sure if it was ever fixed in an official release. Just something to look out for.

Here's what I can tell about C2: The data is all encoded, no compression. The decoder takes advantage of the Sega CD's font generator, which is used to generate 2-color patterns. The packet breakdown (in order) is: palette data, command data, color (index) data, pattern data. Offsets for color data and pattern data are tacked on to the header. The (implied) offset for command data is zero. All offsets are relative to the end of palette data.
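
The layout described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the header field widths and the palette size are not given in this thread, so the function takes the two offsets as already-parsed integers, and `body` is assumed to be everything that follows the palette data. It also assumes the three streams are contiguous, so each one ends where the next begins. The name `split_c2_packet` is made up for the example.

```python
def split_c2_packet(body: bytes, color_offset: int, pattern_offset: int):
    """Split a C2 packet body into its three data streams.

    body           -- assumed: all bytes after the palette data
    color_offset   -- offset of the color (index) data, relative to body
    pattern_offset -- offset of the pattern data, relative to body
    The command data has an implied offset of zero.
    """
    if not 0 <= color_offset <= pattern_offset <= len(body):
        raise ValueError("offsets out of order or out of range")
    command_data = body[:color_offset]                # implied offset 0
    color_data = body[color_offset:pattern_offset]    # assumed to end at pattern data
    pattern_data = body[pattern_offset:]
    return command_data, color_data, pattern_data
```

Usage would look like `cmd, col, pat = split_c2_packet(body, color_off, pattern_off)`, with the offsets read from the packet header beforehand.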

Also, if you need a pointer, the C2 decode routine in Slam City is located at $012792 on the sub side [SUBCODE.BIN, file offset $A52A].

Post by Paul Jensen » Fri Nov 21, 2014 4:10 pm

TascoDLX wrote: Well, the VRAM dump you posted from Slam City is definitely byteswapped. You can tell by looking at the name table. That's the only reason I mention it. It's an old KMod bug and I'm not sure if it was ever fixed in an official release. Just something to look out for.
I'll take your word for it.
TascoDLX wrote: Here's what I can tell about C2: The data is all encoded, no compression. The decoder takes advantage of the Sega CD's font generator, which is used to generate 2-color patterns. The packet breakdown (in order) is: palette data, command data, color (index) data, pattern data. Offsets for color data and pattern data are tacked on to the header. The (implied) offset for command data is zero. All offsets are relative to the end of palette data.
Thanks for that. After poking around a bit I figured out that the palette data comes before any other data, and I knew there were two (or sometimes three) distinct groups of data following it, but I didn't know what they were for. Do you have any more information on the Sega CD font generator?

EDIT: Is it significant that the routine makes so many writes to $FF804D? I did some research and noticed that the range from $FF0000 ~ is used for a lot of special functions of the Sega CD, including graphics at $FF8058. Actually, if I think about it, the tile routines appear to read out some data from the RAM used for Sega CD graphics, but it looks like they would be reading from the status area, which doesn't make a lot of sense.

EDIT 2: Got it. I just checked out the Mega CD Software Development Manual, and yeah, $FF804D is the address used for writing color data, and $FF804E~F are used for patterns. When these bytes are written to, the system automatically generates font data and stores it in the read-only range $FF8050~6. That's why the code looks like it's reading from RAM that hasn't been written to yet.
TascoDLX wrote: Also, if you need a pointer, the C2 decode routine in Slam City is located at $012792 on the sub side [SUBCODE.BIN, file offset $A52A].
Thanks so much for this! I've taken it upon myself to learn a bit of M68k assembly language, and after studying a disassembly of SUBCODE.BIN, I've learned that the video in C2 frames is built from coded patterns, just like you said. My knowledge of assembler is really shaky (I just started learning a few days ago), but I think I might be able to figure this out now.

Of course, if anybody who knows a lot about the 68k wants to take a look at the disassembly, I'd be happy to make it available. :-)

It looks like the SUBCODE.BIN from Slam City also contains decoding routines for some of the other chunk types that I previously figured out (as I figured it would, being a later game and all). A quick check of the code has shown me so far that a lot of the guesses I made were correct. I'm seeing a lot of values in the code that I had to figure out by trial and error. I'm curious to look through the code now to see if I missed anything in my implementation, though.

One really cool thing is that the routine that branches out to the various decoding routines also does checks for some chunk types that I haven't seen before. I'm interested to find out whether these chunk types are actually used or not.

EDIT:
I spent some time translating the 68K assembly into Visual Basic, and I've got a working parser that can read through all the data commands and build a tile map.

It looks like the codes range from $00 to $3F, and the palette index is stored in the top two bits of the code. For codes $00 ~ $2F, the routine branches off into lots of subroutines for drawing whole or partial tiles. There look to be about 50 subroutines in total. Codes $30 ~ $3F are for referencing repeated tiles.
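
As a rough Python illustration of that split (a sketch of my reading, not the original routine): each command byte is assumed to carry the palette index in its top two bits and the code in its low six bits. The function name and the "draw"/"repeat" labels are invented for the example.

```python
def split_command(command_byte: int):
    """Split one command byte into (palette_index, code, kind).

    Assumption: bits 7-6 hold the palette index, bits 5-0 hold the code.
    Codes $00-$2F draw whole or partial tiles; $30-$3F reference repeated tiles.
    """
    palette_index = (command_byte >> 6) & 0x03
    code = command_byte & 0x3F
    kind = "draw" if code <= 0x2F else "repeat"
    return palette_index, code, kind
```

A dispatcher would then branch on `code`, for example to the routines for codes $01 and $02.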

Below are the routines for codes $01 and $02:

Code:

;ADDRESSES
;a2 = color index data
;a3 = bit pattern data
;a4 = decoded tile data
;a6 = scratch RAM for building tiles

;Code $01
move.b	(a2)+,(a6)
move.l	3(a6),(a4)+
move.l	3(a6),(a4)+
move.l	3(a6),(a4)+
move.l	3(a6),(a4)+
move.l	3(a6),(a4)+
move.l	3(a6),(a4)+
move.l	3(a6),(a4)+
move.l	3(a6),(a4)+
rts

;Code $02
move.b	(a2)+,(a6)
move.b	(a3)+,1(a6)
move.b	(a3)+,2(a6)
move.l	3(a6),(a4)+
move.l	7(a6),(a4)+
move.b	(a3)+,1(a6)
move.b	(a3)+,2(a6)
move.l	3(a6),(a4)+
move.l	7(a6),(a4)+
move.b	(a3)+,1(a6)
move.b	(a3)+,2(a6)
move.l	3(a6),(a4)+
move.l	7(a6),(a4)+
move.b	(a3)+,1(a6)
move.b	(a3)+,2(a6)
move.l	3(a6),(a4)+
move.l	7(a6),(a4)+
rts
What kind of tiles would these build? I'm guessing that both codes build 8 x 8 tiles, because they both output 8 incremental long words to a4. Code $01 looks like it builds a tile with a two-pixel wide vertical line on one edge of the tile, but how? To me, it looks like it copies a byte of color index data to the first byte of (a6), but then it copies long words to (a4) from (a6) starting three bytes AFTER the color data it just copied. I'm sure I'm just reading it wrong, but I don't know how.

EDIT: If it helps, address register a6 points to address $FF804D, an odd address. I think the fact that the address is odd might have something to do with the way these routines work.

As always, help would be much appreciated.

Post by TascoDLX » Mon Dec 29, 2014 10:35 pm

Paul Jensen wrote:Do you have any more information on the Sega CD font generator?
It's listed in the dev manual under "color calculation".
- You write two 4-bit color index values to the first register ($ff804c). Call these color 0 and color 1.
- You write a 16-bit bitmap to the second register ($ff804e). This generates 16 pixels of pattern data (2 rows of 8 pixels each); bit val of 0 maps to color 0, bit val of 1 maps to color 1.
- The resulting pattern data is read from the following registers ($ff8050 through $ff8057; the routines in this thread read them as two longwords, at $ff8050 and $ff8054)
Paul Jensen wrote:EDIT: If it helps, address register a6 points to address $FF804D, an odd address. I think the fact that the address is odd might have something to do with the way these routines work.
The high byte of the first register is not used, so it just points to the next address. Obviously, byte access is allowed.
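
The register behaviour described above can be modelled in a few lines of Python. Treat this as a sketch with assumptions that are not confirmed in this thread: the low nibble of the color byte is taken to be color 0 and the high nibble color 1, and the most significant pattern bit is taken to be the leftmost pixel. The function name `font_generate` is invented for the example.

```python
def font_generate(color_byte: int, pattern: int) -> bytes:
    """Model of the font generator: 16 pattern bits -> 16 pixels at 4 bits each.

    color_byte -- value written to $FF804D (assumed: low nibble = color 0,
                  high nibble = color 1)
    pattern    -- 16-bit value written to $FF804E (assumed: bit 15 = first pixel)
    Returns the 8 bytes that would be read back starting at $FF8050:
    bytes 0-3 are the first row of 8 pixels, bytes 4-7 the second row.
    """
    colors = (color_byte & 0x0F, (color_byte >> 4) & 0x0F)
    out = bytearray()
    for pair in range(8):  # each output byte packs two pixels
        left = colors[(pattern >> (15 - 2 * pair)) & 1]
        right = colors[(pattern >> (14 - 2 * pair)) & 1]
        out.append((left << 4) | right)
    return bytes(out)
```

With a pattern of $0000 every pixel maps to color 0, so the output is a solid run of one color index.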

Code:

;Code $01
move.b   (a2)+,(a6)
move.l   3(a6),(a4)+
move.l   3(a6),(a4)+
move.l   3(a6),(a4)+
move.l   3(a6),(a4)+
move.l   3(a6),(a4)+
move.l   3(a6),(a4)+
move.l   3(a6),(a4)+
move.l   3(a6),(a4)+
rts 
The first instruction writes the color (index) values. Normally you would write to 1(a6) afterward to set the pattern, but it seems the default pattern value is $0000, so this code generates a solid (1-color) pattern.

Code:

;Code $02
move.b   (a2)+,(a6)
move.b   (a3)+,1(a6)
move.b   (a3)+,2(a6)
move.l   3(a6),(a4)+
move.l   7(a6),(a4)+
move.b   (a3)+,1(a6)
move.b   (a3)+,2(a6)
move.l   3(a6),(a4)+
move.l   7(a6),(a4)+
move.b   (a3)+,1(a6)
move.b   (a3)+,2(a6)
move.l   3(a6),(a4)+
move.l   7(a6),(a4)+
move.b   (a3)+,1(a6)
move.b   (a3)+,2(a6)
move.l   3(a6),(a4)+
move.l   7(a6),(a4)+
rts 
This code generates a pattern using two colors (i.e., loads the color register only once) then uses separate pattern bytes for each row. The hardware works on two rows at a time, so the sequence is: write two row patterns, generate two rows, write two row patterns, generate two rows.... Alternatively, you could probably do it one row at a time for the same effect.
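
For anyone porting these two routines, here is a hedged Python sketch of what they appear to do. It is not the original code: the font generator is modelled in software, with the same unconfirmed assumptions as before (low nibble = color 0, high nibble = color 1, most significant pattern bit = leftmost pixel), and the color and pattern streams are represented as plain Python iterators. All function names are invented.

```python
def font_generate(color_byte, pattern):
    """Software model of the font generator: returns 8 bytes (2 rows of 8 pixels)."""
    colors = (color_byte & 0x0F, (color_byte >> 4) & 0x0F)  # assumed nibble order
    out = bytearray()
    for pair in range(8):
        left = colors[(pattern >> (15 - 2 * pair)) & 1]
        right = colors[(pattern >> (14 - 2 * pair)) & 1]
        out.append((left << 4) | right)
    return bytes(out)

def decode_code_01(color_stream):
    """Code $01: one color byte, pattern left at $0000 -> solid 8x8 tile (32 bytes)."""
    color = next(color_stream)
    row = font_generate(color, 0x0000)[:4]   # the longword at 3(a6)
    return row * 8                           # same row written eight times

def decode_code_02(color_stream, pattern_stream):
    """Code $02: one color byte, then one pattern byte per row, two rows at a time."""
    color = next(color_stream)
    tile = bytearray()
    for _ in range(4):
        high = next(pattern_stream)          # written to 1(a6), first row of the pair
        low = next(pattern_stream)           # written to 2(a6), second row of the pair
        tile += font_generate(color, (high << 8) | low)
    return bytes(tile)
```

Both functions return one 32-byte tile in the 4-bits-per-pixel layout the routines write through a4.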

Post by Paul Jensen » Thu Jan 08, 2015 11:21 am

Thanks as always, TascoDLX. Actually, I stumbled upon the answer myself back in December 2014, but I didn't know about the default value for the font generator. Cheers for that!

I put a similar post up on the Sega-16 forums and ended up answering my own question. Forgot to update my post here as well. :oops:

I've figured out most of the routines in the code. There's a lot of repeating code, so I've distilled it down into a few generalized routines -- which I'm guessing is the case with a lot of ASM to higher-level ports. There are still a lot of little bugs in the code, though, and it's hard to pinpoint the exact cause since there are so many things going on at once. The routines read from three separate streams of data, and if the streams get out of sync, the whole process falls apart. I expect to have the bugs worked out soon, though.

Post by Paul Jensen » Mon Jan 26, 2015 5:43 pm

The C2 decoder is working near perfectly now. That means SCAT can decode basically any media file from a Digital Pictures game! I'll put a new version up on SourceForge as soon as I iron out a couple of remaining bugs.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Mon Jan 26, 2015 7:56 pm

Great job! Another proprietary format that can now be decoded. :D

Post by Paul Jensen » Wed Feb 04, 2015 4:23 am

One last thing I'm having trouble with.

There are a few routines in the code that seem to temporarily point the address of the pattern data to a location offset from a fixed address, draw a section of a tile, and then restore the original pointer. Here's an example:

Code:

LAB_07C3: ; Code $1E
	;Get pointer offset and repoint pointer
	clr.l	d1			;DB18: 4281
	move.b	(a1)+,d1		;DB1A: 1219
	lsl	#3,d1			;DB1C: E749
	addi.l	#$00015DB4,d1		;DB1E: 068100015DB4
	movem.l	a3,-(a7)		;DB24: 48E70010
	movea.l	d1,a3			;DB28: 2641
	
	;Draw a 2-color 4x4 tile section
	move.b	(a2)+,(a6)		;DB2A: 1C9A
	move	(a3)+,1(a6)		;DB2C: 3D5B0001
	
	move	3(a6),(a4)		;DB30: 38AE0003
	move	5(a6),4(a4)		;DB34: 396E00050004
	move	7(a6),8(a4)		;DB3A: 396E00070008
	move	9(a6),12(a4)		;DB40: 396E0009000C
	
	;Restore pointer
	movem.l	(a7)+,a3		;DB46: 4CDF0800
	rts				;DB4A: 4E75
It looks like the program is repointing to a location offset from address $00015DB4. I checked that area of memory using an emulator, and it's always empty. However, using all zeroes in my code leads to graphics glitches in the output.
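
For reference, the drawing half of the routine (the four word moves) can be sketched in Python like this. It is only an illustration: `generated` stands for the 8 bytes read back from the font generator registers at 3(a6) through 10(a6), `tile` is a 32-byte tile buffer, and `base` plays the role of a4. These names are invented for the example.

```python
def place_4x4(tile: bytearray, base: int, generated: bytes) -> None:
    """Scatter 16 generated pixels into a 4x4 section of a tile.

    Mirrors the four word moves: each 2-byte word (4 pixels) goes to
    base + 0, base + 4, base + 8 and base + 12, i.e. the same two bytes
    of four consecutive 4-byte tile rows.
    """
    for row in range(4):
        src = generated[2 * row : 2 * row + 2]   # words at 3(a6), 5(a6), 7(a6), 9(a6)
        tile[base + 4 * row : base + 4 * row + 2] = src
```

Which part of the tile this covers depends on where a4 points when the routine is entered, which this sketch leaves to the caller.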

I'm stumped. Can I get some help on this?

Post by TascoDLX » Wed Feb 04, 2015 8:49 am

Paul Jensen wrote:It looks like the program is repointing to a location offset from address $00015DB4. I checked that area of memory using an emulator, and it's always empty. However, using all zeroes in my code leads to graphics glitches in the output.

I'm stumped. Can I get some help on this?
It's right under your nose -- $015DB4 immediately follows the code you just quoted. SUBCODE.BIN gets loaded at $008268, so you're looking for the data (64 bytes) at offset $DB4C in the BIN file.
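
The arithmetic behind that answer can be checked with a short Python sketch. The load address and the two sub-side addresses are the ones quoted in this thread; the function names are invented, and the table lookup simply mirrors the `lsl #3` and `addi.l` in the quoted routine.

```python
LOAD_ADDRESS = 0x008268   # where SUBCODE.BIN is loaded on the sub side
TABLE_ADDRESS = 0x015DB4  # base address added in the Code $1E routine

def to_file_offset(address: int) -> int:
    """Convert a sub-side address to an offset inside SUBCODE.BIN."""
    return address - LOAD_ADDRESS

def table_entry_offset(index: int) -> int:
    """File offset of the table entry selected by an index byte.

    The routine shifts the index left by 3 (8 bytes per entry) and adds
    the table base; a 64-byte table would hold 8 such entries.
    """
    return to_file_offset(TABLE_ADDRESS) + (index << 3)

print(hex(to_file_offset(0x015DB4)))  # table used by Code $1E
print(hex(to_file_offset(0x012792)))  # C2 decode routine
```

The same subtraction ties the C2 routine's address $012792 to its file offset $A52A, so either pair of numbers is enough to recover the load address.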

Looking forward to the new release. :D

Post by Paul Jensen » Wed Feb 04, 2015 1:23 pm

TascoDLX wrote:
Paul Jensen wrote:It looks like the program is repointing to a location offset from address $00015DB4. I checked that area of memory using an emulator, and it's always empty. However, using all zeroes in my code leads to graphics glitches in the output.

I'm stumped. Can I get some help on this?
It's right under your nose -- $015DB4 immediately follows the code you just quoted. SUBCODE.BIN gets loaded at $008268, so you're looking for the data (64 bytes) at offset $DB4C in the BIN file.

Looking forward to the new release. :D
Holy crap! Thanks!

Question: How did you know that SUBCODE.BIN gets loaded at $008268?

Post by Paul Jensen » Wed Feb 04, 2015 3:37 pm

Paul Jensen wrote:
TascoDLX wrote:
Paul Jensen wrote:It looks like the program is repointing to a location offset from address $00015DB4. I checked that area of memory using an emulator, and it's always empty. However, using all zeroes in my code leads to graphics glitches in the output.

I'm stumped. Can I get some help on this?
It's right under your nose -- $015DB4 immediately follows the code you just quoted. SUBCODE.BIN gets loaded at $008268, so you're looking for the data (64 bytes) at offset $DB4C in the BIN file.

Looking forward to the new release. :D
Holy crap! Thanks! EDIT: Looks like everything's working now. So far I've got pixel-perfect output when compared to emulator screenshots. Just gotta optimize the code a bit and complete the support for $C4 frames (inter-frame version of $C2) and the decoder should basically be complete.

Question: How did you know that SUBCODE.BIN gets loaded at $008268?
