Super VDP

Ask anything you want about 32X Mushroom programming.

Moderator: BigEvilCorporation

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Wed Feb 23, 2011 8:27 am

Chilly Willy wrote:A couple minor booboos in the rendering (couple cell lines here and there)
There are glitches that seem to come from the DMA. Looks like I have to add a few NOPs between DMA init and DMA trigger. I don't know exactly why. I'm trying to fix it.

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Wed Feb 23, 2011 6:28 pm

I feel like I'm trying to mend the Titanic while it's sinking.
As soon as I fix a problem, another one comes in!

"One nightmare ends, another fertile"

Aaaaaaaaaaanyway, I've got a demo : Vega backgrounds (thanks to PR-Kun for the artwork), from Street Fighter Alpha 3 Arcade.
185 colours.

http://www.valpocl.com/SuperVDP/

Bigger to come.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Wed Feb 23, 2011 10:50 pm

Really nice. :D

I like all the levels of parallax you have in it.

As to the DMA, do you mean the DMA inside the SH2? It's hard to see how it could work sometimes and not others in an emulator... that's the kind of timing issue you expect from real hardware and that NEVER shows up in emulation tests.

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Fri Feb 25, 2011 11:37 pm


Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Sat Feb 26, 2011 1:40 am

That second video is pretty awesome. :twisted:

sega16
Very interested
Posts: 251
Joined: Sat Jan 29, 2011 3:16 pm
Location: U.S.A.

Post by sega16 » Sat Feb 26, 2011 3:50 pm

Wow, that is amazing. I think this could have great potential to be made into a game. Did anyone test this on real hw?

TotOOntHeMooN
Interested
Posts: 38
Joined: Sun Jun 01, 2008 1:12 pm
Location: Lyon, France
Contact:

Post by TotOOntHeMooN » Tue Mar 08, 2011 7:16 pm

:shock:

You finally reached your goal. Congratulations!

Edge-`
Interested
Posts: 39
Joined: Sun Dec 10, 2006 3:26 am

Awesome

Post by Edge-` » Wed Mar 09, 2011 3:20 am

Fantastic write up ob1 and great work!
Genny Wars (Someday.. :D)

ammianus
Very interested
Posts: 124
Joined: Sun Jan 29, 2012 2:10 pm
Location: North America
Contact:

Post by ammianus » Sun Oct 28, 2012 11:35 pm

Every few months I return to these forums and try to reread this thread, hoping I'll understand better after having learned more on my own about programming a similar kind of demo for a while now. I still can't make heads or tails about what is going on here. :(

Perhaps I missed some other discussion in another thread or offline, and it also seems that some links in older posts no longer work.

ob1, could you please perhaps explain, for us civilians, what your final demo is actually doing? Since throughout the thread you tried many different approaches it seems, I am curious what was the end result.

Can you post a higher resolution version of this diagram: http://www.valpocl.com/SuperVDP/diagram.jpg

In the end are you using both processors for drawing sprites/tiles? If so, how do you divide work between them?

Are you using DMA after all? If so, how?

What size tile are you using in final version of the demo?

I see you have done impressive things, squeezing every last cycle from the 32X for best possible performance. When you have scrolling planes though, are you redrawing the FB each time? In my simplistic game demo, I "scroll" the background, but really I am just telling it what part of the game map to draw each frame. Is that what you mean by scrolling? Or is it something more fundamental than that? Any way you can diagram what it is you do? :D

Thank you in advance.
Last edited by ammianus on Mon Oct 29, 2012 3:18 pm, edited 1 time in total.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Mon Oct 29, 2012 2:13 am

Yeah, I have a few questions on the 68K->32X DMA myself... I have been making a little CD32X demo for folks and everything works peachy... except the 68K to 32X DMA. That works on the emulator (Fusion), but not real hardware.

http://www.mediafire.com/?37c4xn0q527uxeb

That uses my SCD bootloader:
http://www.mediafire.com/?2ngcxlsomjeejac

Anywho, the transfer seems to complete on real hardware, but the data is wrong. Makes me suspect a cache issue, but I flush the cache for the data buffer, so it must be something else. Here's the MD side:

Code: Select all

dma_to_32x:
        tst.b   d7
        bne.b   0f
        moveq   #-1,d0                  /* 32X not initialized */
        rts
0:
        move.w  #0x0000,0x0106(a1)      /* clear 68S bit - stops DMA */

        lea     0xA15000,a1
        movea.l 4(sp),a0                /* source address */
        move.l  8(sp),d0                /* length in words */
        addq.l  #3,d0
        andi.w  #0xFFFC,d0
        move.w  d0,0x0110(a1)           /* SH DREQ Length Reg */

        move.b  #0x0004,0x0107(a1)      /* set 68S bit - starts DMA */
        nop
        nop
        nop
        nop
        lea     0x0112(a1),a1
1:
        move.w  (a0)+,(a1)              /* FIFO = next word */
        move.w  (a0)+,(a1)
        move.w  (a0)+,(a1)
        move.w  (a0)+,(a1)
        dbra    d0,1b

        moveq   #0,d0
        rts
I tried waiting on the FIFO full flag at one point, but all that does is hang the MD side. The docs seem to be wrong about how it works. I don't set the source or dest address registers because I don't need them with my code, and the docs say they are dummies just used for passing info to the 32X if needed.

Here's the 32X side:

Code: Select all

    // init DMA channel 0
    SH2_DMA_SAR0 = (int)&MARS_SYS_DMAFIFO;
    SH2_DMA_DAR0 = 0;
    SH2_DMA_TCR0 = 0;
    SH2_DMA_CHCR0 = 0;
    SH2_DMA_DRCR0 = 0;
    SH2_DMA_SAR1 = 0;
    SH2_DMA_DAR1 = 0;
    SH2_DMA_TCR1 = 0;
    SH2_DMA_CHCR1 = 0;
    SH2_DMA_DRCR1 = 0;
    SH2_DMA_DMAOR = 1; // enable DMA

    while (1)
    {

        while (!MARS_SYS_COMM0) ;

        // transfer MOD
#if 1
        while (!(MARS_SYS_DMACTR & MARS_SYS_DMA_68S)) ; // wait for DMA to start
        j = MARS_SYS_COMM2;
        SH2_DMA_DAR0 = 0x20000000 | (int)&data;
        SH2_DMA_TCR0 = (j + 3) & 0x1FFFC;
        SH2_DMA_CHCR0 = 0x44E1;
        // flush data[]
        for (i=0; i<65536; i+=8)
            CacheClearLine(&data[i]);
        while (!(SH2_DMA_CHCR0 & 2)) ; // wait on TE
#else
        j = MARS_SYS_COMM2;
        for (i = 0; i < j; i++)
            data[i] = frame[i + 0x100];
#endif
        MARS_SYS_COMM4 = 0x0001; // tell Slave SH2 to flush data[]
        while (MARS_SYS_COMM4) ; // wait until flushed

    {
        char temp[44];
        Hw32xSetFGColor(128,31,31,31);
        Hw32xSetBGColor(1,0,0,0);
        sprintf(temp, "%d %s", j, (char *)&data);
        Hw32xScreenSetXY(20, 26);
        Hw32xScreenPuts(temp);
        Hw32xDelay(300);
    }
I have tried moving the flush around... it's where it is right now because I was trying to use it as a delay before checking if the DMA was done as a test. It's the same result no matter where the flush goes... the print at the end shows wrong data transferred (it should be the name of the MOD). The DMA either transfers bad data, or errors out (haven't printed the DMA code yet) as it DOES get past the wait on TE.

By the way, the SVDP rom on the website doesn't work on real hardware - I just get a blank display.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Mon Oct 29, 2012 5:05 am

A couple of changes, and a bug fix (using a1 before it was set)...

Code: Select all

        lea     0xA15000,a1
        move.b  #0x00,0x0107(a1)        /* clear 68S bit - stops DMA */

        movea.l 4(sp),a0                /* source address */
        move.l  8(sp),d0                /* length in words */
        addq.l  #3,d0
        andi.w  #0xFFFC,d0              /* FIFO operates on units of four words */
        move.w  d0,0x0110(a1)           /* SH DREQ Length Reg */
        move.w  d0,0x0122(a1)           /* COMM2 = # words to dma */
        lsr.l   #2,d0
        subq.l  #1,d0                   /* for dbra */

        move.b  #0x04,0x0107(a1)        /* set 68S bit - starts DMA */
        lea     0x0112(a1),a1
1:
        cmpi.w  #0x55AA,0xA15120        /* wait for SH2 to start DMA */
        bne.b   1b
2:
        move.w  (a0)+,(a1)              /* FIFO = next word */
        move.w  (a0)+,(a1)
        move.w  (a0)+,(a1)
        move.w  (a0)+,(a1)
3:
        btst    #7,0xA15107             /* check FIFO full flag */
        bne.b   3b
        dbra    d0,2b

Code: Select all

        while (!(MARS_SYS_DMACTR & MARS_SYS_DMA_68S)) ; // wait for SH DREQ to start
        j = MARS_SYS_COMM2; // # words to DMA
        if (j == 0)
            j = 0x10000; // 0 => 64K words
        SH2_DMA_DAR0 = 0x20000000 | (int)&data;
        SH2_DMA_TCR0 = j;
        SH2_DMA_CHCR0 = 0x44E1;
        MARS_SYS_COMM0 = 0x55AA; // SH2 DMA started
        while (!(SH2_DMA_CHCR0 & 2)) ; // wait on TE
        SH2_DMA_CHCR0 = 0x44E0; // clear DMA TE
Still works in Fusion, but fails on real hardware differently... the MD side never finishes the loop storing to the FIFO. That would seem to indicate a discrepancy between the two sides on the number of words transferred.
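The length arithmetic shared by the two sides can be checked on a host machine. Here is a minimal sketch in C, assuming (as the code comments above state) that the FIFO works in units of four words and that a 16-bit count of 0 stands for 64K words. The function names are made up for illustration and do not come from any 32X library.

```c
#include <assert.h>
#include <stdint.h>

/* Round a word count up to a multiple of four, mirroring
 * addq.l #3,d0 / andi.w #0xFFFC,d0. Hypothetical helper,
 * valid for 1..65536 words. */
uint32_t dreq_round_words(uint32_t words)
{
    assert(words >= 1 && words <= 65536);
    return (words + 3u) & ~3u;
}

/* Value the 68K stores in the 16-bit length register and in COMM2. */
uint16_t dreq_reg_value(uint32_t words)
{
    return (uint16_t)dreq_round_words(words);   /* 65536 wraps to 0 */
}

/* What the SH2 side recovers from COMM2: 0 is read as 64K words. */
uint32_t sh2_transfer_count(uint16_t comm2)
{
    return comm2 == 0 ? 0x10000u : comm2;
}

/* Number of passes through the four-word store loop
 * (the dbra counter plus one). */
uint32_t fifo_loop_passes(uint32_t words)
{
    return dreq_round_words(words) >> 2;
}
```

Under these assumptions, `fifo_loop_passes(n) * 4` matches `sh2_transfer_count(dreq_reg_value(n))`, so a mismatch seen on hardware would probably have to come from somewhere other than this arithmetic.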

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Wed Oct 31, 2012 1:21 am

Okay, after much more testing, I found the following...

When doing 68K to 32X DMA via the FIFO in the 32X, it has a tendency to lose data randomly. Specifically, the DREQ fails to trigger the DMA under unknown circumstances, which means the SH2 DMA stops. If you are checking the FIFO full flags, it will "stick" at full because the DMA is no longer emptying the FIFO.

I would guess that's why ob1 doesn't check the FIFO flag in SVDP. It's probably also the source of the occasional drop-outs in the video you see.

If you ignore the FIFO full flag, the next store to the FIFO "bumps" the hardware and DMA resumes... but you lose one word. On a DMA transfer of 65536 words, it typically lost between 20 and 120 words (usually closer to 20 than 120).

I'll try to find why it loses the DREQ, but it might be that this is something that will have to be worked around... a hardware bug not fixed before the production model chips were made.

A workaround would be to send packets of data (say 256 words), don't check the FIFO full flag, and send extra words at the end to make sure the SH2 DMA finishes. Then put a checksum for the packet into the COMM registers. If the checksum is good, go on to the next packet, and if it's bad, retry the packet.
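The packet-and-retry idea can be tried out on a host before touching hardware. The following is only a sketch under stated assumptions: `lossy_fifo` is a made-up stand-in for the 68K to SH2 FIFO path that drops every Nth word, the checksum is a plain 16-bit sum chosen for brevity (a stronger, position-sensitive checksum would likely be wiser in practice), and none of the names are real 32X registers or library calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PACKET_WORDS 256   /* packet size suggested in the post */
#define PAD_WORDS      4   /* extra words so a short transfer still completes */
#define MAX_RETRIES   16   /* arbitrary cap for this sketch */

/* Hypothetical model of the FIFO path: silently drops every Nth word. */
typedef struct {
    unsigned long writes;       /* total words written so far */
    unsigned long drop_every;   /* 0 = never drop */
    uint16_t rx[PACKET_WORDS];  /* what the receiving DMA ended up with */
    size_t rx_count;
} lossy_fifo;

static void fifo_write(lossy_fifo *f, uint16_t w)
{
    f->writes++;
    if (f->drop_every && f->writes % f->drop_every == 0)
        return;                       /* word lost in transit */
    if (f->rx_count < PACKET_WORDS)
        f->rx[f->rx_count++] = w;     /* surplus padding is ignored */
}

static uint16_t checksum16(const uint16_t *p, size_t n)
{
    uint16_t sum = 0;
    while (n--)
        sum = (uint16_t)(sum + *p++);
    return sum;
}

/* Send one packet; returns the number of attempts used, or 0 on failure. */
int send_packet(lossy_fifo *f, const uint16_t *src, uint16_t *dst)
{
    /* On hardware this value would be passed through a COMM register. */
    uint16_t want = checksum16(src, PACKET_WORDS);

    for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
        f->rx_count = 0;
        for (size_t i = 0; i < PACKET_WORDS; i++)
            fifo_write(f, src[i]);
        for (size_t i = 0; i < PAD_WORDS; i++)
            fifo_write(f, 0);         /* padding lets the receiver finish */

        if (f->rx_count == PACKET_WORDS &&
            checksum16(f->rx, PACKET_WORDS) == want) {
            for (size_t i = 0; i < PACKET_WORDS; i++)
                dst[i] = f->rx[i];
            return attempt;
        }
    }
    return 0;
}

/* Send a whole buffer; words must be a multiple of PACKET_WORDS. */
int send_buffer(lossy_fifo *f, const uint16_t *src, uint16_t *dst, size_t words)
{
    int attempts = 0;
    assert(words % PACKET_WORDS == 0);
    for (size_t off = 0; off < words; off += PACKET_WORDS) {
        int n = send_packet(f, src + off, dst + off);
        if (n == 0)
            return 0;
        attempts += n;
    }
    return attempts;
}
```

With a loss rate like the 20 to 120 words per 65536 reported above, most 256-word packets should go through on the first attempt, so the retry cost ought to stay small.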

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Mon Nov 12, 2012 7:50 am

Hello you all !
Long time no see.

I'd like to answer ammianus.
ammianus wrote:Every few months I return to these forums and try to reread this thread, hoping I'll understand better after having learned more on my own about programming a similar kind of demo for a while now. I still can't make heads or tails about what is going on here. :(
When I started SuperVDP, I didn't know how to do it. The end result is far far away from what I planned in the very beginning.
ammianus wrote:ob1, could you please perhaps explain, for us civilians, what your final demo is actually doing?
The demo takes 2 tile planes and scrolls them.
The Vega (?) stage features line scrolling, whereas the "chinese" stage features sprites (the boat and the kanji).
ammianus wrote:Can you post a higher resolution version of this diagram: http://www.valpocl.com/SuperVDP/diagram.jpg
I don't have it right now. It's a little bit simpler than the diagram though.
ammianus wrote:In the end are you using both processors for drawing sprites/tiles?
Absolutely.
The master draws sprites that lie in the upper half of the screen (lines 0 to 111), and the slave draws sprites that lie in the lower half (lines 112 to 223).

ammianus wrote:If so, how do you divide work between them?
As far as I remember, each CPU does the following:
for each scanline (0 to 111 or 112 to 223),
I browse the plane table (which is similar to the Genesis plane table);
if a line of a tile falls on the current scanline,
then I draw that line of the tile.
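The loop described above, and the split between the two CPUs, can be sketched in C as follows. This is an illustration and not the SuperVDP source. It assumes a 320x224 frame buffer with one byte per pixel, 8x8 tiles stored flat at 64 bytes each, and a plane table with one entry per tile position. It also uses a plain copy loop where the demo uses DMA. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_W 320
#define SCREEN_H 224
#define TILE     8
#define PLANE_W  (SCREEN_W / TILE)   /* 40 tiles per plane row */
#define PLANE_H  (SCREEN_H / TILE)   /* 28 plane rows */

/* Draw scanlines first..last (inclusive) of one tile plane. */
void draw_scanlines(uint8_t *fb, const uint16_t *plane,
                    const uint8_t *tiles, int first, int last)
{
    assert(0 <= first && first <= last && last < SCREEN_H);

    for (int y = first; y <= last; y++) {
        int row  = y / TILE;   /* which row of the plane table */
        int line = y % TILE;   /* which line inside each tile on that row */

        for (int col = 0; col < PLANE_W; col++) {
            int tile = plane[row * PLANE_W + col];
            const uint8_t *src = &tiles[tile * TILE * TILE + line * TILE];
            uint8_t *dst = &fb[y * SCREEN_W + col * TILE];

            for (int x = 0; x < TILE; x++)   /* DMA in the real demo */
                dst[x] = src[x];
        }
    }
}

/* Upper half of the screen, as handled by the master SH2. */
void master_pass(uint8_t *fb, const uint16_t *plane, const uint8_t *tiles)
{
    draw_scanlines(fb, plane, tiles, 0, 111);
}

/* Lower half of the screen, as handled by the slave SH2. */
void slave_pass(uint8_t *fb, const uint16_t *plane, const uint8_t *tiles)
{
    draw_scanlines(fb, plane, tiles, 112, 223);
}
```

Because the two passes write disjoint scanlines, they need no locking between them in this model. Only the start and end of the frame would have to be synchronised.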
ammianus wrote:Are you using DMA after all? If so, how?
Definitely !
DMA, byte by byte, from the tiles data to the Frame Buffer.
If I remember correctly,
the Master DMA0 is used to transfer tile data from the Genesis side to the 32x SDRAM
the Master DMA1 is for the SuperVDP
the Slave DMA0 is for the SuperVDP
the Slave DMA1 is for the 32x palette.
ammianus wrote:What size tile are you using in final version of the demo?
8x8 I guess, but it could go up to 32x32.
Besides, sprites are handled differently. You'll only find raw data (colors), and not indirect access to tiles.

ammianus wrote:I see you have done impressive things, squeezing every last cycle from the 32X for best possible performance.
I have been thinking about it again quite recently, and there's a way to be faster. That's the NeoGeo 32x thing I won't have time to do.
Feel free to develop it ;)

ammianus wrote:When you have scrolling planes though, are you redrawing the FB each time?
Yes.
ammianus wrote:In my simplistic game demo, I "scroll" the background, but really I am just telling it what part of the game map to draw each frame. Is that what you mean by scrolling? Or is it something more fundamental than that? Any way you can diagram what it is you do? :D
In a Genesis only game, for each frame, the 68k tells the VDP what is to be drawn : scroll data, plane table, sprite data, tiles, palette ...
It's exactly the same for the SuperVDP. For each frame, the 68k tells the SuperVDP what to draw : scroll data, plane table, tiles, sometimes sprites, palette. My 2 demos focus only on a stage (~640 x 224), but if you want something bigger (a game demo), the 68k would have to tell the 32x what to draw, determine what part of the game map to draw, and so on...
My demos are more a proof of concept, 32X-side.
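To make the scrolling part concrete, here is a rough sketch in C of drawing one scanline with a horizontal scroll value. It is written per pixel for clarity and not for speed, and it is not taken from the demo. The 640-pixel stage width comes from the figure above, and the flat tile layout, the wrap-around at the stage edge, and the function name are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_W   320
#define TILE       8
#define STAGE_W    640                 /* approximate stage width */
#define STAGE_COLS (STAGE_W / TILE)    /* 80 tile columns */

/* Fill one scanline from a stage-wide plane table, shifted by scroll_x
 * pixels. The plane wraps around at the stage edge (an assumption). */
void draw_scrolled_line(uint8_t *dst, const uint16_t *plane,
                        const uint8_t *tiles, int y, int scroll_x)
{
    int row  = y / TILE;   /* row of the plane table */
    int line = y % TILE;   /* line inside each tile */

    assert(y >= 0);

    for (int x = 0; x < SCREEN_W; x++) {
        int sx = (x + scroll_x) % STAGE_W;   /* position within the stage */
        if (sx < 0)
            sx += STAGE_W;                   /* handle negative scroll */

        int tile = plane[row * STAGE_COLS + sx / TILE];
        dst[x] = tiles[tile * TILE * TILE + line * TILE + sx % TILE];
    }
}
```

Calling this with the same `scroll_x` for every `y` gives a plain plane scroll. Passing a different value per scanline gives the kind of line scrolling used on the Vega stage.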

Hope that helps ;)

ammianus
Very interested
Posts: 124
Joined: Sun Jan 29, 2012 2:10 pm
Location: North America
Contact:

Post by ammianus » Tue Nov 13, 2012 2:40 am

Thanks ob1!, that is just the kind of summary context I need to process this thread.

I haven't yet done any coding using DMA or second cpu. I am feeling very positive now. 8)

ehaliewicz
Very interested
Posts: 50
Joined: Tue Dec 24, 2013 1:00 am

Post by ehaliewicz » Tue Oct 28, 2014 3:50 pm

So, did anyone figure out how to get this working on HW?

I'm thinking about doing some tile-heavy 32X work and it would be amazing if I could get graphics anywhere near this :)
