New Documentation: An authoritative reference on the YM2612

Sauraen · Post by **Sauraen** » Thu Nov 22, 2018 5:57 pm

jotego wrote: Thu Nov 22, 2018 12:35 pm My implementation (as of v0.61 of JT12) assumed that a 24-stage CSR (circular shift register) had only one update point, so it would take 24 FM ticks to get new data in. That also made sense in view of the length of the BUSY counter, which Sauraen had reported to count for 32 FM ticks. However, the evidence from Kabuto and from the YM3438 document tells us that the CSR can be updated in just 12 FM ticks. Thinking of different ways to accomplish this on silicon, I think that an update point just in the middle of it is economical and makes sense if they were reusing layout work from the YM2203, as I explained before. Now, if someone looks at the die shots (YM3438 or YM2612) and finds that in the CSR the output of each flip-flop goes straight to the next FF's input without any mux anywhere, then we will have to consider other implementation options.

Surprise surprise, I was wrong. I thought I remembered them being 24 units/layers deep, and they are, but they're broken up into two groups of 12, with logic in between and at the beginning and the end. I don't have time right now to fully transcribe it, but given the other findings here, it seems reasonable that this allows a write to be made to the end or the middle. This would take more control logic, since at any given time there are two different operators' values that could be updated, but it's not that bad because the two values that could be updated would be subsequent operators in the same voice (e.g. voice 5 operator 3 is in the middle and operator 2 is at the end).

The voice-level circular shift registers are 5-6 units/layers deep, as expected. If I remember correctly, for the YM2612 the register holding the DAC (PCM) value should update immediately (as soon as the data is in the internal data register); it just only gets used once per complete FM cycle (unless you have the DAC Loud bit turned on).

Stef · Post by **Stef** » Thu Nov 22, 2018 9:32 pm

I can confirm the DAC can be updated at more than 26 kHz; I don't know why people keep repeating that... Maybe it depends on the system version, but at least I was able to do 32 kHz on both the MD1 and the MD2, and I'm confident we can get as high as 53 kHz. I think the 26 kHz mistake comes from a driver using the busy bit to test the PCM write register speed; today we know we can't rely on this bit to correctly evaluate the "busy" time.

TmEE co.(TM) · Post by **TmEE co.(TM)** » Fri Nov 23, 2018 4:06 am

It is possible to update the DAC once every sample if you have cycle synchronization with the YM. I did some tests where the YM and Z80 were made to share the clock in the MD, and I could get unique samples out every 144 cycles. Rates between the full and half sample rate are very ugly, though, due to the "resampling" effect, which produces a pretty nasty sound (which is why I originally thought samples got missed). Unfortunately, the YM and Z80 don't share a clock in the MD, and it is not possible to maintain any cycle-level synchronization with the YM, especially when any ROM access is being done, which adds an uncertain amount of latency to the operation.

jotego · Post by **jotego** » Fri Nov 23, 2018 10:49 am

Sauraen wrote: Thu Nov 22, 2018 5:57 pm Surprise surprise, I was wrong. I thought I remembered them being 24 units/layers deep, and they are, but they're broken up into two groups of 12, with logic in between and at the beginning and the end. I don't have time right now to fully transcribe it, but given the other findings here, it seems reasonable that this allows a write to be made to the end or the middle.

Awesome news. So we have this piece of information confirmed. It is funny how they went for breaking up the CSR. As I said before, I think they were reusing layout from the YM2203. Because the break is made at stage 12, it just happens that the data there differs from the data at the head only in the operator MSB. Let me explain. Data flows this way:

    Operator (binary)  |     00      |     01      |     10      |     11      |
    Channel (decimal)  | 0 1 2 3 4 5 | 0 1 2 3 4 5 | 0 1 2 3 4 5 | 0 1 2 3 4 5 |

So between one location in the CSR and another 12 stages away, the only difference in the operator-channel identification is the MSB of the operator: where one location holds 00b, the other will hold 10b (and likewise 01b and 11b). So the logic that decides whether to update can be partly shared. I'm not sure whether they actually shared it.

Anyway, another mystery solved!

TmEE co.(TM) wrote: Fri Nov 23, 2018 4:06 am It is possible to update the DAC once every sample if you have cycle synchronization with the YM. I did some tests where the YM and Z80 were made to share the clock in the MD, and I could get unique samples out every 144 cycles. Rates between the full and half sample rate are very ugly, though, due to the "resampling" effect, which produces a pretty nasty sound (which is why I originally thought samples got missed). Unfortunately, the YM and Z80 don't share a clock in the MD, and it is not possible to maintain any cycle-level synchronization with the YM, especially when any ROM access is being done, which adds an uncertain amount of latency to the operation.

I agree. An update period of exactly 144 cycles (that is to say, 24 FM ticks) will work. By the way, any PCM rate whose period is not an integer multiple of 24 FM ticks will eventually produce sound artifacts. I don't know if software developers were aware of this. Probably not.
For instance, if you update every 36 FM ticks, half your samples will take two output slots (48 ticks) and the other half only one (24 ticks). On average you do get (48+24)/2 = 36 ticks per sample, but the samples will not be evenly output to the DAC, so there will be very noticeable distortion.

Sauraen · Post by **Sauraen** » Fri Nov 23, 2018 5:40 pm

jotego wrote: Fri Nov 23, 2018 10:49 am Anyway, another mystery solved!

Yeah! I was wondering why it wasn't just uniformly 24 layers...

jotego wrote: Fri Nov 23, 2018 10:49 am By the way, any PCM rate whose period is not an integer multiple of 24 FM ticks will eventually produce sound artifacts. I don't know if software developers were aware of this. Probably not.

Anyone with a rudimentary knowledge of DSP should realize this. But if you sample at a much lower rate, like the 4 kHz used in common games (or is it 8 kHz?), the effect is less noticeable, since many consecutive output samples are the same. Plus, there is additional aliasing between the actual rate of writing to the DAC register and the mandatory 44.1 kHz sampling rate of the VGM file; this was obviously not an issue when making games, but it is an issue now.

Also, if my hypothesis is right that the DAC Loud test bit just makes the DAC value get output all the time instead of 1/6 of the time, you should be able to push audio out at an even higher sampling rate, since the 24-FM-tick cycle no longer has any bearing on the output.

nukeykt · Post by **nukeykt** » Sat Nov 24, 2018 7:10 am

My cycle-accurate YM3438 emulator (which is basically a translation of the YM3438 schematic into C code) can probably give some insight into YM2612/YM3438 internals.

jotego · Post by **jotego** » Sat Nov 24, 2018 8:15 am

nukeykt wrote: Sat Nov 24, 2018 7:10 am My cycle-accurate YM3438 emulator (which is basically a translation of the YM3438 schematic into C code) can probably give some insight into YM2612/YM3438 internals.

That's an impressive piece of work. I will look at it carefully; in particular, I want to learn more about the undocumented bits. I do not see any schematic on GitHub, though, only a very nice SVG file, and that carries very little information about the chip. Do you have anything else?

Apologies for saying it, but it is not cycle accurate! For instance, you are not using a circular shift register for the pipeline data, so you are not sensitive to inconsistencies in writes. Indeed, if you do not have a clock input, I don't see how you can be cycle accurate.

nukeykt · Post by **nukeykt** » Sat Nov 24, 2018 8:43 am

jotego wrote: Sat Nov 24, 2018 8:15 am That's an impressive piece of work. I will look at it carefully; in particular, I want to learn more about the undocumented bits. I do not see any schematic on GitHub, though, only a very nice SVG file, and that carries very little information about the chip. Do you have anything else? Apologies for saying it, but it is not cycle accurate! For instance, you are not using a circular shift register for the pipeline data, so you are not sensitive to inconsistencies in writes. Indeed, if you do not have a clock input, I don't see how you can be cycle accurate.

Thanks!

I don't have any other documentation, unfortunately. The YM3438 die shot is pretty clean, so just labeling certain bits was enough to understand its logic. Vectorizing the entire chip would be overkill, I think.
About cycle accuracy:
The emulation core runs at the YM2612's internal FM rate (i.e. 24 times the YM2612's sampling rate). Calling OPN_Clock will advance the chip state by one FM cycle and return one stereo sample. Shift registers were replaced by arrays for performance reasons; this change doesn't affect accuracy.

Sik · Post by **Sik** » Sat Nov 24, 2018 2:38 pm

On the topic of DAC output: how do the DAC bits map to the actual channel output? The DAC seems too quiet compared to what the FM channels can do, and I always have to make samples louder to compensate (and that's even though I already set FM channels to TL = 8 or quieter).

nukeykt · Post by **nukeykt** » Sat Nov 24, 2018 3:06 pm

Sik wrote: Sat Nov 24, 2018 2:38 pm On the topic of DAC output: how do the DAC bits map to the actual channel output? The DAC seems too quiet compared to what the FM channels can do, and I always have to make samples louder to compensate (and that's even though I already set FM channels to TL = 8 or quieter).

The 9 bits of PCM data are fed directly to the YM2612's 9-bit DAC, just like the FM channels.

Sik · Post by **Sik** » Sat Nov 24, 2018 4:33 pm

But register $2A only has 8 bits...

nukeykt · Post by **nukeykt** » Sat Nov 24, 2018 4:47 pm

Sik wrote: Sat Nov 24, 2018 4:33 pm But register $2A only has 8 bits...

LSB bit of PCM is mapped to $2C:3.

jotego · Post by **jotego** » Sat Nov 24, 2018 6:02 pm

nukeykt wrote: Sat Nov 24, 2018 4:47 pm
Sik wrote: Sat Nov 24, 2018 4:33 pm But register $2A only has 8 bits...
LSB bit of PCM is mapped to $2C:3.

Exactly, and the MSB is inverted in order to get the correct sign for addition with the other channels:

Code:

pcm_data = { ~pcm[8], pcm[7:0] } ;

(Verilog code)

jotego · Post by **jotego** » Sat Nov 24, 2018 6:27 pm

nukeykt wrote: Sat Nov 24, 2018 8:43 am I don't have any other documentation, unfortunately. The YM3438 die shot is pretty clean, so just labeling certain bits was enough to understand its logic. Vectorizing the entire chip would be overkill, I think.
About cycle accuracy:
The emulation core runs at the YM2612's internal FM rate (i.e. 24 times the YM2612's sampling rate). Calling OPN_Clock will advance the chip state by one FM cycle and return one stereo sample. Shift registers were replaced by arrays for performance reasons; this change doesn't affect accuracy.

I agree that it doesn't affect accuracy for well-behaved software. What you called one FM cycle is what I have been calling 24 FM clock ticks, and from the discussion above we have determined that register updates happen within a 12-FM-clock-tick frame. Thus, when software dynamically changes register values while sound is still playing, there will be a 12-FM-tick delay between your solution and real hardware. Nothing major, unless someone writes software that actually exploits this difference, for instance by updating PCM at a 36-FM-tick pace, as I explained before. Indeed, the point within the 24-FM-tick cycle at which a value is updated will make a difference if software is not well behaved (well behaved meaning: updating register values only while keyed off, updating PCM at integer multiples of 24 FM ticks, and reading the BUSY bit).

By the way, there is still one open question about how register selection works: when is the part selection bit (i.e. the A1 pin) latched? For instance, if the user writes the value $30 to address 2'b00 and then the value $FF to address 2'b11, where does $FF go? What happens when the A1 pin value differs between the address write and the data write? Have you got any insight into this?

Code:

            case 0xa0:
                chip->fnum[channel] = (chip->data & 0xff) | ((chip->reg_a4 & 0x07) << 8);
                chip->block[channel] = (chip->reg_a4 >> 3) & 0x07;
                chip->kcode[channel] = (chip->block[channel] << 2) | fn_note[chip->fnum[channel] >> 7];
                break;
            case 0xa4:
                chip->reg_a4 = chip->data & 0xff;
                break;
            case 0xa8:
                chip->fnum_3ch[channel] = (chip->data & 0xff) | ((chip->reg_ac & 0x07) << 8);
                chip->block_3ch[channel] = (chip->reg_ac >> 3) & 0x07;
                chip->kcode_3ch[channel] = (chip->block_3ch[channel] << 2) | fn_note[chip->fnum_3ch[channel] >> 7];
                break;
            case 0xac:
                chip->reg_ac = chip->data & 0xff;
                break;

I see that you have a separate latch for the FNUM MSBs of the channel 3 special-mode values and for the rest. That is very interesting. Have you confirmed that by inspecting the die shot?

It looks like you have all the test bits implemented too! That's awesome. I think I might be missing some. I will probably add them all for completeness.

Another question from your code:

Code:

    if (chip->eg_kon_csm[slot])
    {
        nextlevel |= chip->eg_tl[1] << 3;
    }

I thought CSM would just trigger a key-on for one cycle and a key-off after that, but you seem to be altering the level in a strange way. Why TL[1]? Is that a TL value inside the envelope pipeline? I am not sure whether your implementation actually tries to replicate the pipeline, but to do that you would need to operate at the FM clock rate, not once every 24 FM clocks (i.e. at the sample rate) as you said. And what does that line do? An OR there can only make the sound quieter, so why make it quieter on a key-on? How confident are you that the chip actually does this?

By the way, I love your simple API and neat implementation. I have seen many implementations that are very software-ish and complicate everything a lot.

nukeykt · Post by **nukeykt** » Sat Nov 24, 2018 7:36 pm

jotego wrote: Sat Nov 24, 2018 6:27 pm I agree that it doesn't affect accuracy for well-behaved software. What you called one FM cycle is what I have been calling 24 FM clock ticks, and from the discussion above we have determined that register updates happen within a 12-FM-clock-tick frame. Thus, when software dynamically changes register values while sound is still playing, there will be a 12-FM-tick delay between your solution and real hardware. Nothing major, unless someone writes software that actually exploits this difference, for instance by updating PCM at a 36-FM-tick pace, as I explained before. Indeed, the point within the 24-FM-tick cycle at which a value is updated will make a difference if software is not well behaved (well behaved meaning: updating register values only while keyed off, updating PCM at integer multiples of 24 FM ticks, and reading the BUSY bit).

One FM clock in my code is 6 master clocks. For example, in the Genesis Plus GX emulator OPN2_Clock is called about 1.28 million times per second (7.67 MHz / 6); for comparison, the MAME code works at 53.267 kHz. Like real hardware, my implementation updates registers within 12 FM ticks. Look carefully at the OPN2_DoRegWrite function.

jotego wrote: Sat Nov 24, 2018 6:27 pm By the way, there is still one open question about how register selection works: when is the part selection bit (i.e. the A1 pin) latched? For instance, if the user writes the value $30 to address 2'b00 and then the value $FF to address 2'b11, where does $FF go? What happens when the A1 pin value differs between the address write and the data write? Have you got any insight into this?

It's a bit weird, I think. I'm not 100% sure, but looking at my code, it behaves like this: for global registers (0x21-0x2C) the A1 of the data write matters, but for operator and channel registers the A1 of the address write matters instead. It would be cool if someone could perform hardware tests.

EDIT:
It looks like for global registers both the address and data writes must have A1 = 0 in order to work properly.

jotego wrote: Sat Nov 24, 2018 6:27 pm

Code:

            case 0xa0:
                chip->fnum[channel] = (chip->data & 0xff) | ((chip->reg_a4 & 0x07) << 8);
                chip->block[channel] = (chip->reg_a4 >> 3) & 0x07;
                chip->kcode[channel] = (chip->block[channel] << 2) | fn_note[chip->fnum[channel] >> 7];
                break;
            case 0xa4:
                chip->reg_a4 = chip->data & 0xff;
                break;
            case 0xa8:
                chip->fnum_3ch[channel] = (chip->data & 0xff) | ((chip->reg_ac & 0x07) << 8);
                chip->block_3ch[channel] = (chip->reg_ac >> 3) & 0x07;
                chip->kcode_3ch[channel] = (chip->block_3ch[channel] << 2) | fn_note[chip->fnum_3ch[channel] >> 7];
                break;
            case 0xac:
                chip->reg_ac = chip->data & 0xff;
                break;

I see that you have a separate latch for the FNUM MSBs of the channel 3 special-mode values and for the rest. That is very interesting. Have you confirmed that by inspecting the die shot?

Yes

jotego wrote: Sat Nov 24, 2018 6:27 pm It looks like you have all the test bits implemented too! That's awesome. I think I might be missing some. I will probably add them all for completeness.

Another question from your code:
Code:
    if (chip->eg_kon_csm[slot])
    {
        nextlevel |= chip->eg_tl[1] << 3;
    }
I thought CSM would just trigger a key-on for one cycle and a key-off after that, but you seem to be altering the level in a strange way. Why TL[1]? Is that a TL value inside the envelope pipeline? I am not sure whether your implementation actually tries to replicate the pipeline, but to do that you would need to operate at the FM clock rate, not once every 24 FM clocks (i.e. at the sample rate) as you said. And what does that line do? An OR there can only make the sound quieter, so why make it quieter on a key-on? How confident are you that the chip actually does this?

By the way, I love your simple API and neat implementation. I have seen many implementations that are very software-ish and complicate everything a lot.

This behavior should be consistent with real hardware. I'm not sure why Yamaha implemented it that way, though.

In theory my code should be 1:1 with real hardware, but in practice I could have made some mistakes in the implementation.

SpritesMind.Net
