I am glad to see the thread still alive. We do not know everything about the YM2612 yet.
About this one, I've run a few tests on a VA4 MD1 (with discrete YM2612) and a VA0 MD2 (with the 315-5660 ASIC) regarding the BUSY flag, and here is what I figured out or confirmed from Sauraen's die shot analysis:
1) BUSY flag can only be read from port 0 (A0=A1=0) on discrete YM2612 while it can be read from any port on ASIC-integrated version
2) BUSY flag is only set on DATA port writes (A0=1)
3) BUSY flag duration is constant and does not depend on the written register
4) BUSY flag duration seems to be 32 internal clocks (32*6 68k clocks): this was tested with my emulator against real hardware using a test program that counts the number of status reads with the BUSY flag set. Results on the emulator were identical to real hardware with the busy wait set to 32*6 68k cycles; other values (like 24, 48 or even 30) give results that differ too much from real hardware
5) BUSY flag duration is the same on the discrete YM2612 and the ASIC-integrated chip
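The BUSY duration measurement described in the list above can be pictured with a small model. This is only a sketch under stated assumptions: the cost of one polling loop iteration (`POLL_LOOP_CYCLES`) is a made-up figure, not taken from the actual test program, and bus wait states are ignored.

```python
# Rough model of "count the status reads that still return BUSY" after a
# data-port write. Assumptions (not from the original test program): the
# polling loop has a fixed cost in 68k cycles, and the first status read
# happens one loop period after the write.

YM_CLOCK_DIVIDER = 6  # 68k cycles per internal YM2612 clock (32*6 in the post)


def busy_cycles_68k(internal_clocks):
    """BUSY duration expressed in 68k cycles."""
    return internal_clocks * YM_CLOCK_DIVIDER


def count_busy_reads(internal_clocks, poll_loop_cycles):
    """Number of status reads that see BUSY set before it clears."""
    duration = busy_cycles_68k(internal_clocks)
    reads = 0
    t = poll_loop_cycles            # time of the first status read
    while t < duration:             # BUSY is still set at time t
        reads += 1
        t += poll_loop_cycles
    return reads


if __name__ == "__main__":
    POLL_LOOP_CYCLES = 20           # hypothetical loop cost in 68k cycles
    for clocks in (24, 30, 32, 48):
        print(clocks, busy_cycles_68k(clocks),
              count_busy_reads(clocks, POLL_LOOP_CYCLES))
```

With a short enough polling loop, candidate durations such as 30 and 32 internal clocks yield different read counts, which is what makes this kind of comparison against real hardware possible.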
I actually came to the forum today to talk about this, and I didn't expect to find this post. Well, what I am going to say kind of contradicts this information. I spent the last 5 days working on JT51, adding lessons learnt from the JT12 experience. One tool I made when developing JT51 was a sample-accurate operator in C. Given some parameters describing the operator configuration, the tool prints the exact same sequence as the real YM2151. It is verified against literally thousands of chip measurements, as I automated the measurement process to take samples for each set of register values. The YM2151 has a digital output, which the YM2612 doesn't, so that work is rather easy. (In my case it wasn't, because the PCB I made for this was a disaster.)
Anyway, in my v1.0 implementation of JT51 I had to use a lot of tricks to get the exact output from the Verilog RTL code. The problem is that one operator, which is called M2 in the YM2151 and is equivalent to S3 in the YM2612, seems to be output in the middle of two samples. Suppose you use algorithm 7 (all operators are just summed at the output; it is supposed to be good for organ sound emulation). Then you would expect each sample to be the sum of S1, S3, S2, S4 (or M1, M2, C1, C2 in YM2151 documentation). However, when the keyon is set for all operators on a channel, M2 (S3) gets output one sample ahead of time. This is issue #1.
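Issue #1 can be pictured with a toy mixer. This is a sketch only: the function name and the `m2_lead` parameter are made up for illustration, and real operator outputs are of course not simple ramps.

```python
def alg7_sum(m1, m2, c1, c2, m2_lead=0):
    """Sum four operator outputs as in algorithm 7.

    m2_lead is the number of samples by which M2 (S3) runs ahead of the
    other three operators. 0 is the naive expectation, 1 models issue #1.
    """
    n = len(m1) - m2_lead           # samples for which all inputs exist
    return [m1[i] + m2[i + m2_lead] + c1[i] + c2[i] for i in range(n)]


if __name__ == "__main__":
    ramp = [0, 1, 2, 3, 4]
    zero = [0] * 5
    print(alg7_sum(ramp, ramp, zero, zero))             # -> [0, 2, 4, 6, 8]
    print(alg7_sum(ramp, ramp, zero, zero, m2_lead=1))  # -> [1, 3, 5, 7]
```

The two sequences differ in every sample, so a sample-exact regression test flags the offset even when the waveforms look almost the same.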
Then, if you go for an algorithm where one operator modulates another, like S1->S2 in alg. 6 (called M1->C1 in YM2151), then if keyon did not occur first on S1 and then on S2, the modulation effect of S1 over S2 would be lost for the first sample. However, in all my measurements I consistently found that modulation was present from the very first sample. I took measurements primarily on channel 0, but I also took many measurements on the other channels to verify whether they were equal or not. They were. (NB: I could not measure channel 7; I only verified 0-6.) So keyon order on the operators is important. This is issue #2.
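A minimal sketch of why keyon order matters for the first sample. It is a floating-point toy, not the chip's arithmetic: the phase steps, the modulation index and the "key-on takes effect at sample N" parameters are all assumptions made for illustration.

```python
import math


def render(s1_on_at, s2_on_at, samples=4,
           mod_step=0.5, car_step=0.25, mod_index=2.0):
    """Render a two-operator S1->S2 patch.

    s1_on_at / s2_on_at: sample index at which each operator's key-on takes
    effect. Within one sample S1 is processed before S2, so S2 can only be
    modulated by an S1 that is already keyed on.
    """
    out = []
    s1_phase = s2_phase = 0.0
    for n in range(samples):
        s1 = 0.0                    # a keyed-off modulator contributes nothing
        if n >= s1_on_at:
            s1_phase += mod_step
            s1 = math.sin(s1_phase)
        if n >= s2_on_at:
            s2_phase += car_step
            out.append(math.sin(s2_phase + mod_index * s1))
        else:
            out.append(0.0)
    return out


if __name__ == "__main__":
    in_order = render(s1_on_at=0, s2_on_at=0)      # S1 keyed no later than S2
    out_of_order = render(s1_on_at=1, s2_on_at=0)  # S2 keyed before S1
    print(in_order[0], out_of_order[0])
```

In the out-of-order case the first carrier sample is a plain sine step with no modulation, which is exactly the effect the measurements never showed.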
So these two issues seem to point at two hardware blocks:
Issue #1 -> related to when the accumulator unit resets the output sum, produces an output and starts to sum up the next sample.
Issue #2 -> related to when the keyon information gets processed.
I fixed these issues in v1.0 with ugly Verilog. A hack. I wanted to take Sauraen's work on the YM2612 operator and use it to replace my operator unit in JT51. The YM2151 has 8 channels instead of the 6 of the YM2612, though, so I had to make some adjustments. As expected, after many hours of work, the operator did actually work. When comparing outputs from Verilog to outputs from my C emulator (which is exact in value and sample time to the original) I found that issue #1 was manifesting for some algorithms (but not all). So M2 being off by one sample was creating an error in my regression tests. When I look at the waveforms, the signals are virtually identical and the difference is unlikely to be audible at all. But you know what happens to engineers: we want it perfect, so eventually I will go back to issue #1.
But as for issue #2, I found that a way to fix it without a complex implementation was to run the operator update not in constant time, but in a variable time. This is where the busy signal comes in. Let me explain:
When the user writes to a register, the internal circular shift registers may be at any point. The easy way to get the new data in is to wait for the shift register to output the old value and then feed in the new one. If the user writes data when register 2/5 (operator 2, channel 5) is being output, then it just waits (busy on) until the same register 2/5 gets output again. During that time the required register has been traversed too, and so the data has been written. I made it that way in JT51 and then in JT12 too. But I think my JT12 implementation may take only 24 cycles instead of 32... I should check.
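The rotation-based update can be sketched as follows. It is a behavioural toy, not RTL: the slot count is a parameter because the post itself leaves open whether a rotation is 24 or 32 cycles, and `write_register` is a name made up here.

```python
def write_register(slots, current, target, value):
    """Write `value` into slots[target] of a circular shift register.

    current: index of the slot at the output when the write arrives.
    Busy stays asserted until that same slot is at the output again, so the
    write always costs one full rotation wherever the target slot is.
    Returns the number of cycles busy was set.
    """
    n = len(slots)
    pos = current
    busy_cycles = 0
    while True:
        if pos == target:
            slots[pos] = value      # new data replaces the old as it passes
        pos = (pos + 1) % n         # shift by one slot
        busy_cycles += 1
        if pos == current:          # starting slot is at the output again
            return busy_cycles


if __name__ == "__main__":
    regs = [0] * 24                 # assumed: 6 channels x 4 operators
    print(write_register(regs, current=5, target=17, value=0x7F))  # -> 24
    print(regs[17])                                                # -> 127
```

The busy time is constant in this scheme, which fits a register-independent BUSY duration, but the moment the target slot actually changes depends on where the rotation was when the write arrived.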
The problem with this is the keyon register. That register must be another circular shift register that holds the key-on state of each operator. But if it were updated as explained in the previous paragraph, then sometimes S2 would get the keyon before S1 does, and then the first sample of S2 would not contain modulation from S1, as S1 would still be in the keyoff state. This is the key aspect here: keyon must occur in order for the operators. At least in the YM2151 the keyon for M1 (S1) always occurs before that of C1 (S2) and the modulation is always present. Is it the same for the YM2612? I wonder.
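To see how often a plain rotation-based keyon update would key S2 before S1, here is a small enumeration. The slot layout is an assumption for illustration (operator-major, in the order S1, S3, S2, S4, with 6 channels); the real register layout may differ.

```python
CHANNELS = 6
OP_ORDER = ("S1", "S3", "S2", "S4")   # assumed order along the shift register
N_SLOTS = CHANNELS * len(OP_ORDER)


def slot_of(op, channel):
    """Slot index of an operator, assuming an operator-major layout."""
    return OP_ORDER.index(op) * CHANNELS + channel


def keyon_delays(channel, current):
    """Cycles until each operator's keyon bit is updated, for a write that
    arrives while slot `current` is at the output."""
    return {op: (slot_of(op, channel) - current) % N_SLOTS for op in OP_ORDER}


def bad_phases(channel):
    """Arrival phases for which S2 is keyed on before S1."""
    bad = []
    for current in range(N_SLOTS):
        delays = keyon_delays(channel, current)
        if delays["S2"] < delays["S1"]:
            bad.append(current)
    return bad


if __name__ == "__main__":
    print(bad_phases(0))            # -> [1, 2, 3, ..., 12]
    print(len(bad_phases(0)), "of", N_SLOTS)
```

Under these assumptions half of the arrival phases give the wrong order, so a fixed-rotation update alone cannot explain modulation being present from the very first sample in every measurement.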
Note that in order to make the S1 keyon happen ahead of the S2 keyon, the register update strategy cannot take a fixed number of cycles. At least not for keyon writes. This is something I would be very grateful if Eke could verify on the real part. Eke has shown that it takes 32 cycles to get the busy signal cleared. But... does it take 32 cycles for keyon writes too?
By the way, you may be thinking that it is not a big deal to update the keyon register in a different way. But when dealing with actual logic gates, the design can easily get very complex and large. The update method I propose is very economical in terms of circuit area and gate count, so I bet the original one is like that.
I hope I have not lost you with this long post. I know it is hard to follow. Probably even with illustrations it would be hard!