New Documentation: An authoritative reference on the YM2612

Sauraen
Interested
Posts: 49
Joined: Sat Sep 19, 2015 2:44 pm

Re: New Documentation: An authoritative reference on the YM2612

Post by Sauraen » Sun Aug 14, 2016 3:40 am

GManiac wrote:
operators are shifted arithmetically (not logically) by 5 bits
The top 9 bits of the 14-bit operator output are sent to the accumulator, which adds the operators within one voice. So there's no question of shifting--it just takes the top 9 bits and discards the bottom 5. Only on PC where you are actually running these calculations in a 32-bit register does it matter which kind of shift you're using. The correct one here would indeed be an arithmetic shift, though depending on your implementation of the saturation (clipping) when operators are added, it may not matter. Remember, these are signed numbers, not unsigned numbers with an extra sign bit.
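A minimal sketch of the truncation described above, written in Python, where `>>` on a negative integer already behaves as an arithmetic shift. The function names are made up for illustration, and clamping the channel sum to the signed 9-bit range is an assumption about how the saturation might be modelled, not something confirmed from the die.

```python
def op_to_9bit(op14):
    """Keep the top 9 bits of a signed 14-bit operator output."""
    assert -8192 <= op14 <= 8191
    # Python's >> floors the result, i.e. it is an arithmetic shift.
    return op14 >> 5


def mix_channel(carriers14):
    """Sum the truncated carrier outputs and clamp to signed 9 bits (assumed)."""
    total = sum(op_to_9bit(v) for v in carriers14)
    return max(-256, min(255, total))
```

With a logical shift on a 32-bit register, a value such as -1 would instead turn into a large positive number, which is why the kind of shift matters on a PC.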

Any emulator that outputs 14-bit audio is not authentic, simply because the YM2612 DAC is 9-bit.

The ladder effect is a result of the resistor string DAC--when the MSB is 0 and the string is between VDD and the midpoint, it pulls up the midpoint voltage slightly, and vice versa when the MSB is 1 and the string is between GND and the midpoint. Basically, the two halves of the range have decent linearity in themselves, but they don't quite meet in the middle. A decent way of emulating this would be to offset negative values by a constant amount (which it sounds like you're doing). Determining what that amount is has probably already been done quite well by someone with a good scope, but I can't do an effective job of that just by reading the die. :)
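A rough sketch of the "offset negative values by a constant" idea. The function name and the `negative_offset` parameter are hypothetical, and the value passed in is a placeholder: the post says the real amount still has to be measured, and neither its size nor its sign is claimed here.

```python
def apply_ladder(sample9, negative_offset):
    """Shift negative 9-bit DAC codes by a constant to open a gap at the midpoint.

    negative_offset is a placeholder to be calibrated against real hardware.
    The result is no longer guaranteed to fit in 9 bits.
    """
    if sample9 < 0:
        return sample9 + negative_offset
    return sample9
```

For example, `apply_ladder(-5, -3)` gives -8 while `apply_ladder(5, -3)` stays at 5, so the two halves of the range no longer meet in the middle.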

Eke
Very interested
Posts: 884
Joined: Wed Feb 28, 2007 2:57 pm

Post by Eke » Mon Aug 15, 2016 2:15 pm

GManiac wrote: Can anyone add this "feature" to existing emulators / player plugins? I usually listen to music via Maxim's plugin.
I am emulating the 9-bit DAC (although not 100% accurately for all algorithms, since bits 0-4 are masked after the 14-bit operator outputs have been summed, whereas the masking should be applied to each carrier operator's output).
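The difference between the two orders of masking can be shown with a small Python sketch. Both function names are invented for this example; the only thing taken from the post is that bits 0-4 are cleared either per carrier or once after the sum.

```python
LOW5 = 0x1F  # bits 0-4


def mask_per_carrier(carriers14):
    """Clear bits 0-4 of each carrier output, then add."""
    return sum(v & ~LOW5 for v in carriers14)


def mask_after_sum(carriers14):
    """Add the full 14-bit carrier outputs, then clear bits 0-4 of the sum."""
    return sum(carriers14) & ~LOW5
```

Two carriers at 31 each give 0 when masked per carrier but 32 when masked after the sum, because the low bits carry into bit 5 before being discarded. With a single carrier the two orders agree.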

The problem with emulating the "ladder effect" is that no precise measurements have been made on the DAC output that would let us emulate it accurately. We just know that an "offset" must be added for negative values, but I'm still not sure how large it should be or to which levels it should be applied. So I was waiting until I could accurately measure the voltage on MOL/MOR for each of the 512 DAC levels before implementing anything.

I guess we could use the recently discovered 9th bit for Ch6 DAC mode in test register $2C (bit 3) to write a test ROM that outputs all 512 levels by muting all channels except Ch6 and writing directly to the DAC register.

jotego
Interested
Posts: 22
Joined: Sat Jan 28, 2017 8:30 am
Location: Valencia (Spain)

Post by jotego » Sun Feb 26, 2017 8:30 pm

Hi all,

I have finished reading the full thread and I have enjoyed every bit of it. Let me add to it now.

First, I am working on a Verilog module that replicates the YM2612/YM3438 in hardware to a very precise level. The internal operation is likely to be pretty much the same as the original. Everything runs in parallel, with pipelines, circular shift registers and so on. I had actually developed this architecture when I did the YM2151 clone (the JT51). Then I was asked to do the YM2612, found Sauraen's and Nemesis' findings, and was very glad to see the convergence between the JT51 architecture and the actual hardware. With this new clone, the JT12, I am getting even closer. BTW, I also want to thank HardWareMan for taking the die pictures. I have seen contributions from many others in this thread; thanks to all of you.

JT12 is GPL. You can have a look here. The hdl folder contains the actual Verilog files.

Note that the work is not complete yet, though it is probably just a couple of weeks away from completion. You can check the issues page to see the pending tasks. You can also see it working in its current state here. It is running on a MiST FPGA board.

Many of the things discussed in the thread make a lot of sense when you look at the hardware implementation in the Verilog files. Aspects such as the duration of the BUSY signal, or EG behaviour, are much clearer when looked at through the eyes of digital design rather than software. By the way, the operator implementation (jt12_op.v) is pretty much a copy of Sauraen's VHDL file. Although converted to Verilog and with style modifications, it is in essence the same thing.

Second, I have a blunt offer for people working on emulators. I think it might make sense to use Verilator to obtain a cycle-accurate C++ model from the Verilog files. With that model you do not need an emulator; you just get the actual chip running inside your software! Although this might sound like a slow solution, I have seen some YM2612 emulator work that is so high-level in terms of objects, abstraction layers, interfaces and so on that it cannot be fast either!

Third, let me make a couple of comments to things that were said at some point in the thread:
HardWareMan wrote:
TmEE co.(TM) wrote:I'm not very sure but I think the 2 big zig zags (capacitors ?) right next to each other under DAC part are part of channel switching.
Yeah, it could be capacitors. Or it could be a power current amplifier. I'm not sure for now. But there are a lot of capacitors that interconnect some power rails on the die. Look at a bidirectional pin:
Image
And analog output pin:
Image
The analog output has only one transistor, which is connected to AVCC. Thus, the analog output can only source current, not sink it. An external resistor is required.

And here are those "zig-zags". This is definitely two transistors with some control circuit.
Image
I am almost certain that these zig-zags are ESD protection structures. These are compulsory protection circuits that must be placed next to the pads of a silicon die.
Eke wrote:
① Confirm the connection (algorithm) data. If these data are zero (0) through three (3), then the waveform does not distort, even at TL=0.

② When con. = 4, set the TL of the second individual carrier so that it will be negative six (-6) [dB] or more.

③ When con. = 5 or 6, set the TL of the third individual carrier so that it will be negative nine and one half (-9.5) [dB] or more.

④ When con. = 7, set all TL’s to negative twelve (-12) or more.
This tells us about the accumulator's behaviour and its lack of clamping and resolution. In some respects, these chips were too cheap :(
Stef wrote:Thanks Sauraen for the precious and detailed information! The internal organization as a shift register does indeed explain the variable BUSY time. I guess it will be complicated to find a good formula to calculate a perfect timing. Anyway, because of this unpredictable aspect, almost all sound drivers poll the busy flag before writing the YM registers.
The YM3438 application notes PDF does contain a table with the length of the BUSY time depending on the register. I still have to compare it to my implementation (jt12_reg.v), but in the end we are doing the same thing: we have a circular shift register and we have to wait for the selected register to come out of it so we can insert the new value in its place. Note that in my implementation the old value comes out and the new value goes in, so it takes a full 24-operator cycle for the new value to take effect. It is possible to discard the old value, but that requires more logic, and if they were stingy with the DAC, which is more important, I do not think they would have spent extra logic on this.
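The waiting argument can be illustrated with a toy Python model of a circular shift register. The class, its method names and the single pending-write slot are all invented for this sketch; it is not a transcription of jt12_reg.v and makes no claim about the real BUSY duration on the chip.

```python
class CircularRegisterFile:
    """Toy 24-slot circular shift register with one pending write."""

    def __init__(self, slots=24):
        self.data = [0] * slots
        self.pos = 0          # slot currently coming out of the register
        self.pending = None   # (slot, value) waiting to be inserted

    def request_write(self, slot, value):
        self.pending = (slot, value)

    @property
    def busy(self):
        return self.pending is not None

    def tick(self):
        """Advance one slot; swap in the pending value when its slot comes out."""
        if self.pending is not None and self.pending[0] == self.pos:
            self.data[self.pos] = self.pending[1]
            self.pending = None
        self.pos = (self.pos + 1) % len(self.data)


def ticks_until_written(regs, slot, value):
    """Count how many ticks pass before the write has been inserted."""
    regs.request_write(slot, value)
    ticks = 0
    while regs.busy:
        regs.tick()
        ticks += 1
    return ticks
```

In this model the wait depends on where the target slot is relative to the current position, and it never exceeds one full rotation of 24 ticks.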

Well, thanks for reading this long post. I will keep working on JT12 and let you know when completed.

HardWareMan
Very interested
Posts: 745
Joined: Sat Dec 15, 2007 7:49 am
Location: Kazakhstan, Pavlodar

Post by HardWareMan » Mon Feb 27, 2017 2:04 am

Hi and welcome!
jotego wrote:I am almost certain that these zig-zags are ESD protection structures. These are compulsory protection circuits that must be placed next to the pads of a silicon die.
Image
A lot of time has passed. Now I am sure that the analog output has only one transistor, which can only source current. You can clearly see it.
Image
As for these, they form a strong buffer for the clock. It is fed from the input logic of the CLK pin and has some kind of divider or counter (note: with 6 steps). There is also some kind of shift register right below it. I believe this is the main sync circuit, which sets the start of every process inside the chip (such as the start of a channel and of an operator within it).

Anyway, I can't be 100% sure until I make better die shots.

Sauraen
Interested
Posts: 49
Joined: Sat Sep 19, 2015 2:44 pm

Post by Sauraen » Mon Feb 27, 2017 9:08 pm

HardWareMan is correct, those are large drive transistors, not ESD protection devices. The one between OUT and AVCC is the driver for the output pin, and the two in a pair near the "Control" label are indeed for driving the positive and negative clock lines throughout the chip. These are marked in my annotated die shot, which has the blocks labeled.

The only sort of protection I've seen on the chip is some sort of resistor on the digital input pins. The YM2612 isn't CMOS so large protection transistors (really diodes) weren't required.

jotego
Interested
Posts: 22
Joined: Sat Jan 28, 2017 8:30 am
Location: Valencia (Spain)

Post by jotego » Wed Mar 01, 2017 11:59 am

Sauraen wrote:The only sort of protection I've seen on the chip is some sort of resistor on the digital input pins. The YM2612 isn't CMOS so large protection transistors (really diodes) weren't required.
I see. ESD protection really became common with CMOS processes.

Eke
Very interested
Posts: 884
Joined: Wed Feb 28, 2007 2:57 pm

Post by Eke » Tue Mar 14, 2017 2:42 pm

Sauraen wrote: I would go with the LFO, but honestly I haven't found it yet.
Hi,

While recently making some corrections to my YM2612 core, I was wondering whether you ever figured out the LFO and how exactly it works, especially regarding Phase Modulation.

Some questions I have (based on current implementations):

1) how is LFO PM value calculated ?

The MAME implementation uses an offset table indexed by LFO step (0-31) and LFO depth (0-7) as X/Y, then uses the seven upper bits of the FNUM value as weights to compute an offset value:

for (i = 4 to 10) LFO_PM += (FNUM & (1<<i)) ? ((LFO_TAB[step][depth] << 1) >> (10 - i)) : 0

Exodus uses a similar table but uses the whole FNUM range (bits 0-10) to calculate the LFO PM value and does the bit-weight shifting differently:

for (i = 0 to 10) LFO_PM += (FNUM & (1<<i)) ? (LFO_TAB[step][depth] << i) : 0
LFO_PM >>= 9

See also this discussion: viewtopic.php?f=24&t=386&start=480

The difference is that the MAME implementation right-shifts the LFO table value for each weighted bit BEFORE adding them, while Exodus first left-shifts all the table values, adds them, and THEN right-shifts the final sum. The result is similar, but the MAME implementation limits itself to the 7 upper bits (since the right-shifted table values will always be 0 from bit 3 downwards), while the Exodus implementation provides additional bits of precision.
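The two pseudocode loops can be turned into a small Python comparison. This follows the simplified pseudocode in this post rather than the actual MAME or Exodus sources, and `tab` stands for a single `LFO_TAB[step][depth]` entry supplied by the caller; no real table values are assumed here.

```python
def lfo_pm_mame_style(fnum, tab):
    """Right-shift the table value per FNUM bit, then add (bits 4..10 only)."""
    total = 0
    for i in range(4, 11):
        if fnum & (1 << i):
            total += (tab << 1) >> (10 - i)
    return total


def lfo_pm_exodus_style(fnum, tab):
    """Left-shift the table value per FNUM bit, add, then right-shift the sum."""
    total = 0
    for i in range(0, 11):
        if fnum & (1 << i):
            total += tab << i
    return total >> 9
```

When only FNUM bit 10 is set the two agree (for `tab = 4` both give 8). With all FNUM bits set and `tab = 3`, the first form gives 10 and the second gives 11, because the per-bit right shifts discard fractions that the second form only discards once, after the sum.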

From hardware point of view, both implementations seem plausible so it's hard to figure which one is correct.

2) how and where exactly is LFO PM value applied ?

MAME applies the offset value to the whole BLOCK+FNUM register value, then recalculates the BLOCK (which may have been affected by LFO PM), Key Code (which depends on FNUM and BLOCK) and Detune (which depends on Key Code) values before doing the normal PG process on the modified FNUM value (BLOCK shift, DETUNE add, etc.). It's also notable that an additional bit of precision is introduced here for some unknown reason (the offset value is twice the one used in Exodus, but bit 0 is not used when adding to the BLOCK/FNUM value and is instead injected during the BLOCK shift).

Exodus applies the offset value directly to the FNUM value, then recalculates the Key Code and Detune values before doing the normal PG process.

Note that, while Key Code is recalculated in both implementations (and is therefore affected by LFO PM), Key Scale and EG rates (which depend on the Key Code value) are never recalculated and are therefore not affected by LFO PM.

From a hardware point of view, it seems odd that the BLOCK shift value would be affected; the same goes for Key Code if Key Scale is not (unless the key code is recalculated by both the EG and PG blocks!). I guess it would be a matter of figuring out where the LFO PM adder sits, what it takes as inputs, whether it is internal or external to the PM block, and where its output goes.

Sauraen
Interested
Posts: 49
Joined: Sat Sep 19, 2015 2:44 pm

Post by Sauraen » Tue Mar 14, 2017 7:37 pm

Eke wrote:I was wondering if you ever figured out the LFO and how it works exactly, especially regarding Phase Modulation.
I never did a full reverse engineering of it, but here's what I have (from private messages to jotego):
Sauraen wrote:The LFO seems to have 3 sections:

1) 7-bit linear prescaler. The test bit 0x21:1 goes into what looks like the carry-in or something similar; it could go into the reset, I can't quite tell without more detailed analysis. The 7-bit output (plus maybe carry-out) gets logiced together into 8 lines (evidently performing "== N" with N hardcoded for each line), and these go into a little selector unit which is also fed by the LFO Speed and LFO Enable bits. I can't quite see the output of this, but I do see there's some sort of feedback to the prescaler's reset. So this clearly seems like a divide-by-N prescaler. I can try to read the eight N's for you if you want, but you should be able to reverse engineer them from knowing what the LFO speeds are.
2) 7-bit linear counter, with an 8-bit unit after the output (possibly inverts the output after each cycle to make a triangle wave?). Bits 1:6 of the output of this go to the EG, and stick into its pipeline at the same place where the LFO->Amplitude two bits go. (Elsewhere the operator LFO enable flag simply forces these two bits to zero for operators not affected by the LFO.) Some modified version of the 8-bit signal between the counter and the inverter unit thing goes to the third unit of the LFO.
3) Highly complex unit which modifies the frequency data as it goes from the channel registers to the PG. The block bits bypass this, but all the frequency bits get modified by it. There's a bitslice portion corresponding to bits 0:6, and then what appears to be the same logic folded over to process bits 7:A. But the interesting part is that bits 4:A of the frequency data go into the bitslices 0:6. That is, the bitslice unit for bit 0 has bit 0 enter at the middle and leave (to the PG) at the bottom. But it also has bit 4 enter at the top. And so on through bit 6 having bit A enter at the top. It looks like the top portion is some sort of shifter for bits 4:A--the wires go diagonally so that bit 4 only gets used once, bit 5 gets used in bitslices 1 and 0, bit 6 gets used in bitslices 2:0, and so on so that bit A gets used in all of them. I'm guessing this whole unit is basically a multiplier, multiplying bits 4:A of the frequency value by bits 0:7 of the LFO state, and then adding the result to bits 0:6 of the frequency value (with carry up to the higher bits). It looks like, between the multiplied output and the adder, there's another shifter whose value is based on the LFO->Frequency bits. But it looks like it's a bit more complex than I'm describing.
The upshot is that the LFO is definitely doing frequency modulation, not phase modulation.

Eke
Very interested
Posts: 884
Joined: Wed Feb 28, 2007 2:57 pm

Post by Eke » Wed Mar 15, 2017 1:18 pm

Sauraen wrote: I never did a full reverse engineering of it, but here's what I have (from private messages to jotego):
Thanks, it still helps confirm a few things.
7-bit linear prescaler. The test bit 0x21:1 goes into what looks like the carry-in or something similar; it could go into the reset, I can't quite tell without more detailed analysis. The 7-bit output (plus maybe carry-out) gets logiced together into 8 lines (evidently performing "== N" with N hardcoded for each line), and these go into a little selector unit which is also fed by the LFO Speed and LFO Enable bits. I can't quite see the output of this, but I do see there's some sort of feedback to the prescaler's reset. So this clearly seems like a divide-by-N prescaler.
This would be the LFO clock generator, which is divided down from the internal clock based on the LFO speed bits. Presumably, the LFO enable bit halts the LFO clock generator, as does Test register 0x21 bit 1 (similarly to the YM2151 test register described here: http://www.msxarchive.nl/pub/msx/mirror ... ym2151.txt)
I can try to read the eight N's for you if you want, but you should be able to reverse engineer them from knowing what the LFO speeds are.
According to MAME implementation, values are {108, 77, 71, 67, 62, 44, 8, 5} "sample clocks" where sample clock is internal clock / 24.
From the values specified in the YM2612 doc, assuming an 8 MHz input clock and 128 LFO clocks per period, 3.98 Hz for speed 0 would give 8000000/6/24/3.98/128 = 109 samples between each LFO clock, which is one above the value specified in MAME. All other speed values also give one additional sample (72.2 Hz for speed 7 gives 8000000/6/24/72.2/128 = 6 samples).
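The arithmetic above is easy to reproduce in Python. The function name and parameter names are made up for this example; the 8 MHz clock and the 128 LFO clocks per period are the assumptions stated in the post, not measured values.

```python
def samples_per_lfo_clock(lfo_hz, input_clock_hz=8_000_000, lfo_clocks_per_period=128):
    """Sample clocks between two LFO clocks for a documented LFO frequency."""
    sample_rate = input_clock_hz / 6 / 24  # internal clock / 24
    return sample_rate / lfo_hz / lfo_clocks_per_period


for speed, hz in ((0, 3.98), (7, 72.2)):
    print(speed, round(samples_per_lfo_clock(hz)))
```

Rounded to the nearest integer this yields 109 for speed 0 and 6 for speed 7, against 108 and 5 in the MAME table. Passing a different `input_clock_hz` shows how sensitive the result is to the assumed clock.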
7-bit linear counter, with an 8-bit unit after the output (possibly inverts the output after each cycle to make a triangle wave?). Bits 1:6 of the output of this go to the EG, and stick into its pipeline at the same place where the LFO->Amplitude two bits go. (Elsewhere the operator LFO enable flag simply forces these two bits to zero for operators not affected by the LFO.)
That would be the 7-bit LFO step counter (which is incremented on each LFO clock and reset when the LFO enable bit is cleared). The LFO AM value indeed corresponds to LFO step counter bits 0:5 shifted left by one and XORed with the inversion of bit 6 (to generate an inverted triangle waveform).

LFO AM sensitivity (2 bits) indicates to EG how much LFO AM value is shifted before adding to EG output
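As a sketch of the AM waveform described above: the low six bits of the step counter are inverted while bit 6 is clear, and the result is shifted left by one. The function name is hypothetical, and the AM sensitivity shift is deliberately left out because the post does not give the shift amounts.

```python
def lfo_am(step):
    """LFO AM value from the 7-bit LFO step counter (0..127)."""
    low = step & 0x3F
    if not (step & 0x40):
        low ^= 0x3F  # XOR every bit with the inversion of bit 6
    return low << 1
```

Stepping through 0..127, the value falls from 126 to 0 over the first 64 steps and rises back to 126 over the next 64, which is the inverted triangle mentioned above.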

Some modified version of the 8-bit signal between the counter and the inverter unit thing goes to the third unit of the LFO.
This would be the LFO PM step (0-31), which takes bits 2:6 of the LFO step counter (0-127) and goes to the LFO PM calculation unit you describe below.
Highly complex unit which modifies the frequency data as it goes from the channel registers to the PG. The block bits bypass this, but all the frequency bits get modified by it.
This would be LFO PM calculation unit and this would confirm BLOCK calculation is not affected by LFO PM.
Any chance you have located the "Key Code" calculation unit ? This would be a simple unit with a few OR/AND/NOT gates, taking highest 4 bits of FNUM value as inputs and outputting a 2-bit value (LSB of 5-bit Key Code value, MSB being BLOCK bits). Since this value is used by both EG and PG unit, it's more likely to be external to these units. Whether it interfaces with original frequency bits or the output of LFO PM calculation would indicate if LFO PM really has an impact on Detune value (since it depends on Key code value) as it is currently done in emulator implementations and if EG rates should be impacted as well.
There's a bitslice portion corresponding to bits 0:6, and then what appears to be the same logic folded over to process bits 7:A. But the interesting part is that bits 4:A of the frequency data go into the bitslices 0:6. That is, the bitslice unit for bit 0 has bit 0 enter at the middle and leave (to the PG) at the bottom. But it also has bit 4 enter at the top. And so on through bit 6 having bit A enter at the top. It looks like the top portion is some sort of shifter for bits 4:A--the wires go diagonally so that bit 4 only gets used once, bit 5 gets used in bitslices 1 and 0, bit 6 gets used in bitslices 2:0, and so on so that bit A gets used in all of them. I'm guessing this whole unit is basically a multiplier, multiplying bits 4:A of the frequency value by bits 0:7 of the LFO state, and then adding the result to bits 0:6 of the frequency value (with carry up to the higher bits). It looks like, between the multiplied output and the adder, there's another shifter whose value is based on the LFO->Frequency bits. But it looks like it's a bit more complex than I'm describing.
This seems to match the MAME implementation (more than the Exodus implementation). If we look at the LFO PM offset table used in MAME, we can see that the max offset value per FNUM bit is 96 (0x60), for when bit 10 is set, so this would only require 7-bit addition/subtraction (hence bitslices 0 to 6) with the carry propagating to the higher bits. For each following FNUM bit, the LFO PM offsets are shifted right by one, so the max value would be:
0x30 (6-bits i.e bitslices 0 to 5) for bit 9 ,
0x18 (5-bits i.e bitslices 0 to 4) for bit 8,
... ,
0x03 (2-bits i.e bitslices 0 & 1) for bit 5
and finally 0x01 (1-bit i.e bitslice 0 only) for bit 4.
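The list above, including the entries elided by "...", follows from halving 0x60 once per FNUM bit. A short Python check, with a made-up helper name and the 0x60 starting value taken from the post:

```python
def max_pm_offsets(max_at_bit10=0x60):
    """Map each weighted FNUM bit (10 down to 4) to its largest offset."""
    return {bit: max_at_bit10 >> (10 - bit) for bit in range(10, 3, -1)}


for bit, offset in max_pm_offsets().items():
    # bit_length() is the number of bitslices the offset occupies.
    print(f"FNUM bit {bit}: max offset 0x{offset:02X}, {offset.bit_length()} bits")
```

The bit lengths come out as 7, 6, 5, 4, 3, 2 and 1, matching the diagonal wiring Sauraen describes, where bit 4 reaches only bitslice 0 and bit A reaches all seven.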

I just don't understand the extra bit of precision added in the MAME implementation: from the chip hardware point of view, this would mean that this unit takes the 11 bits of the original Frequency (FNUM) value but outputs 12 bits, and that the frequency input of the PG unit is actually 12 bits or more, not 11 (with the MSBs corresponding to the original FNUM and the LSB being forced to zero when no LFO modulation is applied).
Another possibility is that the LFO offset value is calculated as a 7-bit (6.1 fixed point) value and the lowest bit is discarded when adding to the FNUM value.
Would it be possible to confirm or deny this from looking at the die shot ?

Eke
Very interested
Posts: 884
Joined: Wed Feb 28, 2007 2:57 pm

Post by Eke » Sun Apr 02, 2017 6:50 pm

TmEE co.(TM) wrote:26KHz is the effective max for the YM DAC register from my tests, going higher makes the sound worse. I haven't done any actual investigation but there cannot be anything else happening than missed writes. I played some real music and all the higher freq stuff was getting garbled when you went beyond 26KHz.

The game Hellfire has 2x slower music on MD2 than on MD1. The game spams YM with unnecessary writes and is being polite about it (waits for busy).
Discrete YM3438 didn't seem to run 2x slower (I no longer remember the result...), but it has issues in other games (I recall Sonic Spinball being one).
About this one, I've run a few tests on a VA4 MD1 (with discrete YM2612) and a VA0 MD2 (with 315-5660 ASIC) regarding the BUSY flag, and here is what I figured out or confirmed from Sauraen's die shot analysis:

1) The BUSY flag can only be read from port 0 (A0=A1=0) on the discrete YM2612, while it can be read from any port on the ASIC-integrated version
2) The BUSY flag is only set on DATA port writes (A0=1)
3) The BUSY flag duration is constant and does not depend on the written register
4) The BUSY flag duration seems to be 32 internal clocks (32*6 68k clocks): this was tested with my emulator against real hardware using a test program that counts the number of status reads with the BUSY flag set. Results on the emulator were identical to real hardware with the busy wait set to 32*6 68k cycles; other values (like 24, 48 or even 30) give results too different from real hardware
5) The BUSY flag duration is the same on the discrete YM2612 and the ASIC-integrated chip
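These findings can be condensed into a small Python model. The class and method names are invented for this sketch, timestamps are in 68k cycles, and only the BUSY bit (bit 7 of the status byte) is modelled. Returning 0 for reads from other ports on the discrete chip is an assumption based on the flag "appearing cleared" there.

```python
BUSY_68K_CYCLES = 32 * 6  # 32 internal clocks, per the measurement above


class BusyFlagModel:
    """Toy model of the BUSY flag; timer flags are not modelled."""

    def __init__(self, asic=False):
        self.asic = asic       # True for the ASIC-integrated version
        self.busy_until = 0

    def write(self, port, now):
        if port & 1:           # only DATA port writes (A0=1) set BUSY
            self.busy_until = now + BUSY_68K_CYCLES

    def read_status(self, port, now):
        if port != 0 and not self.asic:
            return 0x00        # discrete chip: BUSY only visible on port 0
        return 0x80 if now < self.busy_until else 0x00
```

A driver that polls the status on the port it just wrote to would therefore see BUSY on the ASIC model but never on the discrete one, which is the situation described for Hellfire.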


Now, this also explains why Hellfire's music is slower on an MD2 with ASIC-integrated FM: as mentioned before, the game spams the FM chip with writes while waiting for the BUSY flag to be cleared between each of them. The problem is that they made a mistake in the sound engine: they read the FM status from the same port they write the data to (i.e. port 1 or port 2), so when run on an MD1 (which is likely the model it was developed for), the BUSY flag always appears to be cleared and there isn't enough delay between writes. This is the reason there are a lot of unnecessary writes: they likely noticed there was a problem with their code and "fixed" it by repeating the same writes multiple times.

When run on MD2, the BUSY flag is read correctly and the code this time waits correctly between each write. But since there are a lot of repeated writes because of the workaround implemented for MD1 hardware, it results in slower music.

TmEE co.(TM)
Very interested
Posts: 2440
Joined: Tue Dec 05, 2006 1:37 pm
Location: Estonia, Rapla City

Post by TmEE co.(TM) » Sun Apr 02, 2017 7:23 pm

I'd like to add that it is possible to write to the YM at the sample rate (1 write every 144 YM clocks). I made the YM run at the same clock as the Z80, made the code write one sample every 144 cycles, and got all writes intact. It is not possible to get perfect synchronization with the YM like that under normal conditions: when the 68K accesses the Z80 side there are always penalty cycles, as well as when the Z80 accesses the 68K side.
I'll do more tests on this later to determine the minimum delay per register before data gets missed.

Sauraen
Interested
Posts: 49
Joined: Sat Sep 19, 2015 2:44 pm

Post by Sauraen » Thu Apr 06, 2017 3:14 am

Eke wrote:Any chance you have located the "Key Code" calculation unit ? This would be a simple unit with a few OR/AND/NOT gates, taking highest 4 bits of FNUM value as inputs and outputting a 2-bit value (LSB of 5-bit Key Code value, MSB being BLOCK bits). Since this value is used by both EG and PG unit, it's more likely to be external to these units. Whether it interfaces with original frequency bits or the output of LFO PM calculation would indicate if LFO PM really has an impact on Detune value (since it depends on Key code value) as it is currently done in emulator implementations and if EG rates should be impacted as well.
Your intuition seems to be correct. There's a unit at the beginning of the PG which takes the 3 bits of BLOCK, plus two bits from an "unknown source" (I'll call them "pink bits" because I happened to color their wires pink on my annotated chip image), delays them by 1 or 2 cycles, sends the BLOCK bits into the PG, and sends the delayed 5 bits into the PG and the EG. I traced these two pink bits back to a beginning somewhere in the logic for the channel parameter shift registers, and with a little more tracing they are definitely being computed from the top 4 bits of the raw FNUM value, pre-LFO. (By the way, this value is the FNUM for the current operator, including the cases of the operators of voice 3 with their own frequencies.) So unless the key code value is modified later (i.e. the PG's copy of the key code is modified before being used, but I don't think this is happening), it's not affected by the LFO at all.
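For reference, here is a Python sketch of a key code computed from the raw, pre-LFO FNUM as described above. The two low bits use the formula commonly quoted from Yamaha's OPN-family application manuals (N4 = F11, N3 = F11·(F10+F9+F8) + !F11·F10·F9·F8). That formula is an assumption brought in for illustration; it was not read from the die in this thread.

```python
def key_code(block, fnum):
    """5-bit key code from 3-bit BLOCK and the top 4 bits of 11-bit FNUM."""
    f11 = (fnum >> 10) & 1
    f10 = (fnum >> 9) & 1
    f9 = (fnum >> 8) & 1
    f8 = (fnum >> 7) & 1
    n4 = f11
    n3 = (f11 & (f10 | f9 | f8)) | ((f11 ^ 1) & f10 & f9 & f8)
    return (block << 2) | (n4 << 1) | n3
```

Because the inputs are the raw BLOCK and FNUM register bits, an emulator following this finding would compute the key code once per frequency write and leave it untouched by LFO PM.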
Eke wrote:I just don't understand the extra bit precision added in MAME implementation: from the chip hardware point of view, this would mean that this unit would takes 11 bits of original Frequency (FNUM) value but output 12 bits and that frequency input of PG unit is actually 12 bit or more, not 11 (with MSBs corresponding to original FNUM and LSB being forced to zero when no LFO modulation is applied) .
Another solution is that LFO offset value is calculated as 7 bit (6.1 fixed point) value and the lowest bit is discarded when adding to FNUM value.
Would it be possible to confirm or deny this from looking at the die shot ?
Not so lucky here. 11 bits enter the LFO multiplier thing, 12 bits leave and go to the PG. In fact, the first unit in the PG (unknown function) where these 12 bits enter is actually 16 bits wide: the three LSBs are zero, then the 12 bits from the LFO, then the MSB is also zero. This is followed by another unit which actually has a 17-bit output. These two units are extremely compact custom logic and I can't quite tell their function, but they appear to be shifters; the second one appears to shift right four bits depending on bit 2 of BLOCK, and I think the first one shifts one or two bits based on bits 0 or 1 of BLOCK. The unit after this, which looks like it's adding something (Detune?), is also 17 bits, with no carry-out as far as I can tell. The unit after that is 18 bits and the unit after that is 20 bits, both of which look like more elaborate shifters. Then comes what appears to be the PG state shift register addition and control unit.

Eke
Very interested
Posts: 884
Joined: Wed Feb 28, 2007 2:57 pm

Post by Eke » Thu Apr 06, 2017 12:38 pm

Sauraen wrote: Your intuition seems to be correct. There's a unit at the beginning of the PG which takes the 3 bits of BLOCK, plus two bits from an "unknown source" (I'll call them "pink bits" because I happened to color their wires pink on my annotated chip image), delays them by 1 or 2 cycles, sends the BLOCK bits into the PG, and sends the delayed 5 bits into the PG and the EG. I traced these two pink bits back to a beginning somewhere in the logic for the channel parameter shift registers, and with a little more tracing they are definitely being computed from the top 4 bits of the raw FNUM value, pre-LFO. (By the way, this value is the FNUM for the current operator, including the cases of the operators of voice 3 with their own frequencies.) So unless the key code value is modified later (i.e. the PG's copy of the key code is modified before being used, but I don't think this is happening), it's not affected by the LFO at all.
Thanks a lot for confirming that. As said previously, it wasn't very logical to have the key code affected by the LFO when calculating detune but not when calculating EG rates. It seems much more logical this way.

On a side note, do you happen to have this annotated chip image available somewhere? I couldn't find it again in the thread.
Sauraen wrote:Not so lucky here. 11 bits enter the LFO multiplier thing, 12 bits leave and go to the PG.
At least, it seems to confirm LFO adds a bit of precision to FNUM 11-bit input, just like Jarek's implementation in MAME is doing.
Sauraen wrote:In fact, the first unit in the PG (unknown function) where these 12 bits enter is actually 16 bits wide: the three LSBs are zero, then the 12 bits from the LFO, then the MSB is also zero. This is followed by another unit which actually has a 17-bit output. These two units are extremely compact custom logic and I can't quite tell their function, but they appear to be shifters; the second one appears to shift right four bits depending on bit 2 of BLOCK, and I think the first one shifts one or two bits based on bits 0 or 1 of BLOCK.
Yes, I was able to see that as well on the die shot, 16-bit input with only 12-bit in the middle holding value, then converted to 17-bit but I couldn't tell where these 12-bit were coming from so thanks for confirming it comes directly out of LFO PM unit.

I guess this is a way of preshifting frequency value before doing BLOCK right-shifting.
In the MAME implementation, this is done as (freqnum << 5) >> (7 - BLOCK), where freqnum = (FNUM << 1) + lfo_offset (the 12-bit adjusted FNUM value after LFO).

Here it seems like it first does something equivalent to (freqnum << 3), then does the BLOCK shifting in multiple passes.
To get a formula equivalent to the implementation above, it would be something like this:
1) 1-bit right-shifting if BLOCK bit 0 is cleared
2) 2-bit left-shifting if BLOCK bit 1 is set
3) 4-bit right-shifting if BLOCK bit 2 is cleared
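A quick way to check that the three passes above match the MAME-style formula is to brute-force every 12-bit input and every BLOCK value. This is only a sketch: the function names are made up for illustration, and it says nothing about how the two shifter units actually split the work on the die.

```c
#include <assert.h>
#include <stdint.h>

/* MAME-style form quoted above: (freqnum << 5) >> (7 - BLOCK). */
static uint32_t pg_shift_single(uint32_t freqnum, unsigned block)
{
    assert(block < 8);
    return (freqnum << 5) >> (7 - block);
}

/* Multi-pass form: pre-align by 3 bits, then one conditional shift per BLOCK bit. */
static uint32_t pg_shift_passes(uint32_t freqnum, unsigned block)
{
    assert(block < 8);
    uint32_t v = freqnum << 3;      /* 12-bit value sitting above three zero LSBs */
    if (!(block & 1)) v >>= 1;      /* pass 1: BLOCK bit 0 cleared */
    if (block & 2)    v <<= 2;      /* pass 2: BLOCK bit 1 set */
    if (!(block & 4)) v >>= 4;      /* pass 3: BLOCK bit 2 cleared */
    return v;
}

/* Counts the (freqnum, BLOCK) pairs for which the two forms disagree. */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned block = 0; block < 8; block++)
        for (uint32_t f = 0; f < 4096; f++)
            if (pg_shift_single(f, block) != pg_shift_passes(f, block))
                bad++;
    return bad;
}
```

The net shift of the three passes is BLOCK - 5 applied to (freqnum << 3), which is the same as BLOCK - 2 applied to freqnum. The largest result, 4095 << 5 at BLOCK = 7, fits in 17 bits.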
Sauraen wrote:The unit after this, which looks like it's adding something (Detune?), is also 17 bits, with no carry-out as far as I can tell.
Yes, quite likely. The detune addition result can indeed overflow, and the output is 17 bits.
Sauraen wrote:The unit after that is 18 bits and the unit after that is 20 bits, both of which look like more elaborate shifters.
This would be the multiplication unit. I'm not sure why it is 18-bit, but it has to handle the case where MUL=0 (x 1/2), so there must be some pre-shifting or post-shifting involved.
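For comparison, here is a small C sketch of one way software can handle MUL=0 with a post-shift: work with twice the factor, then shift right once at the end. The name apply_mul is hypothetical, and this is a software convenience, not a claim about what the 18-bit and 20-bit units do.

```c
#include <assert.h>
#include <stdint.h>

/* Apply the 4-bit MUL field to a phase increment.
 * MUL = 0 means x1/2; MUL = 1..15 means x1..x15.
 * Assumption: the factor is doubled first and the result shifted right
 * once afterwards, so x1/2 needs no special case in the multiply. */
static uint32_t apply_mul(uint32_t phase_inc, unsigned mul)
{
    assert(mul < 16);
    uint32_t factor_x2 = mul ? (uint32_t)mul * 2u : 1u;  /* factor in half-steps */
    return (phase_inc * factor_x2) >> 1;                 /* post-shift by one bit */
}
```

With a 17-bit input the intermediate product stays well inside 32 bits, since the doubled factor is at most 30.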

Sauraen
Interested
Posts: 49
Joined: Sat Sep 19, 2015 2:44 pm
Contact:

Re: New Documentation: An authoritative reference on the YM2612

Post by Sauraen » Thu Apr 06, 2017 3:23 pm

Eke wrote:On a side note, do you happen to have this annotated chip image available somewhere? I couldn't find it in the thread.
I've posted small segments from it on occasion, but no. It's a 1.6 GB GIMP image with 6 layers or so (just different types of signals, not broken out into the actual chip layers), and it uses 6 GB of RAM to be loaded for editing. Maybe when I'm "done" (whatever that means) I can upload the flattened result somewhere, but even exporting it as a JPG would be ~200 MB.
Eke wrote:I guess this is a way of preshifting frequency value before doing BLOCK right-shifting.
As you know, hardware doesn't care how a value is aligned, just how many bits it contains.
Eke wrote:This would be the multiplication unit, not sure why it is 18-bit but it has to handle the case where MUL=0 ( x 1/2) so there must be some pre-shifting or post-shifting involved.
I think both of the units I mentioned together are the multiplication unit, though I'm still not quite sure--it doesn't look like it has the adders, just the shifters, though maybe the adders are present in the subsequent unit.

jotego
Interested
Posts: 22
Joined: Sat Jan 28, 2017 8:30 am
Location: Valencia (Spain)
Contact:

Re: New Documentation: An authoritative reference on the YM2612

Post by jotego » Mon Apr 17, 2017 9:31 pm

I am glad to see the thread still alive. We do not know everything about the YM2612 yet!
Eke wrote: About this one, I've run a few tests on VA4 MD1 (with discrete YM2612) and VA0 MD2 (with 315-5660 ASIC) regarding the BUSY flag and here is what I figured out or confirmed from Sauraen's die shot analysis:

1) BUSY flag can only be read from port 0 (A0=A1=0) on discrete YM2612 while it can be read from any port on the ASIC-integrated version
2) BUSY flag is only set on DATA port writes (A0=1)
3) BUSY flag duration is constant and does not depend on the written register
4) BUSY flag duration seems to be 32 internal clocks (32*6 68k clocks): this was tested with my emulator against real hardware using a test program that counts the number of status reads with the BUSY flag set. Results were identical to real hardware with the emulator's busy wait set to 32*6 68k cycles; other values (like 24, 48 or even 30) give results that differ too much from real hardware
5) BUSY flag duration is the same on discrete YM2612 and ASIC-integrated chip
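The behaviour in the quoted list can be sketched in C roughly as follows. All names are hypothetical, time is counted in 68k cycles, and what a discrete chip returns on ports other than 0 is an assumption here (the flag simply reads as clear).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BUSY_INTERNAL_CLOCKS     32u  /* duration reported in the quoted tests */
#define M68K_CLOCKS_PER_INTERNAL 6u   /* one internal clock = 6 68k clocks */

typedef struct {
    uint64_t busy_until;   /* 68k cycle at which BUSY clears */
} ym_busy_t;

/* Port write; `port` is A1:A0 (0..3). Only data port writes (A0 = 1) set BUSY,
 * and the duration does not depend on which register was written. */
static void ym_port_write(ym_busy_t *ym, unsigned port, uint64_t now)
{
    assert(port < 4);
    if (port & 1u)
        ym->busy_until = now + BUSY_INTERNAL_CLOCKS * M68K_CLOCKS_PER_INTERNAL;
}

/* Status read; returns true when BUSY reads as set. */
static bool ym_busy_read(const ym_busy_t *ym, unsigned port, uint64_t now,
                         bool discrete_chip)
{
    assert(port < 4);
    if (discrete_chip && port != 0)
        return false;      /* assumption: flag not visible on other ports */
    return now < ym->busy_until;
}
```

A data port write at cycle 100 keeps the flag set for reads at cycles 100 through 291 and clear from cycle 292 onwards.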
I actually came to the forum today to talk about this, and I didn't expect to find this post. Well, what I am going to say kind of contradicts this information. I spent the last 5 days working on JT51, adding lessons learnt from the JT12 experience. One tool I made when developing JT51 was a sample-accurate operator in C. Given some parameters for the operator configuration, the tool prints the exact same sequence as the real YM2151. It is verified against literally thousands of chip measurements, as I automated the measurement process to take samples for each set of register values. The YM2151 has a digital output, which the YM2612 doesn't, so that work is rather easy. (In my case it wasn't, because the PCB I made for this was a disaster!)

Anyway, in my v1.0 implementation of JT51 I had to use a lot of tricks to get the exact output from the Verilog RTL code. The problem is that one operator, which is called M2 in YM2151 and is equivalent to S3 in YM2612, seems to be output in the middle of two samples. Suppose you use algorithm 7 (all operators are just summed at the output; it is supposed to be good for organ sound emulation). Then you would expect each sample to contain the sum of S1, S3, S2, S4 (or M1, M2, C1, C2 in YM2151 documentation). However, as the keyon is set for all operators on a channel, M2 (S3) gets output one sample ahead of time. This is issue #1.

Then, if you go for an algorithm where one operator modulates another, like S1->S2 in alg. 6 (called M1->C1 in YM2151), then if keyon did not occur first on S1 and then on S2, the modulation effect of S1 on S2 would be lost for the first sample. However, in all my measurements I consistently found that modulation was present from the very first sample. I took measurements primarily on channel 0, but I did take many measurements on the other channels to verify whether they were equal or not. They were. (NB: I could not measure channel 7. Only verified 0-6.) So keyon order on the operators is important. This is issue #2.

So these two issues seem to point at two hardware blocks:
Issue #1 -> related to when the accumulator unit resets the output sum, produces an output and starts to sum up the next sample.
Issue #2 -> related to when the keyon information gets processed.

I fixed these issues in v1.0 with ugly Verilog. A hack. I wanted to take Sauraen's work on the YM2612 operator and use it to replace my operator unit in JT51. The YM2151 has 8 channels instead of the 6 of the YM2612, though, so I had to make some adjustments. As expected, after many hours of work, the operator did actually work. When comparing outputs from Verilog to outputs from my C emulator (which is exact in value and sample time to the original), I found that issue #1 was manifesting for some algorithms (but not all). So M2 being off by one sample was creating an error in my regression tests. When I look at the waveforms, the signals are virtually identical despite the error, and the difference is likely not audible at all. But you know what happens to engineers, we want it perfect, so eventually I will go back to issue #1.

But as for issue #2, I found that a way to fix it without a complex implementation was to run the operator update not in constant time but in variable time. This is where the busy signal comes in. Let me explain:

When the user writes to a register, the internal circular shift registers may be at any point. The easy way to get the new data in is to wait for the shift register to output the old and then feed in the new one. If the user writes data when register 2/5 (operator 2, channel 5) is being output then it just waits (busy on) until the same register 2/5 gets output again. During that time the required register has been traversed too and so the data has been written. I made it that way in JT51 and then in JT12 too. But I think my JT12 implementation may take only 24 cycles instead of 32... I should check.
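To make the scheme in the previous paragraph concrete, here is a minimal C model of it. It is only a sketch with made-up names: the ring is a plain array, one loop iteration stands for one clock, and the slot counts of 24 and 32 used with it are illustrative (4 operators times 6 or 8 channels), not measured values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLOTS 32u   /* illustrative upper bound: 8 channels x 4 operators */

typedef struct {
    unsigned busy_clocks;    /* clocks during which busy stays on */
    unsigned commit_clock;   /* clock (1-based) at which the new value went in */
} ring_write_result_t;

/* The write arrives while slot `start` is at the output. Busy stays on until
 * `start` is at the output again; the new value is fed in when `target`
 * passes by during that revolution. */
static ring_write_result_t ring_write(uint8_t *ring, unsigned n_slots,
                                      unsigned start, unsigned target,
                                      uint8_t value)
{
    assert(n_slots > 0 && n_slots <= MAX_SLOTS);
    assert(start < n_slots && target < n_slots);

    ring_write_result_t r = { 0, 0 };
    unsigned pos = start;
    do {
        pos = (pos + 1) % n_slots;   /* one clock: the next slot is output */
        r.busy_clocks++;
        if (pos == target) {
            ring[pos] = value;       /* replace the old value as it recirculates */
            r.commit_clock = r.busy_clocks;
        }
    } while (pos != start);
    return r;
}

/* Checks every start/target pair: busy always lasts exactly n_slots clocks. */
static bool busy_time_is_constant(unsigned n_slots)
{
    uint8_t ring[MAX_SLOTS] = { 0 };
    for (unsigned s = 0; s < n_slots; s++)
        for (unsigned t = 0; t < n_slots; t++) {
            ring_write_result_t r = ring_write(ring, n_slots, s, t, 0xAA);
            if (r.busy_clocks != n_slots ||
                r.commit_clock < 1 || r.commit_clock > n_slots)
                return false;
        }
    return true;
}
```

In this model the busy time is one full revolution whatever the start and target slots are, while the moment the value actually lands depends on both.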

The problem with this is the keyon register. That register must be another circular shift register that holds the key-on state of each operator. But if it were updated as explained in the previous paragraph, then sometimes S2 would get the keyon before S1 does, and the first sample of S2 would not contain modulation from S1, as S1 would still be in the keyoff state. This is the key aspect here: keyon must occur in order for the operators. At least in YM2151 the keyon for M1 (S1) always occurs before that of C1 (S2) and the modulation is always present. Is it the same for YM2612? I wonder.
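The ordering problem can be shown with a few lines of C. This is a sketch under assumptions: the slot layout (all S1 slots first, then S3, S2, S4, six channels each) and every name below are hypothetical, chosen only to illustrate that a "wait until the slot comes around" update can reach S2 before S1.

```c
#include <assert.h>
#include <stdbool.h>

#define N_CHANNELS 6u
#define N_OPS      4u
#define N_SLOTS    (N_CHANNELS * N_OPS)

/* Hypothetical position of each operator group in the ring. */
#define OP_S1 0u
#define OP_S3 1u
#define OP_S2 2u
#define OP_S4 3u

static unsigned slot_of(unsigned op, unsigned channel)
{
    assert(op < N_OPS && channel < N_CHANNELS);
    return op * N_CHANNELS + channel;
}

/* Clocks until `target` next reaches the update point, counted from the
 * clock after `start` was there (range 1..N_SLOTS). */
static unsigned clocks_until(unsigned start, unsigned target)
{
    assert(start < N_SLOTS && target < N_SLOTS);
    return (target + N_SLOTS - start - 1u) % N_SLOTS + 1u;
}

/* True when the carrier S2 would receive key-on before its modulator S1. */
static bool s2_keyed_before_s1(unsigned start, unsigned channel)
{
    return clocks_until(start, slot_of(OP_S2, channel))
         < clocks_until(start, slot_of(OP_S1, channel));
}
```

With this layout, a key-on write for channel 0 that arrives while slot 5 is at the update point reaches S2 after 7 clocks and S1 after 19, so S2 starts without modulation. The same write arriving at slot 20 reaches S1 first.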

Note that in order to make the S1 keyon happen ahead of the S2 keyon, the register update strategy cannot take a fixed number of cycles, at least not for keyon writes. This is something I would be very grateful to Eke for verifying on the real part. Eke has shown that it takes 32 cycles to get the busy signal cleared. But... does it also take 32 cycles for keyon writes?

By the way, you may be thinking that it is not a big deal to update the keyon register in a different way. But when dealing with actual logic gates, the design can easily get very complex and large. The update method I propose is very economical in terms of circuit area and gate count, so I bet the original one is like that.

I hope I have not lost you with this long post. I know it is hard to follow. Probably even with illustrations it would be hard!
