New Documentation: An authoritative reference on the YM2612

neozeed · Post by **neozeed** » Fri Apr 15, 2016 2:49 am

I'm sure everyone's seen this, but the ym2151.c in mame is now GPL!

Commits on Jan 21, 2016
@mmicko
Setting GPL license for Jarek Burczynski, by his request (nw)
https://github.com/mamedev/mame/commit/ ... 4663ba3ba9

Code: Select all

// license:GPL-2.0+
// copyright-holders:Jarek Burczynski
/*****************************************************************************
*
*   Yamaha YM2151 driver (version 2.150 final beta)
*
******************************************************************************/

Eke · Post by **Eke** » Mon Apr 18, 2016 1:38 pm

Sauraen wrote:
I drew out some diagrams and these are the final results. There's no concept of a complete cycle for updating a channel, but if such a concept is artificially introduced, the four operators use the following values for their modulation in "Cycle 1":
Op 1: Op 1 Cycle 0, Op 1 Cycle -1
Op 2: Op 1 Cycle 1
Op 3: Op 1 Cycle 0, Op 2 Cycle 0
Op 4: Op 1 Cycle 1, Op 2 Cycle 0, Op 3 Cycle 1
There is at least one algorithm in each case that uses each of these sources, though usually most are unused (i.e. there's no algorithm where ops 1, 2, and 3 all modulate 4).

This can also be replicated simply by processing the operators in the order 3, 1, 4, 2, and in every case using the most recent output of the needed operator. (So that when you process op 3, you're using the results of ops 1 and 2 from the previous cycle; and when you process op 4, you're using the most recent 1 and 3 but old 2.)

To be clear, the operator numbering I'm using is such that in Algorithm 0 (chain of operators), they are in the order 1-2-3-4.

The "two operator units in parallel" claim that Steve Snake made--that the chip processes operators 1 and 3 at the same time, then it processes 2 and 4 at the same time--is functionally correct, in so far as it produces the same relationships between operators, though it's not strictly how the chip behaves.

I didn't notice it initially, but it seems the MAME implementation is a little bit different.

If you look at current channel algorithm calculation code:

https://bitbucket.org/eke/genesis-plus- ... 2612.c-843

Code: Select all

INLINE void setup_connection( FM_CH *CH, int ch )
{
  INT32 *carrier = &out_fm[ch];

  INT32 **om1 = &CH->connect1;
  INT32 **om2 = &CH->connect3;
  INT32 **oc1 = &CH->connect2;

  INT32 **memc = &CH->mem_connect;

  switch( CH->ALGO ){
    case 0:
      /* M1---C1---MEM---M2---C2---OUT */
      *om1 = &c1;
      *oc1 = &mem;
      *om2 = &c2;
      *memc= &m2;
      break;

    case 1:
      /* M1------+-MEM---M2---C2---OUT */
      /*      C1-+                     */
      *om1 = &mem;
      *oc1 = &mem;
      *om2 = &c2;
      *memc= &m2;
      break;
    case 2:
      /* M1-----------------+-C2---OUT */
      /*      C1---MEM---M2-+          */
      *om1 = &c2;
      *oc1 = &mem;
      *om2 = &c2;
      *memc= &m2;
      break;
    case 3:
      /* M1---C1---MEM------+-C2---OUT */
      /*                 M2-+          */
      *om1 = &c1;
      *oc1 = &mem;
      *om2 = &c2;
      *memc= &c2;
      break;
    case 4:
      /* M1---C1-+-OUT */
      /* M2---C2-+     */
      /* MEM: not used */
      *om1 = &c1;
      *oc1 = carrier;
      *om2 = &c2;
      *memc= &mem;  /* store it anywhere where it will not be used */
      break;
    case 5:
      /*    +----C1----+     */
      /* M1-+-MEM---M2-+-OUT */
      /*    +----C2----+     */
      *om1 = 0;  /* special mark */
      *oc1 = carrier;
      *om2 = carrier;
      *memc= &m2;
      break;
    case 6:
      /* M1---C1-+     */
      /*      M2-+-OUT */
      /*      C2-+     */
      /* MEM: not used */
      *om1 = &c1;
      *oc1 = carrier;
      *om2 = carrier;
      *memc= &mem;  /* store it anywhere where it will not be used */
      break;
    case 7:
      /* M1-+     */
      /* C1-+-OUT */
      /* M2-+     */
      /* C2-+     */
      /* MEM: not used*/
      *om1 = carrier;
      *oc1 = carrier;
      *om2 = carrier;
      *memc= &mem;  /* store it anywhere where it will not be used */
      break;
  }

  CH->connect4 = carrier;
}

and
https://bitbucket.org/eke/genesis-plus- ... 612.c-1423

Code: Select all

unsigned int eg_out = volume_calc(&CH->SLOT[SLOT1]);

    m2 = c1 = c2 = mem = 0;

    *CH->mem_connect = CH->mem_value;  /* restore delayed sample (MEM) value to m2 or c2 */
    {
      INT32 out = CH->op1_out[0] + CH->op1_out[1];
      CH->op1_out[0] = CH->op1_out[1];

      if( !CH->connect1 ){
        /* algorithm 5  */
        mem = c1 = c2 = CH->op1_out[0];
      }else{
        /* other algorithms */
        *CH->connect1 += CH->op1_out[0];
      }

      CH->op1_out[1] = 0;
      if( eg_out < ENV_QUIET )  /* SLOT 1 */
      {
        if (!CH->FB)
          out=0;

        CH->op1_out[1] = op_calc1(CH->SLOT[SLOT1].phase, eg_out, (out<<CH->FB) );
      }
    }

    eg_out = volume_calc(&CH->SLOT[SLOT3]);
    if( eg_out < ENV_QUIET )    /* SLOT 3 */
      *CH->connect3 += op_calc(CH->SLOT[SLOT3].phase, eg_out, m2);

    eg_out = volume_calc(&CH->SLOT[SLOT2]);
    if( eg_out < ENV_QUIET )    /* SLOT 2 */
      *CH->connect2 += op_calc(CH->SLOT[SLOT2].phase, eg_out, c1);

    eg_out = volume_calc(&CH->SLOT[SLOT4]);
    if( eg_out < ENV_QUIET )    /* SLOT 4 */
      *CH->connect4 += op_calc(CH->SLOT[SLOT4].phase, eg_out, c2);
      
    /* store current MEM */
    CH->mem_value = mem;

I am not sure how well you read C, but for algorithm 0, the modulators used by each operator (OP1, OP2, OP3, OP4) at cycle n will be:

(OP1(n-2) + OP1(n-1)) modulating OP1(n)
**OP1(n-1) modulating OP2(n)**
OP2(n-1) modulating OP3(n)
OP3(n) modulating OP4(n)

The one in bold in particular contradicts your observations.

More generally, the way it is implemented, OP1 never modulates OP2 or OP4 directly on the same cycle; it always modulates them with a delay of one cycle (sample). Moreover, in the case of algorithm 1 or algorithm 5, OP3(n) is actually modulated by OP1(n-2)!

I am not sure if this is an error in the MAME implementation, made when they introduced the MEM register (used to emulate the cycle delay you observed for the OP3 modulator/s), with regard to the OP1 feedback registers, or if this is correct behavior, but it seems a little weird... What do you think?
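The delays listed above can be checked mechanically. The following is only a minimal sketch, not MAME code: it copies the data flow of the quoted chan_calc excerpt for algorithm 0, but replaces every operator output with the index of the sample that produced it. The names `Trace`, `Chan` and `mame_step_algo0` are made up for this illustration.

```c
/* Sample index of the modulator each operator consumed (0 = nothing yet). */
typedef struct {
    int fb_old, fb_new;  /* the two OP1 samples summed for OP1 feedback */
    int op2_src;         /* OP1 sample that modulated OP2               */
    int op3_src;         /* OP2 sample that modulated OP3 (through MEM) */
    int op4_src;         /* OP3 sample that modulated OP4               */
} Trace;

/* Per-channel state kept between samples, as in the quoted excerpt. */
typedef struct {
    int op1_out[2];
    int mem_value;
} Chan;

/* One sample of algorithm 0; n is the current sample index (start at 1).
   Every operator "outputs" n, so the sources can be read back directly. */
Trace mame_step_algo0(Chan *ch, int n)
{
    Trace t;
    int m2, c1, c2, mem;

    m2 = c1 = c2 = mem = 0;
    m2 = ch->mem_value;               /* *CH->mem_connect = CH->mem_value */

    t.fb_old = ch->op1_out[0];        /* feedback sum uses both entries   */
    t.fb_new = ch->op1_out[1];
    ch->op1_out[0] = ch->op1_out[1];
    c1 += ch->op1_out[0];             /* connect1 = &c1: previous OP1 out */
    ch->op1_out[1] = n;               /* OP1 is only computed now         */

    t.op3_src = m2;  c2  += n;        /* SLOT3 (OP3): input m2, adds to c2 */
    t.op2_src = c1;  mem += n;        /* SLOT2 (OP2): input c1, adds to mem */
    t.op4_src = c2;                   /* SLOT4 (OP4): input c2             */

    ch->mem_value = mem;              /* store current MEM                 */
    return t;
}
```

Starting from a zeroed `Chan` and stepping through samples 1 to 5, the last trace reports OP1 samples 3 and 4 as feedback sources, OP1 sample 4 for OP2, OP2 sample 4 for OP3, and OP3 sample 5 for OP4, which is the n-2/n-1, n-1, n-1, n pattern described in the post.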

Sauraen · Post by **Sauraen** » Tue Apr 19, 2016 3:33 am

I took the bait and started an analysis of the control unit that generates the signals specifying which values are added during which cycles (the op_algorithm_ctl signal in my VHDL version of the operator unit). Here are the results so far:

All these observations are directly from the YM2203, not the YM2612, but the results should be equivalent.

We already know that the chip processes one operator from each channel per cycle. We don't yet know the relative phase between the operators, that is whether the chip processes operator 1 from all channels before moving on to the next operator, or whether it interleaves them in some fashion (for various reasons I think the latter is much more likely). But within one channel, it definitely processes one operator, then operators from two other channels, then the next operator from the first channel again. (Remember this is on the YM2203 with three channels.) Also remember that this is from the point of view of any point in the pipeline, and that each operator takes 6 cycles to go through the whole pipeline. The result is that the operators are evaluated in the order 1-3-2-4 OR 1-4-2-3 (don't know yet which one) [EDIT: later confirmed to be 1324].

For this next bit, keep in mind that I'm considering the "first" step of evaluation to be the step of summing up to 2 operator output values to produce the feedback/feedforward phase for the next operator. Therefore:

when operator 1 begins evaluation, the result from operator 2 is ready (op_result_internal is the result of the most recent evaluation of operator 2)
when operator 3 begins evaluation, op_result_internal is from operator 4
when operator 2 begins evaluation, op_result_internal is from operator 1
when operator 4 begins evaluation, op_result_internal is from operator 3

So from the control unit that I've analyzed, I've made a handwritten table of the values of x and y (the operator values being added) in the case of each algorithm and each operator phase. I won't reproduce it here because it basically amounts to the same information as the diagrams of the operator connections in the algorithms, but here's the key points. Remember the operator unit stores 3 values, "Old 1" (two evaluations ago), "1" (most recent evaluation of operator 1), and "2" (most recent evaluation of operator 2).

In the phase where op_result_internal is 2 (and we're beginning to evaluate operator 1), X is always "Old 1" and Y is always "1". This matches all previous results.
In the phase where op_result_internal is 4 (and we're beginning to evaluate operator 3), sometimes X is "2" and sometimes Y is "1", for the appropriate algorithms. That is, operator 3 is always calculated from the most recent results of operators 1 and 2, but those will be the results received 1 and 3 phases ago and therefore BEGUN 3 and 5 phases ago. (Which is which depends on whether the overall order is 1324 or 1423. [EDIT: later confirmed to be 1324])
In the phase where op_result_internal is 1 (and we're beginning to evaluate operator 2), sometimes Y is op_result_internal (for the appropriate algorithms). This is the most recently evaluated result from operator 1, just finishing now; also available (about to be overwritten) are the previous two results from operator 1, but these are not used (consider this confirmed). That is, operator 2 is always calculated from the most recent result of operator 1, the one whose calculation was begun 2 phases ago.
Finally, in the phase where op_result_internal is 3 (and we're beginning to evaluate operator 4), "1", "2", and op_result_internal (i.e. 3) are all used as necessary.

The overall conclusion is that, yes, in all cases an operator's value is calculated from the most recent result of the required operator (except of course for the feedback, which intentionally uses the last two values of operator 1). However, this most recent result could have been received up to 3 phases ago and therefore begun evaluating up to 5 phases ago. Therefore, I retract my previous statement that evaluating the operators in some other order (3142 or something) will produce the correct result--it won't. If every operator is evaluated all at once (how it would normally be done in code rather than in VLSI), there has to be storage for up to 5 phases.

This is how I would organize the code, assuming that the evaluation order is actually 1324 [EDIT: later confirmed]. It's based on exactly what the chip does in the same order:

Each operator has a variable for its most recent output, op[n].output. (n is 1-indexed here for consistency with the above)
Three other variables are available: outOld1, out1, out2 (probably not named these things!)
Evaluate operator 1 using outOld1 and out1. Also copy op[2].output to out2.
Evaluate operator 3 using out1 and out2.
Evaluate operator 2 using op[1].output. Also copy out1 to outOld1 and op[1].output to out1.
Evaluate operator 4 using out1, out2, and op[3].output.

If it turns out that the actual order is 1423, simply switch the steps for evaluating operators 3 and 4.
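The steps above can be sketched in C as follows. This is a hedged illustration only: it is hard-wired to algorithm 0 (chain 1-2-3-4), the feedback is simplified to a plain sum without the FB shift, and `OpUnit`, `EvalFn`, `trace_eval` and `chip_step_algo0` are hypothetical names, not taken from any emulator. The tracing evaluator returns the current sample index instead of audio so the source of each modulation input can be inspected.

```c
/* Hypothetical operator evaluator: takes operator number and modulation. */
typedef int (*EvalFn)(int op, int mod);

typedef struct {
    int output[5];            /* op[n].output, 1-indexed; [0] unused */
    int outOld1, out1, out2;  /* the three stored values             */
} OpUnit;

/* One sample of algorithm 0, evaluated in the order 1-3-2-4. */
void chip_step_algo0(OpUnit *u, EvalFn eval)
{
    /* Operator 1: feedback from its two previous results. */
    u->output[1] = eval(1, u->outOld1 + u->out1);
    u->out2 = u->output[2];               /* still last sample's op 2 */

    /* Operator 3: in algorithm 0 it is fed by operator 2, via out2. */
    u->output[3] = eval(3, u->out2);

    /* Operator 2: uses the operator 1 result that has just finished. */
    u->output[2] = eval(2, u->output[1]);
    u->outOld1 = u->out1;
    u->out1 = u->output[1];

    /* Operator 4: in algorithm 0 it is fed by the fresh operator 3. */
    u->output[4] = eval(4, u->output[3]);
}

/* Tracing stand-in for a real operator (illustration only). */
static int g_sample;       /* current sample index, set by the caller */
static int g_seen_mod[5];  /* modulation input last seen per operator */

static int trace_eval(int op, int mod)
{
    g_seen_mod[op] = mod;
    return g_sample;       /* the "output" is just the sample index */
}
```

With a zeroed `OpUnit`, setting `g_sample` to 1 through 5 and calling `chip_step_algo0(&u, trace_eval)` each time leaves `g_seen_mod[2] == 5` (operator 2 sees the operator 1 result of the same sample) and `g_seen_mod[3] == 4` (operator 3 sees the operator 2 result of the previous sample).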

THEREFORE, I think the MEM thing is wrong, or at least your description of it is. (I certainly can read C, but that's about the least readable C I've ever seen!) No matter how you cut it, operator 2 gets evaluated with the most recent output from operator 1, and similarly operator 4 uses the most recent output from operator 3. Those are clear. The tricky part is that assuming it's 1324 [EDIT: later confirmed], operator 4 is evaluated with the result of operator 2 which was calculated 5 phases ago, not the one from last phase (on the chip it's still not done yet). Similarly, operator 3 is evaluated with the 5-phase-old operator 1 value, not the one from 1 phase ago.

Eke · Post by **Eke** » Tue Apr 19, 2016 4:50 pm

Ouch, I've read your post multiple times but it is still quite confusing, I guess because I'm not familiar with some of the terms you are using, or because I am missing some basic concepts you figured out about how the operator unit functions in hardware.

Sauraen wrote: The result is that the operators are evaluated in the order 1-3-2-4 OR 1-4-2-3 (don't know yet which one).

For this next bit, keep in mind that I'm considering the "first" step of evaluation to be the step of summing up to 2 operator output values to produce the feedback/feedforward phase for the next operator. Therefore:

when operator 1 begins evaluation, the result from operator 2 is ready (op_result_internal is the result of the most recent evaluation of operator 2)

when operator 3 begins evaluation, op_result_internal is from operator 4

when operator 2 begins evaluation, op_result_internal is from operator 1

when operator 4 begins evaluation, op_result_internal is from operator 3

What do you mean by "evaluation of operator X" exactly? The operator output calculation, or its use as modulation input for another operator?
And what exactly is "op_result_internal"? An operator output or a modulation input?
It's not very clear to me what you mean by "when operator X begins evaluation, op_result_internal is from operator Y", and I don't quite follow what this implies. Is that related to operator execution order or something else?

Sauraen wrote:So from the control unit that I've analyzed, I've made a handwritten table of the values of x and y (the operator values being added) in the case of each algorithm and each operator phase. I won't reproduce it here because it basically amounts to the same information as the diagrams of the operator connections in the algorithms, but here's the key points. Remember the operator unit stores 3 values, "Old 1" (two evaluations ago), "1" (most recent evaluation of operator 1), and "2" (most recent evaluation of operator 2).

In the phase where op_result_internal is 2 (and we're beginning to evaluate operator 1), X is always "Old 1" and Y is always "1". This matches all previous results.

In the phase where op_result_internal is 4 (and we're beginning to evaluate operator 3), sometimes X is "2" and sometimes Y is "1", for the appropriate algorithms. That is, operator 3 is always calculated from the most recent results of operators 1 and 2, but those will be the results received 1 and 3 phases ago and therefore BEGUN 3 and 5 phases ago. (Which is which depends on whether the overall order is 1324 or 1423.)

In the phase where op_result_internal is 1 (and we're beginning to evaluate operator 2), sometimes Y is op_result_internal (for the appropriate algorithms). This is the most recently evaluated result from operator 1, just finishing now; also available (about to be overwritten) are the previous two results from operator 1, but these are not used (consider this confirmed). That is, operator 2 is always calculated from the most recent result of operator 1, the one whose calculation was begun 2 phases ago.

Finally, in the phase where op_result_internal is 3 (and we're beginning to evaluate operator 4), "1", "2", and op_result_internal (i.e. 3) are all used as necessary.

Again, I don't get what you mean by "op_result_internal is X (and we're beginning to evaluate operator Y)"; it seems like 1/2 and 3/4 go in pairs for some reason I don't understand. Also, what is that "phase" you are talking about?

Sauraen wrote:The overall conclusion is that, yes, in all cases an operator's value is calculated from the most recent result of the required operator (except of course for the feedback, which intentionally uses the last two values of operator 1). However, this most recent result could have been received up to 3 phases ago and therefore begun evaluating up to 5 phases ago.

What do you mean by "received"? I don't understand what that last sentence is implying either.

Sauraen wrote:Therefore, I retract my previous statement that evaluating the operators in some other order (3142 or something) will produce the correct result--it won't. If every operator is evaluated all at once (how it would normally be done in code rather than in VLSI), there has to be storage for up to 5 phases.

This is how I would organize the code, assuming that the evaluation order is actually 1324. It's based on exactly what the chip does in the same order:

Each operator has a variable for its most recent output, op[n].output. (n is 1-indexed here for consistency with the above)

Three other variables are available: outOld1, out1, out2 (probably not named these things!)

Evaluate operator 1 using outOld1 and out1. Also copy op[2].output to out2.

Evaluate operator 3 using out1 and out2.

Evaluate operator 2 using op[1].output. Also copy out1 to outOld1 and op[1].output to out1.

Evaluate operator 4 using out1, out2, and op[3].output.

If it turns out that the actual order is 1423, simply switch the steps for evaluating operators 3 and 4.

I don't understand; evaluating the operators in the order 3142 could give the exact same result as above, with:
- op3 using out1 and out2
- op1 using outOld1 and out1, then copy out1 to outOld1 and op[1].output to out1
- op4 using out1 (same as op[1].output at this stage), out2 and op[3].output
- op2 using out1 (same as op[1].output at this stage), then copy op[2].output to out2

It could even be 1342 with:
- op1 using outOld1 and out1, then copy out1 to outOld1 and the op output to out1
- op3 using outOld1 and out2
- op4 using out1 (same as op[1].output at this stage), out2 and the previous op output (corresponding to op[3].output at this stage)
- op2 using out1 (same as op[1].output at this stage), then copy the op output to out2

with the benefit of not needing to store the output of each operator but only the last calculated output, by cleverly using the other three existing registers keeping old op1 / op2 outputs.

In any case, unless I'm missing something, I don't see the difference with what you described in your initial post.
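Whether these orderings really yield the same values can be tested with a small program. The sketch below is an assumption-laden toy: `toy_eval` is an arbitrary deterministic function standing in for a real operator, and the wiring is a hypothetical superset in which op3 uses both out1 and out2 and op4 uses out1, out2 and op3 (no real algorithm uses all of them at once). It only compares the data dependencies of the three step lists, not hardware timing.

```c
typedef struct {
    int op[5];                /* op[n].output, 1-indexed; op[0] unused */
    int outOld1, out1, out2;
} Unit;

/* Toy stand-in for an operator: result depends on op, sample n and input. */
static int toy_eval(int op, int n, int mod)
{
    unsigned x = (unsigned)op * 131u + (unsigned)n * 31u + (unsigned)mod * 17u;
    return (int)((x * x + 7u) % 251u);
}

/* Order 1-3-2-4, following Sauraen's step list. */
static void step_1324(Unit *u, int n)
{
    u->op[1] = toy_eval(1, n, u->outOld1 + u->out1);
    u->out2 = u->op[2];
    u->op[3] = toy_eval(3, n, u->out1 + u->out2);
    u->op[2] = toy_eval(2, n, u->op[1]);
    u->outOld1 = u->out1;
    u->out1 = u->op[1];
    u->op[4] = toy_eval(4, n, u->out1 + u->out2 + u->op[3]);
}

/* Order 3-1-4-2, following the first alternative list. */
static void step_3142(Unit *u, int n)
{
    u->op[3] = toy_eval(3, n, u->out1 + u->out2);
    u->op[1] = toy_eval(1, n, u->outOld1 + u->out1);
    u->outOld1 = u->out1;
    u->out1 = u->op[1];
    u->op[4] = toy_eval(4, n, u->out1 + u->out2 + u->op[3]);
    u->op[2] = toy_eval(2, n, u->out1);
    u->out2 = u->op[2];
}

/* Order 1-3-4-2, following the second alternative list. */
static void step_1342(Unit *u, int n)
{
    u->op[1] = toy_eval(1, n, u->outOld1 + u->out1);
    u->outOld1 = u->out1;
    u->out1 = u->op[1];
    u->op[3] = toy_eval(3, n, u->outOld1 + u->out2);
    u->op[4] = toy_eval(4, n, u->out1 + u->out2 + u->op[3]);
    u->op[2] = toy_eval(2, n, u->out1);
    u->out2 = u->op[2];
}

/* Returns 1 if all three orderings give identical operator outputs. */
int orders_agree(int samples)
{
    Unit a = {0}, b = {0}, c = {0};
    for (int n = 0; n < samples; n++) {
        step_1324(&a, n);
        step_3142(&b, n);
        step_1342(&c, n);
        for (int k = 1; k <= 4; k++)
            if (a.op[k] != b.op[k] || a.op[k] != c.op[k])
                return 0;
    }
    return 1;
}
```

Calling `orders_agree(1000)` returns 1 under these assumptions, which supports the point that the three lists describe the same relationships between operator outputs when each sample is computed all at once in software.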

Sauraen wrote:THEREFORE, I think the MEM thing is wrong, or at least your description of it is. (I certainly can read C, but that's about the least readable C I've ever seen!)

I agree Jarek's implementation can be quite hard to read; it took me quite some time to get familiar with it, but I can assure you the MEM thing does exactly what you are describing (i.e. it corresponds to the modulation input used for op3, which is either out1, out2 or the sum of both).

I can explain it if you (or anyone) are interested.

Basically, in "setup_connection" function, memory pointers are used to connect (depending on algorithm):

- the output of op1 (CH->connect1) to either intermediate register (mem), modulation inputs of op2 (c1) and op4 (c2) or accumulator (out_fm).
- the output of op2 (CH->connect2) to either intermediate register (mem) or accumulator (out_fm).
- the output of the intermediate register (CH->mem_connect) to modulation input of op3 (m2) or op4 (c2)
- the output of op3 (CH->connect3) to either modulation inputs of op4 (c2) or accumulator (out_fm).

NB: the output of op4 (CH->connect4) is always connected to accumulator (out_fm) by default

Next, in the "chan_calc" function, the intermediate register output (CH->mem_connect) is updated with the saved mem value, and the operator outputs are calculated sequentially and added to their respective CH->connect targets, with the exception of op1. For some reason, the output of op1 (CH->connect1) is updated with the out1 value (the previous op1 output) BEFORE the op1 output is actually calculated and copied to out1. The result is that it works the same as what you describe, with the exception that:
- op2 & op4 are using out1 instead of op[1].output
- op3 is using outOld1 instead of out1

As I said, I'm not sure if this is intended or a mistake made when dealing with out1 and outOld1 (CH->op1_out[1] and CH->op1_out[0] in the MAME code).
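The pointer-based routing is easier to see on a much smaller case. The following is a two-operator toy, not the MAME code: `Route`, `setup`, `calc` and `toy_op` are invented for this sketch, and only the names `c1` and `out_fm` are borrowed from the excerpt. The "algorithm" is selected once by pointing each operator's output at either a modulation input or the accumulator; the per-sample code then just adds through the pointers.

```c
typedef struct {
    int *connect1;   /* where operator 1's output is added */
    int *connect2;   /* where operator 2's output is added */
} Route;

static int c1;       /* modulation input of operator 2 */
static int out_fm;   /* channel accumulator            */

/* Toy operator: its own value plus twice the modulation input. */
static int toy_op(int value, int mod)
{
    return value + 2 * mod;
}

/* Pick the wiring once: serial (op1 modulates op2) or parallel. */
void setup(Route *r, int serial)
{
    r->connect1 = serial ? &c1 : &out_fm;
    r->connect2 = &out_fm;
}

/* Per-sample calculation: identical code for both wirings. */
int calc(const Route *r, int op1_value, int op2_value)
{
    c1 = 0;
    out_fm = 0;
    *r->connect1 += toy_op(op1_value, 0);   /* op1 has no modulator here */
    *r->connect2 += toy_op(op2_value, c1);  /* c1 stays 0 when parallel  */
    return out_fm;
}
```

With the serial wiring, `calc(&r, 10, 1)` gives 21 (operator 1 reaches the output only through operator 2); with the parallel wiring it gives 11 (both are summed in the accumulator). The design keeps the inner loop free of a per-sample switch on the algorithm, at the cost of readability.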

Sauraen wrote:No matter how you cut it, operator 2 gets evaluated with the most recent output from operator 1, and similarly operator 4 uses the most recent output from operator 3. Those are clear. The tricky part is that assuming it's 1324, operator 4 is evaluated with the result of operator 2 which was calculated 5 phases ago, not the one from last phase (on the chip it's still not done yet). Similarly, operator 3 is evaluated with the 5-phase-old operator 1 value, not the one from 1 phase ago. (If it's actually 1423, the only difference is that now operator 4 uses the old operator 1, and operator 3 uses the old operator 2.)

From what you said before, in both cases, op3 uses the old op2 output. Or do you mean that with 1423, it would use the new op2 output?

Sauraen · Post by **Sauraen** » Tue Apr 19, 2016 11:36 pm

Eke wrote:Ouch

I don't have time right now for a giant answer, but I think this is what you're missing. Please also refer to my operator unit VHDL code I linked in the previous post.

The operator unit, like all the other units in the YM2203, is pipelined. It takes six internal clock cycles to compute the output of an operator. But at any given moment, the six pipeline stages are all working on a different operator's data. So for instance, at one instant in the operator unit:

Pipeline stage 1, the creation of the feedback/modulation signal, is processing channel 1 operator 2
Pipeline stage 2, the feedback shifter and adder of the PG phase, is processing channel 3 operator 3
Pipeline stage 3, the logsin table and adder of the EG attenuation, is processing channel 2 operator 3
Pipeline stage 4, the exponential table, is processing channel 1 operator 3
Pipeline stage 5, the floating-point-to-integer conversion, is processing channel 3 operator 1
Pipeline stage 6, which doesn't do any processing, is holding channel 2 operator 1

This example is under two assumptions, first that the operator order is 1324 [EDIT: later confirmed], and second that the chip processes the same operator from each channel before going to the next operator (it processes operator 1 of all three channels, then operator 3 of all three channels, etc.--[EDIT: later confirmed]).

Because of this pipeline, in any given stage, every 3 clock cycles an operator from our chosen channel is being processed. But the previous and next stages are at the same time processing different operators from different channels.

So therefore, at stage 1 above (which produces the feedback/modulation signal and is critical for answering the question of which cycle's value is used),

in clock cycle 0, channel 1 operator 1 is being processed
in clock cycle 3, channel 1 operator 3 is being processed
in clock cycle 6, channel 1 operator 2 is being processed
in clock cycle 9, channel 1 operator 4 is being processed

with the other channels' operators being processed in the intervening clock cycles.

This is the origin of my term "phase"--we can ignore the other channels for now and say there are 4 phases, during each one a different operator is being processed.

Since it takes 6 cycles (2 phases) for an operator's value to finish being processed, in the above example operator 1 was begun on cycle 0, which means it's done and its result is available on cycle 6, which is also when operator 2 is being processed (again just by this one pipeline stage). So the question was whether at that time, the unit takes the just-finished value of operator 1, or uses the previously-stored value of operator 1, when calculating the modulation value which will be used for operator 2. And the answer, which I confirmed from the control unit, is that it takes the just-finished value of operator 1, not the old (and definitely not the oldold) value.
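The schedule described in this post can be tabulated with a few lines of C. This is a sketch under the same two assumptions stated above (operator order 1324, and the same operator of all three channels before moving to the next operator); `Slot`, `stage_contents` and `result_ready_cycle` are names chosen for the illustration.

```c
#define NUM_CHANNELS 3   /* YM2203 */
#define NUM_STAGES   6   /* depth of the operator pipeline */

typedef struct {
    int channel;   /* 1..3 */
    int op;        /* 1..4 */
} Slot;

/* Which channel/operator occupies `stage` (1..6) at clock `cycle` (>= 0).
   Cycle 0 is when channel 1 operator 1 enters stage 1. */
Slot stage_contents(int cycle, int stage)
{
    static const int order[4] = {1, 3, 2, 4};
    const int period = NUM_CHANNELS * 4;           /* 12 slots per sample */
    /* Stage s holds what entered stage 1 (s - 1) cycles earlier. */
    int k = ((cycle - (stage - 1)) % period + period) % period;
    Slot s;
    s.channel = k % NUM_CHANNELS + 1;
    s.op = order[k / NUM_CHANNELS];
    return s;
}

/* Cycle at which a result that entered stage 1 at `start` is available. */
int result_ready_cycle(int start)
{
    return start + NUM_STAGES;
}
```

At cycle 6, `stage_contents` returns channel 1 operator 2 for stage 1, channel 3 operator 3 for stage 2, and so on down to channel 2 operator 1 for stage 6, matching the snapshot listed in the post. `result_ready_cycle(0)` is 6, the same cycle in which channel 1 operator 2 enters stage 1, which is why the just-finished operator 1 value can be used there.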

Sik · Post by **Sik** » Wed Apr 20, 2016 12:20 am

Is it me or are you all overcomplicating things? Can't you just do it like the YM2612 and just process one operator at a time? (whether a new or an old value gets used will be implicitly handled by just doing that)

Mask of Destiny · Post by **Mask of Destiny** » Wed Apr 20, 2016 1:13 am

Sik wrote:Is it me or are you all overcomplicating things? Can't you just do it like the YM2612 and just process one operator at a time? (whether a new or an old value gets used will be implicitly handled by just doing that)

So the point that Sauraen is making is that the YM2612 doesn't exactly process one operator at a time (well assuming that it behaves like the YM2203 that it is based on anyway). Any given stage in the operator pipeline evaluates a single operator at a time, but the stages are operating in parallel and won't be processing the same operator. Depending on the order of evaluation it's possible that an "old" value is used when modulating due to the pipeline latency.

That said, if it iterates through an operator for all the channels first (a big assumption at the moment) then it's not going to be an issue on the YM2612, regardless of operator evaluation order, since there are as many channels as there are pipeline stages. Of course, it's conceivable that it processes all the operators for part 1 before moving on to the part 2 operators in which case the behavior would be the same as the YM2203.

Sauraen · Post by **Sauraen** » Wed Apr 20, 2016 2:58 am

Sik wrote:Is it me or are you all overcomplicating things?

Yes, we are, in so far as I can't even think of a test sound that would sound audibly different whether modulators were delayed by a cycle or not.

Sik wrote:Can't you just do it like the YM2612 and just process one operator at a time?

Mask of Destiny wrote:the YM2612 doesn't exactly process one operator at a time

Exactly. However...

Mask of Destiny wrote:if it iterates through an operator for all the channels first (a big assumption at the moment) then it's not going to be an issue on the YM2612, regardless of operator evaluation order, since there are as many channels as there are pipeline stages.

Nope. I purposely didn't include any discussion of the YM2612 in the last post (though I have in past posts, plus it's covered in the VHDL model), because it's in fact MORE complicated than the YM2203, not less. For starters, the YM2612 pipeline contains a 6-stage-long shift register with the sole purpose of delaying the operator evaluation so that it's actually 12 cycles, not 6. This makes sense, because if we double the number of channels and also double the number of pipeline stages, the algorithm will be the same as in the YM2203. [EDIT: the following is not true] I am guessing, though I have not confirmed this in any way, that the chip evaluates the channels in the order 1-4-2-5-3-6; that is, it evaluates channel 1, then channel 1 in the high bank, then channel 2, etc.[EDIT: later confirmed to be 123456]

So if you want to consider this all from the YM2612, in my last post make the cycles listed 0-6-12-18 instead of 0-3-6-9. Still, the result from operator 2 is available when operator 1 is being calculated, and so on--the logic is exactly the same.

r57shell · Post by **r57shell** » Thu Apr 21, 2016 1:32 pm

Anyway, you are overcomplicating things.
Sorry if I'm being Captain Obvious.

Let's say:
1) We have 3 workers.
2) Each has a different profession, so each can do only a single type of job.
3) They can't do several jobs at once.
4) They make only a single type of product.
5) Every job takes the same amount of time.
So, making this product requires these steps:
The first worker (A) doesn't need anything and can create the starting product = completion state 1.
The starting product can be advanced to the next completion state only by the second worker (B) = completion state 2.
And the last completion state is made by the last worker (C) = finished product = completion state 3.
At time zero we don't have any product. All workers are free, but only A can do his job.

Code: Select all

A B C
1 X X

where A B C are the workers, 1 is the id of the product, and X means slacking (doing nothing).
Next time step, A can start making the next product, B has work to do... so the whole table will look like this:

Code: Select all

A B C
1 X X
2 1 X
3 2 1
4 3 2
5 4 3
....
500 499 498
...

The idea is simple: each worker "updates" a product to its next completion state. And you don't need 6 workers for each kind of job (6*3) to make 6 products.

Here it is the same; the difference is that the products are remade: the channel results are remade.
I'll use the term Channel for the 6 FM channels, and Operator for the 4 operators.
(just to make it strict, at least for myself, because I was struggling to remember this stuff)
So, from scratch I suppose it should look like:

Code: Select all

But take into consideration that there are 6 channels and 4 operators.
So, we call the channel output the "product". The number of workers = the number of steps needed to update one operator. One important addition in this case is that each worker must do his job 4 times per channel (once for each operator).
Let's say there are again 3 workers; then the sequence will look like

Code: Select all

A   B   C
11 64 54
21 11 64
31 21 11
12 31 21
22 12 31
32 22 12
13 32 22
23 13 32
33 23 13
14 33 23
24 14 33
34 24 14
41 34 24
51 41 34
...
54 44 63
64 54 44
...repeat...

where the first digit is the channel id and the second is the id of the operator being processed. They could place it in another order... for example, a whole cycle of channels first, then operators.
Anyway, a full update of the channels takes only 6*4 steps, with all jobs being done at once (all workers are busy).
But what I think should be taken from this example is that the result does not depend on the order of the channels.
So, you can think about it as processing a single channel; only the order of updating the operators and the completion steps (you may say phases) matter.

In summary, I think the process should be well explainable by defining N steps (the jobs in the previous examples) and their order for the operators.
For the previous example, it is representable as an array: A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4 (job, operator)
To verify that, you may replace all numbers not starting with 1 with XX, and see that the processing of a single channel happens in exactly this way.
So, in my theory, all channels should be independent.
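The worker table can be generated rather than written out by hand. The sketch below assumes the ordering used in the table above (channels 1 to 3 for operators 1 to 4, then channels 4 to 6 likewise) and a pipeline that is already full, so workers B and C start on jobs left over from the previous round. `job_code` and `print_table` are names invented for the example.

```c
#include <stdio.h>

#define CHANNELS  6
#define OPERATORS 4
#define WORKERS   3   /* pipeline steps A, B, C                  */
#define BANK      3   /* channels per group, as in the table     */

/* Job handled by `worker` (0 = A, 1 = B, 2 = C) at time `step`,
   encoded as channel * 10 + operator, e.g. 64 = channel 6 operator 4. */
int job_code(int step, int worker)
{
    const int total = CHANNELS * OPERATORS;     /* 24 jobs per round */
    const int per_bank = BANK * OPERATORS;      /* 12 jobs per bank  */
    /* Worker w receives what worker A handled w steps earlier. */
    int j = ((step - worker) % total + total) % total;
    int bank = j / per_bank;
    int op = (j % per_bank) / BANK + 1;
    int ch = bank * BANK + j % BANK + 1;
    return ch * 10 + op;
}

/* Print the first `rows` rows of the table. */
void print_table(int rows)
{
    printf("A  B  C\n");
    for (int t = 0; t < rows; t++)
        for (int w = 0; w < WORKERS; w++)
            printf("%d%c", job_code(t, w), w == WORKERS - 1 ? '\n' : ' ');
}
```

`print_table(3)` prints the header followed by `11 64 54`, `21 11 64` and `31 21 11`. Filtering the output for codes whose first digit is 1 gives the single-channel sequence A1 B1 C1, A2 B2 C2 and so on, which is the independence argument made above.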

Sauraen · Post by **Sauraen** » Thu Apr 21, 2016 10:59 pm

r57shell wrote:all channels should be independent

Yes, I wasn't saying they weren't. Obviously parameters of one channel don't affect the others (at least not until we get to the DAC...). The order of evaluation of the channels won't make any difference to whether a software implementation produces a sample-accurate result or not. It does make a difference to my VHDL code, though--otherwise the channels will all work, but which channel's registers and values are being used by which units on what cycles will be all mixed up. (E.g. I might have ch3 op4 getting its envelope attenuation value from ch6 op4 or from ch3 op3.)

r57shell · Post by **r57shell** » Thu Apr 21, 2016 11:48 pm

Sauraen wrote:It does make a difference to my VHDL code, though--otherwise the channels will all work, but which channel's registers and values are being used by which units on what cycles will be all mixed up. (E.g. I might have ch3 op4 getting its envelope attenuation value from ch6 op4 or from ch3 op3.)

But it does not happen in the YM2612? So, it may happen in your VHDL because of a bug on your side?

I guess you could write automatic tests that check that each phase accesses only the variables of the same channel, no?

Sauraen · Post by **Sauraen** » Fri Apr 22, 2016 12:57 am

What I mean is that behind all this crazy pipelining in the YM2612 is a number of control units which each have exactly the correct number of delays in each signal so that all the signals used in each channel are correct. There's no question that this is implemented correctly in the YM2612, and the particular order doesn't affect the sounds, so it's not relevant for a software implementation. However, my VHDL version (of which right now only the operator unit exists) attempts to match the original chip structure in every detail, not just the resulting functionality. So if I have a mistake in any of the control units--which is very likely because I don't actually map every gate one at a time, I learn as much as I need to from the die and make up the rest--the output will be all mixed up and obviously wrong. At this point I'm not even trying to make the VHDL version into a working implementation, I'm more using it to describe my findings in the chip.

Eke · Post by **Eke** » Fri Apr 22, 2016 2:36 pm

Sauraen wrote: I don't have time right now for a giant answer, but I think this is what you're missing.

Thanks, it's a lot clearer now, I totally missed the pipeline stuff.

I also made a sheet (with configurable channel / operator order) which helped me figure out what you were referring to (see zipped file attached)

Attachment: ym2612_pipeline.png (100.89 KiB)

This also helps to understand what you meant when you said it could be either 1324 or 1423.

Sauraen wrote: This example is under two assumptions, first that the operator order is 1324 (it might be 1423), and second that the chip processes the same operator from each channel before going to the next operator (it processes operator 1 of all three channels, then operator 3 of all three channels, etc.--this is probably not correct, it's probably in an interleaved order, but that wouldn't change the discussion below).

I think 1324 makes more sense because of the operator register address order (lower address is Op1, then Op3, Op2, Op4) and the fact operators are designed to work paired initially (a modulator then a carrier).
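
The register address order mentioned here can be written down as a small decode helper. This is only a sketch, assuming the usual OPN layout in which the per-operator registers sit at 0x30-0x9F, the low two address bits select the channel within a bank and bits 2-3 select the slot. The names `slot_to_operator`, `reg_to_operator` and `reg_to_channel` are made up for illustration and do not come from any emulator.

```c
#include <assert.h>

/* Operator number (1-4, in Algorithm 0 chain order) for each register slot,
   in ascending address order: +0x0 = Op1, +0x4 = Op3, +0x8 = Op2, +0xC = Op4. */
static const int slot_to_operator[4] = { 1, 3, 2, 4 };

/* Hypothetical helper: operator addressed by a per-operator register. */
static int reg_to_operator(unsigned addr)
{
    assert(addr >= 0x30 && addr <= 0x9F);
    return slot_to_operator[(addr >> 2) & 3];
}

/* Hypothetical helper: channel index (0-2) within the bank, or -1 for the
   unused addresses whose low two bits are both set. */
static int reg_to_channel(unsigned addr)
{
    assert(addr >= 0x30 && addr <= 0x9F);
    return ((addr & 3) == 3) ? -1 : (int)(addr & 3);
}
```

Walking the addresses 0x30, 0x34, 0x38, 0x3C through `reg_to_operator` gives operators 1, 3, 2, 4, which is the same sequence as the 1324 processing order under discussion.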

I guess that the strange channel order and the spacing (4 FM cycles between each channel output, with output remaining available during 1 FM cycle) observed by Nemesis could help figure out the exact channel order, as well as how operators are interleaved for each channel, by trying different combinations.

cf. this post: viewtopic.php?f=24&t=386&start=369

Sauraen wrote:So the question was whether at that time, the unit takes the just-finished value of operator 1, or uses the previously-stored value of operator 1, when calculating the modulation value which will be used for operator 2. And the answer, which I confirmed from the control unit, is that it takes the just-finished value of operator 1, not the old (and definitely not the oldold) value.

So this confirms there is an error in the MAME implementation of the Op1 output calculation: both Op2 and Op4 always use the Op1 output from the previous sample (one sample delay), and Op3 always uses the Op1 output from two samples ago (one sample delay + MEM register delay). This also causes Algorithm 7 to send the previous output of Op1, combined with the current outputs of Op2/Op3/Op4, to the accumulator/DAC, which definitely seems wrong.

So the code below

Code: Select all

      INT32 out = CH->op1_out[0] + CH->op1_out[1];
      CH->op1_out[0] = CH->op1_out[1];

      if( !CH->connect1 ){
        /* algorithm 5  */
        mem = c1 = c2 = CH->op1_out[0];
      }else{
        /* other algorithms */
        *CH->connect1 += CH->op1_out[0];
      }

      CH->op1_out[1] = 0;
      if( eg_out < ENV_QUIET )  /* SLOT 1 */
      {
        if (!CH->FB)
          out=0;

        CH->op1_out[1] = op_calc1(CH->SLOT[SLOT1].phase, eg_out, (out<<CH->FB) );
      }

should be corrected to

Code: Select all

      INT32 out = CH->op1_out[0] + CH->op1_out[1];
      CH->op1_out[0] = CH->op1_out[1];

      CH->op1_out[1] = 0;
      if( eg_out < ENV_QUIET )  /* SLOT 1 */
      {
        if (!CH->FB)
          out=0;

        CH->op1_out[1] = op_calc1(CH->SLOT[SLOT1].phase, eg_out, (out<<CH->FB) );
      }

      if( !CH->connect1 ){
        /* algorithm 5  */
        mem = c1 = c2 = CH->op1_out[1];
      }else{
        /* other algorithms */
        *CH->connect1 += CH->op1_out[1];
      }
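
To make the one-sample difference between the two orderings easier to see, here is a toy model that keeps only the `op1_out` history and drops everything else (envelope, feedback, phase). It is a sketch for illustration, not MAME code: `toy_channel`, `fake_op1`, `step_original` and `step_corrected` are hypothetical names, and `fake_op1` merely returns a value that identifies the sample index.

```c
#include <assert.h>

/* Toy channel state, mirroring only the two-entry op1_out history. */
typedef struct { int op1_out[2]; } toy_channel;

/* Hypothetical stand-in for op_calc1(): returns 100 + sample index so each
   output can be traced back to the sample that produced it. */
static int fake_op1(int sample)
{
    assert(sample >= 0);
    return 100 + sample;
}

/* Original ordering: the modulation value is read before Op1 is recomputed. */
static int step_original(toy_channel *ch, int sample)
{
    int modulation;
    ch->op1_out[0] = ch->op1_out[1];
    modulation = ch->op1_out[0];          /* Op1 output of the previous sample */
    ch->op1_out[1] = fake_op1(sample);
    return modulation;
}

/* Proposed ordering: Op1 is recomputed first, then its new output is read. */
static int step_corrected(toy_channel *ch, int sample)
{
    ch->op1_out[0] = ch->op1_out[1];
    ch->op1_out[1] = fake_op1(sample);
    return ch->op1_out[1];                /* Op1 output of the current sample */
}
```

Starting from a zeroed `toy_channel`, samples 0, 1, 2 yield modulation values 0, 100, 101 with `step_original` and 100, 101, 102 with `step_corrected`, i.e. the original ordering hands the next operator a value that is one sample old.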

Sauraen wrote:
Sik wrote:Is it me or are you all overcomplicating things?
Yes, we are, in so far as I can't even think of a test sound that would sound audibly different whether modulators were delayed by a cycle or not.

I am not sure, the fact that the delay was implemented in MAME (through the use of MEM value) likely means Jarek was able to notice it while testing the hardware. He wouldn't have used that complicated implementation if it wasn't worth it. By the way, it's one of the things not emulated in Exodus because Nemesis was not 100% sure how this worked, so I believe this is still an interesting thing to understand correctly.
Anyway, my initial question was more about figuring out whether there was an error in the MAME implementation or not.

Mask of Destiny · Post by **Mask of Destiny** » Fri Apr 22, 2016 8:59 pm

Eke wrote:I am not sure, the fact that the delay was implemented in MAME (through the use of MEM value) likely means Jarek was able to notice it while testing the hardware. He wouldn't have used that complicated implementation if it wasn't worth it.

Wasn't the MAME implementation originally somewhat generic in that it could emulate multiple members of the OPN family? Is it possible it's only observable on certain chips it supported?

Eke wrote:By the way, it's one of the things not emulated in Exodus because Nemesis was not 100% sure how this worked, so I believe this is still an interesting thing to understand correctly.

I've avoided any emulation of the MEM stuff in BlastEm for similar reasons (though there are still other unrelated things I'm missing). I'd certainly be interested in the actual behavior too.

There's something I'm a bit unclear on here with regards to overall timing. On the YM-2612 it's known that it takes 144 cycles of its input clock to process all the operators for all 6 channels. Previously, my assumption was that operators were processed in a relatively linear fashion and that processing an operator took 6 cycles. 6 cycles * 6 channels * 4 operators per channel does indeed get you 144 cycles. With a pipelined design, while the execution time of a single operator would indeed be 6 cycles (or 12 in the YM-2612 as you mentioned before due to the extra dummy stages), the overlap in execution means that it only takes 24 cycles to process all the operators (ignoring the 12 cycle overlap with the next runthrough anyway).

Is there in fact an internal /6 divider on the clock line or do these pipeline stages take 6 cycles to execute? Alternatively, is there something else I'm missing?

Sauraen · Post by **Sauraen** » Fri Apr 22, 2016 9:49 pm

Mask of Destiny wrote:Is there in fact an internal /6 divider on the clock line

Yup, this has been known for a while. It takes 24 INTERNAL clock cycles to evaluate everything, which is 144 external clock cycles.
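
The arithmetic behind those numbers, as a minimal sketch. The constants come from this thread (6 channels, 4 operators each, /6 divider); the assumption that one operator slot is issued per internal clock is the pipelined reading discussed above, and the function names are hypothetical.

```c
#include <assert.h>

enum {
    CHANNELS            = 6,
    OPERATORS_PER_CHAN  = 4,
    INTERNAL_DIVIDER    = 6   /* external clocks per internal clock */
};

/* Internal clocks per pass, assuming one operator slot per internal clock. */
static int internal_clocks_per_sample(void)
{
    return CHANNELS * OPERATORS_PER_CHAN;
}

/* External (input) clocks per pass, after undoing the /6 divider. */
static int external_clocks_per_sample(void)
{
    return internal_clocks_per_sample() * INTERNAL_DIVIDER;
}

/* Sample rate in Hz for a given external clock frequency in Hz. */
static double sample_rate_hz(double external_clock_hz)
{
    assert(external_clock_hz > 0.0);
    return external_clock_hz / external_clocks_per_sample();
}
```

As a usage example, plugging in an NTSC Mega Drive input clock of about 7,670,453 Hz (a figure not stated in this thread, so treat it as an assumption) gives a sample rate of roughly 53.27 kHz.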

Mask of Destiny wrote:Wasn't the MAME implementation originally somewhat generic in that it could emulate multiple members of the OPN family? Is it possible it's only observable on certain chips it supported?

All my work so far has been from the YM2203 die because it's clearer, and then cross-checking things with the YM2612 die when necessary. They're similar enough that my VHDL operator unit covers both of them, and there's no reason MAME couldn't. But if there is a difference, it might have to do with the DAC versus the shift register output section.

By the way, the difference in channel order in the YM2612 may be observable, in so far as the YM2612 outputs the channel values sequentially. That's actually where I got the 1-4-2-5-3-6 order from, I think it was Nemesis or someone who posted those findings years ago. Now, just because the chip outputs the channels in that order doesn't necessarily mean it processes them in that order, but it's a strong hint. The reason I say it's not guaranteed is that if the chip processes all the channels of one operator before going to the next operator, and then just outputs the results after operator 4 finishes, it'll have no results for 18 cycles (while operators 1, 3, and 2 were being processed) and then spit out 6 results in quick succession. Instead we know that the results are uniformly spaced, implying that there's more going on here. [EDIT: this is wrong. For any operator, channels are processed in the order 123456. But at the same time, channels are output in the order 153264, with each channel value held for four cycles. Disregard the below also.]

Edit: Actually Eke linked us back to Nemesis's post about this, evidently it's 4 internal clocks of zero and then 1 of a value... Hmm. And if he got 1-5-3-2-6-4, that would mean:
1 X X X X 5 X X X X 3 X X X X 2 X X X X 6 X X X X 4 X X X X
which adds up to 30 cycles, nah that's not right. It's gotta be 3 internal clocks of zero and 1 of a value.
1 X X X 5 X X X 3 X X X 2 X X X 6 X X X 4 X X X
Let's fill in some blanks...
1 2 3 4 5 6 1 2 3 4 5 6 1 uh-oh!
That still doesn't quite work out, simply because we know that 12 cycles after a particular channel is being evaluated (at a particular unit), a different operator from the same channel is being evaluated at the same unit. However, it clearly is 3 internal clocks of zero and 1 of a value--a couple posts down from that one of Nemesis it shows the output along with the master clock. It's clearly 18 master clocks (3 internal) between the beginning of the falling edge and the rising edge, and 6 master clocks (1 internal) between the beginning of the rising edge and the falling edge. Of course the text there is meaningless, because they're talking about an embedded serial DAC which has long since been disproven.
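
The slot counting above can be checked with a few lines of C. This is a sketch under the assumptions used in the diagrams: six channels, the 1-5-3-2-6-4 output order taken from Nemesis's post, and each channel's value slot followed by its zero slots. The function names are hypothetical.

```c
#include <assert.h>

#define INTERNAL_DIVIDER 6   /* master clocks per internal clock */
#define NUM_CHANNELS     6
#define SLOTS_PER_PASS   24  /* internal clocks per full pass */

/* Length of one full output pass in internal clocks, for a given number of
   zero clocks and value clocks per channel. */
static int pass_length(int zero_clocks, int value_clocks)
{
    assert(zero_clocks >= 0 && value_clocks > 0);
    return NUM_CHANNELS * (zero_clocks + value_clocks);
}

/* Channel whose value is on the output at a given internal clock, or 0 while
   the output is idle. Assumes 1 value clock followed by 3 zero clocks. */
static int output_channel_at(int internal_clock)
{
    static const int order[NUM_CHANNELS] = { 1, 5, 3, 2, 6, 4 };
    int t;
    assert(internal_clock >= 0);
    t = internal_clock % SLOTS_PER_PASS;
    return (t % 4 == 0) ? order[t / 4] : 0;
}
```

`pass_length(4, 1)` comes out at 30, which cannot fit into the 24-clock pass, while `pass_length(3, 1)` is exactly 24. In master clocks that is 3 * 6 = 18 idle and 1 * 6 = 6 with a value, matching the scope trace.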
