SOUND: YM2612 CSM mode raw test

Aly James · Post by **Aly James** » Sat Nov 30, 2013 2:45 pm

This was the very beginning of my noisy tests and research on the YM2612 chip's CSM mode (an early speech synthesis technique).
These tests fed into the development of the FMDRIVE VSTi and the special CSM implementation I used in it.
http://www.alyjameslab.com

This is a simple ROM (running on an Everdrive card) that I made with very basic joypad controls that send register writes to the YM2612 chip in real time.
Of course, to make real use of the CSM mode, live input by hand from a joypad is not fast enough.

Congrats if you are able to watch this until the end...
Joking aside, there is some really nice stuff we can do sound-wise with the CSM, and AFAIK almost nobody has used this feature on the Megadrive.

ROM preview : http://youtu.be/ZPRMF-1qe3s

It is not easy to program it to produce understandable speech, but the technology is definitely there in the YM2612.
I have made a few videos on the FMDRIVE VSTi to showcase what you can do with it.
I have found some rare uses of CSM in some Game Arts games for MSX:
The Silpheed game on the PC88 / MSX computers, which feature a chip very similar to the YM2612 with the exact same CSM feature.
It is used here to produce the robotic speech:
http://youtu.be/8hVwAfy88NE
and here:
http://youtu.be/W3apkzZQa4E

Reminder >>
Yamaha FM chips can key on / key off automatically (some channels or all channels, depending on the chip) whenever the built-in Timer A overflows. On the YM2612 this applies to channel 3.
This is called "CSM speech synthesis mode"; CSM stands for Composite Sinusoidal Modeling.
CSM is a type of speech coding: it reproduces the original vocal sample with a combination of multiple sine waves.
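
For reference, switching channel 3 between normal, special and CSM mode comes down to a few register writes. This is a minimal sketch, assuming the commonly documented YM2612 register map ($24/$25 for the 10-bit Timer A value, bits 7-6 of $27 for the channel 3 mode, bit 0 of $27 to load Timer A). `csm_setup_writes` is a made-up helper name, and this is not the code from the test ROM. Whether the timer enable/reset bits of $27 are also needed in practice is left out here.

```python
# Sketch only: register numbers follow the commonly documented YM2612 map;
# csm_setup_writes() is a hypothetical helper, not code from the test ROM.

CH3_NORMAL, CH3_SPECIAL, CH3_CSM = 0b00, 0b01, 0b10  # bits 7-6 of register 0x27

def csm_setup_writes(timer_a, mode=CH3_CSM):
    """Return (register, value) pairs that load Timer A and select the CH3 mode."""
    if not 0 <= timer_a <= 0x3FF:
        raise ValueError("Timer A is a 10-bit value (0..1023)")
    return [
        (0x24, timer_a >> 2),        # Timer A, upper 8 bits
        (0x25, timer_a & 0x03),      # Timer A, lower 2 bits
        (0x27, (mode << 6) | 0x01),  # CH3 mode + "load Timer A" bit
    ]

for reg, val in csm_setup_writes(0x3A5):
    print(f"${reg:02X} <- ${val:02X}")
```

Passing `mode=CH3_SPECIAL` gives the "special mode" variant, where the operators keep independent frequencies but nothing is re-triggered by the timer.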

There is a theory that uses the FFT to decompose the frequency content of a signal into a sum of different sine waves, each with its own pitch and volume varying over time.
Based on this theory, if you play several sines at the same time with appropriate TL volumes and frequencies, you can reproduce a waveform similar to the original one.
The YM2612 can output 4 sines with 4 different frequencies and 4 different TL volumes.
The FMDRIVE VSTi uses that with MIDI CH 1, 11, 12 and 13 to control frequency and volume, plus an additional CH 14 to control Timer A.
You can also MIDI-learn these controls to any MIDI controller and you're good for some live talking shit.
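
To make the decomposition idea concrete, here is a rough sketch of picking the 4 strongest sine components out of one short frame of audio. It is only an illustration under simple assumptions: it uses a naive DFT in pure Python instead of a real FFT, takes raw bins instead of doing proper peak picking, and the function name `strongest_partials` and the test frame are invented for the example.

```python
import cmath
import math

def strongest_partials(frame, sample_rate, count=4):
    """Return the `count` strongest (frequency_hz, amplitude) pairs of a frame.

    Uses a naive O(n^2) DFT to stay dependency-free; a real tool would use
    an FFT and proper peak picking instead of taking raw bins.
    """
    n = len(frame)
    partials = []
    for k in range(1, n // 2):  # skip DC, stop below Nyquist
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        partials.append((k * sample_rate / n, 2 * abs(acc) / n))
    partials.sort(key=lambda p: p[1], reverse=True)  # loudest first
    return sorted(partials[:count])                  # then order by frequency

# Synthetic test frame: four sines that fall exactly on DFT bins.
RATE, N = 8000, 256
components = [(250.0, 1.0), (750.0, 0.5), (1500.0, 0.25), (2500.0, 0.125)]
frame = [sum(a * math.sin(2 * math.pi * f * t / RATE) for f, a in components)
         for t in range(N)]

for freq, amp in strongest_partials(frame, RATE):
    print(f"{freq:7.1f} Hz  amplitude {amp:.3f}")
```

Running this once per frame over a recording would give the kind of frequency / volume stream that the 4 operators need.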

This mode is also useful for producing new types of sounds, similar to having a powerful filter on board, and that is what makes it very interesting beyond the speech thing.
My testing has shown some really cool stuff.

ROM USAGE
Keep in mind that it is one of my test ROMs, not intended for public release and not especially user friendly.

CSM MODE ROM:
-------------------------------------------------------------------------------------
Test mode for FMDrive Vsti dev.
Works on a real MD1 and Regen Emulator.
Use at your own risk

-------------------------------------------------------------------------------------
USE 2 OPERATORS ON CH3: OP2>OP4
The ROM starts in NORMAL mode until BUTTON C is pressed
(then it will be either in CSM or SPECIAL until ROM reset)
A key on to CH3 is set on startup and basic registers set.

COMMANDS: on PAD 1
(there is also a command on PAD 2 that controls the TL of OP2,
I cannot remember which one ^^)
-------------------------------------------------------------------------------------
START : KEY on/off (OP2 + OP4)

A: Pressed sets AR of OP2 to 1F, released sets AR to 00
So if you want to have OP2 modulating OP4, keep it pressed

B: Pressed Key on OP2 and Key off OP4

C: Pressed CSM mode (auto key on/off at Timer A speed)
Released Special Mode (independent FRQ set by RIGHT)

LEFT : ALGO change from 0 to 7 then wrap.

RIGHT: FRQ change for OP2 (change block. down then wrap)

DOWN: FRQ change for OP4 (change block. down then wrap)

UP: Timer A period (down then wrap)

DOWNLOAD:
http://www.alyjameslab.com/tutorials/FM ... test03.bin

Stef · Post by **Stef** » Sat Nov 30, 2013 11:48 pm

Really interesting !
I have already watched the video several times :p
Thanks for the test ROM, I will try to play with it a bit tomorrow!
By the way, there is a good example of CSM mode on the Sega Megadrive, I don't know if you are aware of it but just in case:
http://68000.web.fc2.com/bad_apple.html

One of the versions of the demo uses CSM mode for voices, I believe it's version 0.06

Just for that the ROM is *very* interesting

TmEE co.(TM) · Post by **TmEE co.(TM)** » Sun Dec 01, 2013 5:47 am

Great stuff ! 4 sines is kinda little for anything beyond basic vowels though...

Aly James · Post by **Aly James** » Sun Dec 01, 2013 4:59 pm

Stef wrote: Really interesting !
I have already watched the video several times :p
Thanks for the test ROM, I will try to play with it a bit tomorrow!
By the way, there is a good example of CSM mode on the Sega Megadrive, I don't know if you are aware of it but just in case:
http://68000.web.fc2.com/bad_apple.html

One of the versions of the demo uses CSM mode for voices, I believe it's version 0.06. Just for that the ROM is *very* interesting

Oh yeah, thanks for the reminder!
I totally forgot this awesome animation work that is Bad Apple!
But yeah, I never knew that they had a version with CSM.
Checking the YM2612 register log, you can see that they did not in fact use CSM mode... Timer A is doing nothing. It is the "special mode" that is in use here, with different frequencies for each OP.
In the end it sounds almost the same as CSM.

They surely decomposed the original singing vocals using FFT into 4 main frequencies for each sample, with the appropriate power, then converted the data into a stream of F-numbers / TLs for the 4 operators.
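
The conversion step can be sketched roughly as follows. This is not the Bad Apple authors' code, just an illustration that assumes the NTSC Mega Drive YM2612 clock (7,670,453 Hz), operators with MUL = 1 and no detune, the usual F-number relation (frequency = F-number × clock / 144 × 2^block / 2^21) and a TL step of 0.75 dB. The function names are invented for the example.

```python
import math

YM_CLOCK_HZ = 7_670_453          # NTSC Mega Drive YM2612 clock (assumed)
SAMPLE_RATE = YM_CLOCK_HZ / 144  # about 53267 Hz

def to_block_fnum(freq_hz):
    """Pick the lowest block whose 11-bit F-number can represent freq_hz (MUL = 1)."""
    for block in range(8):
        fnum = round(freq_hz * 2**21 / (SAMPLE_RATE * 2**block))
        if fnum <= 0x7FF:
            return block, fnum
    raise ValueError("frequency too high for the F-number range")

def to_total_level(amplitude):
    """Map a linear amplitude (1.0 = full level) to a 7-bit TL, 0.75 dB per step."""
    if amplitude <= 0:
        return 127  # silence: maximum attenuation
    attenuation_db = -20 * math.log10(min(amplitude, 1.0))
    return min(127, round(attenuation_db / 0.75))

# One analysis frame: (frequency in Hz, linear amplitude) for the 4 operators.
# These values are made up for the example.
frame = [(440.0, 1.0), (880.0, 0.5), (1320.0, 0.25), (2640.0, 0.1)]
for freq, amp in frame:
    block, fnum = to_block_fnum(freq)
    print(f"{freq:7.1f} Hz -> block {block}, F-number {fnum}, TL {to_total_level(amp)}")
```

Choosing the lowest block that fits keeps the F-number as large as possible, which gives the finest frequency resolution for each sine.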

What CSM adds is the Timer A re-triggering, which can act as a pitch shifter for a fixed formant.
Let's say you program a stream of data saying "Hello": you can then change the pitch with the Timer A speed alone.
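
A very simplified model of why this works: if each key-on restarts the operator's phase (an assumption of this sketch, which also ignores the envelope entirely), the output repeats at the Timer A rate no matter what frequency the operator itself is set to. The function name and the numbers below are invented for the illustration.

```python
import math

def csm_like_signal(formant_hz, retrigger_period, length, sample_rate=53267.0):
    """One sine whose phase restarts every `retrigger_period` samples."""
    step = 2 * math.pi * formant_hz / sample_rate
    return [math.sin(step * (n % retrigger_period)) for n in range(length)]

# Same 1200 Hz "formant", two different retrigger rates.
slow = csm_like_signal(1200.0, retrigger_period=400, length=1600)
fast = csm_like_signal(1200.0, retrigger_period=200, length=1600)

# Both signals repeat exactly at their retrigger period, so the perceived
# pitch follows the timer while the operator frequency stays the same.
print("slow pitch ~", round(53267.0 / 400, 1), "Hz")
print("fast pitch ~", round(53267.0 / 200, 1), "Hz")
```

Halving the retrigger period raises the pitch by an octave while the formant setting is untouched.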

I will post an example using the FMDrive VSTi below.
---------------------------------------------------------------------------------------------
But there is something in CSM that is useful not only for speech synthesis!
The fact that Timer A has a resolution of 10 bits makes it usable for playing musical notes, in a limited way but still...
The result is an instrument whose main pitch is controlled by Timer A, while the F-numbers of the 4 operators become filter parameters.
This kind of filtered sound is impossible to produce with FM alone; however, you can still approach this type of sound with SSG envelopes...
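
To show what "limited" means here, this sketch computes the Timer A value closest to a given note. It assumes the NTSC Mega Drive clock (7,670,453 Hz), a Timer A period of (1024 - value) ticks of 144 clock cycles, and one key-on per overflow. `timer_a_for_pitch` is a made-up name.

```python
YM_CLOCK_HZ = 7_670_453      # NTSC Mega Drive YM2612 clock (assumed)
TICK_HZ = YM_CLOCK_HZ / 144  # Timer A counts at about 53267 Hz

def timer_a_for_pitch(freq_hz):
    """Return (timer_a_value, actual_hz) for the closest retrigger rate."""
    period_ticks = round(TICK_HZ / freq_hz)
    period_ticks = max(1, min(1024, period_ticks))  # clamp to the 10-bit range
    return 1024 - period_ticks, TICK_HZ / period_ticks

for note_hz in (110.0, 440.0, 1760.0):
    value, actual = timer_a_for_pitch(note_hz)
    print(f"{note_hz:7.1f} Hz -> Timer A = {value:4d}, actual {actual:8.2f} Hz")
```

Because the period is a whole number of ticks, low notes land very close to the target while high notes get noticeably out of tune, and nothing below roughly 52 Hz is reachable.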

Aly James · Post by **Aly James** » Sun Dec 01, 2013 5:04 pm

TmEE co.(TM) wrote:Great stuff ! 4 sines is kinda little for anything beyond basic vowels though...

So here are some examples of how to use CSM speech in a basic way.
First, a reminder:
Example of CSM speech in the MSX game Zeliard:
http://www.alyjameslab.com/tutorials/MSX_CSM.wav
Example of random CSM speech in the FMDrive VSTi:
http://www.alyjameslab.com/tutorials/FMDrive_CSM.wav

Sine wave artifacts are noticeable in both of these examples because there is no decay on the envelope generator.

Now let's use a very quick decay, and the artifacts are less noticeable in the end.
Example of an attempt at saying "HELLO" with fast decay:
the MIDI data is first played very quickly, then slowed down.
http://www.alyjameslab.com/tutorials/hellocsm.wav

And for a visual explanation, better than words:
here is one instance of FMDrive in Cubase with automation lanes and MIDI channels for the special mode.
Notice the similarity between the spectrogram and the automation data.

What you see is the spectral power and the variation of frequency over time.
The sum of sine waves at different frequencies gives us what we call vocal formants.


TmEE co.(TM) · Post by **TmEE co.(TM)** » Sun Dec 01, 2013 5:19 pm

I am very familiar with the theory and practical uses; it is just that 4 sines are kinda little for nice sound. Better results can be had by adding another channel to the mix and using FM features to create the more complex sounds that consonants, for example, have. But it will sound robotic, and as you get closer and closer to realistic speech, the data you push to the YM starts nearing what PCM samples would take...
I actually made a sample-based speech synth on the MD long ago, hehe.

But what you have done is really cool nonetheless! This stuff does not get a whole lot of love.

Aly James · Post by **Aly James** » Sun Dec 01, 2013 7:00 pm

I agree with that on the quality side, but this is what interested me in the first place.

I mean the character of CSM speech is pretty unique, just as LPC speech is, very characteristic.
I like the "water-like" artifacts.

It's a bitch to program for speech anyway, and I prefer PCM for intelligible stuff, but as a supplementary sound tool it is very nice: for robotic alien speech, FX or a "talking instrument".
Quick talking instrument vid coming next...
EDIT: quick recording, camera audio quality...: http://youtu.be/pweVGVEJXtk

A sample-based speech synth with word input by the user? Release this stuff!

Might be nice to play with.

r57shell · Post by **r57shell** » Tue Dec 03, 2013 5:34 pm

I still don't understand. One simple question: can you make it speak?
If yes: then cool, post some video.
If no: then, I have nothing to say.

Gigasoft · Post by **Gigasoft** » Fri Dec 06, 2013 9:47 pm

The Bad Apple link does not work anymore, can someone upload the ROM somewhere?

For those that missed it, see this old topic for my 24 sine speech codec: viewtopic.php?t=1144

Aly James · Post by **Aly James** » Sun Dec 15, 2013 7:27 pm

r57shell wrote:I still don't understand. One simple question: can you make it speak?
If yes: then cool, post some video.
If no: then, I have nothing to say.

I did implement a sample loader / converter for the CSM in my FMDrive VSTi. It loaded a .wav file with speech, but the result was not particularly intelligible with only 4 sine waves...
That is why, in the final release, I kept the function only for "talking" effects or for musical use of the Timer A speed.

I will make a video of that original feature if I find some time!