Shiru wrote: In the case of small devices like today's PDAs, smartphones, and current-gen handhelds, assembler is not the solution. These devices often have relatively fast RISC CPUs (like ARM) with really slow memory. So it becomes very hard to make hand-written code work faster than what today's optimizing C compilers generate. They still generate a bunch of crap, but that crap can work (if written specially for the given architecture) as fast as your beautiful and perfect code, because in the end everything is limited by memory speed. The only advantage of writing for these devices in assembly is that you can always write much shorter code than the compiler generates.
So there is no problem with C/C++ itself; the problem is more with understanding the architecture of small devices, and with optimizing algorithms and their C/C++ code for RISC CPUs and slow memory.
Did you try Genesis Plus on your 400 MHz PDA? It is pure C but the code is good :)
Sega Megadrive Portable
-
- Very interested
- Posts: 3131
- Joined: Thu Nov 30, 2006 9:46 pm
- Location: France - Sevres
- Contact:
Yes, I can confirm this: if you manage to understand how the rendering code works (mostly in render.c), you will know what smart code design really means.
Charles MacDonald has done really awesome work with this emulator, and even though it's coded in pure C, the rendering code, for example, is damn fast, thanks to lots of precalculated lookup tables and coding tricks.
The only downside might come from the 68k & YM2612 (and maybe Z80) 'C' cores, which came from MAME: they are not really optimized and would perhaps slow down the emulation on some platforms, I don't know...
But on the Gamecube (~500 MHz PowerPC CPU), I have every Genesis game running at full speed.
PS: does anyone know about the Radica? According to devster (http://devster.monkeeh.com/sega/radica/) it's something like a Genesis on a single chip... I really wonder how they did this. Is it some kind of multiple-chip "emulation" on an FPGA?
-
- Very interested
- Posts: 326
- Joined: Mon Mar 12, 2007 1:53 am
- Contact:
-
- Very interested
- Posts: 2984
- Joined: Fri Aug 17, 2007 9:33 pm
The problem with using C/C++ vs assembly is register usage. For best speed, you need to keep certain things in registers as much as possible, as the memory is SLOW (as you mentioned yourself). With C/C++ emulators, the emulated state is always in memory. Because of this, I've yet to see a C/C++ emulator even reach half the speed of an emulator done in assembly, despite optimizing compilers. When you choose what to keep in registers correctly, even poor assembly is VASTLY superior to the best C code for emulations.
-
- Very interested
- Posts: 3131
- Joined: Thu Nov 30, 2006 9:46 pm
- Location: France - Sevres
- Contact:
Chilly Willy wrote: The problem with using C/C++ vs assembly is register usage. For best speed, you need to keep certain things in registers as much as possible, as the memory is SLOW (as you mentioned yourself). With C/C++ emulators, the emulated state is always in memory. Because of this, I've yet to see a C/C++ emulator even reach half the speed of an emulator done in assembly, despite optimizing compilers. When you choose what to keep in registers correctly, even poor assembly is VASTLY superior to the best C code for emulations.
A good C compiler knows how to optimize register usage. Of course it can't do as well as you can with assembler code, but it is able to keep certain values in registers when you use them often (you need good C code too) =)
-
- Very interested
- Posts: 2984
- Joined: Fri Aug 17, 2007 9:33 pm
Stef wrote: A good C compiler knows how to optimize register usage. Of course it can't do as well as you can with assembler code, but it is able to keep certain values in registers when you use them often (you need good C code too) =)
The difference is in GLOBAL register usage. That is particularly important in emulations. The emulator writer can pick what parts to keep in registers across the entire app, while compilers are still fairly poor at global optimizations.
If I wanted local optimizations, I'd probably compile the function in C and look at the generated code. It would probably be better than what you could do by hand. By the same token, the compiler wouldn't be able to determine what would be best to keep in registers across the entire program. Part of that problem has to do with the design of C/C++ emulators. The authors generally define an emulated machine state table which is allocated as one big memory block. Since some of the things that should be kept in registers are just one small piece of this big state structure, the compiler doesn't realize they have higher significance than the rest of the state structure and therefore leaves them in memory most of the time.
Maybe if the compiler had some way to define a priority for parts of a structure, so that the optimization pass could focus on those areas with higher priority than the rest, then C/C++ emulations would approach the speed pure assembly gives. The "register" keyword is too limiting on most compilers - it tends to reserve the register COMPLETELY for the scope of the variable specified. Again, this is something an assembly program can deal with better.
-
- Very interested
- Posts: 3131
- Joined: Thu Nov 30, 2006 9:46 pm
- Location: France - Sevres
- Contact:
Chilly Willy wrote: The difference is in GLOBAL register usage. That is particularly important in emulations. The emulator writer can pick what parts to keep in registers across the entire app, while compilers are still fairly poor at global optimizations.
Yep, I understand your problem with global register allocation. You have no choice: you have to localise your variables. That's what I did with C68K, and the main function is huge because of that.
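The "localise your variables" idea discussed here can be sketched roughly as follows. This is only an illustration, not code from C68K or any real emulator: the `cpu_state` struct, its fields, and the three-opcode toy instruction set are all hypothetical. The point is that the hot fields are copied into locals before the loop, so the compiler is free to keep them in registers, and written back once at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emulated machine state, allocated as one block.
   All field names are illustrative assumptions. */
typedef struct {
    uint32_t pc;        /* program counter */
    uint32_t acc;       /* accumulator */
    int32_t  cycles;    /* remaining cycle budget */
    uint8_t  ram[256];  /* the rest of the state */
} cpu_state;

/* Toy instruction set (assumption): 0 = halt, 1 = increment acc,
   2 = add the following byte to acc, anything else = no-op. */
static void run(cpu_state *s)
{
    assert(s != NULL);

    /* Copy the hot fields into locals once; inside the loop the
       compiler can now keep them in registers. */
    uint32_t pc = s->pc;
    uint32_t acc = s->acc;
    int32_t cycles = s->cycles;
    const uint8_t *ram = s->ram;

    while (cycles > 0) {
        uint8_t op = ram[pc++ & 0xFF];
        if (op == 0) {
            break;                      /* halt */
        } else if (op == 1) {
            acc += 1;
            cycles -= 1;
        } else if (op == 2) {
            acc += ram[pc++ & 0xFF];    /* immediate operand */
            cycles -= 2;
        } else {
            cycles -= 1;                /* unknown opcode: no-op */
        }
    }

    /* Write the locals back to the state block exactly once. */
    s->pc = pc;
    s->acc = acc;
    s->cycles = cycles;
}
```

To try it, fill `ram` with a short program such as `1, 1, 2, 40, 0`, set a cycle budget, and call `run`. Whether the locals really end up in registers depends on the compiler and target, so the generated code is worth checking.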
-
- Very interested
- Posts: 2984
- Joined: Fri Aug 17, 2007 9:33 pm
Stef wrote: Yep, I understand your problem with global register allocation. You have no choice: you have to localise your variables. That's what I did with C68K, and the main function is huge because of that.
Ah - make it one big local function and the optimizer SHOULD make it fast. Yes, that would make for a HUGE function for most emulators.
-
- Very interested
- Posts: 2440
- Joined: Tue Dec 05, 2006 1:37 pm
- Location: Estonia, Rapla City
- Contact:
Why not squeeze some ASM into the program? You don't need much to gain an incredible speed increase... 100 lines of ASM in my QB program made it 200% faster...
What are you reading? You won't understand it anyway
http://www.tmeeco.eu
Files of all broken links and images of mine are found here : http://www.tmeeco.eu/FileDen
-
- Very interested
- Posts: 2984
- Joined: Fri Aug 17, 2007 9:33 pm
Stef wrote: ASM isn't portable. I use ASM only for non-portable projects and when it is really needed.
Yes, that's the KEY point for using C. Assembly should be reserved for those times when speed is the factor above all else... or maybe when FUN is the factor above all else.
I've been dumping small example apps from the PSP SDK (via psp-objdump) to get an idea of how you call the libraries and set up the system stuff from assembly. My goal is to make a SEGA Genesis/CD/32X emulation for the PSP. Such a beast cannot possibly be done in C/C++, no matter how good. It IS on the edge of possibility for pure assembly. In this case, portability is of no consequence - it's strictly a PSP-only project. SEGA Genesis/CD is fairly easy for assembly - it's squeezing in the 32X that is pushing it. Fortunately, the extra hardware the 32X adds is simple, and the extra CPUs are pretty straightforward RISC chips. If I shuffle all the non-CPU stuff to the MediaEngine, the main CPU should just barely squeak by emulating the CPUs.
While doing some googling, I found this page (all in Japanese, I can't really read it): http://psp.nukenin.jp/index_E.html
which mentions a port of DGen for the PSP.
It is called Segadrive; it seems to be different from the other DGen port and is not mentioned on any "western" website:
http://psp.nukenin.jp/HTM/index_SEGA_DRIVE.html
By the same guy, there is also a port of Picodrive for PSP (also a Genesis emulator, originally made by Dave and now maintained by Notaz, who made the GP2X version and added Sega CD support):
http://psp.nukenin.jp/HTM/index_PICO_DRIVE.html
The dev blog, http://ameblo.jp/pspdevblog/, also seems to gather a lot of technical info (you can see many code excerpts; it seems this guy has done some work to improve the original DGen code). Too bad I can't read Japanese (there seem to be some thoughts about FM emulation).
Following some links on this Japanese website, I found that there is also a port of Genesis Plus for PSP! As always, I found no western website that mentions these ports:
http://psp.nukenin.jp/HTM/osakana_mirror.html
And there is also a modified version of Gens for Windows:
Gens2.14Souvenir clone few "ASM" to "C" converting feautures, and with the Reality sound.
(the SEGA VDP's real the Triangultic rectangular wave's. but 76496 are Rectangular wave's. not compatibles.)
(do real machine from 315-5313 95pin sampling analysis it things)
This is megadrive/genesis only(without 32X, without MEGA/SEGA CD).(2006 Oct-1st)
I wonder what he means about the PSG waves.
-
- Very interested
- Posts: 2984
- Joined: Fri Aug 17, 2007 9:33 pm
The main page for PSP DGen is: http://syn-k.sakura.ne.jp/dgen_psp/
Thanks for the links - I'll check out some of these other emulators.
-
- Very interested
- Posts: 3131
- Joined: Thu Nov 30, 2006 9:46 pm
- Location: France - Sevres
- Contact:
Eke wrote: ...and there is also a modified version of Gens for Windows:
Gens2.14Souvenir clone few "ASM" to "C" converting feautures, and with the Reality sound.
(the SEGA VDP's real the Triangultic rectangular wave's. but 76496 are Rectangular wave's. not compatibles.)
(do real machine from 315-5313 95pin sampling analysis it things)
This is megadrive/genesis only(without 32X, without MEGA/SEGA CD).(2006 Oct-1st)
I wonder what he means about the PSG waves.
Triangle wave?!! That doesn't seem realistic to me... if the VDP PSG were really generating a triangle wave, the sound would be quite different.
I know the PSG doesn't generate perfect square waves; they look more like this:
Code:
 ___                 ___
|   ---___          |   ---___
|         |_________|         |___
Maybe it would be better to split the thread into a discussion about MDP and a discussion about SMD emulation on handheld devices?
Stef wrote: Did you try Genesis Plus on your 400 MHz PDA? It is pure C but the code is good :)
I tried it today, and it works surprisingly well. Not ideal, of course (as I understand it, the author did not optimize the code for PPC): there is noticeable but acceptable frameskip, and the sound is good, at least. Unfortunately, this emulator does not show FPS, and I was not able to edit the config file on the device while testing, so I can't do any benchmarks. I can only say that v1.3 allows choosing between different Z80 and M68K cores; the MAME M68K core is much slower (games become unplayable), and changing the Z80 core makes no noticeable difference.
Eke wrote: Charles MacDonald has done really awesome work with this emulator, and even though it's coded in pure C, the rendering code, for example, is damn fast, thanks to lots of precalculated lookup tables and coding tricks.
I believe that, but I must note that lookup tables can slow down code on systems like PocketPC, even when the same code runs very fast on x86 PCs: the tables just do not fit in the smaller cache (on systems with slow memory, real computations are sometimes faster than precalculated tables).
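The table-versus-computation trade-off can be sketched with a small colour conversion. This is not taken from Genesis Plus's render.c: the 9-bit packed colour layout, the RGB565 target, and every function name below are assumptions chosen for illustration. Both versions return the same values; which one is faster on a given device depends on its cache and memory speed and has to be measured.

```c
#include <assert.h>
#include <stdint.h>

/* Direct computation: expand a hypothetical 9-bit colour
   (bits 0-2 red, 3-5 green, 6-8 blue) to RGB565 with shifts only,
   so no data memory is touched. */
static uint16_t rgb565_compute(uint16_t c9)
{
    uint16_t r = c9 & 7;
    uint16_t g = (c9 >> 3) & 7;
    uint16_t b = (c9 >> 6) & 7;
    /* Replicate the top bits so that 7 maps to full intensity. */
    uint16_t r5 = (uint16_t)((r << 2) | (r >> 1));
    uint16_t g6 = (uint16_t)((g << 3) | g);
    uint16_t b5 = (uint16_t)((b << 2) | (b >> 1));
    return (uint16_t)((r5 << 11) | (g6 << 5) | b5);
}

/* Precalculated variant: 512 entries of 2 bytes = 1 KiB, which
   competes with everything else for space in the data cache. */
static uint16_t rgb565_table[512];

static void rgb565_table_init(void)
{
    for (uint16_t c = 0; c < 512; c++)
        rgb565_table[c] = rgb565_compute(c);
}

static uint16_t rgb565_lookup(uint16_t c9)
{
    assert(c9 < 512);   /* only 9-bit inputs are valid */
    return rgb565_table[c9];
}
```

A 1 KiB table is small; the effect described above becomes more likely as tables grow to tens of kilobytes or as several tables are used in the same inner loop.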
Eke wrote: The only downside might come from the 68k & YM2612 (and maybe Z80) 'C' cores, which came from MAME: they are not really optimized and would perhaps slow down the emulation on some platforms, I don't know...
As I said in this post, they slow down emulation a lot. And I would say that sometimes these cores are not 'not really optimized' but 'really not optimized'. That's not bad, of course, because these cores are designed not to work fast but to work very precisely.
Eke wrote: But on the Gamecube (~500 MHz PowerPC CPU), I have every Genesis game running at full speed.
I think the Gamecube does not have slow memory like portable devices do, and it also has a bigger cache.
Eke wrote: PS: does anyone know about the Radica? According to devster (http://devster.monkeeh.com/sega/radica/) it's something like a Genesis on a single chip... I really wonder how they did this. Is it some kind of multiple-chip "emulation" on an FPGA?
I don't know about the Radica (never seen one), but all SMD clones in my country from the early 2000s are built on one or two big chips. And all NES clones here from the mid-1990s are single-chip; only in the early 1990s were there sometimes multi-chip clones, with external memory in DIP packages, or built entirely from discrete DIP chips (including CPU and PPU).
Chilly Willy wrote: The problem with using C/C++ vs assembly is register usage. For best speed, you need to keep certain things in registers as much as possible, as the memory is SLOW (as you mentioned yourself). With C/C++ emulators, the emulated state is always in memory. Because of this, I've yet to see a C/C++ emulator even reach half the speed of an emulator done in assembly, despite optimizing compilers. When you choose what to keep in registers correctly, even poor assembly is VASTLY superior to the best C code for emulations.
I'm not saying that your point is wrong, I'm just explaining mine. If we have slow memory, then to make code faster we generally must minimize memory accesses. Keeping the most-used variables in registers is just a particular case of that. But we can't fit all the needed variables in registers anyway, and other methods exist. We have the cache, we can temporarily copy global variables into register variables for the duration of a loop, etc. And when we minimize memory access in C code, we hit the same bottleneck as in the equivalent assembly code: for loops, it's usually reading the input data stream from slow memory.
Stef wrote: triangle wave ?!!
Gens2.14Souvenir clone few "ASM" to "C" converting feautures, and with the Reality sound.
(the SEGA VDP's real the Triangultic rectangular wave's. but 76496 are Rectangular wave's. not compatibles.)
It says "triangultic rectangular", not just triangle. This could mean something like what you drew: a rectangular wave smoothed by an output filter.
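One possible reading of "a rectangular wave smoothed by an output filter" can be sketched as an ideal square wave passed through a one-pole low-pass filter. This is purely illustrative: the function names, the filter type, and the coefficient are assumptions, and nothing here is measured from a real 315-5313 or 76496.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ideal rectangular wave: half_period samples at +amp,
   then half_period samples at -amp, repeating. */
static void square_wave(int16_t *out, size_t n,
                        size_t half_period, int16_t amp)
{
    assert(half_period > 0);
    for (size_t i = 0; i < n; i++)
        out[i] = ((i / half_period) % 2 == 0) ? amp : (int16_t)-amp;
}

/* One-pole low-pass filter: each output moves a fraction 1/2^shift
   of the way towards the input, which rounds off the edges.
   The shift value is an arbitrary smoothing amount. */
static void smooth(const int16_t *in, int16_t *out,
                   size_t n, unsigned shift)
{
    int32_t y = 0;
    for (size_t i = 0; i < n; i++) {
        y += (in[i] - y) / (1 << shift);
        out[i] = (int16_t)y;
    }
}
```

With a small shift the edges stay close to rectangular; a larger shift gives slower ramps that start to look triangular, which may be what the "triangultic" wording refers to.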