68000 programming optimization tips? (for speed)
Moderator: BigEvilCorporation
Title speaks for itself. I do know some basic 68000 speed optimizations, but I wonder what other kinds of optimizations I can make to really speed up my programs.
Re: 68000 programming optimization tips? (for speed)
Is this in general, for a game, or for something very demanding like an algorithm, FMV playback or demo stuff? It's quite a wide topic with many starting points.
If we're talking low level optimisations then the Amiga community have some of the best answers, a few examples:
http://www.sega-16.com/forum/showthread ... Discussion
http://www.easy68k.com/paulrsm/doc/trick68k.htm
http://eab.abime.net/showthread.php?t=83172
If we're talking high level stuff like identifying objects which don't need processing/rendering and how to manage active/dormant lists then some information about your engine might be necessary.
A few times I've looked into cycle counts and hardcore tricks to squeeze more performance out of some code, but nearly always it turns out I just needed to "do less" in the first place (cascade object ticks over many frames, culling of off-screen objects, etc).
There are also some high-level tricks I've used to improve DMA performance (my game has a lot of animation), like reshuffling data to be contiguous, or batching up my DMA queue to spread it across several frames.
What are you trying to speed up?
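A rough C sketch of the "cascade object ticks over many frames" idea mentioned above. The `Object` struct, `TICK_SLICES` and `tick_objects` are made-up names for illustration, not code from any real engine, and it assumes an object can tolerate being updated only once every few frames:

```c
#include <stdint.h>

/* Assumption: each object only needs an update once every 4 frames. */
#define TICK_SLICES 4

/* Hypothetical object record, for illustration only. */
typedef struct {
    uint8_t  active;  /* 0 = dormant, non-zero = needs processing */
    uint16_t ticks;   /* number of updates this object has received */
} Object;

/* Update only the objects that belong to this frame's slice.
   Returns how many objects were actually updated. */
unsigned tick_objects(Object *objs, unsigned count, unsigned frame)
{
    unsigned updated = 0;
    for (unsigned i = frame % TICK_SLICES; i < count; i += TICK_SLICES) {
        if (!objs[i].active)
            continue;          /* dormant objects cost almost nothing */
        objs[i].ticks++;       /* stand-in for the real per-object work */
        updated++;
    }
    return updated;
}
```

Each frame touches roughly a quarter of the list, so the per-frame cost drops without any cycle counting.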
A blog of my Megadrive programming adventures: http://www.bigevilcorporation.co.uk
Re: 68000 programming optimization tips? (for speed)
Do as few multiplications and divisions as possible.
Make them by powers of two as often as possible.
In the worst case, use lookup tables.
Example: you want to multiply by five.
1) If it's math, use a lookup table if possible. (The actual multiply speed depends on the multiplier, but I don't want to dig into the details.)
2) If it's to locate an entry in an array or table, modify the array or table (for example, add gaps) so that the step between entries becomes a power of two. If that is impossible too, use a lookup table.
Same with division. Priority:
1) Multiplication / Division by power of two.
2) Lookup table. In most cases takes similar speed as (1) but requires additional space in ROM.
3) Plain multiplication / addition.
Note: although "division by multiplication" gives a speed gain on modern processors, that is not the case on the m68k.
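For illustration, here is the multiply-by-five example as a small C sketch: shift-and-add, a tiny lookup table, and a table entry padded so that the step between entries is a power of two. All names (`mul5_shift_add`, `mul5_lookup`, `Entry`, `entry_offset`) and the entry layout are hypothetical, and whether any of these actually beats a plain multiply depends on the surrounding code:

```c
#include <stdint.h>

/* Multiply by five without a multiply instruction: x*5 == x*4 + x. */
uint32_t mul5_shift_add(uint16_t x)
{
    return ((uint32_t)x << 2) + x;
}

/* Lookup-table variant for a small input range (0..7 here). */
static const uint8_t mul5_table[8] = { 0, 5, 10, 15, 20, 25, 30, 35 };

uint8_t mul5_lookup(uint8_t x)
{
    return mul5_table[x & 7];   /* mask keeps the index inside the table */
}

/* Hypothetical table entry: 5 bytes of real data padded to 8 bytes,
   so the step between entries is a power of two. */
typedef struct {
    int16_t x;
    int16_t y;
    uint8_t frame;
    uint8_t pad[3];   /* the "gap" added on purpose */
} Entry;

/* Byte offset of entry `index`: a shift by 3 instead of a multiply by 5. */
uint32_t entry_offset(uint16_t index)
{
    return (uint32_t)index << 3;
}
```

The padding wastes three bytes per entry, which is the ROM/RAM cost of turning the index calculation into a shift.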
All the common C/C++ optimizations are handy: passing args by reference, passing args in registers, fewer memory allocations and so on.
For asm-related optimizations, look into vasm's optimization flags.
One last trick: if a variable is located in the $FF8000+ part of RAM, a pointer to it can be stored in a word.
Many of "modern" games used this trick.
A similar trick: if one of a0-a6 holds $FF8000, then indexed addressing such as (a6,d0.w) can reach the whole of RAM using a word offset in d0.
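The word-pointer trick works because the 68000 sign-extends a word when it is used as an address, and only 24 address bits reach the bus. A minimal C model of that behaviour, with made-up helper names (`expand_word_pointer`, `shrink_pointer`); it models the address arithmetic only, not real hardware access:

```c
#include <stdint.h>

/* Rebuild the full address from a stored 16-bit pointer, the way the
   68000 sign-extends a word used as an address. Only 24 address bits
   are kept, matching the 68000's address bus. */
uint32_t expand_word_pointer(uint16_t stored)
{
    /* Sign-extend manually: 0x8000..0xFFFF become negative. */
    int32_t extended = (stored & 0x8000u) ? (int32_t)stored - 0x10000
                                          : (int32_t)stored;
    return (uint32_t)extended & 0x00FFFFFFu;
}

/* Store only the low word of an address. This is lossless only for
   addresses in $FF8000-$FFFFFF (or $000000-$007FFF). */
uint16_t shrink_pointer(uint32_t addr)
{
    return (uint16_t)(addr & 0xFFFFu);
}
```

Stored words below $8000 expand to $000000-$007FFF instead, which is why the variables have to live in the upper 32 KB of RAM for this to work.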
In the end: don't fall into premature optimization.
Also don't fall into premature pessimization.
Google both terms.
Re: 68000 programming optimization tips? (for speed)
BigEvilCorporation: Basically, I've written a "fake layer" generator that takes one layer of art, shifts it by a certain number of pixels, then overlays it on another layer of art. It generates the data, then DMAs it to VRAM. I have received some help optimizing it, and while it's at the point where there's no lag, it still takes up a good chunk of CPU time. This was made for a Sonic 3K hack. I am also looking to optimize the engine a bit as well.
r57shell: Thanks for the tips. It should be noted that I also know how to do multiplication with only shifts and adds. I'll also be sure not to fall into premature optimization or pessimization.
Re: 68000 programming optimization tips? (for speed)
Ralakimus wrote: BigEvilCorporation: Basically, I've written a "fake layer" generator that takes one layer of art, shifts it by a certain number of pixels, then overlays it on another layer of art. It generates the data, then DMAs it to VRAM. I have received some help optimizing it, and while it's at the point where there's no lag, it still takes up a good chunk of CPU time. This was made for a Sonic 3K hack. I am also looking to optimize the engine a bit as well.
This doesn't sound trivial at all.
1) Keep a copy of it in RAM if you have the space.
2) Also the 68000 is horribly slow at memory accesses (both read and write), so reduce them where possible.
#1 ties into #2 whenever it turns out you can just reuse the existing data. That's memory you don't need to touch.
Sik is pronounced as "seek", not as "sick".
Re: 68000 programming optimization tips? (for speed)
Sik wrote: 2) Also the 68000 is horribly slow at memory accesses (both read and write), so reduce them where possible.
I really don't understand that assumption. The 68000 is not any slower than other CPUs at memory access. It's just that the memory clock is CPU clock / 4 (~1.9 MHz on the MD), which makes sense for a CPU of that period. And at that speed I would say the 68000 makes very good use of the memory, much better than any 65x0 family CPU or even the Z80.
Re: 68000 programming optimization tips? (for speed)
You say you know basic optimization techniques. In most cases "advanced" optimization tricks will be a compromise between performance and code size or, which is worse, between performance and code complexity and readability. I think it will be better to give us some samples you want to optimize.
One interesting thing to note: DBcc / DBRA takes 10 cycles when the branch is taken and 14 cycles when the counter runs out and execution falls through.
Re: 68000 programming optimization tips? (for speed)
Sik wrote: 2) Also the 68000 is horribly slow at memory accesses (both read and write), so reduce them where possible.
Stef wrote: I really don't understand that assumption. The 68000 is not any slower than other CPUs at memory access. It's just that the memory clock is CPU clock / 4 (~1.9 MHz on the MD), which makes sense for a CPU of that period. And at that speed I would say the 68000 makes very good use of the memory, much better than any 65x0 family CPU or even the Z80.
I was talking in comparison to the other operations the 68000 can do. Remember, it's 4 cycles added for each memory access on top of the current operation, and if you're using memory chances are you'll be accessing it a lot, so it adds up really quickly. The more you can avoid it, the better.
Re: 68000 programming optimization tips? (for speed)
There are "mundane" optimizations you can figure out by staring at the instruction set long enough, most of which are covered in BigEvilCorporation's links shared above.
But as one example, you should try post-incrementing with the address registers instead of pre-decrementing since post-inc is faster. Might seem like a useless optimization, unless you use a similar pre-dec routine every frame to iterate through something.
Code:
.Clear:
    move.l D0, -(A0) ; pre decrement
    dbra D1, .Clear ; repeat
This is faster than the above (obviously you need to adapt your code from pre-dec to post-inc as well, but you get the idea):
Code:
.Clear:
    move.l D0, (A0)+ ; Post-inc faster than Pre-dec
    dbra D1, .Clear ; repeat
I've also had lengthy arguments with "professional" software developers over the speed advantages of testing for 0 as often as possible instead of testing for an arbitrary non-zero value, since testing for 0 is built into all CPUs by default.
What does db stand for? Well that's an excellent question...
http://www.db-electronics.ca
Re: 68000 programming optimization tips? (for speed)
db-electronics wrote: But as one example, you should try post-incrementing with the address registers instead of pre-decrementing since post-inc is faster. Might seem like a useless optimization, unless you use a similar pre-dec routine every frame to iterate through something.
Code:
.Clear:
    move.l D0, -(A0) ; pre decrement
    dbra D1, .Clear ; repeat
This is faster than the above (obviously need to adapt your code from pre-dec to post-inc as well but you get the idea):
Code:
.Clear:
    move.l D0, (A0)+ ; Post-inc faster than Pre-dec
    dbra D1, .Clear ; repeat
Of all the possible examples you could have picked, you chose one of the only two for which this is false...
There is an optimization for predecrement of the destination operand of move and movem opcodes; these two cases are no slower than post-increment. For all other opcodes (or predecrement of source operands of move and movem opcodes), predecrement is slower as you state.
Re: 68000 programming optimization tips? (for speed)
oops! Thanks for pointing out my mistake!
Edit:
Well wait, here's the table:
http://oldwww.nvg.ntnu.no/amiga/MC680x0 ... mmove.HTML
Oh I see, move.l D0, -(A0) is optimized, but move.l -(A0), D0 is not.
Re: 68000 programming optimization tips? (for speed)
addq.w #2, a0 ← 4 cycles
addq.l #2, a0 ← 8 cycles
subq.w #2, a0 ← 8 cycles
subq.l #2, a0 ← 8 cycles
¯\(º_o)/¯
Re: 68000 programming optimization tips? (for speed)
Sik wrote:
addq.w #2, a0 ← 4 cycles
addq.l #2, a0 ← 8 cycles
subq.w #2, a0 ← 8 cycles
subq.l #2, a0 ← 8 cycles
¯\(º_o)/¯
That is a known error in PRM and derived sources. Hardware measurements and microcode analysis both show that the first addq is also 8 cycles. The better source of cycle information for the 68000 is yacht.txt.
Re: 68000 programming optimization tips? (for speed)
db-electronics wrote: I've also had lengthy arguments with "professional" software developers over the speed advantages of testing for 0 as often as possible instead of testing for an arbitrary non-zero value since testing for 0 is built into all CPUs by default.
This is a habit I've gotten into anyway, since I use 0 as 'disabled' for almost everything like enabled/visible flags, states, counts, etc, but found myself needing more and more to add additional valid values and having to swap my "check for value and branch if equal" to "check for zero and branch if not equal" a few times. It's less pretty to read (double negative) but worth the hassle.
Re: 68000 programming optimization tips? (for speed)
Something I got into the habit of doing is also to exploit negative values, since you can check for the sign just as easily as you can check for zero (in fact, you can check for both simultaneously).