68000 instruction timing from source?
Moderator: BigEvilCorporation
-
- Very interested
- Posts: 209
- Joined: Sat Sep 08, 2012 10:41 am
68000 instruction timing from source?
I'd like a tool that parses 68000 source code, and writes it back out with the instruction timing in the margin (or as a spreadsheet or whatever). Then I could produce a heat map of the most expensive code.
Does such a thing already exist? If not, I'll write one.
I guess an end goal would be to integrate heat-map-style profiling into an emulator to evaluate realtime code bottlenecks (static analysis won't properly account for loops, branches, etc.).
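A rough sketch of what the first-pass static tool could look like, in Python. Everything here is an assumption for illustration: the tiny `BASE_CYCLES` table covers only a few register-to-register forms (values as I recall them from the 68000 timing tables, so verify before use), labels are assumed to end with a colon, and effective-address costs are ignored entirely.

```python
import re

# Hypothetical, deliberately tiny table: base cycles for a few simple forms.
# A real tool needs the full timing tables plus effective-address costs.
BASE_CYCLES = {
    "nop": 4,
    "moveq": 4,
    "move.w": 4,  # Dn,Dn only
    "add.w": 4,   # Dn,Dn only
    "rts": 16,
}

# Optional "label:" followed by an optional mnemonic with a size suffix.
LINE_RE = re.compile(r"^\s*(?:(\w+):)?\s*([A-Za-z]+(?:\.[bwlsBWLS])?)?")


def annotate(source):
    """Return (annotated_lines, total_known_cycles) for a block of source."""
    annotated = []
    total = 0
    for line in source.splitlines():
        code = line.split(";", 1)[0]  # drop trailing comment
        match = LINE_RE.match(code)
        mnemonic = (match.group(2) or "").lower()
        if not mnemonic:
            margin = ""               # label-only, blank or comment line
        elif mnemonic in BASE_CYCLES:
            margin = str(BASE_CYCLES[mnemonic])
            total += BASE_CYCLES[mnemonic]
        else:
            margin = "?"              # not in the table yet
        annotated.append(f"{margin:>4} | {line}")
    return annotated, total


if __name__ == "__main__":
    demo = "start:\n    moveq #0,d0\n    add.w d0,d1 ; sum\n    dbra d2,start\n    rts\n"
    lines, total = annotate(demo)
    print("\n".join(lines))
    print("known cycles:", total)
```

Unknown instructions get a `?` in the margin instead of a guess, which makes the gaps in the table obvious.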
A blog of my Megadrive programming adventures: http://www.bigevilcorporation.co.uk
Re: 68000 instruction timing from source?
If you're going to integrate it to an emulator, it would be most useful to use one of the existing setups: gprof, oprofile, or valgrind. Being compatible with those would allow you to use their tools (GUIs, filters...), and extend into C and C++ code too.
edit: Here's a link to gprof on embedded bare-metal ARM:
https://mcuoneclipse.com/2015/08/23/tut ... -cortex-m/
Re: 68000 instruction timing from source?
Unfortunately you need to simulate/emulate/execute the code to analyze it: the loops are the hot zones, and you can't detect them with a simple parser. I can't recall a time when it really mattered to optimize non-loop code.
What's more, some instruction timings depend on context: shifts can go from 4 cycles to about 66 cycles. As on modern CPUs, whether or not a conditional jump is taken can discard data in the load/decode stage, changing the cycle count by 2, and so on.
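To make the context dependence concrete, here is a small Python sketch. The constants are my reading of the MC68000 timing tables for register-count shifts and short conditional branches, and should be treated as assumptions to verify; the point is only that the cost is a function of runtime state, which a source parser cannot see.

```python
def shift_cycles(count, long_operand=False):
    """Cycles for a register-count shift such as lsl.w d1,d0.

    Assumed formula: 6 + 2n for byte/word, 8 + 2n for long, where n is
    the shift count taken modulo 64.
    """
    n = count % 64
    base = 8 if long_operand else 6
    return base + 2 * n


def bcc_short_cycles(taken):
    """Cycles for Bcc with an 8-bit displacement.

    Assumed values: 10 when the branch is taken, 8 when it falls through.
    """
    return 10 if taken else 8


if __name__ == "__main__":
    for n in (1, 8, 31):
        print(f"shift by {n}: {shift_cycles(n)} cycles (word)")
    print("branch taken:", bcc_short_cycles(True))
    print("branch not taken:", bcc_short_cycles(False))
```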
HELP. Spanish TVs are brain washing people to be hostile to me.
-
- Very interested
- Posts: 484
- Joined: Sat Mar 05, 2011 11:11 pm
- Location: Berlin, Germany
Re: 68000 instruction timing from source?
I began work on a similar tool some time ago. You essentially need to execute the loops and account for the time of each pass. I have not looked into it, but it would make sense to look at Starscream for ideas.
UMDK Fanboy
-
- Very interested
- Posts: 209
- Joined: Sat Sep 08, 2012 10:41 am
Re: 68000 instruction timing from source?
I'll get busy then!
Static analysis isn't properly representative, of course, but it's a useful first pass and can be used to compare small sections of code during an optimisation refactor, rather than to get the whole picture at runtime. Examples of instruction-counted code are a hot topic on this forum and Sega-16, and I see a lot of small snippets demonstrating various optimisations. It would only be a first step to a proper runtime solution, too.
-
- Very interested
- Posts: 50
- Joined: Tue Dec 24, 2013 1:00 am
Re: 68000 instruction timing from source?
What you really want is some sort of profiler.
Re: 68000 instruction timing from source?
ehaliewicz wrote: What you really want is some sort of profiler.
As far as I know the closest thing gennydev has to profiling is random gdb pausing and timing with Kdebug.
I hope I'm wrong though.
Re: 68000 instruction timing from source?
You can grab opcode lengths and opcode timings from any emulator source,
then get the Gens r57shell mod and put the timings into Lua, then set up a break on PC across the whole ROM, and add up each opcode's timing as it executes.
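The counting step could be sketched like this. It is Python rather than Lua and does not use the Gens scripting API at all: `on_execute` is a hypothetical stand-in for whatever callback the emulator fires on each PC break, and `timing_at` is an assumed table of cycles per instruction address built beforehand.

```python
from collections import Counter


class PcProfiler:
    """Accumulates cycles and hit counts per instruction address."""

    def __init__(self, timing_at):
        self.timing_at = timing_at  # assumed: {pc: cycles for that opcode}
        self.cycles = Counter()
        self.hits = Counter()

    def on_execute(self, pc):
        # Hypothetical hook: call this from the emulator's break-on-PC callback.
        self.cycles[pc] += self.timing_at.get(pc, 0)
        self.hits[pc] += 1

    def hottest(self, n=5):
        """Return the n most expensive addresses as (pc, cycles) pairs."""
        return self.cycles.most_common(n)


if __name__ == "__main__":
    timing = {0x200: 4, 0x202: 4, 0x204: 10, 0x206: 16}
    profiler = PcProfiler(timing)
    # Fake trace: one setup instruction, a two-instruction loop run 100 times, rts.
    for pc in [0x200] + [0x202, 0x204] * 100 + [0x206]:
        profiler.on_execute(pc)
    for pc, cycles in profiler.hottest(3):
        print(f"${pc:06X}: {cycles} cycles over {profiler.hits[pc]} hits")
```

Because the trace is executed rather than parsed, the loop body shows up as the hot spot without any loop detection.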
-
- Very interested
- Posts: 619
- Joined: Thu Nov 30, 2006 6:30 am
Re: 68000 instruction timing from source?
I'd be willing to add profiling support to BlastEm, but it would be good to have some input on the best output format. The simplest thing to do would be to have a list of addresses and the cumulative number of cycles spent executing instructions at those addresses. This would not allow any degree of callstack analysis, though, which many profiling tools offer. There's also the issue of whether it makes sense to differentiate between time spent actually executing an instruction and time spent waiting for DMA or the like to complete.
-
- Very interested
- Posts: 209
- Joined: Sat Sep 08, 2012 10:41 am
Re: 68000 instruction timing from source?
I'm going to start with a simple Exodus plugin - I already have a SNASM68K COFF reader for it, so I can match addresses to source, which makes it the sensible place to begin. I'll just make it dump out the cumulative cost per instruction in CSV format for now and go from there.
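For the CSV side, something along these lines would do as a starting point. This is a Python sketch, not Exodus plugin code: the column layout is my own guess, and `source_by_addr` is a hypothetical address-to-(file, line) map standing in for whatever the COFF reader produces.

```python
import csv
import io


def write_profile_csv(cycles_by_addr, source_by_addr, out):
    """Write one row per address, most expensive first.

    cycles_by_addr: {address: cumulative cycles}
    source_by_addr: assumed {address: (file, line)} from debug info
    out: any text file-like object
    """
    writer = csv.writer(out)
    writer.writerow(["address", "file", "line", "cycles"])
    ranked = sorted(cycles_by_addr.items(), key=lambda item: -item[1])
    for addr, cycles in ranked:
        file, line = source_by_addr.get(addr, ("?", 0))
        writer.writerow([f"0x{addr:06X}", file, line, cycles])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_profile_csv(
        {0x200: 4, 0x204: 1000},
        {0x204: ("main.asm", 12)},
        buffer,
    )
    print(buffer.getvalue())
```

Sorting by cost puts the hot spots at the top, and the file and line columns are what a spreadsheet or heat map would key on.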
Re: 68000 instruction timing from source?
Mask of Destiny wrote: There's also the issue of whether it makes sense to differentiate from time spent actually executing an instruction and time spent waiting for DMA or the like to complete.
You normally want to know how long it took for the code to execute, so you probably want to take into account the DMAs.
Sik is pronounced as "seek", not as "sick".
-
- Very interested
- Posts: 619
- Joined: Thu Nov 30, 2006 6:30 am
Re: 68000 instruction timing from source?
Oh certainly, but you might want to be able to separate out the actual execution time from the time spent waiting for DMA to complete, e.g. "the move.l at $XXX took N cycles, of which 12 were execution and the rest were DMA".
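A minimal sketch of that kind of split record, in Python. This is not BlastEm code: the names are made up for illustration, and it assumes the emulator core can report, per instruction, how many cycles were execution and how many were spent stalled on DMA.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    execute: int = 0   # cycles spent actually executing
    dma_wait: int = 0  # cycles spent stalled waiting for DMA

    @property
    def total(self):
        return self.execute + self.dma_wait


class SplitProfiler:
    """Keeps execution and DMA-wait cycles separately per address."""

    def __init__(self):
        self.costs = {}

    def record(self, pc, execute, dma_wait=0):
        # Hypothetical hook: called once per executed instruction.
        cost = self.costs.setdefault(pc, Cost())
        cost.execute += execute
        cost.dma_wait += dma_wait

    def describe(self, pc):
        cost = self.costs[pc]
        return (f"${pc:06X}: {cost.total} cycles, "
                f"{cost.execute} executing, {cost.dma_wait} waiting on DMA")


if __name__ == "__main__":
    profiler = SplitProfiler()
    profiler.record(0x300, execute=12, dma_wait=500)
    print(profiler.describe(0x300))
```

The total is still there for anyone who only cares how long the code took, so keeping the two counters costs nothing in the simple case.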
Re: 68000 instruction timing from source?
The time taken by any transfer DMA will be counted in the same function that starts it. In other words, it's not spread around.
So I don't see any reason to take DMA out of the account, except if you want to do some DMA copy/fill.
Re: 68000 instruction timing from source?
If DMA copy/fill is your bottleneck then the profiler would show your 68000 code spending lots of time wherever you put the wait for the DMA flag to clear, wouldn't it?