68000 instruction timing from source?

Posted: Sat Oct 01, 2016 4:39 pm
by BigEvilCorporation
I'd like a tool that parses 68000 source code, and writes it back out with the instruction timing in the margin (or as a spreadsheet or whatever). Then I could produce a heat map of the most expensive code.

Does such a thing already exist? If not, I'll write one.

I guess an end goal would be to integrate heat-map-style profiling into an emulator to find real-time bottlenecks (static analysis won't properly account for loops, branches, etc.).
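
A minimal sketch of the static pass described above, in Python, assuming one instruction per line, labels ending in a colon, and ';' comments. The cycle table is a tiny illustrative subset covering register-only forms; BASE_CYCLES and annotate are made-up names, and a real tool would need the full timing table including effective-address costs.

```python
import re

# Illustrative subset only: base cycle counts for register-only forms.
# ASSUMPTION: operands are not inspected, so memory operands would be
# under-counted. A real tool needs the full table plus EA costs.
BASE_CYCLES = {
    "nop": 4,
    "moveq": 4,
    "move.w": 4,   # Dn,Dn only
    "move.l": 4,   # Dn,Dn only
    "add.w": 4,    # Dn,Dn only
    "add.l": 8,    # Dn,Dn only
    "rts": 16,
}

# Optional "label:" followed by an optional mnemonic with size suffix.
LINE_RE = re.compile(r"^\s*(?:(\w+):)?\s*([A-Za-z]+(?:\.[bwlsBWLS])?)?")


def annotate(source):
    """Return (annotated_lines, total_cycles_of_known_instructions)."""
    annotated = []
    total = 0
    for line in source.splitlines():
        code = line.split(";", 1)[0]            # drop trailing comment
        match = LINE_RE.match(code)
        mnemonic = (match.group(2) or "").lower()
        if not mnemonic:
            margin = ""                          # blank, comment or label-only
        elif mnemonic in BASE_CYCLES:
            cycles = BASE_CYCLES[mnemonic]
            total += cycles
            margin = str(cycles)
        else:
            margin = "?"                         # not in the table
        annotated.append(f"{margin:>4} | {line}")
    return annotated, total
```

Each line comes back with its cycle count in a left-hand margin, and anything missing from the table is marked with "?" rather than guessed, so the total only covers instructions the table knows about.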

Re: 68000 instruction timing from source?

Posted: Sat Oct 01, 2016 6:06 pm
by cero
If you're going to integrate it into an emulator, it would be most useful to target one of the existing setups: gprof, oprofile, or valgrind. Being compatible with those would let you use their tools (GUIs, filters...) and extend to C and C++ code too.

edit: Here's a link to gprof on embedded bare-metal ARM:
https://mcuoneclipse.com/2015/08/23/tut ... -cortex-m/

Re: 68000 instruction timing from source?

Posted: Sat Oct 01, 2016 9:45 pm
by Miquel
Unfortunately you need to simulate/emulate/execute the code to analyze it: the loops are the hot spots, and you can't detect them with a simple parser. I can't recall a time when it really mattered to optimize non-loop code.

What's more, some instruction timings depend on context: a register shift costs 6+2n cycles (8+2n for long operands), where n is the shift count, so the same instruction can take anywhere from 6 to over 130 cycles. As on modern CPUs, whether a conditional branch is taken or not changes what gets discarded at the fetch/decode stage, which changes the cycle count by 2, and so on.
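
To make the context dependence concrete, here is a small Python sketch using the figures as I read them in the Motorola timing tables: register shifts and rotates cost 6+2n cycles for byte/word and 8+2n for long, and Bcc costs 10 cycles when taken versus 8 (short) or 12 (word displacement) when not. The function names are made up for illustration.

```python
def shift_cycles(size, count):
    """Cycles for a shift/rotate on a data register (ASL, LSR, ROL, ...).

    size:  'b', 'w' or 'l'
    count: shift count (1-8 when immediate, 0-63 when taken from a register)
    """
    if size not in ("b", "w", "l"):
        raise ValueError("size must be 'b', 'w' or 'l'")
    if not 0 <= count <= 63:
        raise ValueError("count must be in 0..63")
    base = 8 if size == "l" else 6
    return base + 2 * count


def bcc_cycles(taken, short=True):
    """Cycles for a conditional branch, taken or not taken."""
    if taken:
        return 10
    return 8 if short else 12
```

A static tool can evaluate shift_cycles when the count is an immediate, but when the count sits in a register, or when it has to decide whether a branch is taken, it can only report a range.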

Re: 68000 instruction timing from source?

Posted: Sun Oct 02, 2016 10:30 am
by MintyTheCat
I began work on a similar tool some time ago. You essentially need to execute the loops and count the cycles on every pass. I have not looked into it, but it would make sense to look at Starscream for ideas.

Re: 68000 instruction timing from source?

Posted: Sun Oct 02, 2016 2:48 pm
by BigEvilCorporation
I'll get busy then!

Static analysis isn't properly representative, of course, but it's a useful first pass and can be used to compare small sections of code during an optimisation refactor, rather than to get the whole picture at runtime. Examples of instruction-counted code are a hot topic on this forum and Sega-16, and I see a lot of small snippets demonstrating various optimisations. It would only be a first step towards a proper runtime solution, too.

Re: 68000 instruction timing from source?

Posted: Tue Oct 04, 2016 7:05 pm
by ehaliewicz
What you really want is some sort of profiler.

Re: 68000 instruction timing from source?

Posted: Fri Oct 07, 2016 1:44 pm
by Grind
ehaliewicz wrote: What you really want is some sort of profiler.
As far as I know the closest thing gennydev has to profiling is random gdb pausing and timing with Kdebug.

I hope I'm wrong though.

Re: 68000 instruction timing from source?

Posted: Fri Oct 07, 2016 5:25 pm
by r57shell
You can grab opcode lengths and opcode timings from any emulator's source,
then get the Gens r57shell mod, put the timings into a Lua script, set a PC breakpoint covering the whole ROM, and add up the timing of each opcode executed.
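
The counting step could look roughly like this. It is a Python sketch rather than the Lua the Gens mod would actually use, and on_execute is a hypothetical hook that the emulator (or a trace reader) would call once per executed instruction with the PC and that instruction's cycle cost.

```python
from collections import defaultdict


class CycleProfiler:
    """Accumulates cycles and hit counts per program-counter value."""

    def __init__(self):
        self.cycles = defaultdict(int)
        self.hits = defaultdict(int)

    def on_execute(self, pc, cycles):
        # HYPOTHETICAL hook: called once for every executed instruction.
        self.cycles[pc] += cycles
        self.hits[pc] += 1

    def hottest(self, n=10):
        """Return the n most expensive addresses as (pc, cycles) pairs."""
        ranked = sorted(self.cycles.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]
```

Because the cost is added on every execution, loop bodies naturally float to the top of hottest(), which is exactly what a static pass can't see.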

Re: 68000 instruction timing from source?

Posted: Fri Oct 07, 2016 6:51 pm
by Mask of Destiny
I'd be willing to add profiling support to BlastEm, but it would be good to have some input on the best output format. The simplest thing to do would be to output a list of addresses and the cumulative number of cycles spent executing instructions at those addresses. This would not allow any degree of call stack analysis, though, which many profiling tools offer. There's also the issue of whether it makes sense to differentiate between time spent actually executing an instruction and time spent waiting for DMA or the like to complete.

Re: 68000 instruction timing from source?

Posted: Fri Oct 07, 2016 8:23 pm
by BigEvilCorporation
I'm going to start with a simple Exodus plugin. I already have a SNASM68K COFF reader for it so I can match addresses to source, so this makes sense. I'll just make it dump out the cumulative cost per instruction in CSV format for now and go from there.
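
As a rough idea of what that CSV dump might look like, here is a Python sketch. The column layout is my own guess, and addr_to_source is a hypothetical dict standing in for whatever the COFF debug info provides; none of this is the actual Exodus plugin API.

```python
import csv
import io


def write_profile_csv(cycles_by_addr, addr_to_source, out):
    """Write one CSV row per address: address, file, line, cycles.

    cycles_by_addr: {address: cumulative cycles}
    addr_to_source: {address: (file, line)}  # HYPOTHETICAL debug-info lookup
    out:            any text file-like object
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["address", "file", "line", "cycles"])
    for addr in sorted(cycles_by_addr):
        # Addresses with no debug info get empty file/line columns.
        file, line = addr_to_source.get(addr, ("", ""))
        writer.writerow([f"${addr:06X}", file, line, cycles_by_addr[addr]])


if __name__ == "__main__":
    buf = io.StringIO()
    write_profile_csv({0x200: 400, 0x202: 1000}, {0x200: ("main.asm", 12)}, buf)
    print(buf.getvalue())
```

Keeping the file and line columns next to the address means the spreadsheet can be sorted by cost and still point straight back at the source.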

Re: 68000 instruction timing from source?

Posted: Sat Oct 08, 2016 3:03 pm
by Sik
Mask of Destiny wrote: There's also the issue of whether it makes sense to differentiate from time spent actually executing an instruction and time spent waiting for DMA or the like to complete.
You normally want to know how long it took for the code to execute, so you probably want to take into account the DMAs.

Re: 68000 instruction timing from source?

Posted: Sun Oct 09, 2016 5:36 am
by Mask of Destiny
Oh, certainly, but you might want to be able to separate out the actual execution time from the time spent waiting for DMA to complete, e.g. "the move.l at $XXX took N cycles, of which 12 were execution and the rest were DMA".
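
One way to keep the two apart is to accumulate them in separate counters per address. This Python sketch assumes the emulator can report, for each executed instruction, how many cycles were execution and how many were spent stalled; Cost, SplitProfiler and record are illustrative names, not anything in BlastEm.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Cost:
    exec_cycles: int = 0
    stall_cycles: int = 0   # e.g. CPU halted while a DMA completes

    @property
    def total(self):
        return self.exec_cycles + self.stall_cycles


class SplitProfiler:
    def __init__(self):
        self.costs = defaultdict(Cost)

    def record(self, pc, exec_cycles, stall_cycles=0):
        # ASSUMPTION: the emulator reports both figures per instruction.
        cost = self.costs[pc]
        cost.exec_cycles += exec_cycles
        cost.stall_cycles += stall_cycles

    def describe(self, pc):
        cost = self.costs.get(pc, Cost())
        return (f"${pc:06X}: {cost.total} cycles "
                f"({cost.exec_cycles} execution, {cost.stall_cycles} waiting)")
```

The total is still there for anyone who only cares how long the code took, so the split costs nothing for that use case.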

Re: 68000 instruction timing from source?

Posted: Sun Oct 09, 2016 12:54 pm
by r57shell
The time taken by a transfer DMA will always land in the same function that starts it. In other words, it's not spread around.
So I don't see any reason to leave DMA out of the accounting, unless you're doing some DMA copy/fill.

Re: 68000 instruction timing from source?

Posted: Sun Oct 09, 2016 3:44 pm
by Sik
If DMA copy/fill is your bottleneck then the profiler would show your 68000 code spending lots of time wherever you put the wait for the DMA flag to clear, wouldn't it?

Re: 68000 instruction timing from source?

Posted: Mon Oct 10, 2016 7:47 pm
by r57shell
Yes, it would. So? (That is, if you're testing that bit.)