Confusing crash Address Error.

Ask anything you want about Megadrive/Genesis programming.

Moderator: BigEvilCorporation

r57shell
Very interested
Posts: 478
Joined: Sun Dec 23, 2012 1:30 pm
Location: Russia

Post by r57shell » Tue Oct 14, 2014 3:06 am

Maybe I need the high word of d0? But yes, it's the fastest way in other cases.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Tue Oct 14, 2014 6:11 pm

r57shell wrote: Maybe I need the high word of d0? But yes, it's the fastest way in other cases.
If you need the high word preserved, I think move.w #0,d0 is the same speed. One of the first things you learn about 68000 assembly is to avoid clr whenever possible, as it's (unnecessarily) slow and will even cause problems when used on some hardware registers. NEVER use clr on hardware registers, period. In many cases the difference in speed doesn't matter, but if you're optimizing an inner loop or working on hardware, it's one of the things you learn. :D

Post by r57shell » Tue Oct 14, 2014 6:42 pm

According to timing:

Code:

moveq  #0,d0    ; 4 cycles
clr.l  d0       ; 6 cycles
clr.w  d0       ; 4 cycles
move.w #0,d0    ; 8 cycles
move.l #0,d0    ; 12 cycles
I don't know why you think clr.w is bad. As far as I know, only moveq beats clr.l; in every other case clr is better. To me it's obvious, because any immediate operation requires an extension word (the immediate value), except the ones with q at the end (moveq, addq, subq...), which are one-word opcodes. Reading an extension word takes at least 4 cycles, which is why a one-word opcode beats a two-word opcode.
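To make the one-word versus two-word argument concrete, here is a rough sketch of the same five instructions with their encodings. The hex words are worked out by hand from the 68000 instruction formats rather than copied from assembler output, so treat them as an illustration and not as a listing:

```asm
; Clearing d0 on a 68000: instruction words fetched versus cycles.
        moveq   #0,d0       ; $7000              1 word    4 cycles  (clears all 32 bits)
        clr.w   d0          ; $4240              1 word    4 cycles  (low word only)
        clr.l   d0          ; $4280              1 word    6 cycles
        move.w  #0,d0       ; $303C $0000        2 words   8 cycles  (low word only)
        move.l  #0,d0       ; $203C $0000 $0000  3 words  12 cycles
```

Each extra instruction word costs one more 4-cycle bus read, which matches the 4, 8, 12 progression in the timings.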

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Post by Stef » Tue Oct 14, 2014 9:02 pm

The CLR instruction is bad on memory because it performs a read cycle before writing 0. That makes the instruction slow, and it can lock up the system when the memory port does not support read operations.
On a register CLR is not that bad, but CLR.L is still slower than MOVEQ.
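A minimal sketch of what that means in practice, assuming a0 points at a port that should not be read. The Mega Drive VDP data port at $C00000 is used here only as a stand-in, and the VDP_DATA label is a made-up name for the illustration:

```asm
VDP_DATA    equ     $C00000         ; stand-in port address; the label name is hypothetical

        lea     (VDP_DATA).l,a0     ; a0 -> port
        moveq   #0,d0               ; d0 = 0, no operand bus access
        move.w  d0,(a0)             ; one write cycle to the port, no read
;       clr.w   (a0)                ; avoided: a 68000 reads (a0) before writing 0
```

The register-to-memory move only ever issues a write to the port, so it sidesteps the read that clr would perform.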

Post by r57shell » Wed Oct 15, 2014 4:12 am

As I said, clr beats any of move #0, <ea>. You may check if you want.
It is even almost the same as move d0,<ea>, which is strange (move is faster only on -(An) and (d8, An, Xn)). So the best way to clear several variables is:

Code:

moveq #0,d0
move d0, <ea>
...
Something is strange, because the Programmer's Reference Manual states:
"In the MC68000 and MC68008 a memory location is read before it is cleared."
Yet in the timings that read is invisible. Maybe it is a read-modify-write cycle with a single bus access? But the User's Manual states:
"The test and set (TAS) instruction uses this cycle to provide a signaling capability without deadlock between processors in a multiprocessing environment. The TAS instruction (the only instruction that uses the read-modify-write cycle) only operates on bytes. Thus, all read-modify-write cycles are byte operations."
So something is wrong, either in the timings or in the clr description. :(

Mask of Destiny
Very interested
Posts: 615
Joined: Thu Nov 30, 2006 6:30 am

Post by Mask of Destiny » Wed Oct 15, 2014 4:23 pm

r57shell wrote: As I said, clr beats any of move #0, <ea>. You may check if you want.
I'm not sure how you're timing the instruction, but you're wrong. The overall instruction duration is the same except for the case in which <ea> is a register direct mode. This is quite clear if you look at the microcode. clr uses the standard effective address microcode subroutines which do a separate read first before the instruction specific code runs. I suppose it's possible they changed this between the patent filing and the final chip, but since the manual agrees with the microcode in this case that seems unlikely.

Now where exactly the bus operations occur in the overall duration will be different (too lazy to check the microcode listings at the moment to describe it precisely) and I suppose this could impact the measured performance based on how things sync up with the various refresh delays.
r57shell wrote: Something is strange, because the Programmer's Reference Manual states:
"In the MC68000 and MC68008 a memory location is read before it is cleared."
Yet in the timings that read is invisible. Maybe it is a read-modify-write cycle with a single bus access? But the User's Manual states:
"The test and set (TAS) instruction uses this cycle to provide a signaling capability without deadlock between processors in a multiprocessing environment. The TAS instruction (the only instruction that uses the read-modify-write cycle) only operates on bytes. Thus, all read-modify-write cycles are byte operations."
So something is wrong, either in the timings or in the clr description. :(
Yeah, TAS is the only instruction to use the read-modify-write cycle, and I don't think that cycle is any faster (or at least not significantly so) than two separate bus operations. Its only purpose is to prevent another bus master from taking the bus between the read and the write.

Post by Chilly Willy » Wed Oct 15, 2014 9:59 pm

In the Programmer's Reference Manual, there's this note for clr:
NOTE: In the MC68000 and MC68008 a memory location is read before it is cleared.
The hardware manual specifically states that CLR <memory> takes 8 cycles + ea calculation time, does one read cycle, and one write cycle. EA calculation is 0 only for data or address register direct. It's 4 cycles for simple EAs, and as much as 16 cycles for more complex EAs. Note that the extra read is in the calculate EA timing, not the instruction timing. The calculate EA timing for (An), for example, states + 4 cycles + 1 read cycle. So the fastest clr <memory> is 12 cycles, 2 reads, and one write.

CLR is good for clearing a word register, but no better for anything else, and worse for memory.
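The arithmetic above can be written out per addressing mode. This is only a sketch using the base cost of 8(1/1) plus the effective-address costs from the 68000 timing tables, in the cycles(reads/writes) notation; the absolute address is an arbitrary example:

```asm
; clr.w <memory> = 8(1/1) base + effective-address calculation time
        clr.w   (a0)            ; 8(1/1) +  4(1/0) for (An)      = 12(2/1)
        clr.w   -(a0)           ; 8(1/1) +  6(1/0) for -(An)     = 14(2/1)
        clr.w   4(a0)           ; 8(1/1) +  8(2/0) for (d16,An)  = 16(3/1)
        clr.w   ($FF0000).l     ; 8(1/1) + 12(3/0) for (xxx).L   = 20(4/1)
```

In each line one of the reads is the operand read that clr performs before writing zero; the rest are instruction and extension word fetches.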

Post by r57shell » Thu Oct 16, 2014 2:36 pm

Mask of Destiny wrote: The overall instruction duration is the same except for the case in which <ea> is a register direct mode. This is quite clear if you look at the microcode.
Where can I look at the microcode? :shock:
Mask of Destiny wrote: Yeah TAS is the only instruction to use the read-modify-write cycle and I don't think that cycle is any faster (or at least not significantly so) than two separate bus operations.
Yeah, I checked the timing of this cycle, and it looks like just a read and a write one after another without releasing the bus.

I don't know how I computed the timings last time, but it's obvious that I got them badly wrong.

Code:

            Dn      An      (An)     (An)+    -(An)    (d16,An)  (d8,An,Xn)*  (xxx).W  (xxx).L
clr.w       4(1/0)  n/a     12(2/1)  12(2/1)  14(2/1)  16(3/1)   18(3/1)      16(3/1)  20(4/1)
move.w #0   8(2/0)  8(2/0)  12(2/1)  12(2/1)  12(2/1)  16(3/1)   18(3/1)      16(3/1)  20(4/1)
move.w Dn   4(1/0)  4(1/0)  8(1/1)   8(1/1)   8(1/1)   12(2/1)   14(2/1)      12(2/1)  16(3/1)
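Read as a recipe, the table says a zeroed data register is the cheapest source when clearing several word variables. A sketch, assuming the variables sit consecutively in RAM and that d0 and a0 are free; the vars label and its address are hypothetical:

```asm
vars        equ     $FF8000         ; hypothetical address of four consecutive word variables

        lea     (vars).l,a0         ; a0 -> first variable
        moveq   #0,d0               ; 4(1/0), clears all 32 bits of d0
        move.w  d0,(a0)+            ; 8(1/1) each, versus 12(2/1) for clr.w (a0)+
        move.w  d0,(a0)+
        move.w  d0,(a0)+
        move.w  d0,(a0)+
```

The setup cost of moveq is paid once, so the saving of 4 cycles and one read per variable grows with the number of variables cleared.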

Post by Mask of Destiny » Thu Oct 16, 2014 8:18 pm

r57shell wrote: Where can I look at the microcode? :shock:
Check out this thread: Tasco Deluxe posted some links to patents in the first reply. One of them has a full listing of the micro and nanocode for a pre-production version of the 68000. There are some differences between that and the final microcode (for instance, this version has a different looping instruction called dcnt instead of dbra), but for most instructions it's probably the same. Until someone is able to determine the organization of the individual microcode bits on the die, this is the best we have.

Post by Chilly Willy » Thu Oct 16, 2014 9:53 pm

r57shell wrote: I don't know how I computed the timings last time, but it's obvious that I got them badly wrong.

Code:

            Dn      An      (An)     (An)+    -(An)    (d16,An)  (d8,An,Xn)*  (xxx).W  (xxx).L
clr.w       4(1/0)  n/a     12(2/1)  12(2/1)  14(2/1)  16(3/1)   18(3/1)      16(3/1)  20(4/1)
move.w #0   8(2/0)  8(2/0)  12(2/1)  12(2/1)  12(2/1)  16(3/1)   18(3/1)      16(3/1)  20(4/1)
move.w Dn   4(1/0)  4(1/0)  8(1/1)   8(1/1)   8(1/1)   12(2/1)   14(2/1)      12(2/1)  16(3/1)
Everyone has a brain-fart now and then. I've said some really bone-headed stuff myself on occasion. I try not to get too defensive when people call me on it.

And I like that table. It makes it really easy to see which opcode you should use under which conditions.

Post by Stef » Fri Oct 17, 2014 7:23 am

That's the same table I'm using when I'm doing some 68k assembly code, very helpful :)

Post by r57shell » Fri Oct 17, 2014 10:46 am

Actually, there is vasm, which can replace opcodes with equivalent but faster ones. There are so many cases.