To NOP or not to NOP

furrykef
Interested
Posts: 30
Joined: Mon Jul 21, 2008 7:28 pm

Post by furrykef »

I'm wondering when exactly you should use NOP. For instance, here's the read_joypad1 routine from genesis.c:

Code:

ushort read_joypad1()
{
    register volatile uchar *pb;
    ushort i, j;

    pb = (uchar *) 0xa10003;

    *pb = 0x40;        /* check joypad */
    asm("nop");
    asm("nop");
    i = *pb & 0x3f;

    *pb = 0;           /* check buttons */
    asm("nop");
    asm("nop");
    j = (*pb & 0x30) << 2;

    return( ~(i|j) );
}

I realize that the NOPs are there to wait for the hardware to notice what you're doing. But when exactly are they needed, and how do you know how many cycles to wait? In this particular case, the code reads from a memory-mapped register shortly after writing to the same register, so I assume that's why the wait is necessary. But I don't see anything like this specified in sega2f.doc. Were the NOPs written in after trial and error, or is this documented somewhere?

Keep in mind that I'm thinking in terms of the actual hardware, not just emulation.

- Kef
TmEE co.(TM)
Very interested
Posts: 2452
Joined: Tue Dec 05, 2006 1:37 pm
Location: Estonia, Rapla City
Contact:

Post by TmEE co.(TM) »

On actual hardware, you need the NOPs, or at least one of them. Omitting them can lead to some non-responsiveness on real hardware (especially 6-button pads, and when you have some overclocking going on).
What are you reading? You won't understand it anyway ;)
http://www.tmeeco.eu
Files of all broken links and images of mine are found here : http://www.tmeeco.eu/FileDen
Post by furrykef »

Yes, but my question is when they're necessary... how do you know when you should put them? As I said, I haven't found any information on it in the documentation.
Post by TmEE co.(TM) »

You have to put them after every TH line modification.
Shiru
Very interested
Posts: 786
Joined: Sat Apr 07, 2007 3:11 am
Location: Russia, Moscow
Contact:

Post by Shiru »

I'd say the question is 'how long must this delay be?' Different C compilers with different settings produce different code, so maybe those NOPs are actually unneeded (executing the code between accesses to the port almost surely creates enough delay).
Post by furrykef »

TmEE co.(TM) wrote:You have to put them after every TH line modification.
So it's something specific to that particular register, then? Any other places where I might need to use NOP where it isn't obvious from the documentation?
Shiru wrote:I'd say the question is 'how long must this delay be?' Different C compilers with different settings produce different code, so maybe those NOPs are actually unneeded (executing the code between accesses to the port almost surely creates enough delay).
Well, I have the same routine written in ASM and it also uses two NOPs. I don't know which version was written first. But I'm willing to bet that the C version gets compiled into essentially the same code. Compiler bloat doesn't always affect every little line of code.

It's still a good question, though: how do you know how much to delay?

- Kef
tomaitheous
Very interested
Posts: 256
Joined: Tue Sep 11, 2007 9:10 pm

Post by tomaitheous »

furrykef wrote:
It's still a good question, though: how do you know how much to delay?
Unless you already know about it, you don't know ;). When coding for a console, it's best to grab as many documents as you can find. People forget to mention things, or sometimes it's assumed you know. Other times the doc authors aren't informed themselves. You need to weed through the docs and note where they differ. You don't do this for everything, but for a specific area or interface.

Having coded on other systems, the first thing I'd ask about reading from a controller port is "does it need a delay?", especially for a multiplexed controller port.

I think after a while you start to get a feel for which communications might be timing-sensitive, and you ask if it's not mentioned in the docs. A general rule is that anything interfacing with the processor has some sort of timing guidelines for some specific area or stage of the device: the VDP, YM2612, Z80, I/O ports, etc.
Post by furrykef »

I found one doc that says you need 16 cycles. Two NOPs is eight cycles. I'm not sure whether or not the doc means you need 16 cycles before the next move instruction that accesses the register, or 16 cycles including the next move instruction. If it includes it, then it should work out to at least 16 cycles total.
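
The two readings of that 16-cycle figure can be put into numbers with a small sketch. It assumes a NOP costs 4 cycles with no wait states, and that the instruction doing the read contributes 8 cycles under the "including" reading; nops_needed and its parameters are hypothetical names, not taken from any doc.

```c
/* Cycles one NOP takes on a 68000 with no wait states (assumed). */
#define NOP_CYCLES 4

/*
 * Smallest number of NOPs that brings the delay up to `required` cycles,
 * given that `covered` cycles are already spent elsewhere, for example by
 * the instruction that performs the read. Hypothetical helper.
 */
int nops_needed(int required, int covered)
{
    int missing = required - covered;

    if (missing <= 0)
        return 0;
    return (missing + NOP_CYCLES - 1) / NOP_CYCLES; /* round up */
}
```

If the read instruction does not count, nops_needed(16, 0) asks for four NOPs; if its 8 cycles do count, nops_needed(16, 8) gives the two NOPs that genesis.c uses.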

- Kef
HardWareMan
Very interested
Posts: 753
Joined: Sat Dec 15, 2007 7:49 am

Post by HardWareMan »

furrykef wrote:I found one doc that says you need 16 cycles. Two NOPs is eight cycles. I'm not sure whether or not the doc means you need 16 cycles before the next move instruction that accesses the register, or 16 cycles including the next move instruction. If it includes it, then it should work out to at least 16 cycles total.

- Kef
Don't forget: you need 16 cycles between changes on the I/O line, not between M68K instructions. I mean, MOVE.B #$00,$A10003 only changes the I/O lines after the whole instruction has been fetched (1 word opcode + 1 word constant + 2 words address = 4 words).
Post by furrykef »

I have to admit it took me a second to figure out what you meant. So basically you're saying the fetch/decode part of the CPU's fetch/decode/execute sequence for the move instruction should cover it, right? Got it. :)

Or actually... could NOPs actually take 8 cycles rather than 4, hence two NOPs = 16 cycles? NOP takes four cycles to execute, but it should take another four cycles to fetch the NOP instruction in the first place, shouldn't it?

- Kef
Chilly Willy
Very interested
Posts: 2993
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy »

No, the instruction fetch is part of the cycle timing. The only way the fetch would make it longer is if the hardware inserted wait states.
Post by HardWareMan »

I think every word read or write (the system bus is 16 bits wide) takes 4 clocks (or 8 states) when there are no additional wait states. The first word is the opcode; its fetch is combined with execution.
So I think the execution flow for "MOVE.B #$12,$A10003" will be:
Fetch word 1 - opcode MOVE.B
Fetch word 2 - constant #$12
Fetch word 3 - high address word $00A1
Fetch word 4 - low address word $0003
Write (access 5) - write constant #$12 to address $A10003
Somewhere around that write, the I/O chip latches the constant and sets the I/O port lines.
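
The access list above can be turned into cycle numbers with a short sketch. It assumes 4 cycles per bus access and no wait states, and it treats the accesses as strictly sequential, which ignores the 68000's prefetch details; the function names are hypothetical.

```c
#include <stddef.h>

#define CYCLES_PER_ACCESS 4 /* one 16-bit bus access, no wait states (assumed) */

/* Cycle at which bus access number `index` (0-based) starts. */
int access_start_cycle(size_t index)
{
    return (int)(index * CYCLES_PER_ACCESS);
}

/* Total cycles for an instruction made of `accesses` bus accesses. */
int instruction_cycles(size_t accesses)
{
    return (int)(accesses * CYCLES_PER_ACCESS);
}
```

For the MOVE to $A10003, the write is the fifth access, so access_start_cycle(4) places it 16 cycles into the instruction and instruction_cycles(5) gives 20 cycles in total.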
Mask of Destiny
Very interested
Posts: 628
Joined: Thu Nov 30, 2006 6:30 am

Post by Mask of Destiny »

What HardWareMan says is confirmed by the MC68000 User's Manual. There are a number of tables in section 8 that give the total execution time and the number of read and write cycles. Doing a move.b or move.w with an immediate source and a 32-bit constant address takes 20 cycles total, with 4 reads and 1 write.

Depending on how you do the read, you might not even need any nops. For SLO I use the following code:

Code:

	move.b	#$FF, $a10003	;set TH for controller A
	move.b	$a10003, d7	;CBRLUD
	andi.b	#$3F, d7
	move.b	#0, $a10003
	move.b	$a10003, d6	;SA00UD
	andi.b	#$30, d6
	lsl.b	#2, d6
	or.b	d6, d7		;SACBRLUD

move.b $a10003, d7 should take 16 cycles, so depending on exactly when things latch, that's somewhere between 12 and 16 cycles' worth of delay. For what it's worth, no one has reported any problems with the controller support in SLO, and I've tested it with a number of 3- and 6-button controllers (all first-party ones though; third-party pads could be a problem, I suppose).

If you were to do something like this though:

Code:

	lea	$a10003, a0
	move.b	#$FF, (a0)		;set TH for controller A
	move.b	(a0), d7		;CBRLUD
	andi.b	#$3F, d7
	move.b	#0, (a0)
	move.b	(a0), d6		;SA00UD
	andi.b	#$30, d6
	lsl.b	#2, d6
	or.b	d6, d7		;SACBRLUD

You would likely need to add in a nop or two, as move.b (a0), d7 should only take 8 cycles. Presumably a decent C compiler would produce something more like the second example than the first, but you should check the output if you want to be sure.
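
To compare the two variants, here is a rough sketch of the delay between the TH write and the read that samples the port. It assumes 4 cycles per bus access and per NOP, no wait states, and that the port is sampled somewhere during the read's final bus access; struct gap and th_to_read_gap are hypothetical names.

```c
#define BUS_ACCESS_CYCLES 4 /* per bus access, no wait states (assumed) */
#define NOP_CYCLES        4 /* per NOP (assumed) */

/* Bounds on the delay, depending on when the port is actually sampled. */
struct gap {
    int min_cycles;
    int max_cycles;
};

/*
 * `fetch_words` is the number of instruction words the read has to fetch
 * before its data access: 3 for move.b $a10003,d7 and 1 for
 * move.b (a0),d7. `nops` is the number of NOPs placed before the read.
 */
struct gap th_to_read_gap(int fetch_words, int nops)
{
    struct gap g;

    g.min_cycles = nops * NOP_CYCLES + fetch_words * BUS_ACCESS_CYCLES;
    g.max_cycles = g.min_cycles + BUS_ACCESS_CYCLES;
    return g;
}
```

Under these assumptions the absolute-address read gives 12 to 16 cycles with no NOPs, the (a0) read gives only 4 to 8, and the (a0) read preceded by two NOPs is back at 12 to 16.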
furrykef
Interested
Posts: 30
Joined: Mon Jul 21, 2008 7:28 pm

Post by furrykef »

Mask of Destiny wrote:but you should check the output if you want to be sure.
I don't like the idea of relying on examining the compiler output for a decision like that. That gratuitously ties your code down to that particular version of that particular compiler... if you switch to a compiler that produces different code, you may need the NOPs again -- and completely fail to detect the problem, especially if your emulator doesn't require you to wait the 16 cycles. If you put 'em there, the code always works. Considering that I really don't think you need to be worrying about wasting 8 (or 16 for two controllers) cycles every frame, I doubt it's worth the trouble to omit them even if you technically can. If you care that much, write the whole routine in ASM and there'll be no doubt about its performance. ;) But then you should probably write the whole game in ASM in that case.
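
A minimal sketch of that "always put them there" approach, following the logic of read_joypad1 above. PAD_DELAY and read_pad are hypothetical names, and the sketch assumes the compiler predefines __m68k__ on 68000 targets (GCC does); on any other target the delay compiles to nothing so the routine can be exercised off-target.

```c
#include <stdint.h>

/* Hypothetical delay macro: two NOPs on m68k, nothing elsewhere. */
#if defined(__m68k__)
#define PAD_DELAY() __asm__ volatile ("nop\n\tnop")
#else
#define PAD_DELAY() ((void)0)
#endif

/*
 * Same logic as read_joypad1, but the port address is a parameter.
 * On a Mega Drive, pass (volatile uint8_t *)0xa10003.
 */
uint16_t read_pad(volatile uint8_t *port)
{
    uint16_t hi, lo;

    *port = 0x40;               /* TH high */
    PAD_DELAY();                /* wait regardless of what the compiler emits */
    hi = *port & 0x3f;

    *port = 0x00;               /* TH low */
    PAD_DELAY();
    lo = (uint16_t)((*port & 0x30) << 2);

    return (uint16_t)~(hi | lo);
}
```

Because the delay is spelled out in the source, the timing between the write and the read no longer depends on which addressing mode a particular compiler happens to pick.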

- Kef
Post by TmEE co.(TM) »

For any more serious MD dev, you need a flashcart or something else to run your code on the real deal... there's a lot of stuff you can do in emulators and not on real HW. Try using BTST on VDP ;)