First, a test ROM and source. Essentially, this ROM starts a request with a 1->0 transition on TH and then records values read from the I/O port into a buffer using an unrolled loop of move.b instructions. This buffer is then analyzed to extract the data payload and to get a rough estimate of the time between handshake flag transitions. This process is repeated once every 60 frames. A fairly typical run is shown in this screen capture:

Anyway, I can confirm what is already documented in the Genesis Plus GX source: a 1->0 transition on TH is indeed used to initiate a request, TR=0 indicates valid data, and TL toggles with each nibble (0 for the first, 1 for the second, and so on). The data order is as follows (MSB->LSB; all buttons are active low):
0: E1, E2, Start, Select
1: (A & A') (B & B') C D
2: Analog L/R High
3: Analog U/D High
4: Always 0
5: Analog Throttle High
6: Analog L/R Low
7: Analog U/D Low
8: Always 0
9: Analog Throttle Low
10: Always F
11: A B A' B'
This almost exactly matches what's in the GPGX source, with one exception: there's an extra unused nibble (nibble 10, always F) between throttle low and the final A B A' B' nibble. Since this last nibble is presumably only used by games that need to distinguish the "prime" buttons from the normal A and B buttons, this probably does not affect most software.
Now for an interpretation of the delay information. In the screenshot above, there are 3 hex numbers followed by a dash, then another 3, another dash, and a final value. The last number is the byte actually read. The first group of 3 describes the delays for the first nibble of that byte and the second group describes the same delays for the second nibble. Within a group, the first number is the number of bytes read with the not-ready bit set (TR=1) before the nibble became valid, the second is the number of bytes read before TL toggled, and the third is the number of bytes read before TR went back to indicating "not ready". The second and third numbers both start counting from the first byte that indicates a valid nibble (TR=0), so those two delays overlap. As you can see in the screenshot, TR always goes back to 1 before TL toggles for the low nibble (TL=1). What's not really visible is that the reverse is true for the high nibble (TL=0); however, the gap is much smaller in that case, which is why the two transitions appear to happen at the same time. In some instances, I get a delay of only 6 reads for the TL toggle. As a result, the most reliable way to poll this device is probably to ignore TL entirely and use TR exclusively.
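Since bytes are read from the I/O port with move.b (a2), (a6)+, each byte of delay represents about 12 M68K cycles, which is 84 master clock cycles (the 68000 runs at the master clock divided by 7). This gives us the following delay times, with cycle counts in master clock cycles (the microsecond figures correspond to the NTSC master clock of roughly 53.69 MHz):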
First nibble ready: 3780 cycles, 70 microseconds
TR=0 to TL=1, TR=0 to TR=1: 588 cycles, 11 microseconds
TR=1 to TR=0 (transition from high nibble to low nibble): 252 cycles, 5 microseconds
TR=1 to TR=0 (transition from low nibble to high nibble): 1176 cycles, 22 microseconds
Note that the first delay is sometimes quite a bit longer (as long as 10500 cycles, 196 microseconds), but the rest of the delays are pretty consistent. When I have the chance, I will try to measure these delays with something a bit faster than a Gen/MD so I can get more precise data.