## Faster x mod 384 (x%384)?

**Moderator:** BigEvilCorporation

### Faster x mod 384 (x%384)?

Have you found a way to do "mod 384" faster than doing a division?

Since MD horizontal resolution is 320, not a n^2 number, the closest bigger computer friendly number is 384 (256+128). 512 is too big. As you may know having this kind of number helps a lot in computation, in this case when trying to subdivide playfield into close to screen areas.

Perhaps you have been in this situation before and already have found a solution. Kind to hear your thoughts.

Since MD horizontal resolution is 320, not a n^2 number, the closest bigger computer friendly number is 384 (256+128). 512 is too big. As you may know having this kind of number helps a lot in computation, in this case when trying to subdivide playfield into close to screen areas.

Perhaps you have been in this situation before and already have found a solution. Kind to hear your thoughts.

Not that simple, you have to go much forward to understand it. We simplify too much things we don't believe in.

Where we come from? Perhaps you should begin from that.

In fact there are billions on lassitude, perhaps it means something... since when?

Where we come from? Perhaps you should begin from that.

In fact there are billions on lassitude, perhaps it means something... since when?

### Re: Faster x mod 384 (x%384)?

If you're doing a lot of that, enough that the 140-cycle divu is too slow, you can precompute a table that takes just 2kb space (assuming 16-bit coords).

Code: Select all

```
const u16 table[1024] = { 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, ... };
u16 screen = table[x >> 6];
```

### Re: Faster x mod 384 (x%384)?

Implying that x % 320 = x - floor(x / 320) * 320, all we need is to find floor(x / 320), which is "whole" divisions.

We need cool property.

I'll describe it using decimals system.

Assume you want to divide decimal by 500.

In other words, you want to find floor(x / 500).

Imagine you know already result, and this result is y.

then y*5*100 <= x

in other words, floor of (x/500) is how many y*5 fits into number floor(x / 100) which is just cut off two digits in the end

Now, same in binary: 320 = 32*10 = 32*2*5 = 64*5.

this means, that floor(x/320)=floor(floor(x/64)/5)

where floor(x/64) is simple bitwise shift right by 6. 64 = 2^6.

assume z = floor(x/64), then it is 10 bits long if x is 16 bits long.

now you need only make table to find floor(z/5), for all numbers 10 bits long.

but then you need to multiply result by 320 again.

I suggest using another table for this purpose.

other possible way is some divisibility property.

If latest 6 bits is discardable, all you need is to find remainder of divison by 5.

And, there is only 5 possible remainders: 0, 1, 2, 3, 4, 5.

Somehow (I guess) you may find it as for example in divisibility of 3 in decimals: all digits must have sum divisible by 3.

But, it can be slower.

Edit: easier. If you imagine long division method from school,

you'll find that it will look as:
subtracting 1 0100 0000 equivalent to subtracting 5 (101) shifted by 6.

from opposite, you can't subtract less than 1 0100 0000, this means shift is greater or equal to 6.

This means that it may subtract during process of division at following positions:
and result is equivalent to long division by 5 of shifted x by 6 with goal to get remainder.

result I mean remainder here. Three bits that you'll get, is same. Just because you do literally same.

This means: x % 320 = (floor(x/64)%5) * 64 + ( x % 64 ) = ( ((x >> 6) % 5) << 6) | (x & 0x3F)

where x & 0x3F is bitwise and.

With this method you need to precompute ((x % 5) << 6) table for 10 bits long numbers.

Edit: replaced ceil with floor.

We need cool property.

I'll describe it using decimals system.

Assume you want to divide decimal by 500.

In other words, you want to find floor(x / 500).

Imagine you know already result, and this result is y.

then y*5*100 <= x

in other words, floor of (x/500) is how many y*5 fits into number floor(x / 100) which is just cut off two digits in the end

Now, same in binary: 320 = 32*10 = 32*2*5 = 64*5.

this means, that floor(x/320)=floor(floor(x/64)/5)

where floor(x/64) is simple bitwise shift right by 6. 64 = 2^6.

assume z = floor(x/64), then it is 10 bits long if x is 16 bits long.

now you need only make table to find floor(z/5), for all numbers 10 bits long.

but then you need to multiply result by 320 again.

I suggest using another table for this purpose.

other possible way is some divisibility property.

If latest 6 bits is discardable, all you need is to find remainder of divison by 5.

And, there is only 5 possible remainders: 0, 1, 2, 3, 4, 5.

Somehow (I guess) you may find it as for example in divisibility of 3 in decimals: all digits must have sum divisible by 3.

But, it can be slower.

Edit: easier. If you imagine long division method from school,

you'll find that it will look as:

Code: Select all

```
xxxx xxxx xxxx xxxx |1 0100 0000
-1010 0000 0
.....
-1010 0000 0
....
```

from opposite, you can't subtract less than 1 0100 0000, this means shift is greater or equal to 6.

This means that it may subtract during process of division at following positions:

Code: Select all

```
xxxx xxxx xxxx xxxx |1 0100 0000
-101
-101
-10 1
...
-1 01
```

result I mean remainder here. Three bits that you'll get, is same. Just because you do literally same.

This means: x % 320 = (floor(x/64)%5) * 64 + ( x % 64 ) = ( ((x >> 6) % 5) << 6) | (x & 0x3F)

where x & 0x3F is bitwise and.

With this method you need to precompute ((x % 5) << 6) table for 10 bits long numbers.

Edit: replaced ceil with floor.

Last edited by r57shell on Tue Jul 25, 2017 8:33 am, edited 1 time in total.

### Re: Faster x mod 384 (x%384)?

thanks! r57shell, I will take an slow and carefull look at it.

I prefer 384 over 320 because:

- This number still let you do a small scroll (looks better)

- Playfield, or map in RAM, is 4096x2048 pixels max in size, so using 384 fits better. (In true, playfield can take several sizes but all using numbers of 2^n serie, still 384 fits better).

So we have: 384 = 3*128. Mod of 128 is easy: x AND 127. For "x mod 3": I have found an easy solution:

( From http://homepage.cs.uiowa.edu/~jones/bcd/mod.shtml )

But unfortunately I can't put all together to work because I can't see any correlation. I have to take a better look at it.

I prefer 384 over 320 because:

- This number still let you do a small scroll (looks better)

- Playfield, or map in RAM, is 4096x2048 pixels max in size, so using 384 fits better. (In true, playfield can take several sizes but all using numbers of 2^n serie, still 384 fits better).

So we have: 384 = 3*128. Mod of 128 is easy: x AND 127. For "x mod 3": I have found an easy solution:

( From http://homepage.cs.uiowa.edu/~jones/bcd/mod.shtml )

Code: Select all

```
static inline u32 mod3( u32 a )
{
a = (a >> 16) + (a & 0xFFFF); /* sum base 2**16 digits a <= 0x1FFFE */
a = (a >> 8) + (a & 0xFF); /* sum base 2**8 digits a <= 0x2FD */
a = (a >> 4) + (a & 0xF); /* sum base 2**4 digits a <= 0x3C; worst case 0x3B */
a = (a >> 2) + (a & 0x3); /* sum base 2**2 digits a <= 0x1D; worst case 0x1B */
a = (a >> 2) + (a & 0x3); /* sum base 2**2 digits a <= 0x9; worst case 0x7 */
a = (a >> 2) + (a & 0x3); /* sum base 2**2 digits a <= 0x4 */
if (a > 2) a = a - 3;
return a;
}
```

Not that simple, you have to go much forward to understand it. We simplify too much things we don't believe in.

Where we come from? Perhaps you should begin from that.

In fact there are billions on lassitude, perhaps it means something... since when?

Where we come from? Perhaps you should begin from that.

In fact there are billions on lassitude, perhaps it means something... since when?

### Re: Faster x mod 384 (x%384)?

r57shell, I think you mean floor(), not ceil().

### Re: Faster x mod 384 (x%384)?

Miquel, this C-written mod3 function has about 20 arithmetic operations. I'm afraid, in terms of M68k they will consume more cycles in total than one DIVU

Fast version will require pre-computed array.

Straightforward method is (don't be confused with different meaning of 'x'):

x mod 320 = x - x div 320 * 320.

1. x div 320 = table[x >> 6] - something like 20 cycles

2. x * 320 = (x << 6) * 5

x*5 = x << 2 + x

in asm (something like 24 cycles):

Fast version will require pre-computed array.

Straightforward method is (don't be confused with different meaning of 'x'):

x mod 320 = x - x div 320 * 320.

1. x div 320 = table[x >> 6] - something like 20 cycles

2. x * 320 = (x << 6) * 5

x*5 = x << 2 + x

in asm (something like 24 cycles):

Code: Select all

```
lsl.w #6, x
move.w x, x4
lsl.w #2, x4
add.w x4, x
```

Last edited by GManiac on Mon Jul 24, 2017 9:06 pm, edited 1 time in total.

### Re: Faster x mod 384 (x%384)?

All code below is in Python

A key formula is the "general rule" in the article you linked earlier

Example:
To find residue of dividing by 320 you can find residues of 5 and 64 and then convert them to usual positional notation (it's not trivial)

https://en.wikipedia.org/wiki/Residue_number_system

Conversion example:

http://neo.dmcs.p.lodz.pl/csII/ca2_ResidueNS.pdf

Fast example In Russian:

http://wiki.tgl.net.ru/index.php/%D0%A1 ... 0%BE%D0%B2

Code demo for your task
Anyway, this method takes at least 70 cycles.

A key formula is the "general rule" in the article you linked earlier

Example:

Code: Select all

```
a = 123
m = 5
b = 16
# equal results
print( a % m )
print( ( (b % m) * (a // b) + (a % b) ) % m )
```

https://en.wikipedia.org/wiki/Residue_number_system

Conversion example:

http://neo.dmcs.p.lodz.pl/csII/ca2_ResidueNS.pdf

Fast example In Russian:

http://wiki.tgl.net.ru/index.php/%D0%A1 ... 0%BE%D0%B2

Code demo for your task

Code: Select all

```
# pre-compute table
table = [0]*320
for i in range(320):
table[64 * (i % 5 ) + (i % 64)] = i
a = 65533
mod64 = a & 0x3F;
#a mod 5
a = (a >> 8) + (a & 0xFF) # 256 % 5 = 1, a will become <= 510
a = (a >> 4) + (a & 0x0F) # 16 % 5 = 1, a will become <= 45
if (a > 14): a = a - 15;
if (a > 14): a = a - 15;
if (a > 4): a = a - 5;
if (a > 4): a = a - 5;
mod5 = a
mod320 = table[64*mod5 + mod64]
print(mod320)
```

### Re: Faster x mod 384 (x%384)?

Yes, I mean floor instead ceil. Updated my post.

And, for full desc, here is C code:

Code: Select all

```
u16 rem320(u16 a)
{
return (table[a>>6]<<6) | (a & 0x3F);
}
```

you may run it here: http://cpp.sh/

Second version using *2 instead of <<6, which is faster if your compiler do:

Code: Select all

```
add d0, d0
instead of
lsr #1, d0
```

plus you'll have issues with addressing 2 bytes instead of 1 bytes, because you'll need to discard least significant bit somehow.

you can do all of this with other modulo. results depends on modulo.

### Re: Faster x mod 384 (x%384)?

Great formula.

a = 56959

d1 = 21

d2 = 17

# d1 and d2 are random divisors

a % (d1*d2) == a // d1 % d2 * d1 + a % d1

a % 320 == a // 64 % 5 * 64 + a % 64

a % 320 == (((a >> 6) % 5) << 6) + (a & 0x3F)

r57shell, there's no need in "|" in next line, just use "+".

a = 56959

d1 = 21

d2 = 17

# d1 and d2 are random divisors

a % (d1*d2) == a // d1 % d2 * d1 + a % d1

a % 320 == a // 64 % 5 * 64 + a % 64

a % 320 == (((a >> 6) % 5) << 6) + (a & 0x3F)

r57shell, there's no need in "|" in next line, just use "+".

Code: Select all

`| (a & 0x3F);`

### Re: Faster x mod 384 (x%384)?

To simplify the explanation I imply the operation is with (16bit) integers. But instead is with 16:16 fixed points. So really the operation is something like:

x mod 384<<16 where x can range from 0 to 4096<<16. (Meaning it uses signed 32 bit integers)

I can try to use a lousy trick to get rid off decimals, but I shouldn't...

Not that simple, you have to go much forward to understand it. We simplify too much things we don't believe in.

Where we come from? Perhaps you should begin from that.

In fact there are billions on lassitude, perhaps it means something... since when?

Where we come from? Perhaps you should begin from that.

In fact there are billions on lassitude, perhaps it means something... since when?

### Re: Faster x mod 384 (x%384)?

As I understand, integer part of your 16.16 number is stored in upper word of longword, and fractional part - in lower word.

It's not a problem, just use this formula again

a % (d1*d2) == a // d1 % d2 * d1 + a % d1

384<<16 = 24M (24 mega)

24M = 3 * 128 * 65536

a % 24M == a // 8M % 3 * 8M + a % 8M

a % 24M == (((a >> 23) % 3) << 23) | (a & 0x7FFFFF)

Or, to prevent calculations on longs you can drop fractional part and then restore it.

Code: Select all

```
# save fractional
move.w d0, d1
clr.w d0
swap d0
mod #384, d0
# restore fractional
swap d0
move.w d1, d0
```

Code: Select all

```
move.l $F004, d0 -- read the whole coordinate
move.w $F004, d0 -- read the integer part
```

### Who is online

Users browsing this forum: No registered users and 1 guest