BCD computations ?
Moderator: Stef
BCD computations ?
Since the 68000 has instructions for computing in BCD (Binary Coded Decimal), I was wondering if it was possible to make calculations in BCD with SGDK (for numbers aimed to be displayed in decimal form, it's faster than having to convert them to decimal each time I need to display them).
If it's not implemented, is there a way to do it anyway ?
If it's not implemented, is there a way to do it anyway ?
-
- Very interested
- Posts: 3131
- Joined: Thu Nov 30, 2006 9:46 pm
- Location: France - Sevres
- Contact:
Re: BCD computations ?
The 68000 only has 3 BCD instructions :
ABCD, SBCD and NBCD.
I can add specific functions for it (AddBCD, SubBCD and NegBCD) but it would almost kill the purpose of using it (speed), i think the best solution is still to directly use inline assembly to get direct access to them.
ABCD, SBCD and NBCD.
I can add specific functions for it (AddBCD, SubBCD and NegBCD) but it would almost kill the purpose of using it (speed), i think the best solution is still to directly use inline assembly to get direct access to them.
Re: BCD computations ?
Ok and thanks
If I remember correctly, you told me that using inline assembler with GCC is a real pain in the neck. Is there an exemple among SGDK sources ?
If I remember correctly, you told me that using inline assembler with GCC is a real pain in the neck. Is there an exemple among SGDK sources ?
-
- Very interested
- Posts: 3131
- Joined: Thu Nov 30, 2006 9:46 pm
- Location: France - Sevres
- Contact:
Re: BCD computations ?
joy.c contains some inline assembly but basically just for nop operation, I don't think I used it elsewhere... Unfortunately it's true that using parameter or variable through online assembly is definitely not straightforward, and I'm not sure SGDK has any good examples for that :-/
Re: BCD computations ?
From this thread.
Here it is:
Somewhere online is good docs about inline gcc assembly.
Here it is:
Code: Select all
// abcd d0, d1
static u8 abcd(u8 d0, u8 d1, u8 flags_in, u8* flags_out)
{
u8 res;
asm volatile ("move %4,%%cc\n"
"abcd %3,%0\n"
"move %%sr,%1\n"
: "=d" (res), "=d" (*flags_out)
: "0" (d1), "d" (d0), "d" (flags_in)
: "cc"
);
return res;
}
-
- Very interested
- Posts: 3131
- Joined: Thu Nov 30, 2006 9:46 pm
- Location: France - Sevres
- Contact:
Re: BCD computations ?
Thanks r57shell, that is !
Definitely we cannot say it's really straightforward to do inline assembly when you see that awful syntax :p
Definitely we cannot say it's really straightforward to do inline assembly when you see that awful syntax :p
-
- Very interested
- Posts: 484
- Joined: Sat Mar 05, 2011 11:11 pm
- Location: Berlin, Germany
Re: BCD computations ?
If you want performance switch to Assembly.Stef wrote:Thanks r57shell, that is !
Definitely we cannot say it's really straightforward to do inline assembly when you see that awful syntax :p
UMDK Fanboy
Re: BCD computations ?
The problem is that the alternative in this case is to do lots of divisions and that's awful (each division is 140 cycles or about 1/3 of a scanline, for context). And that's assuming 16-bit values, don't get me started on the mess that would be 32-bit (e.g. for score, which tends to be able to become large enough). This is probably one of those cases where specific optimization is a reasonable idea, C has no easy way to express BCD operations and GCC won't pick it up on complex code.
Also that function definitely needs to be marked inline if you want to avoid problems (in particular, it tells GCC that the function doesn't have to exist in ROM at all and hence not issue it to the linker). Also 16-bit and 32-bit variants (using BCD + rotate) would be nice, as well as functions for substraction.
For the record, not sure if volatile is needed here. The idea behind volatile is to prevent GCC from trying to reorder or optimize the code (which is something that can break when accessing hardware ports that need to be done in a particular order or even a specific timing), but here that shouldn't be a problem, right?
Wait until you see the options x86 has to take. It's really more of a problem memorizing this than actually using it, though.Stef wrote:Thanks r57shell, that is !
Definitely we cannot say it's really straightforward to do inline assembly when you see that awful syntax :p
Also that function definitely needs to be marked inline if you want to avoid problems (in particular, it tells GCC that the function doesn't have to exist in ROM at all and hence not issue it to the linker). Also 16-bit and 32-bit variants (using BCD + rotate) would be nice, as well as functions for substraction.
For the record, not sure if volatile is needed here. The idea behind volatile is to prevent GCC from trying to reorder or optimize the code (which is something that can break when accessing hardware ports that need to be done in a particular order or even a specific timing), but here that shouldn't be a problem, right?
Sik is pronounced as "seek", not as "sick".
Re: BCD computations ?
Have you tried the number-to-string function from Facebook (IIRC)?
It used a lot of ifs instead of division, and it was faster on x86_64.
It used a lot of ifs instead of division, and it was faster on x86_64.
Re: BCD computations ?
@ r57shell : thanks. Indeed it's not straightforward, and I'll need to RTFM sooner or later...
@ MintyTheCat : I believe we already had this discussion elsewhere. I want to design my program in a high-level language (C seems to be the only choice, not particularly high-level) and I'll switch to asm once the design is finished, if performances are needed. This particular point is just a situation where C forces me to use an algorithmically bad decision.
@ sik : I'll make it a macro (since SGDK's gcc doesn't handle inline functions) and extend it to 32 bits.
@ cero : no, I don't see what you're referring to (there's a user named Facebook?)
For the moment, I stay with decimal conversion, it's not performance critical, most of optimizations must be done elsewhere, but I definitely keep your answers somewhere.
@ MintyTheCat : I believe we already had this discussion elsewhere. I want to design my program in a high-level language (C seems to be the only choice, not particularly high-level) and I'll switch to asm once the design is finished, if performances are needed. This particular point is just a situation where C forces me to use an algorithmically bad decision.
@ sik : I'll make it a macro (since SGDK's gcc doesn't handle inline functions) and extend it to 32 bits.
@ cero : no, I don't see what you're referring to (there's a user named Facebook?)
For the moment, I stay with decimal conversion, it's not performance critical, most of optimizations must be done elsewhere, but I definitely keep your answers somewhere.
Re: BCD computations ?
The site O_Otryphon wrote:(there's a user named Facebook?)
I doubt it'll be of much help though since I'm 100% sure it was optimized for modern architectures. I had considered once making a fast 32-bit integer to string subroutine in assembly, but again it requires the BCD instructions (and a good bunch of look-up tables). At least could be called as a normal function. Still, even without the tables (and hence a slower variant) it would probably be much faster than the alternative (lots of divisions).
Sik is pronounced as "seek", not as "sick".
Re: BCD computations ?
There's a user named "The site" ???Sik wrote:The site O_Otryphon wrote:(there's a user named Facebook?)
Joke aside, where precisely on Facebook? AFAIK, neither spritesmind nor SGDK have a Facebook account, and anyway, me neither (and I don't intend to have one).
Re: BCD computations ?
It was plain C, and m68k instruction tables say a branch is 10 cycles, so it'd be faster there as well. No need to go asm when there's algorithmic improvements available in a higher language.Sik wrote:I doubt it'll be of much help though since I'm 100% sure it was optimized for modern architectures. I had considered once making a fast 32-bit integer to string subroutine in assembly, but again it requires the BCD instructions (and a good bunch of look-up tables). At least could be called as a normal function. Still, even without the tables (and hence a slower variant) it would probably be much faster than the alternative (lots of divisions).
It wasn't on Facebook the site, but rather a blog of some Facebook engineer. I don't have a link, sorry.
-
- Very interested
- Posts: 484
- Joined: Sat Mar 05, 2011 11:11 pm
- Location: Berlin, Germany
Re: BCD computations ?
It depends on how you measure it: I write more efficient Assembly code than GCC compiled C but your mileage may differ.
UMDK Fanboy
Re: BCD computations ?
OK decided to look up the Facebook algorithm. It still uses division apparently, it's just that it's handling two digits at once instead of one and tries to bail out early if it notices the number is small (most numbers are). Whoops. (makes sense, the compiler turns the division into a multiplication by the reciprocal, and multiplication is relatively fast on modern hardware).
Anyway, for the sake of completeness: my idea was to check every bit in the integer. If bit 0 is set, add BCD 1. If bit 1 is set, add BCD 2. If bit 2 is set, add BCD 4. If bit 3 is set, add BCD 8... you get the idea. When you're done with all bits you'll have the same value as BCD, which should be trivial to convert into a string. Note that this is the naïve implementation. You also have to take into account the sign (if the integer is signed), but that shouldn't be much of a problem, just negate if the value is negative and add a minus sign to the string if that happened.
I mentioned the use of tables to speed that up. The idea is to take multiple bits at a time and use them as an index into a look-up table. This way instead of doing 31 checks you're doing just a handful of them, which should speed up things greatly. The downside is more ROM usage. Not sure how significant this memory waste is in practice.
Anyway, for the sake of completeness: my idea was to check every bit in the integer. If bit 0 is set, add BCD 1. If bit 1 is set, add BCD 2. If bit 2 is set, add BCD 4. If bit 3 is set, add BCD 8... you get the idea. When you're done with all bits you'll have the same value as BCD, which should be trivial to convert into a string. Note that this is the naïve implementation. You also have to take into account the sign (if the integer is signed), but that shouldn't be much of a problem, just negate if the value is negative and add a minus sign to the string if that happened.
I mentioned the use of tables to speed that up. The idea is to take multiple bits at a time and use them as an index into a look-up table. This way instead of doing 31 checks you're doing just a handful of them, which should speed up things greatly. The downside is more ROM usage. Not sure how significant this memory waste is in practice.
Sik is pronounced as "seek", not as "sick".