Potentially faster PRF aplmod definition
While looking at `prf.h`, I stumbled across the following definition of aplmod, which contains some oddities:
```c
#define SIGNUM(x) ((0 == (x)) ? 0 : (0 < (x)) ? 1 : -1)
#define FLRDIV(arg1, arg2) \
    ((0 == (arg2)) ? (arg1) : (arg1) - ((arg2) * ((arg1) / (arg2))))
#define SAC_ND_PRF_APLMOD(arg1, arg2) \
    (((0 != FLRDIV (arg1, arg2)) && ((SIGNUM (arg1) != SIGNUM (arg2)))) \
     ? FLRDIV (arg1, arg2) + (arg2) \
     : FLRDIV (arg1, arg2))
```
As far as I can tell, FLRDIV is just the remainder, except that it also handles a divisor of 0 (returning arg1). It should not be called FLRDIV, as that name implies floor division, which it is not. It could also be defined more concisely:

```c
#define FLRDIV(arg1, arg2) ((0 == (arg2)) ? (arg1) : (arg1) % (arg2))
```
Looking at the other definitions, they don't seem efficient, but admittedly I'm out of my depth here. It could very well be that they are further optimized somewhere or that my cost model is wrong. Just to get the discussion going, would any of these definitions be more efficient?
This version of SIGNUM avoids branching:

```c
#define SIGNUM(x) (((x) > 0) - ((x) < 0))
```
This version of APLMOD replaces the signum inequality with an XOR and a less-than comparison (the sign bit of `arg1 ^ arg2` is set exactly when the signs differ). It treats 0 as positive.
```c
#define SAC_ND_PRF_APLMOD(arg1, arg2) \
    (((0 != FLRDIV (arg1, arg2)) && (((arg1) ^ (arg2)) < 0)) \
     ? FLRDIV (arg1, arg2) + (arg2) \
     : FLRDIV (arg1, arg2))
```
This version of APLMOD does the same but also eliminates branching at the cost of an addition and multiplication.
```c
#define SAC_ND_PRF_APLMOD(arg1, arg2) \
    (FLRDIV (arg1, arg2) \
     + (arg2) * ((0 != FLRDIV ((arg1), (arg2))) && (((arg1) ^ (arg2)) < 0)))
```

(Note the outer parentheses around the whole expansion, which the macro needs to be safe inside larger expressions.)
@thomas You were working on optimizations like this, right? Could you give some insight into the cost of these functions? Would they be worth changing?