Mail Archives: djgpp/1997/02/10/02:27:12
In article <N DOT 020797 DOT 081335 DOT 66 AT hrv1-4 DOT worldaccess DOT nl>,
frabb AT worldaccess DOT nl writes
>It is also clear that the 'shorts' of float and long double line up nicely,
>you only have to do some truncation or inserting zero shorts to do the
>conversion. The double however has an offset of shorts + 1 bit. This will
>always make bitshifting necessary when converting. That is the reason why
>programs using double run slightly slower than programs using float or long
>double.
1: Doubles are sometimes slower than floats for one main reason: they
are twice as big, and moving twice as many bytes usually takes longer!
On a 387 or 486, moving a 64-bit value across a 32-bit bus takes extra
clocks. On a P5 there *may* be delays caused by cache fills.
2: With one exception, there is *NO* cost to 'converting' between
floating-point formats when loading into or storing from the fpu. None.
Zero clocks. Is there any other way I can say it? All ops end up as long
double during calculations, so only loads and stores differ at all.
It really does come down to how many bytes get moved.
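
To make the load/store point concrete, here is a minimal C sketch. The
function names are hypothetical, and FLT_EVAL_METHOD is a C99 macro that
a 1997 compiler would not define, so treat this as an illustration
rather than period code. It shows that precision changes only when a
value is stored in a narrower format, and that reloading it is exact.

```c
#include <float.h>

/* Round-trip a value through a 4-byte float in memory.
   volatile forces a real store and a real reload. */
double through_float(double x)
{
    volatile float stored = (float)x;  /* rounding to float happens here */
    return stored;                     /* widening back is exact */
}

/* How the compiler evaluates floating-point expressions (C99):
   2 = everything in long double, 0 = each type in its own format,
   -1 = not determinable. */
int eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;  /* assumption: report "unknown" on pre-C99 compilers */
#endif
}
```

A value such as 0.5 survives the round trip unchanged, while 0.1 does
not, because 0.1 needs more mantissa bits than a float holds. On a
compiler targeting the x87, eval_method() would typically report 2.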
Loading and storing long doubles is particularly expensive because it
needs 3x32-bit accesses on a 486 or 2x64-bit ones on a P5. It's slower
even though *no* bit format conversion occurs.
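
The access counts above are just a ceiling division of operand size by
bus width. A rough sketch, assuming aligned operands and ignoring caches
and real instruction timings (the names are made up for illustration):

```c
#include <stddef.h>

/* Significant bytes of each x87 memory format. This is not sizeof:
   compilers often pad long double to 12 or 16 bytes. */
enum {
    X87_FLOAT_BYTES    = 4,
    X87_DOUBLE_BYTES   = 8,
    X87_EXTENDED_BYTES = 10
};

/* Bus transfers needed for one aligned operand:
   ceiling of bytes / bus_bytes. bus_bytes must be nonzero. */
size_t bus_transfers(size_t bytes, size_t bus_bytes)
{
    return (bytes + bus_bytes - 1) / bus_bytes;
}
```

bus_transfers(X87_EXTENDED_BYTES, 4) gives 3 and
bus_transfers(X87_EXTENDED_BYTES, 8) gives 2, matching the 3x32 and 2x64
figures. A double needs 2 transfers on a 32-bit bus but only 1 on a
64-bit bus.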
The exception: pass a float to a routine expecting a double and gcc will
have to load it through the fpu to do the conversion. With matching
types gcc simply pushes the raw binary value. This is not as big a
problem as it seems because a: inlined routines won't do this, and b:
the values are likely to be in the fpu anyway.
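
A small sketch of the two call patterns, with hypothetical function
names. What the compiler actually emits depends on its version, flags
and target, so the comments describe the x87 case from the paragraph
above, not a guarantee.

```c
/* Hypothetical routine that takes a double. */
double scale(double x)
{
    return x * 2.0;
}

/* Mismatched types: f must be widened to double before the call.
   On the x87 that means loading it as a float and storing it back
   as a double. */
double call_with_float(float f)
{
    return scale(f);
}

/* Matching types: the 8-byte value can be passed as-is. */
double call_with_double(double d)
{
    return scale(d);
}
```

Both wrappers return the same result for the same input value. Only the
argument passing differs, and it disappears if scale() is inlined.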
If I tell you I just spent the last 3 months optimising P5 fpu code (for
a 3D geometry pipeline) will you start believing me?
---
Paul Shirley: shuffle chocolat before foobar for my real email address