Mail Archives: djgpp/1997/02/27/11:54:57
To: jbennett AT ti DOT com
CC: jbennett AT ti DOT com, djgpp AT delorie DOT com
In-reply-to: <Pine DOT LNX DOT 3 DOT 91 DOT 970226105830 DOT 29585A-100000 AT lenny DOT dseg DOT ti DOT com>
    (jesse AT lenny DOT dseg DOT ti DOT com)
Subject: Re: Netlib code [was Re: flops...]

On Wed, 26 Feb 1997, Jesse W. Bennett wrote:
> I tried this on a Linux box with gcc 2.6.3 and 2.7.2 and the results were
> encouraging, but the pointer based code was still slightly faster.
Did you experiment with the various optimization-related switches
to gcc? There is a plethora of them, all described in the section
called "Optimize Options" of the gcc on-line docs. I suggest trying
those that seem relevant to your inner loops, looking at the
generated assembly and timing the results, until you find the best
combination.
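For the timing part, something along these lines could serve as a harness. It is only a sketch: the names daxpy_index, daxpy_ptr, time_kernel and kernel_fn are made up for illustration, and the kernel is assumed to be a daxpy-style y[i] += alpha * x[i] loop like the one in your listing. Compile the same file with different switch combinations (for example `gcc -O2 -S`, then `gcc -O2 -funroll-loops -fomit-frame-pointer -S`), read the .s output, and compare the times.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical index-based kernel: y[i] += alpha * x[i]. */
void daxpy_index(int n, double alpha, const double *x, double *y)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Hypothetical pointer-based variant of the same kernel. */
void daxpy_ptr(int n, double alpha, const double *x, double *y)
{
    const double *end = x + n;
    while (x < end)
        *y++ += alpha * *x++;
}

typedef void (*kernel_fn)(int, double, const double *, double *);

/* Run kernel k `reps` times over n elements and return the CPU
   seconds used, as measured by clock(). */
double time_kernel(kernel_fn k, int reps, int n, const double *x, double *y)
{
    clock_t start;
    int r;

    assert(k != NULL && x != NULL && y != NULL && reps > 0 && n >= 0);
    start = clock();
    for (r = 0; r < reps; r++)
        k(n, 1e-9, x, y);   /* tiny alpha keeps y from growing much */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call time_kernel(daxpy_index, reps, n, x, y) and time_kernel(daxpy_ptr, reps, n, x, y) on the same arrays, with n and reps large enough that the resolution of clock() does not dominate the measurement.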
> L13:
> movl (%edi),%edx
> movl (%esi),%eax
> fld %st(0)
> fmull (%eax,%ecx,8)
> faddl (%edx,%ecx,8)
> fstpl (%edx,%ecx,8)
> incl %ecx
> cmpl %ecx,12(%ebp)
> jg L13
>
> It is not clear to me why the edx and eax registers are being reloaded
> each iteration.
Maybe because GCC must allow for `a' or `b' being the same as `c' on
the caller's side? Try declaring `a' and `b' const and see if that helps.
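To make the aliasing question concrete, here is a hypothetical reconstruction of what such a loop might look like. It assumes (the original source was not posted) that the rows are reached through arrays of row pointers, which would fit the `movl (%edi),%edx' and `movl (%esi),%eax' loads at the top of the loop. The function names are made up. row_axpy_const follows the const suggestion above; whether it changes the generated code is something to check with -S. row_axpy_hoisted shows a different, commonly used workaround that was not part of the suggestion: copy the row pointers into local variables once, before the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reconstruction: c[k] and a[j] are pointers loaded from
   memory. If the compiler cannot rule out that the store to c[k][i]
   overwrites c[k] or a[j], it may reload both on every iteration. */
void row_axpy(double **c, int n, double **a, int k, int j, double s)
{
    int i;
    for (i = 0; i < n; i++)
        c[k][i] += s * a[j][i];
}

/* Same loop with the row-pointer arrays declared const, following the
   suggestion above. The row pointers are const; the doubles in c's
   rows stay writable. */
void row_axpy_const(double *const *c, int n, double *const *a,
                    int k, int j, double s)
{
    int i;
    for (i = 0; i < n; i++)
        c[k][i] += s * a[j][i];
}

/* Alternative workaround: hoist the row pointers into locals, so no
   store inside the loop can be assumed to change them. */
void row_axpy_hoisted(double **c, int n, double **a, int k, int j, double s)
{
    double *y;
    const double *x;
    int i;

    assert(c != NULL && a != NULL && n >= 0);
    y = c[k];
    x = a[j];
    for (i = 0; i < n; i++)
        y[i] += s * x[i];
}
```

All three compute the same result; only the code GCC is able to generate for the inner loop may differ, which is what the -S output will show.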