Mail Archives: djgpp/1998/12/10/05:01:53
On Wed, 9 Dec 1998, Leonid Pauzner wrote:
> > Somebody needs to establish how to fix it. Please consider debugging
> > this problem on your machine, since no one of the people who work on
> > DJGPP development have access to a machine with no FPU.
>
> Well, I have a very limited experience with debugging
It doesn't take much experience to crack these problems; it only
takes some motivation and a bit of mundane work. I even wrote a
section in the FAQ (section 12.2) that explains how to start a
debugging session using the crash traceback info.
Here's what I've done in this case, to find out the reason (see my
other mail for its description):
- Compiled and linked with -g a simple program that used floating
point.
- Set 387=n (since that was on a machine with an FPU) and set
emu387=c:/djgpp/bin/emu387.dxe.
- Ran the program and got the crash, then ran `symify' on it. The
traceback pointed to the first FP instruction in the program.
Since the crash says SIGNOFP, meaning the emulator is not present,
the immediate guess was that the emulator is somehow not
installed (as opposed to the case where it is installed but
doesn't work correctly: that case would probably yield a SIGFPE).
- Looked at library sources of the function that installs the
emulator (file npxsetup.c). There are several possible causes for
the failure to load the emulator, so sources alone were not enough
to find the reason.
- Ran the program under a debugger, set a breakpoint inside the
function `npxsetup' and stepped through its instructions (you need
either an assembly-level debugger, such as FSDB, or to use
assembly-level commands of GDB/RHIDE, like `stepi', `nexti', etc.;
I used FSDB). This clearly showed that the problem happens
because a call to `_dxe_load' returns a NULL pointer, meaning that
it failed to load the emulator.
- Ran the program under a debugger again, this time set a breakpoint
inside `_dxe_load', and stepped through it. This clearly showed
that the test of the magic signature "DXE1" in the emulator header
fails.
- Looked at the file emu387.dxe with Less (you can use any other
program that displays a binary file, e.g. `od' from Textutils).
This immediately made evident that the signature is wrong: it's
"1EXD" in the version supplied with v2.02, whereas emu387.dxe from
v2.01 has the correct signature "DXE1".
- Edited emu387.dxe with a binary editor (I used the `hexl' feature
of Emacs, but any other binary editor will do) and changed the
signature to the right one. Ran my test program again; it crashed
again.
- Ran the test program under the debugger yet again. This time,
`_dxe_load' passed the signature test, but failed later, when it
used other fields in the DXE header.
- Looked closer at the two versions of emu387.dxe (from v2.02 as
opposed to v2.01). This time, I saw that ALL the other fields of
the DXE header, which are 4-byte integers, are byte-reversed,
which also explained how the "DXE1" signature got reversed.
- Concluded that `dxegen', the program that generates emu387.dxe,
somehow didn't put the bytes in the correct order (since v2.02 was
built on a Unix box with a big-endian byte order).
I don't think the above is a complicated procedure. I think anybody
with enough motivation should be able to do it.
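
For anyone who would rather not eyeball the file in a binary viewer,
the signature check from the steps above can be sketched as a small C
program. This is only an illustration, not the code in the library's
`_dxe_load': the names `classify_signature', `swap32' and
`check_dxe_file' are hypothetical, and the only facts it relies on are
the ones stated above (the good signature is "DXE1", the broken one is
"1EXD", and the header fields are 4-byte integers).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Possible outcomes of looking at the first four bytes of the file. */
enum dxe_sig { DXE_SIG_OK, DXE_SIG_REVERSED, DXE_SIG_UNKNOWN };

/* Compare the four signature bytes against the good and the
   byte-reversed spellings described in the message. */
enum dxe_sig classify_signature(const unsigned char sig[4])
{
    if (memcmp(sig, "DXE1", 4) == 0)
        return DXE_SIG_OK;
    if (memcmp(sig, "1EXD", 4) == 0)
        return DXE_SIG_REVERSED;
    return DXE_SIG_UNKNOWN;
}

/* Reverse the byte order of one 4-byte integer, which is what a
   byte-reversed header field would need to become usable again. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

/* Hypothetical helper: read the first four bytes of `path' and
   classify them.  Returns an enum dxe_sig value, or -1 if the file
   cannot be opened or is shorter than four bytes. */
int check_dxe_file(const char *path)
{
    unsigned char sig[4];
    size_t n;
    FILE *fp = fopen(path, "rb");   /* binary mode matters on DOS */

    if (fp == NULL)
        return -1;
    n = fread(sig, 1, sizeof sig, fp);
    fclose(fp);
    if (n != sizeof sig)
        return -1;
    return (int)classify_signature(sig);
}
```

Calling `check_dxe_file("c:/djgpp/bin/emu387.dxe")' from a `main' of
your own and getting DXE_SIG_REVERSED back would correspond to the
broken file described above. The sketch only diagnoses the problem;
it does not rewrite the file.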