[lsb-discuss] Checking Firefox for LSB-compliance

Jeff Licquia jeff at licquia.org
Fri Apr 20 12:57:53 PDT 2007


On Fri, 2007-04-20 at 09:41 -0400, Robert Schweikert wrote:
> I have reproduced the problem on x86_64 with the firefox-bin recently
> downloaded from mozilla for version 2.0.0.3. The good news is appchk
> still dumps, the bad news is it dumps in a different place as compared to
> the stack trace posted by Jeff.
> 
> However, the dumps may be related.

Fun.

> The root cause for the dump is a bad address calculation in hdr.c on line
> 28.  Here we add the address stored in the ElfFile struct in the addr
> member (file1->addr) and the offset we find in the Elf64_Ehdr struct
> for the section header offset (e_shoff) (hdr1->e_shoff).  While the earlier
> calculation for paddr worked (3 lines previous), the section header offset
> calculation fails. Therefore we are setting the saddr member in the
> ElfFile struct to garbage. Once this is accessed things of course blow up.
> 
> I don't know enough about ELF to keep debugging this effectively on my own.

Here's the offending code:

if( hdr1->e_shoff ) {
	file1->saddr=(Elf_Shdr *)((caddr_t)file1->addr+hdr1->e_shoff);
	file1->numsh = hdr1->e_shnum;
	}

"file1" is the ElfFile * in question.  It has an "addr" member, which is
set to point to an mmap() of the ELF file header.  "hdr1" is actually
set to file1->addr, typecast to Elf_Ehdr * (e_shoff is a member of the
ELF header, not of a program header).

So if e_shoff is coming through incorrectly, that could mean that appchk
is misreading the ELF header from the Mozilla executable.
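
One cheap way to tell a misread header from a later corruption would be
to sanity-check e_shoff before using it.  The following is only a sketch,
not appchk code: shdr_table_in_bounds() and its map_size parameter are
hypothetical names, and it assumes we know how many bytes are mapped at
file1->addr.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of appchk): returns 1 if a section header
 * table described by e_shoff/e_shnum/e_shentsize lies entirely inside a
 * mapping of map_size bytes, 0 otherwise.
 */
int shdr_table_in_bounds(uint64_t map_size, uint64_t e_shoff,
                         uint16_t e_shnum, uint16_t e_shentsize)
{
    /* Product of two 16-bit values always fits in 64 bits. */
    uint64_t table_size = (uint64_t)e_shnum * e_shentsize;

    if (e_shoff == 0)
        return 0;               /* no section header table present */
    if (e_shoff > map_size)
        return 0;               /* offset points past the mapping */
    if (table_size > map_size - e_shoff)
        return 0;               /* table would run off the end */
    return 1;
}
```

If a check like this fails for the Mozilla binary, the value read as
e_shoff is bogus before saddr is ever dereferenced.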

One possible reason for that is if we're checking an ia32 executable
with an x86_64 appchk, or vice versa.  What architecture did the Mozilla
binary represent itself as?
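
For reference, the class is recorded in the e_ident bytes at the very
start of the file, and it matters here because e_shoff sits at byte
offset 32 (4 bytes wide) in Elf32_Ehdr but at offset 40 (8 bytes wide) in
Elf64_Ehdr.  Here is a rough sketch of a check, assuming a Linux system
with <elf.h>; elf_image_class() is a made-up name, not something appchk
provides.

```c
#include <elf.h>     /* EI_NIDENT, EI_CLASS, ELFCLASS*, ELFMAG, SELFMAG */
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: inspects the e_ident bytes at the start of an ELF
 * image and returns ELFCLASS32 or ELFCLASS64, or ELFCLASSNONE if the
 * buffer is too short, lacks the ELF magic, or has an unknown class byte.
 */
int elf_image_class(const unsigned char *image, size_t len)
{
    if (image == NULL || len < EI_NIDENT)
        return ELFCLASSNONE;
    if (memcmp(image, ELFMAG, SELFMAG) != 0)
        return ELFCLASSNONE;
    if (image[EI_CLASS] == ELFCLASS32 || image[EI_CLASS] == ELFCLASS64)
        return image[EI_CLASS];
    return ELFCLASSNONE;
}
```

Comparing the result against the class the checker was built for would
catch the ia32-versus-x86_64 mismatch before any header field is read.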

> The stack trace posted by Jeff points to a "garbage pointer" culprit.
> The call that actually causes the dump in the original post (I presume
> IA32) is the strdup() call. strdup of course allocates memory.
> It looks to me like we are passing garbage to strdup(). This garbage
> could be triggered by a similar miscalculation of the saddr; by luck we
> don't blow up on the other platform as early as we do on x86_64 (my best
> guess).

If we're passing garbage to strdup() in my trace, then I can't figure
out where the garbage is.  Here's the full function being called:

void output_purpose_start(unsigned int activity, unsigned int tpnumber,
                          const char *message)
{
    current_tpnum = tpnumber;
    if ((message != NULL) && (strlen(message) > 0))
        current_purpose = strdup(message);
    else
        current_purpose = NULL;
}

current_purpose and current_tpnum are globals.  The "message" parameter
is reported to be OK by gdb (the backtrace output lists its contents as
"XSetInputFocus").

On the other hand, here's the code which calls that function:

    symbol_name = ElfGetStringIndex(file, syms1[i].st_name,
                                    file->dynsymhdr->sh_link);

    PURPOSE_START(tetj_activity_count, tetj_tp_count, symbol_name);

(PURPOSE_START is a macro which, in this case, evaluates to the above
function.)

So, yes, the ultimate source for that string is an ElfFile *.  Which,
given Robert's evaluation, is very suspect.  I wonder if we might be
using the memory after freeing it, or something.
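
If we wanted to rule out a bad st_name or a bad string table pointer, a
bounds-checked lookup along these lines might help.  This is a sketch
only: strtab_lookup() is a hypothetical stand-in, not appchk's
ElfGetStringIndex, and it assumes the caller knows the string table's
size (sh_size of the section named by sh_link).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns a pointer to the string at byte offset idx
 * in a string table of strtab_size bytes, or NULL if idx is out of range
 * or the string is not NUL-terminated inside the table.
 */
const char *strtab_lookup(const char *strtab, size_t strtab_size, size_t idx)
{
    if (strtab == NULL || idx >= strtab_size)
        return NULL;
    /* Make sure a terminator exists before the end of the table. */
    if (memchr(strtab + idx, '\0', strtab_size - idx) == NULL)
        return NULL;
    return strtab + idx;
}
```

Since gdb shows a sane "XSetInputFocus" here, a check like this would
most likely pass for this symbol; it would only tell us the string
itself is not the garbage.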

Incidentally, if the program is run under valgrind, it does not throw a
SIGABRT.  The same goes for MALLOC_CHECK_ (even at 2, which should abort
immediately on this kind of error).  As far as I can tell, building with
mtrace() and running with MALLOC_TRACE set does not seem to make a
difference, although it doesn't seem to tell us anything interesting
when it does crash.



