Discussion:
[Bug gdb/21603] New: powerpc-linux-gnu-gdb throws internal error on remote debugging: 'gdbarch!=NULL' failed
david at engster dot org
2017-06-16 12:41:02 UTC
https://sourceware.org/bugzilla/show_bug.cgi?id=21603

Bug ID: 21603
Summary: powerpc-linux-gnu-gdb throws internal error on remote debugging: 'gdbarch!=NULL' failed
Product: gdb
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: gdb
Assignee: unassigned at sourceware dot org
Reporter: david at engster dot org
Target Milestone: ---

This is with the latest gdb 8.0, configured solely with
'--target=powerpc-linux-gnu' on a 64-bit GNU/Linux host system. When I try to
debug a PowerPC e500v2 system remotely through gdbserver, I get the following
assertion failure:

(gdb) target remote 172.20.5.224:2345
Remote debugging using 172.20.5.224:2345
Reading symbols from /bsp/sysroot/lib/ld.so.1...(no debugging symbols found)...done.
gdbarch.c:3228: internal-error: int gdbarch_elf_make_msymbol_special_p(gdbarch*): Assertion `gdbarch != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

Version 7.12.1 has the same problem, but version 7.11 works, so this must be
due to a change between those two versions.
--
You are receiving this mail because:
You are on the CC list for the bug.
lassi.niemisto at wapice dot com
2017-08-25 09:43:01 UTC

Lassi Niemistö <lassi.niemisto at wapice dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |lassi.niemisto at wapice dot com

--- Comment #1 from Lassi Niemistö <lassi.niemisto at wapice dot com> ---
I get the same with:
* MPC8309E
* Debugged software compiled with GCC 4.8.2 -ggdb3
* Remote debugging or core dump analysis
* Custom GDB 8.0 build with
--enable-targets=powerpc-linux,powerpc-freebsd,powerpc-elf,powerpc-eabi

Attaching a sample binary and a sample core file for easy reproduction.
--
lassi.niemisto at wapice dot com
2017-08-25 09:44:41 UTC

--- Comment #2 from Lassi Niemistö <lassi.niemisto at wapice dot com> ---
Created attachment 10364
--> https://sourceware.org/bugzilla/attachment.cgi?id=10364&action=edit
Core file created by running on PowerPC
--
lassi.niemisto at wapice dot com
2017-08-25 09:46:15 UTC

--- Comment #3 from Lassi Niemistö <lassi.niemisto at wapice dot com> ---
Created attachment 10365
--> https://sourceware.org/bugzilla/attachment.cgi?id=10365&action=edit
Crashing demo program for PowerPC
--
lassi.niemisto at wapice dot com
2017-08-25 09:47:34 UTC

--- Comment #4 from Lassi Niemistö <lassi.niemisto at wapice dot com> ---
Created attachment 10366
--> https://sourceware.org/bugzilla/attachment.cgi?id=10366&action=edit
Source code for the crashing sample program
--
lassi.niemisto at wapice dot com
2018-07-10 02:18:54 UTC

--- Comment #5 from Lassi Niemistö <lassi.niemisto at wapice dot com> ---
It would be very important to get this fixed; it prevents usage on PowerPC.
--
lassi.niemisto at wapice dot com
2018-08-02 03:39:42 UTC

--- Comment #6 from Lassi Niemistö <lassi.niemisto at wapice dot com> ---
Still happens with 8.1.1.
--
sergiodj at redhat dot com
2018-08-02 23:04:11 UTC

Sergio Durigan Junior <sergiodj at redhat dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |sergiodj at redhat dot com

--- Comment #7 from Sergio Durigan Junior <sergiodj at redhat dot com> ---
Out of curiosity, could you please try with git HEAD?
--
lassi.niemisto at wapice dot com
2018-08-03 06:34:21 UTC

--- Comment #8 from Lassi Niemistö <lassi.niemisto at wapice dot com> ---
Yes, it still happens. I am building on 64-bit Ubuntu 14.04, if that is relevant.
--
lassi.niemisto at wapice dot com
2018-08-03 13:05:12 UTC

--- Comment #9 from Lassi Niemistö <lassi.niemisto at wapice dot com> ---
Decided to dig a bit deeper:

The problematic scenario starts in the function add_vsyscall_page, which calls
symbol_file_add_from_memory with the odd "filename" 'system-supplied DSO
at %s'. This call chain ends up in gdbarch_elf_make_msymbol_special_p(gdbarch*)
and hits the assertion.

symbol_file_add_from_bfd
symbol_file_add_with_addrs
syms_from_objfile
reread_symbols
read_symbols
sym_read
elf_read_minimal_symbols
elf_symtab_read ST_REGULAR
get_objfile_arch returns NULL --> passed to gdbarch_elf_make_msymbol_special_p

The big question for a complete GDB dev noob like me is whether the gdbarch
should be filled in for this kind of odd DSO object. If it should, is the
correct place get_objfile_bfd_data, which eventually queries it from
gdbarch_find_by_info?

The built-in debug prints of gdbarch_find_by_info are the following during the
problematic scenario:
gdbarch_find_by_info: info.bfd_arch_info powerpc:vle
gdbarch_find_by_info: info.byte_order 0 (big)
gdbarch_find_by_info: info.osabi 5 (GNU/Linux)
gdbarch_find_by_info: info.abfd 0x3d5dbf0
gdbarch_find_by_info: info.tdep_info 0x0
gdbarch_find_by_info: Target rejected architecture

Whereas earlier, when symbols were processed for my main binary, it printed:
gdbarch_find_by_info: info.bfd_arch_info powerpc:common
gdbarch_find_by_info: info.byte_order 0 (big)
gdbarch_find_by_info: info.osabi 5 (GNU/Linux)
gdbarch_find_by_info: info.abfd 0x3cd0e90
gdbarch_find_by_info: info.tdep_info 0x0
gdbarch_find_by_info: New architecture 0x3dc7380 (powerpc:common) selected

So for whatever reason the DSO has different architecture info, and this
architecture is "not supported" by my GDB build? Even though I built it
with --enable-targets=all.
--
simon.marchi at ericsson dot com
2018-08-03 16:09:24 UTC

Simon Marchi <simon.marchi at ericsson dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |simon.marchi at ericsson dot com

--- Comment #10 from Simon Marchi <simon.marchi at ericsson dot com> ---
(In reply to Lassi Niemistö from comment #9)
The first one (the rejected one) has "powerpc:vle", whereas the second one has
"powerpc:common". Why did you expect "powerpc:vle" to be chosen?
Instinctively, I would expect the same arch to be chosen for the vsyscall page
as for the main objfile. Can you check whether the architecture "powerpc:common"
was also rejected for the vsyscall page? If so, can you step through the
decision process to see why?
--
simon.marchi at ericsson dot com
2018-08-03 16:22:40 UTC

--- Comment #11 from Simon Marchi <simon.marchi at ericsson dot com> ---
(In reply to Simon Marchi from comment #10)
Actually, forget about this; I had not understood the process correctly. I was
able to reproduce the crash using the core file you provided.

The "vle" mach comes from the BFD library. We open the BFD from memory, and
the BFD library decides it is of the "powerpc" arch, "vle" microarchitecture
(numerical value 84). Then we try to look up a gdbarch corresponding to
that BFD arch. However, GDB knows nothing about bfd_mach_ppc_vle. The powerpc
gdbarch init code looks in this table of variants:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/rs6000-tdep.c;h=e78de49b2e69808966fa77d0e1ba3b071dfe540e;hb=HEAD#l3029

but vle is not present there. So either:

1. BFD is wrong about the microarchitecture; it should not be vle
2. GDB should know about the vle microarchitecture
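A toy Python model of the failure path described above may help. It is a much-simplified sketch under stated assumptions, not GDB code: the set `KNOWN_MACHS` is a hypothetical stand-in for the variants table in rs6000-tdep.c (its entries are illustrative only), and `find_gdbarch` and `elf_make_msymbol_special_p` are invented names that mimic gdbarch_find_by_info rejecting the arch and the generated accessor's `gdbarch != NULL` assertion.

```python
# Hypothetical, much-simplified stand-in for the PowerPC variants table in
# rs6000-tdep.c: only the BFD "arch:mach" names GDB can map to a gdbarch.
# powerpc:vle (bfd_mach_ppc_vle, value 84) is deliberately missing.
KNOWN_MACHS = {"powerpc:common", "powerpc:603"}


def find_gdbarch(bfd_arch_name):
    """Mimics gdbarch_find_by_info: returns None if the target rejects the arch."""
    if bfd_arch_name not in KNOWN_MACHS:
        return None  # corresponds to "Target rejected architecture"
    return {"name": bfd_arch_name}


def elf_make_msymbol_special_p(gdbarch):
    """Mimics the gdbarch accessor that asserts gdbarch != NULL."""
    if gdbarch is None:
        raise RuntimeError("Assertion `gdbarch != NULL' failed.")
    return True


# Main executable: BFD reports powerpc:common, so an architecture is found.
print(elf_make_msymbol_special_p(find_gdbarch("powerpc:common")))

# vDSO read from memory: BFD reports powerpc:vle, the lookup yields None,
# and that None is passed straight to the accessor.
try:
    elf_make_msymbol_special_p(find_gdbarch("powerpc:vle"))
except RuntimeError as err:
    print("internal-error:", err)
```

The model only shows why the two options listed above are the natural fixes: either BFD must stop reporting a mach GDB cannot map, or the lookup table must learn about it.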
--
palves at redhat dot com
2018-08-06 11:28:00 UTC

Pedro Alves <palves at redhat dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |palves at redhat dot com

--- Comment #12 from Pedro Alves <palves at redhat dot com> ---
Sounds related to bug 19797.
--
lassi.niemisto at wapice dot com
2018-08-07 04:53:53 UTC

--- Comment #13 from Lassi Niemistö <lassi.niemisto at wapice dot com> ---
Thanks for the comments. Some more findings:
* Loading the executable file does not cause issues; it is the core file load
that does.
* gdb 7.9.1 (the last version working fine) also loads this "system-supplied
DSO" as the last step of the core file load, but there it ends up looking up
powerpc:common rather than powerpc:vle.
--
lassi.niemisto at wapice dot com
2018-08-07 05:18:41 UTC

--- Comment #14 from Lassi Niemistö <lassi.niemisto at wapice dot com> ---
And I can confirm our architecture has nothing to do with VLE, so it would mean
it is the BFD library that gets this wrong. Is the BFD library statically built
into GDB? I can see at least some of its sources under binutils-gdb.
--
lassi.niemisto at wapice dot com
2018-08-07 07:57:28 UTC

--- Comment #15 from Lassi Niemistö <lassi.niemisto at wapice dot com> ---
The file bfd/elf32-ppc.c was modified between those versions, and there is
now a new function

/* When defaulting arch/mach, decode apuinfo to find a better match. */
_bfd_elf_ppc_set_arch (bfd *abfd)

...which finds PPC_APUINFO_VLE.
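The selection step described above can be approximated as follows. This is a simplified Python model and not the real `_bfd_elf_ppc_set_arch`: it assumes that only the VLE entry matters for this bug and that objects without one keep the default `powerpc:common` mach. The name `pick_mach` is hypothetical, and 0x0104 as the VLE APU id comes from Alan Modra's decoding of this vDSO in comment #17.

```python
# APU id for VLE: the high 16 bits of the vDSO's apuinfo word 0x01040001.
PPC_APUINFO_VLE = 0x0104


def pick_mach(apuinfo_words, default="powerpc:common"):
    """Hypothetical, simplified model of apuinfo-based mach selection.

    Only the VLE case is modelled; the real BFD code may consider other APUs.
    """
    for word in apuinfo_words:
        # Each 32-bit word is (APU id << 16) | revision.
        if (word >> 16) == PPC_APUINFO_VLE:
            return "powerpc:vle"
    return default


# The kernel vDSO in the core file carries a single word, 0x01040001.
print(pick_mach([0x01040001]))
# An object without a VLE apuinfo entry keeps the default mach.
print(pick_mach([]))
```

In this model a single stray VLE word is enough to move the vDSO off `powerpc:common`, which matches the behaviour difference from older GDB versions reported above.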
--
lassi.niemisto at wapice dot com
2018-08-07 09:12:22 UTC

Lassi Niemistö <lassi.niemisto at wapice dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |amodra at gmail dot com

--- Comment #16 from Lassi Niemistö <lassi.niemisto at wapice dot com> ---
Adding Alan Modra to the CC list in the hope of getting his comments on this.
--
amodra at gmail dot com
2018-08-07 18:53:43 UTC

--- Comment #17 from Alan Modra <amodra at gmail dot com> ---
The core file load1 segment contains an image of a kernel vdso that has a
.PPC.EMB.apuinfo section of size 24. That section contains

p/x contents[0]@24
$2 = {0x0, 0x0, 0x0, 0x8, 0x0, 0x0, 0x0, 0x4, 0x0, 0x0, 0x0, 0x2, 0x41, 0x50,
0x55, 0x69, 0x6e, 0x66, 0x6f, 0x0, 0x1, 0x4, 0x0, 0x1}

So there is a single apuinfo word, 0x01040001, saying PPC_APUINFO_VLE (high 16
bits), revision 1 (low 16 bits).
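To make the decoding concrete, here is a minimal Python sketch that walks the 24 bytes printed above as a big-endian note: three 32-bit header fields (name size 8, descriptor size 4, type 2), the NUL-terminated name "APUinfo" padded to 4 bytes, then one 32-bit word per APU entry, split into APU id and revision. It assumes the section layout is exactly what the dump shows; `decode_apuinfo` is a hypothetical helper, not a BFD or GDB function.

```python
import struct

# Raw bytes of the vDSO's .PPC.EMB.apuinfo section, copied from the gdb dump above.
APUINFO = bytes([
    0x0, 0x0, 0x0, 0x8, 0x0, 0x0, 0x0, 0x4, 0x0, 0x0, 0x0, 0x2,
    0x41, 0x50, 0x55, 0x69, 0x6e, 0x66, 0x6f, 0x0,
    0x1, 0x4, 0x0, 0x1,
])

# APU id for VLE, per the analysis above (high 16 bits of 0x01040001).
PPC_APUINFO_VLE = 0x0104


def decode_apuinfo(data):
    """Hypothetical helper: parse a big-endian apuinfo note.

    Returns (name, note_type, [(apu_id, revision), ...]).
    """
    # Three big-endian 32-bit header fields: name size, descriptor size, type.
    namesz, descsz, ntype = struct.unpack_from(">III", data, 0)
    offset = 12
    name = data[offset:offset + namesz].rstrip(b"\0").decode("ascii")
    # The name is padded to a 4-byte boundary before the descriptor starts.
    offset += (namesz + 3) & ~3
    entries = []
    for i in range(descsz // 4):
        (word,) = struct.unpack_from(">I", data, offset + 4 * i)
        # High 16 bits: APU identifier; low 16 bits: revision.
        entries.append((word >> 16, word & 0xFFFF))
    return name, ntype, entries


name, ntype, entries = decode_apuinfo(APUINFO)
print(name, ntype, [(hex(apu), rev) for apu, rev in entries])
print("VLE flagged:", any(apu == PPC_APUINFO_VLE for apu, _ in entries))
```

Run against the dump, this prints `APUinfo 2 [('0x104', 1)]` followed by `VLE flagged: True`, i.e. the single VLE revision 1 entry described above.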

BFD is therefore correctly setting arch/mach to "powerpc:vle" for this object.

So the question becomes: how did PPC_APUINFO_VLE get set? I wonder how old a
toolchain was used to build your kernel. If gas had VLE support but lacked git
commit fbd940576f (2014-08-22), then that might be the cause. See
https://sourceware.org/ml/binutils/2014-08/msg00217.html
--
lassi.niemisto at wapice dot com
2018-08-08 06:14:08 UTC

--- Comment #18 from Lassi Niemistö <lassi.niemisto at wapice dot com> ---
Thanks, Alan, for the analysis!

We are running binutils 2.24 with some patches on top of it. The git log for
the 2.24 tag shows that it has at least the main VLE support commit:
b9c361e0ad33f2c841067fd4bf0959a72ad5a265 Add support for PowerPC VLE.

And indeed the fixup commit fbd940576f seems absent (2.24 is dated 2013-12-02).
The first version with the fix seems to be 2.25.

This should explain the results, and we will primarily mitigate the issue in
our project by updating the toolchain. To get a single gdb build that also
works with our legacy branches, we might patch gdb to skip parsing apuinfo
altogether, since we know the architecture anyway in this case.
--
david at engster dot org
2018-08-08 07:46:08 UTC

--- Comment #19 from David Engster <david at engster dot org> ---
Yes, our toolchain is very old as well, we're still using Sourcery G++ Lite
2011.03-38, which has binutils 2.20. Thank you Lassi for digging into this!
--