[Bug threads/22882] New: GDB hangs when calling inferior functions in threaded code

Discussion:

stephen.roberts at arm dot com

2018-02-23 11:13:26 UTC

https://sourceware.org/bugzilla/show_bug.cgi?id=22882

Bug ID: 22882
Summary: GDB hangs when calling inferior functions in threaded code
Product: gdb
Version: HEAD
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: threads
Assignee: unassigned at sourceware dot org
Reporter: stephen.roberts at arm dot com
Target Milestone: ---

The hang occurs when GDB calls inferior functions on two different threads
with scheduler-locking turned on. The first call works: infrun_async(1) marks
the signal_handler as ready and the event is handled. The event loop then
resets the handler's "ready" member to zero but leaves infrun_is_async set
to 1. As a result, if the user switches to another thread and calls a second
function, GDB hangs: calling infrun_async(1) a second time has no effect, so
the inferior call's events are never handled.
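To make the sequence above concrete, here is a minimal C sketch that models it. It is not GDB code: the names event_handler, infrun_model, model_infrun_async, model_event_loop_once and simulate_two_infcalls are hypothetical, and the model assumes, as described above, that infrun_async only marks the handler when infrun_is_async changes from 0 to 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of infrun's async event handler (hypothetical names). */
struct event_handler
{
  bool ready;        /* set when marked, cleared when the event loop runs it */
  int invocations;   /* number of times the event loop ran the handler */
};

struct infrun_model
{
  bool is_async;                 /* plays the role of infrun_is_async */
  struct event_handler handler;  /* infrun's async event handler */
};

/* Pre-fix behaviour as described in the report: the handler is only
   marked when the async state actually changes from off to on.  */
static void
model_infrun_async (struct infrun_model *m, bool enable)
{
  if (m->is_async != enable)
    {
      m->is_async = enable;
      m->handler.ready = enable;   /* enabling marks, disabling clears */
    }
}

/* One event-loop iteration: run the handler if it is ready, then reset
   "ready" to false.  Returns false if there was nothing to run.  */
static bool
model_event_loop_once (struct infrun_model *m)
{
  if (!m->handler.ready)
    return false;   /* with no external event, real GDB would wait forever */
  m->handler.ready = false;
  m->handler.invocations++;
  return true;
}

/* Two successive inferior calls under scheduler-locking.  Returns how
   many of them had their event handled.  */
static int
simulate_two_infcalls (void)
{
  struct infrun_model m = { false, { false, 0 } };

  /* First call: off -> on, so the handler is marked and then run.  */
  model_infrun_async (&m, true);
  model_event_loop_once (&m);

  /* The loop reset "ready", but nothing turned async back off.  */
  assert (m.is_async && !m.handler.ready);

  /* Second call on another thread: async is already on, so nothing is
     marked and the event loop has nothing to run.  */
  model_infrun_async (&m, true);
  model_event_loop_once (&m);

  return m.handler.invocations;
}
```

In this model simulate_two_infcalls() returns 1: only the first call's event is ever handled, which corresponds to the hang in the reproducer.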

On closer inspection, the threads that hang are always ones that did not hit
the breakpoint themselves but were stopped when another thread hit it.
Threads that are stopped at breakpoints are immune to this issue. Putting the
system under load lets more threads reach the breakpoint before GDB stops
them.

This issue affects all versions after 7.12 (including HEAD). For reference,
this is a simplified reproducer for this issue, where get_value is a global
function in a multi-threaded binary:

break after_thread_creation
run
set scheduler-locking on
thread 1
call get_value()
thread 2
call get_value()
# GDB hangs here

--
You are receiving this mail because:
You are on the CC list for the bug.

stephen.roberts at arm dot com

2018-02-23 11:18:32 UTC

--- Comment #1 from Stephen Roberts <stephen.roberts at arm dot com> ---
Created attachment 10844
--> https://sourceware.org/bugzilla/attachment.cgi?id=10844&action=edit
Testcase to reproduce this issue

This testcase reproduces the issue reliably, close to 100% of the time.
Under heavy load, however, all threads may reach the breakpoint before GDB
can stop them, in which case the bug won't reproduce. Either way, once this
bug is fixed the testcase will pass 100% of the time.

cvs-commit at gcc dot gnu.org

2018-06-12 21:09:24 UTC

--- Comment #2 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Andrew Burgess <***@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=9516f85aea1d9a34d1cd3f59b7b9eeb590e58c70

commit 9516f85aea1d9a34d1cd3f59b7b9eeb590e58c70
Author: Andrew Burgess <***@embecosm.com>
Date: Wed May 23 14:25:20 2018 +0100

gdb: Mark async event handler when event is already pending

In PR22882 inferior functions are called on different threads while
scheduler-locking is turned on. This results in a hang. This was
discussed in this mailing list thread:

https://sourceware.org/ml/gdb/2017-10/msg00032.html

The problem is that when the thread is set running in order to execute
the inferior call, a call to target_async is made. If the target is
not already registered as 'target_async' then this will install the
async event handler, AND unconditionally mark the handler as having an
event pending.

However, if the target is already registered as target_async then the
event handler is not installed (it's already installed) and the
handler is NOT marked as having an event pending.

If we try to set a thread running when it already has a pending event,
then we do want to set target_async; however, there will be no external
event incoming (the thread is already stopped), so we rely on manually
marking the event handler as having a pending event in order to see the
thread's pending stop event. This is fine if, at the point where we call
target_async, the target is not already marked as async. But if it is,
then the event handler will not be marked as ready, and the thread's
pending stop event will never be processed.

A similar pattern of code can be seen in linux_nat_target::resume,
where, when a thread has a pending event, the call to target_async is
followed by a call to async_file_mark to ensure that the pending
thread event will be processed, even if target_async was already set.

gdb/ChangeLog:

PR gdb/22882
* infrun.c (resume_1): Add call to mark_async_event_handler.

gdb/testsuite/ChangeLog:

* gdb.threads/multiple-successive-infcall.exp: Remove kfail case,
rewrite test to describe action performed, rather than possible
failure.
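The fix described in this commit can be sketched with a similar toy model. This is an illustrative C sketch, not the actual resume_1 code: target_model, model_target_async, model_resume, mark_handler and simulate_two_resumes are hypothetical names, and the model assumes target_async only marks the handler on an off -> on transition. The key step is the unconditional mark after enabling async, mirroring the mark_async_event_handler call added to resume_1 and the async_file_mark call in linux_nat_target::resume.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a target's async event handler (hypothetical names). */
struct event_handler
{
  bool ready;        /* set when marked, cleared when the event loop runs it */
  int invocations;   /* number of times the event loop ran the handler */
};

struct target_model
{
  bool is_async;                 /* is the target registered as async? */
  struct event_handler handler;
};

static void
mark_handler (struct event_handler *h)
{
  h->ready = true;
}

/* Only installs (and marks) the handler on an off -> on transition.  */
static void
model_target_async (struct target_model *t, bool enable)
{
  if (t->is_async == enable)
    return;                    /* already in that state: nothing is marked */
  t->is_async = enable;
  t->handler.ready = enable;
}

/* Set a thread running for an inferior call.  Threads without a pending
   event would rely on an external event instead (not modelled here).  */
static void
model_resume (struct target_model *t, bool thread_has_pending_event)
{
  if (thread_has_pending_event)
    {
      model_target_async (t, true);
      /* The fix: no external event will arrive for a thread that is
         already stopped, so always mark the handler explicitly.  */
      mark_handler (&t->handler);
    }
}

/* One event-loop iteration: run the handler if it is ready.  */
static bool
model_event_loop_once (struct target_model *t)
{
  if (!t->handler.ready)
    return false;
  t->handler.ready = false;
  t->handler.invocations++;
  return true;
}

/* Two resumes of threads with pending events; returns how many
   events the event loop processed.  */
static int
simulate_two_resumes (void)
{
  struct target_model t = { false, { false, 0 } };

  model_resume (&t, true);
  model_event_loop_once (&t);
  assert (t.is_async);   /* still async when the second resume starts */

  model_resume (&t, true);
  model_event_loop_once (&t);

  return t.handler.invocations;
}
```

Here simulate_two_resumes() returns 2: both pending stop events reach the event loop, even though the target was already async for the second resume.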

cvs-commit at gcc dot gnu.org

2018-06-12 21:09:29 UTC

--- Comment #3 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Andrew Burgess <***@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=1840d81a201932a2d5ad5b089aad85943a5a0a82

commit 1840d81a201932a2d5ad5b089aad85943a5a0a82
Author: Andrew Burgess <***@embecosm.com>
Date: Wed May 23 17:06:02 2018 +0100

gdb: Run INF_EXEC_COMPLETE handler for additional cases

When making an inferior call with non-stop mode off, all threads will be
stopped once the inferior call is complete, and we should run the
INF_EXEC_COMPLETE handler. This will result in a call to
'target_async(0)' to remove the event handlers for the target.

This was discussed by Yao Qi in this mailing list thread:

https://sourceware.org/ml/gdb/2017-10/msg00032.html

Without this, the target event handlers are left in place even when the
target is stopped, which is different from what happens during a standard
stop procedure (for example, when one thread hits a breakpoint).

gdb/ChangeLog:

PR gdb/22882
* infrun.c (fetch_inferior_event): If GDB is not proceeding then
run INF_EXEC_COMPLETE handler, even when not calling normal_stop.
Move should_notify_stop local into more inner scope.
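The effect of this second change can also be modelled. In the hypothetical C sketch that follows, which is not GDB's fetch_inferior_event, the INF_EXEC_COMPLETE step is reduced to calling target_async(0) after each completed inferior call in all-stop mode. The names target_model, model_target_async, model_event_loop_once and simulate_with_exec_complete are illustrative, and the model again assumes target_async only marks the handler when the async state changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a target's async event handler (hypothetical names). */
struct event_handler
{
  bool ready;        /* set when marked, cleared when the event loop runs it */
  int invocations;   /* number of times the event loop ran the handler */
};

struct target_model
{
  bool is_async;
  struct event_handler handler;
};

/* Marks the handler only on an off -> on transition; clears it when
   async is switched off (handlers removed).  */
static void
model_target_async (struct target_model *t, bool enable)
{
  if (t->is_async == enable)
    return;
  t->is_async = enable;
  t->handler.ready = enable;
}

/* One event-loop iteration: run the handler if it is ready.  */
static bool
model_event_loop_once (struct target_model *t)
{
  if (!t->handler.ready)
    return false;
  t->handler.ready = false;
  t->handler.invocations++;
  return true;
}

/* Two successive inferior calls in all-stop mode.  If RUN_EXEC_COMPLETE
   is true, each completed call ends with the INF_EXEC_COMPLETE step,
   modelled here as target_async(0).  Returns the number of handled
   call events.  */
static int
simulate_with_exec_complete (bool run_exec_complete)
{
  struct target_model t = { false, { false, 0 } };

  for (int call = 0; call < 2; call++)
    {
      model_target_async (&t, true);   /* set the thread running */
      model_event_loop_once (&t);      /* handle the call's stop event */

      if (run_exec_complete)
        {
          model_target_async (&t, false);   /* remove the event handlers */
          assert (!t.is_async);
        }
    }

  return t.handler.invocations;
}
```

In this model simulate_with_exec_complete(true) returns 2 while simulate_with_exec_complete(false) returns 1: once the handlers are removed at the end of each call, the next target_async(1) is a real transition and marks the handler again.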