Discussion:
[Bug gdb/18945] New: gdbserver cannot be interrupted on linux when pgid doesn't match pid
jmgao at google dot com
2015-09-09 23:28:00 UTC
https://sourceware.org/bugzilla/show_bug.cgi?id=18945

Bug ID: 18945
Summary: gdbserver cannot be interrupted on linux when pgid doesn't match pid
Product: gdb
Version: 7.10
Status: NEW
Severity: normal
Priority: P2
Component: gdb
Assignee: unassigned at sourceware dot org
Reporter: jmgao at google dot com
Target Milestone: ---

Created attachment 8595
--> https://sourceware.org/bugzilla/attachment.cgi?id=8595&action=edit
patch

In gdbserver/linux-low.c, linux_request_interrupt uses kill(-pid, SIGINT) to
interrupt a process, but this fails when the attached process is a member of
another process group.
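To see the failure mode concretely, the following minimal C sketch (Linux with glibc assumed; the helper names try_kill and spawn_non_leader are hypothetical, not gdbserver code) forks a child that stays in its parent's process group, then probes it with signal 0, which performs the existence and permission checks without delivering anything. kill(-pid, 0) names a process group whose ID equals the child's PID. No such group exists, so the call fails with ESRCH, while kill(pid, 0) succeeds.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if kill() accepted the request, otherwise the errno it set. */
static int try_kill(pid_t target, int sig)
{
    return kill(target, sig) == 0 ? 0 : errno;
}

/* Forks a child that just waits. It keeps the parent's process group,
   so its PGID differs from its PID (like an app forked from a zygote). */
static pid_t spawn_non_leader(void)
{
    pid_t child = fork();
    if (child == 0) {
        pause();
        _exit(0);
    }
    return child;
}
```

A caller would run spawn_non_leader(), compare try_kill(-child, 0) with try_kill(child, 0), and then reap the child with kill(child, SIGKILL) and waitpid().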

Is there any reason that sending the signal to the entire process group is
desired, or can we get away with changing it to kill just the single process?

Trivial patch attached
--
You are receiving this mail because:
You are on the CC list for the bug.
xdje42 at gmail dot com
2015-09-14 16:45:15 UTC

Doug Evans <xdje42 at gmail dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |xdje42 at gmail dot com

--- Comment #1 from Doug Evans <xdje42 at gmail dot com> ---
A description of why things are the way they are can be found in the commit
message of commit 78708b7c8ccc2138880217de9bd60eceff683f10.

Basically, the main thread could have exited, in which case sending a signal
just to it will be a no-op.

Signals can be sent across process groups, are you sure that's the problem?
--
xdje42 at gmail dot com
2015-09-14 16:45:32 UTC

Doug Evans <xdje42 at gmail dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |palves at redhat dot com
--
jmgao at google dot com
2015-09-14 17:29:40 UTC

--- Comment #2 from Josh Gao <jmgao at google dot com> ---
If the attached process is in some other process group, then a signal sent to
-pid doesn't get sent to anyone (e.g.: http://ideone.com/fnz3oK )

(Some additional context: all applications on Android share the process group
of the zygote they are forked from, of which there are only one or two,
depending on the architecture (32-bit only vs. 32 + 64).)
--
xdje42 at gmail dot com
2015-09-15 01:24:59 UTC

--- Comment #3 from Doug Evans <xdje42 at gmail dot com> ---
Ah, so what we need to do is have gdbserver track inferior process groups.
[if one wants to keep the current scheme of sending SIGINT to the process
group, which would be nice, though it's not clear that's the only or best
solution here. I can also imagine preferring different solutions at different
times, which gdb itself is already doing.]

Still, the attached patch suffers from the issue that the current code tries to
avoid. As a local patch, maybe it's ok to trade one set of issues for another.

I also notice that gdb is using set_sigint_trap + pass_signal (trunk,
amd64-linux) when attached to an inferior, and this also doesn't send SIGINT to
the pgrp (setting aside the "interrupt" command, which uses
linux_nat_interrupt).
--
jmgao at google dot com
2015-09-15 01:33:27 UTC

--- Comment #4 from Josh Gao <jmgao at google dot com> ---
Sending the signal to the process group feels somewhat wrong to me, since it
seems pretty likely that if PGID != PID, there are going to be other processes
in that group which we're not attached to.

Do you think iteratively tgkill'ing all of the threads in /proc/<pid>/task/*
would work? There's a race condition if all of the threads in the process
disappear between reading the directory entries and actually signalling them,
but that seems a bit esoteric.
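Josh's proposal can be sketched in C as below, assuming the raw tgkill(2) syscall is invoked through syscall(SYS_tgkill, ...) and that the caller is permitted to signal the target (tgkill_all_tasks is a hypothetical name). It also shows where the race lives: a thread listed by readdir() may be gone by the time tgkill runs, which the kernel reports as ESRCH. Note Doug's objection in comment #5 that gdb must already be attached to each thread, so this illustrates the proposal, not how gdb actually tracks threads.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends sig to every thread listed in /proc/<pid>/task using tgkill.
   Threads that exit between readdir() and tgkill() fail with ESRCH and
   are skipped (the race described above). Returns the number of threads
   successfully signalled, or -1 if the directory cannot be opened. */
static int tgkill_all_tasks(pid_t pid, int sig)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int signalled = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);
        if (*end != '\0' || tid <= 0)   /* skips "." and ".." */
            continue;
        if (syscall(SYS_tgkill, (long)pid, tid, (long)sig) == 0)
            signalled++;
        else if (errno != ESRCH)
            break;                      /* unexpected error (e.g. EPERM): stop */
    }
    closedir(dir);
    return signalled;
}
```

Passing signal 0, as in tgkill_all_tasks(getpid(), 0), only checks that each listed thread can be signalled; for any live process it returns at least 1.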
--
xdje42 at gmail dot com
2015-09-15 01:52:10 UTC

--- Comment #5 from Doug Evans <xdje42 at gmail dot com> ---
Iterating over all threads is actually what gdb does in non-stop mode for the
interrupt command (ref: linux_nat_interrupt). Though even that simplistic
response is inaccurate if one dives a bit deeper (e.g., if !nonstop it sends
SIGINT to the pgrp, otherwise it sends SIGSTOP to each lwp).
This bit of gdb needs some design cleanup, and then some implementation
cleanup.

For reference's sake, I wouldn't iterate over /proc/$pid/task: it's critical
that gdb be attached to the thread first. There are still races here, but gdb
already has to deal with them, so this isn't a new issue: the patch for this bug
should be able to just use the existing APIs and leave the latter to deal with
lower-level issues.
So iterating over all threads may be a reasonable thing to do in all cases (not
sure what kind of performance hit there might be if there are 1000 threads;
it's not a blocking issue, but we can't ignore it either).
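The alternative discussed in comments #5 and #6 is to walk the debugger's own list of attached LWPs and stop each one, using SIGSTOP rather than SIGINT so that no thread is reported as having received SIGINT. A minimal sketch, assuming a hypothetical singly linked list struct attached_lwp (not gdb's real lwp_info) and an injected send function so the policy can be exercised without real threads. In a real debugger the sender would presumably wrap tgkill, and each expected SIGSTOP would be consumed when waitpid() reports it.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the debugger's list of already-attached threads. */
struct attached_lwp {
    pid_t tid;
    bool stopped;          /* already stopped: nothing to do */
    bool stop_expected;    /* we sent SIGSTOP and expect to see it reported */
    struct attached_lwp *next;
};

/* Thread-directed sender, injected for testability (would wrap tgkill). */
typedef int (*send_fn)(pid_t tgid, pid_t tid, int sig);

/* Requests a stop for every attached, running LWP. Returns the number of
   SIGSTOP requests that were sent successfully. */
static int interrupt_attached_lwps(pid_t tgid, struct attached_lwp *head,
                                   send_fn send_sig)
{
    int sent = 0;
    for (struct attached_lwp *lwp = head; lwp != NULL; lwp = lwp->next) {
        if (lwp->stopped || lwp->stop_expected)
            continue;
        if (send_sig(tgid, lwp->tid, SIGSTOP) == 0) {
            lwp->stop_expected = true;
            sent++;
        }
    }
    return sent;
}

/* Dry-run sender: counts SIGSTOP requests instead of signalling anything. */
static int dry_run_count;
static int dry_run_send(pid_t tgid, pid_t tid, int sig)
{
    (void)tgid;
    (void)tid;
    if (sig == SIGSTOP)
        dry_run_count++;
    return 0;
}
```

The walk costs one signal per attached thread, which is where the performance concern for processes with around 1000 threads would come from. Calling it a second time sends nothing, because every LWP is already stopped or has a stop pending.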
--
xdje42 at gmail dot com
2015-09-15 02:02:11 UTC

--- Comment #6 from Doug Evans <xdje42 at gmail dot com> ---
... though if one were doing the iterate_over_lwps method, SIGINT feels like a
problematic choice (e.g., we don't want to cause/report each thread got
SIGINT).

Seems like what we need first is a spec of what ^c means for all variations.
Is there one somewhere? Dunno.
--
jmgao at google dot com
2015-09-16 21:53:08 UTC

--- Comment #7 from Josh Gao <jmgao at google dot com> ---
Hmm, are you sure the attached patch suffers from the problem that 78708b7 was
addressing? The code prior to 78708b7 was using tkill(pid, sig), which indeed
would fail if the main thread exited. 78708b7 changed it to kill the process
group, but is that any better if a thread that gdb isn't attached to gets the
signal?
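The question above hinges on the difference between thread-directed signals (tkill/tgkill, which name one TID) and process-directed ones (kill(pid, sig)), which the kernel may deliver to any thread that does not block the signal. The sketch below assumes Linux and glibc (link with -pthread on older glibc). The calling thread blocks SIGUSR1 as a simplified stand-in for a main thread that can no longer take the signal, since a genuinely exited main thread is harder to set up, and a process-directed kill() is then handled by a worker thread. All function names are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static atomic_long handled_by;   /* TID of the thread that ran the handler */

static void on_usr1(int sig)
{
    (void)sig;
    atomic_store(&handled_by, (long)syscall(SYS_gettid));
}

/* Worker: publish its TID, then unblock all signals and wait for one. */
static void *worker(void *arg)
{
    *(long *)arg = (long)syscall(SYS_gettid);
    sigset_t none;
    sigemptyset(&none);
    sigsuspend(&none);
    return NULL;
}

/* Blocks SIGUSR1 in the calling thread, then sends a process-directed
   SIGUSR1. Returns the TID that handled it; *worker_tid gets the worker's. */
static long deliver_process_directed(long *worker_tid)
{
    struct sigaction sa = {0};
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    sigset_t usr1, old;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &usr1, &old);   /* the worker inherits this mask */

    pthread_t t;
    pthread_create(&t, NULL, worker, worker_tid);
    kill(getpid(), SIGUSR1);   /* stays pending until some thread unblocks it */
    pthread_join(t, NULL);

    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return atomic_load(&handled_by);
}
```

deliver_process_directed(&tid) returns the worker's TID. Because the signal stays pending while every thread blocks it, the result does not depend on whether kill() runs before or after the worker reaches sigsuspend(). A tgkill aimed at the blocking thread's TID would instead stay pending on that thread alone.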
--
rarul at rarul dot com
2015-10-27 16:40:06 UTC

rarul <rarul at rarul dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |rarul at rarul dot com

--- Comment #8 from rarul <rarul at rarul dot com> ---
I have run into the same problem.
In short: I agree with Josh's patch "gdbserver.patch".

------------------------------------------------------
The problem is as follows:
- 0) Suppose we debug a remote process using gdbserver.
- 1) Attach to an existing process with gdbserver's attach.
- 2) Start a local gdb and connect to the remote gdbserver with "target remote".
- 3) Continue without setting any breakpoints.
- 4) Request an interrupt by pressing Ctrl+C in gdb. gdbserver then sends
SIGINT to the attached process.

In 4), gdbserver sends SIGINT not to the process but to its process group
(-signal_pid).
However, the attached process is not always a process group leader.
If it is not, "kill (-signal_pid, SIGINT)" returns an error and fails to
interrupt the attached process.
- A) We cannot interrupt a process attached with gdbserver that is not a
process group leader.

If it is a group leader, we can interrupt the attached process.
But SIGINT is then also sent to the children of the attached process (those
in the same process group). I think this is also a problem.

Note that if in 1) we start a new process instead of attaching, gdbserver
calls fork(), setpgid(), and exec(), so the new process is always a process
group leader.
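For contrast with the attach case, here is a minimal sketch of the launch path described above (fork(), setpgid(), exec()), assuming Linux/POSIX. spawn_group_leader is a hypothetical name, and the child calls pause() where a real launcher would call exec(). Both parent and child call setpgid(), the usual way to avoid depending on which runs first. Afterwards getpgid(child) == child, so kill(-child, sig) reaches it.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that leads a new process group, as a debugger-launched
   inferior does before exec(). */
static pid_t spawn_group_leader(void)
{
    pid_t child = fork();
    if (child == 0) {
        setpgid(0, 0);    /* child: become leader of a new group */
        pause();          /* a real launcher would exec() the program here */
        _exit(0);
    }
    if (child > 0)
        setpgid(child, child);   /* parent: same request, whichever runs first */
    return child;
}
```

With this setup the existing kill(-signal_pid, SIGINT) works, which is why the problem only shows up when attaching to a process that is not a group leader.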

This problem was introduced by commit 78708b7c8c.
That commit fixed the following bug:
- B) We cannot interrupt a process attached with gdbserver whose main thread
has exited (pthread_exit()).

Josh's patch can solve both A) and B)
(at least on recent Linux versions, 2.6 or newer, I think...)
------------------------------------------------------
Here I think there is one more point:
- C) Pressing Ctrl+C in gdb should behave as if we pressed Ctrl+C in a console.
or
- C) Pressing Ctrl+C in gdb should send SIGINT to the foreground process
group.
I think this SHOULD NOT be the case:
- The gdb console is not the same as a normal shell console.
- Programmers are debugging the process, not the process group.
I believe C) is wrong.
------------------------------------------------------
--
prosup at 163 dot com
2018-09-11 03:32:38 UTC

prosup <prosup at 163 dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |prosup at 163 dot com

--- Comment #9 from prosup <prosup at 163 dot com> ---
*** Bug 23621 has been marked as a duplicate of this bug. ***
--
prosup at 163 dot com
2018-09-11 06:18:49 UTC

--- Comment #10 from prosup <prosup at 163 dot com> ---
I created a bug before I found this one.

If we use gdb locally, it stops the process normally, because Ctrl+C sends
SIGINT to the program directly(?).
With gdbserver, however, it has to "translate" the message from the host gdb
into a SIGINT.

I've tried gdb with gdbserver calling kill(pid, 2) directly, and it worked.
I tried inspecting the signals sent to the attached program: calling
kill(pid, 2), or running kill -2 pid in bash, sends SIGINT (2) to the
"attached" pid and SIGSTOP (19) to the child process.

This shows that the patch provided by Josh is the right approach.
--