[Bug c++/23639] New: QtCreator stall and timeout after 40 seconds due to GDB
mgarvin at panix dot com
2018-09-12 13:30:44 UTC
https://sourceware.org/bugzilla/show_bug.cgi?id=23639

Bug ID: 23639
Summary: QtCreator stall and timeout after 40 seconds due to GDB
Product: gdb
Version: 8.2.1
Status: UNCONFIRMED
Severity: critical
Priority: P2
Component: c++
Assignee: unassigned at sourceware dot org
Reporter: mgarvin at panix dot com
Target Milestone: ---

Environment: Tested with QtCreator (two versions), GCC and Clang++, OpenCV

Symptoms: After a breakpoint is set (anywhere within a certain range of code
statements), QtCreator breaks correctly at the breakpoint, but when the
'Continue' button is pressed, QtCreator hands control over to GDB, which
apparently stalls. QtCreator detects this and, after 40 seconds, offers the
choice to continue waiting or abort GDB. If told to continue, GDB never returns.

This was first encountered with GDB v7.11, so a newer version (v8.2) was
compiled and installed. Also, in an attempt to figure this out, QtCreator was
updated to the latest version and the compiler was switched from GCC to Clang++.

The latter change initially seemed to fix the problem, since breakpoints set in
the original locations no longer exhibited the error. But the problem had
evidently just shifted, and breakpoints set at a different spot in the code now
exhibit the same behavior.

This does seem like a bug in GDB, or at least in the way it fails to execute
certain stretches of code. And again, the problem has shifted, so it is not
that particular bits of code trigger it; it seems position-sensitive.

In all cases, executing the code without setting breakpoints shows no
functional problems. The code itself behaves as expected. Simply setting a
breakpoint and then continuing execution afterward seems to trigger this.
--
You are receiving this mail because:
You are on the CC list for the bug.
mgarvin at panix dot com
2018-09-12 13:32:28 UTC

Mark Garvin <mgarvin at panix dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |mgarvin at panix dot com
--
simon.marchi at ericsson dot com
2018-09-12 15:18:19 UTC

Simon Marchi <simon.marchi at ericsson dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |simon.marchi at ericsson dot com

--- Comment #1 from Simon Marchi <simon.marchi at ericsson dot com> ---
Hi Mark,

This would require some more concrete info. Do you have an actual example of
what GDB does that should be considered a bug? For example, a sequence of
commands applied to a particular test file that leads to unexpected behavior
or output?

Simon
--
mgarvin at panix dot com
2018-09-12 15:49:46 UTC

--- Comment #2 from Mark Garvin <mgarvin at panix dot com> ---
Hi Simon,

I'd love to have found some way to reproduce the odd behavior, but
unfortunately it remains elusive. In fact, I thought that switching from GCC to
Clang++ had fixed it, and even posted on the Qt forum to that effect. Then I
had to take that back, as it popped up elsewhere.

The only way to describe this is that setting a breakpoint within a certain
stretch of source code breaks correctly, but continuing execution afterward
stalls forever. QtCreator has a watchdog that pops up after 40 seconds of no
response. If told to continue, GDB goes away, never to return.

The exact placement of that breakpoint does not seem to matter, as long as it
is within a certain range of source statements. And the location of that range
seems to shift; at least it did after the switch to Clang++. That is why I
initially thought it was fixed.

When the same code is executed without setting breakpoints, it runs without a
glitch, exactly as expected. The code where this occurs is relatively generic
OpenCV C++ code; pretty much textbook stuff.

If there were some way to break out of GDB's infinite loop (or whatever), I
might be able to gain more perspective.

Please let me know if you can think of any way to gather additional info.
--
simon.marchi at ericsson dot com
2018-09-12 21:25:20 UTC

--- Comment #3 from Simon Marchi <simon.marchi at ericsson dot com> ---
One way to (maybe) get a lead would be to analyze the MI communication between
GDB and QtCreator. There is probably a way to have QtCreator display it, then
you can add it as an attachment here.
--
mgarvin at panix dot com
2018-09-13 01:52:50 UTC

--- Comment #4 from Mark Garvin <mgarvin at panix dot com> ---
Simon, I did locate Qt's log file that has some info about interaction with
GDB. This seems to be the relevant line at the point where it stalls:

--------------------------------------------------------------------------

d252: python
theDumper.fetchVariables({"autoderef":1,"context":"","displaystringlimit":"100","dyntype":1,"expanded":["return","inspect","local","watch"],"fancy":1,"formats":{},"nativemixed":0,"partialvar":"","passexceptions":0,"qobjectnames":1,"resultvarname":"","stringcutoff":"10000","token":252,"typeformats":{},"watchers":[]})
d253: -exec-continue
dTIMED OUT WAITING FOR GDB REPLY. COMMANDS STILL IN PROGRESS: "-exec-continue",
"python
theDumper.fetchVariables({"autoderef":1,"context":"","displaystringlimit":"100","dyntype":1,"expanded":["return","inspect","local","watch"],"fancy":1,"formats":{},"nativemixed":0,"partialvar":"","passexceptions":0,"qobjectnames":1,"resultvarname":"","stringcutoff":"10000","token":252,"typeformats":{},"watchers":[]})"

----------------------------------------------------------------------------

Not sure why I'm seeing "python theDumper" in a log for a C++ debug session.
Does that make sense? Perhaps GDB uses a Python script somewhere in the
process.

I don't see anything that I recognize in "COMMANDS STILL IN PROGRESS: " so
these are probably names for internal variables and processes.

I can send more of the log file, but this seems like a good start. Please let
me know if you can think of any tests that can be done to correlate this with
QtCreator's debugger, or with C++ source code.

I still find it odd that setting a breakpoint anywhere in a certain range of
C++ source will cause the bug to occur. So setting it on any line from 100 to
150 will cause the stall, but setting it on line 99, or on line 151 will behave
normally.

It's especially unsettling in that I need to guarantee the integrity of the
code that I'm writing, and this cannot be localized to any single C++ source
statement or function.
--
simon.marchi at ericsson dot com
2018-09-13 02:39:29 UTC

--- Comment #5 from Simon Marchi <simon.marchi at ericsson dot com> ---
Can you attach the full log file?
--
mgarvin at panix dot com
2018-09-13 06:26:05 UTC

--- Comment #6 from Mark Garvin <mgarvin at panix dot com> ---
Created attachment 11238
--> https://sourceware.org/bugzilla/attachment.cgi?id=11238&action=edit
Debugger log file for GDB stall problem
--
mgarvin at panix dot com
2018-09-13 06:28:30 UTC

--- Comment #7 from Mark Garvin <mgarvin at panix dot com> ---
Hi Simon, Debugger log file is attached to previous message. Please let me
know if you notice anything out of place.
--
simon.marchi at ericsson dot com
2018-09-13 19:25:22 UTC

--- Comment #8 from Simon Marchi <simon.marchi at ericsson dot com> ---
Hi Mark,

Not sure how familiar you are with GDB's MI protocol, but here is what I see.
The last command that GDB successfully responded to is:

25-stack-select-frame 0

to which the response is:

25^done

(The 25 here is just a command ID chosen by the front-end; GDB echoes the same
ID back in its response. Most front-ends simply use an incrementing counter.)
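
As a rough illustration of that pairing, the Python sketch below splits a command line sent to GDB, and a result record received from it, into token and payload. It follows the MI grammar for commands (`[token]command`) and result records (`[token]^result-class[,results]`), ignores stream and async records, and the helper names `split_command` and `parse_result` are hypothetical.

```python
import re
from typing import Optional, Tuple

# MI input: an optional numeric token followed by the command,
# e.g. "25-stack-select-frame 0" or "26python ...".
_COMMAND_RE = re.compile(r"^(\d+)(.*)$", re.DOTALL)
# MI result record: [token] "^" result-class ["," results], e.g. "25^done".
_RESULT_RE = re.compile(r"^(\d*)\^(done|running|connected|error|exit)(?:,(.*))?$")


def split_command(line: str) -> Tuple[Optional[int], str]:
    """Return (token, command) for a line sent to GDB; token is None if absent."""
    m = _COMMAND_RE.match(line)
    if m is None:
        return None, line
    return int(m.group(1)), m.group(2)


def parse_result(line: str) -> Optional[Tuple[Optional[int], str, str]]:
    """Return (token, result_class, results) for a result record, else None."""
    m = _RESULT_RE.match(line)
    if m is None:
        return None  # not a result record (e.g. a "*stopped" async record)
    token = int(m.group(1)) if m.group(1) else None
    return token, m.group(2), m.group(3) or ""
```

With these, `split_command("25-stack-select-frame 0")` gives `(25, "-stack-select-frame 0")` and `parse_result("25^done")` gives `(25, "done", "")`; a command whose token never appears in a result record is still in progress.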

The next command, the one that hangs, is:

26python
theDumper.fetchVariables({"autoderef":1,"context":"","displaystringlimit":"100","dyntype":1,"expanded":["return","inspect","local","watch"],"fancy":1,"formats":{},"nativemixed":0,"partialvar":"","passexceptions":0,"qobjectnames":1,"resultvarname":"","stringcutoff":"10000","token":26,"typeformats":{},"watchers":[]})


This executes Python code provided by QtCreator, most likely this:
https://github.com/qt-creator/qt-creator/blob/master/share/qtcreator/debugger/gdbbridge.py#L651


We need to determine why it hangs. The problem might be in GDB, or it might be
in QtCreator's Python code. The next step I would suggest is to see if you can
reproduce the hang outside of QtCreator, by sending the exact same commands as
QtCreator sends. According to your log file, QtCreator starts GDB like this:

/usr/bin/gdb --tty=/tmp/QtCreator-UGutGL/outputcollector.RXzCem -i mi

though you can probably leave out the --tty part when running it by hand,
leaving us with:

/usr/bin/gdb -i mi

Then run the MI commands mentioned in the log file. You can extract them from
the log file with:

grep '^<' 2018sep12b_GDB_stall_bug_single_complete_run.txt | cut -c 2-

and run them by copy-pasting them into gdb's terminal (or you can probably put
them in a file and feed them to gdb's standard input). We would expect you to
see the same hang as you see in the log file, where you never see the response
for command 26.

If you are able to reproduce the problem reliably like this, it will be much
easier to investigate than when running behind QtCreator.
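
Once the commands are extracted, spotting the one that never got an answer can also be automated. The Python sketch below assumes, as the grep above does, that lines sent to GDB start with `<`, and additionally assumes (unverified) that lines received from GDB start with `>`; the function name `unanswered_tokens` is hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: sent lines look like "<25-stack-select-frame 0" (as the grep above
# relies on); received lines are ASSUMED to look like ">25^done".
_SENT_TOKEN = re.compile(r"^<(\d+)")
_REPLY_TOKEN = re.compile(r"^>(\d+)\^")


def unanswered_tokens(log_lines: Iterable[str]) -> List[int]:
    """Return tokens of commands sent to GDB that never got a result record."""
    sent: List[int] = []
    answered = set()
    for line in log_lines:
        m = _SENT_TOKEN.match(line)
        if m:
            sent.append(int(m.group(1)))
            continue
        m = _REPLY_TOKEN.match(line)
        if m:
            answered.add(int(m.group(1)))
    # Keep the order in which the commands were sent.
    return [t for t in sent if t not in answered]
```

For example, `unanswered_tokens(open(log_path).read().splitlines())`, where `log_path` names the attached log, should include 26 if both prefix assumptions hold.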
--
mgarvin at panix dot com
2018-09-15 04:21:25 UTC

--- Comment #9 from Mark Garvin <mgarvin at panix dot com> ---
Thanks for the extensive followup, Simon. I have not dealt with GDB directly,
so I'm not familiar with MI protocol. I have to interact with the source
program in order to load a target file, so I'll have to explore ways of issuing
those commands afterward.

From what you see in the log file, do you think there is any chance that this
could be caused indirectly by the C++ code itself? Granted that it should not
be able to crash GDB or QtCreator, but my main concern was that there could be
some reason for that crash beyond just some glitch within the debuggers.
--
simon.marchi at ericsson dot com
2018-09-15 10:05:54 UTC

--- Comment #10 from Simon Marchi <simon.marchi at ericsson dot com> ---
(In reply to Mark Garvin from comment #9)
> Thanks for the extensive followup, Simon. I have not dealt with GDB
> directly, so I'm not familiar with MI protocol. I have to interact with the
> source program in order to load a target file, so I'll have to explore ways
> of issuing those commands afterward.

Ok, so you need to interact with the debugged program on the terminal at some
point? In this case, the --tty option would be useful. What you can do is
have GDB in one terminal, and the debugged program's input/output on another
terminal.

In the terminal in which you want to interact with your program, use the "tty"
command to get the name of the terminal device, then do a command like "tail -F
/dev/null" to avoid having the shell on that terminal read what you input (you
want it to go to the program you debug). For example:

$ tty
/dev/pts/5
$ tail -F /dev/null

Then start gdb with the --tty switch:

$ gdb -i mi --tty /dev/pts/5

> From what you see in the log file, do you think there is any chance that
> this could be caused indirectly by the C++ code itself? Granted that it
> should not be able to crash GDB or QtCreator, but my main concern was that
> there could be some reason for that crash beyond just some glitch within the
> debuggers.

The command that hangs is a custom Python script that runs inside GDB, but is
supplied by QtCreator. Its purpose seems to be to dump the local variables in
a format that QtCreator expects. A common problem is that when dealing with
uninitialized variables, you can end up with infinite loops (or just very long
ones). For example, an uninitialized std::vector can happen to report a
ridiculously long size, making the debugger appear stuck while it tries to read
billions of elements. That's only a guess, the only way to find out is to
close in step by step towards the root of the problem.
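
The following Python sketch models that failure mode without using GDB itself. A printer that derives a vector's length from its begin/end pointers (roughly how the libstdc++ printer uses `_M_start` and `_M_finish`) gets a nonsensical count when those pointers hold stack garbage, and a cap on the number of children keeps it from grinding through billions of elements. The pointer values, `ELEM_SIZE`, and the `max_children` limit are hypothetical; a real printer reads them from the debugged program's memory.

```python
from typing import Iterator, Tuple

ELEM_SIZE = 8  # hypothetical element size in bytes (e.g. a std::vector<double>)


def vector_length(start: int, finish: int, elem_size: int = ELEM_SIZE) -> int:
    """Element count derived from raw pointers, as a pretty printer would."""
    return (finish - start) // elem_size


def children(start: int, finish: int, max_children: int = 200) -> Iterator[Tuple[str, int]]:
    """Yield (label, address) pairs, stopping after max_children entries."""
    length = vector_length(start, finish)
    # A negative length (finish before start) yields nothing at all.
    for i in range(min(max(length, 0), max_children)):
        yield f"[{i}]", start + i * ELEM_SIZE


# A properly initialized vector of 3 elements.
good = list(children(0x1000, 0x1000 + 3 * ELEM_SIZE))
# Garbage pointers: the computed "length" is about 1.8e13 elements...
garbage_len = vector_length(0x10, 0x7FFF_FFFF_0000)
# ...but the capped iteration still produces only 200 children.
capped = list(children(0x10, 0x7FFF_FFFF_0000))
```

Capping is a trade-off: the debugger stays responsive, at the cost of silently truncating containers that really are that large.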
--
mgarvin at panix dot com
2018-09-19 07:41:57 UTC

--- Comment #11 from Mark Garvin <mgarvin at panix dot com> ---
Hi again, Simon.

It looks like you are very familiar with QtCreator's interface to GDB! That's
a pleasant surprise. I had expected this to fall through the cracks after
seeing very little feedback on Qt forums.

It does appear that I'll need to get familiar with GDB and its commands, so
I've been looking into it. Given the touchy nature of this bug, I'm not sure
if it will be easily reproduced when running directly in GDB. Tiny, seemingly
insignificant, changes to C++ source can make the difference between whether
GDB stalls or not.

The latest oddity: I went back to a previous version that did not exhibit the
'stall after breakpoint' behavior. I incrementally updated it to get closer to
my newest code, testing for the stall bug at every step. I arrived at a
version of the source that stalls if a single -declaration- for an OpenCV Mat
object is inserted. In other words:

Mat nullMat;

will stall GDB, even if nullMat is never used or referenced. That is about as
generic an OpenCV statement as can be written. No way should that be able to
trigger anything. Commenting that line suppresses the GDB stall. In the years
that I've been coding, don't recall ever seeing anything like this.

Moving the "Mat nullMat;" declaration to other locations nearby will still
stall GDB.

Again, the breakpoint does not need to be set on any particular statement, as
long as it is within a range of code in the function. Execution stops
correctly on the breakpoint, but continuing execution afterward stalls GDB
indefinitely.

I confirmed that GDB/QtCreator can be left running for 8 hours or longer, and
it never returns.

If a breakpoint is not set, the code does not crash or behave strangely in any
way. It appears to run perfectly. It is the temporary stop at the breakpoint
that triggers this.

Also odd: This is a program for processing graphic images (preprocessing for
neural nets). Loading certain images will suppress the stall bug. No idea
why, as they are often very similar to other images that, when loaded, result
in the stall. Again, -none- of the images causes any odd behavior if the
simple OpenCV Mat declaration is commented out.

I thought it was worth reiterating some of this, as I've had a chance to repeat
the tests quite a few times. Perhaps the additional clues above will help to
clarify the problem.

I realize it is a subjective question, but does any of this sound like it could
be induced by the code itself? That is my main concern. To me, it looks more
and more like some corner case within the QtCreator/GDB debugger itself, but
you'd be able to make a more informed guess. Have you seen anything like this
before?

MG
NYC

PS: Interaction with the program is via Qt GUI, so it's not as simple as
enabling console commands. I'll have to hard-code commands to load files and
such, and hope that the bug is still triggered.
--
simon.marchi at ericsson dot com
2018-09-19 14:01:17 UTC

--- Comment #12 from Simon Marchi <simon.marchi at ericsson dot com> ---
(In reply to Mark Garvin from comment #11)
> Hi again, Simon.
>
> It looks like you are very familiar with QtCreator's interface to GDB!
> That's a pleasant surprise. I had expected this to fall through the cracks
> after seeing very little feedback on Qt forums.
>
> It does appear that I'll need to get familiar with GDB and its commands, so
> I've been looking into it. Given the touchy nature of this bug, I'm not
> sure if it will be easily reproduced when running directly in GDB. Tiny,
> seemingly insignificant, changes to C++ source can make the difference
> between whether GDB stalls or not.

Well, if you can manage to reproduce the bug reliably in QtCreator, you should
be able to reproduce it reliably out of QtCreator. As long as it's not a
time-sensitive issue (race condition), giving the exact same inputs (same
binary, same commands) to GDB should yield the same results.

If you need help to try to reproduce the bug outside of QtCreator, feel free to
hop on the #gdb IRC channel to get some more interactive help.

> The latest oddity: I went back to a previous version that did not exhibit
> the 'stall after breakpoint' behavior. I incrementally updated it to get
> closer to my newest code, testing for the stall bug at every step. I
> arrived at a version of the source that stalls if a single -declaration- for
> an OpenCV Mat object is inserted. In other words:
>
> Mat nullMat;
>
> will stall GDB, even if nullMat is never used or referenced. That is about
> as generic an OpenCV statement as can be written. No way should that be able
> to trigger anything. Commenting that line suppresses the GDB stall. In the
> years that I've been coding, don't recall ever seeing anything like this.
>
> Moving the "Mat nullMat;" declaration to other locations nearby will still
> stall GDB.

In all these cases, it's a local variable in the function where you are
stopped?

> Again, the breakpoint does not need to be set on any particular statement,
> as long as it is within a range of code in the function. Execution stops
> correctly on the breakpoint, but continuing execution afterward stalls GDB
> indefinitely.

That seems to confirm my theory that QtCreator's script, which reads and
prints the local variables, is stuck in an endless loop.

> I confirmed that GDB/QtCreator can be left running for 8 hours or longer,
> and it never returns.
>
> If a breakpoint is not set, the code does not crash or behave strangely in
> any way. It appears to run perfectly. It is the temporary stop at the
> breakpoint that triggers this.

Yes, because it's not your program that is stuck, it's the debugger.

> Also odd: This is a program for processing graphic images (preprocessing for
> neural nets). Loading certain images will suppress the stall bug. No idea
> why, as they are often very similar to other images that, when loaded,
> result in the stall. Again, -none- of the images causes any odd behavior if
> the simple OpenCV Mat declaration is commented out.
>
> I thought it was worth reiterating some of this, as I've had a chance to
> repeat the tests quite a few times. Perhaps the additional clues above will
> help to clarify the problem.
>
> I realize it is a subjective question, but does any of this sound like it
> could be induced by the code itself? That is my main concern. To me, it
> looks more and more like some corner case within the QtCreator/GDB debugger
> itself, but you'd be able to make a more informed guess. Have you seen
> anything like this before?

It's most certainly not a problem with your code.

I have seen similar problems with the libstdc++ pretty printers (the things
that print std::vector nicely, for example). When stopping before the
std::vector is initialized, the pretty printer can behave very badly, including
entering a seemingly endless loop.
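
One defensive approach a printer could take (a sketch, not necessarily what libstdc++ or QtCreator does) is to check a vector's invariants before iterating: for a constructed vector, `start <= finish <= end_of_storage`, and both distances are whole multiples of the element size. The function name `looks_initialized` and the `max_plausible` threshold are hypothetical.

```python
def looks_initialized(start: int, finish: int, end_of_storage: int,
                      elem_size: int, max_plausible: int = 1 << 28) -> bool:
    """Heuristic check that three vector pointers describe a sane buffer.

    Returns False for layouts a constructed vector cannot have, so a
    printer could show a placeholder instead of iterating.
    """
    if elem_size <= 0:
        raise ValueError("elem_size must be positive")
    if start == finish == end_of_storage:
        return True  # empty vector (all three pointers are often null)
    if not (start <= finish <= end_of_storage):
        return False  # pointers out of order: almost certainly garbage
    if (finish - start) % elem_size or (end_of_storage - start) % elem_size:
        return False  # not a whole number of elements
    # Reject capacities too large to be believable for this program.
    return (end_of_storage - start) // elem_size <= max_plausible
```

Such a check is only a heuristic: stack garbage can happen to satisfy it, so placing the breakpoint after the variables are constructed remains the more direct test.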

So in your case, do you ever encounter the bug if the breakpoint is placed
after the declaration of all Mat objects?

> MG
> NYC
>
> PS: Interaction with the program is via Qt GUI, so it's not as simple as
> enabling console commands. I'll have to hard-code commands to load files
> and such, and hope that the bug is still triggered.
--