Discussion:
[Bug symtab/23288] New: Symbol loading performance regression.
mikesart at fastmail dot com
2018-06-14 06:11:01 UTC
Permalink
https://sourceware.org/bugzilla/show_bug.cgi?id=23288

Bug ID: 23288
Summary: Symbol loading performance regression.
Product: gdb
Version: 8.1
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: symtab
Assignee: unassigned at sourceware dot org
Reporter: mikesart at fastmail dot com
Target Milestone: ---

I built chromium using the following instructions:


https://chromium.googlesource.com/chromium/src/+/master/docs/linux_build_instructions.md

With gdb v8.0.1, starting gdb, setting a breakpoint on main, running to that
location, printing a backtrace and exiting takes 17 seconds on my machine:

$ time gdb --batch -q --ex "b main" --ex "r" --ex "bt" --args
out/Default/chrome

Turning off auto-solib-add takes 4.7 seconds:

$ time gdb --batch -q --ex "set auto-solib-add off" --ex "b main" --ex "r"
--ex "bt" --args out/Default/chrome

With gdb v8.1, the first test case with symbol loading is now 32 seconds. I've
narrowed it down to the following set of commits:

// 32 seconds
3f563c8 Optimize .gdb_index symbol name searching

// 20 seconds
b5ec771 Introduce lookup_name_info and generalize Ada's FULL/WILD name
matching

// 17 seconds
5ffa079 Per-language symbol name hashing algorithm

Notes:

I'm investigating symbol loading performance for Valve Software, where games
are having similar issues to Chromium. This makes it quite painful for
turnaround time with debugging sessions, even before this v8.1 regression.

Chromium loads 308 DSOs, all of which have .gdb_index sections.

Running w/ Debian Test:
Linux mikesart-cos 4.16.0-2-amd64 #1 SMP Debian 4.16.12-1 (2018-05-27) x86_64
GNU/Linux
Dual Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, 128GB RAM

Thanks.
--
You are receiving this mail because:
You are on the CC list for the bug.
mikesart at fastmail dot com
2018-06-19 03:31:12 UTC
Permalink
https://sourceware.org/bugzilla/show_bug.cgi?id=23288

--- Comment #1 from Michael Sartain <mikesart at fastmail dot com> ---
I marked gdb up with ftrace and gpuvis markers, and it looks like the new sort
calls in mapped_index_base::build_name_component() are taking ~13 seconds.

std::sort (this->name_components.begin (),
this->name_components.end (),
name_comp_compare);

For this chromium test case, it's being called 214 times and sorting 25,450,266
total items.

perf tool reports that 26% of the time is spent in __strcmp_sse2_unaligned,
which I believe is all the string compares for the sorting.

I'll try to attach a gpuvis screenshot, which shows a visual call stack for
bkpt_decode_location. The largest "sort name_components: 4240734" bar in this
picture is sorting 4,240,734 and taking 3.15 seconds.
--
You are receiving this mail because:
You are on the CC list for the bug.
mikesart at fastmail dot com
2018-06-19 03:33:47 UTC
Permalink
https://sourceware.org/bugzilla/show_bug.cgi?id=23288

--- Comment #2 from Michael Sartain <mikesart at fastmail dot com> ---
Created attachment 11082
--> https://sourceware.org/bugzilla/attachment.cgi?id=11082&action=edit
ftrace gpuvis screenshot of bkpt_decode_location call
--
You are receiving this mail because:
You are on the CC list for the bug.
mikesart at fastmail dot com
2018-06-20 16:44:19 UTC
Permalink
https://sourceware.org/bugzilla/show_bug.cgi?id=23288

--- Comment #3 from Michael Sartain <mikesart at fastmail dot com> ---
Some notes for this chromium test case.

There are 11,680,768 total symbol names
4,749,945 of these are single char symbol names like '\001', '\003', '\026'.
Seem invalid? Skipping these saves ~1.5 seconds.

That leaves 6,930,823 total symbol names
There are 20,700,321 total symbol names when you include sub-components.

Removing the cp_find_first_component inner loop and only adding full symbol
names saves ~10 seconds.

The std::sort takes ~7 seconds.

These symbols are a hell case for string compares.
~12k symbols start with "base::BindOnce<void ("
~1400 symbols start with "absl::optional_internal::optional_data_base"
~160 start with "(anonymous namespace)::GenerateValue<(lambda at
gen/chrome/service/service_process_catalog_source.cc"

There are 72,028 symbols over 500 bytes.
The longest symbol is 6,181 bytes, and has 5 components.

Couple friends (hat tip Fabian/Charles/Sean) recommended looking at multi-key
sorts and adaptive radix trees. I've printed the symbols out to files and will
try implementing these in a test app. Also thinking/hoping going wild matching
for C++ will allow us to eliminate the cp_find_first_component() calls.

void mapped_index_base::build_name_components ()
{
...

auto count = this->symbol_name_count ();
for (offset_type idx = 0; idx < count; idx++)
{
if (this->symbol_name_slot_invalid (idx))
continue;

const char *name = this->symbol_name_at (idx);

//$$$ mikesart: 4,749,945 of 11,680,768 total symbols are single char
non-alpha?
// This condition saves ~1.5 seconds
if (!name[0] || (!name[1] && !isalpha(name[0])))
continue;

/* Add each name component to the name component table. */
unsigned int previous_len = 0;

//$$$ mikesart: skipping this loop saves ~10 seconds
for (unsigned int current_len = cp_find_first_component (name);
name[current_len] != '\0';
current_len += cp_find_first_component (name + current_len))
{
gdb_assert (name[current_len] == ':');

this->name_components.push_back ({previous_len, idx});

/* Skip the '::'. */
current_len += 2;
previous_len = current_len;
}

this->name_components.push_back ({previous_len, idx});
}

...

//$$$ total sort time is ~7 seconds
std::sort (this->name_components.begin (),
this->name_components.end (),
name_comp_compare);
--
You are receiving this mail because:
You are on the CC list for the bug.
simon.marchi at ericsson dot com
2018-06-20 20:01:31 UTC
Permalink
https://sourceware.org/bugzilla/show_bug.cgi?id=23288

Simon Marchi <simon.marchi at ericsson dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |simon.marchi at ericsson dot com
--
You are receiving this mail because:
You are on the CC list for the bug.
mikesart at fastmail dot com
2018-06-22 22:40:46 UTC
Permalink
https://sourceware.org/bugzilla/show_bug.cgi?id=23288

--- Comment #4 from Michael Sartain <mikesart at fastmail dot com> ---
If we split the symbols into components using ':', ',', and ' ', we can hash
those strings and create an array which lists which symbols entries contain
each component. For the chromium+dso test case app, build_name_component() time
drops to 6.9 seconds from 15.3 seconds, and memory usage drops significantly as
well.

So new method (we can toss hash table after creating array of components):

idx Symbol
------------------
0 func
1 ns1::a::b::func
2 ns1::b::func
3 ns2::func

; hash table only used in build_name_components() to create below array
hash('func') -> 0
hash('ns1') -> 1
hash('a') -> 2
hash('b') -> 3
hash('ns2') -> 4

; array of symbol components (offset + length) and symbol indices which use
them (we'll sort this)
[0] offset:0 length:4 (0, 1, 2, 3)
[1] offset:0 length:3 (1, 2)
[2] offset:5 length:1 (1)
[3] offset:8 length:1 (1, 2)
[4] offset:0 length:3 (3)

When doing the search, we can break the user string into components and get a
list of symbol indices which match each component.

This theoretically will drop our test case from 32 seconds down to 24, which is
still awful. Hopefully we'll be able to thread a lot of this as well.
--
You are receiving this mail because:
You are on the CC list for the bug.
Loading...