译者: bzhaoopenstack
作者: Amit Dattatray Khandekar
原文链接: https://amitdkhan-pg.blogspot.com/2020/07/backtraces-in-postgresql.html
PGSQL 13引入了Backtraces特性,方便客户和管理者定位疑难问题,来看社区大牛Amit带你玩转它!
PostgreSQL 13引入了一个简单但非常有用的功能,在发生错误时将堆栈跟踪记录到服务器日志中。让我们看看细节
有一个 GUC 可以启用 生成 stacktrace: backtrace_functions. 设置为逗号隔开的function名字。
SET backtrace_functions TO ‘func1,func2’;
如果从这些函数之一抛出错误,将生成回溯跟踪并记录到服务器日志中。
注意只有超级用户可以设置该GUC。可以在本地会话中设置,也可以在postgresql.conf文件中全局设置。
在客户报告错误消息的情况下,很容易看出它是如何起作用的。我们可以在源代码中通过 grep 找到它的来源。但除此之外,就只能靠猜测了。现在不是了,随着这个功能的出现,现在,您可以要求客户将 backtrace_functions设置为所有发出此错误消息的函数,并获取堆栈跟踪。在大多数情况下,错误的根本原因并不是发生错误的函数; 它位于堆栈中间的某个位置; 因此堆栈跟踪是关键的
这个功能已经在其他数据库中使用,比如 MySQL,Greenplum,Oracle。
但在 PostgreSQL 中仍然缺少的东西——这个问题也存在于大多数其他数据库中——是能够在服务器后端由于内存区段错误或其他类似意外信号崩溃时生成堆栈跟踪,或者当服务器由于某种原因而宕机时生成堆栈跟踪。这种能力会带来更大的不同。我们将摆脱生成核心文件的解释步骤。更重要的是,这在崩溃只是随机发生的情况下有所帮助。即使发生了单个意外的崩溃,客户也总是准备好了回溯。我希望这能在 PostgreSQL 的下一个主要版本中实现
让我们看看 PostgreSQL 堆栈跟踪日志是什么样的。我们将尝试使用一个不存在的类型来创建表。假设我们知道“ type does not exist”错误来自源代码中的 typenameType ()。所以我们这样做:
postgres=# set backtrace_functions TO ‘typenameType’;
postgres=# create table tab (id invalidtype);
ERROR: type “invalidtype” does not exist
LINE 1: create table tab (id invalidtype);
以下是服务器日志中的一个片段: :
2020-07-28 20:17:01.482 CST [22454] ERROR: type “invalidtype” does not exist at character 22
2020-07-28 20:17:01.482 CST [22454] BACKTRACE:
postgres: amit postgres [local] CREATE TABLE(typenameType+0xa4) [0xaaaaafcd2ac4]
postgres: amit postgres [local] CREATE TABLE(+0x20f550) [0xaaaaafcd4550]
postgres: amit postgres [local] CREATE TABLE(transformCreateStmt+0x53c) [0xaaaaafcd7a10]
postgres: amit postgres [local] CREATE TABLE(+0x44df20) [0xaaaaaff12f20]
postgres: amit postgres [local] CREATE TABLE(standard_ProcessUtility+0x16c) [0xaaaaaff1225c]
postgres: amit postgres [local] CREATE TABLE(+0x44a4e4) [0xaaaaaff0f4e4]
postgres: amit postgres [local] CREATE TABLE(+0x44af88) [0xaaaaaff0ff88]
postgres: amit postgres [local] CREATE TABLE(PortalRun+0x198) [0xaaaaaff10ed8]
postgres: amit postgres [local] CREATE TABLE(+0x44764c) [0xaaaaaff0c64c]
postgres: amit postgres [local] CREATE TABLE(PostgresMain+0x970) [0xaaaaaff0d3d4]
postgres: amit postgres [local] CREATE TABLE(+0x3b3be4) [0xaaaaafe78be4]
postgres: amit postgres [local] CREATE TABLE(PostmasterMain+0xdc0) [0xaaaaafe79b70]
postgres: amit postgres [local] CREATE TABLE(main+0x480) [0xaaaaafb82510]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffffaac956e0]
postgres: amit postgres [local] CREATE TABLE(+0xbd5d8) [0xaaaaafb825d8]
2020-07-29 18:01:02.726 CST [28776] STATEMENT: create table tab (id invalidtype);
回溯的每一行都有函数名、该函数的偏移量和该帧的返回地址
对于某些堆栈帧,函数名不存在; 相反,函数地址存在。这些都是静态函数。对于这样的函数,函数名不会公开。但是我们可以通过 addr2line 命令行工具从他们的地址中获取他们的名字:
$ addr2line 0x20f550 0x44df20 -a -f -e which postgres
0x000000000020f550
transformColumnDefinition
:?
0x000000000044df20
ProcessUtilitySlow.constprop.0
:?
如果是调试版本,甚至会打印文件名和偏移量
现在让我们看看这个简单的特性是如何实现的
在包括 PostgreSQL、 Greenplum、 MySQL 在内的大多数 RDBMS 中,这个特性都是通过一个简单的函数 backtrace ()来生成 stacktrace:
int backtrace(void **buffer, int size);
这个函数只返回帧的所有返回地址。因此,接下来应该调用 backtrace_symbols () ,该函数将 backtrace ()返回的地址转换成字符串,如果可用的话,使用函数名来描述地址:
char **backtrace_symbols(void *const *buffer, int size);
这些函数的所有细节都在其man手册页中得到了很好的描述。这些功能在大多数平台上都是可用的.
注意以下几点:
\1. 要使函数名可用于 backtrace_symbols () ,必须使用链接器选项构建可执行文件,这些链接器选项允许将所有这些符号添加到“动态符号表”中。这些选项可以通过以下方式之一给出(这些是 gcc 编译器选项) :
gcc -rdynamic
gcc -Wl,-E
\2. 当使用 gcc -O2或更高的优化级别编译时,有时可能会丢失特定的堆栈帧。例如,检查这个示例程序backtrace.c 反向追踪 从backtrace() man手册页.
不使用 -O2来进行编译 :
amit:pg:error$ gcc -rdynamic -o backtrace backtrace.c
I get the full stack :
amit:pg:error$ ./backtrace 6
backtrace() returned 11 addresses
./backtrace(myfunc3+0x2c) [0xaaaad6b2edc0]
./backtrace(+0xe84) [0xaaaad6b2ee84]
./backtrace(myfunc+0x2c) [0xaaaad6b2eebc]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(main+0x60) [0xaaaad6b2ef28]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffff8c5ba6e0]
./backtrace(+0xcc4) [0xaaaad6b2ecc4]
使用 -O2来进行编译 :
amit:pg:error$ gcc -O2 -rdynamic -o backtrace backtrace.c
amit:pg:error$ ./backtrace 6
backtrace() returned 4 addresses
./backtrace(myfunc3+0x38) [0xaaaac7183e40]
./backtrace(main+0x4c) [0xaaaac7183cfc]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffffb91286e0]
./backtrace(+0xd38) [0xaaaac7183d38]
myfunc2()和 myfunc ()没有框架。一种可能性是编译器用尾端调用 myfunc3()替换了 myfunc ()和 myfunc2()的递归调用,这被称为尾端调用优化.
重点是: 我们需要意识到在一些情况下这种缺失的框架
PostgreSQL 13 has introduced a simple but extremely useful capability to log a stack trace into the server logs when an error is reported. Let’s see the details.
There is a GUC to enable stacktrace generation : backtrace_functions. Set it to a comma-separated function names.
SET backtrace_functions TO ‘func1,func2’;
If the error is thrown from one of these functions, a backtrace will be generated and logged into the server log.
Note that only superusers can set the backtrace_functions GUC. It can be set locally in a session, or can be included in postgresql.conf file to globally set it.
It’s easy to see how it would help in a situation where a customer reports an error message. We can find from where it came from by grep’ing for it in the source code. But beyond that, it was all guess work. Not anymore. Now, you can ask the customer to set backtrace_functions to all such functions which are emitting this error message, and get the stack trace. In most cases, the root cause of the error is not in the function which emits the error; its located somewhere in the middle of the stack; hence the stack trace is critical.
This capability is already available in many other databases like MySQL, Greenplum, Oracle.
What’s still missing in PostgreSQL - and is present in most of these other databases - is being able to generate stack trace when a server backend crashes with a segmentation fault or other such unexpected signals, or when the server PANICs due to some reason. This capability would make a much bigger difference. We will get rid of having to explain steps to generate core file. More importantly, this helps in situations where the crash happens only randomly. Even with a single unexpected crash, the customer would always be ready with a backtrace. I am hopeful this would be implemented in the next major release of PostgreSQL.
Let’s see how a PostgreSQL stack trace log looks like. We will try to use a non-existent type to create a table. Supposing we know that the “type does not exist” error comes from typenameType() in the source code. So we do this :
postgres=# set backtrace_functions TO ‘typenameType’;
postgres=# create table tab (id invalidtype);
ERROR: type “invalidtype” does not exist
LINE 1: create table tab (id invalidtype);
Here’s a snippet from the server log :
2020-07-28 20:17:01.482 CST [22454] ERROR: type “invalidtype” does not exist at character 22
2020-07-28 20:17:01.482 CST [22454] BACKTRACE:
postgres: amit postgres [local] CREATE TABLE(typenameType+0xa4) [0xaaaaafcd2ac4]
postgres: amit postgres [local] CREATE TABLE(+0x20f550) [0xaaaaafcd4550]
postgres: amit postgres [local] CREATE TABLE(transformCreateStmt+0x53c) [0xaaaaafcd7a10]
postgres: amit postgres [local] CREATE TABLE(+0x44df20) [0xaaaaaff12f20]
postgres: amit postgres [local] CREATE TABLE(standard_ProcessUtility+0x16c) [0xaaaaaff1225c]
postgres: amit postgres [local] CREATE TABLE(+0x44a4e4) [0xaaaaaff0f4e4]
postgres: amit postgres [local] CREATE TABLE(+0x44af88) [0xaaaaaff0ff88]
postgres: amit postgres [local] CREATE TABLE(PortalRun+0x198) [0xaaaaaff10ed8]
postgres: amit postgres [local] CREATE TABLE(+0x44764c) [0xaaaaaff0c64c]
postgres: amit postgres [local] CREATE TABLE(PostgresMain+0x970) [0xaaaaaff0d3d4]
postgres: amit postgres [local] CREATE TABLE(+0x3b3be4) [0xaaaaafe78be4]
postgres: amit postgres [local] CREATE TABLE(PostmasterMain+0xdc0) [0xaaaaafe79b70]
postgres: amit postgres [local] CREATE TABLE(main+0x480) [0xaaaaafb82510]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffffaac956e0]
postgres: amit postgres [local] CREATE TABLE(+0xbd5d8) [0xaaaaafb825d8]
2020-07-29 18:01:02.726 CST [28776] STATEMENT: create table tab (id invalidtype);
Each line of the backtrace has the function name, an offset into that function, and the return address of that frame.
For some stack frames, the function name is not present; instead, the function address is present. These are static functions. For such functions, the function names are not exposed. But we may be able to get their names from their addresses, with the help of addr2line command-line tool :
$ addr2line 0x20f550 0x44df20 -a -f -e which postgres
0x000000000020f550
transformColumnDefinition
:?
0x000000000044df20
ProcessUtilitySlow.constprop.0
:?
If it’s a debug build, even the file name and offset is printed.
Now let’s see how this simple feature is implemented.
In most of the RDBMS’s including PostgreSQL, Greenplum, MySQL, the feature is implemented using a simple function backtrace() to generate the stacktrace:
int backtrace(void **buffer, int size);
This function only returns all the return addresses of the frames. So it should be followed by a call to backtrace_symbols() that converts the addresses returned by backtrace() into strings that describe the addresses using the function names if available :
char **backtrace_symbols(void *const *buffer, int size);
All the details of these functions are nicely described in their man pages. These functions are available in most of the platforms.
Note a couple of points :
For the function names to be available for backtrace_symbols(), the executable has to be built using linker options that allow adding all these symbols into a “dynamic symbol table”. These options can be given with one of the following ways (these are gcc compiler options) :
gcc -rdynamic
gcc -Wl,-ESometimes particular stack frames might be missing, when compiled with gcc -O2 or higher optimization level. E.g. check this sample program backtrace.c from the backtrace() man pages.
I compile it without -O2 :
amit:pg:error$ gcc -rdynamic -o backtrace backtrace.c
I get the full stack :
amit:pg:error$ ./backtrace 6
backtrace() returned 11 addresses
./backtrace(myfunc3+0x2c) [0xaaaad6b2edc0]
./backtrace(+0xe84) [0xaaaad6b2ee84]
./backtrace(myfunc+0x2c) [0xaaaad6b2eebc]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(main+0x60) [0xaaaad6b2ef28]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffff8c5ba6e0]
./backtrace(+0xcc4) [0xaaaad6b2ecc4]
Now I compile it with -O2 :
amit:pg:error$ gcc -O2 -rdynamic -o backtrace backtrace.c
amit:pg:error$ ./backtrace 6
backtrace() returned 4 addresses
./backtrace(myfunc3+0x38) [0xaaaac7183e40]
./backtrace(main+0x4c) [0xaaaac7183cfc]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffffb91286e0]
./backtrace(+0xd38) [0xaaaac7183d38]
There is no frame for myfunc2() and myfunc(). One possibility is that the compiler has replaced the recursive calls of myfunc() and also myfunc2() call with the tail end call myfunc3(), which is called tail call optimization.
The point being: we need to be aware of such missing frames in a few scenarios.