CoreDumpDebugging
Originally published at https://rakyll.org/coredumps/.
Debugging is highly useful to examine the execution flow and to understand the current state of a program.
A core file is a file that contains the memory dump of a running process and its process status. It is primarily used for post-mortem debugging of a program, as well as to understand a program's state while it is still running. These two cases make debugging of core dumps a good diagnostics aid to postmortem and analyze production services.
I will use a simple hello world web server in this article, but in real life our programs might get very complicated easily. The availability of core dump analysis gives you an opportunity to resurrect a program from specific snapshot and look into cases that might only reproducible in certain conditions/environments.
Note: This flow only works on Linux at this point end-to-end, I am not quite sure about the other Unixes but it is not yet supported on macOS. Windows is not supported at this point.
Before we begin, you need to make sure that your ulimit for core dumps are at a reasonable level. It is by default 0 which means the max core file size can only be zero. I usually set it to unlimited on my development machine by typing:
$ ulimit -c unlimited
Then, make sure you have delve installed on your machine.
Here is a main.go
that contains a simple handler and it starts an HTTP server.
$ cat main.go
package main
import (
"fmt"
"log"
"net/http"
)
func main() {
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
fmt.Fprint(w, "hello world\n")
})
log.Fatal(http.ListenAndServe("localhost:7777", nil))
}
Let's build this and have a binary.
$ go build .
Let’s assume, in the future, there is something messy going on with this server but you are not so sure about what it might be. You might have instrumented your program in various ways but it might not be enough for getting any clue from the existing instrumentation data.
Basically, in a situation like this, it would be nice to have a snapshot of the current process, and then use that snapshot to dive into to the current state of your program with your existing debugging tools.
There are several ways to obtain a core file. You might have been already familiar with crash dumps, these are basically core dumps written to disk when a program is crashing. Go doesn't enable crash dumps by default but gives you this option on Ctrl+backslash when GOTRACEBACK
env variable is set to "crash".
$ GOTRACEBACK=crash ./hello
(Ctrl+\)
It will crash the program with stack trace printed and core dump file will be written.
Another option is to retrieve a core dump from a running process without having to kill a process. With gcore
, it is possible to get the core files without crashing. Let’s start the server again:
$ ./hello &
$ gcore 546 # 546 is the PID of hello.
$ dlv core ./hello core.546
Alright, this is it! This is no different than the typical delve interactive. You can backtrace, list, see variables, and more. Some features will be disabled given a core dump is a snapshot and not a currently running process, but the execution flow and the program state will be entirely accessible.
(dlv) bt
0 0x0000000000457774 in runtime.raise
at /usr/lib/go/src/runtime/sys_linux_amd64.s:110
1 0x000000000043f7fb in runtime.dieFromSignal
at /usr/lib/go/src/runtime/signal_unix.go:323
2 0x000000000043f9a1 in runtime.crash
at /usr/lib/go/src/runtime/signal_unix.go:409
3 0x000000000043e982 in runtime.sighandler
at /usr/lib/go/src/runtime/signal_sighandler.go:129
4 0x000000000043f2d1 in runtime.sigtrampgo
at /usr/lib/go/src/runtime/signal_unix.go:257
5 0x00000000004579d3 in runtime.sigtramp
at /usr/lib/go/src/runtime/sys_linux_amd64.s:262
6 0x00007ff68afec330 in (nil)
at :0
7 0x000000000040f2d6 in runtime.notetsleep
at /usr/lib/go/src/runtime/lock_futex.go:209
8 0x0000000000435be5 in runtime.sysmon
at /usr/lib/go/src/runtime/proc.go:3866
9 0x000000000042ee2e in runtime.mstart1
at /usr/lib/go/src/runtime/proc.go:1182
10 0x000000000042ed04 in runtime.mstart
at /usr/lib/go/src/runtime/proc.go:1152
(dlv) ls
> runtime.raise() /usr/lib/go/src/runtime/sys_linux_amd64.s:110 (PC: 0x457774)
105: SYSCALL
106: MOVL AX, DI // arg 1 tid
107: MOVL sig+0(FP), SI // arg 2
108: MOVL $200, AX // syscall - tkill
109: SYSCALL
=> 110: RET
111:
112: TEXT runtime·raiseproc(SB),NOSPLIT,$0
113: MOVL $39, AX // syscall - getpid
114: SYSCALL
115: MOVL AX, DI // arg 1 pid