Reproducing Go reportZombies by editting ELF
0x01 Intro
I randomly see some reportZombies
‘ in Go1.17. The program does have some data race of the following kinds:
- When logging, the upper layer using the same log object in multiple Goroutines, causing a string is raced by multiple Goroutines.
- Using global struct, maybe write by one Goroutine and read by many Goroutines.
Those kind of data race really can cause a memory corruption. But in my tests, it can’t be stably reproduced. While the race detector can easily find those data race, but the GC not reportZombies as expected. In out team, what you can prove you can’t call it guilty. So I start my not very long journey to reproduce that issue.
0x02 Ask for help
As a new learner to the Go runtime, I found the Go GC is quite elegant, still not easy to get the key point of this kind of issue. So I seeded help in the community.
This is the answer I got from the go-nuts mail-list.
A data race in a string can confuse the garbage collector. A string
is a length and a pointer. If goroutine A updates the length in
parallel with goroutine B updating the pointer, then the garbage
collector can see a length and a pointer that do not correspond. If
the length is longer than the pointer expects, that can cause the GC
to mark some memory as allocated although it isn’t. If that memory is
not possible to allocate, because it is part of the GC metadata, the
collector will crash. That is just one possible way to crash the
program with a race.
As a practical matter, it’s a waste of time to try to reason about GC
behavior when you know that the program has a race. As Kurtis says,
fix the race first.
So, I’ve understood why a string data race can lead to a GC crash. Next, I need to reproduce it. To reproduce that, some powerful hacking is needed.
0x03 Try to reproduce
A string in Golang is a struct defined as below.
1 | type stringStruct struct { |
It consist of a pointer and a length. The pointer is the start address, and the length is address space size in byte. In AMD64 environment, the pointer a 
machine word 8-bytes, and the int is also a 8-bytes long.
Basically a string assign statement’s assembly code is shown bellow in Go1.17. One can get that by objdump -S elf.
4e8851: 48 8b 84 24 f8 00 00 mov 0xf8(%rsp),%rax
4e8858: 00
4e8859: 48 89 47 18 mov %rax,0x18(%rdi) // 1. assign the len
4e885d: 83 3d dc e0 12 00 00 cmpl $0x0,0x12e0dc(%rip) # 616940 <runtime.writeBarrier>
4e8864: 75 0e jne 4e8874 <github.com/tal-tech/go-zero/core/logx.(*traceLogger).write+0x134>
4e8866: 48 8b 84 24 f0 00 00 mov 0xf0(%rsp),%rax
4e886d: 00
4e886e: 48 89 47 10 mov %rax,0x10(%rdi)
4e8872: eb 1a jmp 4e888e <github.com/tal-tech/go-zero/core/logx.(*traceLogger).write+0x14e>
4e8874: 48 8d 47 10 lea 0x10(%rdi),%rax
4e8878: 48 89 f9 mov %rdi,%rcx
4e887b: 48 89 c7 mov %rax,%rdi
4e887e: 48 8b 94 24 f0 00 00 mov 0xf0(%rsp),%rdx
4e8885: 00
4e8886: e8 f5 e0 f7 ff callq 466980 <runtime.gcWriteBarrierDX>
We can see that the real instructions divides into 2 steps. 
- Assign the len member, happened at 4e8859.
- Assign the pointer member, happened after.
Why a pointer assignment is so complicated? That’s what the Go GC implement need, it need to track pointer changes during mark phase.
There are a lot of articles explaining that very well, that’s not my emphasis here. For now the only thing we need to notice is that the two steps.
As well known that the two steps can lead to a mismatch between the pointer and the len. What we need is to increase the probability of the mismatch with GC mark.
I tried the follow ways but failed.
- Create a lot of Goroutines to make the data race.
- Create a recursive call of a data race function to make the stack grow and expand.
I guass the mismatch window is too tiny to be hit with GC mark.
So the ultimate way is to modify the code to insert a time.Sleep
in the middle of the two steps. This turned out to be effective but a little tedious, and quit interesting.
First I want modify the Go compiler to insert such a call between the code. Like the one I once did to verify issue https://github.com/golang/go/issues/54247 to insert a SaveDX
function. But after some study, I found it is too difficult to understand the whole IR generation logic. I started to take a direct way to modify the ELF file to do the trick.
0x04 Edit the ELF file
Before I can edit the ELF file, I need a some basic binary machine instruction. So I added a doSleep
function before the string assignment, like this.
1 |
|
The assembly code got from objdump is this:
4e888e: e8 6d fe ff ff callq 4e8700 <github.com/tal-tech/go-zero/core/logx.doSleep>
4e8893: 48 8b 44 24 38 mov 0x38(%rsp),%rax
4e8898: 48 8b 8c 24 d8 00 00 mov 0xd8(%rsp),%rcx
4e889f: 00
4e88a0: 48 89 41 38 mov %rax,0x38(%rcx)
4e88a4: 83 3d 95 e0 12 00 00 cmpl $0x0,0x12e095(%rip) # 616940 <runtime.writeBarrier>
What I need is to move the callq right before the cmpl which is the runtime.writeBarrier comparison. This involves some instruction adjustment.
First one may learn the callq details. The first byte is ‘e8’ indicating a callq instruction, be follow 4 bytes is the address relative of target adress with callq itself, knows as offset.
If we want callq still work after the movement, we need to adjust the offset. I used a Python script to do the job, and it’s easy to verify automaticlly that way. The procedures are quite simple.
- Load the ELF, to find the pattern we want to adjust, not include the ‘4e88a4’ cmpl line. Make sure there is only exactly one. Recording the offset.
- Split the callq 5 bytes, update the callq offset, update the 5 bytes.
- Remove the old 5 bytes from head, and append the new 5 bytes callq instruction to the end.
- Do a replacement with old pattern and new pattern. Then save the new ELF.
It’s not so hard.
But after the replacement I got a panic when invoking the ELF file.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x4e8cd9]
goroutine 19 [running]:
github.com/tal-tech/go-zero/core/logx.traceIdFromContext({0x0, 0x0})
go-zero/core/logx/tracelogger.go:128 +0x39
github.com/tal-tech/go-zero/core/logx.(*traceLogger).write(0xc0001ba5b0, {0x7fd3f697d958, 0xc000098c40}, {0x512a16, 0x4}, {0xc0000a4ad0, 0x7})
Whoa, there must be something I missed. After some analysis, I known the reason. When I move the callq instruction, the register RCX might get taint by the doSleep
call. I need a way to restore the RCX register. As we can see the RCX is assigned with line 4e8898, from stack. If we can replay that assignment after doSleep, then we call solve that problem.
After some trying, I’ve found a int assignment to a global variable would do the trick. This is the original assembly code after the assignment added.
The objdump has some issue, where the 4e889a line is also a part of moveq, it split it into a new line. There is one problem left. A RCX restore instruction is 8 bytes, but the moveq is 11 bytes, what we going to do with the extra 3 bytes. Do remember that we are editing an ELF file, if we miss 3 bytes, we need to adjust all the jump/call instruction offsets.
4e888e: e8 6d fe ff ff callq 4e8700 <github.com/tal-tech/go-zero/core/logx.doSleep>
4e8893: 48 c7 05 b2 1a 0e 00 movq $0x8,0xe1ab2(%rip) # 5ca350 <github.com/tal-tech/go-zero/core/logx.num>
4e889a: 08 00 00 00
4e889e: 48 8b 44 24 38 mov 0x38(%rsp),%rax
4e88a3: 48 8b 8c 24 d8 00 00 mov 0xd8(%rsp),%rcx
4e88aa: 00
4e88ab: 48 89 41 38 mov %rax,0x38(%rcx)
4e88af: 83 3d 8a e0 12 00 00 cmpl $0x0,0x12e08a(%rip) // # 616940 <runtime.writeBarrier>
Luckily there a nop instruction in AMD64, we replaced the additional 00 with 90 to keep the
consistency. Finally the binary code we got is this:
4e888b: 48 89 cf mov %rcx,%rdi
4e888e: 48 8b 44 24 38 mov 0x38(%rsp),%rax
4e8893: 48 8b 8c 24 d8 00 00 mov 0xd8(%rsp),%rcx
4e889a: 00
4e889b: 48 89 41 38 mov %rax,0x38(%rcx)
4e889f: e8 5c fe ff ff callq 4e8700 <github.com/tal-tech/go-zero/core/logx.doSleep>
4e88a4: 48 8b 8c 24 d8 00 00 mov 0xd8(%rsp),%rcx
4e88ab: 00
4e88ac: 90 nop
4e88ad: 90 nop
4e88ae: 90 nop
4e88af: 83 3d 8a e0 12 00 00 cmpl $0x0,0x12e08a(%rip) # 616940 <runtime.writeBarrier>
We’ve successfully move the callq right before the cmpl.
The effect is very good. The reportZombies
can be triggered in a very short time with 5 Goroutines.
 0x05 Summary
In a word, playing with ELF is quite interesting, and also very powerful. We can say that the doSleep
call simply delay the pointer assignment and makes no impact on the correctness of the program.