Building the binary executable

First, write a Go main function; here it simply prints hello world.

package main

import "fmt"

func main() {
	fmt.Println("hello world")
}

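Run go build -gcflags "-N -l" -ldflags=-compressdwarf=false -o main main.go to produce the executable. -N -l disables optimizations and inlining, and -compressdwarf=false leaves the DWARF debug information uncompressed so that gdb can read it.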

Starting a gdb session

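Run gdb main to start debugging. Use i files (short for info files) to find the program's entry point address, then set a breakpoint at that address.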

Loading Go Runtime support.
(gdb) i files
Symbols from "/Users/vector/go/src/alg/source/main/main".
Local exec file:
        `/Users/vector/go/src/alg/source/main/main', file type mach-o-x86-64.
        Entry point: 0x1063c80
        0x0000000001001000 - 0x00000000010a6b73 is .text
        0x00000000010a6b80 - 0x00000000010ee254 is __TEXT.__rodata
        0x00000000010ee260 - 0x00000000010ee386 is __TEXT.__symbol_stub1
        0x00000000010ee3a0 - 0x00000000010eeb40 is __TEXT.__typelink
        0x00000000010eeb40 - 0x00000000010eebb0 is __TEXT.__itablink
        0x00000000010eebb0 - 0x00000000010eebb0 is __TEXT.__gosymtab
        0x00000000010eebc0 - 0x0000000001155c85 is __TEXT.__gopclntab
        0x0000000001156000 - 0x0000000001156020 is __DATA.__go_buildinfo
        0x0000000001156020 - 0x00000000011561a8 is __DATA.__nl_symbol_ptr
        0x00000000011561c0 - 0x00000000011646c0 is __DATA.__noptrdata
        0x00000000011646c0 - 0x000000000116b7f0 is .data
        0x000000000116b800 - 0x000000000119b830 is .bss
        0x000000000119b840 - 0x000000000119df08 is __DATA.__noptrbss
(gdb) b *0x1063c80
Breakpoint 1 at 0x1063c80: file /usr/local/go/src/runtime/rt0_darwin_amd64.s, line 8.

Execute run and the program stops at the breakpoint, which shows that the entry point is line 8 of /usr/local/go/src/runtime/rt0_darwin_amd64.s.

(gdb) run
Starting program: /Users/vector/go/src/alg/source/main/main 
[New Thread 0xc03 of process 99850]
[New Thread 0x2903 of process 99850]
warning: unhandled dyld version (16)

Thread 2 hit Breakpoint 1, _rt0_amd64_darwin () at /usr/local/go/src/runtime/rt0_darwin_amd64.s:8
8               JMP     _rt0_amd64(SB)

Open the Go source in an editor: the entry code does nothing but jump to _rt0_amd64(SB).

TEXT _rt0_amd64_darwin(SB),NOSPLIT,$-8
	JMP	_rt0_amd64(SB)

Type s in gdb to step, which lands in _rt0_amd64.

(gdb) s
_rt0_amd64 () at /usr/local/go/src/runtime/asm_amd64.s:15
15              MOVQ    0(SP), DI       // argc
(gdb) 

Source of _rt0_amd64. It picks up the command-line arguments: argc is loaded into register DI and the address of argv into SI, then it jumps to runtime·rt0_go(SB).

TEXT _rt0_amd64(SB),NOSPLIT,$-8
	MOVQ	0(SP), DI	// argc
	LEAQ	8(SP), SI	// argv
	JMP	runtime·rt0_go(SB)

Keep stepping in gdb until runtime.rt0_go is reached.

_rt0_amd64 () at /usr/local/go/src/runtime/asm_amd64.s:15
15              MOVQ    0(SP), DI       // argc
[...]
(gdb) s
runtime.rt0_go () at /usr/local/go/src/runtime/asm_amd64.s:89
89              MOVQ    DI, AX          // argc
(gdb) 

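The code of runtime.rt0_go is fairly long, so we look at it piece by piece. The first part reserves stack space, aligns the stack pointer SP to 16 bytes, and stores the command-line arguments on the stack.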

TEXT runtime·rt0_go(SB),NOSPLIT,$0
	// copy arguments forward on an even stack
	MOVQ	DI, AX		// argc: copy argc into AX
	MOVQ	SI, BX		// argv: copy argv into BX
	SUBQ	$(4*8+7), SP // 2args 2auto
	ANDQ	$~15, SP    // align SP down to 16 bytes
	MOVQ	AX, 16(SP)  // store argc at SP + 16
	MOVQ	BX, 24(SP)  // store argv at SP + 24

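Let's watch the SP address change in gdb, first across SUBQ $(4*8+7), SP. The addresses are taken from the rip at line of the i frame output below: 0x7ffeefbff330 before the instruction and 0x7ffeefbff309 after it. The difference is 39 = 4*8+7 in decimal, so moving SP down reserves 39 bytes on the stack. As for why the space is reserved, it is presumably to save argc and argv. MOVQ BX, 24(SP) writes the 8 bytes of BX at SP+24, that is, bytes 24 through 31, so the reservation has to cover at least 32 bytes. The ANDQ that follows can only move SP further down, so the alignment never eats into that space.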

91              SUBQ    $(4*8+7), SP            // 2args 2auto
(gdb) i frame
Stack level 0, frame at 0x7ffeefbff338:
 rip = 0x10607e6 in runtime.rt0_go (/usr/local/go/src/runtime/asm_amd64.s:91); saved rip = 0x1
 called by frame at 0x7ffeefbff340
 source language asm.
 Arglist at 0x7ffeefbff328, args: 
 Locals at 0x7ffeefbff328, Previous frame's sp is 0x7ffeefbff338
 Saved registers:
// address before the instruction: 0x7ffeefbff330
  rip at 0x7ffeefbff330
(gdb) s
runtime.rt0_go () at /usr/local/go/src/runtime/asm_amd64.s:92
92              ANDQ    $~15, SP
(gdb) i frame
Stack level 0, frame at 0x7ffeefbff311:
 rip = 0x10607ea in runtime.rt0_go (/usr/local/go/src/runtime/asm_amd64.s:92); saved rip = 0x11bf0
 called by frame at 0x7ffeefbff319
 source language asm.
 Arglist at 0x7ffeefbff301, args: 
 Locals at 0x7ffeefbff301, Previous frame's sp is 0x7ffeefbff311
 Saved registers:
// address after the instruction: 0x7ffeefbff309
  rip at 0x7ffeefbff309
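
The arithmetic behind this step can be reproduced with a minimal Go sketch. The helper name reserve is made up for illustration; the two addresses are the ones from the gdb session above.

```go
package main

import "fmt"

// reserve models SUBQ $n, SP: the stack grows toward lower addresses,
// so reserving n bytes subtracts n from the stack pointer.
// (Hypothetical helper, not part of the Go runtime.)
func reserve(sp, n uint64) uint64 {
	return sp - n
}

func main() {
	before := uint64(0x7ffeefbff330) // address seen before SUBQ
	after := reserve(before, 4*8+7)  // 4 eight-byte slots plus 7
	fmt.Printf("before=%#x after=%#x diff=%d\n", before, after, before-after)
}
```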

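Next comes the 16-byte alignment. ANDQ $~15, SP performs a bitwise AND that clears the low 4 bits of 0x7ffeefbff309, giving 0x7ffeefbff300, which is a multiple of 16. The main reason for doing this is that SSE instructions generally require 16-byte-aligned memory operands.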

runtime.rt0_go () at /usr/local/go/src/runtime/asm_amd64.s:92
92              ANDQ    $~15, SP
(gdb) i frame
Stack level 0, frame at 0x7ffeefbff311:
 rip = 0x10607ea in runtime.rt0_go (/usr/local/go/src/runtime/asm_amd64.s:92); saved rip = 0x11bf0
 called by frame at 0x7ffeefbff319
 source language asm.
 Arglist at 0x7ffeefbff301, args: 
 Locals at 0x7ffeefbff301, Previous frame's sp is 0x7ffeefbff311
 Saved registers:
  rip at 0x7ffeefbff309
(gdb) s
runtime.rt0_go () at /usr/local/go/src/runtime/asm_amd64.s:93
93              MOVQ    AX, 16(SP)
(gdb) i frame
Stack level 0, frame at 0x7ffeefbff308:
 rip = 0x10607ee in runtime.rt0_go (/usr/local/go/src/runtime/asm_amd64.s:93); saved rip = 0x7ffeefbff328
 called by frame at 0x7ffeefbff310
 source language asm.
 Arglist at 0x7ffeefbff2f8, args: 
 Locals at 0x7ffeefbff2f8, Previous frame's sp is 0x7ffeefbff308
 Saved registers:
  rip at 0x7ffeefbff300
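
The masking step can be checked with a short Go sketch. alignDown is a hypothetical helper name; Go's &^ (AND NOT) operator with 15 has the same effect as ANDQ $~15.

```go
package main

import "fmt"

// alignDown clears the low bits of addr so that the result is a multiple
// of align. align must be a power of two. (Hypothetical helper.)
func alignDown(addr, align uint64) uint64 {
	return addr &^ (align - 1) // for align = 16 this is addr & ^15
}

func main() {
	sp := uint64(0x7ffeefbff309) // SP after the SUBQ
	aligned := alignDown(sp, 16)
	fmt.Printf("%#x -> %#x (moved down by %d bytes)\n", sp, aligned, sp-aligned)
}
```

Only the low 4 bits change, which is why the address moves down by at most 15 bytes.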

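Next come some operations on g0, the first goroutine, which belongs to the initial thread m0. Its stack is set up as roughly 64 KB below the current SP, and the lower bound is written to g_stackguard0, g_stackguard1 and stack.lo. When cgo is enabled, the cgo initialization code (skipped below) updates these stack guards again.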

	MOVQ	$runtime·g0(SB), DI // DI = address of g0
	LEAQ	(-64*1024+104)(SP), BX // BX = SP - 64*1024 + 104
	MOVQ	BX, g_stackguard0(DI) // g0.stackguard0 = BX
	MOVQ	BX, g_stackguard1(DI) // g0.stackguard1 = BX
	MOVQ	BX, (g_stack+stack_lo)(DI) // g0.stack.lo = BX, the low-address end of the stack
	MOVQ	SP, (g_stack+stack_hi)(DI) // g0.stack.hi = SP, the high-address end of the stack
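
To make the "roughly 64 KB" concrete, here is a small Go sketch of the same bounds calculation. The stack type and g0Stack function are simplified stand-ins invented for this example, and the SP value is the aligned address from the gdb session above.

```go
package main

import "fmt"

// stack keeps only the two bounds discussed here.
// (Simplified, hypothetical mirror of the runtime's stack struct.)
type stack struct {
	lo, hi uint64
}

// g0Stack mirrors the LEAQ/MOVQ sequence: lo = SP - 64*1024 + 104, hi = SP.
func g0Stack(sp uint64) stack {
	return stack{lo: sp - 64*1024 + 104, hi: sp}
}

func main() {
	s := g0Stack(0x7ffeefbff300)
	fmt.Printf("lo=%#x hi=%#x size=%d bytes\n", s.lo, s.hi, s.hi-s.lo)
}
```

The usable range is 64*1024 - 104 = 65432 bytes, slightly under 64 KB.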

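Further down is the handling of CPU information and the cgo initialization, which we skip. After that the code decides, based on the operating system, whether to set up TLS: on Plan 9, Solaris, illumos and Darwin it jumps straight to ok; on any other system it sets up TLS and verifies that it works.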

#ifdef GOOS_plan9
	// skip TLS setup on Plan 9
	JMP ok
#endif
#ifdef GOOS_solaris
	// skip TLS setup on Solaris
	JMP ok
#endif
#ifdef GOOS_illumos
	// skip TLS setup on illumos
	JMP ok
#endif
#ifdef GOOS_darwin
	// skip TLS setup on Darwin
	JMP ok
#endif
	// load the address of m0.m_tls into DI
	LEAQ	runtime·m0+m_tls(SB), DI
	// set up TLS for m0
	CALL	runtime·settls(SB)

	// store through it, to make sure it works
	// load the TLS address into BX, that is, the address of m0.m_tls[1]
	get_tls(BX)
	// store the constant 0x123 at the location reached through BX, which is m0.m_tls
	MOVQ	$0x123, g(BX)
	// read the value of m0.m_tls back into AX
	MOVQ	runtime·m0+m_tls(SB), AX
	// compare the two; if TLS works they are equal
	CMPQ	AX, $0x123
	JEQ 2(PC)
	// TLS does not work: abort the program
	CALL	runtime·abort(SB)
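
The idea of the check is "write through one path, read back through the other". The Go sketch below only models that idea and does not touch real thread-local storage: the m type with six slots, the settls function, and tlsWorks are all simplified stand-ins invented for illustration.

```go
package main

import "fmt"

// m keeps only the tls array. The slot count of 6 is an assumption made
// for this sketch.
type m struct {
	tls [6]uintptr
}

var m0 m

// settls is a hypothetical stand-in for runtime·settls: it returns the
// slot that a TLS access would resolve to once TLS is set up.
func settls(mp *m) *uintptr {
	return &mp.tls[0]
}

// tlsWorks stores 0x123 through the TLS slot and reads it back directly
// from m.tls, the same store-through test the assembly performs.
func tlsWorks(mp *m) bool {
	slot := settls(mp)
	*slot = 0x123
	return mp.tls[0] == 0x123
}

func main() {
	fmt.Println("store-through check passed:", tlsWorks(&m0))
}
```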

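Now the ok part. It binds g0 and m0 to each other, runs runtime·check, hands the command-line arguments to runtime·args, calls osinit and schedinit, and finally creates a new goroutine that runs runtime.main, which in turn calls the program's main function.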

ok:
	// set the per-goroutine and per-mach "registers"
	// bind g0 and m0 to each other
	get_tls(BX)
	LEAQ	runtime·g0(SB), CX
	MOVQ	CX, g(BX)
	LEAQ	runtime·m0(SB), AX
	// save m->g0 = g0
	MOVQ	CX, m_g0(AX)
	// save m0 to g0->m
	MOVQ	AX, g_m(CX)

	CLD				// convention is D is always left cleared
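	// sanity-check the sizes and behavior of the basic types
	CALL	runtime·check(SB)

	// copy argc and argv into the argument slots and let runtime·args save them
	MOVL	16(SP), AX		// copy argc
	MOVL	AX, 0(SP)
	MOVQ	24(SP), AX		// copy argv
	MOVQ	AX, 8(SP)
	CALL	runtime·args(SB)
	// query system information: number of CPU cores, memory page size
	CALL	runtime·osinit(SB)
	// run the runtime initialization: memory allocator, GC, and so on
	CALL	runtime·schedinit(SB)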

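	// runtime·mainPC points at runtime.main, which starts the system monitor (sysmon) and later calls main.main
	MOVQ	$runtime·mainPC(SB), AX		// entry
	PUSHQ	AX
	PUSHQ	$0			// arg size
	// create a new goroutine and put it on a P's run queue
	CALL	runtime·newproc(SB)
	POPQ	AX
	POPQ	AX

	// start this M: enter the scheduling loop and run goroutines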
	CALL	runtime·mstart(SB)

	CALL	runtime·abort(SB)	// mstart should never return
	RET

	// Prevent dead-code elimination of debugCallV1, which is
	// intended to be called by debuggers.
	MOVQ	$runtime·debugCallV1(SB), AX
	RET
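
One way to see the result of newproc and mstart from the Go side is to print the call chain from inside main. This is a minimal sketch using the standard runtime.Callers and runtime.CallersFrames functions; callerChain is a helper name chosen for this example. In a normal build the list should start with main.main, followed by runtime.main.

```go
package main

import (
	"fmt"
	"runtime"
)

// callerChain returns the function names on the current goroutine's stack,
// starting with the function that called callerChain.
func callerChain() []string {
	pcs := make([]uintptr, 32)
	// Skip two frames: runtime.Callers itself and callerChain.
	n := runtime.Callers(2, pcs)
	frames := runtime.CallersFrames(pcs[:n])

	var names []string
	for {
		frame, more := frames.Next()
		if frame.Function != "" {
			names = append(names, frame.Function)
		}
		if !more {
			break
		}
	}
	return names
}

func main() {
	for _, name := range callerChain() {
		fmt.Println(name)
	}
}
```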

Summary

That gives a rough picture of the Go startup flow. runtime.schedinit is the largest part of the startup process, so the next step is to look at it in detail.