dushenda


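Packet formats

Ethernet physical frame

packet-beta
title Ethernet physical-layer header
0-11: "Inter Frame Gap (min. 12 bytes)"
12-18: "Preamble"
19: "SFD"

| Field | Length | Meaning |
| --- | --- | --- |
| Inter-frame gap (IFG) | At least 12 bytes | Consecutive Ethernet frames must be separated by an inter-frame gap: after sending one frame, the sender waits before sending the next, so the receiver can do the necessary processing (adjusting buffer pointers, updating counters, notifying upper layers to handle the packet, and so on). The Ethernet standard sets the minimum IFG at 12 bytes, filled with all 1s. Some interfaces may reduce it to 64 bits (GE) or 40 bits (10GE); no other interface should go below 12 bytes. |
| Preamble | 7 bytes | The Ethernet standard defines the preamble as 10101010 repeated seven times (binary), 7 bytes in total. |
| Start frame delimiter (SFD) | 1 byte | The Ethernet standard defines the start frame delimiter as 10101011 (binary), 1 byte. |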

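Ethernet frame header

packet-beta
title Ethernet II
0-5: "DMAC"
6-11: "SMAC"
12-13: "Type"
14-22: "Data(46~1500)"
23-26: "FCS"

| Field | Length | Meaning |
| --- | --- | --- |
| DMAC | 6 bytes | Destination MAC address (48 bits); identifies the receiver of the frame. |
| SMAC | 6 bytes | Source MAC address (48 bits); identifies the sender of the frame. |
| Type | 2 bytes | Protocol type of the payload, e.g. 0x0800 for IPv4, 0x0806 for ARP, 0x86DD for IPv6. Table 1-3 lists the protocol types encapsulated directly over the link. |
| Data | 46~1500 bytes | The frame's payload (may include padding). The data field must be at least 46 bytes so that the whole frame is at least 64 bytes (6 + 6 + 2 + 46 + 4); even 1 byte of information needs a 46-byte data field, and if the information is shorter than 46 bytes the rest of the field must be padded. The maximum data field length is 1500 bytes. An Ethernet frame must also be a whole number of bytes, so a payload that is not a whole number of bytes is padded up to one. |
| FCS | 4 bytes | The Frame Check Sequence lets the receiver detect transmission errors; a frame that fails the check is discarded. "FCS" is a generic name that covers several checking methods; Ethernet uses a 32-bit cyclic redundancy check (CRC, Cyclic Redundancy Check). |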
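To make the padding rule concrete, here is a small bash sketch. The function name eth_frame_len is hypothetical, and it assumes an untagged Ethernet II frame (no 802.1Q tag). It reports the padding, the frame length from DMAC through FCS, and the on-wire footprint once the preamble, SFD and minimum inter-frame gap from the previous table are added.

```shell
#!/usr/bin/env bash
# Sketch: padding and frame size for an untagged Ethernet II frame.
# frame = DMAC(6) + SMAC(6) + Type(2) + data + FCS(4)
# wire  = frame + preamble/SFD(8) + minimum inter-frame gap(12)
eth_frame_len() {
  local payload=$1
  local min_data=46 max_data=1500 pad=0
  if (( payload > max_data )); then
    echo "error: payload $payload exceeds $max_data bytes" >&2
    return 1
  fi
  if (( payload < min_data )); then
    pad=$(( min_data - payload ))   # zero padding up to 46 bytes
  fi
  local frame=$(( 6 + 6 + 2 + payload + pad + 4 ))
  echo "pad=$pad frame=$frame wire=$(( frame + 8 + 12 ))"
}

eth_frame_len 1      # pad=45 frame=64 wire=84
eth_frame_len 1500   # pad=0 frame=1518 wire=1538
```

A 1-byte payload therefore still occupies 84 byte-times on the wire, which is why tiny packets are so expensive on Ethernet.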

TCP header

packet-beta
title "TCP Packet"
0-15: "Source Port"
16-31: "Destination Port"
32-63: "Sequence Number"
64-95: "Acknowledgment Number"
96-99: "Data Offset"
100-103: "Reserved"
104: "CWR"
105: "ECE"
106: "URG"
107: "ACK"
108: "PSH"
109: "RST"
110: "SYN"
111: "FIN"
112-127: "Window"
128-143: "Checksum"
144-159: "Urgent Pointer"
160-191: "(Options and Padding)"
192-255: "Data (<=MSS)"
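| Field | Length | Meaning |
| --- | --- | --- |
| Source Port | 16 bits | Source port; identifies the sending application. |
| Destination Port | 16 bits | Destination port; identifies the receiving application. |
| Sequence Number | 32 bits | Every byte of the data stream in a TCP connection is numbered; this field holds the sequence number of the first data byte carried in this segment. |
| Acknowledgment Number | 32 bits | The sequence number of the first data byte expected in the peer's next segment, i.e. the sequence number of the last byte received successfully plus 1. Valid only when the ACK flag is 1. |
| Data Offset | 4 bits | Header length: the distance from the start of the segment to the start of its data, in 32-bit (4-byte) words. The header is at most 60 bytes (15 × 4); without options it is normally 20 bytes. |
| Reserved | 4 bits | Reserved; must be 0. |
| CWR | 1 bit | Congestion Window Reduced flag. |
| ECE | 1 bit | ECN-Echo flag. |
| URG | 1 bit | Urgent pointer valid. Tells the system that this segment carries urgent data that should be sent as soon as possible (effectively high-priority data). |
| ACK | 1 bit | Acknowledgment number valid. The acknowledgment number field is valid only when ACK=1; when ACK=0 it is ignored. |
| PSH | 1 bit | Asks the receiver to pass this segment to the application layer promptly: a segment with PSH=1 should be delivered to the receiving process right away instead of waiting for the buffer to fill up. |
| RST | 1 bit | Reset. RST=1 means a serious error occurred in the connection (e.g. a host crash), so the connection must be released and then re-established. |
| SYN | 1 bit | Synchronize sequence numbers; used to open a connection. SYN=1 marks a connection request or a connection acceptance. |
| FIN | 1 bit | The sender has finished sending; used to release a connection. FIN=1 means the sender of this segment has no more data and asks to close the connection. |
| Window | 16 bits | Used for TCP flow control: starting from the value in the acknowledgment number field, the number of bytes the receiver is willing to accept. Without window scaling the maximum is 65535 bytes. |
| Checksum | 16 bits | Covers the TCP header and data. It is mandatory: computed and stored by the sender, verified by the receiver. For IPv4, a 12-byte pseudo-header is prepended to the segment when computing the checksum. |
| Urgent Pointer | 16 bits | Valid only when URG is set. TCP's urgent mode lets the sender send urgent data to the other end; the urgent pointer indicates how many bytes of urgent data the segment carries (urgent data is placed at the start of the segment's data). |
| Options | Variable | Originally TCP defined only one option, the maximum segment size (MSS, counting only the data field, not the TCP header). The MSS tells the peer: "the largest data field my buffer can accept in one segment is MSS bytes". Newer RFCs define the option kinds listed below the table. |
| Data | Variable | TCP payload, at most MSS bytes. |

Option kinds defined by newer RFCs:

- End of Option List.
- No-Operation (NOP): no special meaning; usually used to pad the total option length to a multiple of 4 bytes.
- Maximum Segment Size (MSS): counts only the data field, not the TCP header.
- Window Scale: 3 bytes, one of which holds the shift count S. The window field is effectively widened from 16 to (16 + S) bits, i.e. the actual window size is the window field value shifted left by S bits.
- Timestamps: 10 bytes; the main fields are the timestamp value (4 bytes) and the timestamp echo reply (4 bytes).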
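The Data Offset and the flag bits sit in bytes 12 and 13 of the TCP header (counting from 0). As a sketch, a hypothetical bash helper tcp_decode, assuming it is handed those two bytes as numbers (for example copied from a hex dump), could decode them like this:

```shell
#!/usr/bin/env bash
# Sketch: decode header length and flags from TCP header bytes 12 and 13.
# Byte 12: high nibble = Data Offset (in 4-byte words), low nibble = Reserved.
# Byte 13: CWR ECE URG ACK PSH RST SYN FIN, most significant bit first.
tcp_decode() {
  local b12=$(( $1 )) b13=$(( $2 ))   # accepts decimal or 0x.. hex
  local hdr_len=$(( (b12 >> 4) * 4 ))
  local names=(FIN SYN RST PSH ACK URG ECE CWR)   # index = bit position
  local flags="" i
  for (( i = 7; i >= 0; i-- )); do
    if (( (b13 >> i) & 1 )); then
      flags+="${names[i]} "
    fi
  done
  echo "hdr_len=$hdr_len flags=${flags% }"
}

tcp_decode 0x50 0x12   # hdr_len=20 flags=ACK SYN  (a SYN-ACK without options)
tcp_decode 0xa0 0x02   # hdr_len=40 flags=SYN      (a SYN carrying 20 bytes of options)
```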

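MTU, MSS, PMTU and fragmentation

The MTU is 1500 B for Ethernet, 1492 B for PPPoE and 9180 B for ATM. The pieces TCP cuts its byte stream into are called TCP segments, and the largest payload a segment may carry is the MSS (Maximum Segment Size).

(P)MTU = IP_head + TCP_head + MSS

TCP needs to know the PMTU (Path MTU) to choose its MSS; the PMTU is discovered at the IP layer through ICMP messages. Fragmentation is done by the IP layer: if TCP chooses its MSS badly, oversized packets still reach the IP layer and end up fragmented.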
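Rearranging the formula gives MSS = MTU - IP header - TCP header. A minimal sketch, assuming option-free headers (20-byte IPv4 header, 20-byte TCP header) unless other sizes are passed in; mss_for_mtu is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: MSS = MTU - IP header - TCP header (defaults: IPv4 and TCP without options)
mss_for_mtu() {
  local mtu=$1 ip_hdr=${2:-20} tcp_hdr=${3:-20}
  echo $(( mtu - ip_hdr - tcp_hdr ))
}

mss_for_mtu 1500      # 1460 (Ethernet, IPv4)
mss_for_mtu 1492      # 1452 (PPPoE)
mss_for_mtu 1500 40   # 1440 (IPv6 fixed header is 40 bytes)
```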

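TCP window concepts

The TCP window size is measured in bytes: it counts how many bytes may be sent, not how many segments or frames.

The table below summarizes the key window types in TCP:

| Window type | Full name | Controlled by | Purpose |
| --- | --- | --- | --- |
| Receive window (RWND) | Receiver Window | Receiver | Flow control; reflects the free buffer space of the receiving application |
| Congestion window (CWND) | Congestion Window | Sender | Congestion control; adjusted dynamically to network conditions to avoid overloading the network |
| Send window (SWND) | Sender Window | Sender | Upper limit on the data actually sent; determined jointly by RWND and CWND |

  • Why bytes: using the byte as the unit gives TCP great flexibility to serve very different applications (large file transfers or short interactive commands) without being tied to how the underlying network splits data into packets. The core purpose is flow control: making sure the sender never sends faster than the receiver can process.
  • Flow control in action: through the 16-bit Window field in the TCP header, the receiver continuously advertises how much space is left in its receive window. If its buffer is full it advertises a zero window; the sender then pauses and uses the persist timer to send periodic window probes, resuming once the receiver frees up buffer space.
  • Window limits and scaling: since the Window field is 16 bits, the theoretical maximum window is 65535 bytes, which can become a bottleneck on high-bandwidth, high-latency networks (such as satellite links). TCP therefore has a Window Scale option whose scale factor can extend the effective window to about 1 GB, significantly improving throughput.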
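The arithmetic behind the last bullet, as a small sketch: the effective window is the 16-bit field shifted left by the scale factor (RFC 7323 caps the shift count at 14), and since at most one window can be sent per RTT, window / RTT bounds the throughput. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: effective receive window with the Window Scale option.
effective_window() {
  local field=$1 scale=$2
  if (( scale > 14 )); then
    scale=14            # RFC 7323: shift counts above 14 are treated as 14
  fi
  echo $(( field << scale ))
}

# At most one window of data per RTT, so throughput <= window / RTT (bits per second).
max_throughput_bps() {
  local window_bytes=$1 rtt_ms=$2
  echo $(( window_bytes * 8 * 1000 / rtt_ms ))
}

effective_window 65535 0      # 65535
effective_window 65535 14     # 1073725440 (about 1 GiB)
max_throughput_bps 65535 100  # 5242800 (~5.2 Mbit/s on a 100 ms path without scaling)
```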

Relationship between the windows

SWND = min(CWND, RWND)
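A sketch of how a sender could apply this rule, assuming all values are in bytes and that in_flight (data sent but not yet acknowledged) is known; usable_window is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: send window, and how much more data may be sent right now.
usable_window() {
  local cwnd=$1 rwnd=$2 in_flight=$3
  local swnd=$(( cwnd < rwnd ? cwnd : rwnd ))   # SWND = min(CWND, RWND)
  local usable=$(( swnd - in_flight ))
  if (( usable < 0 )); then
    usable=0        # already at (or past) the limit: wait for ACKs
  fi
  echo "swnd=$swnd usable=$usable"
}

usable_window 14600 65535 5840   # swnd=14600 usable=8760 (CWND is the limit)
usable_window 64000 0 0          # swnd=0 usable=0 (zero window from the receiver)
```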

CWND

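CWND exists to avoid congestion. It starts from the premise that the state of the network is unknown and uses several algorithms to probe for and avoid congestion:

  • Slow start
  • Congestion avoidance
  • Fast retransmit
  • Fast recovery

Slow start

Every ACK received increases CWND by one MSS. Since each RTT brings back one ACK per segment in flight, CWND doubles every RTT, i.e. it grows exponentially.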

CWND = CWND + MSS

xychart-beta
        title "Slow start"
        x-axis "RTT" [0, 1, 2, 3, 4, 5, 6]
        y-axis "CWND (MSS)"
        line [1, 2, 4, 8, 16, 32, 64]
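The doubling in the chart follows directly from the per-ACK rule. This bash sketch assumes no loss, no delayed ACKs (one ACK per segment) and cwnd counted in MSS; it simply stops growing at ssthresh instead of modelling congestion avoidance. slow_start is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: slow start per ACK, cwnd counted in MSS, no loss, one ACK per segment.
slow_start() {
  local rtts=$1 ssthresh=$2
  local cwnd=1 r a acks
  local out="$cwnd"
  for (( r = 1; r <= rtts; r++ )); do
    acks=$cwnd                    # every segment sent last RTT comes back as one ACK
    for (( a = 0; a < acks; a++ )); do
      if (( cwnd < ssthresh )); then
        cwnd=$(( cwnd + 1 ))      # CWND = CWND + MSS for each ACK
      fi
    done
    out+=" $cwnd"
  done
  echo "$out"
}

slow_start 6 1000   # 1 2 4 8 16 32 64  (same values as the chart)
slow_start 6 8      # 1 2 4 8 8 8 8     (growth stops at ssthresh)
```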

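Congestion avoidance

Congestion avoidance is one of the core phases of TCP congestion control. Its main purpose is to use the available bandwidth smoothly as the network nears saturation, instead of injecting data so fast that it causes congestion.

CWND = CWND + MSS * MSS / CWND

(with CWND in bytes; counted in segments this is CWND = CWND + 1/CWND per ACK, i.e. about one MSS per RTT)

  1. t1~t4: ssthresh = 8; slow start grows cwnd 1 → 2 → 4 → 8.
  2. t4~t6: cwnd has reached ssthresh, so congestion avoidance takes over (8 → 9 → 10).
  3. t6~t7: a segment times out and is retransmitted; ssthresh becomes half of cwnd (10 / 2 = 5) and cwnd drops to 1.
  4. t7~t10: slow start again, 1 → 2 → 4, then capped at ssthresh = 5.
  5. t10~t14: congestion avoidance resumes (5 → 6 → 7 → 8 → 9).
    xychart-beta
            title "Congestion avoidance"
            x-axis "RTT" [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
            y-axis "CWND (MSS)"
            line [0, 1, 2, 4, 8, 9, 10, 1, 2, 4, 5, 6, 7, 8, 9]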
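To see why the per-ACK increment adds up to roughly one MSS per RTT, here is a sketch of one RTT of congestion avoidance with cwnd in bytes. It assumes one ACK per full-sized segment, no delayed ACKs and no loss, and uses the RFC 5681 update cwnd += SMSS*SMSS/cwnd with integer division; ca_one_rtt is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: one RTT of congestion avoidance, cwnd in bytes, one ACK per full segment.
ca_one_rtt() {
  local cwnd=$1 mss=$2
  local acks=$(( cwnd / mss )) i
  for (( i = 0; i < acks; i++ )); do
    cwnd=$(( cwnd + mss * mss / cwnd ))   # integer division, as a TCP stack would do
  done
  echo "$cwnd"
}

ca_one_rtt 8000 1000   # prints a value just under 9000: roughly +1 MSS per RTT
```

Because cwnd grows during the RTT, each later ACK adds a little less, which is why the result lands slightly below a full extra MSS.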

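Fast retransmit

In short, fast retransmit infers that a segment has probably been lost by watching the duplicate ACKs (Duplicate ACK) sent back by the receiver. Its standard flow, from detecting the loss to resuming normal transmission, is shown in the figure below.

The underlying mechanism and steps are:

  1. Trigger, three duplicate ACKs: TCP uses cumulative acknowledgment. If the receiver gets segments out of order (for example 1, 3, 4 and 5 arrive but 2 does not), every out-of-order arrival makes it immediately re-send an ACK for the sequence number it still expects (the ACK for segment 2). Once the sender receives 3 or more duplicate ACKs for the same data, that strongly suggests the segment was lost rather than merely delayed or reordered, so the sender acts at once instead of waiting for the retransmission timer to expire.
  2. Core action, immediate retransmission: once loss is inferred, the sender immediately retransmits the segment indicated by the duplicate ACKs, i.e. the one presumed lost.

For example:

  1. M1~M2 are sent and acknowledged normally.
  2. M3 is lost.
  3. M4~M6 arrive normally, but the receiver keeps acknowledging M3 (it is still waiting for M3).
  4. After 3 duplicate ACKs the sender retransmits M3.
  5. When the receiver gets M3 it acknowledges everything through M6, completing the transfer.

Fast retransmit only helps if the three duplicate ACKs arrive before the retransmission timeout (RTO) expires; otherwise the timeout fires first.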
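The M1~M6 example can be replayed with a tiny duplicate-ACK counter. In this sketch each argument is the "next segment expected" number carried by an ACK, in arrival order; fast_retransmit_at is a hypothetical name, and timers are not modelled.

```shell
#!/usr/bin/env bash
# Sketch: find where fast retransmit fires, given the ACK numbers the sender sees.
# Each argument is the "next segment expected" that the receiver reported.
fast_retransmit_at() {
  local last="" dups=0 ack
  for ack in "$@"; do
    if [[ $ack == "$last" ]]; then
      dups=$(( dups + 1 ))
      if (( dups == 3 )); then
        echo "retransmit $ack"   # third duplicate ACK: resend without waiting for the RTO
        return 0
      fi
    else
      last=$ack
      dups=0
    fi
  done
  echo "none"
}

# M1 and M2 produce ACKs 2 and 3; M4, M5 and M6 each produce a duplicate ACK 3.
fast_retransmit_at 2 3 3 3 3   # retransmit 3
fast_retransmit_at 2 3 3 3     # none (only two duplicates so far)
```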

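Fast recovery

TCP fast recovery is a key congestion-control mechanism. Its core goal is, once packet loss is detected (usually by three duplicate ACKs), to keep the connection from falling back to slow start and so maintain high throughput. The table below captures the essence of how it evolved.

| Feature | TCP Tahoe | TCP Reno | TCP NewReno | TCP with SACK |
| --- | --- | --- | --- | --- |
| Core mechanism | Fast retransmit only | Fast retransmit + basic fast recovery | Improved fast recovery (handles multiple losses) | Selective acknowledgment + more precise recovery |
| Trigger | 3 duplicate ACKs or a timeout | 3 duplicate ACKs | 3 duplicate ACKs | 3 duplicate ACKs (using SACK information) |
| Main action | ssthresh = cwnd/2, cwnd = 1, enter slow start | ssthresh = cwnd/2, cwnd = ssthresh + 3, enter fast recovery | ssthresh = cwnd/2, keep retransmitting until all lost segments are acknowledged | Retransmit lost segments precisely based on SACK information; a pipe variable controls sending |
| Handling multiple losses | Inefficient; repeated resets | Inefficient; may re-enter recovery or time out | Much better; several lost segments can be retransmitted within one recovery period | Best; several lost segments can be identified and retransmitted at once |
| Main advantage | Simple to implement | Avoids draining the pipe; faster recovery from a single loss | Handles several losses in one window, fewer timeouts | Most precise recovery, lowest latency, high throughput |
| Main limitation | Low network utilization, large oscillations | Poor performance with multiple losses, prone to timeouts | Recovers only one lost segment per RTT | Complex to implement, relatively higher CPU overhead |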

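How it works in depth

Fast recovery rests on a few key principles and steps:

  1. Trigger and initialization: when the sender receives 3 duplicate ACKs in a row, it infers that a segment was lost and triggers fast recovery. It sets the slow-start threshold (ssthresh) to half the current congestion window (cwnd), but not less than 2 MSS, as the new upper bound for the recovery phase. cwnd is then set to the new ssthresh plus 3 MSS, i.e. cwnd = ssthresh + 3 * MSS. The +3 accounts for the 3 duplicate ACKs: each one means a segment has left the network, freeing room for new or retransmitted data.
  2. The "packet conservation" principle: this is the core idea of fast recovery. During recovery, every further duplicate ACK means one more segment has been received (albeit out of order), so the sender may send one more segment (either a retransmission or new data). cwnd therefore grows linearly with the duplicate ACKs (usually by 1 MSS per duplicate ACK), which keeps data flowing and prevents the pipe from draining completely.
  3. Completing recovery: when the sender receives an ACK that acknowledges new data (a sequence number beyond the one that triggered fast recovery), the lost segment has been retransmitted and acknowledged, and recovery ends. cwnd is set to ssthresh and the connection enters congestion avoidance, where cwnd grows linearly.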
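The three steps can be tied together in a small Reno-style sketch. It simplifies under stated assumptions: cwnd and ssthresh are counted in MSS, ssthresh is halved from cwnd as in the text above (RFC 5681 actually uses FlightSize), the retransmission itself is only noted in a comment, and NewReno's partial-ACK handling is not modelled. reno_recovery and its dup/new event names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of Reno-style fast retransmit/fast recovery; cwnd and ssthresh in MSS.
# Usage: reno_recovery CWND SSTHRESH EVENT...   (EVENT is "dup" or "new")
reno_recovery() {
  local cwnd=$1 ssthresh=$2 dups=0 in_fr=0 ev
  shift 2
  for ev in "$@"; do
    case $ev in
      dup)
        dups=$(( dups + 1 ))
        if (( in_fr )); then
          cwnd=$(( cwnd + 1 ))         # each extra dup ACK: one more segment left the network
        elif (( dups == 3 )); then
          ssthresh=$(( cwnd / 2 ))     # halve, but keep at least 2 MSS
          if (( ssthresh < 2 )); then ssthresh=2; fi
          cwnd=$(( ssthresh + 3 ))     # lost segment is retransmitted here; +3 for the 3 dup ACKs
          in_fr=1
        fi
        ;;
      new)
        if (( in_fr )); then
          cwnd=$ssthresh               # deflate, then continue in congestion avoidance
          in_fr=0
        fi
        dups=0
        ;;
    esac
  done
  echo "cwnd=$cwnd ssthresh=$ssthresh"
}

reno_recovery 16 64 dup dup dup           # cwnd=11 ssthresh=8
reno_recovery 16 64 dup dup dup dup       # cwnd=12 ssthresh=8
reno_recovery 16 64 dup dup dup dup new   # cwnd=8 ssthresh=8
```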

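Evolution and comparison of the algorithms

TCP versions implement fast recovery differently; knowing how they evolved helps explain why NewReno and SACK are more efficient:

  • Tahoe's limitation: TCP Tahoe has fast retransmit but no fast recovery. Whenever loss is detected (3 duplicate ACKs or a timeout), regardless of the cause, it resets cwnd to 1 and goes back to slow start. This one-size-fits-all reaction badly hurts utilization under random, non-severe loss, and throughput drops sharply.
  • Reno's improvement and shortcoming: TCP Reno introduced the fast recovery phase and performs well when a single segment is lost. Its main weakness is losing several segments within the same send window. In that case Reno leaves fast recovery as soon as it receives the first ACK for new data (a partial ACK). The later lost segments may then not produce enough duplicate ACKs to trigger another fast retransmit, so the sender has to wait for a timeout, which hurts performance.
  • NewReno's enhancement: TCP NewReno fixes Reno's key flaw. During fast recovery, a partial ACK does not end the recovery; instead the sender retransmits the next lost segment that the partial ACK points to and stays in fast recovery until every segment that was outstanding when recovery began has been acknowledged. This lets several lost segments be recovered in one recovery period and greatly reduces the chance of a timeout.
  • SACK's precise control: the selective acknowledgment (SACK) option lets the receiver tell the sender exactly which blocks of data (including non-contiguous ones) have arrived. The sender therefore knows precisely which segments were really lost and can retransmit all known losses within one RTT, which speeds up recovery considerably. During fast recovery, SACK uses a variable called pipe to estimate how many segments are in flight, giving finer control over when to send. That makes SACK the most efficient at handling multiple losses.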

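Example

Summary

Congestion avoidance rests on the following key concepts:

  • Core goal: cautiously probe for spare bandwidth. Unlike the exponential growth of slow start, in congestion avoidance the sender increases cwnd by about 1/cwnd MSS for every ACK received, so cwnd grows by only about 1 MSS per round-trip time (RTT). The growth curve flattens, which effectively avoids overloading the network.
  • Trigger: once cwnd reaches or exceeds the slow-start threshold (ssthresh), TCP switches from slow start to congestion avoidance. ssthresh is not fixed; it is adjusted dynamically according to network conditions.
  • Reaction to congestion: when signs of congestion appear (a timeout or duplicate ACKs), TCP reacts by setting ssthresh to half the window at the time of congestion (but not less than 2 MSS) and then entering a different phase depending on which signal it saw.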
    flowchart LR
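        A[TCP connection established] --> B[Slow start<br>cwnd grows exponentially]
        B --> C{cwnd reached ssthresh?}
        C -- yes --> D[Congestion avoidance<br>cwnd grows linearly]
        D -- 3 duplicate ACKs --> E[Fast retransmit]
        E --> F[Fast recovery<br>ssthresh = cwnd/2<br>cwnd = ssthresh + 3]
        F -- ACK of new data --> D
        D -- retransmission timeout --> G[ssthresh = cwnd/2<br>cwnd reset to 1]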
        G --> B

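Usage

probe[,probe,...] /filter/ { action }

The probe specifies which events to instrument, the optional filter selects events with a Boolean expression, and the action is the small program to run.

Hello world example:

bpftrace -e 'BEGIN { printf("Hello eBPF!\n"); }'

The probe is BEGIN, a special probe that fires when the program starts (similar to awk's BEGIN). There is no filter. The action is a printf() statement.

A practical example:

bpftrace -e 'kretprobe:vfs_read /pid == 181/ { @bytes = hist(retval); }'

This uses a kretprobe to instrument the return value of the vfs_read() kernel function. If the process ID is 181, the special map variable @bytes is populated with a log2 histogram of vfs_read()'s return value, retval. The result is a histogram of read sizes for PID 181. Is your application doing lots of 1-byte reads? Maybe that can be optimized.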

Probe types

These are the probe libraries. The currently supported types are:

| Alias | Type | Description |
| --- | --- | --- |
| t | tracepoint | Kernel static instrumentation points |
| U | usdt | User-level statically defined tracing |
| k | kprobe | Kernel dynamic function instrumentation (standard) |
| kr | kretprobe | Kernel dynamic function return instrumentation (standard) |
| f | kfunc | Kernel dynamic function instrumentation (BPF based) |
| fr | kretfunc | Kernel dynamic function return instrumentation (BPF based) |
| u | uprobe | User-level dynamic function instrumentation |
| ur | uretprobe | User-level dynamic function return instrumentation |
| s | software | Kernel software-based events |
| h | hardware | Hardware counter-based instrumentation |
| w | watchpoint | Memory watchpoint events |
| p | profile | Timed sampling across all CPUs |
| i | interval | Timed reporting (from one CPU) |
|  | iter | Iterator tracing over kernel objects |
|  | BEGIN | Start of bpftrace |
|  | END | End of bpftrace |

Dynamic instrumentation lets you trace any software function in a running binary without restarting it. However, the functions it exposes are not a stable API: they can change from one software version to the next, breaking the bpftrace tools you have written. Use static probe types whenever possible, since they usually offer some stability.

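Variable types

| Variable | Description |
| --- | --- |
| @name | global |
| @name[key] | hash |
| @name[tid] | thread-local |
| $name | scratch |

Variables prefixed with "@" use BPF maps, which behave like associative arrays. They can be populated in one of two ways:

  • Variable assignment: @name = x;
  • Function assignment: @name = hist(x); several built-in map-populating functions summarize data quickly.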

Built-in variables

| Variable | Description |
| --- | --- |
| pid | Process ID |
| tid | Thread ID |
| uid | User ID |
| username | Username |
| comm | Process or command name |
| curtask | Current task_struct as a u64 |
| nsecs | Current time in nanoseconds |
| elapsed | Time in nanoseconds since bpftrace start |
| kstack | Kernel stack trace |
| ustack | User-level stack trace |
| arg0…argN | Function arguments |
| args | Tracepoint arguments |
| retval | Function return value |
| func | Function name |
| probe | Full probe name |
| $1...$N | Positional parameters |
| cgroup | Default cgroup v2 ID |

Built-in functions

| Function | Description |
| --- | --- |
| printf("…") | Print formatted string |
| time("…") | Print formatted time |
| join(char *arr[]) | Join array of strings with a space |
| str(char *s [, int length]) | Return string from s pointer |
| buf(void *p [, int length]) | Return a hexadecimal string from p pointer |
| strncmp(char *s1, char *s2, int length) | Compares two strings up to length |
| sizeof(expression) | Returns the size of the expression |
| kstack([limit]) | Kernel stack trace up to limit frames |
| ustack([limit]) | User-level stack trace up to limit frames |
| ksym(void *p) | Resolve kernel address to symbol |
| usym(void *p) | Resolve user-space address to symbol |
| kaddr(char *name) | Resolve kernel symbol name to address |
| uaddr(char *name) | Resolve user-space symbol name to address |
| ntop([int af,]int\|char[4:16] addr) | Convert IP address data to text |
| reg(char *name) | Return register value |
| cgroupid(char *path) | Return cgroupid for /sys/fs/cgroup/… path |
| system("…") | Run shell command |
| cat(char *filename) | Print file content |
| signal(char[] sig \| int sig) | Send a signal to the current task |
| override(u64 rc) | Override a kernel function return value |
| exit() | Exits bpftrace |
| @ = count() | Count events |
| @ = sum(x) | Sum the value |
| @ = hist(x) | Power-of-2 histogram for x |
| @ = lhist(x, min, max, step) | Linear histogram for x |
| @ = min(x) | Record the minimum value seen |
| @ = max(x) | Record the maximum value seen |
| @ = stats(x) | Return the count, average, and total for this value |
| delete(@x[key]) | Delete the map element |
| clear(@x) | Delete all keys from the map |

https://jvns.ca/perf-cheat-sheet.pdf

https://paulgorman.org/technical/linux-iproute2-cheatsheet.html

https://access.redhat.com/sites/default/files/attachments/rh_ip_command_cheatsheet_1214_jcs_print.pdf

https://linux-audit.com/cheat-sheets/ip/#subcommands

ip vs ifconfig

Changing IP addresses

Managing the routing table

Managing the ARP table

Querying IP addresses

Multicast management

https://zwischenzugs.com/2018/01/06/ten-things-i-wish-id-known-about-bash/

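1) `` vs $()

These two do the same thing:

$ echo `ls`
$ echo $(ls)

Backticks (``) date back to the early Unix shells; $() is more modern and easier to read and write, especially when nested. For example:

$ echo `echo \`echo \\\`echo inside\\\`\``
$ echo $(echo $(echo $(echo inside)))

2) globbing vs regexps

Globs and regexps are different things: a glob is a filename wildcard pattern, a regexp is a regular expression. Look at this command:

$ rename -n 's/(.*)/new$1/' *

There are two * characters here: the first, inside the quotes, is part of the regular expression; the second, outside the quotes, is a glob.

$ ls *
$ ls .*

So the second command looks like a regular expression, but .* is actually a glob (it matches names that begin with a dot).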

3) Exit Codes

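Every command returns an exit code to the shell:

$ grep not_there /dev/null
$ echo $?

- 0 means the command succeeded; any non-zero value means it failed (here it prints 1, because grep found no match).
- $? is a special parameter that holds the exit code of the last command.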
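In scripts you rarely need to read $? at all: if tests a command's exit code directly. A minimal sketch (has_line is a hypothetical helper; the temporary file is only for the demo):

```shell
#!/usr/bin/env bash
# Sketch: branch on a command's exit status directly instead of reading $? later.
has_line() {
  grep -qx -- "$1" "$2"   # -q: no output, only the exit status; -x: whole-line match
}

tmp=$(mktemp)
printf 'alpha\nbeta\n' > "$tmp"
if has_line beta "$tmp"; then echo "found"; else echo "missing"; fi   # found
has_line gamma "$tmp" || echo "exit=$?"                                # exit=1
rm -f "$tmp"
```

Inside the || branch, $? still holds the failing command's status, so it can be reported without a separate check.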

4) if statements, [ and [[

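  • [ (single bracket) is equivalent to the test command, a shell builtin. It handles basic condition tests and requires strict syntax (for example, spaces just inside the brackets).
  • [[ (double brackets) is a Bash keyword and more capable than [. It is not an ordinary command but part of Bash's syntax, so it behaves more flexibly and intuitively.
    # With [ (quotes are needed to avoid errors on empty variables)
    if [ "$name" = "John" ]; then
        echo "Hello John"
    fi

    # With [[ (safer; the variable needs no quotes)
    if [[ $name == "John" ]]; then
        echo "Hello John"
    fi

    An example:

    # Using [ ]
    if [ $(grep not_there /dev/null) = '' ]
    then
        echo -n hi
    else
        echo -n lo
    fi

    # Using [[ ]]
    if [[ $(grep not_there /dev/null) = '' ]]
    then
        echo -n hi
    else
        echo -n lo
    fi
  • The first one fails: $(grep not_there /dev/null) expands to nothing, so the test becomes [ = '' ], which is an error, and the else branch runs. That is why you often see statements like this one (quoting the substitution as "$(...)" works too):
    if [ x$(grep not_there /dev/null) = 'x' ]
  • The second form is recommended because it is safer and more capable.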
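[[ can also do things [ cannot, such as glob-style pattern matching with == and regular-expression matching with =~. A sketch with a hypothetical classify function:

```shell
#!/usr/bin/env bash
# Sketch: pattern and regex matching that only [[ ]] supports.
classify() {
  local f=$1
  if [[ $f == *.log ]]; then                             # glob match (pattern unquoted)
    echo "log"
  elif [[ $f =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then    # extended regex match
    echo "date"
  elif [[ -z $f ]]; then                                 # empty value is safe without quotes
    echo "empty"
  else
    echo "other"
  fi
}

classify app.log       # log
classify 2024-01-31    # date
classify ""            # empty
classify "my file"     # other
```

Quoting the right-hand side of == or =~ would turn it into a literal string match, so the patterns are deliberately left unquoted.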

5) set

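With set -e, as soon as any command in the script exits with a non-zero status ($? is non-zero, i.e. the command failed), the script terminates and the remaining commands are not run. (Commands tested by if/while, or inside && / || lists, are exceptions.)

set -e

It's a good idea to use trap to do some cleanup, for example:

#!/bin/bash
set -e

cleanup() {
    echo "Script failed at line $1, cleaning up..."
    # remove temporary files, etc.
}
# Install a trap: call cleanup whenever the ERR trap fires (i.e. a command fails)
trap 'cleanup $LINENO' ERR

Blind spot of set -e with pipelines: by default only the exit status of the last command in a pipeline counts. If an earlier command fails but the last one succeeds, the script does not exit. The fix is to add set -o pipefail. Recommended settings:

#!/bin/bash
set -euo pipefail  # -e: exit on error; -u: error on unset variables; -o pipefail: catch failures inside pipelines

For debugging, print each command as it is executed:

set -x

Putting some of these options together:

#!/bin/bash
set -e
set -x
grep not_there /dev/null  # <-- grep fails here, so the script exits
echo $?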
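The pipeline blind spot is easy to see side by side. This sketch runs the same failing pipeline twice, each in its own subshell so the option does not leak into the rest of the script:

```shell
#!/usr/bin/env bash
# Sketch: the same failing pipeline, checked without and with pipefail.
without=$( (false | true; echo $?) )
with=$( (set -o pipefail; false | true; echo $?) )
echo "without=$without with=$with"   # without=0 with=1
```

Without pipefail the pipeline reports true's status (0); with it, the failure of false (1) surfaces, which is what lets set -e catch it.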

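6) <()

<() turns a command's output into a file (process substitution: the output can be read through a path such as /dev/fd/63). For example:

# diff the grep results of file1 and file2
$ grep somestring file1 > /tmp/a
$ grep somestring file2 > /tmp/b
$ diff /tmp/a /tmp/b

# Equivalent to the following, which is more elegant
diff <(grep somestring file1) <(grep somestring file2)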

7) Quoting

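Quoting is a tricky subject in bash, as in many software environments.

A='123'
echo "$A" # <-- 123
echo '$A' # <-- $A

Quite simple: double quotes expand variables, while single quotes take everything literally. The table below summarizes the main differences:

| Feature | Single quotes (') | Double quotes (") |
| --- | --- | --- |
| Variable substitution | Not performed; printed as-is | Performed; the variable's value is printed |
| Command substitution | Not executed; printed as-is | Executed; the command's output is printed |
| Escape characters | Not interpreted (backslash is an ordinary character) | Backslash only escapes $, backtick, ", \ and newline; sequences such as \n and \t are left for echo -e or printf to interpret |
| Special characters | All characters ($, backtick, \, etc.) lose their special meaning | Only a few ($, backtick, \, and ! when history expansion is on) keep their special meaning |

In short, single quotes give strict literal protection, while double quotes protect the string as a whole but still let variables and command output be inserted.

mkdir -p tmp
cd tmp
touch a
echo *   # prints "a": unquoted, the glob is expanded
echo "*" # prints "*"
echo '*' # prints "*": inside quotes * has no special meaning

  • Use single quotes (') when the string is pure constant text that needs no variable or command substitution and no escape processing, e.g. fixed messages or symbols.
  • Use double quotes (") when the string contains variables or command output, or when an argument contains spaces. This is the more common case and the safer habit, because it keeps the string from being split unexpectedly.
  • Avoid leaving things unquoted: except for very simple contiguous strings (numbers, plain paths), always quote when assigning variables or passing arguments; it prevents many hard-to-debug errors.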
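The "split unexpectedly" risk in the bullets above is word splitting. A sketch using a hypothetical count_args helper that just reports how many arguments it received:

```shell
#!/usr/bin/env bash
# Sketch: why unquoted variables are risky (word splitting).
count_args() { echo "$#"; }   # hypothetical helper: prints its argument count

file_name="my report.txt"
count_args $file_name     # 2: unquoted, split at the space
count_args "$file_name"   # 1: quoted, stays a single argument
```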

8) Top three shortcuts

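| Feature | Syntax | Effect | Typical use |
| --- | --- | --- | --- |
| Last argument | !$ | The last argument of the previous command | After ls /a/very/long/path, just type cd !$ to go there; it is equivalent to cd /a/very/long/path |
| Argument range | !:1-$ | All arguments of the previous command, from the 1st to the last | Typed zip archive.tar.gz file1 file2 by mistake? Rerun it as tar -czf !:1-$, which expands to tar -czf archive.tar.gz file1 file2 |
| Directory part | :h (modifier) | Removes the last path component (file or directory name), leaving the directory path | If an operation on a file fails (e.g. cat /etc/nginx/sites-available/my_site), cd !$:h takes you straight to /etc/nginx/sites-available to look around |

!$: history expansion for the last argument of the previous command. If you are working on a file and are tired of retyping its path, this saves a lot of work:

grep somestring /long/path/to/some/file/or/other.txt
vi !$

!:1-$: this goes one step further and inserts all the arguments of the previous command. So:

grep isthere /long/path/to/some/file/or/other.txt
egrep !:1-$
fgrep !:1-$

> Here ! refers to the previous command, : is the separator, and 1-$ means the first through the last argument.

:h: I use this one a lot as well. Appended to a filename, it strips the last path component and leaves just the directory. Like this:

grep isthere /long/path/to/some/file/or/other.txt
cd !$:h  # now in /long/path/to/some/file/or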

9) startup order

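The figure below shows bash's startup order: reading from the top, it shows which scripts bash runs depending on decisions about the context it is running in (which decide the colour of the line to follow).

So if you are in a local (non-remote), non-login, interactive shell (for example when you run bash itself from the command line), you are on the "green" line, and these are the files read, in order:

/etc/bash.bashrc
~/.bashrc
# [bash runs, then terminates]
~/.bash_logout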

10) getopts (cheapci)

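Bash's getopts is a builtin for parsing command-line options and arguments in a standard way inside shell scripts.

The basic syntax of getopts is:

getopts optstring name [args]

  • optstring: the option characters the script recognizes.

    • A character followed by a colon (:) takes an argument; e.g. a: means -a must be followed by an argument.
    • If optstring starts with a colon (:), silent error mode is enabled: the default error messages are suppressed, which makes custom error handling easier.
  • name: on each call, getopts stores the option character it parsed (without the -) in this variable.

  • [args]: optional. If given, getopts parses these arguments instead of the script's positional parameters ($1, $2, ...).

getopts relies on two important shell variables:

  • OPTARG: when an option takes an argument, the argument's value is stored in OPTARG.

  • OPTIND: the index of the next argument to be processed. It starts at 1; once the options are processed, shift $((OPTIND - 1)) skips them so that $1 points at the first non-option argument.

getopts is usually put in a while loop together with a case statement that handles each option.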

#!/bin/bash

verbose=false
input_file=""
output_dir=""

while getopts ":vi:o:" opt; do
    case $opt in
        v)
            verbose=true
            echo "Verbose mode enabled"
            ;;
        i)
            input_file="$OPTARG"
            echo "Input file set to: $input_file"
            ;;
        o)
            output_dir="$OPTARG"
            echo "Output directory set to: $output_dir"
            ;;
        \?)
            echo "Error: unsupported option -$OPTARG" >&2
            exit 1
            ;;
        :)
            echo "Error: option -$OPTARG requires an argument" >&2
            exit 1
            ;;
    esac
done

# Handle the remaining non-option arguments
shift $((OPTIND - 1))

if [ $# -gt 0 ]; then
    echo "Remaining non-option arguments: $*"
fi
Example usage:

$ ./myscript.sh -i data.txt -o /tmp/output -v file1 file2
Input file set to: data.txt
Output directory set to: /tmp/output
Verbose mode enabled
Remaining non-option arguments: file1 file2

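  1. Option order and grouping: getopts parses options in order and stops at the first non-option argument (one that does not start with -) or at --. Options can be grouped: -vi file is equivalent to -v -i file.

  2. Error-handling mode: put a colon (:) at the start of optstring to enable silent mode. In this mode:

    • For an invalid option, name is set to ? and OPTARG holds the invalid option character.
    • For an option missing its argument, name is set to : and OPTARG holds that option character.
  3. After the options are processed, use shift $((OPTIND - 1)) to remove them and their arguments, so the remaining non-option arguments are easy to handle.

  4. Resetting OPTIND: if the same script calls getopts several times to parse different sets of arguments, reset OPTIND=1 before each new set, because the shell does not reset it automatically.

  • getopts mainly handles short options (such as -a, -l). Long options (such as --help) can be emulated with some tricks, but it gets complicated; usually you would reach for getopt instead (note: the external command, not the getopts builtin).
  • getopts is a shell builtin and runs fast. getopt is an external command that is more powerful (e.g. it supports long options directly), but it is also more complex to use and its implementations vary between systems.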
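Point 4 matters as soon as getopts lives inside a function that is called more than once. One way to handle it, sketched with a hypothetical parse_opts function, is to declare OPTIND local and start it at 1, which both resets and scopes it on every call:

```shell
#!/usr/bin/env bash
# Sketch: reusable option parsing; "local OPTIND=1" resets getopts on every call.
parse_opts() {
  local OPTIND=1 opt name="" quiet=0
  while getopts ":n:q" opt; do
    case $opt in
      n) name=$OPTARG ;;
      q) quiet=1 ;;
      *) echo "bad option -$OPTARG" >&2; return 1 ;;   # catches both ? and : in silent mode
    esac
  done
  shift $((OPTIND - 1))
  echo "name=$name quiet=$quiet rest=$*"
}

parse_opts -n alice -q x y   # name=alice quiet=1 rest=x y
parse_opts -n bob z          # name=bob quiet=0 rest=z
```

Without the local reset, the second call would start at the OPTIND left over from the first and skip its own options.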

If you go deep with bash, you might end up writing chunky utilities in it. If you do, then getting to grips with getopts can pay large dividends.

For fun, I once wrote a script called cheapci which I used to work like a Jenkins job.

The code here implements the reading of the two required, and 14 non-required arguments. Better to learn this than to build up a bunch of bespoke code that can get very messy pretty quickly as your utility grows.

Bash shortcuts

Moving

| command | description |
| --- | --- |
| ctrl + a | Go to the BEGINNING of the command line |
| ctrl + e | Go to the END of the command line |
| ctrl + b | Move back one character |
| ctrl + f | Move forward one character |
| alt + f | Move cursor FORWARD one word |
| alt + b | Move cursor BACK one word |
| ctrl + x x | Toggle between the start of the line and the current cursor position |
| ctrl + ] + x | Where x is any character, moves the cursor forward to the next occurrence of x |
| alt + ctrl + ] + x | Where x is any character, moves the cursor backwards to the previous occurrence of x |

Edit / Other

| command | description |
| --- | --- |
| ctrl + d | Delete the character under the cursor |
| ctrl + h | Delete the character before the cursor |
| ctrl + u | Clear / cut everything BEFORE the cursor |
| ctrl + k | Clear / cut everything AFTER the cursor |
| ctrl + w | Delete the word BEFORE the cursor |
| alt + d | Delete the word FROM the cursor |
| ctrl + y | Paste text deleted by a previous command |
| ctrl + i | Command completion, like Tab |
| ctrl + l | Clear the screen (same as the clear command) |
| ctrl + c | Kill whatever is running |
| ctrl + d | Exit the shell (same as exit, when the command line is empty) |
| ctrl + z | Suspend the current foreground process (resume it with fg or bg) |
| ctrl + _ | Undo |
| ctrl + x ctrl + u | Undo the last changes (same as ctrl + _) |
| ctrl + t | Swap the last two characters before the cursor |
| esc + t | Swap the last two words before the cursor |
| alt + t | Swap the current word with the previous one |
| esc + . | Insert the last argument of the previous command |
| esc + _ | Insert the last argument of the previous command |
| alt + [Backspace] | Delete the PREVIOUS word |
| alt + < | Move to the first line in the history |
| alt + > | Move to the end of the input history, i.e. the line currently being entered |
| alt + ? | Display the file/folder names in the current path as help |
| alt + * | Insert all the file/folder names in the current path as arguments |
| alt + . | Insert the LAST ARGUMENT of the previous command (after "vim file1.txt file2.txt" it yields "file2.txt") |
| alt + c | Capitalize from the cursor to the end of the word (the whole word if the cursor is at its beginning) |
| alt + u | Make uppercase from the cursor to the end of the word |
| alt + l | Make lowercase from the cursor to the end of the word |
| alt + n | Non-incremental forward search of history |
| alt + p | Non-incremental reverse search of history |
| alt + r | Undo all changes to the line |
| alt + ctrl + e | Expand the command line |
| ~[TAB][TAB] | List all users |
| $[TAB][TAB] | List all system variables |
| @[TAB][TAB] | List all entries in your /etc/hosts file |
| [TAB] | Auto-complete |
| cd - | Change to the PREVIOUS working directory |

History

| command | description |
| --- | --- |
| ctrl + r | Search backward starting at the current line, moving 'up' through the history as necessary |
| ctrl + s | Search forward starting at the current line, moving 'down' through the history as necessary |
| ctrl + p | Fetch the previous command from the history list, moving back in the list (same as up arrow) |
| ctrl + n | Fetch the next command from the history list, moving forward in the list (same as down arrow) |
| ctrl + o | Execute the command found via Ctrl+r or Ctrl+s |
| ctrl + g | Escape from history searching mode |
| !! | Run the PREVIOUS command (e.g. sudo !!) |
| !vi | Run the PREVIOUS command that BEGINS with vi |
| !vi:p | Print the previously run command that BEGINS with vi |
| !n | Execute the nth command in history |
| !$ | Last argument of the last command |
| !^ | First argument of the last command |
| ^abc^xyz | Replace the first occurrence of abc with xyz in the last command and execute it |

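Common commands

https://github.com/jlevy/the-art-of-command-line/tree/master

Bash command line

  • If you change your mind halfway through typing a command, press alt-# to add a # at the start of the line and submit it as a comment (or press ctrl-a, #, enter). You can then easily bring the half-typed command back later from the command history.
    ➜  ~ # yum install gcc   <-- # added at the front
    ➜ ~ # yum install gcc <-- recalled with the up arrow
  • Use netstat -lntp or ss -plat to see which processes are listening on ports (TCP by default; add -u to check UDP ports), or lsof -iTCP -sTCP:LISTEN -P -n (which also works on OS X).
  • pstree -p is a nice way to display the process tree.
  • Use nohup or disown to keep a background process running.
  • Use pgrep and pkill to find processes or send them signals by name (-f is often useful).
  • To switch the shell to another user, use su username or su - username. Adding - makes the environment the same as if that user had logged in. Omitting the username defaults to root. You must enter the password of _the user you are switching to_.
  • You can keep aliases, shell options and frequently used functions in ~/.bashrc (see this article for details). That makes your settings available in every shell session.
  • Put environment variable settings and commands to run at login in ~/.bash_profile. Shells started from a graphical environment and from cron need separate configuration.
  • Customizing the shell prompt (PS1):
    # 1. Edit the config file (here, the current user's ~/.bashrc)
    vim ~/.bashrc

    # 2. Append a line at the end, for example:
    # Looks like: [Sun Dec 21 14:05:30] user@server:~$
    export PS1="[\d \t] \u@\h:\W\$ "

    # 3. Apply the change immediately
    source ~/.bashrc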

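File and data processing

  • wc counts newlines (-l), characters (-m), words (-w) and bytes (-c).
  • Use tee to copy standard input to a file and to standard output at the same time, e.g. ls -al | tee file.txt.
  • Replace all occurrences of a string in one or more files:
    perl -pi.bak -e 's/old-string/new-string/g' my-files-*.txt
  • rsync is a fast and very flexible file-copying tool. It is known for synchronizing files between machines, but it is just as useful locally. When security settings allow, using rsync instead of scp lets you resume a transfer without starting over. It is also one of the fastest ways to delete a large number of files:
    mkdir empty && rsync -r --delete empty/ some-dir && rmdir some-dir
  • The standard tools for source-code diffing and merging are diff and patch. Use diffstat for a summary of a diff; note that diff -r works on whole directories. Use diff -r tree1 tree2 | diffstat for change statistics. Use vimdiff to compare and edit files.
  • For binary files, use hd, hexdump or xxd for hex dumps, and bvi, hexedit or biew for binary editing.
  • Also for binary files, strings (plus grep, etc.) lets you find bits of text.
  • To split files, use split (by size) and csplit (by pattern).
  • To manipulate date and time expressions, use dateadd, datediff, strptime, etc. from dateutils.
  • Use zless, zmore, zcat and zgrep to operate on compressed files.
  • File attributes can be set with chattr, a lower-level alternative to file permissions. For example, to protect a file against accidental deletion, set the immutable flag: sudo chattr +i /critical/directory/or/file
  • Use getfacl and setfacl to save and restore file permissions. For example:
    getfacl -R /some/path > permissions.txt
    setfacl --restore=permissions.txt
  • To create empty files efficiently, use truncate (creates a sparse file), fallocate (ext4, xfs, btrfs and ocfs2 filesystems), xfs_mkfile (almost any filesystem; part of the xfsprogs package), or mkfile (Unix-like systems such as Solaris and Mac OS).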

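System debugging

  • For web debugging, curl and curl -I are handy, as is their sibling wget, or the more modern httpie.
  • To check CPU and disk status, the usual tools are top (htop is better), iostat and iotop. iostat -mxz 15 gives basic CPU information and detailed per-partition disk performance.
  • Use netstat and ss for network connection details.
  • dstat is useful when you want a quick overview of the current state of the system. For a broader, more detailed overview, use glances, which shows several system-level stats in one terminal window.
  • To understand memory status, run and understand the output of free and vmstat. In particular, note that the "cached" value is memory the Linux kernel holds as file cache, so it effectively counts toward "free" memory.
  • Java system debugging is a different matter. A simple trick on Oracle's and some other JVMs is kill -3 <pid>: a full stack trace and heap summary (including GC details) is dumped to stderr or the logs. The JDK's jps, jstat, jstack and jmap are useful. SJK tools are more advanced.
  • Use mtr as a better traceroute to identify network problems.
  • To see what is using disk space, ncdu saves time over the usual commands such as du -sh *.
  • To find which socket or process is using bandwidth, try iftop or nethogs.
  • The ab tool (bundled with Apache) is handy for quick-and-dirty checks of web server performance. For more complex load testing, try siege.
  • For more serious network debugging, use wireshark, tshark or ngrep.
  • Know about strace and ltrace. They help when a program is failing, hanging or crashing and you don't know why, or when you want a general idea of its performance. Note the profiling option (-c) and the option to attach to a running process (-p).
  • Know about ldd for checking shared libraries, but never run it on untrusted files.
  • Know how to attach to a running process with gdb and get its stack traces.
  • Use /proc. It is sometimes amazingly helpful when debugging live problems. Examples: /proc/cpuinfo, /proc/meminfo, /proc/cmdline, /proc/xxx/cwd, /proc/xxx/exe, /proc/xxx/fd/, /proc/xxx/smaps (where xxx is the process id or pid).
  • When debugging why something went wrong in the past, sar can be very helpful. It shows historical statistics on CPU, memory, network, etc.
  • For deeper system and performance analysis, look at stap (SystemTap), perf and sysdig.
  • Check which OS you are on with uname or uname -a (general Unix/kernel info) or lsb_release -a (Linux distribution info).
  • Use dmesg whenever something is acting really strangely (it could be a hardware or driver problem).
  • If you delete a file and it doesn't free up the expected disk space as reported by du, check whether the file is still in use by a process: lsof | grep deleted | grep "filename-of-my-big-file"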

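One-liners

  • When you need set intersection, union and difference on text files, sort and uniq are great helpers. In the examples below, a and b are text files whose lines are each already unique. This is fast and works on files from small to multi-gigabyte (sort is not limited by memory, though you may need the -T option if /tmp is on a small root partition); see also the earlier notes on LC_ALL and sort's -u option.
sort a b | uniq > c   # c is a union b
sort a b | uniq -d > c # c is a intersect b
sort a b b | uniq -u > c # c is a - b
  • Use grep . * (each line is prefixed with its file name) or head -100 * (each file gets a header) to quickly read the contents of every file in a directory. This is especially handy for directories full of config files, such as /sys, /proc and /etc.

  • Sum all the numbers in the third column of a text file (probably three times faster and three times less code than the equivalent Python):

    awk '{ x += $3 } END { print x }' myfile

  • To see sizes/dates across a file tree, this is like a recursive ls -l but easier to read than ls -lR:

    find . -type f -ls

  • Say you have a text file, like a web server log, and a certain value appears on some lines, such as an acct_id parameter in the URI. To count how many requests there were for each acct_id:

    egrep -o 'acct_id=[0-9]+' access.log | cut -d= -f2 | sort | uniq -c | sort -rn

  • To keep watching for changes, use watch; e.g. check for changes to the files in a directory with watch -d -n 2 'ls -rtlh | tail', or watch network settings while troubleshooting WiFi with watch -d -n 2 ifconfig.

  • Run this function to get a random tip from this document (it parses the Markdown file and extracts an item):

    function taocl() {
    curl -s https://raw.githubusercontent.com/jlevy/the-art-of-command-line/master/README-zh.md|
    pandoc -f markdown -t html |
    iconv -f 'utf-8' -t 'unicode' |
    xmlstarlet fo --html --dropdtd |
    xmlstarlet sel -t -v "(html/body/ul/li[count(p)>0])[$RANDOM mod last()+1]" |
    xmlstarlet unesc | fmt -80
    }

Obscure but useful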

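  • expr: evaluate expressions or match regular expressions

  • m4: simple macro processor

  • yes: print a string over and over

  • cal: nice calendar

  • env: run a command (useful in scripts)

  • printenv: print environment variables (useful for debugging and in scripts)

  • look: find words or lines that begin with a given string

  • cut, paste and join: data manipulation

  • fmt: format text paragraphs

  • pr: format text into pages/columns

  • fold: wrap lines of text

  • column: format text into aligned, fixed-width columns or tables

  • expand and unexpand: convert between tabs and spaces

  • nl: add line numbers

  • seq: print numbers

  • bc: calculator

  • factor: factor integers

  • gpg: encrypt and sign files

  • toe: table of terminfo entries

  • nc: network debugging and data transfer

  • socat: socket relay, similar to netcat

  • slurm: network traffic visualization

  • dd: move data between files or devices

  • file: identify the type of a file

  • tree: show directories and files as a tree, like a recursive ls

  • stat: file information

  • time: run a command and time it

  • timeout: run a command for a specified amount of time, stopping it when the time is up

  • lockfile: create a file that can only be removed with rm -f

  • logrotate: rotate, compress and mail logs

  • watch: run a command repeatedly, showing the results and/or highlighting changes

  • when-changed: run a given command whenever a file changes; see also inotifywait and entr

  • tac: print files in reverse

  • shuf: pick random lines from a file

  • comm: compare sorted files line by line

  • strings: extract text from binary files

  • tr: translate characters

  • iconv or uconv: convert text encodings

  • split and csplit: split files

  • sponge: read all input before writing it; useful for reading from and then writing back to the same file, e.g. grep -v something some-file | sponge some-file

  • units: convert one unit of measurement to an equivalent one (see /usr/share/units/definitions.units)

  • apg: generate random passwords

  • xz: high-ratio file compression

  • ldd: dynamic library information

  • nm: extract symbols from object files

  • ab or wrk: web server benchmarking

  • strace: system call debugging

  • mtr: a better traceroute for network debugging

  • cssh: visual concurrent shell

  • rsync: sync files and folders over SSH or within the local file system

  • wireshark and tshark: packet capture and network debugging

  • ngrep: grep for the network layer

  • host and dig: DNS lookups

  • lsof: list files opened on the system and show port information

  • dstat: system statistics

  • glances: high-level, multi-subsystem overview

  • iostat: disk usage statistics

  • mpstat: CPU usage statistics

  • vmstat: memory usage statistics

  • htop: improved version of top

  • last: login history

  • w: see who is logged in

  • id: user/group identity information

  • sar: historical system statistics

  • iftop or nethogs: network usage by socket or process

  • ss: socket statistics

  • dmesg: boot and system error messages

  • sysctl: view and change kernel parameters at run time

  • hdparm: SATA/ATA disk manipulation and performance

  • lsblk: list block devices, showing your disks and partitions as a tree

  • lshw, lscpu, lspci, lsusb and dmidecode: hardware information, including CPU, BIOS, RAID, graphics card, USB devices, etc.

  • lsmod and modinfo: list kernel modules and show their details

  • fortune, ddate and sl: um, it depends on whether you consider steam locomotives and odd celebrity quotes "useful"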

Reposted from: https://github.com/rahulrajaram/linux_troubleshooting

Purpose: a single-page, production-ready cheatsheet for Linux/SRE triage. Optimized for fast on-call use: concise flags, copy-paste recipes, brief notes, and clear risk callouts.

Binaries & ELF

Cheat Card

  • Linked libs: ldd /path/to/bin (security caveat: may execute code in rare cases)
  • ELF headers/sections: readelf -h /bin/ls; sections: readelf -S /bin/ls
  • Symbols (prefer readelf): readelf -Ws /bin/ls | grep ' FUNC '; dynamic: readelf -Ws -d /bin/ls
  • Disassemble: objdump -d /bin/ls | less (add -M intel for Intel syntax)
  • Symbols via nm: nm -D /bin/ls | grep symbol_name
  • Requires: binutils (readelf/objdump/nm)

1. ldd

List shared library dependencies of executables and shared objects.

  • Basic: ldd /path/to/bin
  • Security caveat: may execute code in rare cases; avoid on untrusted binaries.
  • Alternative: LD_TRACE_LOADED_OBJECTS=1 /lib64/ld-linux-x86-64.so.2 /path/to/bin (still uses loader)

Text & Data Utilities

Cheat Card

  • Search recursively: grep -RIn --exclude-dir .git 'pattern' .; context: -C2
  • Edit in-place: sed -i.bak -E 's/old/new/g' file (backup)
  • Summarize data: awk -F, '{a[$1]+=$2} END{for(k in a) print k,a[k]}' file.csv
  • JSON parse: jq -r '.items[].metadata.name' file.json
  • Compare dirs: diff -ruN dir_old dir_new | less -R
  • Transform text: tr -s ' ' | cut -d, -f1,3 | xargs -n1 echo
  • Safe temp: mktemp -d for dirs; files: mktemp
  • Reverse lines: rev <file (quick visual check)

2. grep

  • search for one or more expressions: grep -E 'hello|world' temp
  • search for one or more words: grep -Ew 'hello|world' temp
  • search for suffix matches: grep -E 'hello(world|lolo)' temp
  • search for suffixes matching regex: grep -E 'hello[0-9]{3,}' temp
  • recursive search in tree: grep -RIn --exclude-dir .git --exclude='*.log' 'pattern' .
  • fixed strings (fast) and ignore case: grep -Fni 'literal text' file
  • context lines: grep -R --color -n -C2 'pattern' . (or -A after, -B before)
  • binary-skip and file names only: grep -rI -l 'pattern' .

3. sed

What it does: stream editor for non-interactive find/replace, line edits, and range selections.

  • In-place with backup: sed -i.bak -E 's/old/new/g' file
  • Delete matching lines: sed -i '/pattern/d' file
  • Print lines between markers: sed -n '/BEGIN/,/END/p' file
  • Replace with capture groups: sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3-\2-\1/' file
  • Insert before/after match:
    • Before: sed '/pattern/i\\inserted before' file
    • After: sed '/pattern/a\\appended after' file
  • Trim trailing spaces: sed -i 's/[ \t]\+$//' file
  • Multiple edits: sed -E -e 's/foo/bar/g' -e '/tmp/d' file

4. awk

What it does: text processing and quick data summarization using fields and expressions.

  • Default FS is whitespace; set CSV FS: awk -F, '...' file.csv
  • Select fields: awk '{print $1, $3}' file
  • Filter rows: awk '$5 > 100 {print $1, $5}' file
  • Sum a column: awk '{s+=$3} END{print s}' file
  • Group and sum by key: awk '{a[$1]+=$2} END{for (k in a) print k, a[k]}' file
  • Pretty print: awk '{printf "%-20s %10d\n", $1, $2}' file
  • Count unique values: awk '{c[$1]++} END{for (k in c) print k, c[k]}' file

Networking

Cheat Card

  • Ports→PIDs: ss -ltnp; established only: ss -tn state established
  • TCP detail: ss -i dst <ip> (rtt, cwnd, retrans)
  • Path/source IP: ip route get <dest>; counters: ip -s link show <iface>
  • Latency/loss: mtr -ezbw <dest>; quick traceroute ICMP: traceroute -I <dest>
  • Targeted capture: tcpdump -ni <iface> tcp port 443 (or port 53, icmp)
  • DNS: resolvectl query <name> or dig <name> A +short

5. ping

  • Compat: Linux; Root: may require CAP_NET_RAW depending on system; Requires: iputils-ping.
  • -4: ping IPv4 only
  • -6: ping IPv6 only
  • -A: adapts to roundtrip time
  • -b: allow pinging broadcast addresses
  • -I: ping through an interface
  • -M: set PMTU strategy
  • -s: set packetsize (default is 56B)
  • -t: set IP time-to-live
  • ping 224.0.0.1: ping multicast address

Notes:

  • Use the rtt min/avg/max/mdev summary to judge whether large variations are causing jitter. This matters especially for real-time applications.
  • ping reports duplicate packets. Duplicates should never occur, and they appear to be caused by inappropriate link-level retransmissions.
  • ping reports damaged packets, which suggests broken hardware somewhere in the network.
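Jitter is better described by the spread of the RTT samples than by their average. iputils ping reports this spread as mdev in its final rtt min/avg/max/mdev line. The following awk sketch computes the mean and standard deviation of some hypothetical samples in milliseconds; the standard deviation is the quantity iputils calls mdev. Other ping implementations may label or compute it differently.

```shell
#!/bin/sh
# Mean and standard deviation (what iputils ping reports as mdev) of RTT samples in ms.
# The three samples are hypothetical.
stats=$(printf '%s\n' 10 20 30 | awk '
  { sum += $1; sumsq += $1 * $1; n++ }
  END {
    mean = sum / n
    printf "avg=%.3f mdev=%.3f\n", mean, sqrt(sumsq / n - mean * mean)
  }')
echo "$stats"
```

To use real samples, one option is to replace the printf with ping -c 20 <dest> | sed -n 's/.*time=\([0-9.]*\).*/\1/p'.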

6. ip

  • Compat: Linux; Root: not required for reads; Requires: iproute2.
  • ip addr: Show information for all addresses
  • ip addr show dev wlo1: Display information only for device wlo1
  • ip link: Show information for all interfaces
  • ip link show dev wlo1: Display information only for device wlo1
  • ip -s link: Display interface statistics (packets dropped, received, sent, etc.)
  • Quick recipes:
    • Path and source IP: ip route get <dest>
    • Interface counters: ip -s link show <iface> (rx/tx errors, drops)
    • Neighbors/ARP: ip neigh and ip neigh show dev <iface>
    • Multicast: ip maddr or ip maddr show dev <iface>

Example

# Query path and chosen source IP
ip route get 8.8.8.8
# Expect: 8.8.8.8 via 192.168.1.1 dev wlo1 src 192.168.1.23

  • ip route: List all of the route entries in the kernel
  • ip route add: Add a route entry to the kernel routing table
  • ip route replace: Replace an existing route (add if not present)
  • ip maddr: Display multicast information for all devices
  • ip maddr show dev wlo1
  • ip neigh show dev wlo1: show neighbor (ARP/NDP) entries on wlo1 and their reachability state (REACHABLE, STALE, FAILED, ...)

7. arp

  • Compat: Legacy; prefer ip neigh; Requires: net-tools.
  • arp: show all ARP table entries
  • arp -d address: delete ARP entry for address
  • arp -s address hw_addr: set up a new table entry

8. arping

  • Compat: Linux; Root/CAP_NET_RAW required; Package: arping (iputils-arping on some distros).
  • arping -I wlo1 192.168.0.1: send ARP requests to host
  • arping -D -I wlo1 192.168.0.15: duplicate address detection (checks whether another host already uses 192.168.0.15)

9. ethtool

  • Compat: Linux; Root for changing settings, read stats usually ok; Requires: ethtool.
  • ethtool -S wlo1: print NIC and driver-specific statistics

10. ss

  • Compat: Linux; Modern replacement for netstat; Requires: iproute2.
  • ss -a: show all sockets
  • ss -o: show timer information
  • ss -p: show process using the socket
  • ss -t|-u|-4|-6: restrict to TCP, UDP, IPv4, or IPv6 sockets
  • ss -ltnp: list listening TCP sockets with PIDs
  • ss -tn state established: show established TCP only
  • ss -tn sport = :443 or ss -tn dport = :443: filter by port
  • ss -s: summary stats (TCP states, mem)
  • ss -i:
    • ts: show string “ts” if the timestamp option is set
    • sack: show string “sack” if the sack option is set
    • ecn: show string “ecn” if the explicit congestion notification option is set
    • ecnseen: show string “ecnseen” if the ECN flag was seen in received packets
    • fastopen: show string “fastopen” if the fastopen option is set
    • cong_alg: the congestion algorithm name, the default congestion algorithm is “cubic”
    • wscale:<snd_wscale>:<rcv_wscale>: if window scale option is used, this field shows the send scale factor and receive scale factor
    • rto:<icsk_rto>: tcp re-transmission timeout value, the unit is millisecond
    • backoff:<icsk_backoff>: used for exponential backoff re-transmission, the actual re-transmission timeout value is icsk_rto << icsk_backoff
    • rtt:<rtt>/<rttvar>: rtt is the average round trip time, rttvar is the mean deviation of rtt, their units are millisecond
    • ato:<ato>: ack timeout, unit is millisecond, used for delay ack mode
    • mss:<mss>: max segment size
    • cwnd:<cwnd>: congestion window size
    • pmtu:<pmtu>: path MTU value
    • ssthresh:<ssthresh>: tcp congestion window slow start threshold
    • bytes_acked:<bytes_acked>: bytes acked
    • bytes_received:<bytes_received>: bytes received
    • segs_out:<segs_out>: segments sent out
    • segs_in:<segs_in>: segments received
    • send <send_bps>bps: egress bps
    • lastsnd:<lastsnd>: time since the last packet was sent, in milliseconds
    • lastrcv:<lastrcv>: time since the last packet was received, in milliseconds
    • lastack:<lastack>: time since the last ACK was received, in milliseconds
  • ss -A tcp,udp: dump socket tables
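The following shell arithmetic makes the backoff formula concrete. It applies icsk_rto << icsk_backoff to hypothetical values (rto:204, backoff:3) that might appear in ss -i output. This sketch covers only the formula. The kernel also caps the timeout at TCP_RTO_MAX (120 seconds by default), which the code approximates with a simple clamp.

```shell
#!/bin/sh
# Effective retransmission timeout from two ss -i fields: rto << backoff.
# The values below are hypothetical, e.g. from "rto:204 ... backoff:3".
rto=204        # icsk_rto in milliseconds
backoff=3      # icsk_backoff: number of consecutive backoffs
rto_max=120000 # TCP_RTO_MAX in ms (120 s default cap)

effective=$(( rto << backoff ))   # 204 * 2^3 = 1632
if [ "$effective" -gt "$rto_max" ]; then effective=$rto_max; fi
echo "effective RTO: ${effective} ms"
```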

37. tcpdump

Compat: Linux; Root/CAP_NET_RAW required for captures; Requires: tcpdump. What it does: capture packets for inspection and troubleshooting.

  • Interface and no name resolution: tcpdump -ni <iface>

  • Host or subnet: tcpdump -ni <iface> host <ip>; tcpdump -ni <iface> net 10.0.0.0/8

  • Ports/protocols: tcpdump -ni <iface> tcp port 443 or udp port 53

  • SYNs only (new TCP handshakes):

    # New TCP handshakes only (SYN without ACK)
    tcpdump -ni <iface> 'tcp[tcpflags] & (tcp-syn) != 0 and tcp[tcpflags] & (tcp-ack) == 0'

  • DNS queries: tcpdump -ni <iface> port 53

  • ICMP reachability: tcpdump -ni <iface> icmp

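    # Requires: tcpdump
    # Capture full packets to a file
    tcpdump -ni <iface> -s 0 -w capture.pcap

    # New file every 5 min; with -G, -W 6 makes tcpdump exit after 6 files (about 30 min)
    tcpdump -ni <iface> -s 0 -G 300 -W 6 -w 'cap-%Y%m%d%H%M%S.pcap'

    # Ring buffer instead: 6 files of ~100 MB each (-C), oldest overwritten
    tcpdump -ni <iface> -s 0 -C 100 -W 6 -w cap.pcap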
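The SYN-only filter tests two bits of the TCP flags byte: SYN (0x02) must be set and ACK (0x10) must be clear. This is why a SYN-ACK (0x12) does not match. The following small shell sketch applies the same bit logic. The function name is_new_handshake is made up for illustration and is not part of tcpdump.

```shell
#!/bin/sh
# Same bit test as the tcpdump filter, applied to a TCP flags byte.
# Flag values: SYN = 0x02, ACK = 0x10 (so SYN-ACK = 0x12).
is_new_handshake() {
  [ $(( $1 & 0x02 )) -ne 0 ] && [ $(( $1 & 0x10 )) -eq 0 ]
}

for flags in 0x02 0x12 0x10; do
  if is_new_handshake "$flags"; then
    echo "$flags: matched (SYN without ACK)"
  else
    echo "$flags: not matched"
  fi
done
```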

38. mtr

Compat: Linux; May need root/CAP_NET_RAW for certain probe types; Requires: mtr. What it does: combines ping and traceroute to visualize latency and loss per hop.

  • Run with extra info: mtr -ezbw <dest>
  • Report mode (one-off): mtr -ezbwrc 10 <dest>

39. traceroute

  • Compat: Linux; Requires: traceroute; TCP mode may need CAP_NET_RAW/root.
  • traceroute -I: use ICMP echo for probes
  • traceroute -T: use TCP SYN for probes

40. nicstat

  • Compat: Linux; Not widely packaged; Consider sar -n/ethtool -S alternatives.
  • nicstat prints out network statistics for all network cards (NICs), including packets, kilobytes per second, average packet sizes and more.
  • nicstat -t: show CPU stats
  • nicstat: show network interface stats Requires: nicstat (may need third-party repo/source on some distros).

Metrics reference

  • Time - The time corresponding to the end of the sample shown, in HH:MM:SS format (24-hour clock).
  • Int - The interface name.
  • rKB/s, InKB - Kilobytes/second read (received).
  • wKB/s, OutKB - Kilobytes/second written (transmitted).
  • rMbps, RdMbps - Megabits/second read (received).
  • wMbps, WrMbps - Megabits/second written (transmitted).
  • rPk/s, InSeg, InDG - Packets (TCP Segments, UDP Datagrams)/second read (received).
  • wPk/s, OutSeg, OutDG - Packets (TCP Segments, UDP Datagrams)/second written (transmitted).
  • rAvs - Average size of packets read (received).
  • wAvs - Average size of packets written (transmitted).
  • %Util - Percentage utilization of the interface. For full-duplex interfaces, this is the greater of rKB/s or wKB/s as a percentage of the interface speed. For half-duplex interfaces, rKB/s and wKB/s are summed.
  • %rUtil, %wUtil - Percentage utilization for bytes read and written, respectively.
  • Sat - Saturation. This is the number of errors per second seen for the interface, which indicates that the interface may be approaching saturation. The statistic is combined from several kernel statistics. When diagnosing a network issue, use the ‘-x’ option to see the individual statistics (those mentioned below).
  • IErr - Packets received that could not be processed because they contained errors
  • OErr - Packets that were not successfully transmitted because of errors
  • Coll - Ethernet collisions during transmit.
  • NoCP - No-can-puts. This is when an incoming packet cannot be put to the process reading the socket. This suggests the local process is unable to process incoming packets in a timely manner.
  • Defer - Defer Transmits. Packets without collisions where first transmit attempt was delayed because the medium was busy.
  • Reset - tcpEstabResets. The number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state.
  • AttF - tcpAttemptFails - The number of times that TCP connections have made a direct transition to the CLOSED state from either the SYN-SENT state or the SYN-RCVD state, plus the number of times TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state.
  • %ReTX - Percentage of TCP segments retransmitted - that is, the number of TCP segments transmitted containing one or more previously transmitted octets.
  • InConn - tcpPassiveOpens - The number of times that TCP connections have made a direct transition to the SYN-RCVD state from the LISTEN state.
  • OutCon - tcpActiveOpens - The number of times that TCP connections have made a direct transition to the SYN-SENT state from the CLOSED state.
  • Drops - tcpHalfOpenDrop + tcpListenDrop + tcpListenDropQ0. tcpListenDrop and tcpListenDropQ0 - Number of connections dropped from the completed connection queue and incomplete connection queue, respectively. tcpHalfOpenDrop - Number of connections dropped after the initial SYN packet was received.
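You can check the %Util definition by hand. The following minimal sketch uses a hypothetical helper, util, which takes rKB/s, wKB/s, the link speed in Mbit/s and the duplex mode. It assumes 1 KB = 1024 bytes, which may not match nicstat's exact unit convention.

```shell
#!/bin/sh
# %Util per the definition above. Arguments (all hypothetical inputs):
#   $1 = rKB/s, $2 = wKB/s, $3 = link speed in Mbit/s, $4 = "full" or "half"
# Assumes 1 KB = 1024 bytes when converting the link speed.
util() {
  awk -v r="$1" -v w="$2" -v mbps="$3" -v duplex="$4" 'BEGIN {
    r += 0; w += 0                                 # force numeric comparison
    kbs = (duplex == "full") ? (r > w ? r : w) : r + w
    link_kbs = mbps * 1000000 / 8 / 1024           # link capacity in KB/s
    printf "%.1f\n", 100 * kbs / link_kbs
  }'
}

util 61035 12000 1000 full   # full duplex: busier direction only
util 61035 12000 1000 half   # half duplex: both directions share the medium
```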

41. nslookup

  • Compat: Legacy; prefer dig/resolvectl; Requires: dnsutils/bind-utils. Queries Internet name servers interactively.
  • nslookup <domain>
  • Note: legacy tool. Prefer dig for detailed queries or resolvectl on systemd-based systems.
  • Quick equivalents: dig <domain> A +short; resolvectl query <domain>

42. host

  • Compat: Linux; Requires: bind9-host/bind-utils. host is a simple utility for performing DNS lookups. It is normally used to convert names to IP addresses and vice versa.
  • host <domain>
  • Examples: host -t A <domain>; reverse lookup: host <ip>
  • Tip: for more control, use dig (if installed) or resolvectl.

43. iwconfig

  • Compat: Legacy; prefer iw; Requires: wireless-tools.
  • iwconfig wlo1: show WLAN config, for example:
    wlo1      IEEE 802.11  ESSID:"NETGEAR97"
    Mode:Managed Frequency:2.462 GHz Access Point: C4:04:15:58:60:C7
    Bit Rate=72.2 Mb/s Tx-Power=20 dBm
    Retry short limit:7 RTS thr=2347 B Fragment thr:off
    Power Management:off
    Link Quality=70/70 Signal level=-32 dBm
    Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
    Tx excessive retries:0 Invalid misc:22932 Missed beacon:0
  • Note: iwconfig is legacy (wireless-tools); prefer iw for modern drivers, e.g. iw dev or iw dev wlo1 link.

44. brctl

  • Compat: Legacy; prefer ip link and bridge; Requires: bridge-utils.
  • brctl is used to set up, maintain, and inspect the Ethernet bridge configuration in the Linux kernel. Legacy: prefer ip link add name br0 type bridge and the bridge (iproute2) tooling.

Kernel & Tracing

Cheat Card

  • Kernel logs: dmesg -T -l err,crit,alert,emerg
  • Syscalls: strace -ttT -p <pid> -f -e trace=network,file
  • Modules: lsmod | head, modprobe <name> (caution), sysctl -a | grep tcp
  • Optional advanced:
    # perf (if installed)
    perf top
    perf record -g -p <pid>; perf report

    # bpftrace one-liner (Requires: bpftrace)
    bpftrace -e 'tracepoint:syscalls:sys_enter_openat { @[comm] = count(); }'

11. dmesg

  • Compat: Linux; May be restricted by kernel.dmesg_restrict; Requires: util-linux.

  • dmesg --level=<LEVEL> where <LEVEL> is:

    • emerg - system is unusable.
    • alert - action must be taken immediately.
    • crit - critical conditions.
    • err - error conditions.
    • warn - warning conditions.
    • notice - normal but significant condition.
    • info - informational.
    • debug - debug-level messages.
  • dmesg -k: print kernel messages

  • dmesg --facility=<FACILITY> (or -f <FACILITY>) where <FACILITY> is:

    • kern: Kernel messages.
    • user: User-level messages.
    • mail: Mail system.
    • daemon: System daemons.
    • auth: Security/authorization messages.
    • syslog: Internal syslogd messages.
    • lpr: Line printer subsystem.
    • news: Network news subsystem.
  • dmesg -T: human readable timestamps

12. lsmod

  • Compat: Linux; Lists modules without root; Requires: kmod.
  • Show loaded kernel modules and sizes/dependencies.
  • Quick peek: lsmod | head
  • Module info (version, params): modinfo <module>

13. modprobe

  • Compat: Linux; Root required; Caution: can destabilize systems; Requires: kmod. Adds or removes modules from the Linux kernel.
  • Load: modprobe <module>; with params: modprobe <module> key=value
  • Unload: modprobe -r <module> (fails if in use)
  • Caution: loading/unloading modules can destabilize systems; prefer persistent config and ensure module compatibility.

Disk & Filesystems

Cheat Card

  • Space/inodes: df -h and df -i; biggest dirs: du -xhd1 /path | sort -h
  • IO saturation: iostat -xz 1; per-proc IO: pidstat -d 1 or iotop -oPa
  • Devices/FS: lsblk -o NAME,TYPE,SIZE,ROTA,MOUNTPOINT,MODEL; mounts: findmnt
  • Mount ops: mount --bind olddir newdir; remount ro: mount -o remount,ro /mnt

Inventory and health

  • Device tree: lsblk -o NAME,TYPE,SIZE,FSTYPE,MOUNTPOINT,MODEL
  • Identify filesystem UUID/TYPE: blkid
  • SMART check (if supported): smartctl -H /dev/sdX and smartctl -a /dev/sdX (Requires: smartmontools)
  • NVMe info: nvme list; nvme smart-log /dev/nvme0 (Requires: nvme-cli)

Notes

  • iostat quick view (Requires: sysstat): iostat -xz 1 (watch await, %util, r/s, w/s)
  • findmnt: show mount hierarchy or lookup by target: findmnt /mount/point

14. dd (DANGER: DESTRUCTIVE — READ FIRST)

  • Compat: Linux; Root required for raw devices; Highly destructive when writing; Requires: coreutils.
  • Danger: dd will overwrite data with no confirmation. Double-check devices (e.g., /dev/sdX) and consider read-only or safer alternatives first. Use lsblk/blkid to verify targets.
  • Safer tips: for copies, consider pv to visualize throughput; for imaging, dcfldd; for testing, prefer non-destructive reads.
    # Danger: wipes target disk. Verify device with lsblk/blkid.
    dd if=/dev/zero of=/dev/sda bs=4k status=progress

    # Verify a drive is zeroed: tr drops NUL bytes, so any surviving byte is non-zero
    [ "$(dd if=/dev/sda status=none | tr -d '\0' | head -c 1 | wc -c)" -eq 0 ] && echo "All zeros"

    # Fill a file with random data (example size)
    dd if=/dev/urandom of=myfile bs=6703104 count=1 status=progress

    # Danger: clone a partition to another (same size/align). Verify both!
    dd if=/dev/sda3 of=/dev/sdb3 bs=4096 status=progress conv=fsync

    # Danger: write an image to a USB device. Verify device path first!
    dd if=/path/to/bootimage.img of=/dev/sdc bs=4M status=progress conv=fsync

    # Quick write-then-read benchmark with a scratch file (overwrites $HOME/ddtest)
    dd if=/dev/zero of="$HOME/ddtest" bs=1M count=1000 oflag=dsync status=progress
    dd if="$HOME/ddtest" of=/dev/null bs=1M status=progress   # may be served from page cache
    rm "$HOME/ddtest"

    # Sequential device read throughput sample (approx 1 GiB)
    dd if=/dev/sda of=/dev/null bs=1024k count=1024 status=progress

    # Create a swapfile (example: 8 GiB), then mkswap + swapon (root required)
    dd if=/dev/zero of=swapfile bs=1MiB count=$((8*1024)) status=progress
    chmod 600 swapfile && mkswap swapfile && swapon swapfile
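One non-destructive way to test whether data is all zeros is to delete the NUL bytes with tr -d '\0' and then check whether anything is left. The following sketch tries this on a scratch file rather than a real device. It assumes GNU dd for status=none, and the helper name all_zeros is hypothetical.

```shell
#!/bin/sh
# All-zeros check demonstrated on a scratch file instead of a real device.
# tr -d '\0' drops NUL bytes, so any byte that survives means non-zero data.
all_zeros() {
  n=$(dd if="$1" status=none | tr -d '\0' | head -c 1 | wc -c)
  [ $n -eq 0 ]   # unquoted on purpose: some wc versions pad with spaces
}

img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=64 status=none   # 64 KiB of zeros

if all_zeros "$img"; then echo "before: all zeros"; fi

# Overwrite one byte at offset 4096 without truncating the file
printf 'X' | dd of="$img" bs=1 seek=4096 conv=notrunc status=none
if all_zeros "$img"; then echo "after: all zeros"; else echo "after: non-zero data found"; fi

rm -f "$img"
```

Because head -c 1 stops after the first surviving byte, the pipeline can finish early on data that is clearly not zeroed. A fully zeroed device still has to be read end to end.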

15. jq

  • Compat: Linux; Requires: jq. What it does: parse/query/transform JSON on the command line.

  • Pretty-print: jq . file.json

  • Extract field list: jq -r '.items[].metadata.name' file.json

  • Filter by condition: jq '.[] | select(.status=="RUNNING")' file.json

  • Transform and count: jq '[.[] | .level] | group_by(.) | map({level: .[0], count: length})' file.json

  • Sort and top N: jq 'sort_by(.time) | reverse | .[0:5]' file.json

  • From journald (field values are JSON strings, so convert PRIORITY before comparing): journalctl -o json | jq -r 'select(((.PRIORITY // "7") | tonumber) <= 3) | .MESSAGE'

  • Keys and length: jq 'keys, length' file.json

16. diff

  • Compat: Linux; Requires: diffutils.
  • unified diff: diff -u old.txt new.txt
  • recursive dirs: diff -ruN dir_old dir_new
  • ignore whitespace changes: diff -u -w old new
  • handle CRLF: diff -u --strip-trailing-cr a b
  • color (if supported): diff --color=auto -u a b
  • apply a patch: patch -p1 < change.diff

17. uname

  • Compat: Linux; Requires: coreutils.
  • uname -a: print all system information (kernel name, hostname, kernel release and version, machine hardware name, and more)

18. sync/fsync

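  • Compat: Linux; sync is a user command (coreutils); fsync is a syscall.
  • fsync is a syscall that flushes a file’s in-memory data and metadata to storage. From the shell, use sync to flush all dirty data, sync <file> to fsync just that file, or sync -f <file> to flush the whole filesystem containing it (via syncfs). The file arguments need a reasonably recent coreutils.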

19. mkswap

  • Compat: Linux; Root required; Requires: util-linux.
  • -c: check if blocks are corrupted
  • -p: set pagesize

20. fsck

  • Compat: Linux; Root required; Avoid on mounted filesystems; Requires: e2fsprogs for ext*.
  • check for file system consistency:
    • The superblock is checked for inconsistencies in:
      • File system size
      • Number of inodes
      • Free-block count
      • Free-inode count
    • Each inode is checked for inconsistencies in:
      • Format and type
      • Link count
      • Duplicate block
      • Bad block numbers
      • Inode size
  • see: https://docs.oracle.com/cd/E19455-01/805-7228/6j6q7uf0e/index.html
  • Caution: avoid running fsck on a mounted filesystem (except with specific fs support); prefer read-only mounts or maintenance windows.

Extended notes

  • ext* specifics: e2fsck checks ext2/3/4; use -f to force, -n for read-only, -p for preen (auto-fix safe issues). Requires: e2fsprogs.
  • Bad blocks (DANGER): badblocks scans devices for bad sectors; write-mode is destructive. Prefer read-only first.

Examples

# Read-only badblocks scan (non-destructive)
sudo badblocks -sv /dev/sdX

# DANGER: write-mode destructive scan — data loss
sudo badblocks -wsv /dev/sdX

# ext* filesystem check (read-only)
sudo e2fsck -fn /dev/sdXN

21. mount

  • Compat: Linux; Root required unless user mounts configured; Requires: util-linux.
  • mount -a [-t type] [-O optlist]: mount all filesystems listed in fstab (except those marked noauto)
  • -o: override the settings in fstab
  • mount --bind olddir newdir: remount part of the hierarchy elsewhere
  • mount --move: move mounted tree to another place
  • Caution: --bind/--move and remounts can impact running services; ensure correct fstab for persistence and have rollback plan.

22. umount

  • Compat: Linux; Root required for system mounts; Requires: util-linux.
  • unmount from a mountpoint

23. chown

  • chown root:staff /u: change owner and group

24. sysctl

  • Compat: Linux; Root required for -w; Persistence via /etc/sysctl.d; Requires: procps.
  • configure kernel parameters at runtime
  • sysctl -a | grep "tcp"
  • Caution: sysctl -w changes take effect immediately; persist only via /etc/sysctl.d/*.conf after validation.
  • Read a key: sysctl net.ipv4.tcp_congestion_control
  • Set a key (runtime): sysctl -w vm.swappiness=10
  • Persist: create /etc/sysctl.d/99-local.conf with vm.swappiness = 10, then sysctl --system
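sysctl keys are paths under /proc/sys with dots in place of slashes, so you can also read a value directly from the corresponding file. The following small sketch uses a hypothetical helper, sysctl_path. It ignores keys whose components themselves contain dots (such as VLAN interface names like eth0.100), which need special handling described in sysctl(8).

```shell
#!/bin/sh
# Map a sysctl key to its /proc/sys file (dots become slashes).
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

p=$(sysctl_path vm.swappiness)
echo "$p"
# Reading the file shows the same value as "sysctl vm.swappiness"; no root needed.
if [ -r "$p" ]; then cat "$p"; fi
```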

25. iotop

  • Compat: Linux; Root required; Needs kernel taskstats/delay accounting; Python tool.
  • iotop -o: only show threads doing I/O
  • iotop -p <PID1>,<PID2>,...: list of processes to monitor
  • iotop -a: show accumulated IO rather than diff Requires: iotop.

26. netstat

  • Compat: Legacy; prefer ss; Requires: net-tools.
  • Deprecated in many distros; prefer ss.
  • Common mappings:
    • netstat -tulpn -> ss -tulpn
    • netstat -anp -> ss -anp
    • netstat -s -> ss -s (summary only; nstat from iproute2 shows per-protocol counters)

Processes & Scheduling

Cheat Card

  • Top offenders: ps -eo pid,ppid,user,%cpu,%mem,cmd --sort=-%cpu | head

  • Threads view: top -H or ps -Lp <pid> -o pid,tid,pcpu,comm

  • Target processes:

    # Preview before signaling
    pgrep -a <name>

    # Then send a scoped, safe signal (example: TERM)
    pkill -TERM -u <user> -f '<exact-pattern>'

  • Over time: pidstat -u 1 -p <pid> (CPU) and pidstat -d 1 (IO)

  • Find PIDs: pidof <proc>; list threads: ps -Lp <pid>

  • Niceness: start nice -n 10 cmd; adjust: renice -n 10 -p <pid>

  • Locks: /proc/locks shows current file locks (read-only)

  • Sessions: users; who; recent logins: last | head

  • Schedule: crontab -l list; crontab -e edit

27. top

  • Compat: Linux; Requires: procps.
  • Dynamic process view with CPU, memory, and load summaries.
  • Key CPU line fields: us (user), sy (system), ni (nice), id (idle), wa (iowait), hi/si (IRQ/softIRQ), st (steal).
  • Key per-proc fields: %CPU, %MEM, VIRT (virtual), RES (resident), SHR (shared), TIME+ (CPU time).
  • top -E m|g: scale as mega|giga bytes
  • top -H: thread-mode
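  • top -i: start with the idle-process toggle reversed (by default this hides tasks that used no CPU since the last refresh); press i to toggle
  • top -o RES|VIRT|SWAP, etc: sort by attribute
  • top -O: print all available field names (usable with -o) and exit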
  • top -p pid1,pid2,...: monitor only these PIDs
  • top -1: show per-CPU stats

28. vmstat

  • Compat: Linux; Requires: procps. Useful for swap-in/swap-out (si/so) information.
  • Report virtual memory statistics
  • vmstat -a: show active/inactive memory
  • vmstat -s, --stats: print a table of event counters and memory statistics

Interpretation tips

  • r runnable > number of CPUs indicates run-queue contention.
  • b blocked processes (often IO wait); correlate with %wa in top/mpstat.
  • si/so swap in/out: sustained non-zero values indicate memory pressure.
  • Use vmstat 1 for near-real-time view.
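You can approximate the first tip without vmstat. Read the procs_running and procs_blocked counters from /proc/stat, which roughly correspond to vmstat's r and b columns, and compare them with nproc. The following one-shot sketch is Linux only. A single sample is noisy and includes the sampling process itself, so vmstat 1 remains the better tool for spotting trends.

```shell
#!/bin/sh
# Snapshot of runnable/blocked task counts versus CPU count (Linux /proc/stat).
runnable=$(awk '/^procs_running/ {print $2}' /proc/stat)
blocked=$(awk '/^procs_blocked/ {print $2}' /proc/stat)
cpus=$(nproc)

echo "runnable=$runnable blocked=$blocked cpus=$cpus"
if [ "$runnable" -gt "$cpus" ]; then
  echo "more runnable tasks than CPUs: possible run-queue contention"
fi
```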

29. strace

  • Compat: Linux; May be restricted by ptrace scope; Requires: strace. Trace system calls and signals.
  • Attach to a PID: strace -ttT -p <pid> -f -e trace=network,file,fsync,clock,nanosleep
  • Run a program under strace: strace -o strace.log -s 200 -vv -f -ttT your_cmd --arg
  • Syscall time summary: strace -c -p <pid>
  • Filter a path: strace -ttT -e trace=file -P /etc/resolv.conf -p <pid>
  • Notes: -f follows forks; -ttT adds timestamps and syscall durations; -s increases string size.

30. slabtop

  • Compat: Linux; Requires: procps.
  • slabtop: display kernel slab cache information in real time
  • Sort by size: slabtop -s c; one-shot: slabtop -o

31. uptime

  • Compat: Linux; Requires: procps.
  • information about how long the system has been up, and load averages

32. htop

  • Compat: Linux; Requires: htop package.
  • like top, but prettier

33. ps

  • Compat: Linux; Requires: procps.

Cheat Card

  • Top CPU: ps -eo pid,ppid,user,%cpu,%mem,cmd --sort=-%cpu | head
  • Top RSS: ps -eo pid,user,rss,cmd --sort=-rss | head
  • Tree view: ps -ejH (or ps axjf)
  • By command: ps -C nginx -o pid,ppid,cmd,%mem,%cpu
  • Threads of a PID: ps -Lp <pid> -o pid,tid,pcpu,comm
  • ps aux: show all processes
  • ps axjf - print process tree
  • ps a - Lift the BSD-style “only yourself” restriction
  • ps -A - select all processes
  • ps -d - select all processes except session leaders
  • ps g - select all processes including session leaders
  • ps Ta - all processes associated with this terminal
  • ps r - restrict to running processes
  • ps --pid pidlist - restrict to pidlist processes
  • ps -s|--sid - select by session ID
  • ps t ttylist - select by TTY list
  • ps U|-U - select by effective user-id
  • ps s - display signals
  • ps f - ASCII art process hierarchy
  • ps ax -o rss,pid,user,pcpu,command --sort -%cpu: sort by %cpu
  • ps ax -o rss,pid,user,pcpu,command --sort -rss: sort by rss

process states:

  • D - uninterruptible sleep (usually IO)
  • I - Idle kernel thread
  • R - running or runnable (on run queue)
  • S - interruptible sleep (waiting for an event to complete)
  • T - stopped by job control signal
  • t - stopped by debugger during the tracing
  • W - paging (not valid since the 2.6.xx kernel)
  • X - dead (should never be seen)
  • Z - defunct (“zombie”) process, terminated but not reaped by its parent

see STANDARD FORMAT SPECIFIERS in man ps
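To count how many processes are in each of these states without ps, you can read the state field from /proc/<pid>/stat. The following sketch is Linux only. When procps is installed, ps -eo stat= | cut -c1 | sort | uniq -c gives a similar summary.

```shell
#!/bin/sh
# Count processes per state letter (R, S, D, Z, ...) from /proc/<pid>/stat.
# Field 2 (comm) is in parentheses and may contain spaces, so everything up to
# the last ") " is stripped before taking the state letter.
counts=$(for f in /proc/[0-9]*/stat; do
  sed 's/.*) //' "$f" 2>/dev/null | cut -d' ' -f1   # processes may exit mid-scan
done | sort | uniq -c)
printf '%s\n' "$counts"
```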

CPU

Cheat Card

  • CPU saturation: mpstat -P ALL 1 (sys/iowait/irq/soft)
  • Per-core view in top: top -1; over time per PID: pidstat -u 1 -p <pid>
  • Interrupt spikes: mpstat -I CPU 1

34. mpstat

  • Compat: Linux; Requires: sysstat.

The mpstat command writes to standard output activities for each available processor, processor 0 being the first one. Global average activities among all processors are also reported.

Interpretation tips

  • High %iowait: CPUs idle while waiting on disk IO (check iostat).
  • High %irq/%soft: heavy interrupts/softirqs (often network or storage).
  • High %steal: hypervisor stealing time (noisy neighbor in a VM).
  • Compare per-core: hotspots can be isolated to specific cores (affinity).
  • CPU: Processor number. The keyword all indicates that statistics are calculated as averages among all processors.
  • %usr: Show the percentage of CPU utilization that occurred while executing at the user level (application).
  • %nice: Show the percentage of CPU utilization that occurred while executing at the user level with nice priority.
  • %sys: Show the percentage of CPU utilization that occurred while executing at the system level (kernel). Note that this does not include time spent servicing hardware and software interrupts.
  • %iowait: Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
  • %irq: Show the percentage of time spent by the CPU or CPUs to service hardware interrupts.
  • %soft: Show the percentage of time spent by the CPU or CPUs to service software interrupts.
  • %steal: Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
  • %guest: Show the percentage of time spent by the CPU or CPUs to run a virtual processor.
  • %gnice: Show the percentage of time spent by the CPU or CPUs to run a niced guest.
  • mpstat -I: report interrupt statistics:
    • number of interrupts per CPU
    • number of times each individual interrupt occurred
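mpstat derives its percentages from deltas of the kernel's CPU time counters in /proc/stat. The following rough sketch applies the same idea to the aggregate cpu line over one second. It is not a full mpstat replacement and ignores details such as guest-time accounting.

```shell
#!/bin/sh
# Rough CPU breakdown over 1 second from the aggregate "cpu" line of /proc/stat.
# Fields used: user nice system idle iowait irq softirq steal (in clock ticks).
snap() { awk '/^cpu / {print $2, $3, $4, $5, $6, $7, $8, $9}' /proc/stat; }

set -- $(snap); u1=$1; n1=$2; s1=$3; i1=$4; w1=$5; q1=$6; sq1=$7; st1=$8
sleep 1
set -- $(snap); u2=$1; n2=$2; s2=$3; i2=$4; w2=$5; q2=$6; sq2=$7; st2=$8

total=$(( (u2 + n2 + s2 + i2 + w2 + q2 + sq2 + st2) - (u1 + n1 + s1 + i1 + w1 + q1 + sq1 + st1) ))
[ "$total" -gt 0 ] || total=1   # guard against a zero-tick interval

# Percentage of the interval spent in one category
pct() { awk -v d="$1" -v t="$total" 'BEGIN { printf "%.1f", 100 * d / t }'; }

line="user=$(pct $((u2 - u1))) nice=$(pct $((n2 - n1))) system=$(pct $((s2 - s1))) iowait=$(pct $((w2 - w1))) idle=$(pct $((i2 - i1)))"
echo "$line"
```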

Memory

Cheat Card

  • Snapshot: free -h --wide; paging: vmstat -a 1 (si/so)
  • Per-proc memory: ps -eo pid,user,rss,cmd --sort=-rss | head; deep dive: pmap -x <pid>
  • OOM evidence: dmesg -T | grep -i oom or journalctl -k -g OOM

35. free

  • Compat: Linux; Requires: procps.
  • used - Used memory (calculated as total - free - buffers - cache; newer procps-ng releases report total - available instead)
  • free - Unused memory (MemFree and SwapFree in /proc/meminfo)
  • shared - Memory used (mostly) by tmpfs (Shmem in /proc/meminfo)
  • buffers - Memory used by kernel buffers (Buffers in /proc/meminfo)
  • cache - Memory used by the page cache and slabs (Cached and SReclaimable in /proc/meminfo)
  • buff/cache - Sum of buffers and cache
  • available - Estimation of how much memory is available for starting new applications, without swapping. Unlike the data provided by the cache or free fields, this field takes into account page cache and also that not all reclaimable memory slabs will be reclaimed due to items being in use (MemAvailable in /proc/meminfo, available on kernels 3.14, emulated on kernels 2.6.27+, otherwise the same as free)
  • free -l: show low-high memory breakdown
  • free -w, --wide: show buffers and cache in separate columns

Interpretation tips

  • available approximates memory free for new apps without swapping; don’t confuse free with usable memory.
  • High buff/cache is normal; it’s the page cache and reclaimable slabs.

Examples

  • Human-readable snapshot: free -h --wide
  • Example output of free -h (without --wide, so buffers and cache share one column):
                  total        used        free      shared  buff/cache   available
    Mem:           31Gi       2.1Gi        22Gi       312Mi       7.2Gi        28Gi
    Swap:           8Gi          0B         8Gi
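You can reproduce the column definitions above from /proc/meminfo. The following sketch applies those formulas directly, with values in KiB. Your free version may compute used differently, so expect small differences from its output.

```shell
#!/bin/sh
# Recompute free(1)-style columns (KiB) from /proc/meminfo using the definitions above:
#   cache = Cached + SReclaimable, used = total - free - buffers - cache
summary=$(awk '
  { v[$1] = $2 }   # keys keep their colon, e.g. "MemTotal:"
  END {
    cache = v["Cached:"] + v["SReclaimable:"]
    used = v["MemTotal:"] - v["MemFree:"] - v["Buffers:"] - cache
    printf "total=%d used=%d free=%d buff/cache=%d available=%d\n",
      v["MemTotal:"], used, v["MemFree:"], v["Buffers:"] + cache, v["MemAvailable:"]
  }' /proc/meminfo)
printf '%s\n' "$summary"
```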

36. sar

  • Compat: Linux; Requires: sysstat; history needs sadc enabled.

Cheat Card

  • CPU load/queue: sar -q 1 5; memory: sar -r 1 5

  • IO bw/ops: sar -b 1 5; per-device: sar -d 1 5 (watch await, %util)

  • Network: sar -n DEV 1 5; TCP: sar -n TCP,ETCP 1 5

  • Paging: sar -B 1 5 (pgsteal, pgscan, majflt/s). Requires: sysstat (includes pidstat).

Field reference

  • sar -B: report paging stats

    • pgpgin/s - Total number of kilobytes the system paged in from disk per second.
    • pgpgout/s - Total number of kilobytes the system paged out to disk per second.
    • fault/s - Number of page faults (major + minor) made by the system per second. This is not a count of page faults that generate I/O, because some page faults can be resolved without I/O.

    • majflt/s - Number of major faults the system has made per second, those which have required loading a memory page from disk.
    • pgfree/s - Number of pages placed on the free list by the system per second.
    • pgscank/s - Number of pages scanned by the kswapd daemon per second.
    • pgscand/s - Number of pages scanned directly per second.
    • pgsteal/s - Number of pages the system has reclaimed from cache (pagecache and swapcache) per second to satisfy its memory demands.
    • %vmeff - Calculated as pgsteal / pgscan, this is a metric of the efficiency of page reclaim. If it is near 100% then almost every page coming off the tail of the inactive list is being reaped. If it gets too low (e.g. less than 30%) then the virtual memory is having some difficulty. This field is displayed as zero if no pages have been scanned during the interval of time.
  • sar -b: Report I/O and transfer rate statistics.

    • tps - Total number of transfers per second that were issued to physical devices. A transfer is an I/O request to a physical device. Multiple logical requests can be combined into a single I/O request to the device. A transfer is of indeterminate size.
    • rtps - Total number of read requests per second issued to physical devices.
    • wtps - Total number of write requests per second issued to physical devices.
    • bread/s - Total amount of data read from the devices in blocks per second. Blocks are equivalent to sectors and therefore have a size of 512 bytes.
    • bwrtn/s - Total amount of data written to devices in blocks per second.
  • sar -d: report activity for each block device

    • tps - Total number of transfers per second that were issued to physical devices. A transfer is an I/O request to a physical device. Multiple logical requests can be combined into a single I/O request to the device. A transfer is of indeterminate size.

    • rkB/s - Number of kilobytes read from the device per second.
    • wkB/s - Number of kilobytes written to the device per second.
    • areq-sz - The average size (in kilobytes) of the I/O requests that were issued to the device. Note: In previous versions, this field was known as avgrq-sz and was expressed in sectors.
    • aqu-sz - The average queue length of the requests that were issued to the device. Note: In previous versions, this field was known as avgqu-sz.
    • await - The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
    • svctm - The average service time (in milliseconds) for I/O requests that were issued to the device. Warning! Do not trust this field any more. This field will be removed in a future sysstat version.
    • %util - Percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits.
  • sar -F: display statistics for currently mounted filesystems:

    • MBfsfree - Total amount of free space in megabytes (including space available only to privileged user).
    • MBfsused - Total amount of space used in megabytes.
    • %fsused - Percentage of filesystem space used, as seen by a privileged user.
    • %ufsused - Percentage of filesystem space used, as seen by an unprivileged user.
    • Ifree - Total number of free file nodes in filesystem.
    • Iused - Total number of file nodes used in filesystem.
    • %Iused - Percentage of file nodes used in filesystem.
  • sar -m <keyword>: power management statistics (keywords include CPU, FAN, FREQ, IN and USB, or ALL):

    With the CPU keyword, statistics about the CPU are reported. The following value is displayed:

    • MHz - Instantaneous CPU clock frequency in MHz.

    With the FAN keyword, statistics about fan speed are reported. The following values are displayed:

    • rpm - Fan speed expressed in revolutions per minute.
    • drpm - This field is calculated as the difference between current fan speed (rpm) and its low limit (fan_min).
    • DEVICE - Sensor device name.

    With the FREQ keyword, statistics about CPU clock frequency are reported. The following value is displayed:

    • wghMHz - Weighted average CPU clock frequency in MHz. Note that the cpufreq-stats driver must be compiled in the kernel for this option to work.

    With the IN keyword, statistics about voltage inputs are reported. The following values are displayed:

    • inV - Voltage input expressed in Volts.
    • %in - Relative input value. A value of 100% means that voltage input has reached its high limit (in_max) whereas a value of 0% means that it has reached its low limit (in_min).
    • DEVICE - Sensor device name.

    With the USB keyword, the sar command takes a snapshot of all the USB devices currently plugged into the system. At the end of the report, sar will display a summary of all those USB devices. The following values are displayed:

    • BUS - Root hub number of the USB device.
    • idvendor - Vendor ID number (assigned by USB organization).
    • idprod - Product ID number (assigned by Manufacturer).
    • maxpower - Maximum power consumption of the device (expressed in mA).
    • manufact - Manufacturer name.
    • product - Product name.
  • sar -n DEV:

    • IFACE - Name of the network interface for which statistics are reported.
    • rxpck/s - Total number of packets received per second.
    • txpck/s - Total number of packets transmitted per second.
    • rxkB/s - Total number of kilobytes received per second.
    • txkB/s - Total number of kilobytes transmitted per second.
    • rxcmp/s - Number of compressed packets received per second (for cslip etc.).
    • txcmp/s - Number of compressed packets transmitted per second.
    • rxmcst/s - Number of multicast packets received per second.
    • %ifutil - Utilization percentage of the network interface. For half-duplex interfaces, utilization is calculated using the sum of rxkB/s and txkB/s as a percentage of the interface speed. For full-duplex interfaces, it is the greater of rxkB/s or txkB/s as a percentage of the interface speed.
  • sar -n EDEV:

    • IFACE - Name of the network interface for which statistics are reported.
    • rxerr/s - Total number of bad packets received per second.
    • txerr/s - Total number of errors that happened per second while transmitting packets.
    • coll/s - Number of collisions that happened per second while transmitting packets.
    • rxdrop/s - Number of received packets dropped per second because of a lack of space in Linux buffers.
    • txdrop/s - Number of transmitted packets dropped per second because of a lack of space in Linux buffers.
    • txcarr/s - Number of carrier-errors that happened per second while transmitting packets.
    • rxfram/s - Number of frame alignment errors that happened per second on received packets.
    • rxfifo/s - Number of FIFO overrun errors that happened per second on received packets.
    • txfifo/s - Number of FIFO overrun errors that happened per second on transmitted packets.
  • sar -n ICMP:

    • imsg/s - The total number of ICMP messages which the entity received per second [icmpInMsgs]. Note that this counter includes all those counted by ierr/s.
    • omsg/s - The total number of ICMP messages which this entity attempted to send per second [icmpOutMsgs]. Note that this counter includes all those counted by oerr/s.
    • iech/s - The number of ICMP Echo (request) messages received per second [icmpInEchos].
    • iechr/s - The number of ICMP Echo Reply messages received per second [icmpInEchoReps].
    • oech/s - The number of ICMP Echo (request) messages sent per second [icmpOutEchos].
    • oechr/s - The number of ICMP Echo Reply messages sent per second [icmpOutEchoReps].
    • itm/s - The number of ICMP Timestamp (request) messages received per second [icmpInTimestamps].
    • itmr/s - The number of ICMP Timestamp Reply messages received per second [icmpInTimestampReps].
    • otm/s - The number of ICMP Timestamp (request) messages sent per second [icmpOutTimestamps].
    • otmr/s - The number of ICMP Timestamp Reply messages sent per second [icmpOutTimestampReps].
    • iadrmk/s - The number of ICMP Address Mask Request messages received per second [icmpInAddrMasks].
    • oadrmk/s - The number of ICMP Address Mask Request messages sent per second [icmpOutAddrMasks].
    • oadrmkr/s - The number of ICMP Address Mask Reply messages sent per second [icmpOutAddrMaskReps].
  • sar -n EICMP: Extended ICMP stats (errors, dest unreachable, time exceeded). Focus on spikes in ierr/s and oerr/s, and on patterns in unreachable/time-exceeded messages when debugging path issues.

  • sar -n EIP: Extended IPv4 stats (header errors, addr errors, discards, no routes, reassembly, fragment fails). Use to spot header errors and routing/no-route conditions.

  • sar -n IP6: IPv6 per-protocol counters (receive/deliver/forward, multicast, fragmentation). Check for anomalies similar to IPv4.

  • sar -n EIP6: Extended IPv6 errors and routing stats (header/addr errors, discards, no routes, reassembly/frag). Useful for IPv6-specific troubleshooting.

  • sar -n SOCK:

    • totsck - Total number of sockets used by the system.
    • tcpsck - Number of TCP sockets currently in use.
    • tcp-tw - Number of TCP sockets in TIME_WAIT state.
  • sar -n SOFT:

    • total/s - The total number of network frames processed per second.
    • dropd/s - The total number of network frames dropped per second because there was no room on the processing queue.
    • squeezd/s - The number of times the softirq handler function terminated per second because its budget was consumed or the time limit was reached, but more work could have been done.
    • rx_rps/s - The number of times the CPU has been woken up per second to process packets via an inter-processor interrupt.
    • flw_lim/s - The number of times the flow limit has been reached per second. Flow limiting is an optional RPS feature that can be used to limit the number of packets queued to the backlog for each flow to a certain amount. This can help ensure that smaller flows are processed even though much larger flows are pushing packets in.
  • sar -n TCP:

    • active/s - The number of times TCP connections have made a direct transition to the SYN-SENT state from the CLOSED state per second [tcpActiveOpens].
    • passive/s - The number of times TCP connections have made a direct transition to the SYN-RCVD state from the LISTEN state per second [tcpPassiveOpens].
    • iseg/s - The total number of segments received per second, including those received in error [tcpInSegs]. This count includes segments received on currently established connections.
    • oseg/s - The total number of segments sent per second, including those on current connections but excluding those containing only retransmitted octets [tcpOutSegs].
  • sar -n ETCP:

    • atmptf/s - The number of times per second TCP connections have made a direct transition to the CLOSED state from either the SYN-SENT state or the SYN-RCVD state, plus the number of times per second TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state [tcpAttemptFails].
    • estres/s - The number of times per second TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state [tcpEstabResets].
    • retrans/s - The total number of segments retransmitted per second - that is, the number of TCP segments transmitted containing one or more previously transmitted octets [tcpRetransSegs].
    • isegerr/s - The total number of segments received in error (e.g., bad TCP checksums) per second [tcpInErrs].
    • orsts/s - The number of TCP segments sent per second containing the RST flag [tcpOutRsts].
  • sar -n UDP:

    • idgm/s - The total number of UDP datagrams delivered per second to UDP users [udpInDatagrams].
    • odgm/s - The total number of UDP datagrams sent per second from this entity [udpOutDatagrams].
    • noport/s - The total number of received UDP datagrams per second for which there was no application at the destination port [udpNoPorts].
    • idgmerr/s - The number of received UDP datagrams per second that could not be delivered for reasons other than the lack of an application at the destination port [udpInErrors].
  • sar -n UDP6:

    • idgm6/s - The total number of UDP datagrams delivered per second to UDP users [udpInDatagrams].
    • odgm6/s - The total number of UDP datagrams sent per second from this entity [udpOutDatagrams].
    • noport6/s - The total number of received UDP datagrams per second for which there was no application at the destination port [udpNoPorts].
    • idgmer6/s - The number of received UDP datagrams per second that could not be delivered for reasons other than the lack of an application at the destination port [udpInErrors].
  • sar -q:

    • runq-sz - Run queue length (number of tasks waiting for run time).
    • plist-sz - Number of tasks in the task list.
    • ldavg-1 - System load average for the last minute. The load average is an exponentially damped moving average of the number of runnable or running tasks (R state) plus the number of tasks in uninterruptible sleep (D state).
    • ldavg-5 - System load average for the past 5 minutes.
    • ldavg-15 - System load average for the past 15 minutes.
    • blocked - Number of tasks currently blocked, waiting for I/O to complete.
  • sar -r:

    • kbmemfree - Amount of free memory available in kilobytes.
    • kbavail - Estimate of how much memory in kilobytes is available for starting new applications, without swapping. The estimate takes into account that the system needs some page cache to function well, and that not all reclaimable memory slabs will be reclaimable, due to items being in use. The impact of those factors will vary from system to system.
    • kbmemused - Amount of used memory in kilobytes (calculated as total installed memory - kbmemfree - kbbuffers - kbcached - kbslab).
    • %memused - Percentage of used memory.
    • kbbuffers - Amount of memory used as buffers by the kernel in kilobytes.
    • kbcached - Amount of memory used to cache data by the kernel in kilobytes.
    • kbcommit - Amount of memory in kilobytes needed for the current workload. This is an estimate of how much RAM/swap is needed to guarantee that the system never runs out of memory.
    • %commit - Percentage of memory needed for the current workload in relation to the total amount of memory (RAM+swap). This number may be greater than 100% because the kernel usually overcommits memory.
    • kbactive - Amount of active memory in kilobytes (memory that has been used more recently and is usually not reclaimed unless absolutely necessary).
    • kbinact - Amount of inactive memory in kilobytes (memory that has been used less recently and is more eligible to be reclaimed for other purposes).
    • kbdirty - Amount of memory in kilobytes waiting to get written back to the disk.
    • kbanonpg - Amount of non-file backed pages in kilobytes mapped into userspace page tables.
    • kbslab - Amount of memory in kilobytes used by the kernel to cache data structures for its own use.
    • kbkstack - Amount of memory in kilobytes used for kernel stack space.
    • kbpgtbl - Amount of memory in kilobytes dedicated to the lowest level of page tables.
    • kbvmused - Amount of used virtual address space in kilobytes.
  • sar -S:

    • kbswpfree - Amount of free swap space in kilobytes.
    • kbswpused - Amount of used swap space in kilobytes.
    • %swpused - Percentage of used swap space.
    • kbswpcad - Amount of cached swap memory in kilobytes. This is memory that was swapped out, later swapped back in, and is still present in the swap area. If that memory has to be evicted again, it does not need to be written out, because a copy is already in swap. This saves I/O.
    • %swpcad - Percentage of cached swap memory in relation to the amount of used swap space.
  • sar -u (the %usr, %sys, %irq, %soft, %guest and %gnice columns appear with sar -u ALL):

    • %user - Percentage of CPU utilization that occurred while executing at the user level (application). Note that this field includes time spent running virtual processors.
    • %usr - Percentage of CPU utilization that occurred while executing at the user level (application). Note that this field does NOT include time spent running virtual processors.
    • %nice - Percentage of CPU utilization that occurred while executing at the user level with nice priority.
    • %system - Percentage of CPU utilization that occurred while executing at the system level (kernel). Note that this field includes time spent servicing hardware and software interrupts.
    • %sys - Percentage of CPU utilization that occurred while executing at the system level (kernel). Note that this field does NOT include time spent servicing hardware or software interrupts.
    • %iowait - Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
    • %steal - Percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
    • %irq - Percentage of time spent by the CPU or CPUs to service hardware interrupts.
    • %soft - Percentage of time spent by the CPU or CPUs to service software interrupts.
    • %guest - Percentage of time spent by the CPU or CPUs to run a virtual processor.
    • %gnice - Percentage of time spent by the CPU or CPUs to run a niced guest.
    • %idle - Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.

  • sar -v:

    • dentunusd - Number of unused cache entries in the directory cache.
    • file-nr - Number of file handles used by the system.
    • inode-nr - Number of inode handlers used by the system.
    • pty-nr - Number of pseudo-terminals used by the system.
  • sar -W: Report swapping statistics. The following values are displayed:

    • pswpin/s - Total number of swap pages the system brought in per second.
    • pswpout/s - Total number of swap pages the system brought out per second.
  • sar -w: Report task creation and system switching activity.

    • proc/s - Total number of tasks created per second.
    • cswch/s - Total number of context switches per second.
  • sar -y: Report TTY devices activity. The following values are displayed:

    • rcvin/s - Number of receive interrupts per second for current serial line. Serial line number is given in the TTY column.
    • xmtin/s - Number of transmit interrupts per second for current serial line.
    • framerr/s - Number of frame errors per second for current serial line.
    • prtyerr/s - Number of parity errors per second for current serial line.
    • brk/s - Number of breaks per second for current serial line.
    • ovrun/s - Number of overrun errors per second for current serial line.
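The %ifutil rule described under sar -n DEV can be checked by hand. This is a minimal sketch, not sysstat's own code. It assumes rxkB/s and txkB/s use 1 kB = 1024 bytes and that the link speed is given in Mbit/s, which is the unit of /sys/class/net/<iface>/speed. `ifutil` is a hypothetical helper name.

```shell
# ifutil RXKB TXKB SPEED_MBPS DUPLEX  (hypothetical helper)
# RXKB/TXKB: kilobytes per second (assumed 1 kB = 1024 bytes).
# SPEED_MBPS: link speed in Mbit/s. DUPLEX: "full" or "half".
# Prints utilization as a percentage of the link speed.
ifutil() {
  awk -v rx="$1" -v tx="$2" -v speed="$3" -v duplex="$4" 'BEGIN {
    rx_bits = rx * 1024 * 8
    tx_bits = tx * 1024 * 8
    if (duplex == "full") {
      used = (rx_bits > tx_bits) ? rx_bits : tx_bits  # busier direction only
    } else {
      used = rx_bits + tx_bits                        # both directions share the medium
    }
    printf "%.2f\n", 100 * used / (speed * 1000000)
  }'
}

# Live inputs: speed and duplex come from sysfs, rates from sar -n DEV
#   ifutil <rxkB/s> <txkB/s> "$(cat /sys/class/net/<iface>/speed)" "$(cat /sys/class/net/<iface>/duplex)"
```

For example, `ifutil 1000 500 100 full` prints 8.19. The same traffic on a half-duplex link prints 12.29, because both directions then count against the shared link.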

45. pidstat

  • Compat: Linux; Requires: sysstat.
  • Monitors individual tasks currently being managed by the Linux kernel.

Cheat Card

  • CPU by PID: pidstat -u 1 -p <pid> (watch %usr, %system, %wait)
  • IO by PID: pidstat -d 1 -p <pid> (check kB_rd/s, kB_wr/s, iodelay)
  • Memory faults: pidstat -r 1 -p <pid> (watch majflt/s)
  • Threads: pidstat -t -u 1 -p <pid>
  • pidstat -d: key fields are kB_rd/s, kB_wr/s, iodelay (block I/O delay of the task, in clock ticks) and kB_ccwr/s (cancelled writes).
  • pidstat -R: report realtime priority and scheduling policy information. Key fields: prio, policy.
  • pidstat -r: report page faults and memory utilization. Per task, the key fields are majflt/s (major faults), RSS and %MEM. For tasks and all their children (-T CHILD), minflt-nr and majflt-nr summarize faults.
  • pidstat -s: report stack utilization. Key fields: StkRef (used), StkSize (reserved).
  • pidstat -t: also display statistics for threads associated with selected tasks (lists processes and their threads).
  • pidstat -u: report CPU utilization. Per task, the key fields are %usr, %system, %wait, %CPU and CPU. For tasks and all their children (-T CHILD), usr-ms, system-ms and guest-ms summarize CPU time.
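pidstat -r reports faults as rates. The kernel's per-process counters behind those rates are cumulative and live in /proc/<pid>/stat: minflt is field 10 and majflt is field 12 per proc(5). Here is a minimal sketch for reading them without sysstat, assuming that field layout. `faults_from_stat` is a hypothetical helper. The command name in field 2 can contain spaces, so the sketch strips everything up to the last ")" before splitting.

```shell
# faults_from_stat (hypothetical helper): read one /proc/<pid>/stat line on
# stdin and print "minflt majflt" (cumulative counts since the task started).
faults_from_stat() {
  # Drop "pid (comm) " up to the LAST ") " so spaces in comm cannot shift
  # fields; what remains starts at field 3 (state), so field N is $(N-2).
  sed 's/^.*) //' | awk '{ print $8, $10 }'
}

# Usage: major faults over one second for a PID
#   a=$(faults_from_stat < /proc/<pid>/stat); sleep 1
#   b=$(faults_from_stat < /proc/<pid>/stat)
#   echo "majflt/s: $(( ${b#* } - ${a#* } ))"
```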

46. lsof

  • Compat: Linux; May require root to see all descriptors; Requires: lsof.
    # List all open files
    lsof

    # Processes using a file? (fuser equivalent)
    lsof /path/to/file

    # Open files within a directory
    lsof +D /path

    # Files by user
    lsof -u name
    lsof -u name1,name2
    lsof -u name1 -u name2

    # By program name
    lsof -c apache

    # AND'ing selection conditions (-a; by default lsof ORs them)
    lsof -a -u www-data -c apache

    # By pid
    lsof -p 1

    # Except certain pids
    lsof -p ^1

    # TCP and UDP connections
    lsof -i
    lsof -i tcp # TCP connections
    lsof -i udp # UDP connections

    # By port
    lsof -i :25
    lsof -i :smtp
    lsof -i udp:53
    lsof -i tcp:80

    # All network activity by a user
    lsof -a -u name1 -i

    lsof -N # NFS use
    lsof -U # UNIX domain socket use

    # List PIDs
    lsof -t -i
    # Danger: broad kill; preview and scope carefully before use
    kill -9 $(lsof -t -i) # Kill all programs w/network activity
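For scripting, lsof's field output (-F) is easier to parse than its columns. Each process set starts with a p<PID> line, and the c (command) and n (name) fields follow when they are requested. Here is a minimal sketch that flattens `lsof -nP -i -F pcn` into one "PID COMMAND NAME" line per open file. `lsof_flatten` is a hypothetical helper.

```shell
# lsof_flatten (hypothetical helper): read "lsof -F pcn" output on stdin
# and print one "PID COMMAND NAME" line per open file.
lsof_flatten() {
  awk '
    /^p/ { pid = substr($0, 2) }              # start of a process set
    /^c/ { cmd = substr($0, 2) }              # command name for that process
    /^n/ { print pid, cmd, substr($0, 2) }    # one line per file name
  '
}

# Usage (root may be needed to see every process):
#   lsof -nP -i -F pcn | lsof_flatten | sort -u
```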


51. pmap

  • Compat: Linux; Requires: procps; -X needs procps-ng.
  • pmap -X <pid>: show Address, Perm, Offset, Device, Inode, Size, Rss, Pss, Referenced, Anonymous, LazyFree, ShmemPmdMapped, Shared_Hugetlb, Private_Hugetlb, Swap, SwapPss, Locked, THPeligible, Mapping.
  • Common recipes:
    • Largest mappings first: pmap -x <pid> | sort -nrk 3 | head (by RSS KB)
    • Totals summary: pmap <pid> (last line shows total)

52. blktrace

  • Compat: Linux; Root required; Needs kernel block trace support; Requires: blktrace.
  • blktrace is a block layer I/O tracing mechanism that provides detailed information about request queue operations up to user space. There are three major components: a kernel component, a utility (blktrace) that records I/O trace information from the kernel to user space, and utilities (such as blkparse) that analyse and display the trace information.
    # Trace block I/O on /dev/sda and parse
    sudo blktrace -d /dev/sda -o - | blkparse -i -
    Example output:
    CPU0 (8,0):
    Reads Queued: 385, 1540KiB Writes Queued: 0, 0KiB
    Read Dispatches: 75, 1544KiB Write Dispatches: 4, 16KiB
    Reads Requeued: 0 Writes Requeued: 0
    Reads Completed: 681, 15168KiB Writes Completed: 42, 1208KiB
    Read Merges: 315, 1260KiB Write Merges: 0, 0KiB
    Read depth: 84 Write depth: 21
    IO unplugs: 63 Timer unplugs: 0
    CPU1 (8,0):
    Reads Queued: 406, 1624KiB Writes Queued: 13, 996KiB
    Read Dispatches: 71, 1620KiB Write Dispatches: 10, 992KiB
    Reads Requeued: 1 Writes Requeued: 0
    Reads Completed: 0, 0KiB Writes Completed: 0, 0KiB
    Read Merges: 336, 1344KiB Write Merges: 2, 200KiB
    Read depth: 84 Write depth: 21
    IO unplugs: 68 Timer unplugs: 0
    CPU2 (8,0):
    Reads Queued: 1531, 6152KiB Writes Queued: 30, 120KiB
    Read Dispatches: 257, 6152KiB Write Dispatches: 3, 108KiB
    Reads Requeued: 0 Writes Requeued: 0
    Reads Completed: 0, 0KiB Writes Completed: 0, 0KiB
    Read Merges: 1277, 5108KiB Write Merges: 24, 96KiB
    Read depth: 84 Write depth: 21
    IO unplugs: 255 Timer unplugs: 0
    CPU3 (8,0):
    Reads Queued: 1266, 5852KiB Writes Queued: 23, 92KiB
    Read Dispatches: 279, 5852KiB Write Dispatches: 21, 92KiB
    Reads Requeued: 0 Writes Requeued: 0
    Reads Completed: 0, 0KiB Writes Completed: 0, 0KiB
    Read Merges: 987, 3948KiB Write Merges: 2, 8KiB
    Read depth: 84 Write depth: 21
    IO unplugs: 279 Timer unplugs: 1

    Total (8,0):
    Reads Queued: 3588, 15168KiB Writes Queued: 66, 1208KiB
    Read Dispatches: 682, 15168KiB Write Dispatches: 38, 1208KiB
    Reads Requeued: 1 Writes Requeued: 0
    Reads Completed: 681, 15168KiB Writes Completed: 42, 1208KiB
    Read Merges: 2915, 11660KiB Write Merges: 28, 304KiB
    IO unplugs: 665 Timer unplugs: 1
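The totals are easier to read as ratios. In the Total (8,0) block above, 2915 of 3588 queued reads were merged into existing requests (about 81%), and only 682 read requests were dispatched to the device. Here is a minimal sketch that extracts that read-merge ratio from blkparse's summary. It assumes the "Reads Queued:" and "Read Merges:" labels shown above. `read_merge_ratio` is a hypothetical helper.

```shell
# read_merge_ratio (hypothetical helper): read blkparse summary output on
# stdin and print the percentage of queued reads that were merged,
# using only the "Total" block.
read_merge_ratio() {
  awk '
    $1 == "Total" { in_total = 1 }                   # skip per-CPU blocks
    in_total && /Reads Queued:/ { queued = $3 + 0 }  # "3588," -> 3588
    in_total && /Read Merges:/  { merged = $3 + 0 }
    END {
      if (queued > 0) { printf "%.1f\n", 100 * merged / queued } else { print "n/a" }
    }'
}

# Usage: trace for 10 seconds, then summarize
#   sudo blktrace -d /dev/sda -w 10 -o - | blkparse -i - | read_merge_ratio
```

On the sample output above, this prints 81.2.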

53. btrace

  • Compat: Linux; Wrapper script from blktrace; Root required.
  • The btrace script provides a quick and easy way to do live tracing of block devices. It calls blktrace on the specified devices and pipes the output through blkparse for formatting. See blktrace(8) for more in-depth information about how blktrace works.
  • Usage: btrace /dev/sda (as root). Requires: blktrace.

54. tr

  • Compat: Linux; Requires: coreutils.
  • Translate, squeeze, and/or delete characters from standard input, writing to standard output.
  • tr '\n' ',': convert new lines to commas
  • squeeze repeats: tr -s ' ' < file (collapse runs of spaces)
  • delete chars: tr -d '\r' < file (remove CR)
  • keep only printable: tr -cd '[:print:]\n' < file
  • case convert: tr '[:upper:]' '[:lower:]' < file
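The recipes above combine naturally in a pipeline. Here is a short sketch that normalizes text copied from Windows (CRLF line endings, runs of spaces, mixed case), using only the tr options listed. `normalize_text` is a hypothetical name.

```shell
# normalize_text (hypothetical helper): read stdin, write a normalized copy.
normalize_text() {
  tr -d '\r' |                   # drop carriage returns (CRLF -> LF)
    tr -s ' ' |                  # squeeze runs of spaces into one
    tr '[:upper:]' '[:lower:]'   # lower-case everything
}

# Example: printf 'Hello   WORLD\r\nFoo  Bar\r\n' | normalize_text
```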

55. cut

  • Compat: Linux; Requires: coreutils.
  • select CSV fields: cut -d, -f1,3 file.csv
  • ranges: cut -d: -f1-3 /etc/passwd
  • bytes/chars: cut -b1-10 file; cut -c1-20 file
  • complement: cut -d, -f1 --complement file.csv
  • with headers: head -1 file.csv | tr , '\n' | cat -n shows the index of each column
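This is a self-contained demo of field and character selection. It runs against a hypothetical two-line sample in /etc/passwd format rather than the real file. The variable and function names are illustrative.

```shell
# Hypothetical sample in /etc/passwd format, so the demo does not depend
# on local accounts.
passwd_sample='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/zsh'

# Fields 1 and 7 (login and shell); cut keeps the ":" between them.
users_and_shells() {
  printf '%s\n' "$passwd_sample" | cut -d: -f1,7
}

# Character range: the first 5 characters of each line.
first_chars() {
  printf '%s\n' "$passwd_sample" | cut -c1-5
}
```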

56. xargs

  • Compat: Linux; Requires: findutils; GNU -r may vary on BusyBox.
  • Build and run argument lists; combine with find and null-terminated records for safety.
  • safe null delim: find . -type f -name '*.log' -print0 | xargs -0 rm -f
  • one argument per call: xargs -n 1 echo; with a placeholder: xargs -I{} sh -c 'echo "$1"' sh {} (pass {} as an argument rather than inside the script, to avoid shell injection)
  • parallelism: xargs -P 4 -n 1 cmd (run 4 at a time)
  • interactive confirm: xargs -p rm (ask before each command line)
  • do nothing on empty input: xargs -r cmd (GNU)
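The -print0/-0 pair matters because of default whitespace splitting. Without it, a file named "a b.log" reaches rm as two arguments, "a" and "b.log". Here is a minimal sketch in a throwaway temporary directory (the file names are hypothetical) that deletes the .log files safely and leaves the rest alone.

```shell
# Scratch directory with an awkward file name (hypothetical files).
demo_dir=$(mktemp -d)
touch "$demo_dir/a b.log" "$demo_dir/plain.log" "$demo_dir/keep.txt"

# NUL-delimited names survive spaces; "--" stops rm from reading names as options.
find "$demo_dir" -type f -name '*.log' -print0 | xargs -0 rm -f --

# Only keep.txt should be left.
remaining=$(find "$demo_dir" -type f | wc -l)
echo "remaining files: $remaining"
```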

Logs & Systemd

Cheat Card

  • Unit status: systemctl status <unit>; failed: systemctl list-units --failed
  • Hot errors: journalctl -xeu <unit>; follow: journalctl -fu <unit>
  • Boot scoping: journalctl -b and -b -1; size: journalctl --disk-usage

Systemd basics

# Unit status and enablement
systemctl status <unit>
systemctl is-active <unit>
systemctl is-enabled <unit>

# Failed units overview
systemctl list-units --failed
journalctl -xe # jump to the newest entries, with explanatory catalog text

# Restart and verify logs from this boot
systemctl restart <unit>
journalctl -u <unit> -b -n 50

Journal essentials

# Recent errors for a unit and live follow
journalctl -xeu <unit>
journalctl -fu <unit>

# Time window and priority
journalctl -u <unit> --since "1 hour ago" --until now
journalctl -p err..alert -b

# Previous boot
journalctl -b -1

# JSON output piped to jq (Requires: jq)
journalctl -u <unit> -o json | jq -r '.MESSAGE'

Journal management

# Disk usage and vacuum
journalctl --disk-usage
journalctl --vacuum-size=1G
journalctl --vacuum-time=7d

# Make logs persistent (requires root; edit journald.conf)
# /etc/systemd/journald.conf: set Storage=persistent
systemctl restart systemd-journald

# Tip: tune RateLimitIntervalSec/RateLimitBurst to manage log storms
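The persistence setting above can also live in a drop-in file. journald reads drop-ins from /etc/systemd/journald.conf.d/ after the main journald.conf (see journald.conf(5)), so the packaged file stays untouched. This is a minimal sketch with several assumptions:

- The file name 50-persistent.conf is illustrative.
- The size and rate-limit values are illustrative.
- The target directory is a parameter, so the hypothetical helper `write_journald_dropin` can be tried in a scratch directory first.

```shell
# write_journald_dropin [DIR] (hypothetical helper): write a journald drop-in
# into DIR (default: the system drop-in directory, which requires root).
write_journald_dropin() {
  dir=${1:-/etc/systemd/journald.conf.d}
  mkdir -p "$dir" || return 1
  cat > "$dir/50-persistent.conf" <<'EOF'
[Journal]
# Keep logs across reboots (stored under /var/log/journal)
Storage=persistent
# Illustrative limits; tune for the host
SystemMaxUse=1G
RateLimitIntervalSec=30s
RateLimitBurst=10000
EOF
}

# Apply on a real system (as root), then restart journald:
#   write_journald_dropin && systemctl restart systemd-journald
```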

Resolved (DNS)

# Overall resolver status
resolvectl status

# Query using systemd-resolved
resolvectl query example.com

# Flush caches
resolvectl flush-caches

Security & Audit

Cheat Card

  • SELinux mode: getenforce; recent denials: ausearch -m AVC -ts recent
  • AppArmor status: aa-status; set complain/enforce on a profile
  • Audit rule example: auditctl -w /etc/ssh/sshd_config -p wa -k sshcfg

SELinux

    # Current mode and temporary permissive (diagnostic; requires root)
    getenforce
    setenforce 0 # Caution: reduces enforcement

    # Contexts and recent denials
    ls -Z
    ps -eZ | head
    ausearch -m AVC -ts recent
    journalctl -t setroubleshoot

    # Manage booleans (example: allow httpd network connect)
    getsebool -a | grep httpd
    setsebool -P httpd_can_network_connect on
    # Requires: selinux-utils/policycoreutils; setroubleshoot (optional)

AppArmor

    # Status and service
    aa-status
    systemctl status apparmor

    # Toggle a profile mode
    aa-complain /path/to/bin
    aa-enforce /path/to/bin
    # Requires: apparmor-utils

Auditd

    # Service and rules
    systemctl status auditd
    auditctl -l

    # Search recent denials / by PID
    ausearch -m avc -ts recent
    ausearch -p <pid> -ts recent

    # Watch a file for writes/attr changes (key: sshcfg)
    auditctl -w /etc/ssh/sshd_config -p wa -k sshcfg

    # Summary report
    aureport --summary -ts today
    # Requires: auditd (auditd, auditctl, ausearch, aureport)

Containers & Namespaces

Cheat Card

  • Enter container namespace: nsenter --target <pid> --mount --uts --ipc --net --pid -- bash

  • Docker triage: docker ps; docker logs --tail=200 -f <id>; docker exec -it <id> sh

  • K8s triage: kubectl get pods -A; kubectl describe pod <pod> -n <ns>; kubectl logs <pod> -n <ns> --previous

nsenter (enter namespaces of a PID)

    # Get target PID (e.g., container process)
    pidof <proc>

    # Enter multiple namespaces of a PID
    nsenter --target <pid> --mount --uts --ipc --net --pid -- bash

    # Inspect and chroot-like into the process rootfs (run from the host:
    # inside the target's mount namespace, /proc/<pid> may not exist)
    ls -l /proc/<pid>/root
    chroot /proc/<pid>/root bash # or sh
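Before entering, it can help to confirm which namespaces the target actually has apart from the host. Each entry in /proc/<pid>/ns is a link such as net:[4026531992], and two processes share that namespace when the links match (see namespaces(7)). Here is a minimal sketch. `ns_diff` is a hypothetical helper, and reading another user's /proc/<pid>/ns generally requires root.

```shell
# ns_diff PID_A PID_B (hypothetical helper): print each namespace type whose
# link differs between the two processes; no output means all are shared.
ns_diff() {
  for ns in mnt uts ipc net pid user cgroup; do
    a=$(readlink "/proc/$1/ns/$ns" 2>/dev/null) || continue  # missing or unreadable
    b=$(readlink "/proc/$2/ns/$ns" 2>/dev/null) || continue
    [ "$a" = "$b" ] || echo "$ns: $a vs $b"
  done
}

# Usage: compare a container process with PID 1 on the host (as root)
#   ns_diff 1 <pid>
```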

Docker (if present)

    # List, exec, inspect PID, and tail logs
    docker ps --format '{{.ID}} {{.Names}} {{.Status}}'
    docker exec -it <id|name> bash # or sh
    docker inspect -f '{{.State.Pid}}' <id>
    docker logs --tail=200 -f <id>

Kubernetes (if present)

    # Pods and events
    kubectl get pods -A -o wide
    kubectl get events -A --sort-by=.lastTimestamp | tail

    # Describe, logs, and exec
    kubectl describe pod <pod> -n <ns>
    kubectl logs <pod> -n <ns> --tail=200
    kubectl logs <pod> -n <ns> --previous
    kubectl exec -it <pod> -n <ns> -- bash

CRI/containerd (if present)

    # List, inspect, and logs via crictl
    crictl ps -a
    crictl inspect <id>
    crictl logs <id>

Notes

  • Without runtime CLIs, use nsenter by PID from ps/systemctl.

  • Requires: docker or podman for Docker-like commands; kubectl; crictl for containerd/CRI.

Incident Playbooks

High CPU

# Top CPU processes and hot threads
ps -eo pid,ppid,user,%cpu,%mem,cmd --sort=-%cpu | head
top -H
ps -Lp <pid> -o pid,tid,pcpu,comm

# Per-process CPU over time; optional perf if available
pidstat -u 1 -p <pid>
perf top # if installed
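You can split system-wide CPU time the way sar -u does (%user, %system including interrupts, %iowait, %idle) without sysstat installed. Two samples of the aggregate cpu line in /proc/stat are enough. Here is a minimal sketch with these assumptions:

- The usual field order is: user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice.
- Guest time is already counted in user, as on current Linux kernels.
- `cpu_pct` is a hypothetical helper.

```shell
# cpu_pct PREV CURR (hypothetical helper): each argument is the aggregate
# "cpu ..." line from /proc/stat; prints sar -u style percentages, where
# %system includes hardware and software interrupt time.
cpu_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i }
    NR == 2 {
      # Columns: 2 user, 3 nice, 4 system, 5 idle, 6 iowait, 7 irq,
      # 8 softirq, 9 steal. guest and guest_nice (10, 11) are already
      # counted in user and nice, so they are left out of the total.
      total = 0
      for (i = 2; i <= 9; i++) { d[i] = $i - prev[i]; total += d[i] }
      if (total == 0) total = 1
      printf "%%user=%.2f %%system=%.2f %%iowait=%.2f %%idle=%.2f\n",
        100 * d[2] / total, 100 * (d[4] + d[7] + d[8]) / total,
        100 * d[6] / total, 100 * d[5] / total
    }'
}

# Usage, sampling over one second:
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   cpu_pct "$a" "$b"
```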

High IO wait / Disk latency

# Device saturation and per-process IO
iostat -xz 1 # watch await, %util, r/s, w/s
pidstat -d 1
iotop -oPa

# Device/FS inventory and kernel errors
lsblk -o NAME,TYPE,SIZE,ROTA,MOUNTPOINT,MODEL
dmesg -T | grep -Ei 'error|reset|blk|nvme'

# Optional deep dive
sudo blktrace -d /dev/sdX -o - | blkparse -i -

Memory leak / OOM

# Snapshot and top RSS processes
free -h
ps aux --sort=-rss | head

# Per-process mappings and over-time faults
pmap -x <pid> | sort -nrk 3 | head
pidstat -r 1 -p <pid>
smem -r # if installed

# OOM evidence
dmesg -T | grep -i oom || journalctl -k -g OOM
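When a service runs many workers, the per-process list above can hide where memory actually goes. Here is a minimal sketch that totals RSS per command name from `ps -eo rss=,comm=` (RSS is in KiB per ps(1)). `rss_by_command` is a hypothetical helper. Shared pages are counted once per process, so the totals overstate real usage; smem's PSS column accounts for sharing.

```shell
# rss_by_command (hypothetical helper): read "RSS_KB COMMAND" lines on stdin
# and print "TOTAL_KB COUNT COMMAND", largest total first.
rss_by_command() {
  awk '{
         rss = $1
         sub(/^[ \t]*[0-9]+[ \t]+/, "")   # strip the RSS column, keep the full name
         kb[$0] += rss; n[$0]++
       }
       END { for (c in kb) print kb[c], n[c], c }' | sort -nr
}

# Usage: ps -eo rss=,comm= | rss_by_command | head
```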

Packet loss / High latency

# Path and end-to-end latency
ip route get <dest>
mtr -ezbw <dest>

# Interface health and TCP details
ip -s link show <iface>
ethtool -S <iface>
ss -i dst <dest>

# Targeted capture samples
tcpdump -ni <iface> host <dest> and icmp
tcpdump -ni <iface> tcp port 443 and 'tcp[tcpflags] & tcp-syn != 0'
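ip -s link and ethtool -S look at one interface at a time. The same kernel counters for every interface are in /proc/net/dev. Here is a minimal sketch that prints receive and transmit drops per interface. It assumes the usual two header lines and this column order:

- Receive: bytes, packets, errs, drop, fifo, frame, compressed, multicast.
- Transmit: bytes, packets, errs, drop, fifo, colls, carrier, compressed.

`iface_drops` is a hypothetical helper. The counters are cumulative, so sample twice and subtract to get a rate comparable to sar -n EDEV.

```shell
# iface_drops (hypothetical helper): read /proc/net/dev on stdin and
# print "IFACE RX_DROP TX_DROP".
iface_drops() {
  # Skip the two header lines; turn "eth0:123" into "eth0 123" so the
  # columns line up even when the first counter touches the colon.
  awk 'NR > 2 { sub(/:/, " "); print $1, $5, $13 }'
}

# Usage: iface_drops < /proc/net/dev
```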

DNS failures

# Resolve via systemd-resolved (or dig if available)
resolvectl query example.com
resolvectl status
dig @8.8.8.8 example.com A +time=2 +tries=1

# Reachability and captures
ss -u 'sport = :53 or dport = :53'
tcpdump -ni <iface> port 53

# Config checks
ls -l /etc/resolv.conf
resolvectl flush-caches
# Check firewall rules as appropriate

TLS handshake issues

# Inspect handshake/cert chain (TLS1.2 example)
openssl s_client -connect host:443 -servername host -tls1_2 -showcerts

# Check expiry/subject/issuer quickly
echo | openssl s_client -connect host:443 -servername host 2>/dev/null \
| openssl x509 -noout -dates -subject -issuer

# App behavior (SNI, ALPN, protocols)
curl -v https://host/

# If proxy/MTLS: verify CA path and client certs; check time skew
timedatectl

Disk full / Inode exhaustion

# Space vs inodes
df -h
df -i

# Find biggest dirs on same filesystem
du -xhd1 /path | sort -h

# Deleted-but-open files (space is released only when they are closed)
lsof +L1

# Reclaim journal space
journalctl --vacuum-size=1G

# Many small files
find /path -xdev -type f | wc -l
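find | wc -l gives a single total. To see which subdirectory holds the small files, count per top-level entry. Here is a minimal sketch that stays on one filesystem like the command above. `files_per_subdir` is a hypothetical helper, and file names containing newlines would be miscounted.

```shell
# files_per_subdir DIR (hypothetical helper): print "COUNT SUBDIR" for each
# immediate subdirectory of DIR, largest first, staying on one filesystem.
files_per_subdir() (
  cd "$1" || exit 1   # subshell body, so the caller's cwd is unchanged
  find . -xdev -type f | awk '
    {
      rel = substr($0, 3)                 # strip the leading "./"
      if (index(rel, "/") > 0) {          # skip files directly in DIR
        split(rel, parts, "/")
        count[parts[1]]++
      }
    }
    END { for (d in count) print count[d], d }' | sort -nr
)

# Usage: files_per_subdir /var | head
```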

Syscall slowness

# Trace syscalls and timings
strace -f -ttT -p <pid> -e trace=%network,%file,%clock,fsync,nanosleep

# Optional CPU hotspot profiling
perf record -g -p <pid> -- sleep 30; perf report # sample for 30 seconds

Container restart loops

# Docker restart loops
docker ps --filter 'status=restarting'
docker logs <id> --tail=200

# Kubernetes restart loops
kubectl get pods -A
kubectl describe pod <pod> -n <ns>
kubectl logs <pod> -n <ns> --previous

# Node/agent issues
journalctl -u kubelet

_config.next.yml file

# Mermaid tag
mermaid:
  enable: true
  # Available themes: default | dark | forest | neutral
  theme:
    light: forest
    dark: dark

Example

flowchart
    A-- This is the text! ---B

sequenceDiagram
    Alice->>John: Hello John, how are you?
    John-->>Alice: Great!
    Alice-)John: See you later!