Kubernetes Cluster Troubleshooting

CSDN blog address for this article: https://blog.csdn.net/huwh_/article/details/71308301

1. Viewing System Events

kubectl describe pod <PodName> --namespace=<NAMESPACE>

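This command shows the Pod's configuration as defined at creation time, its current status, and its most recent events. The events are the main source of clues when troubleshooting. For example, when a Pod is stuck in the Pending state, the events usually point to one of the following causes:

No Node is available for scheduling
Resource quota management is enabled and the Pod's target Node happens to have no resources available
The image is still being downloaded (the image pull is taking too long)

kubectl describe can also inspect other k8s objects: Node, RC, Service, Namespace, Secrets.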
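
When several Pods are Pending at once, it can help to collect their events in one pass. The following is a minimal sketch, not part of the original article. It assumes a kubectl recent enough to support --field-selector on "get", and the function name pending_pod_events is hypothetical.

```shell
# Print the events of every Pending Pod in one namespace.
# Assumption: kubectl supports --field-selector for "get pods" and "get events".
pending_pod_events() {
  ns="$1"   # namespace to inspect
  kubectl get pods --namespace="$ns" \
      --field-selector=status.phase=Pending \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' |
  while read -r pod; do
    [ -n "$pod" ] || continue
    echo "== $pod =="
    # Events are matched to the Pod by the name of the involved object.
    kubectl get events --namespace="$ns" \
        --field-selector="involvedObject.name=$pod"
  done
}
```

Usage: pending_pod_events <NAMESPACE>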

1.1. Pod

kubectl describe pod <PodName> --namespace=<NAMESPACE>

1.2. NODE

kubectl describe NODE

Name: runtime2.foshan2.wae.haplat.net
Labels: kubernetes.io/hostname=runtime2.foshan2.wae.haplat.net,namespace/test=true
CreationTimestamp: Fri, 01 Apr 2016 17:34:16 +0800
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
──── ────── ───────────────── ────────────────── ────── ───────
Ready True Sat, 08 Apr 2017 14:15:41 +0800 Sun, 26 Mar 2017 08:58:04 +0800 KubeletReady kubelet is posting ready status
OutOfDisk False Sat, 08 Apr 2017 14:15:41 +0800 Fri, 01 Apr 2016 17:34:16 +0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
Addresses: 221.5.100.100,221.5.100.100
Capacity:
memory: 134975102976
pods: 40
cpu: 32
System Info:
Machine ID: 120de474f77e4d75a670a74eea6d1e45
System UUID: 1C929431-8D94-11E1-BD1D-001E6744D094
Boot ID: 2a154beb-86e4-40e3-acce-4f83b1ea5ed2
Kernel Version: 3.10.0-229.20.1.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Container Runtime Version: docker://1.8.2-el7.centos
Kubelet Version: v1.1.1-wae2-12
Kube-Proxy Version: v1.1.1-wae2-12
ExternalID: runtime2.foshan2.wae.haplat.net
Non-terminated Pods: (6 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
───────── ──── ──────────── ────────── ─────────────── ─────────────
acp acp-ui-1-4-0-16j3a 4 (12%) 4 (12%) 8589934592 (6%) 8589934592 (6%)
acp acp-ui-1-4-1-kou20 4 (12%) 4 (12%) 8589934592 (6%) 8589934592 (6%)
acp acp-ui-api-1-4-0-u3w4f 4 (12%) 4 (12%) 8589934592 (6%) 8589934592 (6%)
cloud-eye cloud-eye-dim2-1-6-6-1-9g16i 4 (12%) 4 (12%) 34359738368 (25%) 34359738368 (25%)
cloud-wst cloud-wst-ceba-1-12-0-1-hjdk2 2 (6%) 2 (6%) 8589934592 (6%) 8589934592 (6%)
cms-fd cms-fd-schedule-3-9-1-3-wzqa1 8 (25%) 8 (25%) 21474836480 (15%) 21474836480 (15%)
Allocated resources:
(Total limits may be over 100%, i.e., overcommitted. More info: https://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)
CPU Requests CPU Limits Memory Requests Memory Limits
──────────── ────────── ─────────────── ─────────────
26 (81%) 26 (81%) 90194313216 (66%) 90194313216 (66%)
No events.
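
The percentages in the sample output above appear to be each request divided by the Node's capacity, truncated to an integer: 26 of 32 CPUs gives 81%, and 90194313216 of 134975102976 bytes of memory gives 66%. A small sketch for checking such figures by hand; the helper name pct is hypothetical, and the truncation is inferred from the sample, not from kubectl documentation.

```shell
# Integer percentage of used/capacity, truncated (not rounded).
# Assumption: the shell uses 64-bit arithmetic, which holds for bash and dash
# on common 64-bit Linux systems.
pct() {
  echo $(( $1 * 100 / $2 ))
}
```

For example, pct 26 32 reproduces the CPU Requests figure for the Node above, and pct 90194313216 134975102976 reproduces the Memory Requests figure.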

1.3. RC

kubectl describe rc --namespace=rmp

[root@node5 ~]# kubectl describe rc --namespace=rmp
Name: rmp-web-2-15-3-1
Namespace: rmp
Image(s): registry.wae.haplat.net/rmp/rmp-web:2.15.3-1
Selector: app=rmp-web,appVersion=2.15.3-1
Labels: app=rmp-web,appVersion=2.15.3-1,env=product,zone=foshan2
Replicas: 1 current / 1 desired
Pods Status: 1 Running / 0 Waiting / 0 Succeeded / 0 Failed
No volumes.
No events.

1.4. NAMESPACE

kubectl describe NAMESPACE

[root@node5 ~]# kubectl describe NAMESPACE
Name: acp
Labels: <none>
Status: Active
Resource Quotas
Resource Used Hard
--- --- ---
cpu 24 20
memory 51539607552 53687091200
persistentvolumeclaims 0 10
pods 6 10
replicationcontrollers 6 10
resourcequotas 1 1
secrets 2 10
services 6 10
No resource limits.
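
In the sample above, the cpu row shows Used (24) already above Hard (20), which is the kind of row worth spotting quickly when Pods cannot be created in a namespace. The sketch below filters the describe output for quota rows at or over their limit. It is an illustration only: the function name quota_at_limit is hypothetical, and it handles plain integer values like those in the sample, not quantities with unit suffixes.

```shell
# Read "kubectl describe namespace" output on stdin and print the names of
# quota rows whose Used value is greater than or equal to the Hard value.
# Only rows of the form "<resource> <integer> <integer>" are considered.
quota_at_limit() {
  awk 'NF == 3 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 + 0 >= $3 + 0 { print $1 }'
}
```

Usage: kubectl describe namespace <NAMESPACE> | quota_at_limit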

1.5. Service

kubectl describe Service --namespace=rmp

[root@node5 ~]# kubectl describe Service --namespace=rmp
Name: rmp-web-2-15-3-1
Namespace: rmp
Labels: app=rmp-web,appVersion=2.15.3-1,waeEnv=product,waeZone=foshan2
Selector: app=rmp-web,appVersion=2.15.3-1
Type: ClusterIP
IP: 10.254.201.163
Port: port-l7-tcp-80 80/TCP
Endpoints: 10.0.68.240:80
Session Affinity: None
No events.

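2. Viewing Container Logs

1. View the logs of a specified pod

kubectl logs <pod_name>

kubectl logs -f <pod_name> # follow the log, similar to tail -f

2. View the logs of the previous (terminated) container instance in a pod

kubectl logs -p <pod_name>

3. View the logs of a specified container in a specified pod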

kubectl logs <pod_name> -c <container_name>

[root@node5 ~]# kubectl logs --help

Print the logs for a container in a pod. If the pod has only one container, the container name is optional.

Usage:

kubectl logs [-f] [-p] POD [-c CONTAINER] [flags]

Aliases:

logs, log

Examples:

# Return snapshot logs from pod nginx with only one container

$ kubectl logs nginx

# Return snapshot of previous terminated ruby container logs from pod web-1

$ kubectl logs -p -c ruby web-1

# Begin streaming the logs of the ruby container in pod web-1

$ kubectl logs -f -c ruby web-1

# Display only the most recent 20 lines of output in pod nginx

$ kubectl logs --tail=20 nginx

# Show all logs from pod nginx written in the last hour

$ kubectl logs --since=1h nginx
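
When a container has restarted, the current and the previous instance are often read side by side. A minimal sketch that combines the --tail, -p and -c flags listed in the help text above; the function name crash_logs and the default of 50 lines are assumptions made for illustration.

```shell
# Show the last lines of the current and the previous instance of one container.
# Arguments: pod name, container name, optional number of lines (default 50).
crash_logs() {
  pod="$1"
  container="$2"
  lines="${3:-50}"
  echo "== current instance =="
  kubectl logs --tail="$lines" -c "$container" "$pod"
  echo "== previous instance =="
  # -p fails if the container has never been restarted; that error is left visible.
  kubectl logs --tail="$lines" -p -c "$container" "$pod"
}
```

Usage: crash_logs web-1 ruby 20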

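3. Viewing k8s Service Logs

3.1. journalctl

On Linux hosts where systemd manages the Kubernetes services, the journal captures each service's output, so the logs can be viewed with systemctl status <xxx> or journalctl -u <xxx> -f.

The Kubernetes components include:

kube-apiserver
kube-controller-manager	Pod scaling or RC related
kube-scheduler	Pod scheduling (assigning Pods to Nodes)
kubelet	Pod lifecycle: creation, stopping, and so on
etcd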
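
To skim the recent journal of all of these components on one host, a loop over the unit names can be used. This is a sketch under assumptions: the systemd unit names are taken to match the component names listed above, which depends on how the cluster was installed, and not every component runs on every host. The function name k8s_unit_logs is hypothetical.

```shell
# Print the last 20 journal lines of each Kubernetes-related unit.
# Argument: optional start time understood by journalctl --since
# (default "1 hour ago").
k8s_unit_logs() {
  since="${1:-1 hour ago}"
  for unit in kube-apiserver kube-controller-manager kube-scheduler kubelet etcd; do
    echo "== $unit =="
    journalctl -u "$unit" --since "$since" --no-pager | tail -n 20
  done
}
```

Usage: k8s_unit_logs "30 minutes ago"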

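3.2. Log Files

Logs can also be saved to, and read from, a specified log directory:

--logtostderr=false: do not write logs to stderr
--log-dir=/var/log/kubernetes: directory where log files are stored
--alsologtostderr=false: when set to true, logs go to stderr as well as to files
--v=0: glog log verbosity level
--vmodule=gfs*=2,test*=4: glog per-module verbosity levels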
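
As an illustration, the flags above might be combined into one argument string that a start script passes to a component. The variable name KUBE_LOG_ARGS and the verbosity value 2 are assumptions, not taken from the article, and newer Kubernetes releases may no longer accept some of these glog-style flags, so check the version in use.

```shell
# Hypothetical argument string: write logs to files under /var/log/kubernetes
# instead of stderr, at verbosity level 2 (an arbitrary example value).
KUBE_LOG_ARGS="--logtostderr=false --log-dir=/var/log/kubernetes --v=2"

# A start script would then append it to the component's command line, e.g.:
#   kube-apiserver $KUBE_LOG_ARGS <other flags>
```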

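4. Common Problems

4.1. Pod Stays in the Pending State

kubectl describe pod <pod_name> --namespace=<NAMESPACE>

Check the Pod's events. Typical causes:

The image is being downloaded but the pull does not complete (the pull takes too long) [this is the usual cause]
No Node is available for scheduling
Resource quota management is enabled and the Pod's target Node happens to have no resources available

Solutions:

Check the network between the Pod's host and the image registry; try pulling the image manually
Delete the Pod instance so that it is scheduled onto a different host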
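
Before pulling the image by hand, it helps to know which host the Pod was assigned to and which images it needs. A minimal sketch; the function name pod_node_and_images is hypothetical. The first output line is the Node name (empty if the Pod has not been scheduled yet), followed by one image per line.

```shell
# Print the Node a Pod is assigned to, then the image of each of its containers.
# Arguments: pod name, namespace.
pod_node_and_images() {
  pod="$1"
  ns="$2"
  kubectl get pod "$pod" --namespace="$ns" \
      -o jsonpath='{.spec.nodeName}{"\n"}{range .spec.containers[*]}{.image}{"\n"}{end}'
}
```

On the reported host, the image can then be pulled manually with the container runtime's own tool (for the Docker runtime shown in the Node output earlier, docker pull <image>). If the pull fails there as well, deleting the Pod with kubectl delete pod <pod_name> --namespace=<NAMESPACE> lets it be rescheduled, as described above.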

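4.2. Pod Keeps Restarting After Creation

In kubectl get pods, the Pod's status alternates between Running and other states, and the RESTARTS count keeps increasing.

The usual cause is that the container's startup command is not a blocking command, so the container exits right after it starts.

Non-blocking commands:

The command specified by CMD is itself non-blocking
The service is started in the background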
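
One way to check for this cause is to look at the restart count together with the exit code of the last terminated instance: a growing count with exit code 0 is consistent with a start command that returned instead of blocking. The sketch below is an illustration only; the function name restart_info is hypothetical.

```shell
# For each container in a Pod, print: name, restart count, and the exit code
# of its last terminated instance (empty if it has never terminated).
# Arguments: pod name, namespace. Columns are tab-separated.
restart_info() {
  pod="$1"
  ns="$2"
  kubectl get pod "$pod" --namespace="$ns" \
      -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

Usage: restart_info <pod_name> <NAMESPACE>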

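Solutions:

1. Change the command to a blocking (foreground) one, for example: zkServer.sh start-foreground

2. In the startup script of a Java program, remove the nohup and the & from nohup xxx &, for example: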

nohup $JAVA_HOME/bin/java $JAVA_OPTS -cp $CLASSPATH com.cnc.open.processor.Main &

becomes:

$JAVA_HOME/bin/java $JAVA_OPTS -cp $CLASSPATH com.cnc.open.processor.Main
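
A common refinement, not mentioned in the original article, is to launch the foreground command with exec, so that the service replaces the start script's shell and becomes the container's main process. A minimal sketch; the helper name run_foreground is hypothetical.

```shell
# Replace the current shell with the given command so that it runs in the
# foreground as the main process and receives stop signals directly.
run_foreground() {
  exec "$@"
}

# In a real start script this would be the last line, for example:
#   run_foreground "$JAVA_HOME/bin/java" $JAVA_OPTS -cp "$CLASSPATH" com.cnc.open.processor.Main
```

Because exec never returns on success, anything written after that call in the script would not run, which is why it belongs at the end.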

References:

Kubernetes: The Definitive Guide (《Kubernetes權威指南》)

Last updated: 2017-08-13 22:51:21
