Kubernetes中Nginx服务启动失败排查流程分析(Error: ImagePullBackOff)
pod节点启动失败,nginx服务无法正常访问,服务状态显示为ImagePullBackOff。
[root@m1 ~]# kubectl get pods NAME READY STATUS RESTARTS AGE nginx-f89759699-cgjgp 0/1 ImagePullBackOff 0 103m
查看nginx服务的Pod节点详细信息。
[root@m1 ~]# kubectl describe pod nginx-f89759699-cgjgp
Name: nginx-f89759699-cgjgp
Namespace: default
Priority: 0
Service Account: default
Node: n1/192.168.200.84
Start Time: Fri, 10 Mar 2023 08:40:33 +0800
Labels: app=nginx
pod-template-hash=f89759699
Annotations: <none>
Status: Pending
IP: 10.244.3.20
IPs:
IP: 10.244.3.20
Controlled By: ReplicaSet/nginx-f89759699
Containers:
nginx:
Container ID:
Image: nginx
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zk8sj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-zk8sj:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-zk8sj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 57m (x179 over 100m) kubelet Back-off pulling image "nginx"
Normal Pulling 7m33s (x22 over 100m) kubelet Pulling image "nginx"
Warning Failed 2m30s (x417 over 100m) kubelet Error: ImagePullBackOff
发现,获取nginx镜像失败。可能是由于Docker服务引起的。
于是,检查Docker是否正常启动
systemctl status docker
发现,docker服务启动失败,手动尝试重新启动。
systemctl restart docker
但是,重启docker服务失败,出现如下报错信息。
[root@m1 ~]# systemctl restart docker Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
执行systemctl restart docker命令失效。
接着,当执行docker version命令时,发现未能连接到Docker daemon
[root@m1 ~]# docker version Client: Docker Engine - Community Version: 20.10.17 API version: 1.41 Go version: go1.17.11 Git commit: 100c701 Built: Mon Jun 6 23:03:11 2022 OS/Arch: linux/amd64 Context: default Experimental: true Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
于是,再次通过执行systemctl status docker命令,查看docker服务未能启动,阅读输出报错信息,如下所示。
[root@m1 ~]# systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2023-03-10 10:28:16 CST; 4min 35s ago
Docs: https://docs.docker.com
Main PID: 2221 (code=exited, status=1/FAILURE)
Mar 10 10:28:13 m1 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Mar 10 10:28:13 m1 systemd[1]: docker.service: Failed with result 'exit-code'.
Mar 10 10:28:13 m1 systemd[1]: Failed to start Docker Application Container Engine.
Mar 10 10:28:16 m1 systemd[1]: docker.service: Service RestartSec=2s expired, scheduling restart.
Mar 10 10:28:16 m1 systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Mar 10 10:28:16 m1 systemd[1]: Stopped Docker Application Container Engine.
Mar 10 10:28:16 m1 systemd[1]: docker.service: Start request repeated too quickly.
Mar 10 10:28:16 m1 systemd[1]: docker.service: Failed with result 'exit-code'.
Mar 10 10:28:16 m1 systemd[1]: Failed to start Docker Application Container Engine.
[root@m1 ~]#
通过上述输出显示,Docker 服务进程的启动失败,状态为 1/FAILURE。
接下来,尝试通过以下步骤来排查和解决问题:
1️⃣查看 Docker 服务日志:使用以下命令查看 Docker 服务日志,以便更详细地了解失败原因。
sudo journalctl -u docker.service

2️⃣ 通过输出Ddocker日志分析,提取到了相关报错信息片段,发现是配置daemon中的/etc/docker/daemon.json配置文件出错导致的。
Mar 10 10:20:17 m1 systemd[1]: Starting Docker Application Container Engine... Mar 10 10:20:17 m1 dockerd[1572]: unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character '"' after object key:value pair Mar 10 10:20:17 m1 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE Mar 10 10:20:17 m1 systemd[1]: docker.service: Failed with result 'exit-code'. Mar 10 10:20:17 m1 systemd[1]: Failed to start Docker Application Container Engine. Mar 10 10:20:19 m1 systemd[1]: docker.service: Service RestartSec=2s expired, scheduling restart. Mar 10 10:20:19 m1 systemd[1]: docker.service: Scheduled restart job, restart counter is at 2. Mar 10 10:20:19 m1 systemd[1]: Stopped Docker Application Container Engine.
3️⃣此时,查看daemon配置文件/etc/docker/daemon.json是否配置正确。
[root@m1 ~]# cat /etc/docker/daemon.json
{
# 设置 Docker 镜像的注册表镜像源为阿里云镜像源。
"registry-mirrors": ["https://w2kavmmf.mirror.aliyuncs.com"]
# 指定 Docker 守护进程使用 systemd 作为 cgroup driver。
"exec-opts": ["native.cgroupdriver=systemd"]
}
咋一看,配置信息没有什么问题,都是正确的,但仔细一看,就会发现应该在"registry-mirrors"选项的结尾添加逗号。犯了缺少逗号(,)导致的语法错误,终于找到了问题根源。
赞 (0)
