etcd Container Keeps Restarting During k8s Cluster Deployment: Symptoms and Fix

Contents
  • Symptoms
  • Solution
  • Summary

Symptoms

While deploying Kubernetes 1.26, after the cluster had been initialized with kubeadm, every kubectl command failed with the following error:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

Checking whether kubelet was healthy showed that it could not connect to the apiserver on port 6443.

Dec 21 09:36:03 k8s-master kubelet[7127]: E1221 09:36:03.015089    7127 kubelet_node_status.go:540] "Error updating node status, will retry" err="error getting node \"k8s-master\": Get \"https://192.168.2.200:6443/api/v1/nodes/k8s-master?timeout=10s\": dial tcp 192.168.2.200:6443: connect: connection refused"
Dec 21 09:36:03 k8s-master kubelet[7127]: E1221 09:36:03.015445    7127 kubelet_node_status.go:540] "Error updating node status, will retry" err="error getting node \"k8s-master\": Get \"https://192.168.2.200:6443/api/v1/nodes/k8s-master?timeout=10s\": dial tcp 192.168.2.200:6443: connect: connection refused"
Dec 21 09:36:03 k8s-master kubelet[7127]: E1221 09:36:03.015654    7127 kubelet_node_status.go:540] "Error updating node status, will retry" err="error getting node \"k8s-master\": Get \"https://192.168.2.200:6443/api/v1/nodes/k8s-master?timeout=10s\": dial tcp 192.168.2.200:6443: connect: connection refused"
Dec 21 09:36:03 k8s-master kubelet[7127]: E1221 09:36:03.015818    7127 kubelet_node_status.go:540] "Error updating node status, will retry" err="error getting node \"k8s-master\": Get \"https://192.168.2.200:6443/api/v1/nodes/k8s-master?timeout=10s\": dial tcp 192.168.2.200:6443: connect: connection refused"
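The kubelet check above can be sketched roughly as follows. The helper name refused_endpoints is made up for illustration, and the usage lines assume a systemd-managed kubelet unit named kubelet, which is how kubeadm installs normally run it.

```shell
# Summarize which endpoints kubelet could not reach.
# Reads kubelet log text on stdin and prints a count per "host:port".
# (refused_endpoints is a hypothetical helper, not part of any tool.)
refused_endpoints() {
  grep -o 'dial tcp [0-9.]*:[0-9]*: connect: connection refused' \
    | awk '{ sub(/:$/, "", $3); print $3 }' \
    | sort | uniq -c
}

# Typical use on the control-plane node (assumes systemd):
#   systemctl status kubelet --no-pager
#   journalctl -u kubelet --no-pager -n 200 | refused_endpoints
```

If the only endpoint reported is the apiserver address on port 6443, kubelet itself is probably fine and the next thing to inspect is the apiserver container.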

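Next, check the state of the apiserver container. The cluster uses containerd as its container runtime, and since kubectl is unavailable at this point, crictl ps -a can be used to list all containers.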

root@k8s-master:~/k8s/calico# crictl ps -a
CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
395b45b1cb733       a31e1d84401e6       50 seconds ago      Exited              kube-apiserver            28                  e87800ae06ff5       kube-apiserver-k8s-master
b5c7e2a07bf1b       5d7c5dfd3ba18       3 minutes ago       Running             kube-controller-manager   32                  6b7cc9dd07f1d       kube-controller-manager-k8s-master
944aa31862613       556768f31eb1d       4 minutes ago       Exited              kube-proxy                27                  ccb6557c6f629       kube-proxy-ctjjq
c097332b6f416       fce326961ae2d       4 minutes ago       Exited              etcd                      30                  079d491eb9925       etcd-k8s-master
b8103090322c4       dafd8ad70b156       6 minutes ago       Exited              kube-scheduler            32                  48f9544c9798c       kube-scheduler-k8s-master
a14b969e8ad05       5d7c5dfd3ba18       12 minutes ago      Exited              kube-controller-manager   31                  5576806b4e142       kube-controller-manager-k8s-master

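The kube-apiserver container has exited, so check its logs for anything abnormal. The logs show that kube-apiserver cannot connect to etcd on port 2379, so the problem most likely lies with etcd.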
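One way to get from the crictl listing to a container's logs is sketched below. The helper container_ids is hypothetical. It assumes the space-padded table layout shown in the listing above, where the most recent attempt appears first.

```shell
# Print the container IDs whose NAME column equals $1.
# Reads the table printed by `crictl ps -a` on stdin.
# (container_ids is a hypothetical helper for illustration.)
container_ids() {
  # Match the name surrounded by spaces so that the POD column
  # (e.g. kube-apiserver-k8s-master) does not count as a match.
  awk -v name="$1" 'NR > 1 && index($0, " " name " ") { print $1 }'
}

# Typical use:
#   id=$(crictl ps -a | container_ids kube-apiserver | head -n 1)
#   crictl logs --tail 50 "$id"
```

The same two lines with etcd in place of kube-apiserver fetch the etcd logs examined later in this article.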

W1221 07:00:20.392868       1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {
  "Addr": "127.0.0.1:2379",
  "ServerName": "127.0.0.1",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused"
W1221 07:00:21.391330       1 logging.go:59] [core] [Channel #4 SubChannel #6] grpc: addrConn.createTransport failed to connect to {
  "Addr": "127.0.0.1:2379",
  "ServerName": "127.0.0.1",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused"

The etcd container is also restarting continuously, yet its logs contain no error-level messages.

{"level":"info","ts":"2022-12-21T10:29:00.740Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d975d9ebc69964b3 is starting a new election at term 2"}
{"level":"info","ts":"2022-12-21T10:29:00.740Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d975d9ebc69964b3 became pre-candidate at term 2"}
{"level":"info","ts":"2022-12-21T10:29:00.740Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d975d9ebc69964b3 received MsgPreVoteResp from d975d9ebc69964b3 at term 2"}
{"level":"info","ts":"2022-12-21T10:29:00.740Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d975d9ebc69964b3 became candidate at term 3"}
{"level":"info","ts":"2022-12-21T10:29:00.740Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d975d9ebc69964b3 received MsgVoteResp from d975d9ebc69964b3 at term 3"}
{"level":"info","ts":"2022-12-21T10:29:00.740Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d975d9ebc69964b3 became leader at term 3"}
{"level":"info","ts":"2022-12-21T10:29:00.740Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: d975d9ebc69964b3 elected leader d975d9ebc69964b3 at term 3"}
{"level":"info","ts":"2022-12-21T10:29:00.742Z","caller":"etcdserver/server.go:2054","msg":"published local member to cluster through raft","local-member-id":"d975d9ebc69964b3","local-member-attributes":"{Name:k8s-master ClientURLs:[https://192.168.2.200:2379]}","request-path":"/0/members/d975d9ebc69964b3/attributes","cluster-id":"f88ac1c8c4bab6","publish-timeout":"7s"}
{"level":"info","ts":"2022-12-21T10:29:00.742Z","caller":"embed/serve.go:100","msg":"ready to serve client requests"}
{"level":"info","ts":"2022-12-21T10:29:00.742Z","caller":"embed/serve.go:100","msg":"ready to serve client requests"}
{"level":"info","ts":"2022-12-21T10:29:00.743Z","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
{"level":"info","ts":"2022-12-21T10:29:00.743Z","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
{"level":"info","ts":"2022-12-21T10:29:00.744Z","caller":"embed/serve.go:198","msg":"serving client traffic securely","address":"192.168.2.200:2379"}
{"level":"info","ts":"2022-12-21T10:29:00.745Z","caller":"embed/serve.go:198","msg":"serving client traffic securely","address":"127.0.0.1:2379"}
{"level":"info","ts":"2022-12-21T10:30:20.624Z","caller":"osutil/interrupt_unix.go:64","msg":"received signal; shutting down","signal":"terminated"}
{"level":"info","ts":"2022-12-21T10:30:20.624Z","caller":"embed/etcd.go:373","msg":"closing etcd server","name":"k8s-master","data-dir":"/var/lib/etcd","advertise-peer-urls":["https://192.168.2.200:2380"],"advertise-client-urls":["https://192.168.2.200:2379"]}
{"level":"info","ts":"2022-12-21T10:30:20.636Z","caller":"etcdserver/server.go:1465","msg":"skipped leadership transfer for single voting member cluster","local-member-id":"d975d9ebc69964b3","current-leader-member-id":"d975d9ebc69964b3"}
{"level":"info","ts":"2022-12-21T10:30:20.637Z","caller":"embed/etcd.go:568","msg":"stopping serving peer traffic","address":"192.168.2.200:2380"}
{"level":"info","ts":"2022-12-21T10:30:20.639Z","caller":"embed/etcd.go:573","msg":"stopped serving peer traffic","address":"192.168.2.200:2380"}
{"level":"info","ts":"2022-12-21T10:30:20.639Z","caller":"embed/etcd.go:375","msg":"closed etcd server","name":"k8s-master","data-dir":"/var/lib/etcd","advertise-peer-urls":["https://192.168.2.200:2380"],"advertise-client-urls":["https://192.168.2.200:2379"]}

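However, one log line shows that etcd received a shutdown signal, which means it was stopped from outside rather than exiting abnormally.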

{"level":"info","ts":"2022-12-21T10:30:20.624Z","caller":"osutil/interrupt_unix.go:64","msg":"received signal; shutting down","signal":"terminated"}
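The value "terminated" is how a Go program such as etcd prints SIGTERM, so something outside the process asked it to stop. A small filter like the one below can pull these entries out of a longer log. The function name shutdown_signals is made up, and the pattern assumes the JSON field order seen in the line above.

```shell
# Print "<timestamp> <signal>" for every shutdown-signal entry in an etcd log.
# Reads etcd JSON log lines on stdin.
# (shutdown_signals is a hypothetical helper for illustration.)
shutdown_signals() {
  sed -n 's/.*"ts":"\([^"]*\)".*"msg":"received signal; shutting down".*"signal":"\([^"]*\)".*/\1 \2/p'
}

# Typical use, with the etcd container ID taken from `crictl ps -a`:
#   crictl logs <etcd-container-id> 2>&1 | shutdown_signals
```

Seeing one such entry per restart, with no error-level lines before it, points away from etcd itself and toward whatever is managing the container.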

Solution

The problem is caused by cgroups not being configured correctly. In the containerd configuration file /etc/containerd/config.toml, set SystemdCgroup to true:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  BinaryName = ""
  CriuImagePath = ""
  CriuPath = ""
  CriuWorkPath = ""
  IoGid = 0
  IoUid = 0
  NoNewKeyring = false
  NoPivotRoot = false
  Root = ""
  ShimCgroup = ""
  SystemdCgroup = true
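If the file currently contains SystemdCgroup = false, the change can also be made non-interactively. This is only a sketch: enable_systemd_cgroup is a hypothetical helper, it assumes GNU sed (for -i), and it keeps a .bak copy so the edit can be reverted.

```shell
# Flip "SystemdCgroup = false" to "SystemdCgroup = true" in a containerd config.
# $1: path to the config file (defaults to /etc/containerd/config.toml).
# (enable_systemd_cgroup is a hypothetical helper; assumes GNU sed.)
enable_systemd_cgroup() {
  cfg="${1:-/etc/containerd/config.toml}"
  cp "$cfg" "$cfg.bak"   # keep a backup of the original file
  sed -i 's/^\([[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*\)false/\1true/' "$cfg"
  grep -n 'SystemdCgroup' "$cfg"   # show the resulting setting
}

# Typical use (as root):
#   enable_systemd_cgroup /etc/containerd/config.toml
```

The substitution only touches lines that already set SystemdCgroup, so a config file without that key is left unchanged and needs to be edited by hand.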

Restart the containerd service:

systemctl restart containerd

The etcd container stops restarting, the other containers recover as well, and the problem is solved.
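To confirm that the two sides now agree, the kubelet cgroup driver can be compared with the containerd setting. The sketch below rests on an assumption not stated in this article: that kubelet was configured by kubeadm, which writes its settings to /var/lib/kubelet/config.yaml and uses the systemd cgroup driver by default in recent releases. Both function names are hypothetical.

```shell
# Print the cgroup driver kubelet is configured with.
# $1: kubelet config file (assumed kubeadm default path if omitted).
kubelet_cgroup_driver() {
  sed -n 's/^cgroupDriver:[[:space:]]*//p' "${1:-/var/lib/kubelet/config.yaml}"
}

# Succeed (exit status 0) if containerd's runc runtime uses the systemd cgroup driver.
# $1: containerd config file (defaults to /etc/containerd/config.toml).
containerd_uses_systemd() {
  grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' \
    "${1:-/etc/containerd/config.toml}"
}

# Typical use:
#   kubelet_cgroup_driver
#   containerd_uses_systemd && echo "containerd: systemd" || echo "containerd: cgroupfs"
```

If kubelet reports systemd while containerd does not, the mismatch described in this article is likely still present.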

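Summary

After kubeadm initialized the cluster, etcd kept being shut down and restarted, which in turn kept kube-apiserver from starting and made kubectl unusable. The cause was an incorrect cgroup setting in containerd. Setting SystemdCgroup = true in /etc/containerd/config.toml and restarting containerd resolved it.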
