Kubernetes 1.27.1 Cluster Deployment: Problem Summary
Problem: deploying ingress-nginx with Helm seems to always hit this error:
]# kubectl apply -f ingress-test.yaml
Error from server (InternalError): error when creating "ingress-test.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1/ingresses?timeout=10s": tls: failed to verify certificate: x509: certificate is valid for localhost, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster, kubernetes.default.svc.cluster.local, *.k8s.local, not ingress-nginx-controller-admission.ingress-nginx.svc
Solution:
https://github.com/kubernetes/ingress-nginx/issues/5968
Delete the validating webhook configuration:
[root@master01 ~]# kubectl get validatingwebhookconfigurations -A
NAME                      WEBHOOKS   AGE
ingress-nginx-admission   1          15h
]# kubectl delete validatingwebhookconfigurations ingress-nginx-admission
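The underlying failure is a plain SAN mismatch: the certificate presented at the webhook endpoint only covers the apiserver names listed in the error, not the admission service's DNS name. The check can be demonstrated without a cluster; this sketch generates a throwaway certificate with the same SANs as in the error message (all names illustrative):

```shell
# A TLS certificate only matches hostnames listed in its subjectAltName
# extension; create a throwaway cert and test two names against it.
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout "$tmp/key.pem" -out "$tmp/cert.pem" -subj "/CN=kubernetes" \
  -addext "subjectAltName=DNS:kubernetes.default.svc,DNS:*.k8s.local" 2>/dev/null
# A name in the SAN list matches:
openssl x509 -in "$tmp/cert.pem" -noout -checkhost kubernetes.default.svc
# The webhook's service name does not, which is exactly the error above:
openssl x509 -in "$tmp/cert.pem" -noout -checkhost ingress-nginx-controller-admission.ingress-nginx.svc
```

Deleting the webhook configuration skips this validation entirely; the alternative discussed in the linked issue is to issue the admission certificate for the correct service name.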
Non-Helm installation:
https://kubernetes.github.io/ingress-nginx/deploy/#bare-metal-clusters
Reference for installing ingress-nginx:
cnblogs.com/syushin/p/15271304.html
Problem: containerd assigns pod addresses from the 10.88.x.x subnet. Cause and fix: the CNI runtime uses whichever configuration file sorts first in /etc/cni/net.d. After installing a network plugin such as flannel, the plugin's config file does not sort first, so containerd keeps using the original 10-containerd-net.conflist; delete that file.
rm -rf /etc/cni/net.d/10-containerd-net.conflist
ifconfig cni0 down && ip link delete cni0
systemctl daemon-reload
systemctl restart containerd kubelet
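The selection order can be shown without a cluster: only the lexicographically first config file in the directory wins, and 10-containerd-net.conflist sorts before flannel's default file name (10-flannel.conflist is flannel's default and an assumption here; your plugin's file may differ):

```shell
# Simulate /etc/cni/net.d: config files are considered in lexicographic
# order, and the runtime takes the first one.
dir=$(mktemp -d)
touch "$dir/10-containerd-net.conflist" "$dir/10-flannel.conflist"
ls "$dir" | sort | head -n 1    # -> 10-containerd-net.conflist
```

Since "10-c..." sorts before "10-f...", the stock containerd config keeps winning until it is removed, which is why pods stay on 10.88.x.x.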
Reference:
zhuanlan.zhihu.com/p/608369342
Reference for creating certificates:
https://kubernetes.io/zh-cn/docs/tasks/administer-cluster/certificates/
Problem: the cilium network plugin pods are stuck in PodInitializing.
]# kubectl get pods -A
NAMESPACE     NAME                               READY   STATUS                  RESTARTS      AGE
kube-system   cilium-7hjx9                       0/1     Init:CrashLoopBackOff   4 (31s ago)   4m1s
kube-system   cilium-n227s                       0/1     Init:CrashLoopBackOff   4 (56s ago)   4m1s
kube-system   cilium-operator-76c55fc6b6-bphrj   1/1     Running                 0             4m1s
kube-system   cilium-operator-76c55fc6b6-z8fwb   1/1     Running                 0             4m1s
kube-system   cilium-tvsqg                       0/1     Init:CrashLoopBackOff   4 (42s ago)   4m1s
Check the events with kubectl describe: apart from showing that the mount-cgroup init container keeps restarting, nothing useful turns up.
]# kubectl describe pods -n kube-system cilium-7hjx9
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  4m41s                  default-scheduler  Successfully assigned kube-system/cilium-7hjx9 to master03.k8s.local
  Normal   Pulling    4m38s                  kubelet            Pulling image "quay.io/cilium/cilium:v1.14.0@sha256:5a94b561f4651fcfd85970a50bc78b201cfbd6e2ab1a03848eab25a82832653a"
  Normal   Pulled     2m51s                  kubelet            Successfully pulled image "quay.io/cilium/cilium:v1.14.0@sha256:5a94b561f4651fcfd85970a50bc78b201cfbd6e2ab1a03848eab25a82832653a" in 1m47.561088123s (1m47.561348425s including waiting)
  Normal   Created    2m49s                  kubelet            Created container config
  Normal   Started    2m49s                  kubelet            Started container config
  Normal   Created    2m3s (x4 over 2m49s)   kubelet            Created container mount-cgroup
  Normal   Started    2m2s (x4 over 2m48s)   kubelet            Started container mount-cgroup
  Warning  BackOff    84s (x8 over 2m47s)    kubelet            Back-off restarting failed container mount-cgroup in pod cilium-7hjx9_kube-system(18002368-faf0-4a8a-b331-3ae112f5529d)
  Normal   Pulled     72s (x5 over 2m49s)    kubelet            Container image "quay.io/cilium/cilium:v1.14.0@sha256:5a94b561f4651fcfd85970a50bc78b201cfbd6e2ab1a03848eab25a82832653a" already present on machine
Check the logs: the command fails with Error from server (BadRequest): container "cilium-agent" in pod "cilium-7hjx9" is waiting to start: PodInitializing
]# kubectl logs -f -n kube-system cilium-7hjx9
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
Error from server (BadRequest): container "cilium-agent" in pod "cilium-7hjx9" is waiting to start: PodInitializing
List the exited containers:
]# docker ps -a
CONTAINER ID   IMAGE                                               COMMAND                  CREATED          STATUS                      PORTS   NAMES
5c2c52681a83   cc39628c8f03                                        "sh -ec 'cp /usr/bin…"   16 seconds ago   Exited (1) 15 seconds ago           k8s_mount-cgroup_cilium-n227s_kube-system_bdfbdb56-2c53-43e1-9b1b-ca90a6dfdb6b_7
3f4b544f0fe9   quay.io/cilium/operator-generic                     "cilium-operator-gen…"   11 minutes ago   Up 11 minutes                       k8s_cilium-operator_cilium-operator-76c55fc6b6-bphrj_kube-system_83fa27cc-f506-4744-8da9-936a49a11279_0
81079ece70fb   quay.io/cilium/cilium                               "cilium build-config"    11 minutes ago   Exited (0) 11 minutes ago           k8s_config_cilium-n227s_kube-system_bdfbdb56-2c53-43e1-9b1b-ca90a6dfdb6b_0
eb005515a26c   registry.aliyuncs.com/google_containers/pause:3.9   "/pause"                 12 minutes ago   Up 12 minutes                       k8s_POD_cilium-operator-76c55fc6b6-bphrj_kube-system_83fa27cc-f506-4744-8da9-936a49a11279_0
305d454cfdfb   registry.aliyuncs.com/google_containers/pause:3.9   "/pause"                 12 minutes ago   Up 12 minutes                       k8s_POD_cilium-n227s_kube-system_bdfbdb56-2c53-43e1-9b1b-ca90a6dfdb6b_0
Check the log of the exited init container:
]# docker logs -f 5c2c52681a83
nsenter: cannot open /hostproc/1/ns/cgroup: No such file or directory
Inspect the DaemonSet manifest: the init containers are defined under the initContainers field, and the error says /hostproc/1/ns/cgroup does not exist.
]# kubectl edit ds -n kube-system cilium
...
      - command:
        - sh
        - -ec
        - |
          cp /usr/bin/cilium-mount /hostbin/cilium-mount;
          nsenter --cgroup=/hostproc/1/ns/cgroup --mount=/hostproc/1/ns/mnt "${BIN_PATH}/cilium-mount" $CGROUP_ROOT;
          rm /hostbin/cilium-mount
        env:
        - name: CGROUP_ROOT
          value: /run/cilium/cgroupv2
        - name: BIN_PATH
          value: /opt/cni/bin
...
        volumeMounts:
        - mountPath: /hostproc
          name: hostproc
...
      - hostPath:
          path: /proc    # i.e. the host is missing /proc/1/ns/cgroup
          type: Directory
        name: hostproc
...
Check whether /proc/1/ns/cgroup exists on the host: sure enough, the cgroup entry is missing.
]# ll /proc/1/ns/
total 0
lrwxrwxrwx 1 root root 0 Aug  6 13:58 ipc -> ipc:[4026531839]
lrwxrwxrwx 1 root root 0 Aug  6 13:58 mnt -> mnt:[4026531840]
lrwxrwxrwx 1 root root 0 Aug  6 13:58 net -> net:[4026531956]
lrwxrwxrwx 1 root root 0 Aug  6 13:50 pid -> pid:[4026531836]
lrwxrwxrwx 1 root root 0 Aug  6 13:58 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Aug  6 13:58 uts -> uts:[4026531838]
Solution: cgroups are a kernel feature, so a missing cgroup namespace entry points at the kernel. The kernel had been upgraded earlier without rebooting, so the new kernel was not actually in effect; after rebooting the host, the cgroup entry appeared.
]# ll /proc/1/ns/
total 0
lrwxrwxrwx 1 root root 0 Aug  6 22:48 cgroup -> cgroup:[4026531835]
lrwxrwxrwx 1 root root 0 Aug  6 22:48 ipc -> ipc:[4026531839]
lrwxrwxrwx 1 root root 0 Aug  6 22:48 mnt -> mnt:[4026531841]
lrwxrwxrwx 1 root root 0 Aug  6 22:48 net -> net:[4026531840]
lrwxrwxrwx 1 root root 0 Aug  6 22:46 pid -> pid:[4026531836]
lrwxrwxrwx 1 root root 0 Aug  6 22:48 pid_for_children -> pid:[4026531836]
lrwxrwxrwx 1 root root 0 Aug  6 22:48 time -> time:[4026531834]
lrwxrwxrwx 1 root root 0 Aug  6 22:48 time_for_children -> time:[4026531834]
lrwxrwxrwx 1 root root 0 Aug  6 22:48 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Aug  6 22:48 uts -> uts:[4026531838]
If cilium stays in Pending even after /proc/1/ns/cgroup has appeared and will not start normally, reinstalling the k8s cluster resolves it.
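A pre-flight check on each node catches this before deploying cilium (a sketch; cgroup namespaces exist in Linux 4.6+ kernels, and as seen above an upgraded kernel only takes effect after a reboot):

```shell
# cilium's mount-cgroup init container nsenters /proc/1/ns/cgroup, so verify
# it exists on the host; uname -r shows which kernel is actually running.
if [ -e /proc/1/ns/cgroup ]; then
    echo "cgroup namespace present (kernel $(uname -r))"
else
    echo "missing /proc/1/ns/cgroup: reboot into the upgraded kernel"
fi
```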
Problem: cilium-operator keeps restarting with the error below; it is timing out while talking to kube-apiserver.
error retrieving resource lock kube-system/cilium-operator-resource-lock: Get "https://172.16.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cilium-operator-resource-lock?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Solution: restart the kube-apiserver service, or restart the etcd service.
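The "Client.Timeout exceeded while awaiting headers" text comes from Go's net/http client-side deadline, not from the server, so any unresponsive apiserver produces it. The failure class can be reproduced generically (192.0.2.1 is a reserved TEST-NET-1 address used purely for illustration; it never answers):

```shell
# A client deadline shorter than the time the server takes to answer fails
# on the client side; typical exit codes: 28 = timed out, 7 = unreachable.
rc=0
curl --silent --max-time 2 https://192.0.2.1/ || rc=$?
echo "curl exit code: $rc"
```

On a real cluster the analogous probe is hitting the apiserver's /healthz or /readyz endpoint from the affected node before restarting kube-apiserver or etcd.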