报错集锦

2021-07-23 | 阅读：次

1. permission denied

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       0.26.1
  Build:         git-2de5a893a
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: openresty/1.15.8.2

-------------------------------------------------------------------------------

W0719 06:58:01.543840       6 flags.go:243] SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
W0719 06:58:01.544045       6 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0719 06:58:01.544341       6 main.go:182] Creating API client for https://10.233.0.1:443
I0719 06:58:01.558257       6 main.go:226] Running in Kubernetes cluster version v1.16 (v1.16.3) - git (clean) commit b3cbbae08ec52a7fc73d334838e18d17e8512749 - platform linux/amd64
F0719 06:58:01.857260       6 ssl.go:389] unexpected error storing fake SSL Cert: could not create PEM certificate file /etc/ingress-controller/ssl/default-fake-certificate.pem: open /etc/ingress-controller/ssl/default-fake-certificate.pem: permission denied

创建 Pod 后，我们可以看到并没有成功运行，出现了 open /prometheus/queries.active: permission denied 这样的错误信息，这是因为我们的 prometheus 的镜像中是使用的 nobody 这个用户，然后现在我们通过 hostPath 挂载到宿主机上面的目录的 ownership 却是 root：


$ ls -la /data/
total 36
drwxr-xr-x   6 root root  4096 Dec 12 11:07 .
dr-xr-xr-x. 19 root root  4096 Nov  9 23:19 ..
drwxr-xr-x   2 root root  4096 Dec 12 11:07 prometheus
所以当然会出现操作权限问题了，这个时候我们就可以通过 securityContext 来为 Pod 设置下 volumes 的权限，通过设置 runAsUser=0 指定运行的用户为 root：


......
securityContext:
  runAsUser: 0
volumes:
- name: data
  hostPath:
    path: /data/prometheus/
- configMap:
    name: prometheus-config
  name: config-volume

参考待验证

CRD spec.versions: Invalid value

原因: CRD yaml 文件中 apiVersion 与 versions 中的版本不对应参考: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/

删除 namespaces 时 Terminating，无法强制删除且无法在该 ns 下创建对象

原因: ns 处于 terminating 时 hang 住了，使用 –grace-period=0 –force 强制删除也无效解决:

# 导出K8s访问密钥
echo $(kubectl config view --raw -oyaml | grep client-cert  |cut -d ' ' -f 6) |base64 -d > /tmp/client.pem
echo $(kubectl config view --raw -oyaml | grep client-key-data  |cut -d ' ' -f 6 ) |base64 -d > /tmp/client-key.pem
echo $(kubectl config view --raw -oyaml | grep certificate-authority-data  |cut -d ' ' -f 6  ) |base64 -d > /tmp/ca.pem
# 解决namespace Terminating，根据实际情况修改<namespaces>
curl --cert /tmp/client.pem --key /tmp/client-key.pem --cacert /tmp/ca.pem -H "Content-Type: application/json" -X PUT --data-binary @/tmp/temp.json https://xxx.xxx.xxx.xxx:6443/api/v1/namespaces/<namespaces>/finalize

Docker 启动时提示 no sockets found via socket activation

解决: 在启动 Docker 前先执行 systemctl unmask Docker.socket 即可

Prometheus opening storage failed: invalid block sequence

原因: 这个需要排查 Prometheus 持久化目录中是否存在时间超出设置阈值的时间段的文件，删掉后重启即可

Kubelet 提示: The node was low on resource: ephemeral-storage

原因: 节点上 Kubelet 的配置路径超过阈值会触发驱逐，默认情况下阈值是 85%

解决: 或者清理磁盘释放资源，或者通过可修改 Kubelet 的配置参数imagefs.available来提高阈值,然后重启 Kubelet.

参考: https://cloud.tencent.com/developer/article/1456389

kubectl 查看日志时提示: Error from server: Get https://xxx:10250/containerLogs/spring-prod/xxx-0/xxx: dial tcp xxx:10250: i/o timeout

原因: 目地机器的 iptables 对 10250 这个端口进行了 drop，如下图

iptables-save -L INPUT –-line-numbers

解决: 删除对应的规则

iptables -D INPUT 10

Service 解析提示 Temporary failure in name resolution

原因: 出现这种情况很奇怪，现象显示就是域名无法解析，全格式的域名能够解析是因为在 pod 的/etc/hosts 中有全域名的记录,那么问题就出在于 CoreDNS 解析上，CoreDNS 从日志来看，没有任何报错，但是从 pod 的状态来看，虽然处于 Running 状态，但是 0/1 可以看出 CoreDNS 并未处于 ready 状态.

可以查看 ep 记录，会发现 Endpoint 那一栏是空的，这也就证实了 K8s 把 CoreDNS 的状态分为了 notready 状态，所以 ep 才没有记录，经过与其它环境比较后发现跟配置有关，最终定位在 CoreDNS 的配置文件上,在插件上需要加上 ready 解决: 在 cm 的配置上添加 read 插件，如下图

# ... 省略
data:
  Corefile: |
    .:53 {
        errors
        health
        ready  # 加上该行后问题解决
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream /etc/resolv.conf
          fallthrough in-addr.arpa ip6.arpa
        }
       # ... 省略

总结起来就是使用 ready 来表明当前已准备好可以接收请求，从 codedns 的 yaml 文件也可以看到有livenessProbe