问题汇总及解决方法

阿里云镜像不支持selinux enforcing

阿里云上的centos 7.6 镜像默认disabled selinux, 如果设成enforcing的话, 会导致重启主机不能登陆. 解决办法是设置成Permissive.

    # cat /etc/selinux/config
      SELINUX=permissive
    # reboot

https服务不能用浏览器访问, 错误 ERR_CONNECTION_RESET

因为阿里云的限制, 不能直接用433端口访问, 临时办法是其中一台节点上安装vnc server, 远程访问

  • 参考安装vnc server 注意centos 7.6的vnc server有bug, 必须同时安装 GNOME才能启动成功.
    # yum groupinstall 'GNOME Desktop'
    # systemctl start vncserver@:1
    # iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 5901 -j ACCEPT
    # 修改阿里云网络规则开放tcp 5901端口

Spring Metrics不能构建成功

按照官档的方式 集成prometheus, mvn build 构建总不能成功. 错误信息如下.

    2018-07-26 16:06:12.312 ERROR 7582 --- [           main] o.s.boot.SpringApplication               : Application run failed
    org.springframework.beans.factory.BeanDefinitionStoreException: Failed to process import candidates for configuration class [com.example.demo.DemoApplication]; nested exception is java.lang.IllegalStateException: Failed to introspect annotated methods on class io.prometheus.client.spring.boot.PrometheusEndpointConfiguration

原来这是一个bug, 这功能根本不能用. 相关issue

    [SpringBoot2] Cannot get SpringBoot 2 to work with Prometheus #405
    https://github.com/prometheus/client_java/issues/405

POD创建一直处于pending状态, cni-server报 connection refuse错误

POD一直创建不成功, 有以下错误信息

    Warning  FailedCreatePodSandBox  20m     kubelet, node01-inner  Failed create pod sandbox: rpc error:
    code = Unknown desc = [failed to set up sandbox container "d422c351dbd77432a6204db362f82c0a4009eeb230987e1ad2b3fbca2f27c476"
    network for pod "logging-es-data-master-15qapabn-2-mhdq8": NetworkPlugin cni failed to set up pod
    "logging-es-data-master-15qapabn-2-mhdq8_openshift-logging" network: failed to send CNI request: Post http://dummy/:
    dial unix /var/run/openshift-sdn/cni-server.sock: connect: connection refused, failed to clean up sandbox container
    "d422c351dbd77432a6204db362f82c0a4009eeb230987e1ad2b3fbca2f27c476" network for pod "logging-es-data-master-15qapabn-2-mhdq8":
    NetworkPlugin cni failed to teardown pod "logging-es-data-master-15qapabn-2-mhdq8_openshift-logging" network:
    failed to send CNI request: Post http://dummy/: dial unix /var/run/openshift-sdn/cni-server.sock: connect: connection refused]

尝试手工连接 cni-server socket, 同样返回 connection refuse

    # CNI_COMMAND=ADD curl --unix-socket /var/run/openshift-sdn/cni-server.sock -X POST http://dummy/
    curl: (7) Failed to connect to /var/run/openshift-sdn/cni-server.sock: Connection refused

cni-server 容器日志显示, 对ovs健康检查不过

    SDN healthcheck detected unhealthy OVS server, restarting

检查代码 openshift-origin/pkg/network/node/healthcheck.go. 发现如果健康检查不过, 就卡死在这里不动了,外包的一层是utilwait.NeverStop的loop, 所以也不退出. 从TODO信息看也证明 没有开发进程内重启机制. 只能是手工重启.

    if err != nil {
        // If OVS restarts and our health check fails, we exit
        // TODO: make openshift-sdn able to reconcile without a restart
        glog.Fatalf("SDN healthcheck detected unhealthy OVS server, restarting: %v", err)
    }

最后,解决办法是重启SDN POD, 恢复正常

    # oc delete pod sdn-5qtv4 -n openshift-sdn