Note: this is something I learned at the second internet company I joined after graduating from university.
Background

The server lost power unexpectedly. After it came back up, Kibana would not open, so let's look into the problem.
Environment

Deployed with docker-compose
kibana:7.17.0
es:7.17.0, started as a cluster

Current state

First, check the state of the containers:

    $ docker ps -a
    CONTAINER ID   IMAGE                                                  COMMAND                  CREATED          STATUS                    PORTS                                                 NAMES
    d3235df3a59c   docker.elastic.co/kibana/kibana:7.17.0                 "/bin/tini -- /usr/l…"   25 seconds ago   Up 24 seconds             0.0.0.0:5601->5601/tcp, :::5601->5601/tcp             kib01
    678b59584943   docker.elastic.co/elasticsearch/elasticsearch:7.17.0   "/bin/tini -- /usr/l…"   56 seconds ago   Up 56 seconds             9200/tcp, 9300/tcp                                    es03
    ef39022b3839   docker.elastic.co/elasticsearch/elasticsearch:7.17.0   "/bin/tini -- /usr/l…"   56 seconds ago   Up 56 seconds (healthy)   0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp   es01
    9a7d713f2cf0   docker.elastic.co/elasticsearch/elasticsearch:7.17.0   "/bin/tini -- /usr/l…"   56 seconds ago   Up 56 seconds             9200/tcp, 9300/tcp                                    es02
The containers start normally.
    $ curl 127.0.0.1:9200
    curl: (52) Empty reply from server

    $ ss -lnt
    State    Recv-Q   Send-Q   Local Address:Port   Peer Address:Port
    LISTEN   0        128      *:22                 *:*
    LISTEN   0        100      127.0.0.1:25         *:*
    LISTEN   0        128      *:5601               *:*
    LISTEN   0        128      *:9200               *:*
    LISTEN   0        128      [::]:22              [::]:*
    LISTEN   0        100      [::1]:25             [::]:*
    LISTEN   0        128      [::]:5601            [::]:*
    LISTEN   0        128      [::]:9200            [::]:*
The ports are listening, but opening the Kibana page fails.
Check the errors in the container log (the output below comes from the es01 log):

    $ docker logs es01
    {"type": "server", "timestamp": "2022-05-25T13:00:03,064Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "es-docker-cluster", "node.name": "es01", "message": "path: /.kibana_task_manager_7.17.0_001/_pit, params: {index=.kibana_task_manager_7.17.0_001, keep_alive=10m}", "cluster.uuid": "foE5hq99TjetmAUbc74NXA", "node.id": "KA0d6R_BRo6wTAROOT-q0Q" ,
    "stacktrace": ["org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed",
    ....
    "Caused by: org.elasticsearch.action.NoShardAvailableActionException: [es03][172.19.0.2:9300][indices:data/read/open_reader_context]",
    "at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:544) ~[elasticsearch-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:491) [elasticsearch-7.17.0.jar:7.17.0]",
    "... 39 more"] }
    Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: Search rejected due to missing shards [[.kibana_task_manager_7.17.0_001][0]]. Consider using `allow_partial_search_results` setting to bypass this error.
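The index name can be pulled out of a long log by matching the "missing shards" message shown above. The following is a minimal sketch: the function name `missing_shard_indices` is made up for illustration, and it assumes the message keeps the `missing shards [[index][shard]]` shape seen in this log. If one message lists several shards, only the first index in it is reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): read log text on stdin
# and print the index names mentioned in "missing shards [[index][n]]" messages.
missing_shard_indices() {
  # 1) keep only the fragment "missing shards [[<index>]"
  # 2) strip everything up to and including "[[" and the closing "]"
  # 3) de-duplicate, since the same error is usually logged many times
  grep -o 'missing shards \[\[[^]]*]' \
    | sed 's/.*\[\[\([^]]*\)]/\1/' \
    | sort -u
}

# Example usage against the es01 container from this post:
#   docker logs es01 2>&1 | missing_shard_indices
```

This only narrows down which index to look at; it does not say why the shard is unavailable.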
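Solving the problem

Searching online suggests that when ES is not shut down cleanly, an index can become corrupted and unavailable, and the corrupted index then has to be deleted. However, ES currently has HTTPS and password login enabled, so these need to be turned off first before the cluster can be accessed normally.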
Disabling HTTPS and password login

The current docker-compose.yaml:
    version: '2.2'
    services:
      es01:
        image: docker.elastic.co/elasticsearch/elasticsearch:${VERSION}
        container_name: es01
        restart: always
        environment:
          - node.name=es01
          - cluster.name=es-docker-cluster
          - discovery.seed_hosts=es02,es03
          - cluster.initial_master_nodes=es01,es02,es03
          - bootstrap.memory_lock=true
          - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
          - xpack.license.self_generated.type=trial
          - xpack.security.enabled=true
          - xpack.security.http.ssl.enabled=true
          - xpack.security.http.ssl.key=$CERTS_DIR/es01/es01.key
          - xpack.security.http.ssl.certificate_authorities=$CERTS_DIR/ca/ca.crt
          - xpack.security.http.ssl.certificate=$CERTS_DIR/es01/es01.crt
          - xpack.security.transport.ssl.enabled=true
          - xpack.security.transport.ssl.verification_mode=certificate
          - xpack.security.transport.ssl.certificate_authorities=$CERTS_DIR/ca/ca.crt
          - xpack.security.transport.ssl.certificate=$CERTS_DIR/es01/es01.crt
          - xpack.security.transport.ssl.key=$CERTS_DIR/es01/es01.key
          - xpack.security.authc.token.enabled=true
        ulimits:
          memlock:
            soft: -1
            hard: -1
        volumes:
          - ./data01:/usr/share/elasticsearch/data
          - ./certs:$CERTS_DIR
        ports:
          - 9200:9200
        networks:
          - elastic
        healthcheck:
          test: curl --cacert $CERTS_DIR/ca/ca.crt -s https://localhost:9200 >/dev/null; if [[ $$? == 52 ]]; then echo 0; else echo 1; fi
          interval: 30s
          timeout: 10s
          retries: 5

      es02:
        image: docker.elastic.co/elasticsearch/elasticsearch:${VERSION}
        container_name: es02
        restart: always
        environment:
          - node.name=es02
          - cluster.name=es-docker-cluster
          - discovery.seed_hosts=es01,es03
          - cluster.initial_master_nodes=es01,es02,es03
          - bootstrap.memory_lock=true
          - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
          - xpack.license.self_generated.type=trial
          - xpack.security.enabled=true
          - xpack.security.http.ssl.enabled=true
          - xpack.security.http.ssl.key=$CERTS_DIR/es02/es02.key
          - xpack.security.http.ssl.certificate_authorities=$CERTS_DIR/ca/ca.crt
          - xpack.security.http.ssl.certificate=$CERTS_DIR/es02/es02.crt
          - xpack.security.transport.ssl.enabled=true
          - xpack.security.transport.ssl.verification_mode=certificate
          - xpack.security.transport.ssl.certificate_authorities=$CERTS_DIR/ca/ca.crt
          - xpack.security.transport.ssl.certificate=$CERTS_DIR/es02/es02.crt
          - xpack.security.transport.ssl.key=$CERTS_DIR/es02/es02.key
          - xpack.security.authc.token.enabled=true
        ulimits:
          memlock:
            soft: -1
            hard: -1
        volumes:
          - ./data02:/usr/share/elasticsearch/data
          - ./certs:$CERTS_DIR
        networks:
          - elastic

      es03:
        image: docker.elastic.co/elasticsearch/elasticsearch:${VERSION}
        container_name: es03
        restart: always
        environment:
          - node.name=es03
          - cluster.name=es-docker-cluster
          - discovery.seed_hosts=es01,es02
          - cluster.initial_master_nodes=es01,es02,es03
          - bootstrap.memory_lock=true
          - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
          - xpack.license.self_generated.type=trial
          - xpack.security.enabled=true
          - xpack.security.http.ssl.enabled=true
          - xpack.security.http.ssl.key=$CERTS_DIR/es03/es03.key
          - xpack.security.http.ssl.certificate_authorities=$CERTS_DIR/ca/ca.crt
          - xpack.security.http.ssl.certificate=$CERTS_DIR/es03/es03.crt
          - xpack.security.transport.ssl.enabled=true
          - xpack.security.transport.ssl.verification_mode=certificate
          - xpack.security.transport.ssl.certificate_authorities=$CERTS_DIR/ca/ca.crt
          - xpack.security.transport.ssl.certificate=$CERTS_DIR/es03/es03.crt
          - xpack.security.transport.ssl.key=$CERTS_DIR/es03/es03.key
          - xpack.security.authc.token.enabled=true
        ulimits:
          memlock:
            soft: -1
            hard: -1
        volumes:
          - ./data03:/usr/share/elasticsearch/data
          - ./certs:$CERTS_DIR
        networks:
          - elastic

      kib01:
        image: docker.elastic.co/kibana/kibana:${VERSION}
        container_name: kib01
        restart: always
        depends_on: {"es01": {"condition": "service_healthy"}}
        ports:
          - 5601:5601
        environment:
          ELASTICSEARCH_URL: https://es01:9200
          ELASTICSEARCH_HOSTS: https://es01:9200
          ELASTICSEARCH_USERNAME: kibana_system
          ELASTICSEARCH_PASSWORD: xxx
          ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES: $CERTS_DIR/ca/ca.crt
          SERVER_SSL_ENABLED: "true"
          SERVER_SSL_KEY: $CERTS_DIR/kib01/kib01.key
          SERVER_SSL_CERTIFICATE: $CERTS_DIR/kib01/kib01.crt
        volumes:
          - ./certs:$CERTS_DIR
          - ./kibana.yml:/usr/share/kibana/config/kibana.yml
        networks:
          - elastic

    networks:
      elastic:
        driver: bridge
What to change:
    # Comment out every xpack setting on each ES node

    # Turn off xpack.security.http.ssl.enabled on ES
    # by adding xpack.security.http.ssl.enabled=false

    # Comment out these Kibana settings
    # ELASTICSEARCH_USERNAME: kibana_system
    # ELASTICSEARCH_PASSWORD: xxx
    # ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES: $CERTS_DIR/ca/ca.crt
    # SERVER_SSL_ENABLED: "true"
    # SERVER_SSL_KEY: $CERTS_DIR/kib01/kib01.key
    # SERVER_SSL_CERTIFICATE: $CERTS_DIR/kib01/kib01.crt

    # Change https to http in the Kibana URLs

    # ELASTICSEARCH_URL: http://es01:9200
    # ELASTICSEARCH_HOSTS: http://es01:9200
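Commenting out the xpack lines by hand on three nodes is easy to get wrong, so the first step above could be scripted. This is only a sketch under assumptions: the function name `comment_out_xpack` and the `.bak` backup suffix are my own choices, and it only handles list entries of the form `- xpack....` as they appear in the compose file above. Adding `xpack.security.http.ssl.enabled=false` and the Kibana changes still have to be done by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): comment out every
# "- xpack.*" environment entry in a compose file, keeping a backup copy.
comment_out_xpack() {
  local file="$1"
  # Save the original so the security settings can be restored afterwards.
  cp "$file" "$file.bak"
  # Turn "      - xpack.foo=bar" into "      # - xpack.foo=bar",
  # preserving the leading indentation so the YAML stays valid.
  sed -E 's/^([[:space:]]*)(- xpack\.)/\1# \2/' "$file.bak" > "$file"
}

# Example usage:
#   comment_out_xpack docker-compose.yaml
# To restore the original settings later:
#   mv docker-compose.yaml.bak docker-compose.yaml
```

Keeping the backup file makes the final restore step a single `mv` instead of a second round of manual edits.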
After making the changes, restart the ES cluster.

Access the cluster:
    $ curl 127.0.0.1:9200
    {
      "name" : "781c67472218",
      "cluster_name" : "docker-cluster",
      "cluster_uuid" : "c-0WI5JYQnyZbUIv2ayzgg",
      "version" : {
        "number" : "7.17.0",
        "build_flavor" : "default",
        "build_type" : "docker",
        "build_hash" : "bee86328705acaa9a6daede7140defd4d9ec56bd",
        "build_date" : "2022-01-28T08:36:04.875279988Z",
        "build_snapshot" : false,
        "lucene_version" : "8.11.1",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "You Know, for Search"
    }
Check the state of the cluster's indices:
    $ curl -XGET 'localhost:9200/_cat/indices?v&pretty'
    health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    green  open   .items-default-000001           tpneVh6gT4-U_Po9gBIGOg   1   1          0            0       452b           226b
    green  open   .apm-custom-link                eZwivRQSTkedjlGHzLL_JA   1   1          0            0       452b           226b
    green  open   .fleet-enrollment-api-keys-7    _PNdBZSBQhilwZf6AlfdYQ   1   1          2            0     13.3kb          6.6kb
    green  open   .apm-agent-configuration        QDKIP1EnQwexEm1o9EDx6w   1   1          0            0       452b           226b
    green  open   .tasks                          lyPUvnLJQbK7KCjpvEz_mg   1   1        424            0    270.3kb        135.1kb
    green  open   .geoip_databases                hXPIJS1DStSi5_DdxFNQig   1   1         41           41     82.6mb         41.2mb
    red    open   .kibana_task_manager_7.17.0_001 9clc9EnwRwykPnuJTMGRaA   1   1
One index is in a bad state (health red): .kibana_task_manager_7.17.0_001, UUID 9clc9EnwRwykPnuJTMGRaA
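On a cluster with many indices, spotting the red row by eye gets tedious. A small filter over the `_cat/indices` output can list only the red ones. This is a sketch: `red_indices` is a made-up name, and it relies on the column order shown above, where health is the first column and the index name is the third.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): read `_cat/indices?v`
# output on stdin and print the names of indices whose health is "red".
red_indices() {
  # Column 1 is the health value, column 3 is the index name.
  # The header row is skipped automatically because its first field is "health".
  awk '$1 == "red" { print $3 }'
}

# Example usage against the unsecured cluster from this post:
#   curl -s 'localhost:9200/_cat/indices?v' | red_indices
```

The cat indices API also accepts a `health` query parameter (for example `?health=red`), which does the filtering on the server side; the local filter is shown here because it also works on saved output.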
Delete the index:
    curl -XDELETE localhost:9200/.kibana_task_manager_7.17.0_001
Then restart the cluster. At this point the Kibana UI opens again.
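Before restoring the security settings, it can be worth confirming that the cluster is no longer red. The cluster health API (`_cluster/health`) returns a JSON document with a `status` field. The sketch below extracts that field with `sed`; the function name `cluster_status` is hypothetical, and it assumes the response contains a single `"status"` key, as the standard health response does.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): read a _cluster/health
# JSON response on stdin and print the value of its "status" field
# (green, yellow or red).
cluster_status() {
  # -n plus the trailing "p" prints only lines where the substitution matched.
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Example usage while security is still disabled, as in this post:
#   curl -s 'localhost:9200/_cluster/health' | cluster_status
```

A plain-HTTP call without credentials like this only works while HTTPS and password login are still turned off.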
Finally, restore all the settings that were commented out and restart once more to bring back HTTPS and the security checks.
Summary

With that, the ES cluster has been recovered.
References

Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException
New index patterns are not created