Note: this is something I learned at my second internet company after graduating from university.


Background

The server lost power unexpectedly. After it came back up, Kibana would not open. Let's walk through the problem.

Environment

  • Deployed with docker-compose
  • kibana: 7.17.0
  • es: 7.17.0
  • Started as a cluster

Current state

First, check the state of the containers:

$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d3235df3a59c docker.elastic.co/kibana/kibana:7.17.0 "/bin/tini -- /usr/l…" 25 seconds ago Up 24 seconds 0.0.0.0:5601->5601/tcp, :::5601->5601/tcp kib01
678b59584943 docker.elastic.co/elasticsearch/elasticsearch:7.17.0 "/bin/tini -- /usr/l…" 56 seconds ago Up 56 seconds 9200/tcp, 9300/tcp es03
ef39022b3839 docker.elastic.co/elasticsearch/elasticsearch:7.17.0 "/bin/tini -- /usr/l…" 56 seconds ago Up 56 seconds (healthy) 0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp es01
9a7d713f2cf0 docker.elastic.co/elasticsearch/elasticsearch:7.17.0 "/bin/tini -- /usr/l…" 56 seconds ago Up 56 seconds 9200/tcp, 9300/tcp es02

The containers all started normally.

$ curl 127.0.0.1:9200
curl: (52) Empty reply from server

$ ss -lnt
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:22 *:*
LISTEN 0 100 127.0.0.1:25 *:*
LISTEN 0 128 *:5601 *:*
LISTEN 0 128 *:9200 *:*
LISTEN 0 128 [::]:22 [::]:*
LISTEN 0 100 [::1]:25 [::]:*
LISTEN 0 128 [::]:5601 [::]:*
LISTEN 0 128 [::]:9200 [::]:*
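
A short aside on the `(52) Empty reply from server` result: 52 is curl's exit code for a server that closed the connection without sending a response. One likely explanation here, which is an assumption and not something the logs in this post confirm, is that the request used plain HTTP while es01 serves HTTPS on port 9200. The sketch below maps a few documented curl exit codes to a hint; the function name `curl_exit_hint` is made up for illustration.

```shell
# Map a few documented curl exit codes to a short troubleshooting hint.
# curl_exit_hint is a hypothetical helper name, not part of curl.
curl_exit_hint() {
  case "$1" in
    0)  echo "ok" ;;
    7)  echo "failed to connect to host" ;;
    35) echo "TLS handshake failed" ;;
    52) echo "empty reply: server closed the connection without answering" ;;
    60) echo "server certificate not trusted: check --cacert" ;;
    *)  echo "see the EXIT CODES section of the curl man page" ;;
  esac
}

# Usage:
#   curl -s 127.0.0.1:9200 >/dev/null; curl_exit_hint "$?"
```

Ports 5601 and 9200 are both listening in the `ss` output, so the failure is above the TCP level, which matches code 52 and not code 7.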

The Kibana page fails to open.

Check the container's error log

$ docker logs es01
{"type": "server", "timestamp": "2022-05-25T13:00:03,064Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "es-docker-cluster", "node.name": "es01", "message": "path: /.kibana_task_manager_7.17.0_001/_pit, params: {index=.kibana_task_manager_7.17.0_001, keep_alive=10m}", "cluster.uuid": "foE5hq99TjetmAUbc74NXA", "node.id": "KA0d6R_BRo6wTAROOT-q0Q" ,
"stacktrace": ["org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed",
....
"Caused by: org.elasticsearch.action.NoShardAvailableActionException: [es03][172.19.0.2:9300][indices:data/read/open_reader_context]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:544) ~[elasticsearch-7.17.0.jar:7.17.0]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:491) [elasticsearch-7.17.0.jar:7.17.0]",
"... 39 more"] }

Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: Search rejected due to missing shards [[.kibana_task_manager_7.17.0_001][0]]. Consider using `allow_partial_search_results` setting to bypass this error.

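Fixing the problem

Searching online suggests that when es is not shut down cleanly, an index can be corrupted and become unavailable, and that the damaged index has to be deleted. However, this es cluster currently has HTTPS and password login enabled, so those need to be turned off first before it can be accessed with plain requests.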

Disabling HTTPS and password login

The current docker-compose.yaml:

version: '2.2'

services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:${VERSION}
    container_name: es01
    restart: always
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.license.self_generated.type=trial
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=$CERTS_DIR/es01/es01.key
      - xpack.security.http.ssl.certificate_authorities=$CERTS_DIR/ca/ca.crt
      - xpack.security.http.ssl.certificate=$CERTS_DIR/es01/es01.crt
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.certificate_authorities=$CERTS_DIR/ca/ca.crt
      - xpack.security.transport.ssl.certificate=$CERTS_DIR/es01/es01.crt
      - xpack.security.transport.ssl.key=$CERTS_DIR/es01/es01.key
      - xpack.security.authc.token.enabled=true
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - ./data01:/usr/share/elasticsearch/data
      - ./certs:$CERTS_DIR
    ports:
      - 9200:9200
    networks:
      - elastic

    healthcheck:
      test: curl --cacert $CERTS_DIR/ca/ca.crt -s https://localhost:9200 >/dev/null; if [[ $$? == 52 ]]; then echo 0; else echo 1; fi
      interval: 30s
      timeout: 10s
      retries: 5

  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:${VERSION}
    container_name: es02
    restart: always
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.license.self_generated.type=trial
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=$CERTS_DIR/es02/es02.key
      - xpack.security.http.ssl.certificate_authorities=$CERTS_DIR/ca/ca.crt
      - xpack.security.http.ssl.certificate=$CERTS_DIR/es02/es02.crt
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.certificate_authorities=$CERTS_DIR/ca/ca.crt
      - xpack.security.transport.ssl.certificate=$CERTS_DIR/es02/es02.crt
      - xpack.security.transport.ssl.key=$CERTS_DIR/es02/es02.key
      - xpack.security.authc.token.enabled=true
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - ./data02:/usr/share/elasticsearch/data
      - ./certs:$CERTS_DIR
    networks:
      - elastic

  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:${VERSION}
    container_name: es03
    restart: always
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.license.self_generated.type=trial
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=$CERTS_DIR/es03/es03.key
      - xpack.security.http.ssl.certificate_authorities=$CERTS_DIR/ca/ca.crt
      - xpack.security.http.ssl.certificate=$CERTS_DIR/es03/es03.crt
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.certificate_authorities=$CERTS_DIR/ca/ca.crt
      - xpack.security.transport.ssl.certificate=$CERTS_DIR/es03/es03.crt
      - xpack.security.transport.ssl.key=$CERTS_DIR/es03/es03.key
      - xpack.security.authc.token.enabled=true
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - ./data03:/usr/share/elasticsearch/data
      - ./certs:$CERTS_DIR
    networks:
      - elastic
  kib01:
    image: docker.elastic.co/kibana/kibana:${VERSION}
    container_name: kib01
    restart: always
    depends_on: {"es01": {"condition": "service_healthy"}}
    ports:
      - 5601:5601
    environment:
      ELASTICSEARCH_URL: https://es01:9200
      ELASTICSEARCH_HOSTS: https://es01:9200
      ELASTICSEARCH_USERNAME: kibana_system
      ELASTICSEARCH_PASSWORD: xxx
      ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES: $CERTS_DIR/ca/ca.crt
      SERVER_SSL_ENABLED: "true"
      SERVER_SSL_KEY: $CERTS_DIR/kib01/kib01.key
      SERVER_SSL_CERTIFICATE: $CERTS_DIR/kib01/kib01.crt
    volumes:
      - ./certs:$CERTS_DIR
      - ./kibana.yml:/usr/share/kibana/config/kibana.yml
    networks:
      - elastic

networks:
  elastic:
    driver: bridge

Changes

# On every es node, comment out all of the xpack.* lines

# Turn off HTTPS on the es HTTP layer by adding this line to each es node:
# - xpack.security.http.ssl.enabled=false

# Comment out these kibana settings
# ELASTICSEARCH_USERNAME: kibana_system
# ELASTICSEARCH_PASSWORD: xxx
# ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES: $CERTS_DIR/ca/ca.crt
# SERVER_SSL_ENABLED: "true"
# SERVER_SSL_KEY: $CERTS_DIR/kib01/kib01.key
# SERVER_SSL_CERTIFICATE: $CERTS_DIR/kib01/kib01.crt

# Change https to http (these two kibana lines stay active)
ELASTICSEARCH_URL: http://es01:9200
ELASTICSEARCH_HOSTS: http://es01:9200

After making the changes, restart the es cluster:

$ docker-compose up -d

Access the cluster

$ curl 127.0.0.1:9200
{
"name" : "781c67472218",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "c-0WI5JYQnyZbUIv2ayzgg",
"version" : {
"number" : "7.17.0",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "bee86328705acaa9a6daede7140defd4d9ec56bd",
"build_date" : "2022-01-28T08:36:04.875279988Z",
"build_snapshot" : false,
"lucene_version" : "8.11.1",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}

Check the cluster's indices

$ curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .items-default-000001 tpneVh6gT4-U_Po9gBIGOg 1 1 0 0 452b 226b
green open .apm-custom-link eZwivRQSTkedjlGHzLL_JA 1 1 0 0 452b 226b
green open .fleet-enrollment-api-keys-7 _PNdBZSBQhilwZf6AlfdYQ 1 1 2 0 13.3kb 6.6kb
green open .apm-agent-configuration QDKIP1EnQwexEm1o9EDx6w 1 1 0 0 452b 226b
green open .tasks lyPUvnLJQbK7KCjpvEz_mg 1 1 424 0 270.3kb 135.1kb
green open .geoip_databases hXPIJS1DStSi5_DdxFNQig 1 1 41 41 82.6mb 41.2mb
red open .kibana_task_manager_7.17.0_001 9clc9EnwRwykPnuJTMGRaA 1 1

One index is in the red state: .kibana_task_manager_7.17.0_001 (uuid 9clc9EnwRwykPnuJTMGRaA)
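
On a cluster with many indices, picking the red ones out by eye gets tedious. As a minimal sketch, the two helpers below filter `_cat` output with awk, assuming the default column order shown above for `_cat/indices?v` (health, status, index, ...) and the default order of `_cat/shards?v` (index, shard, prirep, state, ...). The function names `red_indices` and `unassigned_shards` are made up for illustration.

```shell
# Print the names of indices whose health column is "red".
# Reads `_cat/indices?v` output on stdin: column 1 is health, column 3 is the index name.
red_indices() {
  awk '$1 == "red" { print $3 }'
}

# Print "index shard prirep" for every shard in state UNASSIGNED.
# Reads `_cat/shards?v` output on stdin: column 4 is the shard state.
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3 }'
}

# Usage against the cluster from this post (security disabled, plain HTTP):
#   curl -s 'localhost:9200/_cat/indices?v' | red_indices
#   curl -s 'localhost:9200/_cat/shards?v'  | unassigned_shards
```

To see why a shard is unassigned before deciding to delete anything, `curl -s 'localhost:9200/_cluster/allocation/explain?pretty'` reports the reason for one unassigned shard.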

Delete the index

curl -XDELETE localhost:9200/.kibana_task_manager_7.17.0_001

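Then restart the cluster. At this point the Kibana UI opens again.

Finally, restore all the commented-out configuration and restart once more to bring back HTTPS and the security checks.

Summary

With that, the recovery of the es cluster is complete.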
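
Before restoring the security settings, it may be worth confirming that the cluster is no longer red. The sketch below polls `_cat/health?h=status`, which prints only the status column, until the status is green or yellow. It assumes the cluster is still reachable over plain HTTP on localhost:9200; the function name `wait_until_not_red` and its defaults are made up for illustration.

```shell
# Poll the cluster status until it is green or yellow, or give up.
# $1: maximum number of attempts (default 30)
# $2: seconds to sleep between attempts (default 2)
# Prints the last status seen; returns 0 on green/yellow, 1 otherwise.
wait_until_not_red() {
  _attempts="${1:-30}"
  _delay="${2:-2}"
  _status=""
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    # Strip whitespace so the comparison is not thrown off by a trailing newline.
    _status="$(curl -s 'localhost:9200/_cat/health?h=status' | tr -d '[:space:]')"
    if [ "$_status" = "green" ] || [ "$_status" = "yellow" ]; then
      echo "$_status"
      return 0
    fi
    sleep "$_delay"
    _i=$((_i + 1))
  done
  echo "${_status:-unreachable}"
  return 1
}

# Usage:
#   wait_until_not_red 30 2 && echo "cluster is usable again"
```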

References

Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException

New index patterns are not created