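Deploying Prometheus with persistent storage on Kubernetes
Note: this is something I learned at the second internet company I joined after graduating from university.
Prerequisites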
Kubernetes 1.16+
Helm 3+
NFS
Installing NFS
Prepare the disk that will hold the persistent Prometheus data.
Pick one server to act as the NFS server.
yum install -y nfs-utils
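# create the export root plus the two subdirectories that pv.yaml points to
mkdir -p /data/prom/alertmanager /data/prom/prometheus-server
vim /etc/exports
# add the following line to /etc/exports
/data/prom *(rw,no_root_squash)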
systemctl enable nfs
systemctl start nfs
Check that the export is up:
showmount -e
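The export steps above can be sketched as a small helper that is safe to run more than once. This is only a sketch under assumptions: the function name add_export is my own and is not part of nfs-utils, and it assumes bash and a root shell on the NFS server.

```shell
#!/usr/bin/env bash
# Append an export entry to an exports file only if that exact line
# is not already present, so re-running the setup does not duplicate it.
# Usage: add_export <exports-file> <export-line>
add_export() {
  local file="$1" line="$2"
  touch "$file"
  # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (as root on the NFS server):
#   add_export /etc/exports '/data/prom *(rw,no_root_squash)'
#   exportfs -ra     # re-read /etc/exports without restarting nfs
#   showmount -e     # /data/prom should appear in the list
```

Calling exportfs -ra is only needed when nfs is already running while /etc/exports is edited; on a fresh install the systemctl start above picks the file up.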
Custom configuration files
.
├── prometheus.yaml
└── pv.yaml
prometheus.yaml
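Expose a NodePort. I run my own Grafana outside the cluster, so it has to reach the in-cluster Prometheus; if Grafana is installed in Kubernetes alongside Prometheus, the NodePort is not needed.
retention defaults to 15 days, which I find too short, so I raised it to one year.
persistentVolume describes the PV to mount.
The following block works around a permission problem, which is covered later:
securityContext:
  runAsUser: 0
  runAsNonRoot: false
  runAsGroup: 0
  fsGroup: 0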
server:
  service:
    nodePort: 30003
    type: NodePort
  retention: "365d"
  persistentVolume:
    storageClass: "prometheus-server"
  securityContext:
    runAsUser: 0
    runAsNonRoot: false
    runAsGroup: 0
    fsGroup: 0
alertmanager:
  persistentVolume:
    storageClass: "prometheus-alertmanager"
pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-alertmanager
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-alertmanager
  nfs:
    path: /data/prom/alertmanager
    server: ${ip}
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-server
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-server
  nfs:
    path: /data/prom/prometheus-server
    server: ${ip}
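${ip} in pv.yaml is a placeholder for the NFS server address, and kubectl does not expand it on its own. One way to fill it in at apply time is sketched below. The function name render_pv and the address 192.0.2.10 (a documentation-only address) are my assumptions; use the real IP of your NFS server.

```shell
#!/usr/bin/env bash
# Print a manifest with every literal ${ip} replaced by the given address.
# The file on disk is left untouched.
# Usage: render_pv <manifest> <nfs-server-ip>
render_pv() {
  local manifest="$1" nfs_ip="$2"
  # [$] matches a literal dollar sign; the single quotes keep the shell
  # from trying to expand ${ip} itself
  sed 's/[$]{ip}/'"$nfs_ip"'/g' "$manifest"
}

# Example:
#   render_pv pv.yaml 192.0.2.10 | kubectl apply -f -
```

Keeping the placeholder in the file and substituting on the way to kubectl means the same pv.yaml can be reused if the NFS server moves.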
Installation
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
kubectl apply -f pv.yaml
helm install prometheus -f prometheus.yaml prometheus-community/prometheus
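After the install it is worth confirming that the PVCs created by the chart actually bound to the PVs from pv.yaml. A minimal sketch follows; pvcs_bound is a hypothetical helper name, and the default namespace is an assumption based on the install command above carrying no --namespace flag.

```shell
#!/usr/bin/env bash
# Succeed only if the namespace has at least one PVC and every PVC
# reports phase "Bound".
# Usage: pvcs_bound [namespace]
pvcs_bound() {
  local ns="${1:-default}" phases phase
  # jsonpath prints the phases as one space-separated line
  phases=$(kubectl get pvc -n "$ns" -o jsonpath='{.items[*].status.phase}') || return 1
  [ -n "$phases" ] || return 1
  for phase in $phases; do
    [ "$phase" = "Bound" ] || return 1
  done
}

# Example:
#   kubectl get pv
#   pvcs_bound default && echo "all PVCs bound" || kubectl get pvc -n default
```

A PVC stuck in Pending usually means its storageClassName or requested size does not match any available PV.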
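Upgrading
Note that this command targets the prometheus namespace, while the install command above uses the current namespace (default unless configured otherwise). Pass the same namespace to both; otherwise helm upgrade -i tries to install a second release in the other namespace instead of upgrading the first.
helm upgrade -i prometheus prometheus-community/prometheus -f prometheus.yaml --namespace prometheus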
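Because the install command has no --namespace while the upgrade passes --namespace prometheus, it helps to check where the release actually lives before upgrading. The sketch below parses the table printed by helm list; release_namespace is a hypothetical helper name of mine.

```shell
#!/usr/bin/env bash
# Print the namespace of a helm release, or nothing if it is not installed.
# Usage: release_namespace <release-name>
release_namespace() {
  # helm list prints NAME in column 1 and NAMESPACE in column 2;
  # NR > 1 skips the header row
  helm list --all-namespaces | awk -v n="$1" 'NR > 1 && $1 == n {print $2}'
}

# Example:
#   ns=$(release_namespace prometheus)
#   helm upgrade -i prometheus prometheus-community/prometheus \
#     -f prometheus.yaml --namespace "$ns"
```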
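Errors
node-exporter fails to start
The cause: port 9100 was already taken by a node-exporter that was running in Docker on the host.
Fix: either stop the Docker node-exporter or change the node-exporter port in the yaml file.
I simply stopped the Docker node-exporter.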
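To confirm a port conflict like this before touching anything, the listener on 9100 can be looked up on the affected node. A rough sketch, assuming ss from iproute2 is installed; port_in_use is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Succeed if something on this host is already listening on the TCP port.
# Usage: port_in_use <port>
port_in_use() {
  local port="$1"
  # column 4 of `ss -lnt` is "Local Address:Port"; the header row has
  # "Local" there and never matches
  ss -lnt | awk -v p=":$port" '$4 ~ (p "$") {found = 1} END {exit !found}'
}

# Example on the node where node-exporter keeps failing:
#   port_in_use 9100 && ss -lntp | grep ':9100 '   # -p shows the owning process
#   docker ps --filter publish=9100                # lists a container publishing 9100, if any
```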
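prom-server fails to start
Persistence is enabled by default for prom-server (server.persistentVolume.enabled defaults to true), so the pod cannot start unless the disk is prepared in advance or persistence is turned off.
Fix: configure the PersistentVolume ahead of time and deploy it first.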
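Insufficient permissions
Even after the disk is mounted, the pod still fails to start.
Container status:
prometheus-server CrashLoopBackOff
The prometheus-server log contains just one line:
prometheus-server Watching directory: "/etc/config" failed
I solved this by following the helm documentation.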
Change
securityContext:
  fsGroup: 65534
  runAsGroup: 65534
  runAsNonRoot: true
  runAsUser: 65534
to
securityContext:
  runAsUser: 0
  runAsNonRoot: false
  runAsGroup: 0
  fsGroup: 0
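The default securityContext runs the container as UID 65534, so a quick look at who owns the exported directory shows whether that user could write to it at all. The sketch below only checks ownership; owned_by is a hypothetical helper name, and stat -c assumes GNU coreutils. The chown shown in the comment is an alternative to running as root that I have not tested on this setup.

```shell
#!/usr/bin/env bash
# Succeed if the directory is owned by the given numeric UID.
# Usage: owned_by <dir> <uid>
owned_by() {
  local dir="$1" uid="$2"
  [ "$(stat -c '%u' "$dir")" = "$uid" ]
}

# Example on the NFS server:
#   owned_by /data/prom/prometheus-server 65534 || echo "not owned by 65534"
# Untested alternative to runAsUser: 0: hand the directory to the default user
#   chown -R 65534:65534 /data/prom/prometheus-server
```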
References
Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit 宇神之息 when reposting.