mirror of
https://github.com/ben1234560/k8s_PaaS.git
synced 2025-04-22 06:35:52 +08:00
3099 lines
82 KiB
Bash
3099 lines
82 KiB
Bash
## 第七章——Promtheus监控k8s企业家应用
|
||
|
||
##### 前言:
|
||
|
||
> 配置是独立于程序的可配变量,同一份程序在不同配置下会有不同的行为。
|
||
|
||
##### 云原生(Cloud Native)程序的特点:
|
||
|
||
> - 程序的配置,通过设置环境变了传递到容器内部
|
||
> - 程序的配置,通过程序启动参数配置生效
|
||
> - 程序的配置,通过集中在配置中心进行统一换了(CRUD)
|
||
|
||
##### Devops工程师应该做什么?
|
||
|
||
> - 容器化公司自研的应用程序(通过Docker进行二次封装)
|
||
> - 推动容器化应用,转变为云原生应用(一次构建,到处使用)
|
||
> - 使用容器编排框架(kubernetes),合理、规范、专业的编排业务容器
|
||
|
||
|
||
|
||
### Prometheus监控软件概述
|
||
|
||
> 开源监控告警解决方案,[推荐文章](https://www.jianshu.com/p/60a50539243a)
|
||
>
|
||
> 当然一时半会你可能没那么快的去理解,那就跟我们先做下去你就会慢慢理解什么是时间序列数据
|
||
>
|
||
> [https://github.com/prometheus](https://github.com/prometheus)
|
||
>
|
||
> [https://prometheus.io](https://prometheus.io)
|
||
|
||
#### Prometheus的特点:
|
||
|
||
- 多维数据模型:由度量名称和键值对标识的时间序列数据
|
||
- 内置时间序列数据库:TSDB
|
||
- promQL:一种灵活的查询语言,可以利用多维数据完成复杂查询
|
||
- 基于HTTP的pull(拉取)方式采集时间序列数据
|
||
- 同时支持PushGateway组件收集数据
|
||
- 通过服务发现或静态配置发现目标
|
||
- 支持作为数据源接入Grafana
|
||
|
||
##### 我们将使用的架构图
|
||
|
||

|
||
|
||
> **Prometheus Server:**服务核心组件,通过pull metrics从 Exporter 拉取和存储监控数据,并提供一套灵活的查询语言(PromQL)。
|
||
>
|
||
> **pushgateway:**类似一个中转站,Prometheus的server端只会使用pull方式拉取数据,但是某些节点因为某些原因只能使用push方式推送数据,那么它就是用来接收push而来的数据并暴露给Prometheus的server拉取的中转站,这里我们不做它。
|
||
>
|
||
> **Exporters/Jobs:**负责收集目标对象(host, container…)的性能数据,并通过 HTTP 接口供 Prometheus Server 获取。
|
||
>
|
||
> **Service Discovery:**服务发现,Prometheus支持多种服务发现机制:文件,DNS,Consul,Kubernetes,OpenStack,EC2等等。基于服务发现的过程并不复杂,通过第三方提供的接口,Prometheus查询到需要监控的Target列表,然后轮训这些Target获取监控数据。
|
||
>
|
||
> **Alertmanager:**从 Prometheus server 端接收到 alerts 后,会进行去除重复数据,分组,并路由到对方的接受方式,发出报警。常见的接收方式有:电子邮件,pagerduty 等。
|
||
>
|
||
> **UI页面的三种方法:**
|
||
>
|
||
> - Prometheus web UI:自带的(不怎么好用)
|
||
> - Grafana:美观、强大的可视化监控指标展示工具
|
||
> - API clients:自己开发的监控展示工具
|
||
>
|
||
> **工作流程:**Prometheus Server定期从配置好的Exporters/Jobs中拉metrics,或者来着pushgateway发过来的metrics,或者其它的metrics,收集完后运行定义好的alert.rules(这个文件后面会讲到),记录时间序列或者向Alertmanager推送警报
|
||
|
||
##### 和zabbixc对比
|
||
|
||
| Prometheus | Zabbix |
|
||
| ------------------------------------------------------------ | ------------------------------------------------------------ |
|
||
| 后端用golang开发,K8S也是go开发 | 后端用C开发,界面用PHP开发 |
|
||
| 更适合云环境的监控,尤其是对K8S有着更好的支持 | 更适合监控物理机,虚拟机环境 |
|
||
| 监控数据存储在基于时间序列的数据库内,便于对已有数据进行新的聚合 | 监控数据存储在关系型数据库内,如MySQL,很难从现有数据中扩展维度 |
|
||
| 自身界面相对较弱,很多配置需要修改配置文件,但可以借由Grafana出图 | 图形化界面相对比较成熟 |
|
||
| 支持更大的集群规模,速度也更快 | 集群规模上线为10000个节点 |
|
||
| 2015年后开始快速发展,社区活跃,使用场景越来越多 | 发展实际更长,对于很多监控场景,都有现成的解决方案 |
|
||
|
||
由于资源问题,我已经把不用的服务关掉了
|
||
|
||

|
||
|
||

|
||
|
||

|
||
|
||
|
||
|
||
### 交付kube-state-metric
|
||
|
||
> **WHAT:**为prometheus采集k8s资源数据的exporter,能够采集绝大多数k8s内置资源的相关数据,例如pod、deploy、service等等。同时它也提供自己的数据,主要是资源采集个数和采集发生的异常次数统计
|
||
|
||
https://quay.io/repository/coreos/kube-state-metrics?tab=tags
|
||
|
||
~~~~
|
||
# 200机器,下载包:
|
||
~]# docker pull quay.io/coreos/kube-state-metrics:v1.5.0
|
||
~]# docker images|grep kube-state
|
||
~]# docker tag 91599517197a harbor.od.com/public/kube-state-metrics:v1.5.0
|
||
~]# docker push harbor.od.com/public/kube-state-metrics:v1.5.0
|
||
~~~~
|
||
|
||

|
||
|
||
~~~~shell
|
||
# 200机器,准备资源配置清单:
|
||
~]# mkdir /data/k8s-yaml/kube-state-metrics
|
||
~]# cd /data/k8s-yaml/kube-state-metrics
|
||
kube-state-metrics]# vi rbac.yaml
|
||
apiVersion: v1
|
||
kind: ServiceAccount
|
||
metadata:
|
||
labels:
|
||
addonmanager.kubernetes.io/mode: Reconcile
|
||
kubernetes.io/cluster-service: "true"
|
||
name: kube-state-metrics
|
||
namespace: kube-system
|
||
---
|
||
apiVersion: rbac.authorization.k8s.io/v1
|
||
kind: ClusterRole
|
||
metadata:
|
||
labels:
|
||
addonmanager.kubernetes.io/mode: Reconcile
|
||
kubernetes.io/cluster-service: "true"
|
||
name: kube-state-metrics
|
||
rules:
|
||
- apiGroups:
|
||
- ""
|
||
resources:
|
||
- configmaps
|
||
- secrets
|
||
- nodes
|
||
- pods
|
||
- services
|
||
- resourcequotas
|
||
- replicationcontrollers
|
||
- limitranges
|
||
- persistentvolumeclaims
|
||
- persistentvolumes
|
||
- namespaces
|
||
- endpoints
|
||
verbs:
|
||
- list
|
||
- watch
|
||
- apiGroups:
|
||
- policy
|
||
resources:
|
||
- poddisruptionbudgets
|
||
verbs:
|
||
- list
|
||
- watch
|
||
- apiGroups:
|
||
- extensions
|
||
resources:
|
||
- daemonsets
|
||
- deployments
|
||
- replicasets
|
||
verbs:
|
||
- list
|
||
- watch
|
||
- apiGroups:
|
||
- apps
|
||
resources:
|
||
- statefulsets
|
||
verbs:
|
||
- list
|
||
- watch
|
||
- apiGroups:
|
||
- batch
|
||
resources:
|
||
- cronjobs
|
||
- jobs
|
||
verbs:
|
||
- list
|
||
- watch
|
||
- apiGroups:
|
||
- autoscaling
|
||
resources:
|
||
- horizontalpodautoscalers
|
||
verbs:
|
||
- list
|
||
- watch
|
||
---
|
||
apiVersion: rbac.authorization.k8s.io/v1
|
||
kind: ClusterRoleBinding
|
||
metadata:
|
||
labels:
|
||
addonmanager.kubernetes.io/mode: Reconcile
|
||
kubernetes.io/cluster-service: "true"
|
||
name: kube-state-metrics
|
||
roleRef:
|
||
apiGroup: rbac.authorization.k8s.io
|
||
kind: ClusterRole
|
||
name: kube-state-metrics
|
||
subjects:
|
||
- kind: ServiceAccount
|
||
name: kube-state-metrics
|
||
namespace: kube-system
|
||
|
||
kube-state-metrics]# vi dp.yaml
|
||
apiVersion: extensions/v1beta1
|
||
kind: Deployment
|
||
metadata:
|
||
annotations:
|
||
deployment.kubernetes.io/revision: "2"
|
||
labels:
|
||
grafanak8sapp: "true"
|
||
app: kube-state-metrics
|
||
name: kube-state-metrics
|
||
namespace: kube-system
|
||
spec:
|
||
selector:
|
||
matchLabels:
|
||
grafanak8sapp: "true"
|
||
app: kube-state-metrics
|
||
strategy:
|
||
rollingUpdate:
|
||
maxSurge: 25%
|
||
maxUnavailable: 25%
|
||
type: RollingUpdate
|
||
template:
|
||
metadata:
|
||
labels:
|
||
grafanak8sapp: "true"
|
||
app: kube-state-metrics
|
||
spec:
|
||
containers:
|
||
- name: kube-state-metrics
|
||
image: harbor.od.com/public/kube-state-metrics:v1.5.0
|
||
imagePullPolicy: IfNotPresent
|
||
ports:
|
||
- containerPort: 8080
|
||
name: http-metrics
|
||
protocol: TCP
|
||
readinessProbe:
|
||
failureThreshold: 3
|
||
httpGet:
|
||
path: /healthz
|
||
port: 8080
|
||
scheme: HTTP
|
||
initialDelaySeconds: 5
|
||
periodSeconds: 10
|
||
successThreshold: 1
|
||
timeoutSeconds: 5
|
||
serviceAccountName: kube-state-metrics
|
||
~~~~
|
||
|
||

|
||
|
||
~~~~
|
||
# 应用清单,22机器:
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/kube-state-metrics/rbac.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/kube-state-metrics/dp.yaml
|
||
# 查询kube-metrics是否正常启动,curl哪个是在dashboard里看到的
|
||
~]# curl 172.7.21.8:8080/healthz
|
||
# out:ok
|
||
# 该命令是查看取出来的信息
|
||
~]# curl 172.7.21.8:8080/metric
|
||
~~~~
|
||
|
||

|
||
|
||

|
||
|
||
完成
|
||
|
||
|
||
|
||
### 交付node-exporter
|
||
|
||
> **WHAT:** 用来监控运算节点上的宿主机的资源信息,需要部署到所有运算节点
|
||
|
||
[node-exporter官方dockerhub地址](https://hub.docker.com/r/prom/node-exporter)
|
||
|
||
~~~
|
||
# 200机器,下载镜像并准备资源配置清单:
|
||
~]# docker pull prom/node-exporter:v0.15.0
|
||
~]# docker images|grep node-exporter
|
||
~]# docker tag 12d51ffa2b22 harbor.od.com/public/node-exporter:v0.15.0
|
||
~]# docker push harbor.od.com/public/node-exporter:v0.15.0
|
||
~]# mkdir /data/k8s-yaml/node-exporter/
|
||
~]# cd /data/k8s-yaml/node-exporter/
|
||
node-exporter]# vi ds.yaml
|
||
kind: DaemonSet
|
||
apiVersion: extensions/v1beta1
|
||
metadata:
|
||
name: node-exporter
|
||
namespace: kube-system
|
||
labels:
|
||
daemon: "node-exporter"
|
||
grafanak8sapp: "true"
|
||
spec:
|
||
selector:
|
||
matchLabels:
|
||
daemon: "node-exporter"
|
||
grafanak8sapp: "true"
|
||
template:
|
||
metadata:
|
||
name: node-exporter
|
||
labels:
|
||
daemon: "node-exporter"
|
||
grafanak8sapp: "true"
|
||
spec:
|
||
volumes:
|
||
- name: proc
|
||
hostPath:
|
||
path: /proc
|
||
type: ""
|
||
- name: sys
|
||
hostPath:
|
||
path: /sys
|
||
type: ""
|
||
containers:
|
||
- name: node-exporter
|
||
image: harbor.od.com/public/node-exporter:v0.15.0
|
||
imagePullPolicy: IfNotPresent
|
||
args:
|
||
- --path.procfs=/host_proc
|
||
- --path.sysfs=/host_sys
|
||
ports:
|
||
- name: node-exporter
|
||
hostPort: 9100
|
||
containerPort: 9100
|
||
protocol: TCP
|
||
volumeMounts:
|
||
- name: sys
|
||
readOnly: true
|
||
mountPath: /host_sys
|
||
- name: proc
|
||
readOnly: true
|
||
mountPath: /host_proc
|
||
hostNetwork: true
|
||
~~~
|
||
|
||
|
||
|
||
~~~
|
||
# 22机器,应用:
|
||
# 先看一下宿主机有没有9100端口,发现什么都没有
|
||
~]# netstat -luntp|grep 9100
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/node-exporter/ds.yaml
|
||
# 创建完再看端口,可能启动的慢些,我是刷了3次才有
|
||
~]# netstat -luntp|grep 9100
|
||
~]# curl localhost:9100
|
||
# 该命令是查看取出来的信息
|
||
~]# curl localhost:9100/metrics
|
||
~~~
|
||
|
||

|
||
|
||

|
||
|
||
完成
|
||
|
||
|
||
|
||
### 交付cadvisor
|
||
|
||
> **WHAT:** 用来监控容器内部使用资源的信息
|
||
|
||
[cadvisor官方dockerhub镜像](https://hub.docker.com/r/google/cadvisor/tags)
|
||
|
||
~~~
|
||
# 200机器,下载镜像:
|
||
~]# docker pull google/cadvisor:v0.28.3
|
||
~]# docker images|grep cadvisor
|
||
~]# docker tag 75f88e3ec33 harbor.od.com/public/cadvisor:v0.28.3
|
||
~]# docker push harbor.od.com/public/cadvisor:v0.28.3
|
||
~]# mkdir /data/k8s-yaml/cadvisor/
|
||
~]# cd /data/k8s-yaml/cadvisor/
|
||
cadvisor]# vi ds.yaml
|
||
apiVersion: apps/v1
|
||
kind: DaemonSet
|
||
metadata:
|
||
name: cadvisor
|
||
namespace: kube-system
|
||
labels:
|
||
app: cadvisor
|
||
spec:
|
||
selector:
|
||
matchLabels:
|
||
name: cadvisor
|
||
template:
|
||
metadata:
|
||
labels:
|
||
name: cadvisor
|
||
spec:
|
||
hostNetwork: true
|
||
tolerations:
|
||
- key: node-role.kubernetes.io/master
|
||
effect: NoExecute
|
||
containers:
|
||
- name: cadvisor
|
||
image: harbor.od.com/public/cadvisor:v0.28.3
|
||
imagePullPolicy: IfNotPresent
|
||
volumeMounts:
|
||
- name: rootfs
|
||
mountPath: /rootfs
|
||
readOnly: true
|
||
- name: var-run
|
||
mountPath: /var/run
|
||
- name: sys
|
||
mountPath: /sys
|
||
readOnly: true
|
||
- name: docker
|
||
mountPath: /var/lib/docker
|
||
readOnly: true
|
||
ports:
|
||
- name: http
|
||
containerPort: 4194
|
||
protocol: TCP
|
||
readinessProbe:
|
||
tcpSocket:
|
||
port: 4194
|
||
initialDelaySeconds: 5
|
||
periodSeconds: 10
|
||
args:
|
||
- --housekeeping_interval=10s
|
||
- --port=4194
|
||
terminationGracePeriodSeconds: 30
|
||
volumes:
|
||
- name: rootfs
|
||
hostPath:
|
||
path: /
|
||
- name: var-run
|
||
hostPath:
|
||
path: /var/run
|
||
- name: sys
|
||
hostPath:
|
||
path: /sys
|
||
- name: docker
|
||
hostPath:
|
||
path: /data/docker
|
||
~~~
|
||
|
||

|
||
|
||
> 此时我们看到大多数节点都运行在21机器上,我们人为的让pod调度到22机器(当然即使你的大多数节点都运行在22机器上也没关系)
|
||
|
||
[^tolerations]: 可人为影响调度策略的方法。为什么需要它:kube-schedule是主控节点的策略,有预选节点和优选节点的策略,但往往生活中调度策略可能不是我们想要的。
|
||
[^tolerations-key]: 是否调度的是某节点,污点可以有多个
|
||
[^tolerations-effect-NoExecute]: 容忍NoExecute,其它的不容忍(如:NoSchedule等),即如过节点上的污点不是NoExecute,就不调度到该节点上,如果是,就可以调度。反之,如果是NoSchedule,那么节点上的污点如果是NoSchedule则可以容器,如果不是,则不可以。
|
||
|
||
> 可人为影响K8S调度策略的三种方法:
|
||
>
|
||
> - 污点、容忍方法:
|
||
> - 污点:运算节点node上的污点(先在运算节点上打标签等 kubectl taint nodes node1 key1=value1:NoSchedule),污点可以有多个
|
||
> - 容忍度:pod是否能够容忍污点
|
||
> - 参考[kubernetes官网](https:kubernetes.io/zh/docs/concepts/configuration/taint-and-toleration/)
|
||
> - nodeName:让Pod运行再指定的node上
|
||
> - nodeSelector:通过标签选择器,让Pod运行再指定的一类node上
|
||
|
||
~~~
|
||
# 给21机器打个污点,22机器:
|
||
~]# kubectl taint node hdss7-21.host.com node-role.kubernetes.io/master=master:NoSchedule
|
||
~~~
|
||
|
||

|
||
|
||

|
||
|
||
~~~
|
||
# 21/22两个机器,修改软连接:
|
||
~]# mount -o remount,rw /sys/fs/cgroup/
|
||
~]# ln -s /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/cpuacct,cpu
|
||
~]# ls -l /sys/fs/cgroup/
|
||
~~~
|
||
|
||
> **mount -o remount, rw /sys/fs/cgroup:**重新以可读可写的方式挂载为已经挂载/sys/fs/cgroup
|
||
>
|
||
> **ln -s:**创建对应的软链接
|
||
>
|
||
> **ls -l:**显示不隐藏的文件与文件夹的详细信息
|
||
|
||

|
||
|
||
~~~
|
||
# 22机器,应用资源清单:
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/cadvisor/ds.yaml
|
||
~]# kubectl get pods -n kube-system -o wide
|
||
~~~
|
||
|
||
只有22机器上有,跟我们预期一样
|
||
|
||

|
||
|
||
~~~
|
||
# 21机器,我们删掉污点:
|
||
~]# kubectl taint node hdss7-21.host.com node-role.kubernetes.io/master-
|
||
# out: node/hdss7-21.host.com untainted
|
||
~~~
|
||
|
||
看dashboard,污点已经没了
|
||
|
||

|
||
|
||
在去Pods看,污点没了,pod就自动起来了
|
||
|
||

|
||
|
||
完成
|
||
|
||
再修改下
|
||
|
||

|
||
|
||
|
||
|
||
|
||
|
||
### 交付blackbox-exporter
|
||
|
||
> **WHAT:**监控业务容器存活性
|
||
|
||
~~~
|
||
# 200机器,下载镜像
|
||
~]# docker pull prom/blackbox-exporter:v0.15.1
|
||
~]# docker images|grep blackbox-exporter
|
||
~]# docker tag 81b70b6158be harbor.od.com/public/blackbox-exporter:v0.15.1
|
||
~]# docker push harbor.od.com/public/blackbox-exporter:v0.15.1
|
||
~]# mkdir /data/k8s-yaml/blackbox-exporter
|
||
~]# cd /data/k8s-yaml/blackbox-exporter
|
||
blackbox-exporter]# vi cm.yaml
|
||
apiVersion: v1
|
||
kind: ConfigMap
|
||
metadata:
|
||
labels:
|
||
app: blackbox-exporter
|
||
name: blackbox-exporter
|
||
namespace: kube-system
|
||
data:
|
||
blackbox.yml: |-
|
||
modules:
|
||
http_2xx:
|
||
prober: http
|
||
timeout: 2s
|
||
http:
|
||
valid_http_versions: ["HTTP/1.1", "HTTP/2"]
|
||
valid_status_codes: [200,301,302]
|
||
method: GET
|
||
preferred_ip_protocol: "ip4"
|
||
tcp_connect:
|
||
prober: tcp
|
||
timeout: 2s
|
||
|
||
blackbox-exporter]# vi dp.yaml
|
||
kind: Deployment
|
||
apiVersion: extensions/v1beta1
|
||
metadata:
|
||
name: blackbox-exporter
|
||
namespace: kube-system
|
||
labels:
|
||
app: blackbox-exporter
|
||
annotations:
|
||
deployment.kubernetes.io/revision: 1
|
||
spec:
|
||
replicas: 1
|
||
selector:
|
||
matchLabels:
|
||
app: blackbox-exporter
|
||
template:
|
||
metadata:
|
||
labels:
|
||
app: blackbox-exporter
|
||
spec:
|
||
volumes:
|
||
- name: config
|
||
configMap:
|
||
name: blackbox-exporter
|
||
defaultMode: 420
|
||
containers:
|
||
- name: blackbox-exporter
|
||
image: harbor.od.com/public/blackbox-exporter:v0.15.1
|
||
imagePullPolicy: IfNotPresent
|
||
args:
|
||
- --config.file=/etc/blackbox_exporter/blackbox.yml
|
||
- --log.level=info
|
||
- --web.listen-address=:9115
|
||
ports:
|
||
- name: blackbox-port
|
||
containerPort: 9115
|
||
protocol: TCP
|
||
resources:
|
||
limits:
|
||
cpu: 200m
|
||
memory: 256Mi
|
||
requests:
|
||
cpu: 100m
|
||
memory: 50Mi
|
||
volumeMounts:
|
||
- name: config
|
||
mountPath: /etc/blackbox_exporter
|
||
readinessProbe:
|
||
tcpSocket:
|
||
port: 9115
|
||
initialDelaySeconds: 5
|
||
timeoutSeconds: 5
|
||
periodSeconds: 10
|
||
successThreshold: 1
|
||
failureThreshold: 3
|
||
|
||
blackbox-exporter]# vi svc.yaml
|
||
kind: Service
|
||
apiVersion: v1
|
||
metadata:
|
||
name: blackbox-exporter
|
||
namespace: kube-system
|
||
spec:
|
||
selector:
|
||
app: blackbox-exporter
|
||
ports:
|
||
- name: blackbox-port
|
||
protocol: TCP
|
||
port: 9115
|
||
|
||
blackbox-exporter]# vi ingress.yaml
|
||
apiVersion: extensions/v1beta1
|
||
kind: Ingress
|
||
metadata:
|
||
name: blackbox-exporter
|
||
namespace: kube-system
|
||
spec:
|
||
rules:
|
||
- host: blackbox.od.com
|
||
http:
|
||
paths:
|
||
- path: /
|
||
backend:
|
||
serviceName: blackbox-exporter
|
||
servicePort: blackbox-port
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 11机器,解析域名:
|
||
~]# vi /var/named/od.com.zone
|
||
serial 前滚一位
|
||
blackbox A 10.4.7.10
|
||
|
||
~]# systemctl restart named
|
||
# 22机器
|
||
~]# dig -t A blackbox.od.com @192.168.0.2 +short
|
||
# out: 10.4.7.10
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 22机器,应用:
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/blackbox-exporter/cm.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/blackbox-exporter/dp.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/blackbox-exporter/svc.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/blackbox-exporter/ingress.yaml
|
||
~~~
|
||
|
||

|
||
|
||
[blackbox.od.com](blackbox.od.com)
|
||
|
||

|
||
|
||
完成
|
||
|
||
|
||
|
||
### 安装部署Prometheus-server
|
||
|
||
> **WHAT:**服务核心组件,通过pull metrics从 Exporter 拉取和存储监控数据,并提供一套灵活的查询语言(PromQL)
|
||
|
||
[prometheus-server官网docker地址](https://hub.docker.com/r/prom/prometheus)
|
||
|
||
~~~~
|
||
# 200机器,准备镜像、资源清单:
|
||
~]# docker pull prom/prometheus:v2.14.0
|
||
~]# docker images|grep prometheus
|
||
~]# docker tag 7317640d555e harbor.od.com/infra/prometheus:v2.14.0
|
||
~]# docker push harbor.od.com/infra/prometheus:v2.14.0
|
||
~]# mkdir /data/k8s-yaml/prometheus
|
||
~]# cd /data/k8s-yaml/prometheus
|
||
prometheus]# vi rbac.yaml
|
||
apiVersion: v1
|
||
kind: ServiceAccount
|
||
metadata:
|
||
labels:
|
||
addonmanager.kubernetes.io/mode: Reconcile
|
||
kubernetes.io/cluster-service: "true"
|
||
name: prometheus
|
||
namespace: infra
|
||
---
|
||
apiVersion: rbac.authorization.k8s.io/v1
|
||
kind: ClusterRole
|
||
metadata:
|
||
labels:
|
||
addonmanager.kubernetes.io/mode: Reconcile
|
||
kubernetes.io/cluster-service: "true"
|
||
name: prometheus
|
||
rules:
|
||
- apiGroups:
|
||
- ""
|
||
resources:
|
||
- nodes
|
||
- nodes/metrics
|
||
- services
|
||
- endpoints
|
||
- pods
|
||
verbs:
|
||
- get
|
||
- list
|
||
- watch
|
||
- apiGroups:
|
||
- ""
|
||
resources:
|
||
- configmaps
|
||
verbs:
|
||
- get
|
||
- nonResourceURLs:
|
||
- /metrics
|
||
verbs:
|
||
- get
|
||
---
|
||
apiVersion: rbac.authorization.k8s.io/v1
|
||
kind: ClusterRoleBinding
|
||
metadata:
|
||
labels:
|
||
addonmanager.kubernetes.io/mode: Reconcile
|
||
kubernetes.io/cluster-service: "true"
|
||
name: prometheus
|
||
roleRef:
|
||
apiGroup: rbac.authorization.k8s.io
|
||
kind: ClusterRole
|
||
name: prometheus
|
||
subjects:
|
||
- kind: ServiceAccount
|
||
name: prometheus
|
||
namespace: infra
|
||
|
||
prometheus]# vi dp.yaml
|
||
apiVersion: extensions/v1beta1
|
||
kind: Deployment
|
||
metadata:
|
||
annotations:
|
||
deployment.kubernetes.io/revision: "5"
|
||
labels:
|
||
name: prometheus
|
||
name: prometheus
|
||
namespace: infra
|
||
spec:
|
||
progressDeadlineSeconds: 600
|
||
replicas: 1
|
||
revisionHistoryLimit: 7
|
||
selector:
|
||
matchLabels:
|
||
app: prometheus
|
||
strategy:
|
||
rollingUpdate:
|
||
maxSurge: 1
|
||
maxUnavailable: 1
|
||
type: RollingUpdate
|
||
template:
|
||
metadata:
|
||
labels:
|
||
app: prometheus
|
||
spec:
|
||
containers:
|
||
- name: prometheus
|
||
image: harbor.od.com/infra/prometheus:v2.14.0
|
||
imagePullPolicy: IfNotPresent
|
||
command:
|
||
- /bin/prometheus
|
||
args:
|
||
- --config.file=/data/etc/prometheus.yml
|
||
- --storage.tsdb.path=/data/prom-db
|
||
- --storage.tsdb.min-block-duration=10m
|
||
- --storage.tsdb.retention=72h
|
||
ports:
|
||
- containerPort: 9090
|
||
protocol: TCP
|
||
volumeMounts:
|
||
- mountPath: /data
|
||
name: data
|
||
resources:
|
||
requests:
|
||
cpu: "1000m"
|
||
memory: "1.5Gi"
|
||
limits:
|
||
cpu: "2000m"
|
||
memory: "3Gi"
|
||
imagePullSecrets:
|
||
- name: harbor
|
||
securityContext:
|
||
runAsUser: 0
|
||
serviceAccountName: prometheus
|
||
volumes:
|
||
- name: data
|
||
nfs:
|
||
server: hdss7-200
|
||
path: /data/nfs-volume/prometheus
|
||
|
||
prometheus]# vi svc.yaml
|
||
apiVersion: v1
|
||
kind: Service
|
||
metadata:
|
||
name: prometheus
|
||
namespace: infra
|
||
spec:
|
||
ports:
|
||
- port: 9090
|
||
protocol: TCP
|
||
targetPort: 9090
|
||
selector:
|
||
app: prometheus
|
||
|
||
prometheus]# vi ingress.yaml
|
||
apiVersion: extensions/v1beta1
|
||
kind: Ingress
|
||
metadata:
|
||
annotations:
|
||
kubernetes.io/ingress.class: traefik
|
||
name: prometheus
|
||
namespace: infra
|
||
spec:
|
||
rules:
|
||
- host: prometheus.od.com
|
||
http:
|
||
paths:
|
||
- path: /
|
||
backend:
|
||
serviceName: prometheus
|
||
servicePort: 9090
|
||
|
||
# 准备prometheus的配置文件:
|
||
prometheus]# mkdir /data/nfs-volume/prometheus
|
||
prometheus]# cd /data/nfs-volume/prometheus
|
||
prometheus]# mkdir {etc,prom-db}
|
||
prometheus]# cd etc/
|
||
etc]# cp /opt/certs/ca.pem .
|
||
etc]# cp -a /opt/certs/client.pem .
|
||
etc]# cp -a /opt/certs/client-key.pem .
|
||
etc]# prometheus.yml
|
||
global:
|
||
scrape_interval: 15s
|
||
evaluation_interval: 15s
|
||
scrape_configs:
|
||
- job_name: 'etcd'
|
||
tls_config:
|
||
ca_file: /data/etc/ca.pem
|
||
cert_file: /data/etc/client.pem
|
||
key_file: /data/etc/client-key.pem
|
||
scheme: https
|
||
static_configs:
|
||
- targets:
|
||
- '10.4.7.12:2379'
|
||
- '10.4.7.21:2379'
|
||
- '10.4.7.22:2379'
|
||
- job_name: 'kubernetes-apiservers'
|
||
kubernetes_sd_configs:
|
||
- role: endpoints
|
||
scheme: https
|
||
tls_config:
|
||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||
relabel_configs:
|
||
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
|
||
action: keep
|
||
regex: default;kubernetes;https
|
||
- job_name: 'kubernetes-pods'
|
||
kubernetes_sd_configs:
|
||
- role: pod
|
||
relabel_configs:
|
||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
|
||
action: keep
|
||
regex: true
|
||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
|
||
action: replace
|
||
target_label: __metrics_path__
|
||
regex: (.+)
|
||
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
|
||
action: replace
|
||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||
replacement: $1:$2
|
||
target_label: __address__
|
||
- action: labelmap
|
||
regex: __meta_kubernetes_pod_label_(.+)
|
||
- source_labels: [__meta_kubernetes_namespace]
|
||
action: replace
|
||
target_label: kubernetes_namespace
|
||
- source_labels: [__meta_kubernetes_pod_name]
|
||
action: replace
|
||
target_label: kubernetes_pod_name
|
||
- job_name: 'kubernetes-kubelet'
|
||
kubernetes_sd_configs:
|
||
- role: node
|
||
relabel_configs:
|
||
- action: labelmap
|
||
regex: __meta_kubernetes_node_label_(.+)
|
||
- source_labels: [__meta_kubernetes_node_name]
|
||
regex: (.+)
|
||
target_label: __address__
|
||
replacement: ${1}:10255
|
||
- job_name: 'kubernetes-cadvisor'
|
||
kubernetes_sd_configs:
|
||
- role: node
|
||
relabel_configs:
|
||
- action: labelmap
|
||
regex: __meta_kubernetes_node_label_(.+)
|
||
- source_labels: [__meta_kubernetes_node_name]
|
||
regex: (.+)
|
||
target_label: __address__
|
||
replacement: ${1}:4194
|
||
- job_name: 'kubernetes-kube-state'
|
||
kubernetes_sd_configs:
|
||
- role: pod
|
||
relabel_configs:
|
||
- action: labelmap
|
||
regex: __meta_kubernetes_pod_label_(.+)
|
||
- source_labels: [__meta_kubernetes_namespace]
|
||
action: replace
|
||
target_label: kubernetes_namespace
|
||
- source_labels: [__meta_kubernetes_pod_name]
|
||
action: replace
|
||
target_label: kubernetes_pod_name
|
||
- source_labels: [__meta_kubernetes_pod_label_grafanak8sapp]
|
||
regex: .*true.*
|
||
action: keep
|
||
- source_labels: ['__meta_kubernetes_pod_label_daemon', '__meta_kubernetes_pod_node_name']
|
||
regex: 'node-exporter;(.*)'
|
||
action: replace
|
||
target_label: nodename
|
||
- job_name: 'blackbox_http_pod_probe'
|
||
metrics_path: /probe
|
||
kubernetes_sd_configs:
|
||
- role: pod
|
||
params:
|
||
module: [http_2xx]
|
||
relabel_configs:
|
||
- source_labels: [__meta_kubernetes_pod_annotation_blackbox_scheme]
|
||
action: keep
|
||
regex: http
|
||
- source_labels: [__address__, __meta_kubernetes_pod_annotation_blackbox_port, __meta_kubernetes_pod_annotation_blackbox_path]
|
||
action: replace
|
||
regex: ([^:]+)(?::\d+)?;(\d+);(.+)
|
||
replacement: $1:$2$3
|
||
target_label: __param_target
|
||
- action: replace
|
||
target_label: __address__
|
||
replacement: blackbox-exporter.kube-system:9115
|
||
- source_labels: [__param_target]
|
||
target_label: instance
|
||
- action: labelmap
|
||
regex: __meta_kubernetes_pod_label_(.+)
|
||
- source_labels: [__meta_kubernetes_namespace]
|
||
action: replace
|
||
target_label: kubernetes_namespace
|
||
- source_labels: [__meta_kubernetes_pod_name]
|
||
action: replace
|
||
target_label: kubernetes_pod_name
|
||
- job_name: 'blackbox_tcp_pod_probe'
|
||
metrics_path: /probe
|
||
kubernetes_sd_configs:
|
||
- role: pod
|
||
params:
|
||
module: [tcp_connect]
|
||
relabel_configs:
|
||
- source_labels: [__meta_kubernetes_pod_annotation_blackbox_scheme]
|
||
action: keep
|
||
regex: tcp
|
||
- source_labels: [__address__, __meta_kubernetes_pod_annotation_blackbox_port]
|
||
action: replace
|
||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||
replacement: $1:$2
|
||
target_label: __param_target
|
||
- action: replace
|
||
target_label: __address__
|
||
replacement: blackbox-exporter.kube-system:9115
|
||
- source_labels: [__param_target]
|
||
target_label: instance
|
||
- action: labelmap
|
||
regex: __meta_kubernetes_pod_label_(.+)
|
||
- source_labels: [__meta_kubernetes_namespace]
|
||
action: replace
|
||
target_label: kubernetes_namespace
|
||
- source_labels: [__meta_kubernetes_pod_name]
|
||
action: replace
|
||
target_label: kubernetes_pod_name
|
||
- job_name: 'traefik'
|
||
kubernetes_sd_configs:
|
||
- role: pod
|
||
relabel_configs:
|
||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
|
||
action: keep
|
||
regex: traefik
|
||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
|
||
action: replace
|
||
target_label: __metrics_path__
|
||
regex: (.+)
|
||
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
|
||
action: replace
|
||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||
replacement: $1:$2
|
||
target_label: __address__
|
||
- action: labelmap
|
||
regex: __meta_kubernetes_pod_label_(.+)
|
||
- source_labels: [__meta_kubernetes_namespace]
|
||
action: replace
|
||
target_label: kubernetes_namespace
|
||
- source_labels: [__meta_kubernetes_pod_name]
|
||
action: replace
|
||
target_label: kubernetes_pod_name
|
||
~~~~
|
||
|
||
> **cp -a:**在复制目录时使用,它保留链接、文件属性,并复制目录下的所有内容
|
||
|
||

|
||
|
||
~~~
|
||
# 11机器, 解析域名,有ingress就有页面就需要解析:
|
||
~]# vi /var/named/od.com.zone
|
||
serial 前滚一位
|
||
prometheus A 10.4.7.10
|
||
|
||
~]# systemctl restart named
|
||
~]# dig -t A prometheus.od.com @10.4.7.11 +short
|
||
# out:10.4.7.10
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 22机器,应用配置清单:
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/prometheus/rbac.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/prometheus/dp.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/prometheus/svc.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/prometheus/ingress.yaml
|
||
~~~
|
||
|
||

|
||
|
||

|
||
|
||
[prometheus.od.com](prometheus.od.com)
|
||
|
||
> 这就是Prometheus自带的UI页面,现在你就知道为什么我们需要Grafana来替代了,如果你还不清楚,等下看Grafana的页面你就知道了
|
||
|
||

|
||
|
||

|
||
|
||
完成
|
||
|
||
|
||
|
||
### 配置Prometheus监控业务容器
|
||
|
||
##### 先配置traefik
|
||
|
||

|
||
|
||
~~~
|
||
# Edit a Daemon Set,添加以下内容,记得给上面加逗号:
|
||
"annotations": {
|
||
"prometheus_io_scheme": "traefik",
|
||
"prometheus_io_path": "/metrics",
|
||
"prometheus_io_port": "8080"
|
||
}
|
||
# 直接加进去update,会自动对齐
|
||
~~~
|
||
|
||

|
||
|
||
删掉两个对应的pod让它重启
|
||
|
||

|
||
|
||
~~~
|
||
# 22机器,查看下,如果起不来就用命令行的方式强制删除:
|
||
~]# kubectl get pods -n kube-system
|
||
~]# kubectl delete pods traefik-ingress-g26kw -n kube-system --force --grace-period=0
|
||
~~~
|
||
|
||

|
||
|
||
启动成功后,去Prometheus查看
|
||
|
||
刷新后,可以看到是traefik2/2,已经有了
|
||
|
||

|
||
|
||
完成
|
||
|
||
|
||
|
||
##### blackbox
|
||
|
||
我们起一个dubbo-service,之前我们最后做的是Apollo的版本,现在我们的Apollo已经关了(因为消耗资源),现在需要起更早之前不是Apollo的版本。
|
||
|
||
我们去harbor里面找
|
||
|
||

|
||
|
||
> 我的Apollo的版本可能比你的多一个,不用在意,那是做实验弄的
|
||
|
||
修改版本信息
|
||
|
||

|
||
|
||

|
||
|
||
在把scale改成1
|
||
|
||

|
||
|
||
查看POD的LOGS日志
|
||
|
||

|
||
|
||
翻页查看,已经启动
|
||
|
||

|
||
|
||
如何监控存活性,只需要修改配置
|
||
|
||

|
||
|
||
~~~
|
||
# Edit a Deployment(TCP),添加以下内容
|
||
"annotations": {
|
||
"blackbox_port": "20880",
|
||
"blackbox_scheme": "tcp"
|
||
}
|
||
# 直接加进去update,会自动对齐
|
||
~~~
|
||
|
||

|
||
|
||
UPDATE后,已经running起来了
|
||
|
||

|
||
|
||
[prometheus.od.com](prometheus.od.com)刷新,自动发现业务
|
||
|
||

|
||
|
||
[blackbox.od.com](blackbox.od.com) 刷新
|
||
|
||

|
||
|
||
同样的,我们把dubbo-consumer也弄进来
|
||
|
||
先去harbor找一个不是Apollo的版本(为什么要用不是Apollo的版本前面已经说了)
|
||
|
||

|
||
|
||
修改版本信息,并添加annotations
|
||
|
||
~~~
|
||
# Edit a Deployment(http),添加以下内容,记得前面的逗号
|
||
"annotations":{
|
||
"blackbox_path": "/hello?name=health",
|
||
"blackbox_port": "8080",
|
||
"blackbox_scheme": "http"
|
||
}
|
||
# 直接加进去update,会自动对齐
|
||
~~~
|
||
|
||

|
||
|
||

|
||
|
||
UPDATE后,把scale改成1
|
||
|
||

|
||
|
||
确保起来了
|
||
|
||

|
||
|
||

|
||
|
||
[prometheus.od.com](prometheus.od.com)刷新,自动发现业务
|
||
|
||

|
||
|
||
[blackbox.od.com](blackbox.od.com) 刷新
|
||
|
||

|
||
|
||
|
||
|
||
### 安装部署配置Grafana
|
||
|
||
> **WHAT:**美观、强大的可视化监控指标展示工具
|
||
>
|
||
> **WHY:**用来代替prometheus原生UI界面
|
||
|
||
~~~
|
||
# 200机器,准备镜像、资源配置清单:
|
||
~]# docker pull grafana/grafana:5.4.2
|
||
~]# docker images|grep grafana
|
||
~]# docker tag 6f18ddf9e552 harbor.od.com/infra/grafana:v5.4.2
|
||
~]# docker push harbor.od.com/infra/grafana:v5.4.2
|
||
~]# mkdir /data/k8s-yaml/grafana/ /data/nfs-volume/grafana
|
||
~]# cd /data/k8s-yaml/grafana/
|
||
grafana]# vi rbac.yaml
|
||
apiVersion: rbac.authorization.k8s.io/v1
|
||
kind: ClusterRole
|
||
metadata:
|
||
labels:
|
||
addonmanager.kubernetes.io/mode: Reconcile
|
||
kubernetes.io/cluster-service: "true"
|
||
name: grafana
|
||
rules:
|
||
- apiGroups:
|
||
- "*"
|
||
resources:
|
||
- namespaces
|
||
- deployments
|
||
- pods
|
||
verbs:
|
||
- get
|
||
- list
|
||
- watch
|
||
---
|
||
apiVersion: rbac.authorization.k8s.io/v1
|
||
kind: ClusterRoleBinding
|
||
metadata:
|
||
labels:
|
||
addonmanager.kubernetes.io/mode: Reconcile
|
||
kubernetes.io/cluster-service: "true"
|
||
name: grafana
|
||
roleRef:
|
||
apiGroup: rbac.authorization.k8s.io
|
||
kind: ClusterRole
|
||
name: grafana
|
||
subjects:
|
||
- kind: User
|
||
name: k8s-node
|
||
|
||
grafana]# vi dp.yaml
|
||
apiVersion: extensions/v1beta1
|
||
kind: Deployment
|
||
metadata:
|
||
labels:
|
||
app: grafana
|
||
name: grafana
|
||
name: grafana
|
||
namespace: infra
|
||
spec:
|
||
progressDeadlineSeconds: 600
|
||
replicas: 1
|
||
revisionHistoryLimit: 7
|
||
selector:
|
||
matchLabels:
|
||
name: grafana
|
||
strategy:
|
||
rollingUpdate:
|
||
maxSurge: 1
|
||
maxUnavailable: 1
|
||
type: RollingUpdate
|
||
template:
|
||
metadata:
|
||
labels:
|
||
app: grafana
|
||
name: grafana
|
||
spec:
|
||
containers:
|
||
- name: grafana
|
||
image: harbor.od.com/infra/grafana:v5.4.2
|
||
imagePullPolicy: IfNotPresent
|
||
ports:
|
||
- containerPort: 3000
|
||
protocol: TCP
|
||
volumeMounts:
|
||
- mountPath: /var/lib/grafana
|
||
name: data
|
||
imagePullSecrets:
|
||
- name: harbor
|
||
securityContext:
|
||
runAsUser: 0
|
||
volumes:
|
||
- nfs:
|
||
server: hdss7-200
|
||
path: /data/nfs-volume/grafana
|
||
name: data
|
||
|
||
grafana]# vi svc.yaml
|
||
apiVersion: v1
|
||
kind: Service
|
||
metadata:
|
||
name: grafana
|
||
namespace: infra
|
||
spec:
|
||
ports:
|
||
- port: 3000
|
||
protocol: TCP
|
||
targetPort: 3000
|
||
selector:
|
||
app: grafana
|
||
|
||
grafana]# vi ingress.yaml
|
||
apiVersion: extensions/v1beta1
|
||
kind: Ingress
|
||
metadata:
|
||
name: grafana
|
||
namespace: infra
|
||
spec:
|
||
rules:
|
||
- host: grafana.od.com
|
||
http:
|
||
paths:
|
||
- path: /
|
||
backend:
|
||
serviceName: grafana
|
||
servicePort: 3000
|
||
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 11机器,解析域名:
|
||
~]# vi /var/named/od.com.zone
|
||
serial 前滚一位
|
||
|
||
grafana A 10.4.7.10
|
||
~]# systemctl restart named
|
||
~]# ping grafana.od.com
|
||
~~~
|
||
|
||

|
||
|
||
~~~~
|
||
# 22机器,应用配置清单:
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/grafana/rbac.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/grafana/dp.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/grafana/svc.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/grafana/ingress.yaml
|
||
~~~~
|
||
|
||

|
||
|
||
[grafana.od.com](grafana.od.com)
|
||
|
||
默认账户和密码都是admin
|
||
|
||
修改密码:admin123
|
||
|
||

|
||
|
||
修改配置,修改如下图
|
||
|
||

|
||
|
||
##### 装插件
|
||
|
||
进入容器
|
||
|
||

|
||
|
||
~~~
|
||
# 第一个:kubenetes App
|
||
grafana# grafana-cli plugins install grafana-kubernetes-app
|
||
# 第二个:Clock Pannel
|
||
grafana# grafana-cli plugins install grafana-clock-panel
|
||
# 第三个:Pie Chart
|
||
grafana# grafana-cli plugins install grafana-piechart-panel
|
||
# 第四个:D3Gauge
|
||
grafana# grafana-cli plugins install briangann-gauge-panel
|
||
# 第五个:Discrete
|
||
grafana# grafana-cli plugins install natel-discrete-panel
|
||
~~~
|
||
|
||

|
||
|
||
装完后,可以在200机器查看
|
||
|
||
~~~
|
||
# 200机器:
|
||
cd /data/nfs-volume/grafana/plugins/
|
||
plugins]# ll
|
||
~~~
|
||
|
||

|
||
|
||
删掉让它重启
|
||
|
||

|
||
|
||
重启完成后
|
||
|
||
|
||
|
||
查看[grafana.od.com](grafana.od.com),刚刚安装的5个插件都在里面了(记得检查是否在里面了)
|
||
|
||

|
||
|
||
##### 添加数据源:Add data source
|
||
|
||

|
||
|
||

|
||
|
||
~~~
|
||
# 填入参数:
|
||
URL:http://prometheus.od.com
|
||
TLS Client Auth✔ With CA Cert✔
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 填入参数对应的pem参数:
|
||
# 200机器拿ca等:
|
||
~]# cat /opt/certs/ca.pem
|
||
~]# cat /opt/certs/client.pem
|
||
~]# cat /opt/certs/client-key.pem
|
||
~~~
|
||
|
||

|
||
|
||

|
||
|
||
保存
|
||
|
||
然后我们去配置plugins里面的kubernetes
|
||
|
||

|
||
|
||

|
||
|
||
右侧就多了个按钮,点击进去
|
||
|
||

|
||
|
||
~~~
|
||
# 按参数填入:
|
||
Name:myk8s
|
||
URL:https://10.4.7.10:7443
|
||
Access:Server
|
||
TLS Client Auth✔ With CA Cert✔
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 填入参数:
|
||
# 200机器拿ca等:
|
||
~]# cat /opt/certs/ca.pem
|
||
~]# cat /opt/certs/client.pem
|
||
~]# cat /opt/certs/client-key.pem
|
||
~~~
|
||
|
||

|
||
|
||
save后再点击右侧框的图标,并点击Name
|
||
|
||

|
||
|
||
可能抓取数据的时间会稍微慢些(两分钟左右)
|
||
|
||

|
||
|
||

|
||
|
||
点击右上角的K8s Cluster,选择你要看的东西
|
||
|
||

|
||
|
||
由于K8s Container里面数据不全,如下图
|
||
|
||

|
||
|
||
我们改下,把Cluster删了
|
||
|
||

|
||
|
||

|
||
|
||
container也删了
|
||
|
||

|
||
|
||
deployment也删了
|
||
|
||

|
||
|
||
node也删了
|
||
|
||

|
||
|
||

|
||
|
||

|
||
|
||
把我给你准备的dashboard的json文件import进来
|
||
|
||

|
||
|
||

|
||
|
||

|
||
|
||
用同样的方法把node、deployment、cluster、container这4个分别import进来
|
||
|
||

|
||
|
||
可以都看一下,已经正常了
|
||
|
||
然后把etcd、generic、traefik也import进来
|
||
|
||

|
||
|
||

|
||
|
||
还有另外一种import的方法(使用官网的):
|
||
|
||
[grafana官网](https://grafana.com/grafana/dashboards)
|
||
|
||
找一个别人写好的点进去
|
||
|
||

|
||
|
||
这个编号可以直接用
|
||
|
||

|
||
|
||
如下图,我们装blackbox的编号是9965
|
||
|
||

|
||
|
||

|
||
|
||
把名字和Prometheus修改一下
|
||
|
||

|
||
|
||
或者,你也可以用我上传的(我用的是7587)
|
||
|
||

|
||
|
||
你可以两个都用,自己做对比,都留着也可以,就是占一些资源
|
||
|
||
JMX
|
||
|
||

|
||
|
||
这个里面还什么都没有
|
||
|
||

|
||
|
||
|
||
|
||
#### 把Dubbo微服务数据弄到Grafana
|
||
|
||
dubbo-service
|
||
|
||

|
||
|
||
~~~
|
||
# Edit a Daemon Set,添加以下内容,注意给上一行加逗号
|
||
"prometheus_io_scrape": "true",
|
||
"prometheus_io_port": "12346",
|
||
"prometheus_io_path": "/"
|
||
# 直接加进去update,会自动对齐,
|
||
~~~
|
||
|
||

|
||
|
||
dubbo-consumer
|
||
|
||

|
||
|
||
~~~
|
||
# Edit a Daemon Set,添加以下内容,注意给上一行加逗号
|
||
"prometheus_io_scrape": "true",
|
||
"prometheus_io_port": "12346",
|
||
"prometheus_io_path": "/"
|
||
# 直接加进去update,会自动对齐,
|
||
~~~
|
||
|
||

|
||
|
||
刷新JMX(可能有点慢,我等了1分钟才出来service,我机器不行了)
|
||
|
||

|
||
|
||
完成
|
||
|
||
> 此时你可以感受到,Grafana明显比K8S自带的UI界面更加人性化
|
||
|
||
|
||
|
||
### 安装部署alertmanager
|
||
|
||
> **WHAT:** 从 Prometheus server 端接收到 alerts 后,会进行去除重复数据,分组,并路由到对方的接受方式,发出报警。常见的接收方式有:电子邮件,pagerduty 等。
|
||
>
|
||
> **WHY:**使得系统的警告随时让我们知道
|
||
|
||
~~~
|
||
# 200机器,准备镜像、资源清单:
|
||
~]# mkdir /data/k8s-yaml/alertmanager
|
||
~]# cd /data/k8s-yaml/alertmanager
|
||
alertmanager]# docker pull docker.io/prom/alertmanager:v0.14.0
|
||
# 注意,这里你如果不用14版本可能会报错
|
||
alertmanager]# docker images|grep alert
|
||
alertmanager]# docker tag 23744b2d645c harbor.od.com/infra/alertmanager:v0.14.0
|
||
alertmanager]# docker push harbor.od.com/infra/alertmanager:v0.14.0
|
||
# 注意下面记得修改成你自己的邮箱等信息,还有中文注释可以删掉
|
||
alertmanager]# vi cm.yaml
|
||
apiVersion: v1
|
||
kind: ConfigMap
|
||
metadata:
|
||
name: alertmanager-config
|
||
namespace: infra
|
||
data:
|
||
config.yml: |-
|
||
global:
|
||
# 在没有报警的情况下声明为已解决的时间
|
||
resolve_timeout: 5m
|
||
# 配置邮件发送信息
|
||
smtp_smarthost: 'smtp.163.com:25'
|
||
smtp_from: 'ben909336740@163.com'
|
||
smtp_auth_username: 'ben909336740@163.com'
|
||
smtp_auth_password: 'xxxxxx'
|
||
smtp_require_tls: false
|
||
# 所有报警信息进入后的根路由,用来设置报警的分发策略
|
||
route:
|
||
# 这里的标签列表是接收到报警信息后的重新分组标签,例如,接收到的报警信息里面有许多具有 cluster=A 和 alertname=LatncyHigh 这样的标签的报警信息将会批量被聚合到一个分组里面
|
||
group_by: ['alertname', 'cluster']
|
||
# 当一个新的报警分组被创建后,需要等待至少group_wait时间来初始化通知,这种方式可以确保您能有足够的时间为同一分组来获取多个警报,然后一起触发这个报警信息。
|
||
group_wait: 30s
|
||
|
||
# 当第一个报警发送后,等待'group_interval'时间来发送新的一组报警信息。
|
||
group_interval: 5m
|
||
|
||
# 如果一个报警信息已经发送成功了,等待'repeat_interval'时间来重新发送他们
|
||
repeat_interval: 5m
|
||
|
||
# 默认的receiver:如果一个报警没有被一个route匹配,则发送给默认的接收器
|
||
receiver: default
|
||
|
||
receivers:
|
||
- name: 'default'
|
||
email_configs:
|
||
- to: '909336740@qq.com'
|
||
send_resolved: true
|
||
|
||
alertmanager]# vi dp.yaml
|
||
apiVersion: extensions/v1beta1
|
||
kind: Deployment
|
||
metadata:
|
||
name: alertmanager
|
||
namespace: infra
|
||
spec:
|
||
replicas: 1
|
||
selector:
|
||
matchLabels:
|
||
app: alertmanager
|
||
template:
|
||
metadata:
|
||
labels:
|
||
app: alertmanager
|
||
spec:
|
||
containers:
|
||
- name: alertmanager
|
||
image: harbor.od.com/infra/alertmanager:v0.14.0
|
||
args:
|
||
- "--config.file=/etc/alertmanager/config.yml"
|
||
- "--storage.path=/alertmanager"
|
||
ports:
|
||
- name: alertmanager
|
||
containerPort: 9093
|
||
volumeMounts:
|
||
- name: alertmanager-cm
|
||
mountPath: /etc/alertmanager
|
||
volumes:
|
||
- name: alertmanager-cm
|
||
configMap:
|
||
name: alertmanager-config
|
||
imagePullSecrets:
|
||
- name: harbor
|
||
|
||
alertmanager]# vi svc.yaml
|
||
apiVersion: v1
|
||
kind: Service
|
||
metadata:
|
||
name: alertmanager
|
||
namespace: infra
|
||
spec:
|
||
selector:
|
||
app: alertmanager
|
||
ports:
|
||
- port: 80
|
||
targetPort: 9093
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 22机器,应用清单:
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/alertmanager/cm.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/alertmanager/dp.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/alertmanager/svc.yaml
|
||
~~~
|
||
|
||

|
||
|
||

|
||
|
||
~~~
|
||
# 200机器,配置报警规则:
|
||
~]# vi /data/nfs-volume/prometheus/etc/rules.yml
|
||
groups:
|
||
- name: hostStatsAlert
|
||
rules:
|
||
- alert: hostCpuUsageAlert
|
||
expr: sum(avg without (cpu)(irate(node_cpu{mode!='idle'}[5m]))) by (instance) > 0.85
|
||
for: 5m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }}%)"
|
||
- alert: hostMemUsageAlert
|
||
expr: (node_memory_MemTotal - node_memory_MemAvailable)/node_memory_MemTotal > 0.85
|
||
for: 5m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }}%)"
|
||
- alert: OutOfInodes
|
||
expr: node_filesystem_free{fstype="overlay",mountpoint ="/"} / node_filesystem_size{fstype="overlay",mountpoint ="/"} * 100 < 10
|
||
for: 5m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "Out of inodes (instance {{ $labels.instance }})"
|
||
description: "Disk is almost running out of available inodes (< 10% left) (current value: {{ $value }})"
|
||
- alert: OutOfDiskSpace
|
||
expr: node_filesystem_free{fstype="overlay",mountpoint ="/rootfs"} / node_filesystem_size{fstype="overlay",mountpoint ="/rootfs"} * 100 < 10
|
||
for: 5m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "Out of disk space (instance {{ $labels.instance }})"
|
||
description: "Disk is almost full (< 10% left) (current value: {{ $value }})"
|
||
- alert: UnusualNetworkThroughputIn
|
||
expr: sum by (instance) (irate(node_network_receive_bytes[2m])) / 1024 / 1024 > 100
|
||
for: 5m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "Unusual network throughput in (instance {{ $labels.instance }})"
|
||
description: "Host network interfaces are probably receiving too much data (> 100 MB/s) (current value: {{ $value }})"
|
||
- alert: UnusualNetworkThroughputOut
|
||
expr: sum by (instance) (irate(node_network_transmit_bytes[2m])) / 1024 / 1024 > 100
|
||
for: 5m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "Unusual network throughput out (instance {{ $labels.instance }})"
|
||
description: "Host network interfaces are probably sending too much data (> 100 MB/s) (current value: {{ $value }})"
|
||
- alert: UnusualDiskReadRate
|
||
expr: sum by (instance) (irate(node_disk_bytes_read[2m])) / 1024 / 1024 > 50
|
||
for: 5m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "Unusual disk read rate (instance {{ $labels.instance }})"
|
||
description: "Disk is probably reading too much data (> 50 MB/s) (current value: {{ $value }})"
|
||
- alert: UnusualDiskWriteRate
|
||
expr: sum by (instance) (irate(node_disk_bytes_written[2m])) / 1024 / 1024 > 50
|
||
for: 5m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "Unusual disk write rate (instance {{ $labels.instance }})"
|
||
description: "Disk is probably writing too much data (> 50 MB/s) (current value: {{ $value }})"
|
||
- alert: UnusualDiskReadLatency
|
||
expr: rate(node_disk_read_time_ms[1m]) / rate(node_disk_reads_completed[1m]) > 100
|
||
for: 5m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "Unusual disk read latency (instance {{ $labels.instance }})"
|
||
description: "Disk latency is growing (read operations > 100ms) (current value: {{ $value }})"
|
||
- alert: UnusualDiskWriteLatency
|
||
expr: rate(node_disk_write_time_ms[1m]) / rate(node_disk_writes_completedl[1m]) > 100
|
||
for: 5m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "Unusual disk write latency (instance {{ $labels.instance }})"
|
||
description: "Disk latency is growing (write operations > 100ms) (current value: {{ $value }})"
|
||
- name: http_status
|
||
rules:
|
||
- alert: ProbeFailed
|
||
expr: probe_success == 0
|
||
for: 1m
|
||
labels:
|
||
severity: error
|
||
annotations:
|
||
summary: "Probe failed (instance {{ $labels.instance }})"
|
||
description: "Probe failed (current value: {{ $value }})"
|
||
- alert: StatusCode
|
||
expr: probe_http_status_code <= 199 OR probe_http_status_code >= 400
|
||
for: 1m
|
||
labels:
|
||
severity: error
|
||
annotations:
|
||
summary: "Status Code (instance {{ $labels.instance }})"
|
||
description: "HTTP status code is not 200-399 (current value: {{ $value }})"
|
||
- alert: SslCertificateWillExpireSoon
|
||
expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
|
||
for: 5m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "SSL certificate will expire soon (instance {{ $labels.instance }})"
|
||
description: "SSL certificate expires in 30 days (current value: {{ $value }})"
|
||
- alert: SslCertificateHasExpired
|
||
expr: probe_ssl_earliest_cert_expiry - time() <= 0
|
||
for: 5m
|
||
labels:
|
||
severity: error
|
||
annotations:
|
||
summary: "SSL certificate has expired (instance {{ $labels.instance }})"
|
||
description: "SSL certificate has expired already (current value: {{ $value }})"
|
||
- alert: BlackboxSlowPing
|
||
expr: probe_icmp_duration_seconds > 2
|
||
for: 5m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "Blackbox slow ping (instance {{ $labels.instance }})"
|
||
description: "Blackbox ping took more than 2s (current value: {{ $value }})"
|
||
- alert: BlackboxSlowRequests
|
||
expr: probe_http_duration_seconds > 2
|
||
for: 5m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "Blackbox slow requests (instance {{ $labels.instance }})"
|
||
description: "Blackbox request took more than 2s (current value: {{ $value }})"
|
||
- alert: PodCpuUsagePercent
|
||
expr: sum(sum(label_replace(irate(container_cpu_usage_seconds_total[1m]),"pod","$1","container_label_io_kubernetes_pod_name", "(.*)"))by(pod) / on(pod) group_right kube_pod_container_resource_limits_cpu_cores *100 )by(container,namespace,node,pod,severity) > 80
|
||
for: 5m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "Pod cpu usage percent has exceeded 80% (current value: {{ $value }}%)"
|
||
|
||
# 在最后面添加如下内容
|
||
~]# vi /data/nfs-volume/prometheus/etc/prometheus.yml
|
||
alerting:
|
||
alertmanagers:
|
||
- static_configs:
|
||
- targets: ["alertmanager"]
|
||
rule_files:
|
||
- "/data/etc/rules.yml"
|
||
~~~
|
||
|
||
> 
|
||
>
|
||
> **rules.yml文件:**这个文件就是报警规则
|
||
>
|
||
> 这时候可以重启Prometheus的pod,但生产商因为Prometheus太庞大,删掉容易拖垮集群,所以我们用另外一种方法,平滑加载(Prometheus支持):
|
||
|
||
~~~
|
||
# 21机器,因为我们起的Prometheus是在21机器,平滑加载:
|
||
~]# ps aux|grep prometheus
|
||
~]# kill -SIGHUP 1488
|
||
~~~
|
||
|
||

|
||
|
||

|
||
|
||
> 这时候报警规则就都有了
|
||
>
|
||
|
||
|
||
|
||
### 测试alertmanager报警功能
|
||
|
||
先把对应的两个邮箱的stmp都打开
|
||
|
||

|
||
|
||

|
||
|
||
我们测试一下,把dubbo-service停了,这样consumer就会报错
|
||
|
||
把service的scale改成0
|
||
|
||

|
||
|
||
[blackbox.od.com](blackbox.od.com)查看,已经failure了
|
||
|
||

|
||
|
||
[prometheus.od.com.alerts](prometheus.od.com.alerts)查看,两个变红了(一开始是变黄)
|
||
|
||

|
||
|
||

|
||
|
||
这时候可以在163邮箱看到已发送的报警
|
||
|
||

|
||
|
||
QQ邮箱收到报警
|
||
|
||

|
||
|
||
完成(service的scale记得改回1)
|
||
|
||
> 关于rules.yml:报警不能错报也不能漏报,在实际应用中,我们需要不断的修改rules的规则,以来贴近我们公司的实际需求。
|
||
|
||
|
||
|
||
#### 资源不足时,可关闭部分非必要资源
|
||
|
||
~~~
|
||
# 22机器,也可以用dashboard操作:
|
||
~]# kubectl scale deployment grafana --replicas=0 -n infra
|
||
# out : deployment.extensions/grafana scaled
|
||
~]# kubectl scale deployment alertmanager --replicas=0 -n infra
|
||
# out : deployment.extensions/alertmanager scaled
|
||
~]# kubectl scale deployment prometheus --replicas=0 -n infra
|
||
# out : deployment.extensions/prometheus scaled
|
||
~~~
|
||
|
||
|
||
|
||
### 通过K8S部署dubbo微服务接入ELK架构
|
||
|
||
> **WHAT:**ELK是三个开源软件的缩写,分别是:
|
||
>
|
||
> - E——ElasticSearch:分布式搜索引擎,提供搜集、分析、存储数据三大功能。
|
||
> - L——LogStash:对日志的搜集、分析、过滤日志的工具,支持大量的数据获取方式。
|
||
> - K——Kibana:为 Logstash 和 ElasticSearch 提供的日志分析友好的 Web 界面,可以帮助汇总、分析和搜索重要数据日志。
|
||
> - 还有新增的FileBeat(流式日志收集器):轻量级的日志收集处理工具,占用资源少,适合于在各个服务器上搜集日志后传输给Logstash,官方也推荐此工具,用来替代部分原本Logstash的工作。
|
||
>
|
||
> **WHY:** 随着容器编排的进行,业务容器在不断的被创建、摧毁、迁移、扩容缩容等,面对如此海量的数据,又分布在各个不同的地方,我们不可能用传统的方法登录到每台机器看,所以我们需要建立一套集中的方法。我们需要这样一套日志手机、分析的系统:
|
||
>
|
||
> - 收集——采集多种来源的日志数据(流式日志收集器)
|
||
> - 传输——稳定的把日志数据传输到中央系统(消息队列)
|
||
> - 存储——将日志以结构化数据的形式存储起来(搜索引擎)
|
||
> - 分析——支持方便的分析、检索等,有GUI管理系统(前端)
|
||
> - 警告——提供错误报告,监控机制(监控工具)
|
||
>
|
||
> #### 这就是ELK
|
||
|
||
#### ELK Stack概述
|
||
|
||

|
||
|
||
> **c1/c2:**container(容器)的缩写
|
||
>
|
||
> **filebeat:**收集业务容器的日志,把c和filebeat放在一个pod里让他们一起跑,这样耦合就紧了
|
||
>
|
||
> **kafka:**高吞吐量的[分布式](https://baike.baidu.com/item/%E5%88%86%E5%B8%83%E5%BC%8F/19276232)发布订阅消息系统,它可以处理消费者在网站中的所有动作流数据。filebeat收集数据以Topic形式发布到kafka。
|
||
>
|
||
> **Topic:**Kafka数据写入操作的基本单元
|
||
>
|
||
> **logstash:**取kafka里的topic,然后再往ElasticSearch上传(异步过程,即又取又传)
|
||
>
|
||
> **index-pattern:**把数据按环境分(按prod和test分),并传到kibana
|
||
>
|
||
> **kibana:**展示数据
|
||
|
||
|
||
|
||
### 制作tomcat容器的底包镜像
|
||
|
||
> 尝试用tomcat的方式,因为很多公司老项目都是用tomcat跑起来,之前我们用的是springboot
|
||
|
||
[tomcat官网](tomcat.apache.org)
|
||
|
||

|
||
|
||
~~~
|
||
# 200 机器:
|
||
cd /opt/src/
|
||
# 你也可以直接用我上传的,因为版本一直在变,之前的版本你是下载不下来的,如何查看新版本如上图
|
||
src]# wget http://mirrors.tuna.tsinghua.edu.cn/apache/tomcat/tomcat-8/v8.5.51/bin/apache-tomcat-8.5.51.tar.gz
|
||
src]# mkdir /data/dockerfile/tomcat
|
||
src]# tar xfv apache-tomcat-8.5.51.tar.gz -C /data/dockerfile/tomcat
|
||
src]# cd /data/dockerfile/tomcat
|
||
# 配置tomcat-关闭AJP端口
|
||
tomcat]# vi apache-tomcat-8.5.51/conf/server.xml
|
||
# 找到AJP,注释掉相应的一行,结果如下图,8.5.51是已经自动注释掉的
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 200机器,删掉不需要的日志:
|
||
tomcat]# vi apache-tomcat-8.5.51/conf/logging.properties
|
||
# 删掉3manager,4host-manager的handlers,并注释掉相关的,结果如下图
|
||
# 日志级别改成INFO
|
||
~~~
|
||
|
||

|
||
|
||

|
||
|
||

|
||
|
||
~~~
|
||
# 200机器,准备Dockerfile:
|
||
tomcat]# vi Dockerfile
|
||
From harbor.od.com/public/jre:8u112
|
||
RUN /bin/cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime &&\
|
||
echo 'Asia/Shanghai' >/etc/timezone
|
||
ENV CATALINA_HOME /opt/tomcat
|
||
ENV LANG zh_CN.UTF-8
|
||
ADD apache-tomcat-8.5.51/ /opt/tomcat
|
||
ADD config.yml /opt/prom/config.yml
|
||
ADD jmx_javaagent-0.3.1.jar /opt/prom/jmx_javaagent-0.3.1.jar
|
||
WORKDIR /opt/tomcat
|
||
ADD entrypoint.sh /entrypoint.sh
|
||
CMD ["/entrypoint.sh"]
|
||
|
||
tomcat]# vi config.yml
|
||
---
|
||
rules:
|
||
- pattern: '-*'
|
||
|
||
tomcat]# wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.1/jmx_prometheus_javaagent-0.3.1.jar -O jmx_javaagent-0.3.1.jar
|
||
tomcat]# vi entrypoint.sh
|
||
#!/bin/bash
|
||
M_OPTS="-Duser.timezone=Asia/Shanghai -javaagent:/opt/prom/jmx_javaagent-0.3.1.jar=$(hostname -i):${M_PORT:-"12346"}:/opt/prom/config.yml"
|
||
C_OPTS=${C_OPTS}
|
||
MIN_HEAP=${MIN_HEAP:-"128m"}
|
||
MAX_HEAP=${MAX_HEAP:-"128m"}
|
||
JAVA_OPTS=${JAVA_OPTS:-"-Xmn384m -Xss256k -Duser.timezone=GMT+08 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSClassUnloadingEnabled -XX:LargePageSizeInBytes=128m -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+PrintClassHistogram -Dfile.encoding=UTF8 -Dsun.jnu.encoding=UTF8"}
|
||
CATALINA_OPTS="${CATALINA_OPTS}"
|
||
JAVA_OPTS="${M_OPTS} ${C_OPTS} -Xms${MIN_HEAP} -Xmx${MAX_HEAP} ${JAVA_OPTS}"
|
||
sed -i -e "1a\JAVA_OPTS=\"$JAVA_OPTS\"" -e "1a\CATALINA_OPTS=\"$CATALINA_OPTS\"" /opt/tomcat/bin/catalina.sh
|
||
|
||
cd /opt/tomcat && /opt/tomcat/bin/catalina.sh run 2>&1 >> /opt/tomcat/logs/stdout.log
|
||
|
||
tomcat]# chmod u+x entrypoint.sh
|
||
tomcat]# ll
|
||
tomcat]# docker build . -t harbor.od.com/base/tomcat:v8.5.51
|
||
tomcat]# docker push harbor.od.com/base/tomcat:v8.5.51
|
||
~~~
|
||
|
||
> **Dockerfile文件解析:**
|
||
>
|
||
> - FROM:镜像地址
|
||
> - RUN:修改时区
|
||
> - ENV:设置环境变量,把tomcat软件放到opt下
|
||
> - ENV:设置环境变量,字符集用zh_CN.UTF-8
|
||
> - ADD:把apache-tomcat-8.5.50包放到/opt/tomcat下
|
||
> - ADD:让prome基于文件的自动发现服务,这个可以不要,因为没在用prome
|
||
> - ADD:把jmx_javaagent-0.3.1.jar包放到/opt/...下,用来专门收集jvm的export,能提供一个http的接口
|
||
> - WORKDIR:工作目录
|
||
> - ADD:移动文件
|
||
> - CMD:运行文件
|
||
|
||

|
||
|
||
完成
|
||
|
||
|
||
|
||
### 交付tomcat形式的dubbo服务消费者到K8S集群
|
||
|
||
改造下dubbo-demo-web项目
|
||
|
||
由于是tomcat,我们需要多建一条Jenkins流水线
|
||
|
||

|
||
|
||

|
||
|
||

|
||
|
||
1
|
||
|
||

|
||
|
||
2
|
||
|
||

|
||
|
||
3
|
||
|
||

|
||
|
||
4
|
||
|
||

|
||
|
||
5
|
||
|
||

|
||
|
||
6
|
||
|
||

|
||
|
||
7
|
||
|
||

|
||
|
||
8
|
||
|
||

|
||
|
||
9
|
||
|
||

|
||
|
||
10
|
||
|
||

|
||
|
||
11
|
||
|
||

|
||
|
||
~~~shell
|
||
# 将如下内容填入pipeline:
|
||
pipeline {
|
||
agent any
|
||
stages {
|
||
stage('pull') { //get project code from repo
|
||
steps {
|
||
sh "git clone ${params.git_repo} ${params.app_name}/${env.BUILD_NUMBER} && cd ${params.app_name}/${env.BUILD_NUMBER} && git checkout ${params.git_ver}"
|
||
}
|
||
}
|
||
stage('build') { //exec mvn cmd
|
||
steps {
|
||
sh "cd ${params.app_name}/${env.BUILD_NUMBER} && /var/jenkins_home/maven-${params.maven}/bin/${params.mvn_cmd}"
|
||
}
|
||
}
|
||
stage('unzip') { //unzip target/*.war -c target/project_dir
|
||
steps {
|
||
sh "cd ${params.app_name}/${env.BUILD_NUMBER} && cd ${params.target_dir} && mkdir project_dir && unzip *.war -d ./project_dir"
|
||
}
|
||
}
|
||
stage('image') { //build image and push to registry
|
||
steps {
|
||
writeFile file: "${params.app_name}/${env.BUILD_NUMBER}/Dockerfile", text: """FROM harbor.od.com/${params.base_image}
|
||
ADD ${params.target_dir}/project_dir /opt/tomcat/webapps/${params.root_url}"""
|
||
sh "cd ${params.app_name}/${env.BUILD_NUMBER} && docker build -t harbor.od.com/${params.image_name}:${params.git_ver}_${params.add_tag} . && docker push harbor.od.com/${params.image_name}:${params.git_ver}_${params.add_tag}"
|
||
}
|
||
}
|
||
}
|
||
}
|
||
~~~
|
||
|
||

|
||
|
||
save
|
||
|
||
点击构建
|
||
|
||

|
||
|
||
~~~
|
||
# 填入指定参数,我的gittee是有tomcat的版本的,下面我依旧用的是gitlab
|
||
app_name: dubbo-demo-web
|
||
image_name: app/dubbo-demo-web
|
||
git_repo: http://gitlab.od.com:10000/909336740/dubbo-demo-web.git
|
||
git_ver: tomcat
|
||
add_tag: 20200214_1300
|
||
mvn_dir: ./
|
||
target_dir: ./dubbo-client/target
|
||
mvn_cmd: mvn clean package -Dmaven.test.skip=true
|
||
base_image: base/tomcat:v8.5.51
|
||
maven: 3.6.1-8u232
|
||
root_url: ROOT
|
||
# 点击Build进行构建,等待构建完成
|
||
~~~
|
||
|
||

|
||
|
||
build成功后
|
||
|
||
|
||
|
||
修改版本信息,删掉20880,如下图,然后update
|
||
|
||

|
||
|
||

|
||
|
||
浏览器输入[demo-test.od.com/hello?name=tomcat](demo-test.od.com/hello?name=tomcat)
|
||
|
||

|
||
|
||
完成
|
||
|
||
查看dashboard里的pod里
|
||
|
||

|
||
|
||

|
||
|
||
> 这些就是我们要收集的日志,收到ELK
|
||
|
||
|
||
|
||
### 二进制安装部署elasticsearch
|
||
|
||
> 我们这里只部一个es的节点,因为我们主要是了解数据流的方式
|
||
>
|
||
|
||
[官网下载包](https://www.elastic.co/downloads/past-releases/elasticsearch-6-8-6)右键复制链接
|
||
|
||

|
||
|
||
~~~
|
||
# 12机器:
|
||
~]# cd /opt/src/
|
||
src]# wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.6.tar.gz
|
||
src]# tar xfv elasticsearch-6.8.6.tar.gz -C /opt
|
||
src]# ln -s /opt/elasticsearch-6.8.6/ /opt/elasticsearch
|
||
src]# cd /opt/elasticsearch
|
||
# 配置
|
||
elasticsearch]# mkdir -p /data/elasticsearch/{data,logs}
|
||
# 修改以下内容
|
||
elasticsearch]# vi config/elasticsearch.yml
|
||
cluster.name: es.od.com
|
||
node.name: hdss7-12.host.com
|
||
path.data: /data/elasticsearch/data
|
||
path.logs: /data/elasticsearch/logs
|
||
bootstrap.memory_lock: true
|
||
network.host: 10.4.7.12
|
||
http.port: 9200
|
||
|
||
# 修改以下内容
|
||
elasticsearch]# vi config/jvm.options
|
||
-Xms512m
|
||
-Xmx512m
|
||
|
||
# 创建普通用户
|
||
elasticsearch]# useradd -s /bin/bash -M es
|
||
elasticsearch]# chown -R es.es /opt/elasticsearch-6.8.6/
|
||
elasticsearch]# chown -R es.es /data/elasticsearch/
|
||
# 文件描述符
|
||
elasticsearch]# vi /etc/security/limits.d/es.conf
|
||
es hard nofile 65536
|
||
es soft fsize unlimited
|
||
es hard memlock unlimited
|
||
es soft memlock unlimited
|
||
|
||
# 调整内核参数
|
||
elasticsearch]# sysctl -w vm.max_map_count=262144
|
||
elasticsearch]# echo "vm.max_map_count=262144" >> /etc/sysctl.conf
|
||
elasticsearch]# sysctl -p
|
||
# 启动
|
||
elasticsearch]# su -c "/opt/elasticsearch/bin/elasticsearch -d" es
|
||
elasticsearch]# netstat -luntp|grep 9200
|
||
# 调整ES日志模板
|
||
elasticsearch]# curl -H "Content-Type:application/json" -XPUT http://10.4.7.12:9200/_template/k8s -d '{
|
||
"template" : "k8s*",
|
||
"index_patterns": ["k8s*"],
|
||
"settings": {
|
||
"number_of_shards": 5,
|
||
"number_of_replicas": 0
|
||
}
|
||
}'
|
||
~~~
|
||
|
||

|
||
|
||
> 完成,你看我敲这么多遍就知道要等
|
||
|
||
|
||
|
||
### 安装部署kafka和kafka-manager
|
||
|
||
[官网](https://kafka.apache.org)
|
||
|
||
> 做kafka的时候不建议用超过2.2.0的版本
|
||
>
|
||
|
||
~~~
|
||
# 11机器:
|
||
cd /opt/src/
|
||
src]# wget https://archive.apache.org/dist/kafka/2.2.0/kafka_2.12-2.2.0.tgz
|
||
src]# tar xfv kafka_2.12-2.2.0.tgz -C /opt/
|
||
src]# ln -s /opt/kafka_2.12-2.2.0/ /opt/kafka
|
||
src]# cd /opt/kafka
|
||
kafka]# ll
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 11机器,配置:
|
||
kafka]# mkdir -pv /data/kafka/logs
|
||
# 修改以下配置,其中zk是不变的,最下面两行则新增到尾部
|
||
kafka]# vi config/server.properties
|
||
log.dirs=/data/kafka/logs
|
||
zookeeper.connect=localhost:2181
|
||
log.flush.interval.messages=10000
|
||
log.flush.interval.ms=1000
|
||
delete.topic.enable=true
|
||
host.name=hdss7-11.host.com
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 11机器,启动:
|
||
kafka]# bin/kafka-server-start.sh -daemon config/server.properties
|
||
kafka]# ps aux|grep kafka
|
||
kafka]# netstat -luntp|grep 80711
|
||
~~~
|
||
|
||

|
||
|
||

|
||
|
||
##### 部署kafka-manager
|
||
|
||
~~~
|
||
# 200机器,制作docker:
|
||
~]# mkdir /data/dockerfile/kafka-manager
|
||
~]# cd /data/dockerfile/kafka-manager
|
||
kafka-manager]# vi Dockerfile
|
||
FROM hseeberger/scala-sbt
|
||
|
||
ENV ZK_HOSTS=10.4.7.11:2181 \
|
||
KM_VERSION=2.0.0.2
|
||
|
||
RUN mkdir -p /tmp && \
|
||
cd /tmp && \
|
||
wget https://github.com/yahoo/kafka-manager/archive/${KM_VERSION}.tar.gz && \
|
||
tar xxf ${KM_VERSION}.tar.gz && \
|
||
cd /tmp/kafka-manager-${KM_VERSION} && \
|
||
sbt clean dist && \
|
||
unzip -d / ./target/universal/kafka-manager-${KM_VERSION}.zip && \
|
||
rm -fr /tmp/${KM_VERSION} /tmp/kafka-manager-${KM_VERSION}
|
||
|
||
WORKDIR /kafka-manager-${KM_VERSION}
|
||
|
||
EXPOSE 9000
|
||
ENTRYPOINT ["./bin/kafka-manager","-Dconfig.file=conf/application.conf"]
|
||
|
||
# 因为大,build过程比较慢,也比较容易失败,20分钟左右,
|
||
kafka-manager]# docker build . -t harbor.od.com/infra/kafka-manager:v2.0.0.2
|
||
# build一直失败就用我做好的,不跟你的机器也得是10.4.7.11等,因为dockerfile里面已经写死了
|
||
# kafka-manager]# docker pull 909336740/kafka-manager:v2.0.0.2
|
||
# kafka-manager]# docker tag 29badab5ea08 harbor.od.com/infra/kafka-manager:v2.0.0.2
|
||
kafka-manager]# docker images|grep kafka
|
||
kafka-manager]# docker push harbor.od.com/infra/kafka-manager:v2.0.0.2
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 200机器,配置资源清单:
|
||
mkdir /data/k8s-yaml/kafka-manager
|
||
cd /data/k8s-yaml/kafka-manager
|
||
kafka-manager]# vi dp.yaml
|
||
kind: Deployment
|
||
apiVersion: extensions/v1beta1
|
||
metadata:
|
||
name: kafka-manager
|
||
namespace: infra
|
||
labels:
|
||
name: kafka-manager
|
||
spec:
|
||
replicas: 1
|
||
selector:
|
||
matchLabels:
|
||
app: kafka-manager
|
||
strategy:
|
||
type: RollingUpdate
|
||
rollingUpdate:
|
||
maxUnavailable: 1
|
||
maxSurge: 1
|
||
revisionHistoryLimit: 7
|
||
progressDeadlineSeconds: 600
|
||
template:
|
||
metadata:
|
||
labels:
|
||
app: kafka-manager
|
||
spec:
|
||
containers:
|
||
- name: kafka-manager
|
||
image: harbor.od.com/infra/kafka-manager:v2.0.0.2
|
||
imagePullPolicy: IfNotPresent
|
||
ports:
|
||
- containerPort: 9000
|
||
protocol: TCP
|
||
env:
|
||
- name: ZK_HOSTS
|
||
value: zk1.od.com:2181
|
||
- name: APPLICATION_SECRET
|
||
value: letmein
|
||
imagePullSecrets:
|
||
- name: harbor
|
||
terminationGracePeriodSeconds: 30
|
||
securityContext:
|
||
runAsUser: 0
|
||
|
||
kafka-manager]# vi svc.yaml
|
||
kind: Service
|
||
apiVersion: v1
|
||
metadata:
|
||
name: kafka-manager
|
||
namespace: infra
|
||
spec:
|
||
ports:
|
||
- protocol: TCP
|
||
port: 9000
|
||
targetPort: 9000
|
||
selector:
|
||
app: kafka-manager
|
||
|
||
kafka-manager]# vi ingress.yaml
|
||
kind: Ingress
|
||
apiVersion: extensions/v1beta1
|
||
metadata:
|
||
name: kafka-manager
|
||
namespace: infra
|
||
spec:
|
||
rules:
|
||
- host: km.od.com
|
||
http:
|
||
paths:
|
||
- path: /
|
||
backend:
|
||
serviceName: kafka-manager
|
||
servicePort: 9000
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 11机器,解析域名:
|
||
~]# vi /var/named/od.com.zone
|
||
serial 前滚一位
|
||
|
||
km A 10.4.7.10
|
||
~]# systemctl restart named
|
||
~]# dig -t A km.od.com @10.4.7.11 +short
|
||
# out:10.4.7.10
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 22机器,应用资源:
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/kafka-manager/dp.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/kafka-manager/svc.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/kafka-manager/ingress.yaml
|
||
~~~
|
||
|
||

|
||
|
||
文件大可能起不来,需要多拉几次(当然你的资源配置高应该是没问题的)
|
||
|
||

|
||
|
||
启动成功后浏览器输入[km.od.com](km.od.com)
|
||
|
||

|
||
|
||

|
||
|
||
填完上面三个值后就可以下拉save了
|
||
|
||
点击
|
||
|
||

|
||
|
||

|
||
|
||

|
||
|
||
完成
|
||
|
||
|
||
|
||
### 制作filebeat底包并接入dubbo服务消费者
|
||
|
||
[Filebeat官网](https://www.elastic.co/downloads/beats/filebeat)
|
||
|
||
下载指纹
|
||
|
||

|
||
|
||
打开后复制,后面的不需要复制
|
||
|
||

|
||
|
||
开始前,请确保你的这些服务都是起来的
|
||
|
||

|
||
|
||
~~~
|
||
# 200机器,准备镜像,资源配置清单:
|
||
mkdir /data/dockerfile/filebeat
|
||
~]# cd /data/dockerfile/filebeat
|
||
# 刚刚复制的指纹替代到下面的FILEBEAT_SHA1来,你用的是什么版本FILEBEAT_VERSION就用什么版本,更新的很快,我之前用的是5.1现在已经是6.1了
|
||
filebeat]# vi Dockerfile
|
||
FROM debian:jessie
|
||
|
||
ENV FILEBEAT_VERSION=7.6.1 \
|
||
FILEBEAT_SHA1=887edb2ab255084ef96dbc4c7c047bfa92dad16f263e23c0fcc80120ea5aca90a3a7a44d4783ba37b135dac76618971272a591ab4a24997d8ad40c7bc23ffabf
|
||
|
||
RUN set -x && \
|
||
apt-get update && \
|
||
apt-get install -y wget && \
|
||
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-${FILEBEAT_VERSION}-linux-x86_64.tar.gz -O /opt/filebeat.tar.gz && \
|
||
cd /opt && \
|
||
echo "${FILEBEAT_SHA1} filebeat.tar.gz" | sha512sum -c - && \
|
||
tar xzvf filebeat.tar.gz && \
|
||
cd filebeat-* && \
|
||
cp filebeat /bin && \
|
||
cd /opt && \
|
||
rm -rf filebeat* && \
|
||
apt-get purge -y wget && \
|
||
apt-get autoremove -y && \
|
||
apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
|
||
|
||
COPY docker-entrypoint.sh /
|
||
ENTRYPOINT ["/docker-entrypoint.sh"]
|
||
|
||
filebeat]# vi docker-entrypoint.sh
|
||
#!/bin/bash
|
||
|
||
ENV=${ENV:-"test"}
|
||
PROJ_NAME=${PROJ_NAME:-"no-define"}
|
||
MULTILINE=${MULTILINE:-"^\d{2}"}
|
||
|
||
cat > /etc/filebeat.yaml << EOF
|
||
filebeat.inputs:
|
||
- type: log
|
||
fields_under_root: true
|
||
fields:
|
||
topic: logm-${PROJ_NAME}
|
||
paths:
|
||
- /logm/*.log
|
||
- /logm/*/*.log
|
||
- /logm/*/*/*.log
|
||
- /logm/*/*/*/*.log
|
||
- /logm/*/*/*/*/*.log
|
||
scan_frequency: 120s
|
||
max_bytes: 10485760
|
||
multiline.pattern: '$MULTILINE'
|
||
multiline.negate: true
|
||
multiline.match: after
|
||
multiline.max_lines: 100
|
||
- type: log
|
||
fields_under_root: true
|
||
fields:
|
||
topic: logu-${PROJ_NAME}
|
||
paths:
|
||
- /logu/*.log
|
||
- /logu/*/*.log
|
||
- /logu/*/*/*.log
|
||
- /logu/*/*/*/*.log
|
||
- /logu/*/*/*/*/*.log
|
||
- /logu/*/*/*/*/*/*.log
|
||
output.kafka:
|
||
hosts: ["10.4.7.11:9092"]
|
||
topic: k8s-fb-$ENV-%{[topic]}
|
||
version: 2.0.0
|
||
required_acks: 0
|
||
max_message_bytes: 10485760
|
||
EOF
|
||
|
||
set -xe
|
||
|
||
# If user don't provide any command
|
||
# Run filebeat
|
||
if [[ "$1" == "" ]]; then
|
||
exec filebeat -c /etc/filebeat.yaml
|
||
else
|
||
# Else allow the user to run arbitrarily commands like bash
|
||
exec "$@"
|
||
fi
|
||
|
||
filebeat]# chmod u+x docker-entrypoint.sh
|
||
filebeat]# docker build . -t harbor.od.com/infra/filebeat:v7.6.1
|
||
# build可能会失败很多次,我最长的是7次,下面有相关报错
|
||
filebeat]# docker images|grep filebeat
|
||
filebeat]# docker push harbor.od.com/infra/filebeat:v7.6.1
|
||
# 删掉原来的内容全部用新的,使用的两个镜像对应上你自己的镜像
|
||
filebeat]# vi /data/k8s-yaml/test/dubbo-demo-consumer/dp.yaml
|
||
kind: Deployment
|
||
apiVersion: extensions/v1beta1
|
||
metadata:
|
||
name: dubbo-demo-consumer
|
||
namespace: test
|
||
labels:
|
||
name: dubbo-demo-consumer
|
||
spec:
|
||
replicas: 1
|
||
selector:
|
||
matchLabels:
|
||
name: dubbo-demo-consumer
|
||
template:
|
||
metadata:
|
||
labels:
|
||
app: dubbo-demo-consumer
|
||
name: dubbo-demo-consumer
|
||
spec:
|
||
containers:
|
||
- name: dubbo-demo-consumer
|
||
image: harbor.od.com/app/dubbo-demo-web:tomcat_200307_1410
|
||
imagePullPolicy: IfNotPresent
|
||
ports:
|
||
- containerPort: 8080
|
||
protocol: TCP
|
||
env:
|
||
- name: C_OPTS
|
||
value: -Denv=fat -Dapollo.meta=http://apollo-configservice:8080
|
||
volumeMounts:
|
||
- mountPath: /opt/tomcat/logs
|
||
name: logm
|
||
- name: filebeat
|
||
image: harbor.od.com/infra/filebeat:v7.6.1
|
||
imagePullPolicy: IfNotPresent
|
||
env:
|
||
- name: ENV
|
||
value: test
|
||
- name: PROJ_NAME
|
||
value: dubbo-demo-web
|
||
volumeMounts:
|
||
- mountPath: /logm
|
||
name: logm
|
||
volumes:
|
||
- emptyDir: {}
|
||
name: logm
|
||
imagePullSecrets:
|
||
- name: harbor
|
||
restartPolicy: Always
|
||
terminationGracePeriodSeconds: 30
|
||
securityContext:
|
||
runAsUser: 0
|
||
schedulerName: default-scheduler
|
||
strategy:
|
||
type: RollingUpdate
|
||
rollingUpdate:
|
||
maxUnavailable: 1
|
||
maxSurge: 1
|
||
revisionHistoryLimit: 7
|
||
progressDeadlineSeconds: 600
|
||
~~~
|
||
|
||
> 相关报错(其它问题基本都是网络不稳定的问题):
|
||
>
|
||
> 因为你用的指纹不是自己的,或者版本没写对。
|
||
>
|
||
> **dp.yaml文件解析:** spec-containers下有两个name,对应的两个容器,这就是编车模式
|
||
|
||
~~~
|
||
# 22机器,应用资源清单:
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/test/dubbo-demo-consumer/dp.yaml
|
||
#out: deployment.extensions/dubbo-demo-consumer configured
|
||
~]# kubectl get pods -n test
|
||
~~~
|
||
|
||

|
||
|
||
机器在21机器
|
||
|
||

|
||
|
||
~~~
|
||
# 查看filebeat日志,21机器:
|
||
~]# docker ps -a|grep consumer
|
||
~]# docker exec -ti a6adcd6e83b3 bash
|
||
:/# cd /logm
|
||
:/#/logm# ls
|
||
:/#/logm# cd ..
|
||
# 这个log,是你每一次刷新demo页面都会有数据,你把它夯在这里
|
||
:/# tail -fn 200 /logm/stdout.log
|
||
# 日志就都在这里了
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 浏览器输入:demo-test.com/hello?name=tomcat
|
||
~~~
|
||
|
||

|
||
|
||
刷新上面的页面,去21机器看log
|
||
|
||

|
||
|
||
[刷新km.od.com/clusters/kafka-od/topics](km.od.com/clusters/kafka-od/topics)
|
||
|
||

|
||
|
||
完成
|
||
|
||
|
||
|
||
### 部署logstash镜像
|
||
|
||
~~~
|
||
# 200机器,准备镜像、资源清单:
|
||
# logstash的版本需要和es的版本一样,11机器cd /opt/目录下即可查看到
|
||
~]# docker pull logstash:6.8.6
|
||
~]# docker images|grep logstash
|
||
~]# docker tag d0a2dac51fcb harbor.od.com/infra/logstash:v6.8.6
|
||
~]# docker push harbor.od.com/infra/logstash:v6.8.6
|
||
~]# mkdir /etc/logstash
|
||
~]# vi /etc/logstash/logstash-test.conf
|
||
input {
|
||
kafka {
|
||
bootstrap_servers => "10.4.7.11:9092"
|
||
client_id => "10.4.7.200"
|
||
consumer_threads => 4
|
||
group_id => "k8s_test"
|
||
topics_pattern => "k8s-fb-test-.*"
|
||
}
|
||
}
|
||
|
||
filter {
|
||
json {
|
||
source => "message"
|
||
}
|
||
}
|
||
|
||
output {
|
||
elasticsearch {
|
||
hosts => ["10.4.7.12:9200"]
|
||
index => "k8s-test-%{+YYYY.MM.DD}"
|
||
}
|
||
}
|
||
|
||
~]# vi /etc/logstash/logstash-prod.conf
|
||
input {
|
||
kafka {
|
||
bootstrap_servers => "10.4.7.11:9092"
|
||
client_id => "10.4.7.200"
|
||
consumer_threads => 4
|
||
group_id => "k8s_prod"
|
||
topics_pattern => "k8s-fb-prod-.*"
|
||
}
|
||
}
|
||
|
||
filter {
|
||
json {
|
||
source => "message"
|
||
}
|
||
}
|
||
|
||
output {
|
||
elasticsearch {
|
||
hosts => ["10.4.7.12:9200"]
|
||
index => "k8s-prod-%{+YYYY.MM.DD}"
|
||
}
|
||
}
|
||
|
||
# 启动
|
||
~]# docker run -d --name logstash-test -v /etc/logstash:/etc/logstash harbor.od.com/infra/logstash:v6.8.6 -f /etc/logstash/logstash-test.conf
|
||
~]# docker ps -a|grep logstash
|
||
~~~
|
||
|
||
[^:%s/test/prod/g]: 在vi命令行输入左边内容,即可将文本里面得test全部换成prod
|
||
|
||

|
||
|
||
我们刷新demo页面让kafka里面更新些日志
|
||
|
||

|
||
|
||
有日志了
|
||
|
||

|
||
|
||
~~~
|
||
# 200机器,验证ES索引(可能比较慢):
|
||
~]# curl http://10.4.7.12:9200/_cat/indices?v
|
||
~~~
|
||
|
||

|
||
|
||
> 这个反应有点慢,我等了快三分钟
|
||
|
||
完成
|
||
|
||
|
||
|
||
### 交付kibana到K8S集群
|
||
|
||
> 为什么用kibana:当然运维可以直接在dashboard里exec进去,然后命令行看情况,但是开发或者测试不行,那是机密的,我们得要一个页面供他们使用,使用需要kibana。
|
||
|
||
~~~
|
||
~]# docker pull kibana:6.8.6
|
||
~]# docker images|grep kibana
|
||
~]# docker tag adfab5632ef4 harbor.od.com/infra/kibana:v6.8.6
|
||
~]# docker push harbor.od.com/infra/kibana:v6.8.6
|
||
~]# mkdir /data/k8s-yaml/kibana
|
||
~]# cd /data/k8s-yaml/kibana/
|
||
kibana]# vi dp.yaml
|
||
kind: Deployment
|
||
apiVersion: extensions/v1beta1
|
||
metadata:
|
||
name: kibana
|
||
namespace: infra
|
||
labels:
|
||
name: kibana
|
||
spec:
|
||
replicas: 1
|
||
selector:
|
||
matchLabels:
|
||
name: kibana
|
||
template:
|
||
metadata:
|
||
labels:
|
||
app: kibana
|
||
name: kibana
|
||
spec:
|
||
containers:
|
||
- name: kibana
|
||
image: harbor.od.com/infra/kibana:v6.8.6
|
||
imagePullPolicy: IfNotPresent
|
||
ports:
|
||
- containerPort: 5601
|
||
protocol: TCP
|
||
env:
|
||
- name: ELASTICSEARCH_URL
|
||
value: http://10.4.7.12:9200
|
||
imagePullSecrets:
|
||
- name: harbor
|
||
securityContext:
|
||
runAsUser: 0
|
||
strategy:
|
||
type: RollingUpdate
|
||
rollingUpdate:
|
||
maxUnavailable: 1
|
||
maxSurge: 1
|
||
revisionHistoryLimit: 7
|
||
progressDeadlineSeconds: 600
|
||
|
||
kibana]# vi svc.yaml
|
||
kind: Service
|
||
apiVersion: v1
|
||
metadata:
|
||
name: kibana
|
||
namespace: infra
|
||
spec:
|
||
ports:
|
||
- protocol: TCP
|
||
port: 5601
|
||
targetPort: 5601
|
||
selector:
|
||
app: kibana
|
||
|
||
kibana]# vi ingress.yaml
|
||
kind: Ingress
|
||
apiVersion: extensions/v1beta1
|
||
metadata:
|
||
name: kibana
|
||
namespace: infra
|
||
spec:
|
||
rules:
|
||
- host: kibana.od.com
|
||
http:
|
||
paths:
|
||
- path: /
|
||
backend:
|
||
serviceName: kibana
|
||
servicePort: 5601
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 11机器,解析域名:
|
||
~]# vi /var/named/od.com.zone
|
||
serial 前滚一位
|
||
kibana A 10.4.7.10
|
||
|
||
~]# systemctl restart named
|
||
~]# dig -t A kibana.od.com @10.4.7.11 +short
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 22机器(21机器还夯着log),应用资源清单:
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/kibana/dp.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/kibana/svc.yaml
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/kibana/ingress.yaml
|
||
~]# kubectl get pods -n infra
|
||
~~~
|
||
|
||

|
||
|
||
[kibana.od.com](kibana.od.com)
|
||
|
||
> 我用的低配8C32G,机器快跑不动了,还会显示service not yet
|
||
|
||

|
||
|
||

|
||
|
||
> 点完可能会转圈转很久
|
||
|
||

|
||
|
||

|
||
|
||
去创建
|
||
|
||

|
||
|
||

|
||
|
||

|
||
|
||
创建后,你就能看到日志
|
||
|
||

|
||
|
||
把prod里的configservice和admin依次起来
|
||
|
||

|
||
|
||
~~~
|
||
# 200机器:
|
||
cd /data/k8s-yaml/prod/dubbo-demo-consumer/
|
||
dubbo-demo-consumer]# cp ../../test/dubbo-demo-consumer/dp.yaml .
|
||
# y
|
||
# 修改namespace为prod,fat改成pro,http地址也改了
|
||
dubbo-demo-consumer]# vi dp.yaml
|
||
~~~
|
||
|
||

|
||
|
||

|
||
|
||
[config-prod.od.com](config-prod.od.com)
|
||
|
||

|
||
|
||
完成
|
||
|
||
|
||
|
||
### 详解Kibana生产实践方法
|
||
|
||
查看环境情况
|
||
|
||
确认Eureka有config和admin
|
||
|
||

|
||
|
||
确认Apollo里有两个环境
|
||
|
||

|
||
|
||
确认完后,我们先把service起来
|
||
|
||

|
||
|
||
然后到consumer,consumer需要接日志
|
||
|
||
~~~~
|
||
# 22机器:
|
||
~]# kubectl apply -f http://k8s-yaml.od.com/prod/dubbo-demo-consumer/dp.yaml
|
||
~]# kubectl get pods -n prod
|
||
~~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 200机器:
|
||
# 启动
|
||
~]# docker run -d --name logstash-prod -v /etc/logstash:/etc/logstash harbor.od.com/infra/logstash:v6.8.6 -f /etc/logstash/logstash-prod.conf
|
||
~]# docker ps -a|grep logstash
|
||
# curl一下,这时候还只有test
|
||
~]# curl http://10.4.7.12:9200/_cat/indices?v
|
||
~~~
|
||
|
||

|
||
|
||
~~~
|
||
# 访问浏览器demo-prod.od.com/hello?name=prod
|
||
~~~
|
||
|
||

|
||
|
||
我们看一下调度到哪个节点了
|
||
|
||

|
||
|
||
~~~
|
||
# 调度到21节点,我们去21节点看一下:
|
||
~]# docker ps -a|grep consumer
|
||
~]# docker exec -ti 094e68c795b0 bash
|
||
:/# cd /logm
|
||
:/logm# ls
|
||
:/logm# tail -fn 200 stdout.log
|
||
~~~
|
||
|
||

|
||
|
||
夯住
|
||
|
||

|
||
|
||
去[http://km.od.com](http://km.od.com/)
|
||
|
||

|
||
|
||

|
||
|
||

|
||
|
||
已经有prod了
|
||
|
||
~~~
|
||
# 200机器,curl的时候可能要等一下才有(可以去多刷一下网页产生日志):
|
||
~]# curl http://10.4.7.12:9200/_cat/indices?v
|
||
~~~
|
||
|
||

|
||
|
||
去kibana配一下
|
||
|
||

|
||
|
||

|
||
|
||
##### 如何使用kibana
|
||
|
||
时间选择
|
||
|
||

|
||
|
||

|
||
|
||
> test没用数据的点下这个就有了,平常用的最多的也是today,后面突然没数据了你就可以刷新或者点时间,特别是配置差的同学
|
||
>
|
||
> 
|
||
|
||
环境选择器
|
||
|
||

|
||
|
||
关键字选择器
|
||
|
||
先把message顶上来,还有log.file.path、hostname
|
||
|
||

|
||
|
||
我们先制造一些错误,把service scale成0
|
||
|
||

|
||
|
||
然后刷新一下页面,让它报错,记得是test环境
|
||
|
||

|
||
|
||
搜exception关键字,并可展开
|
||
|
||

|
||
|
||

|
||
|
||
现在consumer日志已经完成了,记得把service的pod还原,并删掉consumer的pod让它重启
|
||
|
||
|
||
|
||
#### 课外作业(不是一定要完成,但是你做了我做的这些)
|
||
|
||
consumer日志已经完成,还可以做service日志
|
||
|
||
~~~
|
||
# 200机器:
|
||
# 修改一下内容
|
||
~]# cd /data/dockerfile/jre8/
|
||
# 修改以下内容
|
||
jre8]# vi entrypoint.sh
|
||
exec java -jar ${M_OPTS} ${C_OPTS} ${JAR_BALL} 2>&1 >> /opt/logs/stdout.log
|
||
|
||
jre8]# docker build . -t harbor.od.com/base/jre8:8u112_with_logs
|
||
jre8]# docker push harbor.od.com/base/jre8:8u112_with_logs
|
||
~~~
|
||
|
||

|
||
|
||

|
||
|
||
去修改一下Jenkins加一个底包
|
||
|
||

|
||
|
||

|
||
|
||

|
||
|
||
下面就要你接着做了
|
||
|