MO Observability CI v0.7.0 Documentation
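The CI pipeline of the mo observability project is built on GitHub Actions workflows. The .github folder in the project root holds the CI configuration files and scripts, and GitHub runs the .yml workflow files defined in its workflows folder.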
.github
├── CODEOWNERS
├── actions                          predefined actions
│   └── setup_env
│       └── action.yml
├── observability                    docker-compose file, scripts, and configs used by CI
│   ├── agent-prometheus.yml         prometheus agent config
│   ├── alertmanager.yml             alertmanager config
│   ├── docker-compose.yaml          one-command startup
│   ├── fluent-bit.conf              fluent-bit config
│   ├── mo-agent.yaml                mo-agent config
│   ├── mo-ruler.yaml                mo-ruler config
│   ├── promql-test-config.yml       config for mo-ruler's promql query test plugin
│   ├── promql-test-queries.yml      promql query test cases for mo-ruler
│   ├── ruler-prometheus.yml         rule-loading config for mo-ruler
│   ├── rules                        alert rules used for testing
│   │   └── test_rule.yaml
│   ├── script                       script files used by the CI pipeline
│   │   ├── alertmanager_template.yml
│   │   ├── data_check.go
│   │   ├── generate_smtp.go
│   │   └── mock-server.go
│   └── sql                          table-creation SQL that mo runs in advance
│       ├── log.sql
│       ├── metric.sql
│       └── trace.sql
└── workflows                        workflow files
    ├── ci.yml
    ├── docker-image.yml
    └── email_chack.yml
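We define three workflows:
- MO Observability CI (ci.yml): runs automatically when a PR is submitted.
- MO Observability Email Check (email_chack.yml): a scheduled job that runs every day at 10:30 a.m. (GitHub Actions may start it somewhat later than scheduled).
- Build Docker Image (docker-image.yml): currently run manually only.
All three can also be triggered manually.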
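The first two workflows depend on the docker-compose stack defined in .github/observability/docker-compose.yaml. The workflows use this compose file to start every Observability component needed for testing in one step (some tools, such as grafana, are commented out). The same stack can also be deployed on your own machine for testing: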
cd .github/observability
# Deploy
docker-compose up -d
# Remove all related containers, volumes, and networks
docker-compose down -v
In the CI pipeline, docker-compose starts the following components:
- fluent-bit
- prometheus-agent
- matrixone
- mo-agent
- mo-ruler
- alertmanager
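The CI pipeline tests three core functions of the observability system:
- Data ingestion: data sources send metric, log, and trace data to mo-agent, which persists it as CSV files; the mo database can then read this data.
- Alert rule execution: mo-ruler reads and parses the rule files, evaluates them correctly, and, when a condition is met, sends an alert to alertmanager, which then sends an alert email.
- PromQL queries: the vast majority of PromQL statements are supported and return the correct time series data.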
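For data ingestion there are three data sources:
- metrics: prometheus-agent is the data source; SQL queries are run against each of the four metric types.
- logs: fluent-bit is the data source; using the dummy test plugin, it is configured to send 50 log entries with message='test[12345]'.
- trace: run example/trace/example.go.
In the CI pipeline, Go scripts perform the verification. Ingestion of metrics and logs is verified by .github/observability/script/data_check.go, which simply runs SQL statements after the agent and mo have started and checks whether the expected data can be queried. If it can, mo-agent has received the data from the sources and written it to CSV files.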
-- metrics from prometheus agent
-- counter type metric
SELECT * FROM observability.metrics WHERE name='prometheus_agent_samples_appended_total' LIMIT 10;
-- gauge type metric
SELECT * FROM observability.metrics WHERE name='prometheus_target_metadata_cache_bytes' LIMIT 10;
-- summary type metric
SELECT * FROM observability.metrics WHERE name='prometheus_agent_data_replay_duration_seconds' LIMIT 10;
-- histogram type metric
SELECT * FROM observability.metrics WHERE name='prometheus_http_request_duration_seconds_bucket' LIMIT 10;
-- logs from fluentbit
SELECT * FROM observability.logs WHERE message='test1' LIMIT 10;
SELECT * FROM observability.logs WHERE message='test2' LIMIT 10;
SELECT * FROM observability.logs WHERE message='test3' LIMIT 10;
SELECT * FROM observability.logs WHERE message='test4' LIMIT 10;
SELECT * FROM observability.logs WHERE message='test5' LIMIT 10;
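The check described above ("run each query and require at least one row") can be sketched in Go as follows. This is a minimal illustration, not the contents of data_check.go: the `rowCounter` type, `logQueries`, and `verifyAll` are hypothetical names, and the database is replaced by a fake so the sketch runs without a MatrixOne instance. A real check would presumably wrap a `database/sql` connection behind the same function type.

```go
package main

import (
	"errors"
	"fmt"
)

// rowCounter abstracts "run this SQL and report how many rows came back".
// Hypothetical seam: a real implementation would wrap a database connection.
type rowCounter func(query string) (int, error)

// logQueries builds one lookup per expected dummy message (test1..testN).
func logQueries(n int) []string {
	queries := make([]string, 0, n)
	for i := 1; i <= n; i++ {
		queries = append(queries, fmt.Sprintf(
			"SELECT * FROM observability.logs WHERE message='test%d' LIMIT 10;", i))
	}
	return queries
}

// verifyAll returns an error for the first query that fails or yields no rows.
func verifyAll(count rowCounter, queries []string) error {
	for _, q := range queries {
		n, err := count(q)
		if err != nil {
			return fmt.Errorf("query failed: %s: %w", q, err)
		}
		if n == 0 {
			return errors.New("no rows for: " + q)
		}
	}
	return nil
}

func main() {
	// Fake counter standing in for the database: every query matches one row.
	fake := func(string) (int, error) { return 1, nil }
	if err := verifyAll(fake, logQueries(5)); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("all data checks passed")
}
```

Keeping the database behind a function type means the pass/fail logic can be exercised without starting the compose stack.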
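mo-ruler reads the alert rules defined in .github/observability/rules. Verification succeeds if an alert email arrives after startup and the alertmanager web page linked in that email shows an entry with the same name as the alert rule below: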
groups:
  - name: test-ci
    rules:
      - alert: JustATest
        expr: up{instance="localhost:9090", job="prometheus"} == 1
        for: 1m
        labels:
          severity: info
        annotations:
          summary: Just a test
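The verification above is done by reading the email and the alertmanager page. As a complementary illustration only, the "is an alert named JustATest present" test could be expressed in Go against a JSON list of alerts. The sketch assumes the payload has the shape returned by Alertmanager's `GET /api/v2/alerts` (an array of objects, each with a `labels` map); `hasAlert` is a hypothetical helper and the sample payload is made up, not captured from a CI run.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// alert mirrors only the part of an alert object that this check needs.
type alert struct {
	Labels map[string]string `json:"labels"`
}

// hasAlert reports whether the JSON array of alerts contains one whose
// alertname label equals name.
func hasAlert(body []byte, name string) (bool, error) {
	var alerts []alert
	if err := json.Unmarshal(body, &alerts); err != nil {
		return false, err
	}
	for _, a := range alerts {
		if a.Labels["alertname"] == name {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Illustrative payload (assumption), not output from a real run.
	sample := []byte(`[{"labels":{"alertname":"JustATest","severity":"info"}}]`)
	found, err := hasAlert(sample, "JustATest")
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("JustATest present:", found)
}
```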
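In the CI pipeline we use the official Prometheus PromQL compliance tester for this check. In testing, MO-Ruler passed every query test except those whose PromQL statements contain regular expressions.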
# ci.yaml
- name: promql query test
  run: make promql-test
# makefile
.PHONY: promql-test
promql-test:
	git clone https://github.com/prometheus/compliance.git
	cd ./compliance/promql && go get -u golang.org/x/sys && go build ./cmd/promql-compliance-tester
	./compliance/promql/promql-compliance-tester -config-file=$(PROMQL_TEST_QUERIES) -config-file=$(PROMQL_TEST_CONFIG)
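The MO Observability Email Check workflow checks that alertmanager's email sending works and that emails actually go out. It is effectively a scheduled run of the second step of MO Observability CI: alert rule execution.
It is scheduled for 10:30 every day (UTC+8, written as 02:30 UTC in the cron expression). According to the official documentation, GitHub Actions does not guarantee exact timing; in practice the run starts 10 to 20 minutes late, which is harmless here.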
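GitHub Actions evaluates `schedule` cron expressions in UTC, so the `30 2 * * *` entry used by this workflow fires at 02:30 UTC. The short Go sketch below shows the conversion to local wall-clock time; it assumes the "10:30" in this document refers to UTC+8, and `localRunTime` is a hypothetical helper written for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// localRunTime converts a daily UTC cron time (hour, minute) into the
// wall-clock time at a fixed UTC offset given in whole hours.
func localRunTime(hourUTC, minute, offsetHours int) string {
	// The calendar date is arbitrary; only the time of day matters.
	utc := time.Date(2024, time.January, 1, hourUTC, minute, 0, 0, time.UTC)
	zone := time.FixedZone("local", offsetHours*3600)
	return utc.In(zone).Format("15:04")
}

func main() {
	// cron "30 2 * * *" viewed from UTC+8 (assumed local zone).
	fmt.Println(localRunTime(2, 30, 8)) // prints 10:30
}
```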
# ci.yml
name: MO Observability CI
on:
  pull_request:
    types: [ opened, synchronize, reopened ]
    branches: [ main,'[0-9]+.[0-9]+.[0-9]+' ]
  workflow_dispatch:
concurrency:
  group: ${{ github.event.pull_request.head.repo.full_name}}/${{ github.event.pull_request.head.ref }}/${{ github.workflow }}
  cancel-in-progress: true
jobs:
  ut:
    runs-on: ubuntu-latest
    name: UT Test for MO-Agent and MO-Ruler
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: '3'
          repository: ${{ github.event.pull_request.head.repo.full_name }}
          ref: ${{ github.event.pull_request.head.ref }}
      - name: Set up Go
        uses: ./.github/actions/setup_env
      - name: Set env
        run: |
          echo "endpoint=${{ secrets.S3ENDPOINT }}" >> $GITHUB_ENV
          echo "region=${{ secrets.S3REGION }}" >> $GITHUB_ENV
          echo "apikey=${{ secrets.S3APIKEY }}" >> $GITHUB_ENV
          echo "apisecret=${{ secrets.S3APISECRET }}" >> $GITHUB_ENV
          echo "bucket=${{ secrets.S3BUCKET }}" >> $GITHUB_ENV
      - name: Unit Testing
        run: |
          cd $GITHUB_WORKSPACE && make clean
          make ut
  observability-test:
    runs-on: ubuntu-latest
    name: Observability Test
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: '3'
          repository: ${{ github.event.pull_request.head.repo.full_name }}
          ref: ${{ github.event.pull_request.head.ref }}
      - name: Start docker-compose containers
        run: docker-compose -f "./.github/observability/docker-compose.yaml" up -d --build
      - name: Set up Go
        uses: ./.github/actions/setup_env
      - name: test observability
        run: make observability-test
      - name: promql query test
        run: make promql-test
      # Debug over SSH
      # - name: Setup upterm session
      #   uses: lhotari/action-upterm@v1
      - name: Stop containers
        if: always()
        run: docker-compose -f "./.github/observability/docker-compose.yaml" down
# email_chack.yml
# Default workflow
name: MO Observability Email Check
on:
  schedule:
    - cron: "30 2 * * *"
  workflow_dispatch:
concurrency:
  group: ${{ github.event.pull_request.head.repo.full_name}}/${{ github.event.pull_request.head.ref }}/${{ github.workflow }}
  cancel-in-progress: true
jobs:
  Observability-Test:
    runs-on: ubuntu-latest
    name: observability test
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: '3'
          repository: ${{ github.event.pull_request.head.repo.full_name }}
          ref: ${{ github.event.pull_request.head.ref }}
      - name: Start docker-compose containers
        run: docker-compose -f "./.github/observability/docker-compose.yaml" up -d --build
      - name: Set up Go
        uses: ./.github/actions/setup_env
      - name: test observability
        run: make observability-test
      # Debug over SSH
      # - name: Setup upterm session
      #   uses: lhotari/action-upterm@v1
      - name: Stop containers
        if: always()
        run: docker-compose -f "./.github/observability/docker-compose.yaml" down