
mo ob CI v0.7.0 Documentation

Jackson edited this page Jan 29, 2024 · 2 revisions

CI Background

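The CI flow of the mo observability project is built on GitHub Actions workflows. The .github folder in the project root holds the configuration files and scripts used by CI, and GitHub runs the .yml workflow files defined in the workflows folder.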

.github
├── CODEOWNERS
├── actions                      predefined actions
│   └── setup_env
│       └── action.yml
├── observability                docker-compose file, scripts, and configs used by CI
│   ├── agent-prometheus.yml     prometheus agent config
│   ├── alertmanager.yml         alertmanager config
│   ├── docker-compose.yaml      one-command startup
│   ├── fluent-bit.conf          fluent-bit config
│   ├── mo-agent.yaml            mo-agent config
│   ├── mo-ruler.yaml            mo-ruler config
│   ├── promql-test-config.yml   config for mo-ruler's PromQL query test plugin
│   ├── promql-test-queries.yml  PromQL query test cases for mo-ruler
│   ├── ruler-prometheus.yml     rule-loading config for mo-ruler
│   ├── rules                    alert rules for testing
│   │   └── test_rule.yaml
│   ├── script                   scripts used by the CI flow
│   │   ├── alertmanager_template.yml
│   │   ├── data_check.go
│   │   ├── generate_smtp.go
│   │   └── mock-server.go
│   └── sql                      CREATE TABLE statements that mo runs in advance
│       ├── log.sql
│       ├── metric.sql
│       └── trace.sql
└── workflows                    workflow files
    ├── ci.yml
    ├── docker-image.yml
    └── email_chack.yml

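We define three workflows:

  1. MO Observability CI - ci.yml runs automatically when a PR is opened or updated
  2. MO Observability Email Check - email_chack.yml runs as a scheduled job at 10:30 every morning (UTC+8; the actual start time on GitHub Actions is somewhat delayed)
  3. Build Docker Image - docker-image.yml is currently run manually only

All three can also be triggered manually.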

Deploying the Observability Components in CI

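The first two workflows (ci.yml and email_chack.yml) both depend on the docker-compose stack defined in .github/observability/docker-compose.yaml. Each workflow uses this compose file to start every Observability component needed for testing in one step (some tools, such as grafana, are commented out). The same stack can also be deployed on a local machine for testing: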

cd .github/observability
# deploy
docker-compose up -d
# remove all related containers, volumes, and network definitions
docker-compose down -v

In the CI flow, docker-compose starts the following components:

  • fluent-bit
  • prometheus-agent
  • matrixone
  • mo-agent
  • mo-ruler
  • alertmanager

MO Observability CI

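The CI flow tests the three core functions of the observability system:

  1. Data ingestion: data sources send metric, log, and trace data to mo-agent, which persists it as CSV files; the mo database can then read this data
  2. Alert rule execution: mo-ruler reads and parses the rule files and evaluates them correctly; when a condition is met, it sends an alert to alertmanager, which sends an alert email
  3. PromQL queries: the vast majority of PromQL statements are supported and return the correct time series data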

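Data Ingestion

There are three data sources for ingestion:

  1. metrics: prometheus-agent is the data source; SQL is then used to query each of the four metric types (counter, gauge, summary, histogram)
  2. logs: fluent-bit is the data source; its dummy test plugin is configured to send 50 log records with message='test[12345]', that is, test1 through test5
  3. trace: run example/trace/example.go

In the CI flow, Go scripts perform the verification. Ingestion of metrics and logs is verified by .github/observability/script/data_check.go, which simply runs SQL statements after the agent and mo have started and checks whether the expected data can be queried. If it can, mo-agent has received the data from the data sources and written it to CSV files.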

-- metrics from prometheus agent 
-- counter type metric
SELECT * FROM observability.metrics WHERE name='prometheus_agent_samples_appended_total' LIMIT 10;
-- gauge type metric
SELECT * FROM observability.metrics WHERE name='prometheus_target_metadata_cache_bytes' LIMIT 10;
-- summary type metric
SELECT * FROM observability.metrics WHERE name='prometheus_agent_data_replay_duration_seconds' LIMIT 10;
-- histogram type metric
SELECT * FROM observability.metrics WHERE name='prometheus_http_request_duration_seconds_bucket' LIMIT 10;

-- logs from fluentbit
SELECT * FROM observability.logs WHERE message='test1' LIMIT 10;
SELECT * FROM observability.logs WHERE message='test2' LIMIT 10;
SELECT * FROM observability.logs WHERE message='test3' LIMIT 10;
SELECT * FROM observability.logs WHERE message='test4' LIMIT 10;
SELECT * FROM observability.logs WHERE message='test5' LIMIT 10;
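
The check described above can be sketched as follows. This is a minimal illustration in Python, not the logic of data_check.go itself (which is written in Go and is not shown on this page). The names build_queries, find_missing, and run_query are hypothetical; run_query stands for any callable that executes a SQL string against mo and returns the resulting rows.

```python
from typing import Callable, List, Sequence

# Metric names and log messages copied from the queries above.
METRIC_NAMES = [
    "prometheus_agent_samples_appended_total",          # counter
    "prometheus_target_metadata_cache_bytes",           # gauge
    "prometheus_agent_data_replay_duration_seconds",    # summary
    "prometheus_http_request_duration_seconds_bucket",  # histogram
]
LOG_MESSAGES = [f"test{i}" for i in range(1, 6)]  # test1 .. test5


def build_queries() -> List[str]:
    """Return one SELECT per expected metric name and log message."""
    queries = [
        f"SELECT * FROM observability.metrics WHERE name='{name}' LIMIT 10;"
        for name in METRIC_NAMES
    ]
    queries += [
        f"SELECT * FROM observability.logs WHERE message='{msg}' LIMIT 10;"
        for msg in LOG_MESSAGES
    ]
    return queries


def find_missing(run_query: Callable[[str], Sequence]) -> List[str]:
    """Run every query and return the ones that produced no rows.

    run_query is a hypothetical hook: it takes a SQL string and
    returns the rows fetched from the database.
    """
    return [q for q in build_queries() if len(run_query(q)) == 0]
```

An empty list from find_missing means every expected metric and log message was found, so the check passes; any returned query identifies data that never reached the database.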

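Alert Rule Execution

mo-ruler reads the alert rules defined in .github/observability/rules. Verification succeeds if an alert email arrives after startup and the alertmanager web address in that email shows an alert with the same name as the rule below.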

groups:
- name: test-ci
  rules:
  - alert: JustATest
    expr: up{instance="localhost:9090", job="prometheus"} == 1
    for: 1m
    labels:
      severity: info
    annotations:
      summary: Just a test

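PromQL Queries

The CI flow uses the official Prometheus PromQL compliance tester (from prometheus/compliance) for testing. In our tests, MO-Ruler passes every query test except those whose PromQL statements contain regular expressions.

# ci.yml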
- name: promql query test
  run: make promql-test

# Makefile
.PHONY: promql-test
promql-test:
    git clone https://github.com/prometheus/compliance.git
    cd ./compliance/promql && go get -u golang.org/x/sys && go build ./cmd/promql-compliance-tester
    ./compliance/promql/promql-compliance-tester -config-file=$(PROMQL_TEST_QUERIES) -config-file=$(PROMQL_TEST_CONFIG)

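MO Observability Email Check

The goal is to check that alertmanager's email sending works and that emails are actually delivered. This is effectively a scheduled run of step 2 of MO Observability CI: alert rule execution.

The job is scheduled for 10:30 every day (UTC+8; the cron expression 30 2 * * * in the workflow is evaluated in UTC). According to the official documentation, GitHub Actions does not guarantee exact timing. In practice the run starts 10 to 20 minutes late, which is acceptable for this check.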
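
GitHub Actions evaluates schedule cron expressions in UTC, which is why the workflow specifies 02:30 rather than 10:30. The conversion can be checked with a short sketch; the function name cron_utc_to_local is hypothetical, and the date used inside it is arbitrary because only the time of day matters.

```python
from datetime import datetime, timedelta, timezone


def cron_utc_to_local(minute: int, hour: int, utc_offset_hours: int) -> str:
    """Convert a cron time of day given in UTC to HH:MM at a fixed UTC offset."""
    utc_time = datetime(2024, 1, 29, hour, minute, tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return utc_time.astimezone(local_zone).strftime("%H:%M")


# cron "30 2 * * *" -> minute 30, hour 2 (UTC), viewed at UTC+8
print(cron_utc_to_local(30, 2, 8))  # prints 10:30
```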

Workflow File Preview

ci.yml

name: MO Observability CI
on: 
  pull_request:
    types: [ opened, synchronize, reopened ]
    branches: [ main,'[0-9]+.[0-9]+.[0-9]+' ]
  workflow_dispatch:

concurrency: 
  group: ${{ github.event.pull_request.head.repo.full_name}}/${{ github.event.pull_request.head.ref }}/${{ github.workflow }}
  cancel-in-progress: true

jobs: 
  ut:
    runs-on: ubuntu-latest
    name: UT Test for MO-Agent and MO-Ruler
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: '3'
          repository: ${{ github.event.pull_request.head.repo.full_name }}
          ref: ${{ github.event.pull_request.head.ref }}

      - name: Set up Go
        uses: ./.github/actions/setup_env

      - name: Set env
        run: |
          echo "endpoint=${{ secrets.S3ENDPOINT }}" >> $GITHUB_ENV
          echo "region=${{ secrets.S3REGION }}" >> $GITHUB_ENV
          echo "apikey=${{ secrets.S3APIKEY }}" >> $GITHUB_ENV
          echo "apisecret=${{ secrets.S3APISECRET }}" >> $GITHUB_ENV
          echo "bucket=${{ secrets.S3BUCKET }}" >> $GITHUB_ENV

      - name: Unit Testing
        run: |
          cd $GITHUB_WORKSPACE && make clean 
          make ut
  
  observability-test:
    runs-on: ubuntu-latest
    name: Observability Test
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: '3'
          repository: ${{ github.event.pull_request.head.repo.full_name }}
          ref: ${{ github.event.pull_request.head.ref }}

      - name: Start docker-compose containers
        run: docker-compose -f "./.github/observability/docker-compose.yaml" up -d --build
  
      - name: Set up Go
        uses: ./.github/actions/setup_env
  
      - name: test observability
        run: make observability-test

      - name: promql query test
        run: make promql-test
    
      # SSH debugging
      # - name: Setup upterm session
      #   uses: lhotari/action-upterm@v1
  
      - name: Stop containers
        if: always()
        run: docker-compose -f "./.github/observability/docker-compose.yaml" down
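
The concurrency group in this workflow decides which runs cancel each other: with cancel-in-progress set to true, a new run cancels any run still in progress in the same group. The sketch below mimics how the group string is assembled, to show which runs end up sharing a group. The function name and the repository and branch values are hypothetical. It relies on the GitHub Actions rule that a missing context property evaluates to an empty string, which is the case for github.event.pull_request on manual (workflow_dispatch) and scheduled runs.

```python
from typing import Optional


def concurrency_group(
    workflow: str,
    head_repo: Optional[str] = None,
    head_ref: Optional[str] = None,
) -> str:
    """Mimic the group expression; a missing context value renders as ''."""
    return f"{head_repo or ''}/{head_ref or ''}/{workflow}"


# A PR run: one group per fork and branch, so a new push cancels the older run.
print(concurrency_group("MO Observability CI", "someuser/fork", "fix-bug"))
# prints someuser/fork/fix-bug/MO Observability CI

# A manual run: no pull_request context, so all such runs share one group.
print(concurrency_group("MO Observability CI"))
# prints //MO Observability CI
```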

email_chack.yml

# Default workflow
name: MO Observability Email Check


on:
  schedule:
    - cron: "30 2 * * *"
  workflow_dispatch:

concurrency: 
  group: ${{ github.event.pull_request.head.repo.full_name}}/${{ github.event.pull_request.head.ref }}/${{ github.workflow }}
  cancel-in-progress: true

jobs: 
  Observability-Test:
    runs-on: ubuntu-latest
    name: observability test
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: '3'
          repository: ${{ github.event.pull_request.head.repo.full_name }}
          ref: ${{ github.event.pull_request.head.ref }}

      - name: Start docker-compose containers
        run: docker-compose -f "./.github/observability/docker-compose.yaml" up -d --build
  
      - name: Set up Go
        uses: ./.github/actions/setup_env
  
      - name: test observability
        run: make observability-test
  
      # SSH debugging
      # - name: Setup upterm session
      #   uses: lhotari/action-upterm@v1
  
      - name: Stop containers
        if: always()
        run: docker-compose -f "./.github/observability/docker-compose.yaml" down