rsyslog/REFACTOR_SUMMARY.md
dvirlabs db28c9da82
Migrate from pushgateway to infinity
2026-04-21 12:41:09 +03:00


Implementation Summary: Pushgateway → gitops-status-server

Status: Complete

This document summarizes the refactoring of the rsyslog GitOps monitoring flow to use a centralized gitops-status-server instead of Pushgateway.


What Was Replaced

Old Architecture (Pushgateway-based)

Drift-check runs
    ↓
Exit code: 0 (synced) or 1 (drift)
    ↓
Send metric to Pushgateway
    ↓
Prometheus scrapes Pushgateway
    ↓
gitops_sync_status{repo="rsyslog",server="rsyslog-lab"} = 0 or 1
    ↓
Grafana queries Prometheus
    ↓
Dashboard shows only: SYNCED or OUT_OF_SYNC

Limitations:

  • Only 0/1 metric (no file-level details)
  • Requires both Pushgateway and Prometheus infrastructure
  • Cannot show which files changed

New Architecture (gitops-status-server)

Drift-check runs + outputs DRIFTED_FILES=...
    ↓
update-gitops-status.sh script:
  1. Parse changed files
  2. Generate JSON
  3. POST to gitops-status-server
    ↓
gitops-status-server
    ↓
Serves /status.json with rich metadata
    ↓
Grafana Infinity datasource reads /status.json
    ↓
Dashboard shows:
  - Sync status
  - Drift count
  - List of changed files
  - Last check timestamp

Advantages:

  • ✓ Rich metadata (file-level details)
  • ✓ No Pushgateway/Prometheus needed for this use case
  • ✓ Centralized gitops-status-server
  • ✓ Easier to audit (JSON snapshot)
  • ✓ Better for multi-server/multi-repo

Files Changed

1. .woodpecker.yml (MAJOR UPDATE)

Before (Pushgateway):

update-sync-metric:
  commands:
    - printf 'gitops_sync_status{repo="rsyslog",server="rsyslog-lab"} %s\n' "$STATUS" | \
      curl ... --data-binary @- "$PUSHGATEWAY_URL/metrics/job/gitops_rsyslog/..."

After (gitops-status-server):

update-gitops-status:
  commands:
    - chmod +x update-gitops-status.sh
    - ./update-gitops-status.sh
  environment:
    GITOPS_STATUS_SERVER_URL: http://gitops-status-server.observability-stack.svc.cluster.local:80
    REPO_NAME: rsyslog
    SERVER_NAME: rsyslog-lab

Changes:

  • Removed PUSHGATEWAY_URL environment variable
  • Removed metric push command
  • Added script execution
  • Added GITOPS_STATUS_SERVER_URL configuration
  • Both update-gitops-status and gitops_sync_check steps now use the script

2. ansible/playbooks/drift-check.yml (ADDED OUTPUT)

Before:

- name: Fail if drift detected
  ansible.builtin.fail:
    msg: "Configuration drift detected..."
  when: drift_detected

After (ADDED before the fail task):

# New: Build structured list of changed files
- name: Initialize list of drifted files
  ansible.builtin.set_fact:
    drifted_files: []

- name: Add main config to drifted files if changed
  ansible.builtin.set_fact:
    drifted_files: "{{ drifted_files + ['/etc/rsyslog.conf'] }}"
  when: main_config_check.changed

# ... (more file collection tasks)

# New: Output structured markers for parsing
- name: Output structured list of drifted files
  ansible.builtin.debug:
    msg: "DRIFTED_FILES={{ drifted_files | join(',') }}"
  when: drift_detected

- name: Output sync status marker
  ansible.builtin.debug:
    msg: "SYNC_STATUS=OUT_OF_SYNC"
  when: drift_detected

Changes:

  • Builds list of drifted files in drifted_files fact
  • Outputs DRIFTED_FILES=file1,file2,file3 for script parsing
  • Outputs SYNC_STATUS=SYNCED or SYNC_STATUS=OUT_OF_SYNC markers
  • Original drift detection logic unchanged
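
The marker lines above are meant to be parsed out of the captured playbook output. A minimal sketch of that parsing step follows, assuming Ansible's default stdout format, where the debug message appears inside double quotes. The helper name extract_marker and the sample output are hypothetical and not taken from update-gitops-status.sh.

```shell
#!/bin/sh
# Sketch: pull the value of a NAME=value marker out of captured playbook output.
# extract_marker is a hypothetical helper name used for illustration only.
extract_marker() {
  marker_name=$1
  output_file=$2
  # The debug message is assumed to sit inside double quotes, so the value
  # runs up to the next quote. If the marker appears twice, the last one wins.
  grep -o "${marker_name}=[^\"]*" "$output_file" | tail -n 1 | cut -d= -f2-
}

# Sample output shaped like Ansible's default debug format (illustrative only).
sample_output=$(mktemp)
cat > "$sample_output" <<'EOF'
TASK [Output structured list of drifted files] ***
ok: [rsyslog-lab] => {
    "msg": "DRIFTED_FILES=/etc/rsyslog.conf,/etc/rsyslog.d/30-lab.conf"
}
TASK [Output sync status marker] ***
ok: [rsyslog-lab] => {
    "msg": "SYNC_STATUS=OUT_OF_SYNC"
}
EOF

drifted_files=$(extract_marker DRIFTED_FILES "$sample_output")
sync_status=$(extract_marker SYNC_STATUS "$sample_output")
echo "files=$drifted_files status=$sync_status"
rm -f "$sample_output"
```

If a different stdout callback is configured (for example a YAML one), the quoting differs and the pattern would likely need adjusting.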

3. update-gitops-status.sh (CORE SCRIPT)

New file created: Orchestrates the entire flow

Key functionality:

  1. Runs drift-check.yml playbook
  2. Captures output to temp file
  3. Parses DRIFTED_FILES=... and SYNC_STATUS=... markers
  4. Extracts changed file names
  5. Converts /etc/rsyslog.conf → rsyslog.conf (relative paths)
  6. Generates JSON with metadata
  7. POSTs JSON to gitops-status-server API

4-step process:

Step 1/4: Running drift-check playbook...
Step 2/4: Analyzing drift detection results...
Step 3/4: Building JSON payload...
Step 4/4: Sending status to gitops-status-server...
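
The path conversion in the list above could look roughly like the following. This is a sketch, not the script's actual code: the helper name to_relative_paths is hypothetical, and stripping a leading /etc/ is an assumption inferred from the examples in this document (/etc/rsyslog.conf becomes rsyslog.conf).

```shell
#!/bin/sh
# Sketch: turn a comma-separated list of absolute paths into one relative
# path per line. to_relative_paths is a hypothetical helper name.
to_relative_paths() {
  # Split on commas, drop empty entries, then strip the assumed /etc/ prefix.
  printf '%s\n' "$1" | tr ',' '\n' | sed -e '/^$/d' -e 's|^/etc/||'
}

drifted="/etc/rsyslog.conf,/etc/rsyslog.d/30-lab.conf"
relative=$(to_relative_paths "$drifted")

# Count non-empty lines; "|| true" keeps an empty list from failing under set -e.
drift_count=$(printf '%s\n' "$relative" | grep -c . || true)

printf '%s\n' "$relative"
echo "drift_count=$drift_count"
```

Producing one name per line keeps the later counting and JSON steps simple, at the cost of not supporting file names that contain commas or newlines.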

Generated JSON Format

Synced State:

{
  "repo": "rsyslog",
  "server": "rsyslog-lab",
  "sync_status": "SYNCED",
  "drift_count": 0,
  "files": [],
  "last_check": "2026-04-21T10:30:00Z"
}

Out of Sync State:

{
  "repo": "rsyslog",
  "server": "rsyslog-lab",
  "sync_status": "OUT_OF_SYNC",
  "drift_count": 2,
  "files": [
    { "name": "rsyslog.conf" },
    { "name": "rsyslog.d/30-lab.conf" }
  ],
  "last_check": "2026-04-21T10:30:00Z"
}
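
For illustration, a payload in this format can be assembled in plain shell. The sketch below uses a hypothetical helper, build_status_json, and assumes file names contain no double quotes or backslashes, since it performs no JSON escaping. It emits compact JSON rather than the pretty-printed form shown above.

```shell
#!/bin/sh
# Sketch: build the status payload from a newline-separated list of file names.
# build_status_json is a hypothetical helper; no JSON escaping is performed.
build_status_json() {
  repo=$1
  server=$2
  status=$3
  files=$4        # one relative path per line, may be empty
  checked_at=$5   # ISO 8601 UTC timestamp

  entries=""
  count=0
  while IFS= read -r file_name; do
    [ -n "$file_name" ] || continue
    # Append a comma only when there is already an entry in the list.
    entries="${entries:+$entries,}{\"name\":\"$file_name\"}"
    count=$((count + 1))
  done <<EOF
$files
EOF

  printf '{"repo":"%s","server":"%s","sync_status":"%s","drift_count":%d,"files":[%s],"last_check":"%s"}\n' \
    "$repo" "$server" "$status" "$count" "$entries" "$checked_at"
}

now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
files=$(printf 'rsyslog.conf\nrsyslog.d/30-lab.conf')
build_status_json rsyslog rsyslog-lab OUT_OF_SYNC "$files" "$now"
```

Deriving drift_count from the same loop that builds the files array keeps the two fields from disagreeing. A JSON-aware tool would be a safer choice if file names may contain special characters.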

Data Flow Example

Scenario: Manual edit on server

  1. Manual change: Someone edits /etc/rsyslog.conf directly on server
  2. Cron trigger: Scheduled cron job runs (every 2 minutes)
  3. Woodpecker step: gitops_sync_check executes update-gitops-status.sh
  4. Drift detection: drift-check.yml runs and detects change
  5. Output parsing: Script extracts:
    • DRIFTED_FILES=/etc/rsyslog.conf
    • SYNC_STATUS=OUT_OF_SYNC
  6. JSON generation:
    {
      "repo": "rsyslog",
      "server": "rsyslog-lab",
      "sync_status": "OUT_OF_SYNC",
      "drift_count": 1,
      "files": [{ "name": "rsyslog.conf" }],
      "last_check": "2026-04-21T10:32:00Z"
    }
    
  7. API POST: Script POSTs JSON to:
    • URL: http://gitops-status-server.observability-stack.svc.cluster.local:80/api/status
    • Method: POST
    • Content-Type: application/json
  8. Server update: gitops-status-server receives JSON and updates /status.json
  9. Grafana update: Infinity datasource refreshes and displays new status
  10. Result: Dashboard shows OUT_OF_SYNC with rsyslog.conf listed

Time to detection: ≤ 2 minutes (one cron interval), plus the pipeline's own run time
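
The POST in step 7 can be sketched with curl as follows. The helper name post_status and the CURL_BIN override are hypothetical, added here only so the call can be dry-run without a reachable server. The /api/status path and the Content-Type header come from the steps above.

```shell
#!/bin/sh
# Sketch: send a JSON payload to the status server.
# post_status and CURL_BIN are hypothetical names used for illustration.
post_status() {
  server_url=$1
  payload=$2
  # --fail makes curl exit non-zero on HTTP 4xx/5xx responses, so the
  # pipeline step fails instead of silently dropping the update.
  "${CURL_BIN:-curl}" --silent --show-error --fail --max-time 10 \
    --request POST \
    --header 'Content-Type: application/json' \
    --data-binary "$payload" \
    "${server_url%/}/api/status"
}

# Dry run: substitute echo for curl to print the arguments instead of sending.
CURL_BIN=echo
post_status "http://gitops-status-server.observability-stack.svc.cluster.local:80" \
  '{"repo":"rsyslog","server":"rsyslog-lab","sync_status":"SYNCED","drift_count":0,"files":[]}'
```

Leaving CURL_BIN unset performs the real request. The 10-second timeout is an arbitrary choice for the sketch.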


Integration Points

Woodpecker Events Handled

  1. Pull Request:

    • syntax-check → validate (no drift check)
    • No gitops-status update
  2. Push to Master:

    • syntax-check → validate → deploy → update-gitops-status
    • After deployment, immediately verify sync and update status
  3. Scheduled Cron:

    • gitops_sync_check (every 2 minutes by default)
    • Continuous drift monitoring

Configuration

Required Environment Variables

GITOPS_STATUS_SERVER_URL: http://gitops-status-server.observability-stack.svc.cluster.local:80
REPO_NAME: rsyslog
SERVER_NAME: rsyslog-lab
SSH_PRIVATE_KEY:
  from_secret: SSH_PRIVATE_KEY
ANSIBLE_CONFIG: ansible.cfg

Cron Job Setup (Woodpecker UI)

  • Name: gitops_sync_check
  • Branch: master
  • Schedule: */2 * * * *

Backward Compatibility

  • Existing deploy logic: Unchanged (apply.yml still used)
  • Existing drift detection: Enhanced (now outputs file names)
  • PR validation: Unchanged (syntax-check, validate still used)
  • Server files: No changes needed

Security

  • ✓ SSH credentials in Woodpecker secrets (not exposed)
  • ✓ JSON contains only metadata (file names, counts, timestamps)
  • ✓ No actual rsyslog config contents exposed
  • ✓ Internal Kubernetes communication (ClusterIP)
  • ✓ No Pushgateway exposure

Testing Checklist

  • Cron job is created in Woodpecker
  • Cron job runs on schedule (every 2 minutes)
  • update-gitops-status.sh script is executable
  • Script runs successfully (HTTP 200 response)
  • gitops-status-server receives JSON POSTs
  • JSON format matches expected schema
  • Grafana dashboard displays sync status
  • Changed files appear in Grafana panel
  • Manual file edit on server is detected
  • Post-deployment status updates correctly
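
The "JSON format matches expected schema" item can be spot-checked from a shell. The sketch below only verifies that the expected top-level keys are present; it is not a full JSON validator. The helper name check_status_schema is hypothetical, and the commented curl line assumes /status.json is served from the same base URL as GITOPS_STATUS_SERVER_URL.

```shell
#!/bin/sh
# Sketch: check that a status document on stdin contains the expected keys.
# check_status_schema is a hypothetical helper; it tests key presence only.
check_status_schema() {
  document=$(cat)
  for key in repo server sync_status drift_count files last_check; do
    if ! printf '%s' "$document" | grep -q "\"$key\"[[:space:]]*:"; then
      echo "missing key: $key" >&2
      return 1
    fi
  done
  echo "schema ok"
}

# Offline demo using the synced-state example from this document.
check_status_schema <<'EOF'
{
  "repo": "rsyslog",
  "server": "rsyslog-lab",
  "sync_status": "SYNCED",
  "drift_count": 0,
  "files": [],
  "last_check": "2026-04-21T10:30:00Z"
}
EOF

# Against a live server (not run here; URL layout is an assumption):
# curl --silent --fail "$GITOPS_STATUS_SERVER_URL/status.json" | check_status_schema
```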

Migration Steps

  1. Commit and push changes:

    git add .woodpecker.yml ansible/playbooks/drift-check.yml update-gitops-status.sh
    git commit -m "refactor: replace pushgateway with gitops-status-server"
    git push
    
  2. Verify pipeline runs successfully

    • Check Woodpecker logs for new steps
  3. Create Woodpecker cron job

    • Name: gitops_sync_check
    • Schedule: */2 * * * *
  4. Test cron execution

    • Wait for cron trigger (within 2 minutes)
    • Verify JSON is sent to gitops-status-server
  5. Verify Grafana dashboard

    • Confirm Infinity datasource can read gitops-status-server
    • Dashboard shows sync status and changed files
  6. Monitor for 24 hours

    • Verify cron runs consistently
    • Check for any HTTP errors
    • Confirm drift detection works
  7. Decommission Pushgateway (when confident)

    • Stop sending metrics to Pushgateway
    • Remove Pushgateway from infrastructure

Rollback Plan

If issues arise:

  1. Revert Woodpecker changes:

    git revert <commit-hash>
    git push
    
  2. Remove cron job:

    • Delete gitops_sync_check from Woodpecker UI
  3. Restore Pushgateway metric push (if keeping Prometheus monitoring)


Key Improvements

| Metric | Old | New |
|---|---|---|
| Data richness | 0/1 only | JSON with file names |
| Setup complexity | Pushgateway + Prometheus | Single service call |
| Audit trail | Basic | Structured snapshots |
| File-level visibility | None | Complete list |
| Update frequency | After deployment | Every 2 minutes + post-deploy |
| Infrastructure | 2+ services | 1 service (gitops-status-server) |

Documentation Files

  1. GITOPS_STATUS_SERVER_INTEGRATION.md: Comprehensive documentation
  2. QUICK_REFERENCE.md: Quick start and troubleshooting
  3. IMPLEMENTATION_SUMMARY.md: This file

Support

For issues, consult:

  1. .woodpecker.yml comments
  2. update-gitops-status.sh comments
  3. drift-check.yml comments
  4. Full documentation in GITOPS_STATUS_SERVER_INTEGRATION.md
  5. Woodpecker pipeline logs
  6. gitops-status-server application logs