rsyslog/REFACTOR_SUMMARY.md
dvirlabs db28c9da82
Migrate from pushgateway to infinity
2026-04-21 12:41:09 +03:00

# Implementation Summary: Pushgateway → gitops-status-server
## Status: ✅ Complete
This document summarizes the refactoring of the rsyslog GitOps monitoring flow to use a centralized gitops-status-server instead of Pushgateway.

---
## What Was Replaced
### Old Architecture (Pushgateway-based)
```
Drift-check runs
    ↓
Exit code: 0 (synced) or 1 (drift)
    ↓
Send metric to Pushgateway
    ↓
Prometheus scrapes Pushgateway
    ↓
gitops_sync_status{repo="rsyslog",server="rsyslog-lab"} = 0 or 1
    ↓
Grafana queries Prometheus
    ↓
Dashboard shows only: SYNCED or OUT_OF_SYNC
```
**Limitations:**
- Only 0/1 metric (no file-level details)
- Requires Pushgateway, Prometheus infrastructure
- Cannot show which files changed
---
### New Architecture (gitops-status-server)
```
Drift-check runs + outputs DRIFTED_FILES=...
    ↓
update-gitops-status.sh script:
    1. Parse changed files
    2. Generate JSON
    3. POST to gitops-status-server
    ↓
gitops-status-server
    Serves /status.json with rich metadata
    ↓
Grafana Infinity datasource reads /status.json
    ↓
Dashboard shows:
    - Sync status
    - Drift count
    - List of changed files
    - Last check timestamp
```
**Advantages:**
- ✓ Rich metadata (file-level details)
- ✓ No Pushgateway/Prometheus for this use case
- ✓ Centralized gitops-status-server
- ✓ Easier to audit (JSON snapshot)
- ✓ Better for multi-server/multi-repo
---
## Files Changed
### 1. `.woodpecker.yml` (MAJOR UPDATE)
#### Before (Pushgateway):
```yaml
update-sync-metric:
  commands:
    - printf 'gitops_sync_status{repo="rsyslog",server="rsyslog-lab"} %s\n' "$STATUS" | \
      curl ... --data-binary @- "$PUSHGATEWAY_URL/metrics/job/gitops_rsyslog/..."
```
#### After (gitops-status-server):
```yaml
update-gitops-status:
  commands:
    - chmod +x update-gitops-status.sh
    - ./update-gitops-status.sh
  environment:
    GITOPS_STATUS_SERVER_URL: http://gitops-status-server.observability-stack.svc.cluster.local:80
    REPO_NAME: rsyslog
    SERVER_NAME: rsyslog-lab
```
**Changes:**
- Removed `PUSHGATEWAY_URL` environment variable
- Removed metric push command
- Added script execution
- Added `GITOPS_STATUS_SERVER_URL` configuration
- Both `update-gitops-status` and `gitops_sync_check` steps now use the script
---
### 2. `ansible/playbooks/drift-check.yml` (ADDED OUTPUT)
#### Before:
```yaml
- name: Fail if drift detected
  ansible.builtin.fail:
    msg: "Configuration drift detected..."
  when: drift_detected
```
#### After (ADDED before the fail task):
```yaml
# New: Build structured list of changed files
- name: Initialize list of drifted files
  ansible.builtin.set_fact:
    drifted_files: []

- name: Add main config to drifted files if changed
  ansible.builtin.set_fact:
    drifted_files: "{{ drifted_files + ['/etc/rsyslog.conf'] }}"
  when: main_config_check.changed

# ... (more file collection tasks)

# New: Output structured markers for parsing
- name: Output structured list of drifted files
  ansible.builtin.debug:
    msg: "DRIFTED_FILES={{ drifted_files | join(',') }}"
  when: drift_detected

- name: Output sync status marker
  ansible.builtin.debug:
    msg: "SYNC_STATUS=OUT_OF_SYNC"
  when: drift_detected
```
**Changes:**
- Builds list of drifted files in `drifted_files` fact
- Outputs `DRIFTED_FILES=file1,file2,file3` for script parsing
- Outputs `SYNC_STATUS=SYNCED` or `SYNC_STATUS=OUT_OF_SYNC` markers
- Original drift detection logic unchanged
---
### 3. `update-gitops-status.sh` (CORE SCRIPT)
**New file created:** Orchestrates the entire flow

**Key functionality:**
1. Runs `drift-check.yml` playbook
2. Captures output to temp file
3. Parses `DRIFTED_FILES=...` and `SYNC_STATUS=...` markers
4. Extracts changed file names
5. Converts `/etc/rsyslog.conf` → `rsyslog.conf` (relative paths)
6. Generates JSON with metadata
7. POSTs JSON to gitops-status-server API

**4-step process:**
```
Step 1/4: Running drift-check playbook...
Step 2/4: Analyzing drift detection results...
Step 3/4: Building JSON payload...
Step 4/4: Sending status to gitops-status-server...
```
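The marker parsing and path conversion described above could look roughly like the sketch below. It is not the actual contents of `update-gitops-status.sh`: the function names `extract_marker` and `normalize_paths` are hypothetical, and stripping the `/etc/` prefix is inferred from the examples in this document. It assumes the markers appear inside Ansible `debug` output (`"msg": "NAME=value"`) and that file names contain no spaces or quotes.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not the real update-gitops-status.sh.

# extract_marker NAME
# Reads playbook output on stdin and prints the value of the last
# NAME=value marker. The value ends at the first quote or space, because
# Ansible debug output wraps the message in double quotes.
extract_marker() {
    sed -n "s/.*$1=\([^\" ]*\).*/\1/p" | tail -n 1
}

# normalize_paths CSV
# Splits a comma-separated list of absolute paths and prints one relative
# path per line with the leading /etc/ removed (assumed convention).
normalize_paths() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -e '/^$/d' -e 's|^/etc/||'
}

# Example use (output_file is a hypothetical temp file holding the
# captured playbook output):
#   status=$(extract_marker SYNC_STATUS < "$output_file")
#   files=$(extract_marker DRIFTED_FILES < "$output_file")
#   normalize_paths "$files"
```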
---
## Generated JSON Format
### Synced State:
```json
{
  "repo": "rsyslog",
  "server": "rsyslog-lab",
  "sync_status": "SYNCED",
  "drift_count": 0,
  "files": [],
  "last_check": "2026-04-21T10:30:00Z"
}
```
### Out of Sync State:
```json
{
  "repo": "rsyslog",
  "server": "rsyslog-lab",
  "sync_status": "OUT_OF_SYNC",
  "drift_count": 2,
  "files": [
    { "name": "rsyslog.conf" },
    { "name": "rsyslog.d/30-lab.conf" }
  ],
  "last_check": "2026-04-21T10:30:00Z"
}
```
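As an illustration of how a payload in this shape might be assembled in plain shell, here is a minimal sketch. The function name `build_payload` and its argument order are hypothetical. It emits the same fields on a single line rather than pretty-printed, and it assumes file names contain no quotes, backslashes, commas, or glob characters; a JSON-aware tool such as `jq` would be a safer choice where it is available. A timestamp in the format shown above can be produced with `date -u +%Y-%m-%dT%H:%M:%SZ`.

```shell
#!/bin/sh
# Sketch only: hypothetical helper, not the real update-gitops-status.sh.

# build_payload REPO SERVER STATUS FILES_CSV TIMESTAMP
# Prints a one-line JSON document with the fields documented above.
# drift_count is derived from the number of non-empty entries in FILES_CSV.
build_payload() {
    repo=$1 server=$2 status=$3 files_csv=$4 ts=$5
    files_json=""
    count=0
    old_ifs=$IFS
    IFS=','
    for f in $files_csv; do
        [ -n "$f" ] || continue
        # Append ", " only when an entry has already been added.
        files_json="${files_json}${files_json:+, }{ \"name\": \"$f\" }"
        count=$((count + 1))
    done
    IFS=$old_ifs
    printf '{"repo": "%s", "server": "%s", "sync_status": "%s", "drift_count": %d, "files": [%s], "last_check": "%s"}\n' \
        "$repo" "$server" "$status" "$count" "$files_json" "$ts"
}
```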
---
## Data Flow Example
### Scenario: Manual edit on server
1. **Manual change:** Someone edits `/etc/rsyslog.conf` directly on server
2. **Cron trigger:** Scheduled cron job runs (every 2 minutes)
3. **Woodpecker step:** `gitops_sync_check` executes `update-gitops-status.sh`
4. **Drift detection:** `drift-check.yml` runs and detects change
5. **Output parsing:** Script extracts:
   - `DRIFTED_FILES=/etc/rsyslog.conf`
   - `SYNC_STATUS=OUT_OF_SYNC`
6. **JSON generation:**
   ```json
   {
     "repo": "rsyslog",
     "server": "rsyslog-lab",
     "sync_status": "OUT_OF_SYNC",
     "drift_count": 1,
     "files": [{ "name": "rsyslog.conf" }],
     "last_check": "2026-04-21T10:32:00Z"
   }
   ```
7. **API POST:** Script POSTs JSON to:
   - URL: `http://gitops-status-server.observability-stack.svc.cluster.local:80/api/status`
   - Method: `POST`
   - Content-Type: `application/json`
8. **Server update:** gitops-status-server receives JSON and updates `/status.json`
9. **Grafana update:** Infinity datasource refreshes and displays new status
10. **Result:** Dashboard shows OUT_OF_SYNC with `rsyslog.conf` listed
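
The API POST in step 7 could be sketched with `curl` as follows. The `/api/status` path, method, and content type come from the list above; the helper names `post_status` and `is_success`, the 10-second timeout, and the retry count are illustrative assumptions. This sketch treats any 2xx response as success, which may be looser than what the real script checks.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not the real update-gitops-status.sh.

# is_success HTTP_CODE
# Succeeds when the code is in the 2xx range.
is_success() {
    case $1 in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# post_status BASE_URL JSON_PAYLOAD
# POSTs the payload to BASE_URL/api/status and succeeds only on a 2xx
# response. Timeout and retry values are assumptions for illustration.
post_status() {
    http_code=$(curl --silent --show-error --output /dev/null \
        --write-out '%{http_code}' \
        --max-time 10 --retry 3 \
        --request POST \
        --header 'Content-Type: application/json' \
        --data "$2" \
        "$1/api/status") || return 1
    is_success "$http_code"
}

# Example use:
#   post_status "$GITOPS_STATUS_SERVER_URL" "$payload" || exit 1
```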
**Time to detection:** ≤ 2 minutes (one cron interval), plus pipeline run time and the Grafana refresh interval

---
## Integration Points
### Woodpecker Events Handled
1. **Pull Request:**
   - syntax-check → validate (no drift check)
   - No gitops-status update
2. **Push to Master:**
   - syntax-check → validate → deploy → **update-gitops-status**
   - After deployment, immediately verify sync and update status
3. **Scheduled Cron:**
   - **gitops_sync_check** (every 2 minutes by default)
   - Continuous drift monitoring
---
## Configuration
### Required Environment Variables
```yaml
GITOPS_STATUS_SERVER_URL: http://gitops-status-server.observability-stack.svc.cluster.local:80
REPO_NAME: rsyslog
SERVER_NAME: rsyslog-lab
SSH_PRIVATE_KEY:
  from_secret: SSH_PRIVATE_KEY
ANSIBLE_CONFIG: ansible.cfg
```
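A script that depends on these variables may want to fail fast when one is missing. The following is a minimal sketch of such a guard; `require_env` is a hypothetical helper and is not known to exist in `update-gitops-status.sh`.

```shell
#!/bin/sh
# Sketch only: hypothetical helper for validating the environment.

# require_env NAME...
# Reports every variable that is unset or empty on stderr and returns 1
# if any is missing, 0 otherwise.
require_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup: expands to the value of the variable called $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use at the top of a script:
#   require_env GITOPS_STATUS_SERVER_URL REPO_NAME SERVER_NAME \
#       SSH_PRIVATE_KEY ANSIBLE_CONFIG || exit 1
```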
### Cron Job Setup (Woodpecker UI)
- Name: `gitops_sync_check`
- Branch: `master`
- Schedule: `*/2 * * * *`
---
## Backward Compatibility
- ✓ **Existing deploy logic:** Unchanged (apply.yml still used)
- ✓ **Existing drift detection:** Enhanced (now outputs file names)
- ✓ **PR validation:** Unchanged (syntax-check, validate still used)
- ✓ **Server files:** No changes needed
---
## Security
- ✓ SSH credentials in Woodpecker secrets (not exposed)
- ✓ JSON contains only metadata (file names, counts, timestamps)
- ✓ No actual rsyslog config contents exposed
- ✓ Internal Kubernetes communication (ClusterIP)
- ✓ No Pushgateway exposure
---
## Testing Checklist
- [ ] Cron job is created in Woodpecker
- [ ] Cron job runs on schedule (every 2 minutes)
- [ ] `update-gitops-status.sh` script is executable
- [ ] Script runs successfully (HTTP 200 response)
- [ ] gitops-status-server receives JSON POSTs
- [ ] JSON format matches expected schema
- [ ] Grafana dashboard displays sync status
- [ ] Changed files appear in Grafana panel
- [ ] Manual file edit on server is detected
- [ ] Post-deployment status updates correctly
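
For the "JSON format matches expected schema" item, a rough presence check on the documented keys can be scripted. The sketch below uses a hypothetical helper, `has_required_keys`, and only confirms that each key name appears followed by a colon; it does not validate types or nesting, so it is a smoke test rather than schema validation. The input could come from `curl --silent "$GITOPS_STATUS_SERVER_URL/status.json"`, assuming `/status.json` is reachable from where the check runs.

```shell
#!/bin/sh
# Sketch only: hypothetical smoke test, not a full JSON schema validator.

# has_required_keys JSON
# Succeeds when every documented top-level key name appears in the text.
has_required_keys() {
    for key in repo server sync_status drift_count files last_check; do
        printf '%s' "$1" | grep -q "\"$key\"[[:space:]]*:" || return 1
    done
}

# Example use:
#   body=$(curl --silent "$GITOPS_STATUS_SERVER_URL/status.json")
#   has_required_keys "$body" || echo "status.json is missing a field" >&2
```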
---
## Migration Steps
1. **Commit and push changes:**
   ```bash
   git add .woodpecker.yml ansible/playbooks/drift-check.yml update-gitops-status.sh
   git commit -m "refactor: replace pushgateway with gitops-status-server"
   git push
   ```
2. **Verify pipeline runs successfully**
   - Check Woodpecker logs for new steps
3. **Create Woodpecker cron job**
   - Name: `gitops_sync_check`
   - Schedule: `*/2 * * * *`
4. **Test cron execution**
   - Wait for cron trigger (within 2 minutes)
   - Verify JSON is sent to gitops-status-server
5. **Verify Grafana dashboard**
   - Confirm Infinity datasource can read gitops-status-server
   - Dashboard shows sync status and changed files
6. **Monitor for 24 hours**
   - Verify cron runs consistently
   - Check for any HTTP errors
   - Confirm drift detection works
7. **Decommission Pushgateway** (when confident)
   - Stop sending metrics to Pushgateway
   - Remove Pushgateway from infrastructure
---
## Rollback Plan
If issues arise:
1. **Revert Woodpecker changes:**
   ```bash
   git revert <commit-hash>
   git push
   ```
2. **Remove cron job:**
   - Delete `gitops_sync_check` from Woodpecker UI
3. **Restore Pushgateway metric push** (if keeping Prometheus monitoring)
---
## Key Improvements
| Metric | Old | New |
|--------|-----|-----|
| Data richness | 0/1 only | JSON with file names |
| Setup complexity | Pushgateway + Prometheus | Single service call |
| Audit trail | Basic | Structured snapshots |
| File-level visibility | None | Complete list |
| Update frequency | After deployment | Every 2 minutes + post-deploy |
| Infrastructure | 2+ services | 1 service (gitops-status-server) |
---
## Documentation Files
1. **`GITOPS_STATUS_SERVER_INTEGRATION.md`**: Comprehensive documentation
2. **`QUICK_REFERENCE.md`**: Quick start and troubleshooting
3. **`IMPLEMENTATION_SUMMARY.md`**: This file
---
## Support
For issues, consult:
1. `.woodpecker.yml` comments
2. `update-gitops-status.sh` comments
3. `drift-check.yml` comments
4. Full documentation in GITOPS_STATUS_SERVER_INTEGRATION.md
5. Woodpecker pipeline logs
6. gitops-status-server application logs