# Monitoring

## Overview

QuantumWing provides comprehensive monitoring through **Prometheus metrics** and **Grafana dashboards** for real-time visibility into blockchain operations.

**Status:** ✅ Production Ready

***

## Grafana Dashboard

### Access Information

**Production Dashboard:**

* **URL:** <http://localhost:3002>
* **Login:** admin / admin
* **Dashboard Name:** QuantumWing Metrics
* **Refresh Rate:** 5 seconds (auto-refresh)

### Dashboard Panels (14 Total)

#### 1. **Current Slot & Epoch**

* **Metric:** `quantumwing_beacon_current_slot`, `quantumwing_beacon_current_epoch`
* **Type:** Stat panel
* **Shows:** Real-time slot/epoch progression

#### 2. **Active Validators**

* **Metric:** `quantumwing_beacon_active_validators`
* **Type:** Gauge
* **Expected:** 3 validators (production testnet)

#### 3. **Block Production Stats**

* **Metric:** `quantumwing_beacon_blocks_proposed_total`
* **Type:** Time series graph
* **Shows:** Blocks produced over time

#### 4. **Attestation Rates**

* **Metric:** `quantumwing_beacon_attestations_received_total`
* **Type:** Time series graph
* **Shows:** Validator attestation participation

#### 5. **Blockchain Height**

* **Metric:** `quantumwing_execution_chain_height`
* **Type:** Stat panel
* **Shows:** Current block height

#### 6. **Finalized Epoch**

* **Metric:** `quantumwing_beacon_finalized_epoch`
* **Type:** Stat panel
* **Shows:** Latest finalized epoch (2 epochs behind current)

#### 7. **P2P Peer Counts**

* **Metrics:** `quantumwing_beacon_peer_count`, `quantumwing_execution_peer_count`
* **Type:** Gauge
* **Shows:** Connected peers (beacon + execution layers)

#### 8. **Mempool Size**

* **Metric:** `quantumwing_execution_mempool_size`
* **Type:** Time series graph
* **Shows:** Pending transactions in mempool

#### 9. **Gas Fees Collected**

* **Metric:** `quantumwing_execution_gas_fees_collected_wei`
* **Type:** Counter
* **Shows:** Total gas fees earned by validators (Wei)

#### 10. **Canonical Signature Verification**

* **Metric:** `quantum_canonical_sig_verify_ms`
* **Type:** Histogram (p95 latency)
* **Target:** p95 < 200ms
* **Shows:** Block signature verification time

#### 11. **VRF Proof Verification**

* **Metric:** `quantum_canonical_vrf_verify_ms`
* **Type:** Histogram (p95 latency)
* **Target:** p95 < 150ms
* **Shows:** VRF proof verification time

#### 12. **Canonical Bytes Length**

* **Metric:** `quantum_canonical_bytes_len`
* **Type:** Gauge
* **Expected:** 3409 bytes (SignedHeader format)
* **Alert:** If not 3409 → canonical encoding broken!

#### 13. **Transaction Throughput**

* **Metric:** `quantumwing_execution_transactions_processed_total`
* **Type:** Rate (per second)
* **Shows:** TPS (transactions per second)

#### 14. **Network Bandwidth**

* **Metrics:** `quantumwing_p2p_messages_sent_total`, `quantumwing_p2p_messages_received_total`
* **Type:** Time series graph
* **Shows:** P2P message throughput
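
Panels 10 and 11 report a p95 taken from histogram buckets. As a rough illustration of how such a percentile is estimated (linear interpolation inside the bucket that holds the target rank, the same idea PromQL's `histogram_quantile` applies), here is a minimal Go sketch. The bucket bounds and counts are invented sample data, and `Bucket`, `quantile`, and `sampleBuckets` are hypothetical names, not part of QuantumWing.

```go
package main

import (
	"fmt"
	"math"
)

// Bucket is one cumulative histogram bucket: Count observations were
// less than or equal to UpperBound (milliseconds in this example).
type Bucket struct {
	UpperBound float64
	Count      float64
}

// sampleBuckets is invented data; real bounds depend on how the
// histogram is registered in the node.
var sampleBuckets = []Bucket{
	{50, 20}, {100, 70}, {200, 98}, {400, 100}, {math.Inf(1), 100},
}

// quantile estimates the q-quantile (0 < q < 1) from cumulative buckets
// sorted by upper bound. It interpolates linearly inside the first
// bucket whose cumulative count reaches the target rank.
func quantile(q float64, buckets []Bucket) float64 {
	if len(buckets) == 0 || q <= 0 || q >= 1 {
		return math.NaN()
	}
	total := buckets[len(buckets)-1].Count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	for i, b := range buckets {
		if b.Count < rank {
			continue
		}
		if math.IsInf(b.UpperBound, 1) {
			// No upper edge to interpolate towards: fall back to
			// the highest finite bound.
			if i == 0 {
				return math.NaN()
			}
			return buckets[i-1].UpperBound
		}
		lower, below := 0.0, 0.0
		if i > 0 {
			lower, below = buckets[i-1].UpperBound, buckets[i-1].Count
		}
		return lower + (b.UpperBound-lower)*(rank-below)/(b.Count-below)
	}
	return math.NaN()
}

func main() {
	p95 := quantile(0.95, sampleBuckets)
	fmt.Printf("p95 = %.1f ms, within 200 ms target: %v\n", p95, p95 < 200)
}
```

With the sample data the estimate lands below the 200 ms signature-verification target. The result is only as precise as the bucket layout allows.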

***

## Prometheus Metrics

### Metrics Endpoint

**URLs:** <http://localhost:8546/metrics> (execution layer) and <http://localhost:8080/metrics> (beacon chain)

**Format:** Prometheus exposition format

**Scrape Interval:** 5 seconds (set in `prometheus.yml`; Prometheus itself defaults to 1 minute)

### Available Metrics (32 Listed)

#### Beacon Chain Metrics

```prometheus
# Slot and Epoch
quantumwing_beacon_current_slot          # Current slot number
quantumwing_beacon_current_epoch         # Current epoch number
quantumwing_beacon_finalized_epoch       # Latest finalized epoch

# Validators
quantumwing_beacon_active_validators     # Number of active validators
quantumwing_beacon_total_validators      # Total validators (active + inactive)

# Blocks
quantumwing_beacon_blocks_proposed_total # Total blocks proposed
quantumwing_beacon_blocks_finalized_total # Total blocks finalized

# Attestations
quantumwing_beacon_attestations_received_total  # Total attestations received
quantumwing_beacon_attestation_pool_size        # Current attestation pool size

# Networking
quantumwing_beacon_peer_count            # Connected peers
```

#### Execution Layer Metrics

```prometheus
# Chain State
quantumwing_execution_chain_height       # Current block height
quantumwing_execution_accounts_total     # Total accounts in state

# Transactions
quantumwing_execution_transactions_processed_total  # Total txs processed
quantumwing_execution_mempool_size                  # Pending transactions
quantumwing_execution_transactions_rejected_total   # Rejected transactions

# Gas & Fees
quantumwing_execution_gas_fees_collected_wei        # Total gas fees (Wei)
quantumwing_execution_gas_used_total                # Total gas consumed

# Networking
quantumwing_execution_peer_count         # Connected peers
```

#### Canonical Encoding Metrics

```prometheus
# Signature Verification
quantum_canonical_sig_verify_ms          # Histogram: signature verification latency
quantum_canonical_sig_verified_total     # Counter: successful verifications
quantum_canonical_sig_fail_total         # Counter: failed verifications

# VRF Verification
quantum_canonical_vrf_verify_ms          # Histogram: VRF proof verification latency
quantum_canonical_vrf_fail_total         # Counter: failed VRF verifications

# Canonical Encoding
quantum_canonical_bytes_len              # Gauge: SignedHeader byte length (should be 3409)
quantum_canonical_sig_generated_total    # Counter: signatures generated

# Future: Domain Hash Cache
quantum_domain_hash_cache_hits           # Counter: domain hash cache hits (optimization)
```

#### P2P Networking Metrics

```prometheus
# Messages
quantumwing_p2p_messages_sent_total      # Total messages sent
quantumwing_p2p_messages_received_total  # Total messages received

# Peers
quantumwing_p2p_peer_connections_active  # Active peer connections
quantumwing_p2p_peer_discovery_total     # Peers discovered via mDNS/DHT

# Kyber Encryption
quantumwing_p2p_kyber_handshakes_total   # Total Kyber-1024 handshakes
quantumwing_p2p_encrypted_bytes_total    # Total encrypted data (bytes)
```
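
For quick checks outside Grafana, a metric can be read straight from the exposition text. The sketch below is a deliberately small parser under stated assumptions: it handles only unlabelled `name value` sample lines, and `sampleMetrics` is an invented payload rather than output captured from a node. `parseSamples` is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Illustrative payload; a real /metrics response contains many more series.
const sampleMetrics = `# HELP quantum_canonical_bytes_len SignedHeader byte length
# TYPE quantum_canonical_bytes_len gauge
quantum_canonical_bytes_len 3409
quantumwing_beacon_active_validators 3
`

// parseSamples reads Prometheus exposition text and returns unlabelled
// samples as name -> value. Comment lines (# HELP, # TYPE), labelled
// series, and malformed lines are skipped to keep the sketch short.
func parseSamples(text string) map[string]float64 {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.Contains(line, "{") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			continue
		}
		out[fields[0]] = v
	}
	return out
}

func main() {
	m := parseSamples(sampleMetrics)
	fmt.Println("canonical length ok:", m["quantum_canonical_bytes_len"] == 3409)
	fmt.Println("active validators:", m["quantumwing_beacon_active_validators"])
}
```

In practice the text would come from an HTTP GET against one of the metrics endpoints above. For anything beyond a spot check, a full exposition-format parser is the safer choice.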

***

## Prometheus Configuration

### prometheus.yml

```yaml
global:
  scrape_interval: 5s
  evaluation_interval: 5s

scrape_configs:
  # Execution Layer
  - job_name: 'quantumwing-execution'
    static_configs:
      - targets: ['localhost:8546']
    metrics_path: '/metrics'

  # Beacon Chain
  - job_name: 'quantumwing-beacon'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'

  # Validators (if running separately)
  - job_name: 'quantumwing-validators'
    static_configs:
      - targets:
        - 'localhost:8547'  # validator-0
        - 'localhost:8548'  # validator-1
        - 'localhost:8549'  # validator-2
    metrics_path: '/metrics'
```

**Location:** `/home/home/Projects/QuantumWing/monitoring/prometheus/prometheus.yml`

***

## WebSocket Real-Time Events

**Status:** ✅ Production-ready (November 2025)

### Overview

The beacon chain provides a WebSocket endpoint for **real-time event streaming** to validators, explorers, and external clients.

### WebSocket Endpoint

**URL:** `ws://localhost:8080/ws`

**Connection:**

```javascript
const ws = new WebSocket('ws://localhost:8080/ws');

ws.onopen = () => {
  console.log('✅ Connected to QuantumWing beacon');
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  console.log('📡 Event:', message.type, message.data);
};

ws.onerror = (error) => {
  console.error('❌ WebSocket error:', error);
};
```

### Event Types

#### 1. **New Block Events**

```json
{
  "type": "new_block",
  "data": {
    "slot": 42,
    "block_hash": "0x3a4f8b2c...",
    "proposer": "0x78DDf7fB...",
    "transactions": 15
  },
  "timestamp": "2025-11-10T12:34:56Z"
}
```

#### 2. **Attestation Events**

```json
{
  "type": "new_attestation",
  "data": {
    "slot": 42,
    "validator": "0x78DDf7fB...",
    "block_hash": "0x3a4f8b2c..."
  },
  "timestamp": "2025-11-10T12:34:56Z"
}
```

#### 3. **Bridge BID Events** (NEW)

```json
{
  "type": "bridge_bid_registered",
  "data": {
    "bid": "0x7f9a3b...",
    "status": "PENDING",
    "amount": "10000000000000000",
    "chain": "sepolia"
  },
  "timestamp": "2025-11-10T12:34:56Z"
}
```
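
All three events share the same envelope (`type`, `data`, `timestamp`), so a client can decode the envelope first and the payload second. The sketch below assumes only the fields shown in the examples above. The struct and function names (`Envelope`, `NewBlock`, `BridgeBid`, `describe`) are hypothetical, and the server-side types may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope matches the outer shape of the events above. Data stays raw
// so each event type can be decoded into its own struct.
type Envelope struct {
	Type      string          `json:"type"`
	Data      json.RawMessage `json:"data"`
	Timestamp string          `json:"timestamp"`
}

// NewBlock mirrors the "new_block" payload from the example.
type NewBlock struct {
	Slot         uint64 `json:"slot"`
	BlockHash    string `json:"block_hash"`
	Proposer     string `json:"proposer"`
	Transactions int    `json:"transactions"`
}

// BridgeBid mirrors the "bridge_bid_registered" payload from the example.
type BridgeBid struct {
	Bid    string `json:"bid"`
	Status string `json:"status"`
	Amount string `json:"amount"` // sent as a decimal string in the example
	Chain  string `json:"chain"`
}

// describe decodes one raw message and returns a one-line summary.
func describe(raw []byte) (string, error) {
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", err
	}
	switch env.Type {
	case "new_block":
		var b NewBlock
		if err := json.Unmarshal(env.Data, &b); err != nil {
			return "", err
		}
		return fmt.Sprintf("block at slot %d with %d txs", b.Slot, b.Transactions), nil
	case "bridge_bid_registered":
		var bid BridgeBid
		if err := json.Unmarshal(env.Data, &bid); err != nil {
			return "", err
		}
		return fmt.Sprintf("bridge BID %s on %s: %s", bid.Status, bid.Chain, bid.Amount), nil
	default:
		// Unknown or unhandled event types are reported, not rejected.
		return "ignored " + env.Type, nil
	}
}

func main() {
	msg := []byte(`{"type":"new_block","data":{"slot":42,"block_hash":"0x3a4f8b2c...","proposer":"0x78DDf7fB...","transactions":15},"timestamp":"2025-11-10T12:34:56Z"}`)
	summary, err := describe(msg)
	fmt.Println(summary, err)
}
```

Keeping `data` as `json.RawMessage` means new event types can be added without breaking existing clients, since unrecognised types fall through to the default case.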

### Features

* ✅ **Reconnection** - Left to the client; the browser `WebSocket` API does not reconnect by itself, so clients should redial after a disconnect
* ✅ **Ping/Pong** - 30-second keepalive for connection health
* ✅ **Broadcast to all clients** - Efficient event distribution
* ✅ **JSON format** - Easy parsing in any language
* ✅ **No authentication required** - Public read-only events
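
For clients that manage reconnection themselves, capped exponential backoff is a common approach. The sketch below is illustrative only: the 1-second base and 30-second cap are assumed values, the dial function is faked so the example runs without a node, and `backoff`, `reconnect`, `recordWait`, and `errUnavailable` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnavailable stands in for a failed dial.
var errUnavailable = errors.New("endpoint unavailable")

// waits records the delays requested by reconnect; main uses it in
// place of time.Sleep so the demo finishes immediately.
var waits []time.Duration

func recordWait(d time.Duration) { waits = append(waits, d) }

// backoff returns the wait before retry n (0-based): base doubled n
// times and capped at limit.
func backoff(n int, base, limit time.Duration) time.Duration {
	d := base
	for i := 0; i < n && d < limit; i++ {
		d *= 2
	}
	if d > limit {
		return limit
	}
	return d
}

// reconnect calls dial up to attempts times (attempts >= 1), waiting
// backoff(n) after each failure except the last. It returns nil on the
// first success, otherwise the last dial error.
func reconnect(dial func() error, attempts int, base, limit time.Duration, sleep func(time.Duration)) error {
	var err error
	for n := 0; n < attempts; n++ {
		if err = dial(); err == nil {
			return nil
		}
		if n < attempts-1 {
			sleep(backoff(n, base, limit))
		}
	}
	return err
}

func main() {
	calls := 0
	// Fake dial that fails twice before succeeding; a real client would
	// open ws://localhost:8080/ws here.
	dial := func() error {
		calls++
		if calls < 3 {
			return errUnavailable
		}
		return nil
	}
	err := reconnect(dial, 5, time.Second, 30*time.Second, recordWait)
	fmt.Println("attempts:", calls, "waits:", waits, "err:", err)
}
```

A production client would pass `time.Sleep` instead of `recordWait` and would usually add jitter so that many clients do not redial at the same instant.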

### Use Cases

**Validators:**

* Instant bridge BID notifications (10-second vs 60-second polling)
* Real-time block proposals for attestation
* Fast duty schedule updates

**Explorers:**

* Live block feed without API polling
* Real-time transaction updates
* Bridge status changes

**External Services:**

* Block notifications for indexers
* Bridge monitoring dashboards
* Analytics pipelines

### Example: Validator Client

```go
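// cmd/validator/main.go (excerpt)
// The Dial and ReadJSON signatures match github.com/gorilla/websocket.
ws, _, err := websocket.DefaultDialer.Dial("ws://localhost:8080/ws", nil)
if err != nil {
    log.Fatal("WebSocket connection failed:", err)
}
defer ws.Close()

for {
    var msg WSMessage
    if err := ws.ReadJSON(&msg); err != nil {
        // A failed read leaves the connection unusable, so stop reading
        // here and redial instead of looping on the same connection.
        log.Warn("WebSocket read error:", err)
        break
    }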

    switch msg.Type {
    case "bridge_bid_registered":
        log.Info("🌉 New bridge BID detected, refreshing duties")
        v.RefreshDuties()  // Instant response!
    case "new_block":
        log.Info("🔷 New block detected:", msg.Data["block_hash"])
    }
}
```

### Performance

* **Latency:** <10ms from event generation to client notification
* **Throughput:** 1000+ events/second supported
* **Connections:** 100+ concurrent clients tested
* **Memory:** \~1KB per active connection

**Code:** [`blockchain/api/websocket.go`](https://github.com/dolfrin/QuantumWing/blob/master/blockchain/api/websocket.go)

***

## Grafana Setup

### Installation

```bash
# Start Grafana (Docker)
docker run -d \
  --name=grafana \
  -p 3002:3000 \
  -v grafana-storage:/var/lib/grafana \
  grafana/grafana:latest

# Or via Docker Compose (recommended)
cd /home/home/Projects/QuantumWing/monitoring
docker-compose up -d grafana
```

### Data Source Configuration

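1. **Login:** <http://localhost:3002> (admin / admin)
2. **Configuration → Data Sources → Add data source**
3. **Select:** Prometheus
4. **URL:** <http://localhost:9090> (with Server access and Grafana in Docker, `localhost` is the Grafana container itself; use an address the container can reach, such as the Prometheus service name on the Compose network)
5. **Access:** Browser (deprecated for the Prometheus data source and possibly absent in recent Grafana releases; select Server there)
6. **Save & Test**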

### Dashboard Import

**Option 1: UI Import**

1. **Dashboards → Import**
2. **Upload JSON:** `monitoring/dashboards/grafana-dashboard.json`
3. **Select** the Prometheus data source
4. **Import**

**Option 2: Provisioning** (recommended)

```yaml
# monitoring/grafana/provisioning/dashboards/dashboard.yml
apiVersion: 1

providers:
  - name: 'QuantumWing'
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards
```

**Dashboard Location:** `/home/home/Projects/QuantumWing/monitoring/dashboards/grafana-dashboard.json`

***

## Alerting

### Alert Rules (Prometheus)

```yaml
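# alerts.yml
# Load this file through a rule_files entry in prometheus.yml;
# the prometheus.yml shown earlier does not include one.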
groups:
  - name: quantumwing_alerts
    interval: 30s
    rules:
      # Canonical Encoding Alert
      - alert: CanonicalBytesLengthIncorrect
        expr: quantum_canonical_bytes_len != 3409
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Canonical encoding broken (bytes != 3409)"
          description: "SignedHeader length is {{ $value }} bytes, expected 3409"

      # Validator Participation Alert
      - alert: LowValidatorParticipation
        expr: quantumwing_beacon_active_validators < 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low validator count ({{ $value }} active)"

      # High Mempool Alert
      - alert: MempoolOverflow
        expr: quantumwing_execution_mempool_size > 10000
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Mempool overflow ({{ $value }} pending txs)"

      # Block Production Stalled
      - alert: BlockProductionStalled
        expr: rate(quantumwing_beacon_blocks_proposed_total[5m]) == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Block production stopped"

      # VRF Verification Failures
      - alert: HighVRFFailureRate
        expr: rate(quantum_canonical_vrf_fail_total[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High VRF verification failure rate"
```
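
To make the `BlockProductionStalled` expression more concrete, the following sketch approximates what `rate()` measures over a window of counter samples. It is a simplification under stated assumptions: Prometheus also extrapolates to the window boundaries, which is skipped here. `Sample`, `perSecondRate`, and `stalled` are hypothetical names, and the data in `main` is invented.

```go
package main

import "fmt"

// Sample is one scraped counter value.
type Sample struct {
	UnixSeconds float64
	Value       float64
}

// perSecondRate returns the total counter increase across the samples
// divided by the elapsed seconds. A drop in value is treated as a
// counter reset, meaning the counter restarted from zero.
func perSecondRate(samples []Sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].Value - samples[i-1].Value
		if delta < 0 {
			delta = samples[i].Value // everything since the reset
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].UnixSeconds - samples[0].UnixSeconds
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

// stalled mirrors the alert expression: rate(...) == 0.
func stalled(samples []Sample) bool {
	return perSecondRate(samples) == 0
}

func main() {
	// One block every 12 seconds over a 240-second window.
	healthy := []Sample{{0, 100}, {120, 110}, {240, 120}}
	// No new blocks over a 300-second window.
	frozen := []Sample{{0, 100}, {300, 100}}
	fmt.Printf("healthy rate: %.4f blocks/s, stalled: %v\n", perSecondRate(healthy), stalled(healthy))
	fmt.Printf("frozen rate: %.4f blocks/s, stalled: %v\n", perSecondRate(frozen), stalled(frozen))
}
```

At the 12-second block time listed under the KPIs, a healthy chain should show roughly 0.083 blocks per second, so a rate of exactly zero over five minutes is a strong stall signal.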

### Grafana Alerts

Grafana can visualize Prometheus alerts or define its own:

1. **Dashboard → Panel → Alert tab**
2. **Set conditions** (e.g., `quantum_canonical_bytes_len != 3409`)
3. **Configure notifications** (email, Slack, PagerDuty)

***

## Performance Monitoring

### Key Performance Indicators (KPIs)

**Consensus Performance:**

* **Block Time:** 12 seconds (target)
* **Attestation Rate:** >95% validators participating
* **Finality Time:** 12.8 minutes (2 epochs)

**Execution Performance:**

* **TPS:** 2000+ (peak 2347 tested)
* **Transaction Latency:** 150ms gossip + 12s block
* **Contract Deployment:** <1s (target, currently 2-5s)

**Canonical Encoding Performance:**

* **Signature Verification:** p95 < 200ms
* **VRF Verification:** p95 < 150ms
* **Canonical Encoding:** 3409 bytes (always)

**P2P Performance:**

* **Peer Count:** 10-50 peers (healthy)
* **Message Propagation:** <1s across network
* **Bandwidth:** 10-20 MB/s (typical)

### Latency Breakdown

```
Transaction Flow:
├─ User submits TX: 0ms
├─ Mempool validation: 5ms
├─ P2P gossip: 150ms
├─ Wait for block: 0-12s (avg 6s)
├─ Block validation: 50ms
├─ State update: 10ms
└─ Finality: 12.8 min (2 epochs)

Total User Latency: ~6.2s (average time to block inclusion; finality not included)
```
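
The totals in the breakdown can be checked with a few lines of arithmetic. The sketch assumes 32 slots per epoch, a value inferred from "2 epochs = 12.8 min" at 12-second slots rather than stated in this document. The constant and function names are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

const (
	slotTime      = 12 * time.Second // block time from the KPIs above
	slotsPerEpoch = 32               // assumption: inferred, not documented here
)

// inclusionLatency sums the pre-finality stages of the breakdown,
// using the average wait for the next block (half a slot).
func inclusionLatency() time.Duration {
	mempool := 5 * time.Millisecond
	gossip := 150 * time.Millisecond
	avgWait := slotTime / 2
	validation := 50 * time.Millisecond
	stateUpdate := 10 * time.Millisecond
	return mempool + gossip + avgWait + validation + stateUpdate
}

// finalityTime returns how long the given number of epochs lasts.
func finalityTime(epochs int) time.Duration {
	return time.Duration(epochs*slotsPerEpoch) * slotTime
}

func main() {
	fmt.Println("average inclusion latency:", inclusionLatency()) // 6.215s
	fmt.Println("finality after 2 epochs:", finalityTime(2))      // 12m48s
}
```

The 6.215 s sum matches the rounded 6.2 s figure, and 12m48s equals the 12.8 minutes quoted for finality.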

***

## Troubleshooting

### Dashboard Shows No Data

**Check Prometheus:**

```bash
# Test Prometheus scraping
curl http://localhost:9090/api/v1/targets

# Expected: every active target reports "health": "up"
```
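
The same check can be scripted. The sketch below decodes a `/api/v1/targets` response and lists targets that are not healthy. It models only the fields it needs, and the embedded response is an invented sample rather than output captured from this deployment. `targetsResponse`, `unhealthy`, and `sampleTargets` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// targetsResponse models only the parts of /api/v1/targets used below.
type targetsResponse struct {
	Status string `json:"status"`
	Data   struct {
		ActiveTargets []struct {
			Labels    map[string]string `json:"labels"`
			ScrapeURL string            `json:"scrapeUrl"`
			Health    string            `json:"health"`
			LastError string            `json:"lastError"`
		} `json:"activeTargets"`
	} `json:"data"`
}

// Invented sample: the beacon target is up, the execution target is down.
const sampleTargets = `{
  "status": "success",
  "data": {
    "activeTargets": [
      {"labels": {"job": "quantumwing-beacon"}, "scrapeUrl": "http://localhost:8080/metrics", "health": "up", "lastError": ""},
      {"labels": {"job": "quantumwing-execution"}, "scrapeUrl": "http://localhost:8546/metrics", "health": "down", "lastError": "connection refused"}
    ]
  }
}`

// unhealthy returns one "job (url): error" entry for every active
// target whose health is anything other than "up".
func unhealthy(body []byte) ([]string, error) {
	var r targetsResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	var down []string
	for _, t := range r.Data.ActiveTargets {
		if t.Health != "up" {
			down = append(down, fmt.Sprintf("%s (%s): %s", t.Labels["job"], t.ScrapeURL, t.LastError))
		}
	}
	return down, nil
}

func main() {
	down, err := unhealthy([]byte(sampleTargets))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, d := range down {
		fmt.Println("DOWN:", d)
	}
}
```

In practice the body would come from an HTTP GET against the URL above. The `lastError` field usually points straight at the cause, such as a refused connection on port 8546 or 8080.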

**Check Data Source:**

```bash
# In Grafana: Configuration → Data Sources → Prometheus → Test
# Should show: "Data source is working"
```

### Metrics Endpoint Not Responding

**Check Blockchain Running:**

```bash
ps aux | grep quantum-wing

# Should show:
# - quantum-wing-blockchain (execution)
# - quantum-wing-blockchain --beacon-only (beacon)
# - quantum-wing-validator (validators)
```

**Check Ports:**

```bash
lsof -i :8546  # Execution layer
lsof -i :8080  # Beacon chain
```

### Canonical Bytes Length Alert

**Problem:** `quantum_canonical_bytes_len != 3409`

**Diagnosis:**

```bash
# Check beacon logs
grep "canonical_bytes_len" logs/beacon/beacon.log

# Expected:
# canonical_bytes_len: 3409
```

**Solution:** If not 3409, canonical encoding is broken - check `blockchain/types/signed_header.go`

***

## Best Practices

### Retention

**Prometheus:**

* **Default:** 15 days
* **Production:** 90 days (use `--storage.tsdb.retention.time=90d`)

**Grafana:**

* Dashboards: Permanent (stored in database)
* Snapshots: Export for backup

### Backup

**Prometheus Data:**

```bash
# Backup TSDB (stop Prometheus first; archiving a live data directory can capture blocks mid-write)
tar -czf prometheus-backup-$(date +%Y%m%d).tar.gz /path/to/prometheus/data
```

**Grafana Dashboards:**

```bash
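# Export dashboard JSON by UID (the slug-based /api/dashboards/db/<slug> route
# is deprecated; newer Grafana releases look dashboards up by UID).
# DASHBOARD_UID is the "uid" shown in the dashboard URL and JSON model.
curl "http://admin:admin@localhost:3002/api/dashboards/uid/${DASHBOARD_UID}" -o quantumwing-metrics.json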
```

### Scaling

**High-Volume Monitoring:**

* Use **Prometheus federation** for multi-region
* Use **Thanos** for long-term storage
* Use **Grafana Loki** for log aggregation (pairs with Grafana)

***

## See Also

* [Metrics Reference](https://quantumwing.gitbook.io/quantumwing/operations/metrics) - Complete list of all metrics
* [Performance Tuning](https://github.com/dolfrin/QuantumWing/blob/master/docs/operations/performance.md) - Optimization guide
* [Operations Guide](https://github.com/dolfrin/QuantumWing/blob/master/docs/operations/README.md) - Backup, recovery, maintenance

***

**Last Updated:** 2025-10-31 **Status:** ✅ Production Ready
