Risk Scoring Model
How ShieldAgent computes a 0–100 risk score per agent, what each tier means, and how enforcement decisions are made.
Overview
Every agent monitored by ShieldAgent has a continuous risk score between 0 and 100. The score is computed from security events, compliance gaps, integrity checks, and operational signals captured over a 7-day sliding window. Recent events are weighted more heavily than older ones.
The score drives enforcement automatically: a Normal agent runs at full throughput; a Critical agent is blocked until a human releases it.
Risk Tiers
| Tier | Score | Enforcement |
|---|---|---|
| Normal | 0 – 59 | No restrictions. Full throughput. |
| Elevated | 60 – 79 | Reduced request rate. |
| High | 80 – 89 | Significantly rate-limited. Forced into monitoring mode. |
| Critical | 90 – 100 | Only lifecycle methods allowed. Manual release required. |
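The tier boundaries in the table can be read as a simple threshold lookup. The following is a minimal sketch, not ShieldAgent's actual code: the function name `tier_for_score` and the `TIERS` list are hypothetical, and it assumes each lower bound is inclusive, so a fractional score such as 59.5 still counts as Normal.

```python
# Lower bounds taken from the Risk Tiers table, highest tier first.
# TIERS and tier_for_score are illustrative names, not a ShieldAgent API.
TIERS = [
    (90, "Critical"),
    (80, "High"),
    (60, "Elevated"),
    (0, "Normal"),
]


def tier_for_score(score: float) -> str:
    """Return the tier name for a risk score in the range 0-100."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    # Assumption: lower bounds are inclusive, so the first bound the score
    # reaches (checking from the highest tier down) decides the tier.
    for lower_bound, name in TIERS:
        if score >= lower_bound:
            return name
    raise AssertionError("unreachable: the 0 bound matches every valid score")


if __name__ == "__main__":
    for sample in (12, 60, 85, 97):
        print(sample, tier_for_score(sample))
```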
How Scoring Works
Each of the four components (security, compliance, integrity, and operational) is scored independently from time-decayed event counts, then combined into the overall 0–100 score using a weighted formula. Security signals carry the highest weight, followed by compliance, integrity, and operational factors, in that order.
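As an illustration of the weighted combination, here is a minimal sketch. The weight values are assumptions chosen only to respect the stated ordering (security > compliance > integrity > operational); the real weights are not given here. It also assumes each component is itself scored on a 0–100 scale.

```python
# Hypothetical weights: only their ordering comes from the documentation.
# They sum to 1 so that four component scores of 100 give an overall 100.
WEIGHTS = {
    "security": 0.40,
    "compliance": 0.30,
    "integrity": 0.20,
    "operational": 0.10,
}


def overall_score(components: dict[str, float]) -> float:
    """Combine per-component scores (assumed 0-100 each) into one 0-100 score."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    # Clamp to guard against floating-point drift or out-of-range inputs.
    return max(0.0, min(100.0, total))


if __name__ == "__main__":
    print(overall_score({
        "security": 70.0,
        "compliance": 40.0,
        "integrity": 10.0,
        "operational": 5.0,
    }))
```

With these placeholder weights, a security component of 100 on its own contributes 40 points, which is not enough to leave the Normal tier; other components would need to be elevated as well.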
Score Components
Security Score (highest weight)
Measures active threat signals — injection attempts and data loss events. Higher-severity signals like active injection attacks contribute more than lower-severity signals like automatic PII redaction.
Compliance Score
Measures policy violation patterns. Both deny rate and denial volume are factored in — a high rate with few calls matters less than a moderate rate with many calls.
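One way to make rate and volume interact as described is to scale the deny rate by a volume factor that grows with the number of calls. The sketch below is an assumption, not the documented formula: the logarithmic volume factor, the `saturation` parameter, and the function name `compliance_score` are all hypothetical.

```python
import math


def compliance_score(denied: int, total: int, saturation: int = 1000) -> float:
    """Score policy denials on a 0-100 scale from deny rate and call volume.

    Hypothetical formula: deny rate multiplied by a volume factor that rises
    logarithmically with the call count and reaches 1.0 at `saturation` calls.
    """
    if total <= 0:
        return 0.0  # no calls in the window, so no evidence either way
    if denied < 0 or denied > total:
        raise ValueError("denied must be between 0 and total")
    deny_rate = denied / total
    volume_factor = min(1.0, math.log1p(total) / math.log1p(saturation))
    return 100.0 * deny_rate * volume_factor


if __name__ == "__main__":
    # High rate, few calls versus moderate rate, many calls.
    print(compliance_score(denied=4, total=5))
    print(compliance_score(denied=300, total=1000))
```

Under this formula, 4 denials out of 5 calls (an 80% rate) scores lower than 300 denials out of 1,000 calls (a 30% rate), matching the behaviour described above.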
Integrity Score
Measures tool supply-chain integrity: schema drift and active tool poisoning. Active manipulation events are weighted significantly higher than passive schema changes.
Operational Score (lowest weight)
Uses the human review rate as a risk proxy. Frequent human-in-the-loop triggers indicate potential misconfiguration or boundary-testing behaviour.
Time-Weighted Decay
All event counts are weighted by recency over a 7-day sliding window. Events from the last hour carry full weight, older events contribute progressively less, and events older than 7 days fall outside the window entirely. As a result, a week with no new events resets the score naturally without manual intervention.
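The exact decay curve is not specified here. The sketch below assumes exponential decay with a hypothetical one-day half-life that starts after the first hour, and a hard cutoff at 7 days. The names `event_weight`, `weighted_count`, and `HALF_LIFE` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=7)           # sliding window length, from the docs
FULL_WEIGHT_AGE = timedelta(hours=1)  # events this recent count fully
HALF_LIFE = timedelta(days=1)        # assumption: not a documented value


def event_weight(event_time: datetime, now: datetime) -> float:
    """Return the recency weight (0.0 to 1.0) of a single event."""
    # Treat timestamps slightly in the future (clock skew) as age zero.
    age = max(now - event_time, timedelta(0))
    if age > WINDOW:
        return 0.0  # outside the 7-day window
    if age <= FULL_WEIGHT_AGE:
        return 1.0  # last hour: full weight
    # Assumed exponential decay: weight halves every HALF_LIFE after hour one.
    return 0.5 ** ((age - FULL_WEIGHT_AGE) / HALF_LIFE)


def weighted_count(event_times: list[datetime], now: datetime) -> float:
    """Sum the recency weights of all events, giving a decayed event count."""
    return sum(event_weight(t, now) for t in event_times)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    events = [now - timedelta(minutes=10), now - timedelta(days=3)]
    print(weighted_count(events, now))
```

Because weights are recomputed against the current time, an agent's decayed counts shrink on their own as events age, which is what lets the score fall without manual intervention.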