ClawNex Shield Rules
ClawNex Shield Rules are the built-in detection rules used by Prompt Shield. The current ruleset contains 163 rules organized into 10 categories. Every LLM prompt and response is scanned against these rules to detect threats.
Rule Categories
| Category | Rules | Severity Range | False Positive Risk | Notes |
|---|---|---|---|---|
| secrets | 21 | CRITICAL-HIGH | Low | API keys, tokens, credentials. Very reliable. |
| commands | 19 | CRITICAL-HIGH | Low-Medium | Shell commands, reverse shells. Can trigger on code-generation tasks. |
| trust-exploit | 18 | HIGH-MEDIUM | Medium | “Ignore previous instructions” and similar injection patterns. |
| sensitive-paths | 14 | HIGH-MEDIUM | Low | References to credential files (.ssh, .env, etc.). |
| c2 | 12 | CRITICAL-HIGH | Low | C2 beacons, webhook exfiltration, cloud metadata attacks. |
| cognitive-file | 8 | CRITICAL-HIGH | High (internal) | References to SOUL.md, MEMORY.md. Produces false positives on agent system prompts; use the whitelist. |
| jailbreak | 8 | HIGH | Medium | Known jailbreak patterns (grandma exploit, token smuggling). |
| financial | 7 | CRITICAL-MEDIUM | Medium | Credit cards, SSNs, IBANs. FIN-SWIFT-CODE may match all-caps words. |
| steganography | 6 | HIGH-MEDIUM | Low | Zero-width characters, homoglyphs, BIDI overrides. |
| encoding | 6 | MEDIUM | Medium | Base64, hex, ROT13 encoded payloads. |
Scoring and Verdicts
Score = SUM(severity_weight × confidence × min(matchCount, 5)), capped at 100

Severity weights: CRITICAL = 30, HIGH = 20, MEDIUM = 10, LOW = 5

| Condition | Verdict |
|---|---|
| Any CRITICAL detection | BLOCK (regardless of score) |
| Score >= 60 | BLOCK |
| Score >= 25 | REVIEW |
| Score < 25 | ALLOW |
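
The scoring formula and verdict table can be sketched in a few lines of Python. This is a minimal illustration rather than the shipped implementation: the `Detection` class, its field names, and the assumption that confidence lies between 0 and 1 are hypothetical and not taken from the ruleset.

```python
from dataclasses import dataclass

SEVERITY_WEIGHTS = {"CRITICAL": 30, "HIGH": 20, "MEDIUM": 10, "LOW": 5}


@dataclass
class Detection:
    """Hypothetical record for one rule that matched."""
    severity: str      # "CRITICAL", "HIGH", "MEDIUM", or "LOW"
    confidence: float  # assumed to lie in [0.0, 1.0]
    match_count: int   # number of times the rule matched


def score(detections):
    """Sum weight x confidence x min(matchCount, 5), capped at 100."""
    total = sum(
        SEVERITY_WEIGHTS[d.severity] * d.confidence * min(d.match_count, 5)
        for d in detections
    )
    return min(total, 100)


def verdict(detections):
    """Apply the verdict table; any CRITICAL detection blocks outright."""
    if any(d.severity == "CRITICAL" for d in detections):
        return "BLOCK"
    s = score(detections)
    if s >= 60:
        return "BLOCK"
    if s >= 25:
        return "REVIEW"
    return "ALLOW"
```

Under these assumptions, a single HIGH rule matched twice at confidence 0.9 scores 20 × 0.9 × 2 = 36, which yields REVIEW. A CRITICAL rule matched once at confidence 0.1 scores only 3 but still yields BLOCK because of the override in the first row of the table.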
Three-Layer Detection
| Layer | Mechanism | Timing | Can Block? |
|---|---|---|---|
| Pre-call scan | LiteLLM async_pre_call_hook | Before model sees the prompt | Yes (in BLOCK mode only) |
| Post-call scan | LiteLLM log_success_event | After response received | No (informational) |
| Retroactive scan | Session Watcher polls JSONL logs | Minutes to hours later | No (detection only) |
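
To make the retroactive layer concrete, the sketch below scans JSONL log lines that have not been checked yet and remembers where it stopped, which is roughly what a polling watcher needs to do. Everything specific here is an assumption for illustration: the `content` field name, the `scan_new_lines` function, and the single stand-in regex are hypothetical and do not describe the actual Session Watcher or ruleset.

```python
import json
import re

# Hypothetical stand-in for one trust-exploit rule; not the real ruleset.
INJECTION = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)


def scan_new_lines(lines, start=0):
    """Scan lines[start:] and return (findings, next_start).

    Each finding is (line_index, matched_text). Passing the returned
    next_start to a later call skips lines that were already scanned.
    """
    findings = []
    for line_no in range(start, len(lines)):
        raw = lines[line_no].strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        if not isinstance(record, dict):
            continue
        text = str(record.get("content", ""))  # "content" is an assumed field name
        if INJECTION.search(text):
            findings.append((line_no, text))
    return findings, len(lines)
```

Because this layer runs after the fact, the findings can only be reported; nothing in the sketch attempts to stop a request.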
Outbound Detection (Data Leak Prevention)
Applied to model responses:
| Pattern | Severity |
|---|---|
| Private key material | CRITICAL |
| Password assignments | HIGH |
| Environment variable leaks | MEDIUM |
| Internal IP addresses | MEDIUM |
| Database connection URIs | HIGH |
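
As a rough illustration of how outbound patterns like these might be expressed, the sketch below pairs each category with a regular expression and a severity. The regexes are illustrative guesses and are not the rules that ship with the product; in particular, limiting environment variable leaks to names ending in `KEY`, `TOKEN`, or `SECRET` is an assumption made to keep the example short. The names `OUTBOUND_PATTERNS` and `scan_response` are hypothetical.

```python
import re

# Illustrative patterns only; the actual detection rules are not shown here.
OUTBOUND_PATTERNS = [
    ("private_key", "CRITICAL",
     re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----")),
    ("password_assignment", "HIGH",
     re.compile(r"(?i)\b(?:password|passwd|pwd)\s*[:=]\s*\S+")),
    ("database_uri", "HIGH",
     re.compile(r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis)"
                r"://[^\s/@]+:[^\s/@]+@\S+")),
    ("env_var_leak", "MEDIUM",
     re.compile(r"(?m)^(?:export\s+)?[A-Z][A-Z0-9_]*(?:KEY|TOKEN|SECRET)=\S+")),
    # RFC 1918 private IPv4 ranges: 10/8, 172.16/12, 192.168/16
    ("internal_ip", "MEDIUM",
     re.compile(r"\b(?:10(?:\.\d{1,3}){3}"
                r"|192\.168(?:\.\d{1,3}){2}"
                r"|172\.(?:1[6-9]|2\d|3[01])(?:\.\d{1,3}){2})\b")),
]


def scan_response(text):
    """Return (pattern_name, severity) for every pattern found in text."""
    return [
        (name, severity)
        for name, severity, pattern in OUTBOUND_PATTERNS
        if pattern.search(text)
    ]
```

A response containing a PEM header such as `-----BEGIN RSA PRIVATE KEY-----` would be reported as `("private_key", "CRITICAL")`, while ordinary prose returns an empty list.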
Shield Modes
| Mode | Behavior |
|---|---|
| OBSERVE (default) | Traffic is scanned and logged. Threats are flagged but not blocked. |
| BLOCK | Requests that receive a BLOCK verdict are rejected before they reach the AI model. |
Start in OBSERVE mode. Review traffic for a few days to understand what your agents send. When confident the shield is not producing false positives on legitimate traffic, switch to BLOCK mode.
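
The interaction between shield mode, detection layer, and verdict can be summarized as a small decision function. This is a sketch under stated assumptions: the function name, the layer labels (`pre_call`, `post_call`, `retroactive`), and the action names (`reject`, `flag`, `pass`) are hypothetical, and treating REVIEW verdicts as flag-only is inferred from the mode descriptions rather than stated outright.

```python
def enforcement_action(mode, layer, verdict):
    """Return "reject", "flag", or "pass" for one scan result.

    mode:    "OBSERVE" or "BLOCK"
    layer:   "pre_call", "post_call", or "retroactive" (assumed labels)
    verdict: "BLOCK", "REVIEW", or "ALLOW"
    """
    if verdict == "ALLOW":
        return "pass"
    # Only the pre-call scan runs before the model sees the prompt,
    # so it is the only layer that can stop a request.
    can_block = mode == "BLOCK" and layer == "pre_call"
    if verdict == "BLOCK" and can_block:
        return "reject"
    return "flag"
```

In OBSERVE mode the function never returns `reject`, which is what makes that mode safe for an initial review period.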