
LLM-Based Comprehensive Detection of Firewall Rule Anomalies
2025-8-29
Paper
Chang-Sheng Lee, I-Chen Lee, Ling-Jyh Chen. "Enhancing Firewall Rule Anomaly Detection via LLM Alignment." International Conference on Technologies and Applications of Artificial Intelligence, Taiwan, 2025.
Motivation
- Traditional firewall rule sets are difficult to maintain because old rules accumulate, leading to complexity and higher costs.
- Detecting anomalies (e.g., shadowing, redundancy, correlation) in firewall rules is a critical first step before simplifying them.
- Existing rule-based methods lack flexibility and generalization.
- Large Language Models (LLMs) offer a promising alternative due to their ability to recognize patterns and generalize.
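To make the anomaly types concrete, here is a minimal sketch of the pairwise relationships the paper asks a model to detect. The `Rule` representation (address spaces abstracted as integer ranges) and the subset logic are simplifying assumptions for illustration, not the paper's encoding.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    src: range      # source address space (abstracted as an int range)
    dst: range      # destination address space
    proto: str      # "tcp", "udp", or "any"
    action: str     # "accept" or "deny"

def covers(a: Rule, b: Rule) -> bool:
    """True if rule a matches every packet that rule b matches."""
    return (a.src.start <= b.src.start and a.src.stop >= b.src.stop
            and a.dst.start <= b.dst.start and a.dst.stop >= b.dst.stop
            and a.proto in (b.proto, "any"))

def classify(earlier: Rule, later: Rule) -> str:
    """Classify the relationship between an earlier and a later rule."""
    if covers(earlier, later):
        # The later rule never fires: an anomaly if the actions conflict,
        # dead weight (redundancy) if they agree.
        return "shadowing" if earlier.action != later.action else "redundancy"
    if covers(later, earlier):
        return "generalization"
    return "unrelated"
```

For example, a broad early `deny` rule followed by a narrow `accept` rule is classified as shadowing, since the narrow rule can never take effect.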
Methods
- Model Training
- Used Supervised Fine-Tuning (SFT) with a small dataset (75 examples) that included reasoning steps and anomaly labels.
- Applied Reinforcement Learning (RL) with ~36,000 examples using Group Relative Policy Optimization (GRPO).
- Designed reward functions focusing on both format correctness and answer accuracy.
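The two reward terms can be sketched as follows. This is a hypothetical shaping in the spirit of the paper's design (format correctness plus answer accuracy); the `<answer>` tag format, labels, and weights are illustrative assumptions, as is the group normalization GRPO applies in place of a learned value baseline.

```python
import re

# Assumed output format: the model wraps its verdict in <answer>...</answer>.
ANSWER_RE = re.compile(r"<answer>(shadowing|redundancy|correlation|none)</answer>")

def reward(completion: str, gold: str) -> float:
    """Score one sampled completion: partial credit for well-formed
    output, the bulk of the reward for the correct anomaly label."""
    m = ANSWER_RE.search(completion)
    format_r = 0.2 if m else 0.0
    answer_r = 0.8 if m and m.group(1) == gold else 0.0
    return format_r + answer_r

def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO normalizes rewards within each sampled group
    (mean/std baseline) instead of training a critic."""
    mu = sum(rewards) / len(rewards)
    var = sum((r - mu) ** 2 for r in rewards) / len(rewards)
    sd = var ** 0.5 or 1.0
    return [(r - mu) / sd for r in rewards]
```

A completion that is well-formed but wrong still earns the small format reward, which gives the policy a gradient toward parseable output early in training.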
- Experiment Setup
- Models: Qwen3-4B (Base and Instruct versions).
- Training hardware: RTX 4090, H100 NVL/PCIe (via Runpods).
- Framework: Unsloth (for efficient training).
- Testing
- Compared combinations of Base/Instruct with SFT and/or RL.
- Evaluated accuracy on anomaly detection tasks involving firewall rule pairs.
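The evaluation amounts to scoring each model variant on labeled rule pairs; a minimal sketch, where `predict` stands in for a call to one of the fine-tuned checkpoints (names are illustrative):

```python
def accuracy(predict, pairs: list[tuple[str, str]], labels: list[str]) -> float:
    """Fraction of rule pairs whose predicted anomaly matches the label."""
    hits = sum(predict(a, b) == gold for (a, b), gold in zip(pairs, labels))
    return hits / len(labels)
```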
Results
- Best performance: the Instruct model with SFT + RL, reaching ~99.2% accuracy.
- Both SFT and RL improved accuracy, with RL contributing more than SFT.
- The pure Base model achieved only ~50% accuracy; the pure Instruct model reached ~70%.
- However, performance collapsed when evaluating many rules at once (100+ simultaneously).
- Models were good at two-rule comparisons but failed to generalize to larger rule sets.
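The collapse is unsurprising given the pairwise framing: checking a whole rule set this way expands quadratically. An illustrative sketch (not the authors' pipeline):

```python
from itertools import combinations

def rule_pairs(rules: list[str]) -> list[tuple[str, str]]:
    """All (earlier, later) pairs a pairwise detector must judge."""
    return list(combinations(rules, 2))
```

For a 100-rule policy this is already 4,950 pairwise judgments, so either the model must reason over the full set in one context or the detector must issue thousands of queries.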
Conclusion
- LLM alignment (SFT + RL) significantly enhances performance for detecting firewall rule anomalies in pairwise settings.
- Reinforcement learning is particularly powerful, while SFT shows limited benefit due to small dataset size.
- Current methods lack generalization to complex, multi-rule scenarios.
- Future work should test more pretrained models, employ curriculum learning, and experiment with different training strategies (prompts, reward functions, hyperparameters).