What Is Sigma and Why Should You Use It?
Sigma is an open, vendor-neutral rule format for SIEM detections, maintained by SigmaHQ. Think of it as “the YARA of log-based detection”: a single rule format that the sigma-cli converter translates into Splunk SPL, Elasticsearch queries (Lucene, EQL), Microsoft Sentinel KQL, QRadar AQL, or the query language of any other platform that has a pySigma backend.
The key value proposition: write your detection logic once and convert it to any platform. No more rewriting the same detection for three different SIEMs when you join a new team.
Anatomy of a Sigma Rule
Every Sigma rule is a YAML file with a defined structure. Here’s a complete example:
    title: Suspicious PowerShell Encoded Command Execution
    id: 3a4b5c6d-7e8f-9012-abcd-ef0123456789
    status: experimental
    description: |
        Detects PowerShell executing commands via the -EncodedCommand flag,
        a common technique to obfuscate malicious payloads.
    references:
        - https://attack.mitre.org/techniques/T1059/001/
    author: SOC Analyst Hub
    date: 2025-11-01
    modified: 2025-11-01
    tags:
        - attack.execution
        - attack.t1059.001
    logsource:
        category: process_creation
        product: windows
    detection:
        selection_img:
            Image|endswith: '\powershell.exe'
        selection_encoded:
            # Matching is case-insensitive, so one spelling of each flag is enough
            CommandLine|contains:
                - ' -e '
                - ' -en '
                - ' -enc '
                - ' -enco '
                - ' -encodedcommand '
        condition: selection_img and selection_encoded
    falsepositives:
        - Legitimate administrative scripts using encoded commands
        - Software deployment tools
    level: medium
Let’s break down each section.
Section Deep-Dive
logsource
The logsource block tells Sigma converters which event stream to query. There are two approaches:
Category-based (recommended):
    logsource:
        category: process_creation
        product: windows
Category-based logsources use Sigma’s generic log source and field names. process_creation on Windows maps to Sysmon Event ID 1 or Windows Security Event ID 4688, depending on the processing pipeline and backend configuration you convert with. This keeps the rule portable across environments.
Service-based (specific):
    logsource:
        product: windows
        service: sysmon
Use this when you need events from a specific log channel, for example when the events you are targeting don’t fit a standard category.
Common Logsource Categories
| Category | Typical Events |
|---|---|
| `process_creation` | Sysmon 1, Security 4688 |
| `network_connection` | Sysmon 3 |
| `file_event` | Sysmon 11 |
| `registry_event` | Sysmon 12, 13, 14 |
| `process_access` | Sysmon 10 |
| `image_load` | Sysmon 7 |
| `dns_query` | Sysmon 22 |
| `webserver` | IIS, Apache, Nginx access logs |
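As a rough illustration of what a processing pipeline does with a category-based logsource, the sketch below resolves a (product, category) pair to concrete event sources. `LOGSOURCE_MAP` and `resolve_logsource` are hypothetical names made up for this example; they are not part of sigma-cli or pySigma, and real pipelines also rename fields and handle far more cases.

```python
# Hypothetical lookup table mirroring part of the category table above.
# Each entry lists (log channel, event ID) pairs a pipeline might target.
LOGSOURCE_MAP = {
    ("windows", "process_creation"): [("sysmon", 1), ("security", 4688)],
    ("windows", "network_connection"): [("sysmon", 3)],
    ("windows", "file_event"): [("sysmon", 11)],
    ("windows", "dns_query"): [("sysmon", 22)],
}


def resolve_logsource(product, category, available_channels):
    """Return the (channel, event_id) pairs that exist in this environment."""
    candidates = LOGSOURCE_MAP.get((product, category))
    if candidates is None:
        raise KeyError(f"no mapping for {product}/{category}")
    return [(ch, eid) for ch, eid in candidates if ch in available_channels]


# The same rule resolves differently depending on what the environment collects.
print(resolve_logsource("windows", "process_creation", {"security"}))
print(resolve_logsource("windows", "process_creation", {"sysmon", "security"}))
```

The rule itself never changes; only the environment-specific mapping does, which is the portability argument for category-based logsources.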
detection
The detection block is where your logic lives. It has two components: named selections and a condition that combines them.
Simple AND (all conditions must match):
    detection:
        selection:
            Image|endswith: '\cmd.exe'
            CommandLine|contains: '/c whoami'
        condition: selection
OR within a field (list notation):
    detection:
        selection:
            CommandLine|contains:
                - 'mimikatz'
                - 'sekurlsa'
                - 'lsadump'
        condition: selection
AND within a field (contains|all, every listed value must be present):
    detection:
        selection:
            CommandLine|contains|all:
                - 'powershell'
                - '-nop'
                - 'iex'
        condition: selection
Multiple named selections with OR:
    detection:
        selection_a:
            Image|endswith: '\wscript.exe'
        selection_b:
            Image|endswith: '\cscript.exe'
        condition: selection_a or selection_b
NOT / negation:
    detection:
        selection:
            Image|endswith: '\psexec.exe'
        filter:
            Image|startswith: 'C:\Program Files\'
        condition: selection and not filter
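The boolean semantics of the patterns above can be sketched as a tiny matcher. This is a simplified illustration, not how any Sigma backend is implemented: `field_matches`, `selection_matches`, and `rule_fires` are hypothetical helpers, wildcards are ignored, and a missing field is treated as an empty string.

```python
def field_matches(event, key, expected):
    """Evaluate one 'Field|modifier|...' entry against an event dict."""
    field, *modifiers = key.split("|")
    # Sigma string comparisons are case-insensitive by default.
    actual = str(event.get(field, "")).lower()
    values = expected if isinstance(expected, list) else [expected]
    values = [str(v).lower() for v in values]

    if "contains" in modifiers:
        hits = [v in actual for v in values]
    elif "startswith" in modifiers:
        hits = [actual.startswith(v) for v in values]
    elif "endswith" in modifiers:
        hits = [actual.endswith(v) for v in values]
    else:
        hits = [actual == v for v in values]

    # A list of values is OR by default; the 'all' modifier turns it into AND.
    return all(hits) if "all" in modifiers else any(hits)


def selection_matches(event, selection):
    """Every field inside one named selection must match (AND)."""
    return all(field_matches(event, k, v) for k, v in selection.items())


selection = {"Image|endswith": "\\psexec.exe"}
filter_ = {"Image|startswith": "C:\\Program Files\\"}


def rule_fires(event):
    # Equivalent to: condition: selection and not filter
    return selection_matches(event, selection) and not selection_matches(event, filter_)


print(rule_fires({"Image": "C:\\Users\\bob\\Downloads\\PsExec.exe"}))
print(rule_fires({"Image": "C:\\Program Files\\Sysinternals\\PsExec.exe"}))
```

Expressing the condition as plain Python keeps the sketch short; a real converter parses the condition string and emits a query in the target language instead of evaluating events itself.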
Field Modifiers
Modifiers transform how field values are compared:
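| Modifier | Behavior |
|---|---|
| `contains` | Substring match |
| `startswith` | Prefix match |
| `endswith` | Suffix match |
| `contains\|all` | Substring match where every listed value must be present (AND instead of OR) |
| `re` | Regular expression |
| `base64offset\|contains` | Matches the value as it appears inside Base64-encoded data, at any of the three possible byte alignments |
| `windash` | Matches command-line flags written with either `-` or `/` as the prefix |
| `cidr` | CIDR notation IP range match |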
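The base64offset|contains modifier is particularly powerful for catching Base64-obfuscated PowerShell without having to enumerate every encoding variant. Base64 matching is case-sensitive, and -EncodedCommand payloads are Base64 of UTF-16LE text, so the utf16le modifier (alias: wide) goes in front of base64offset when you target them:
    detection:
        selection:
            CommandLine|utf16le|base64offset|contains:
                - 'IEX'
                - 'Invoke-Expression'
                - 'DownloadString'
        condition: selection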
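To see why three alignments are enough, here is a minimal sketch of the idea behind base64offset. It reflects my understanding of how the variants are derived and is not taken from any backend's source; `base64_offset_variants` and `appears_in_base64` are hypothetical helper names. Base64 encodes 3 bytes as 4 characters, so a keyword can start at byte offset 0, 1, or 2 within a group, and each case yields a different encoded fragment once the characters that depend on neighbouring bytes are trimmed.

```python
import base64

# Characters to drop at the start for a 0-, 1-, or 2-byte shift, and at the
# end depending on how many bytes spill into the final Base64 group.
START_OFFSETS = (0, 2, 3)
END_OFFSETS = (None, -3, -2)


def base64_offset_variants(value: bytes) -> list[str]:
    """Return the three Base64 fragments that can represent `value`."""
    variants = []
    for shift in range(3):
        encoded = base64.b64encode(b" " * shift + value).decode("ascii")
        end = END_OFFSETS[(len(value) + shift) % 3]
        variants.append(encoded[START_OFFSETS[shift]:end])
    return variants


def appears_in_base64(needle: bytes, encoded_text: str) -> bool:
    """True if `needle` occurs somewhere in the data behind `encoded_text`."""
    return any(v in encoded_text for v in base64_offset_variants(needle))


# PowerShell's -EncodedCommand uses UTF-16LE, so encode the keyword the same way.
payload = "Write-Host 1; IEX (New-Object Net.WebClient)".encode("utf-16-le")
blob = base64.b64encode(payload).decode("ascii")

print(base64_offset_variants("IEX".encode("utf-16-le")))
print(appears_in_base64("IEX".encode("utf-16-le"), blob))
```

Because the comparison runs on encoded text, it is case-sensitive: `iex` and `IEX` produce different fragments and need separate entries.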
Building Your First Rule: Step by Step
Step 1: Identify the technique. Start with MITRE ATT&CK. What’s the technique ID? What artifacts does it produce? For example, T1218.005 (mshta.exe abuse) produces process creation events with mshta.exe.
Step 2: Find representative log samples. Run the technique in a lab or review existing alert data. What does a real event look like? What fields are populated?
Step 3: Identify the minimum discriminating fields. What combination of fields separates malicious from benign? Usually 2–3 fields are enough. More isn’t always better: over-specific rules miss variations.
Step 4: Write the selection, then the filter. Write what you want to match, then add exclusions for known-good behavior.
Step 5: Set level and status honestly. experimental means you haven’t validated FP rates. test means you have. stable means it’s been validated in production.
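The steps above can be turned into a starting template. The sketch below builds a rule skeleton as a plain Python dict for the T1218.005 example; `new_rule_skeleton` is a hypothetical helper, the title is invented for illustration, and the dict would still need to be written out as YAML with a library of your choice.

```python
import uuid


def new_rule_skeleton(title, technique_id, category="process_creation", product="windows"):
    """Return a minimal rule structure to fill in during steps 2 to 4."""
    return {
        "title": title,
        "id": str(uuid.uuid4()),        # fresh UUID for every new rule
        "status": "experimental",       # step 5: not validated yet
        "tags": [f"attack.{technique_id.lower()}"],
        "logsource": {"category": category, "product": product},
        "detection": {
            "selection": {},            # step 4: what you want to match
            "filter": {},               # step 4: known-good exclusions
            "condition": "selection and not filter",
        },
    }


rule = new_rule_skeleton("Suspicious Mshta Execution", "T1218.005")
rule["detection"]["selection"] = {"Image|endswith": "\\mshta.exe"}
print(rule["tags"], rule["status"])
```

Starting from `experimental` and an explicit filter slot makes it harder to ship a rule that skipped validation or exclusions.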
Converting Rules with sigma-cli
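    # Install the CLI and the backends you need
    pip install sigma-cli
    pip install pySigma-backend-splunk pySigma-backend-elasticsearch
    # Convert to Splunk SPL (the Splunk backend expects a pipeline; this one maps Windows fields)
    sigma convert -t splunk -p splunk_windows rules/mydetection.yml
    # Convert to an Elasticsearch Lucene query using ECS field names
    sigma convert -t lucene -p ecs_windows rules/mydetection.yml
    # List the targets and pipelines your installed backends provide
    sigma list targets
    sigma list pipelines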
Testing Your Rules
Before deploying, test against a sample dataset:
- Positive test — does the rule fire on a known-malicious sample? Use Atomic Red Team to generate events.
- Negative test — run in your SIEM against 30 days of prod logs. How many alerts? If > 10/day, it needs tuning.
- Review false positives — for every FP source, add a specific exclusion. Document why.
A rule with 500 FPs/day teaches analysts to ignore it. A rule with 2 FPs/day that analysts investigate is ten times more valuable.
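The positive and negative tests can be summarized with a few numbers. The sketch below is one way to do that offline, assuming you have exported events as dicts; `evaluate_rule` and the sample events are hypothetical, and the 10 alerts/day threshold is the one suggested above.

```python
def evaluate_rule(rule_fires, malicious_events, benign_events, days, max_alerts_per_day=10):
    """Summarize detections on known-bad samples and alert volume on history."""
    detected = sum(1 for e in malicious_events if rule_fires(e))
    false_positives = [e for e in benign_events if rule_fires(e)]
    fp_per_day = len(false_positives) / days
    return {
        "detected": detected,
        "missed": len(malicious_events) - detected,
        "fp_per_day": fp_per_day,
        "needs_tuning": fp_per_day > max_alerts_per_day,
        "fp_samples": false_positives[:5],  # starting point for exclusions
    }


def fires(event):
    # Deliberately naive predicate standing in for a converted rule.
    return " -enc " in event["CommandLine"].lower()


malicious = [
    {"CommandLine": "powershell.exe -enc SQBFAFgA"},
    {"CommandLine": "powershell -EncodedCommand SQBFAFgA"},
]
benign = [{"CommandLine": "powershell.exe -enc ZQBjAGgAbwA="}] * 3 + [
    {"CommandLine": "cmd.exe /c dir"}
] * 100

report = evaluate_rule(fires, malicious, benign, days=30)
print(report["detected"], report["missed"], report["needs_tuning"])
```

A non-zero `missed` count is as important as the false-positive rate: here the naive predicate does not cover the long form of the flag.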
Common Mistakes to Avoid
- Case sensitivity: Sigma string comparisons are case-insensitive by default. Most backends preserve this. Don’t rely on case to differentiate malicious from benign.
- Over-broad logsource: Using `product: windows` without a category may generate queries across all Windows event logs, which is expensive.
- No filter for known-good: Always add filters for legitimate admin tools, signed binaries, and expected software paths.
- ID collisions: Always generate a UUID for the `id` field. Use `python -c "import uuid; print(uuid.uuid4())"`.
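Several of these mistakes can be caught mechanically before a rule reaches review. The sketch below is a minimal lint pass over rules that have already been loaded into dicts; `lint_rules` and its checks are hypothetical and intentionally crude (for instance, it only looks for the word `not` in the condition).

```python
import uuid
from collections import Counter


def lint_rules(rules):
    """Return a list of (rule title, problem) tuples."""
    problems = []
    id_counts = Counter(r.get("id") for r in rules)
    for rule in rules:
        title = rule.get("title", "<untitled>")
        rule_id = rule.get("id")
        try:
            uuid.UUID(str(rule_id))
        except ValueError:
            problems.append((title, "id is not a valid UUID"))
        else:
            if id_counts[rule_id] > 1:
                problems.append((title, "id is shared with another rule"))

        logsource = rule.get("logsource", {})
        if "category" not in logsource and "service" not in logsource:
            problems.append((title, "logsource has neither category nor service"))

        condition = rule.get("detection", {}).get("condition", "")
        if " not " not in f" {condition} ":
            problems.append((title, "condition has no exclusion filter"))
    return problems


rules = [
    {
        "title": "A",
        "id": "3a4b5c6d-7e8f-9012-abcd-ef0123456789",
        "logsource": {"product": "windows", "category": "process_creation"},
        "detection": {"condition": "selection and not filter"},
    },
    {
        "title": "B",
        "id": "3a4b5c6d-7e8f-9012-abcd-ef0123456789",  # copied from rule A
        "logsource": {"product": "windows"},
        "detection": {"condition": "selection"},
    },
]

for title, problem in lint_rules(rules):
    print(f"{title}: {problem}")
```

A check like this fits naturally into a pre-commit hook or CI job for a rule repository.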