Writing Sigma Rules from Scratch: A Practical Guide

What Is Sigma and Why Should You Use It?

Sigma is an open, vendor-neutral rule format for log-based detections, maintained by SigmaHQ. Think of it as “the YARA of log-based detection”: a single rule format that converts to Splunk SPL, Elasticsearch queries (Lucene, EQL, ES|QL), Microsoft Sentinel (KQL), QRadar AQL, or any other platform that has a pySigma backend, via the sigma-cli converter.

The value proposition: write your detection logic once and convert it to any platform. No more rewriting the same detection for three different SIEMs when you join a new team.

Anatomy of a Sigma Rule

Every Sigma rule is a YAML file with a defined structure. Here’s a complete example:

title: Suspicious PowerShell Encoded Command Execution
id: 3a4b5c6d-7e8f-9012-abcd-ef0123456789
status: experimental
description: |
  Detects PowerShell executing commands via -EncodedCommand flag,
  a common technique to obfuscate malicious payloads.
references:
  - https://attack.mitre.org/techniques/T1059/001/
author: SOC Analyst Hub
date: 2025-11-01
modified: 2025-11-01
tags:
  - attack.execution
  - attack.t1059.001
logsource:
  category: process_creation
  product: windows
detection:
  selection_img:
    Image|endswith: '\powershell.exe'
  selection_encoded:
    # Matching is case-insensitive, so one casing per variant is enough
    CommandLine|contains:
      - ' -e '
      - ' -en '
      - ' -enc '
      - ' -enco '
      - ' -encodedcommand '
  # Require both the PowerShell image and an encoded-command flag
  condition: selection_img and selection_encoded
falsepositives:
  - Legitimate administrative scripts using encoded commands
  - Software deployment tools
level: medium

Let’s break down each section.

Section Deep-Dive

`logsource`

The logsource block tells Sigma converters which event stream to query. There are two approaches:

Category-based (recommended):

logsource:
  category: process_creation
  product: windows

Category-based logsources are abstract: the rule names a kind of event, and the processing pipeline you pass to the converter maps it to concrete logs and field names. process_creation on Windows maps to Sysmon Event ID 1 or Windows Security Event ID 4688, depending on which pipeline you use. This keeps the rule portable across environments.

Service-based (specific):

logsource:
  product: windows
  service: sysmon

Use this when you need events from a specific log channel, for example when you’re targeting events that don’t fit a standard category.

Common Logsource Categories

Category	Typical Events
`process_creation`	Sysmon 1, Security 4688
`network_connection`	Sysmon 3
`file_event`	Sysmon 11
`registry_event`	Sysmon 12, 13, 14
`process_access`	Sysmon 10
`image_load`	Sysmon 7
`dns_query`	Sysmon 22
`webserver`	IIS, Apache, Nginx access logs
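
To make the category idea concrete, here is a rough Python sketch of the lookup a processing pipeline performs for a category-based logsource. It is a simplified model, not the pySigma API: `CATEGORY_EVENT_IDS` and `resolve_logsource` are hypothetical names, only a few of the mappings listed above are included, and real pipelines also rename fields.

```python
# Simplified, hypothetical model of logsource resolution (not the pySigma API).
CATEGORY_EVENT_IDS = {
    "sysmon": {
        "process_creation": [1],
        "network_connection": [3],
        "file_event": [11],
        "dns_query": [22],
    },
    "security": {
        "process_creation": [4688],
    },
}


def resolve_logsource(category: str, source: str) -> list[int]:
    """Return the event IDs a Sigma category maps to for one log source."""
    try:
        return CATEGORY_EVENT_IDS[source][category]
    except KeyError:
        raise ValueError(f"no mapping for {category!r} in {source!r}") from None
```

The same rule therefore resolves to Event ID 1 in a Sysmon environment and to 4688 where only Security auditing is available, which is why the category form is the more portable one.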

`detection`

The detection block is where your logic lives. It has two components: named selections and a condition that combines them.

Simple AND (all conditions must match):

detection:
  selection:
    Image|endswith: '\cmd.exe'
    CommandLine|contains: '/c whoami'
  condition: selection

OR within a field (list notation):

detection:
  selection:
    CommandLine|contains:
      - 'mimikatz'
      - 'sekurlsa'
      - 'lsadump'
  condition: selection

AND across multiple values of one field (contains|all):

detection:
  selection:
    CommandLine|contains|all:
      - 'powershell'
      - '-nop'
      - 'iex'
  condition: selection

Multiple named selections with OR:

detection:
  selection_a:
    Image|endswith: '\wscript.exe'
  selection_b:
    Image|endswith: '\cscript.exe'
  condition: selection_a or selection_b

NOT / negation:

detection:
  selection:
    Image|endswith: '\psexec.exe'
  filter:
    Image|startswith: 'C:\Program Files\'
  condition: selection and not filter
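
The matching semantics shown in these snippets can be modelled in a few lines of Python. This is only an illustrative sketch: `match_value`, `match_selection` and `rule_matches` are hypothetical helpers, events are assumed to be flat dicts, and real backends compile rules into SIEM queries rather than evaluating events like this.

```python
# Hypothetical mini-evaluator for illustration; not how pySigma works internally.

def match_value(actual: str, modifier: str, expected: str) -> bool:
    """Compare one value case-insensitively, as Sigma does by default."""
    actual, expected = actual.lower(), expected.lower()
    if modifier == "contains":
        return expected in actual
    if modifier == "startswith":
        return actual.startswith(expected)
    if modifier == "endswith":
        return actual.endswith(expected)
    return actual == expected  # no modifier: exact match


def match_selection(event: dict, selection: dict) -> bool:
    """Every key must match (AND). List values are ORed unless 'all' is set."""
    for key, expected in selection.items():
        field, *mods = key.split("|")
        require_all = "all" in mods
        modifier = next((m for m in mods if m != "all"), "")
        values = expected if isinstance(expected, list) else [expected]
        actual = str(event.get(field, ""))
        hits = [match_value(actual, modifier, v) for v in values]
        if not (all(hits) if require_all else any(hits)):
            return False
    return True


selection = {"Image|endswith": "\\psexec.exe"}
filter_ = {"Image|startswith": "C:\\Program Files\\"}


def rule_matches(event: dict) -> bool:
    # condition: selection and not filter
    return match_selection(event, selection) and not match_selection(event, filter_)
```

Feeding it an event whose Image is under a user's Downloads folder matches, while the same binary under C:\Program Files\ is excluded by the filter.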

Field Modifiers

Modifiers transform how field values are compared:

Modifier	Behaviour
`contains`	Substring match
`startswith`	Prefix match
`endswith`	Suffix match
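`contains|all`	Substring match; every listed value must be present (AND)
`re`	Regular expression
`base64offset|contains`	Substring match on Base64-encoded data, covering all three byte alignments
`windash`	Match command-line flags written with either prefix (`-` or `/`)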
`cidr`	CIDR notation IP range match
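
As a rough illustration of the last two modifiers, the sketch below approximates their behaviour in Python. Both functions are hypothetical helpers, and `windash_variants` is deliberately simplified: it only rewrites the first space-prefixed dash and only produces the `-` and `/` forms.

```python
import ipaddress


def cidr_match(value: str, network: str) -> bool:
    """True if the IP address in `value` falls inside the CIDR `network`."""
    return ipaddress.ip_address(value) in ipaddress.ip_network(network)


def windash_variants(value: str) -> list[str]:
    """Expand a value containing ' -flag' into its dash and slash forms.

    Simplified assumption: only the first ' -' in the value is rewritten.
    """
    if " -" not in value:
        return [value]
    return [value, value.replace(" -", " /", 1)]
```

A converter does this expansion at conversion time, so the resulting query simply contains both variants ORed together.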

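The base64offset|contains modifier is particularly powerful for catching Base64-obfuscated PowerShell. A string encodes differently depending on where it starts within a 3-byte group, and the modifier generates all three alignments for you. Two caveats: Base64 matching is case-sensitive, and PowerShell’s -EncodedCommand payloads are UTF-16LE, so for those put the wide modifier first (CommandLine|wide|base64offset|contains).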

detection:
  selection:
    CommandLine|base64offset|contains:
      - 'IEX'
      - 'Invoke-Expression'
      - 'DownloadString'
  condition: selection

Building Your First Rule: Step by Step

Step 1: Identify the technique. Start with MITRE ATT&CK. What’s the technique ID? What artifacts does it produce? For T1218.005 (mshta.exe abuse), you’re looking for process creation events with mshta.exe.

Step 2: Find representative log samples. Run the technique in a lab or review existing alert data. What does a real event look like? What fields are populated?

Step 3: Identify the minimum discriminating fields. What combination of fields separates malicious from benign? Usually 2–3 fields are enough. Overly specific rules miss variations.

Step 4: Write the selection, then the filter. Write what you want to match, then add exclusions for known-good behaviour.

Step 5: Set level and status honestly. experimental means you haven’t validated FP rates. test means you have. stable means it’s been validated in production.
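
The steps above can be sketched as a rule skeleton plus a small sanity check. Everything here is illustrative: `mshta_rule` is a hypothetical rule for T1218.005 written as a plain Python dict (as it would look after loading the YAML), and `lint_rule` is a made-up helper that is stricter than the Sigma specification, which treats id, status and level as optional.

```python
import uuid

VALID_STATUS = {"stable", "test", "experimental", "deprecated", "unsupported"}
VALID_LEVEL = {"informational", "low", "medium", "high", "critical"}


def lint_rule(rule: dict) -> list[str]:
    """Return a list of problems found in a rule given as a plain dict."""
    problems = []
    for field in ("title", "logsource", "detection"):
        if field not in rule:
            problems.append(f"missing required field: {field}")
    if "condition" not in rule.get("detection", {}):
        problems.append("detection has no condition")
    try:
        uuid.UUID(str(rule.get("id", "")))
    except ValueError:
        problems.append("id is not a valid UUID")
    if rule.get("status") not in VALID_STATUS:
        problems.append(f"unknown status: {rule.get('status')!r}")
    if rule.get("level") not in VALID_LEVEL:
        problems.append(f"unknown level: {rule.get('level')!r}")
    return problems


# Hypothetical example rule: selection first, then a filter for known-good.
mshta_rule = {
    "title": "Mshta Execution With Remote URL",
    "id": str(uuid.uuid4()),  # generate once, then hard-code it in the rule
    "status": "experimental",  # FP rate not validated yet
    "level": "medium",
    "tags": ["attack.defense_evasion", "attack.t1218.005"],
    "logsource": {"category": "process_creation", "product": "windows"},
    "detection": {
        "selection": {
            "Image|endswith": "\\mshta.exe",
            "CommandLine|contains": ["http://", "https://"],
        },
        "condition": "selection",
    },
}
```

Running `lint_rule(mshta_rule)` returns an empty list. A check like this fits well in a pre-commit hook, before the rule ever reaches a converter.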

Converting Rules with sigma-cli

# Install the CLI and the backends you need
pip install sigma-cli
pip install pySigma-backend-splunk pySigma-backend-elasticsearch

# List the targets and pipelines available in your installation
sigma list targets
sigma list pipelines

# Convert to Splunk SPL, using a pipeline for field mappings
sigma convert -t splunk -p splunk_windows rules/mydetection.yml

# Convert to an Elasticsearch Lucene query with ECS field mappings
sigma convert -t lucene -p ecs_windows rules/mydetection.yml

Testing Your Rules

Before deploying, test against a sample dataset:

Positive test — does the rule fire on a known-malicious sample? Use Atomic Red Team to generate events.
Negative test — run in your SIEM against 30 days of prod logs. How many alerts? If > 10/day, it needs tuning.
Review false positives — for every FP source, add a specific exclusion. Document why.

A rule with 500 FPs/day teaches analysts to ignore it. A rule with 2 FPs/day that analysts actually investigate is far more valuable.
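
The negative test and FP review can be supported with a small script over exported alerts. The sketch below is an assumption-laden example: it uses the 30-day window and 10/day threshold mentioned above as defaults, assumes alerts are exported as a list of dicts, and the helper names and the ParentImage grouping field are hypothetical.

```python
from collections import Counter


def needs_tuning(alert_count: int, window_days: int = 30,
                 max_per_day: float = 10.0) -> bool:
    """True if the average daily alert volume exceeds the threshold."""
    return alert_count / window_days > max_per_day


def noisiest_sources(alerts: list[dict], field: str = "ParentImage",
                     top: int = 3) -> list[tuple[str, int]]:
    """Count alerts per value of `field` to find exclusion candidates."""
    counts = Counter(alert.get(field, "<missing>") for alert in alerts)
    return counts.most_common(top)
```

If one parent process accounts for most of the alerts, that is usually the first candidate for a documented filter.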

Common Mistakes to Avoid

Case sensitivity — Sigma string comparisons are case-insensitive by default. Most backends preserve this. Don’t rely on case to differentiate malicious from benign.
Over-broad logsource — using product: windows without a category may generate queries across all Windows event logs, which is expensive.
No filter for known-good — always add filters for legitimate admin tools, signed binaries, and expected software paths.
ID collisions — always generate a UUID for the id field. Use python -c "import uuid; print(uuid.uuid4())".
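
For the ID point in particular, a repository-wide check is easy to script. This is a minimal sketch with hypothetical helper names. It assumes you have already collected the id values from your rule files into a list.

```python
import uuid
from collections import Counter


def new_rule_id() -> str:
    """Generate a random (version 4) UUID for a new rule."""
    return str(uuid.uuid4())


def duplicate_ids(rule_ids: list[str]) -> list[str]:
    """Return IDs that occur more than once, compared case-insensitively."""
    counts = Counter(rule_id.lower() for rule_id in rule_ids)
    return sorted(rule_id for rule_id, n in counts.items() if n > 1)
```

Copy-pasting an existing rule as a template is the usual source of duplicates, so running a check like this in CI catches them early.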