Dataset Processors¶
On this page you can learn about the various dataset processors available with Cyber Range Kyoushi Dataset.
Util and Debug¶
Console Print (print)¶
Debug processor that simply prints a message.
Examples:
- name: Print Hello World
  type: print
  msg: Hello World
msg: str (pydantic-field, required)
The message to print
File Manipulation¶
Create Directory (mkdir)¶
GZip Decompress (gzip)¶
Processor for decompressing gzip files.
It is possible to either define a glob of gzip files or a path to a single gzip file. If a glob is defined, it is resolved relative to the defined path (default=<dataset dir>).
Examples:
- name: Decompress all GZIP logs
  type: gzip
  path: gather
  glob: "*/logs/**/*.gz"
File Template (template)¶
Processor for rendering template files.
In addition to the normal processor context, it is also possible to define a template_context. If template_context is defined, it is used for rendering the template; otherwise the normal processor context is used.
Examples:
- type: template
  name: Rendering labeling rule {{ item.src }}
  template_context:
    var_files:
      attacker: processing/config/attacker/attacker.yaml
      escalate: processing/config/attacker/escalate.yaml
      foothold: processing/config/attacker/foothold.yaml
      servers: processing/config/servers.yaml
  src: "processing/templates/rules/{{ item.src }}"
  dest: "rules/{{ item.dest }}"
File Trimming (trim)¶
Processor for trimming log files to a defined time frame.
This processor can be used to remove all log lines outside of defined dataset observation times.
Note
Currently only simple time frames with a single start and end time are supported.
Examples:
- name: Trim server logs to observation time
  type: dataset.trim
  context:
    var_files:
      groups: processing/config/groups.yaml
  # we only want to trim the logs of servers that will be part
  # of the IDS dataset
  indices:
    - attacker_0-*
end: datetime (pydantic-field)
The end time to trim the logs to (defaults to dataset end)
exclude: str (pydantic-field)
Indices to exclude from trimming. This will overwrite/exclude indices from any patterns supplied in indices.
indices: str (pydantic-field)
The log indices to trim (defaults to <dataset>-*)
indices_prefix_dataset: bool (pydantic-field)
If set to true, <DATASET.name>- is automatically prefixed to each pattern. This is a convenience setting, as by default all dataset indices start with this prefix.
start: datetime (pydantic-field)
The start time to trim the logs to (defaults to dataset start)
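The observation window can also be given explicitly via start and end instead of relying on the dataset defaults. A minimal, hedged sketch (the timestamps are illustrative):
- name: Trim logs to a fixed time frame
  type: dataset.trim
  start: 2022-01-20T00:00:00
  end: 2022-01-25T00:00:00
  indices:
    - attacker_0-*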
PCAP Conversion (pcap.elasticsearch)¶
Processor for converting PCAP files to ndjson format.
This processor uses tshark to convert PCAP files to a line-based JSON format (ek output).
Examples:
- name: Convert attacker pcap to elasticsearch json
  type: pcap.elasticsearch
  pcap: gather/attacker_0/logs/ait.aecid.attacker.wpdiscuz/traffic.pcap
  dest: gather/attacker_0/logs/ait.aecid.attacker.wpdiscuz/traffic.json
  tls_keylog: gather/attacker_0/logs/ait.aecid.attacker.wpdiscuz/premaster.txt
  read_filter: "tcp or udp or icmp"
create_destination_dirs: bool (pydantic-field)
If the processor should create missing destination parent directories
dest: Path (pydantic-field, required)
The destination file
force: bool (pydantic-field)
If the pcap should be converted even when the destination file already exists.
packet_details: bool (pydantic-field)
If the packet details should be included; when packet_summary=False, details are always included (-V option).
packet_summary: bool (pydantic-field)
If the packet summaries should be included (-P option).
pcap: FilePath (pydantic-field, required)
The pcap file to convert
protocol_match_filter: str (pydantic-field)
Display filter for protocols and their fields (-J option). Parent and child nodes are included for all matches; lower-level protocols must be added explicitly.
protocol_match_filter_parent: str (pydantic-field)
Display filter for protocols and their fields. Only parent nodes are included (-j option).
read_filter: str (pydantic-field)
The read filter to use when reading the pcap file; useful to reduce the number of packets (-Y option).
remove_filtered: bool (pydantic-field)
Remove filtered fields from the event dicts.
remove_index_messages: bool (pydantic-field)
If the Elasticsearch bulk API index messages should be stripped from the output file. Useful when using Logstash or similar instead of the bulk API.
tls_keylog: FilePath (pydantic-field)
TLS keylog file to decrypt TLS on the fly.
tshark_bin: FilePath (pydantic-field)
Path to your tshark binary (searches common paths if not supplied)
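As a further illustration, a hedged sketch of a conversion prepared for Logstash ingestion rather than the Elasticsearch bulk API; the options come from the field list above, while the host name and paths are illustrative:
- name: Convert server pcap for logstash ingestion
  type: pcap.elasticsearch
  pcap: gather/server_0/logs/traffic.pcap
  dest: gather/server_0/logs/traffic.json
  packet_summary: false
  packet_details: true
  remove_index_messages: true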
Elasticsearch¶
Elasticsearch Ingest Pipeline (elasticsearch.ingest)¶
Processor for creating Elasticsearch ingest pipelines.
This processor can be used to create Elasticsearch ingest pipelines for parsing log events. The log file parsing can then be configured to use the pipelines for upstream parsing instead of local Logstash parsing.
Examples:
- name: Add auditd ingest pipeline to elasticsearch
  type: elasticsearch.ingest
  ingest_pipeline: processing/logstash/auditd-ingest.yml
  ingest_pipeline_id: auditd-logs
Elasticsearch Index Template (elasticsearch.template)¶
Processor for configuring Elasticsearch index templates.
This processor can be used to configure Elasticsearch index templates to prepare the Elasticsearch instance for the parsing phase. See the index templates doc for more details.
Examples:
- name: Add pcap index mapping
  type: elasticsearch.template
  template: processing/logstash/pcap-index-template.json
  template_name: pcap
  index_patterns: ["pcap-*"]
composed_of: str (pydantic-field)
Optional list of component templates the index template should be composed of.
create_only: bool (pydantic-field)
If true, an existing template with the given name will not be replaced.
index_patterns: str (pydantic-field)
The index patterns the template should be applied to. If this is not set, the index template file must already contain this information!
indices_prefix_dataset: bool (pydantic-field)
If set to true, <DATASET.name>- is automatically prefixed to each pattern. This is a convenience setting, as by default all dataset indices start with this prefix.
priority: int (pydantic-field)
The priority to assign to this index template (higher values take precedence).
template: FilePath (pydantic-field, required)
The index template to add to Elasticsearch
template_name: str (pydantic-field, required)
The name to use for the index template
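A hedged sketch combining the optional fields above with the component template from the next section; the composed_of entry and priority value are illustrative:
- name: Add pcap index template composed of the pcap component template
  type: elasticsearch.template
  template: processing/logstash/pcap-index-template.json
  template_name: pcap
  index_patterns: ["pcap-*"]
  composed_of: ["pcap"]
  priority: 200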
Elasticsearch Index Component Template (elasticsearch.component_template)¶
Processor for creating Elasticsearch index component templates.
This processor can be used to create Elasticsearch index component templates to prepare the Elasticsearch instance for the parsing phase. See the index templates doc for more details.
Examples:
- name: Add pcap component template
  type: elasticsearch.component_template
  template: processing/logstash/pcap-component-template.json
  template_name: pcap
create_only: bool (pydantic-field)
If true, an existing template with the given name will not be replaced.
template: FilePath (pydantic-field, required)
The index component template to add to Elasticsearch
template_name: str (pydantic-field, required)
The name to use for the index component template
Elasticsearch Legacy Index Template (elasticsearch.legacy_template)¶
Processor for configuring Elasticsearch legacy index templates.
This processor can be used to configure Elasticsearch legacy index templates to prepare the Elasticsearch instance for the parsing phase. See the legacy index templates doc for more details.
Examples:
- name: Add pcap index mapping
  type: elasticsearch.legacy_template
  template: processing/logstash/pcap-index-template.json
  template_name: pcap
  index_patterns: ["pcap-*"]
create_only: bool (pydantic-field)
If true, an existing template with the given name will not be replaced.
index_patterns: str (pydantic-field)
The index patterns the template should be applied to. If this is not set, the index template file must already contain this information!
indices_prefix_dataset: bool (pydantic-field)
If set to true, <DATASET.name>- is automatically prefixed to each pattern. This is a convenience setting, as by default all dataset indices start with this prefix.
order: int (pydantic-field)
The order to assign to this index template (higher values take precedence).
template: FilePath (pydantic-field, required)
The index template to add to Elasticsearch
template_name: str (pydantic-field, required)
The name to use for the index template
Logstash¶
Logstash Setup (logstash.setup)¶
Logstash parser setup processor.
This processor is used to create all the configuration files required for the Logstash parser (e.g., input and filter configs). Unless you provide a static Logstash parsing configuration, you must invoke this processor at some point during the pre-processing phase.
Note
The processor only does the basic setup; any Logstash parsing filters used for processing specific log events must be prepared separately.
Examples:
- name: Setup logstash pipeline
  type: logstash.setup
  context:
    var_files:
      servers: processing/config/servers.yaml
  servers: "{{ servers }}"
index_template_template: Path (pydantic-field)
The template to use for the Elasticsearch dataset index patterns index template
input_config_name: str (pydantic-field)
The name of the log inputs config file (relative to the pipeline config dir)
input_template: Path (pydantic-field)
The template to use for the file input plugin configuration
legacy_index_template_template: Path (pydantic-field)
The template to use for the Elasticsearch dataset legacy index patterns index template
logstash_template: Path (pydantic-field)
The template to use for the logstash configuration
output_config_name: str (pydantic-field)
The name of the log outputs config file (relative to the pipeline config dir)
output_template: Path (pydantic-field)
The template to use for the file output plugin configuration
piplines_template: Path (pydantic-field)
The template to use for the logstash pipelines configuration
pre_process_name: str (pydantic-field)
The file name to use for the pre-process filters config. This is prefixed with 0000_ to ensure that the filters run first.
pre_process_template: Path (pydantic-field)
The template to use for the pre-process filters configuration
servers: Any (pydantic-field, required)
Dictionary of servers and their log configurations
use_legacy_template: bool (pydantic-field)
If the output config should use the legacy index template or the modern index template
Data Flow and Logic¶
ForEach Loop (foreach)¶
For-each processor.
This is a special processor container allowing for the dynamic creation of a list of processors based on a list of items.
Examples:
- name: Render labeling rules
  type: foreach
  # processing/templates/rules
  items:
    - src: 0_auth.yaml.j2
      dest: 0_auth.yaml
    - src: apache.yaml.j2
      dest: apache.yaml
    - src: audit.yaml.j2
      dest: audit.yaml
    - src: openvpn.yaml.j2
      dest: openvpn.yaml
  processor:
    type: template
    name: Rendering labeling rule {{ item.src }}
    template_context:
      var_files:
        attacker: processing/config/attacker/attacker.yaml
        escalate: processing/config/attacker/escalate.yaml
        foothold: processing/config/attacker/foothold.yaml
        servers: processing/config/servers.yaml
    src: "processing/templates/rules/{{ item.src }}"
    dest: "rules/{{ item.dest }}"