Dataset Processors

On this page you can learn about the various dataset processors available with Cyber Range Kyoushi Dataset.

Util and Debug

Console Print (print)

Debug processor that simply prints a message.


- name: Print Hello World
  type: print
  msg: Hello World

msg: str pydantic-field required

The message to print

File Manipulation

Create Directory (mkdir)

Processor for creating file directories.


- name: Ensure processing config directory exists
  type: mkdir
  path: processing/config

path: Path pydantic-field required

The directory path to create

recursive: bool pydantic-field

If all missing parent directories should also be created
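
As a minimal sketch of the recursive option, the entry below asks for a nested directory whose parents may not exist yet. The directory path is a hypothetical example, and recursive is set explicitly because its default is not stated here.

```yaml
# Sketch only: the nested path is a made-up example.
- name: Ensure nested rules config directory exists
  type: mkdir
  path: processing/config/rules/generated
  # also create processing/config/rules if it is missing
  recursive: true
```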

GZip Decompress (gzip)

Processor for decompressing gzip files.

It is possible to either define a glob of gzip files or a path to a single gzip file. If a glob is defined it is resolved relative to the defined path (default=<dataset dir>).


- name: Decompress all GZIP logs
  type: gzip
  path: gather
  glob: "*/logs/**/*.gz"

glob: str pydantic-field

The file glob expression to use

path: Path pydantic-field

The base path to search for the gzipped files.
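
The single-file variant described above might look like the following sketch. It assumes that omitting glob makes the processor treat path as the gzip file itself; the file path is a hypothetical example.

```yaml
# Sketch only: assumes path points directly at one gzip file
# when no glob is given. The file name is hypothetical.
- name: Decompress a single GZIP log
  type: gzip
  path: gather/server_0/logs/syslog.1.gz
```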

File Template (template)

Processor for rendering template files.

In addition to the normal processor context it is also possible to define a template_context. If template_context is defined it will be used for rendering the template otherwise the normal processor context will be used.


- type: template
  name: Rendering labeling rule {{ item.src }}
  template_context:
    var_files:
      attacker: processing/config/attacker/attacker.yaml
      escalate: processing/config/attacker/escalate.yaml
      foothold: processing/config/attacker/foothold.yaml
      servers: processing/config/servers.yaml
  src: "processing/templates/rules/{{ item.src }}"
  dest: "rules/{{ item.dest }}"

dest: Path pydantic-field required

The destination to save the rendered file to

src: Path pydantic-field required

The template file to render

template_context: ProcessorContext pydantic-field

Optional template context. If this is not set, the processor context is used instead.

File Trimming (trim)

Processor for trimming log files to a defined time frame.

This processor can be used to remove all log lines outside of defined dataset observation times.


Currently, only simple time frames with a single start and end time are supported.


- name: Trim server logs to observation time
  type: dataset.trim
    groups: processing/config/groups.yaml
  # we only want to trim the logs of servers that will be part
  # of the IDS dataset
    - attacker_0-*

end: datetime pydantic-field

The end time to trim the logs to (defaults to dataset end)

exclude: str pydantic-field

Indices to exclude from trimming. These exclusions override any patterns supplied in indices.

indices: str pydantic-field

The log indices to trim (defaults to <dataset>-*)

indices_prefix_dataset: bool pydantic-field

If set to true, the <dataset>- prefix is automatically added to each pattern. This is a convenience setting, as by default all dataset indices start with this prefix.

start: datetime pydantic-field

The start time to trim the logs to (defaults to dataset start)
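
To illustrate the start, end and exclude fields together, here is a hedged sketch that trims to an explicit time window instead of the dataset defaults. The timestamps are hypothetical, and writing exclude as a list is an assumption based on the list item in the example above.

```yaml
# Sketch only: timestamps are made up; list syntax for
# exclude is assumed, not confirmed by the field type.
- name: Trim logs to a fixed observation window
  type: dataset.trim
  start: "2021-01-01T08:00:00+00:00"
  end: "2021-01-01T18:00:00+00:00"
  # leave the attacker indices untouched
  exclude:
    - attacker_0-*
```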

PCAP Conversion (pcap.elasticsearch)

Processor for converting PCAP files to ndjson format.

This processor uses tshark to convert PCAP files to a line based JSON format (ek output).


- name: Convert attacker pcap to elasticsearch json
  type: pcap.elasticsearch
  pcap: gather/attacker_0/logs/ait.aecid.attacker.wpdiscuz/traffic.pcap
  dest: gather/attacker_0/logs/ait.aecid.attacker.wpdiscuz/traffic.json
  tls_keylog: gather/attacker_0/logs/ait.aecid.attacker.wpdiscuz/premaster.txt
  read_filter: "tcp or udp or icmp"

create_destination_dirs: bool pydantic-field

If the processor should create missing destination parent directories

dest: Path pydantic-field required

The destination file

force: bool pydantic-field

If the pcap should be converted even when the destination file already exists.

packet_details: bool pydantic-field

If the packet details should be included (-V option). When packet_summary=False, details are always included.

packet_summary: bool pydantic-field

If the packet summaries should be included (-P option).

pcap: FilePath pydantic-field required

The pcap file to convert

protocol_match_filter: str pydantic-field

Display filter for protocols and their fields (-J option). Parent and child nodes are included for all matches; lower-level protocols must be added explicitly.

protocol_match_filter_parent: str pydantic-field

Display filter for protocols and their fields. Only parent nodes are included (-j option).

read_filter: str pydantic-field

The read filter to use when reading the pcap file (-Y option). Useful for reducing the number of packets.

remove_filtered: bool pydantic-field

Remove filtered fields from the event dicts.

remove_index_messages: bool pydantic-field

If the elasticsearch bulk API index messages should be stripped from the output file. Useful when using logstash or similar instead of the bulk API.

tls_keylog: FilePath pydantic-field

TLS keylog file to decrypt TLS on the fly.

tshark_bin: FilePath pydantic-field

Path to your tshark binary (searches in common paths if not supplied)
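
When the converted file is meant to be read by Logstash rather than sent to the bulk API, a configuration along these lines may be useful. This is a sketch using only the fields documented above; the destination path is a hypothetical example.

```yaml
# Sketch only: the dest path is a made-up example.
- name: Convert attacker pcap for Logstash ingestion
  type: pcap.elasticsearch
  pcap: gather/attacker_0/logs/ait.aecid.attacker.wpdiscuz/traffic.pcap
  dest: processing/pcap/attacker_0/traffic.json
  # create processing/pcap/attacker_0 if it does not exist yet
  create_destination_dirs: true
  # drop the bulk API index lines so only events remain
  remove_index_messages: true
  # reduce the number of packets that get converted
  read_filter: "tcp or udp or icmp"
```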


Elasticsearch ingest pipeline (elasticsearch.ingest)

Processor for creating Elasticsearch ingest pipelines.

This processor can be used to create Elasticsearch ingest pipelines for parsing log events. The log file parsing can then be configured to use the pipelines for upstream parsing instead of local Logstash parsing.


- name: Add auditd ingest pipeline to elasticsearch
  type: elasticsearch.ingest
  ingest_pipeline: processing/logstash/auditd-ingest.yml
  ingest_pipeline_id: auditd-logs

ingest_pipeline: FilePath pydantic-field required

The ingest pipeline to add to elasticsearch

ingest_pipeline_id: str pydantic-field required

The id to use for the ingest pipeline

Elasticsearch Index Template (elasticsearch.template)

Processor for configuring Elasticsearch index templates.

This processor can be used to configure Elasticsearch index templates in order to prepare the Elasticsearch instance for the parsing phase. See the index templates doc for more details.


- name: Add pcap index mapping
  type: elasticsearch.template
  template: processing/logstash/pcap-index-template.json
  template_name: pcap
  index_patterns: ["pcap-*"]

composed_of: str pydantic-field

Optional list of component templates the index template should be composed of.

create_only: bool pydantic-field

If true then an existing template with the given name will not be replaced.

index_patterns: str pydantic-field

The index patterns the template should be applied to. If this is not set then the index template file must contain this information already!

indices_prefix_dataset: bool pydantic-field

If set to true, the <dataset>- prefix is automatically added to each pattern. This is a convenience setting, as by default all dataset indices start with this prefix.

priority: int pydantic-field

The priority to assign to this index template (higher values take precedence).

template: FilePath pydantic-field required

The index template to add to elasticsearch

template_name: str pydantic-field required

The name to use for the index template

Elasticsearch Index Component Template (elasticsearch.component_template)

Processor for creating Elasticsearch index component templates.

This processor can be used to create Elasticsearch index component templates in order to prepare the Elasticsearch instance for the parsing phase. See the index templates doc for more details.


- name: Add pcap component template
  type: elasticsearch.component_template
  template: processing/logstash/pcap-component-template.json
  template_name: pcap

create_only: bool pydantic-field

If true then an existing template with the given name will not be replaced.

template: FilePath pydantic-field required

The index component template to add to elasticsearch

template_name: str pydantic-field required

The name to use for the index component template
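
The index template and component template processors can plausibly be combined as sketched below: the component template is created first, then referenced from the index template through composed_of. The priority value is a hypothetical example, and writing composed_of as a list is an assumption based on its description as a list of component templates.

```yaml
# Sketch only: priority value and list syntax for composed_of
# are assumptions, not taken from the reference above.
- name: Add pcap component template
  type: elasticsearch.component_template
  template: processing/logstash/pcap-component-template.json
  template_name: pcap

- name: Add pcap index template built from the component
  type: elasticsearch.template
  template: processing/logstash/pcap-index-template.json
  template_name: pcap
  index_patterns: ["pcap-*"]
  composed_of: ["pcap"]
  priority: 200
  # keep an already existing template with this name
  create_only: true
```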

Elasticsearch Legacy Index Template (elasticsearch.legacy_template)

Processor for configuring Elasticsearch legacy index templates.

This processor can be used to configure Elasticsearch legacy index templates in order to prepare the Elasticsearch instance for the parsing phase. See the legacy index templates doc for more details.


- name: Add pcap index mapping
  type: elasticsearch.legacy_template
  template: processing/logstash/pcap-index-template.json
  template_name: pcap
  index_patterns: ["pcap-*"]

create_only: bool pydantic-field

If true then an existing template with the given name will not be replaced.

index_patterns: str pydantic-field

The index patterns the template should be applied to. If this is not set then the index template file must contain this information already!

indices_prefix_dataset: bool pydantic-field

If set to true, the <dataset>- prefix is automatically added to each pattern. This is a convenience setting, as by default all dataset indices start with this prefix.

order: int pydantic-field

The order to assign to this index template (higher values take precedence).

template: FilePath pydantic-field required

The index template to add to elasticsearch

template_name: str pydantic-field required

The name to use for the index template


Logstash setup (logstash.setup)

Logstash parser setup processor.

This processor is used to create all the configuration files required for the Logstash parser (e.g., input and filter configs). Unless you provide a static Logstash parsing configuration, you must invoke this processor at some point during the pre-processing phase.


The processor only does the basic setup. Any Logstash parsing filters used for processing specific log events must be prepared separately.


- name: Setup logstash pipeline
  type: logstash.setup
  context:
    var_files:
      servers: processing/config/servers.yaml
  servers: "{{ servers }}"

index_template_template: Path pydantic-field

The template to use for the elasticsearch dataset index patterns index template

input_config_name: str pydantic-field

The name of the log inputs config file. (relative to the pipeline config dir)

input_template: Path pydantic-field

The template to use for the file input plugin configuration

legacy_index_template_template: Path pydantic-field

The template to use for the elasticsearch dataset legacy index patterns index template

logstash_template: Path pydantic-field

The template to use for the logstash configuration

output_config_name: str pydantic-field

The name of the log outputs config file. (relative to the pipeline config dir)

output_template: Path pydantic-field

The template to use for the file output plugin configuration

piplines_template: Path pydantic-field

The template to use for the logstash pipelines configuration

pre_process_name: str pydantic-field

The file name to use for the pre process filters config. This is prefixed with 0000_ to ensure that the filters are run first.

pre_process_template: Path pydantic-field

The template to use for the pre process filters configuration

servers: Any pydantic-field required

Dictionary of servers and their log configurations

use_legacy_template: bool pydantic-field

If the output config should use the legacy index template or the modern index template

Data Flow and Logic

ForEach Loop (foreach)

For each processor

This is a special processor container allowing for the dynamic creation of a list of processors based on a list of items.


- name: Render labeling rules
  type: foreach
  # processing/templates/rules
  items:
    - src: 0_auth.yaml.j2
      dest: 0_auth.yaml
    - src: apache.yaml.j2
      dest: apache.yaml
    - src: audit.yaml.j2
      dest: audit.yaml
    - src: openvpn.yaml.j2
      dest: openvpn.yaml
  processor:
    type: template
    name: Rendering labeling rule {{ item.src }}
    template_context:
      var_files:
        attacker: processing/config/attacker/attacker.yaml
        escalate: processing/config/attacker/escalate.yaml
        foothold: processing/config/attacker/foothold.yaml
        servers: processing/config/servers.yaml
    src: "processing/templates/rules/{{ item.src }}"
    dest: "rules/{{ item.dest }}"

items: Any pydantic-field required

List of items to create processors for

loop_var: str pydantic-field

The variable name to use for the current loop item in the processor context

processor: Any pydantic-field required

The processor template config to create multiple instances of
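
As a small illustration of loop_var, the sketch below renames the loop variable from the default item to server and creates one directory per entry. The server names and paths are hypothetical, and the example assumes simple string items are accepted in the same way as the mappings used above.

```yaml
# Sketch only: server names and paths are made up.
- name: Create per-server config directories
  type: foreach
  items:
    - server_0
    - server_1
  # expose the current item as {{ server }} instead of {{ item }}
  loop_var: server
  processor:
    type: mkdir
    name: Create config directory for {{ server }}
    path: "processing/config/{{ server }}"
    recursive: true
```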