Dataset Processors

On this page you can learn about the various dataset processors available with Cyber Range Kyoushi Dataset.

Util and Debug

Console Print (print)

Debug processor that simply prints a message.

Examples:

- name: Print Hello World
  type: print
  msg: Hello World

msg: str pydantic-field required

The message to print

File Manipulation

Create Directory (mkdir)

Processor for creating file directories.

Examples:

- name: Ensure processing config directory exists
  type: mkdir
  path: processing/config

path: Path pydantic-field required

The directory path to create

recursive: bool pydantic-field

If all missing parent directories should also be created
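
For deeply nested paths the recursive flag ensures that missing parents are created as well; a minimal sketch (the directory path shown is made up):

- name: Ensure nested rule output directory exists
  type: mkdir
  path: rules/generated/v1
  recursive: true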

GZip Decompress (gzip)

Processor for decompressing gzip files.

It is possible to either define a glob of gzip files or a path to a single gzip file (see the single-file sketch after the field list below). If a glob is defined, it is resolved relative to the defined path (default=<dataset dir>).

Examples:

- name: Decompress all GZIP logs
  type: gzip
  path: gather
  glob: "*/logs/**/*.gz"

glob: str pydantic-field

The file glob expression to use

path: Path pydantic-field

The base path to search for the gzipped files.
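
For the single-file variant, the glob is omitted and path points directly at one gzip archive; a minimal sketch (the file path is hypothetical):

- name: Decompress a single GZIP log
  type: gzip
  path: gather/intranet_server/logs/syslog.gz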

File Template (template)

Processor for rendering template files.

In addition to the normal processor context it is also possible to define a template_context. If template_context is defined, it is used for rendering the template; otherwise the normal processor context is used.

Examples:

- type: template
  name: Rendering labeling rule {{ item.src }}
  template_context:
    var_files:
      attacker: processing/config/attacker/attacker.yaml
      escalate: processing/config/attacker/escalate.yaml
      foothold: processing/config/attacker/foothold.yaml
      servers: processing/config/servers.yaml
  src: "processing/templates/rules/{{ item.src }}"
  dest: "rules/{{ item.dest }}"

dest: Path pydantic-field required

The destination to save the rendered file to

src: Path pydantic-field required

The template file to render

template_context: ProcessorContext pydantic-field

Optional template context; if this is not set, the processor context is used instead
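
When template_context is omitted, rendering falls back to the normal processor context; a minimal sketch (template and destination paths are made up):

- name: Render the servers overview
  type: template
  context:
    var_files:
      servers: processing/config/servers.yaml
  src: processing/templates/overview.md.j2
  dest: docs/overview.md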

File Trimming (trim)

Processor for trimming log files to a defined time frame.

This processor can be used to remove all log lines outside of defined dataset observation times.

Note

Currently only supports simple time frames with a single start and end time.

Examples:

- name: Trim server logs to observation time
  type: dataset.trim
  context:
    var_files:
      groups: processing/config/groups.yaml
  # we only want to trim the logs of servers that will be part
  # of the IDS dataset
  indices:
    - attacker_0-*

end: datetime pydantic-field

The end time to trim the logs to (defaults to dataset end)

exclude: str pydantic-field

Indices to exclude from trimming. This will overwrite/exclude indices matched by any of the patterns supplied in indices

indices: str pydantic-field

The log indices to trim (defaults to <dataset>-*)

indices_prefix_dataset: bool pydantic-field

If set to true the <DATASET.name>- prefix is automatically prepended to each pattern. This is a convenience setting, as by default all dataset indices start with this prefix.

start: datetime pydantic-field

The start time to trim the logs to (defaults to dataset start)
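
Explicit bounds and exclusions can be combined with the defaults above; a sketch assuming hypothetical index patterns and timestamps:

- name: Trim webserver logs to a fixed window
  type: dataset.trim
  indices:
    - webserver-*
  exclude:
    - webserver-error-*
  start: 2021-03-22T00:00:00
  end: 2021-03-26T00:00:00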

PCAP Conversion (pcap.elasticsearch)

Processor for converting PCAP files to ndjson format.

This processor uses tshark to convert PCAP files to a line based JSON format (ek output).

Examples:

- name: Convert attacker pcap to elasticsearch json
  type: pcap.elasticsearch
  pcap: gather/attacker_0/logs/ait.aecid.attacker.wpdiscuz/traffic.pcap
  dest: gather/attacker_0/logs/ait.aecid.attacker.wpdiscuz/traffic.json
  tls_keylog: gather/attacker_0/logs/ait.aecid.attacker.wpdiscuz/premaster.txt
  read_filter: "tcp or udp or icmp"

create_destination_dirs: bool pydantic-field

If the processor should create missing destination parent directories

dest: Path pydantic-field required

The destination file

force: bool pydantic-field

If the conversion should be run even when the destination file already exists.

packet_details: bool pydantic-field

If the packet details should be included. When packet_summary=False, details are always included (-V option).

packet_summary: bool pydantic-field

If the packet summaries should be included (-P option).

pcap: FilePath pydantic-field required

The pcap file to convert

protocol_match_filter: str pydantic-field

Display filter for protocols and their fields (-J option). Parent and child nodes are included for all matches; lower-level protocols must be added explicitly.

protocol_match_filter_parent: str pydantic-field

Display filter for protocols and their fields. Only parent nodes are included (-j option).

read_filter: str pydantic-field

The read filter to use when reading the pcap file; useful to reduce the number of packets (-Y option)

remove_filtered: bool pydantic-field

Remove filtered fields from the event dicts.

remove_index_messages: bool pydantic-field

If the elasticsearch bulk API index messages should be stripped from the output file. Useful when using logstash or similar instead of the bulk API.

tls_keylog: FilePath pydantic-field

TLS keylog file to decrypt TLS on the fly.

tshark_bin: FilePath pydantic-field

Path to your tshark binary (searches in common paths if not supplied)
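
For Logstash-based ingestion the bulk API index action lines can be stripped from the output; a sketch combining several of the options above (all paths are hypothetical):

- name: Convert intranet pcap for logstash ingestion
  type: pcap.elasticsearch
  pcap: gather/intranet_server/logs/traffic.pcap
  dest: gather/intranet_server/logs/traffic.json
  packet_summary: true
  packet_details: true
  remove_index_messages: true
  force: true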

Elasticsearch

Elasticsearch ingest pipeline (elasticsearch.ingest)

Processor for creating Elasticsearch ingest pipelines.

This processor can be used to create Elasticsearch ingest pipelines for parsing log events. The log file parsing can then be configured to use the pipelines for upstream parsing instead of local Logstash parsing.

Examples:

- name: Add auditd ingest pipeline to elasticsearch
  type: elasticsearch.ingest
  ingest_pipeline: processing/logstash/auditd-ingest.yml
  ingest_pipeline_id: auditd-logs

ingest_pipeline: FilePath pydantic-field required

The ingest pipeline to add to elasticsearch

ingest_pipeline_id: str pydantic-field required

The id to use for the ingest pipeline

Elasticsearch Index Template (elasticsearch.template)

Processor for configuring Elasticsearch index templates.

This processor can be used to configure Elasticsearch index templates to prepare the Elasticsearch instance for the parsing phase. See the index templates doc for more details.

Examples:

- name: Add pcap index mapping
  type: elasticsearch.template
  template: processing/logstash/pcap-index-template.json
  template_name: pcap
  index_patterns: ["pcap-*"]

composed_of: str pydantic-field

Optional list of component templates the index template should be composed of.

create_only: bool pydantic-field

If true then an existing template with the given name will not be replaced.

index_patterns: str pydantic-field

The index patterns the template should be applied to. If this is not set then the index template file must contain this information already!

indices_prefix_dataset: bool pydantic-field

If set to true the <DATASET.name>- prefix is automatically prepended to each pattern. This is a convenience setting, as by default all dataset indices start with this prefix.

priority: int pydantic-field

The priority to assign to this index template (higher values take precedence).

template: FilePath pydantic-field required

The index template to add to elasticsearch

template_name: str pydantic-field required

The name to use for the index template
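
An index template can also be composed of previously created component templates (see the next section); a sketch assuming a component template named pcap already exists:

- name: Add pcap index template built from components
  type: elasticsearch.template
  template: processing/logstash/pcap-index-template.json
  template_name: pcap
  index_patterns: ["pcap-*"]
  composed_of: ["pcap"]
  priority: 200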

Elasticsearch Index Component Template (elasticsearch.component_template)

Processor for creating Elasticsearch index component templates.

This processor can be used to create Elasticsearch index component templates to prepare the Elasticsearch instance for the parsing phase. See the index templates doc for more details.

Examples:

- name: Add pcap component template
  type: elasticsearch.component_template
  template: processing/logstash/pcap-component-template.json
  template_name: pcap

create_only: bool pydantic-field

If true then an existing template with the given name will not be replaced.

template: FilePath pydantic-field required

The index component template to add to elasticsearch

template_name: str pydantic-field required

The name to use for the index component template

Elasticsearch Legacy Index Template (elasticsearch.legacy_template)

Processor for configuring Elasticsearch legacy index templates.

This processor can be used to configure Elasticsearch legacy index templates to prepare the Elasticsearch instance for the parsing phase. See the legacy index templates doc for more details.

Examples:

- name: Add pcap index mapping
  type: elasticsearch.legacy_template
  template: processing/logstash/pcap-index-template.json
  template_name: pcap
  index_patterns: ["pcap-*"]

create_only: bool pydantic-field

If true then an existing template with the given name will not be replaced.

index_patterns: str pydantic-field

The index patterns the template should be applied to. If this is not set then the index template file must contain this information already!

indices_prefix_dataset: bool pydantic-field

If set to true the <DATASET.name>- prefix is automatically prepended to each pattern. This is a convenience setting, as by default all dataset indices start with this prefix.

order: int pydantic-field

The order to assign to this index template (higher values take precedence).

template: FilePath pydantic-field required

The index template to add to elasticsearch

template_name: str pydantic-field required

The name to use for the index template

Logstash

Logstash setup (logstash.setup)

Logstash parser setup processor.

This processor is used to create all the configuration files required for the Logstash parser (e.g., input and filter configs). Unless you provide a static Logstash parsing configuration, you must invoke this processor at some point during the pre-processing phase.

Note

The processor only does the basic setup; any Logstash parsing filters used for processing specific log events must be prepared separately.

Examples:

- name: Setup logstash pipeline
  type: logstash.setup
  context:
    var_files:
      servers: processing/config/servers.yaml
  servers: "{{ servers }}"

index_template_template: Path pydantic-field

The template to use for the Elasticsearch index template covering the dataset index patterns

input_config_name: str pydantic-field

The name of the log inputs config file (relative to the pipeline config dir)

input_template: Path pydantic-field

The template to use for the file input plugin configuration

legacy_index_template_template: Path pydantic-field

The template to use for the Elasticsearch legacy index template covering the dataset index patterns

logstash_template: Path pydantic-field

The template to use for the logstash configuration

output_config_name: str pydantic-field

The name of the log outputs config file (relative to the pipeline config dir)

output_template: Path pydantic-field

The template to use for the file output plugin configuration

piplines_template: Path pydantic-field

The template to use for the logstash pipelines configuration

pre_process_name: str pydantic-field

The file name to use for the pre process filters config. This is prefixed with 0000_ to ensure that the filters are run first.

pre_process_template: Path pydantic-field

The template to use for the pre process filters configuration

servers: Any pydantic-field required

Dictionary of servers and their log configurations

use_legacy_template: bool pydantic-field

If the output config should use the legacy index template or the modern index template
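
The generated file names and templates can be overridden via the fields above; a minimal sketch (the override values shown are illustrative only):

- name: Setup logstash pipeline with custom config names
  type: logstash.setup
  context:
    var_files:
      servers: processing/config/servers.yaml
  servers: "{{ servers }}"
  input_config_name: 0100_file_inputs.conf
  output_config_name: 9900_outputs.conf
  use_legacy_template: false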

Data Flow and Logic

ForEach Loop (foreach)

For each processor

This is a special processor container allowing for the dynamic creation of a list of processors based on a list of items.

Examples:

- name: Render labeling rules
  type: foreach
  # processing/templates/rules
  items:
    - src: 0_auth.yaml.j2
      dest: 0_auth.yaml
    - src: apache.yaml.j2
      dest: apache.yaml
    - src: audit.yaml.j2
      dest: audit.yaml
    - src: openvpn.yaml.j2
      dest: openvpn.yaml
  processor:
    type: template
    name: Rendering labeling rule {{ item.src }}
    template_context:
      var_files:
        attacker: processing/config/attacker/attacker.yaml
        escalate: processing/config/attacker/escalate.yaml
        foothold: processing/config/attacker/foothold.yaml
        servers: processing/config/servers.yaml
    src: "processing/templates/rules/{{ item.src }}"
    dest: "rules/{{ item.dest }}"

items: Any pydantic-field required

List of items to create processors for

loop_var: str pydantic-field

The variable name to use for the current loop item in the processor context

processor: Any pydantic-field required

The processor template config to create multiple instances of
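
The loop variable defaults to item; loop_var can rename it, e.g. for readability or nested loops; a minimal sketch (server names and template paths are made up):

- name: Render per-server reports
  type: foreach
  loop_var: server
  items:
    - intranet_server
    - webserver
  processor:
    type: template
    name: Rendering report for {{ server }}
    src: processing/templates/report.md.j2
    dest: "reports/{{ server }}.md"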