Config module

This module contains all configuration model definitions.

DatasetConfig pydantic-model

Configuration model for the dataset definition.

This model controls the attributes of the dataset (e.g., name) currently being processed. These configuration values are set during the dataset preparation phase.


name: example-dataset
start: 2021-10-10T12:00
end: 2021-10-12T12:00

end: datetime pydantic-field required

The end time of the observation period.

name: str pydantic-field required

The name of the dataset. It is used, for example, as part of the Elasticsearch index.

start: datetime pydantic-field required

The start time of the observation period.
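
The three fields above can be illustrated with a small standard-library sketch. `DatasetConfigSketch` and `from_mapping` are hypothetical names used only for illustration; the real model is a pydantic model, and this stand-in only shows how the example values map to typed attributes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DatasetConfigSketch:
    """Stdlib stand-in for DatasetConfig (hypothetical, not the real model)."""

    name: str
    start: datetime
    end: datetime

    @classmethod
    def from_mapping(cls, raw: dict) -> "DatasetConfigSketch":
        # ISO 8601 strings such as "2021-10-10T12:00" parse with fromisoformat.
        return cls(
            name=raw["name"],
            start=datetime.fromisoformat(raw["start"]),
            end=datetime.fromisoformat(raw["end"]),
        )

    @property
    def observation_period(self) -> timedelta:
        # Length of the observation period covered by the dataset.
        return self.end - self.start


dataset = DatasetConfigSketch.from_mapping(
    {
        "name": "example-dataset",
        "start": "2021-10-10T12:00",
        "end": "2021-10-12T12:00",
    }
)
print(dataset.observation_period)  # 2 days, 0:00:00
```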

LogstashLogConfig pydantic-model

Configuration model for log files that are to be parsed.

This model is used to create a Logstash input configuration for raw dataset log files.


- type: kyoushi
  codec: json
  path: sm.log*
  save_parsed: false
  exclude:
    - '*.gz'
    - '*.zip'
  file_sort_direction: desc
  file_chunk_size: 320000

  tags:
    - statemachine
    - kyoushi
  add_field:
    '[@metadata][kyoushi][sm]': user

add_field: Any pydantic-field

A dict of fields to add to each log event.

codec: Union[str, Dict[str, Dict[str, Any]]] pydantic-field

The file codec to use for reading.

delimiter: str pydantic-field

The newline delimiter (does not work for compressed files).

exclude: Union[str, List[str]] pydantic-field

Glob/s to exclude from reading.

file_chunk_size: int pydantic-field

The size of the chunks to read from the file (in bytes). The default is 32 KB; set this to a higher value if your log file contains very long lines.

file_sort_direction: typing_extensions.Literal['asc', 'desc'] pydantic-field

The sort direction for multiple files.

path: Union[str, List[str]] pydantic-field required

The log file path/s to read.

save_parsed: bool pydantic-field

Whether this log should be saved to disk after parsing (overrides parser.save_parsed).

tags: str pydantic-field

The tags to assign to each log event for this log source.

type: str pydantic-field required

The type to tag the log input with.
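
Both path and exclude accept either a single glob or a list of globs. The sketch below shows one way to normalise such values and apply them to a list of file names. `as_list` and `select_files` are hypothetical helpers. Logstash does the real file matching, and Python's `fnmatch` is used here only as an approximation whose semantics may differ.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Union

Globs = Union[str, List[str]]


def as_list(value: Globs) -> List[str]:
    """Wrap a single glob in a list; copy an existing list."""
    return [value] if isinstance(value, str) else list(value)


def select_files(
    names: List[str], path: Globs, exclude: Optional[Globs] = None
) -> List[str]:
    """Keep names matching any path glob and no exclude glob."""
    includes = as_list(path)
    excludes = as_list(exclude) if exclude else []
    return [
        name
        for name in names
        if any(fnmatchcase(name, glob) for glob in includes)
        and not any(fnmatchcase(name, glob) for glob in excludes)
    ]


selected = select_files(
    ["sm.log", "sm.log.1", "sm.log.2.gz", "other.log"],
    path="sm.log*",
    exclude=["*.gz", "*.zip"],
)
print(selected)  # ['sm.log', 'sm.log.1']
```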

LogstashParserConfig pydantic-model

Configuration model defining the logstash parser settings.

This is used to configure how logstash is used as the dataset parser (e.g., the log level).


settings_dir: processing/logstash
conf_dir: processing/logstash/conf.d
log_level: debug
log_dir: processing/logstash/log
completed_log: processing/logstash/log/file-completed.log
data_dir: processing/logstash/data
parsed_dir: parsed
save_parsed: false

completed_log: Path pydantic-field

The logstash file input completed log (defaults to <log_dir>/file-completed.log).

conf_dir: Path pydantic-field

The path to the logstash pipeline config (defaults to <settings_dir>/conf.d)

data_dir: Path pydantic-field

The directory logstash should use for persistent data (e.g., sincedb).

log_dir: Path pydantic-field

The directory logstash should use for logging.

log_level: str pydantic-field

The log level to pass to the logstash CLI.

parsed_dir: Path pydantic-field

The directory in which parsed log files are saved when save_parsed=true for any log (defaults to <dataset>/parsed).

save_parsed: bool pydantic-field

Whether the log files should be saved to disk after parsing. Overridden by log.save_parsed.
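
The override relationship between the per-log and parser-wide save_parsed settings can be sketched as follows. `effective_save_parsed` is a hypothetical helper, and the sketch assumes that an unset per-log value is represented as `None`; the source does not show how the project resolves this.

```python
from typing import Optional


def effective_save_parsed(
    log_save_parsed: Optional[bool], parser_save_parsed: bool
) -> bool:
    """Return the per-log setting when it is set, else the parser-wide one."""
    # Assumption: None means "not configured for this log".
    if log_save_parsed is not None:
        return log_save_parsed
    return parser_save_parsed


print(effective_save_parsed(None, False))   # False: parser-wide value applies
print(effective_save_parsed(True, False))   # True: per-log value overrides
print(effective_save_parsed(False, True))   # False: per-log value overrides
```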

settings_dir: Path pydantic-field

The logstash settings directory containing the logstash.yml (use for path.settings).

default_completed_log(val, *, values, **kwargs) classmethod

Validator for setting default completed_log


Parameters:

Name    Type                    Description                      Default
val     Optional[pathlib.Path]  The completed_log config value.
values  Dict[str, Any]          The model attribute dict.

Returns:

Type  Description
Path  The completed_log path.

Source code in dataset/
@validator("completed_log", pre=True, always=True)
def default_completed_log(
    cls, val: Optional[Path], *, values: Dict[str, Any], **kwargs
) -> Path:
    """Validator for setting default completed_log

    Args:
        val: The completed_log config value.
        values: The model attribute dict.

    Returns:
        Path: The completed_log path.
    """
    return val or values["log_dir"].joinpath("file-completed.log")

default_conf_dir(val, *, values, **kwargs) classmethod

Validator for setting default conf_dir


Parameters:

Name    Type                    Description                 Default
val     Optional[pathlib.Path]  The conf_dir config value.
values  Dict[str, Any]          The model attribute dict.

Returns:

Type  Description
Path  The conf_dir path.

Source code in dataset/
@validator("conf_dir", pre=True, always=True)
def default_conf_dir(
    cls, val: Optional[Path], *, values: Dict[str, Any], **kwargs
) -> Path:
    """Validator for setting default conf_dir

    Args:
        val: The conf_dir config value.
        values: The model attribute dict.

    Returns:
        Path: The conf_dir path.
    """
    return val or values["settings_dir"].joinpath("conf.d")

default_parsed_dir(val, *, values, **kwargs) classmethod

Validator for setting default parsed_dir


Parameters:

Name    Type                    Description                   Default
val     Optional[pathlib.Path]  The parsed_dir config value.
values  Dict[str, Any]          The model attribute dict.

Returns:

Type  Description
Path  The parsed_dir path.

Source code in dataset/
@validator("parsed_dir", pre=True, always=True)
def default_parsed_dir(
    cls, val: Optional[Path], *, values: Dict[str, Any], **kwargs
) -> Path:
    """Validator for setting default parsed_dir

    Args:
        val: The parsed_dir config value.
        values: The model attribute dict.

    Returns:
        Path: The parsed_dir path.
    """
    return val or Path("parsed")
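
The fallback logic of the three validators above can be reproduced without pydantic. The sketch below is a plain-function approximation and not the project's code. `resolve_parser_paths` is a hypothetical name, and the input directories are taken from the example configuration.

```python
from pathlib import Path
from typing import Dict, Optional


def resolve_parser_paths(
    settings_dir: Path,
    log_dir: Path,
    conf_dir: Optional[Path] = None,
    completed_log: Optional[Path] = None,
    parsed_dir: Optional[Path] = None,
) -> Dict[str, Path]:
    """Apply the same `val or default` fallbacks as the validators."""
    return {
        "conf_dir": conf_dir or settings_dir.joinpath("conf.d"),
        "completed_log": completed_log or log_dir.joinpath("file-completed.log"),
        "parsed_dir": parsed_dir or Path("parsed"),
    }


paths = resolve_parser_paths(
    settings_dir=Path("processing/logstash"),
    log_dir=Path("processing/logstash/log"),
)
for key, value in paths.items():
    print(f"{key}: {value.as_posix()}")
```

Explicitly configured values pass through unchanged, because each fallback is only used when the corresponding value is unset.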

ProcessingConfig pydantic-model

Configuration model for the processing pipeline.

The pipeline configuration is split into three steps:

- pre-processing (pre_processors): List of Cyber Range Kyoushi processors executed before parsing the dataset.
- parsing (parser): Logstash parser configuration.
- post-processing (post_processors): List of Cyber Range Kyoushi processors executed after the dataset has been parsed.

parser: LogstashParserConfig pydantic-field

The logstash parser configuration.

post_processors: Dict[str, Any] pydantic-field

The processors to apply to the dataset after parsing and publishing the log data to elasticsearch.

pre_processors: Dict[str, Any] pydantic-field

The processors to apply to the dataset before parsing and publishing the log data to elasticsearch.

check_processor_required_fields(val) classmethod

Validator for ensuring that processors have name and type fields.


Parameters:

Name  Type            Description                   Default
val   Dict[str, Any]  Processor configuration dict

Returns:

Type            Description
Dict[str, Any]  Validated processor configuration dict

Source code in dataset/
@validator("pre_processors", "post_processors", each_item=True)
def check_processor_required_fields(cls, val: Dict[str, Any]) -> Dict[str, Any]:
    """Validator for ensuring that processors have `name` and `type` fields.

    Args:
        val: Processor configuration dict

    Returns:
        Validated processor configuration dict
    """
    assert "name" in val, "A processor must have a name"
    assert (
        "type" in val
    ), f"A processor must have a type, but {val['name']} has none"
    return val
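
To illustrate what this check accepts and rejects, the sketch below runs the same assertions as a plain function over a few processor dicts. `collect_errors` and the processor entries are hypothetical. In the real model, pydantic would report a failed assertion as a validation error instead of a raw `AssertionError`.

```python
from typing import Any, Dict, List


def check_processor_required_fields(val: Dict[str, Any]) -> Dict[str, Any]:
    """Same assertions as the validator, without the pydantic decorator."""
    assert "name" in val, "A processor must have a name"
    assert (
        "type" in val
    ), f"A processor must have a type, but {val['name']} has none"
    return val


def collect_errors(processors: List[Dict[str, Any]]) -> List[str]:
    """Run the check on each processor and gather the failure messages."""
    errors = []
    for processor in processors:
        try:
            check_processor_required_fields(processor)
        except AssertionError as err:
            errors.append(str(err))
    return errors


errors = collect_errors(
    [
        {"name": "Valid processor", "type": "example"},  # passes
        {"name": "No type"},  # fails: type missing
        {"type": "example"},  # fails: name missing
    ]
)
for message in errors:
    print(message)
```

The name check runs first, so the second message can safely interpolate `val['name']`.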