Config module¶
This module contains all configuration model definitions.
DatasetConfig
pydantic-model
¶
Configuration model for the dataset definition.
This model controls the attributes of the dataset (e.g., name) currently being processed. These configuration values are set during the dataset preparation phase.
Examples:
name: example-dataset
start: 2021-10-10T12:00
end: 2021-10-12T12:00
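The following is a minimal sketch of loading such a definition with pydantic and PyYAML; the import path cr_kyoushi.dataset.config is an assumption (the source lives in dataset/config.py), as is start/end being parsed into datetime values.

import yaml
from cr_kyoushi.dataset.config import DatasetConfig  # hypothetical import path

raw = """
name: example-dataset
start: 2021-10-10T12:00
end: 2021-10-12T12:00
"""

# pydantic validates the fields and (assuming start/end are datetime
# fields) converts the ISO strings into datetime objects.
dataset = DatasetConfig(**yaml.safe_load(raw))
print(dataset.name, dataset.start, dataset.end)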
LogstashLogConfig
pydantic-model
¶
Configuration model for log files that are to be parsed.
This model is used to create a Logstash input configuration for raw dataset log files.
Examples:
- type: kyoushi
codec: json
path: sm.log*
save_parsed: false
exclude:
- "*.gz"
- "*.zip"
file_sort_direction: desc
file_chunk_size: 320000
delimiter:
tags:
- statemachine
- kyoushi
add_field:
'[@metadata][kyoushi][sm]': user
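A minimal sketch of loading the example above into the model (note that the glob patterns need quoting in YAML, and the per-log save_parsed flag overrides parser.save_parsed); the import path is an assumption.

import yaml
from cr_kyoushi.dataset.config import LogstashLogConfig  # hypothetical import path

raw = """
type: kyoushi
codec: json
path: sm.log*
save_parsed: false
exclude:
  - "*.gz"
  - "*.zip"
file_sort_direction: desc
file_chunk_size: 320000
tags:
  - statemachine
  - kyoushi
add_field:
  "[@metadata][kyoushi][sm]": user
"""

log = LogstashLogConfig(**yaml.safe_load(raw))
print(log.type, log.path, log.exclude)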
add_field: Any
pydantic-field
¶
A dict of fields to add to each log event.
codec: Union[str, Dict[str, Dict[str, Any]]]
pydantic-field
¶
The file codec to use for reading.
delimiter: str
pydantic-field
¶
The newline delimiter (does not work for compressed files).
exclude: Union[str, List[str]]
pydantic-field
¶
Glob pattern(s) to exclude from reading.
file_chunk_size: int
pydantic-field
¶
The size of the chunks to read from the file (in bytes). The default is 32 KB; set this to a higher value if your log file contains very long lines.
file_sort_direction: typing_extensions.Literal['asc', 'desc']
pydantic-field
¶
The sort direction for multiple files.
path: Union[str, List[str]]
pydantic-field
required
¶
The log file path(s) to read.
save_parsed: bool
pydantic-field
¶
Whether this log should be saved to disk after parsing (overrides parser.save_parsed).
tags: str
pydantic-field
¶
The tags to assign to each log event for this log source.
type: str
pydantic-field
required
¶
The type to tag the log input with.
LogstashParserConfig
pydantic-model
¶
Configuration model defining the logstash parser settings.
This is used to configure how Logstash is used as the dataset parser (e.g., the log level).
Examples:
settings_dir: processing/logstash
conf_dir: processing/logstash/conf.d
log_level: debug
log_dir: processing/logstash/log
completed_log: processing/logstash/log/file-completed.log
data_dir: processing/logstash/data
parsed_dir: parsed
save_parsed: false
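A minimal sketch showing how the default_* validators documented below fill in paths that are left unset; the import path is an assumption, and the remaining fields are assumed to have defaults.

from pathlib import Path
from cr_kyoushi.dataset.config import LogstashParserConfig  # hypothetical import path

# only the settings and log directories are given explicitly
parser = LogstashParserConfig(
    settings_dir=Path("processing/logstash"),
    log_dir=Path("processing/logstash/log"),
)

print(parser.conf_dir)       # processing/logstash/conf.d
print(parser.completed_log)  # processing/logstash/log/file-completed.log
print(parser.parsed_dir)     # parsed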
completed_log: Path
pydantic-field
¶
The logstash file input completed log (defaults to <log_dir>/file-completed.log).
conf_dir: Path
pydantic-field
¶
The path to the logstash pipeline config (defaults to <settings_dir>/conf.d).
data_dir: Path
pydantic-field
¶
The directory logstash should use for persistent data (e.g., sincedb).
log_dir: Path
pydantic-field
¶
The directory logstash should use for logging.
log_level: str
pydantic-field
¶
The log level to pass to the logstash CLI.
parsed_dir: Path
pydantic-field
¶
The directory to save the parsed log files in, when save_parsed=true is set for any log (defaults to <dataset>/parsed).
save_parsed: bool
pydantic-field
¶
Whether the log files should be saved to disk after parsing. Can be overridden per log via log.save_parsed.
settings_dir: Path
pydantic-field
¶
The logstash settings directory containing the logstash.yml (used for path.settings).
default_completed_log(val, *, values, **kwargs)
classmethod
¶
Validator for setting default completed_log
Parameters:

Name | Type | Description | Default |
---|---|---|---|
val | Optional[pathlib.Path] | The completed_log config value. | required |
values | Dict[str, Any] | The model attribute dict. | required |

Returns:

Type | Description |
---|---|
Path | The completed_log path. |
Source code in dataset/config.py
@validator("completed_log", pre=True, always=True)
def default_completed_log(
    cls, val: Optional[Path], *, values: Dict[str, Any], **kwargs
) -> Path:
    """Validator for setting default completed_log

    Args:
        val: The completed_log config value.
        values: The model attribute dict.

    Returns:
        Path: The completed_log path.
    """
    return val or values["log_dir"].joinpath("file-completed.log")
default_conf_dir(val, *, values, **kwargs)
classmethod
¶
Validator for setting default conf_dir
Parameters:

Name | Type | Description | Default |
---|---|---|---|
val | Optional[pathlib.Path] | The conf_dir config value. | required |
values | Dict[str, Any] | The model attribute dict. | required |

Returns:

Type | Description |
---|---|
Path | The conf_dir path. |
Source code in dataset/config.py
@validator("conf_dir", pre=True, always=True)
def default_conf_dir(
    cls, val: Optional[Path], *, values: Dict[str, Any], **kwargs
) -> Path:
    """Validator for setting default conf_dir

    Args:
        val: The conf_dir config value.
        values: The model attribute dict.

    Returns:
        Path: The conf_dir path.
    """
    return val or values["settings_dir"].joinpath("conf.d")
default_parsed_dir(val, *, values, **kwargs)
classmethod
¶
Validator for setting default parsed_dir
Parameters:

Name | Type | Description | Default |
---|---|---|---|
val | Optional[pathlib.Path] | The parsed_dir config value. | required |
values | Dict[str, Any] | The model attribute dict. | required |

Returns:

Type | Description |
---|---|
Path | The parsed_dir path. |
Source code in dataset/config.py
@validator("parsed_dir", pre=True, always=True)
def default_parsed_dir(
    cls, val: Optional[Path], *, values: Dict[str, Any], **kwargs
) -> Path:
    """Validator for setting default parsed_dir

    Args:
        val: The parsed_dir config value.
        values: The model attribute dict.

    Returns:
        Path: The parsed_dir path.
    """
    return val or Path("parsed")
ProcessingConfig
pydantic-model
¶
Configuration model for the processing pipeline.
The pipeline configuration is split into three steps:
- pre-processing (pre_processors): List of Cyber Range Kyoushi processors executed before parsing the dataset.
- parsing (parser): The Logstash parser configuration.
- post-processing (post_processors): List of Cyber Range Kyoushi processors executed after the dataset has been parsed.
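As an illustration, a processing section might look like the structure below once the dataset YAML has been loaded into Python; the processor names and types are hypothetical, and each processor entry must carry at least a name and a type (see the validator below).

processing = {
    "pre_processors": [
        # hypothetical processor entry; "name" and "type" are mandatory
        {"name": "decompress server logs", "type": "gzip"},
    ],
    "parser": {
        "settings_dir": "processing/logstash",
        "log_level": "info",
    },
    "post_processors": [
        # also hypothetical; runs after parsing and publishing
        {"name": "trim dataset to observation window", "type": "trim"},
    ],
}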
parser: LogstashParserConfig
pydantic-field
¶
The logstash parser configuration.
post_processors: Dict[str, Any]
pydantic-field
¶
The processors to apply to the dataset after parsing and publishing the log data to elasticsearch.
pre_processors: Dict[str, Any]
pydantic-field
¶
The processors to apply to the dataset before parsing and publishing the log data to elasticsearch.
check_processor_required_fields(val)
classmethod
¶
Validator for ensuring that processors have name and type fields.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
val | Dict[str, Any] | Processor configuration dict | required |

Returns:

Type | Description |
---|---|
Dict[str, Any] | Validated processor configuration dict |
Source code in dataset/config.py
@validator("pre_processors", "post_processors", each_item=True)
def check_processor_required_fields(cls, val: Dict[str, Any]) -> Dict[str, Any]:
    """Validator for ensuring that processors have `name` and `type` fields.

    Args:
        val: Processor configuration dict

    Returns:
        Validated processor configuration dict
    """
    assert "name" in val, "A processor must have a name"
    assert (
        "type" in val
    ), f"A processor must have a type, but {val['name']} has none"
    return val
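A brief sketch of the behaviour this validator enforces, assuming pre_processors accepts a list of processor dicts as described above and that the model is importable from the hypothetical path used in the earlier examples.

from pydantic import ValidationError
from cr_kyoushi.dataset.config import ProcessingConfig  # hypothetical import path

try:
    # the processor entry below is missing its "type" field
    ProcessingConfig(pre_processors=[{"name": "broken processor"}])
except ValidationError as err:
    print(err)  # pydantic reports the failed validation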