Labeling Rules¶

This page describes the labeling rules available in Cyber Range Kyoushi Dataset.

DSL Query Rule (`elasticsearch.query`)¶

Applies labels based on a simple Elasticsearch DSL query.

Examples:

- type: elasticsearch.query
  id: attacker.foothold.vpn.ip
  labels:
    - attacker_vpn
    - foothold
  description: >-
    This rule applies the labels to all openvpn log rows that have
    the attacker server as source ip and are within the foothold phase.
  index:
    - openvpn-vpn
  filter:
    range:
      "@timestamp":
        # foothold phase start
        gte: "2021-03-23 20:31:00+00:00"
        # foothold phase stop
        lte: "2021-03-23 21:13:52+00:00"
  query:
    - match:
        source.ip: '192.42.0.255'

`description: str` `pydantic-field` ¶

An optional description for the rule

`exclude: Union[List[Dict[str, Any]], Dict[str, Any]]` `pydantic-field` ¶

Similar to filters, but used to exclude results

`filter_: Union[List[Dict[str, Any]], Dict[str, Any]]` `pydantic-field` ¶

The filter(s) used to limit the queried documents to only those that match.

`id_: str` `pydantic-field` `required` ¶

The unique rule id

`index: Union[List[str], str]` `pydantic-field` ¶

The indices to query (by default prefixed with the dataset name)

`indices_prefix_dataset: bool` `pydantic-field` ¶

If set to true, the prefix `<DATASET.name>-` is automatically added to each index pattern. This is a convenience setting, since by default all dataset indices start with this prefix.
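
The prefixing behavior described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `resolve_indices` and its signature are hypothetical, and the default of `True` is inferred from the statement that indices are prefixed by default.

```python
from typing import List, Union


def resolve_indices(
    index: Union[List[str], str],
    dataset_name: str,
    indices_prefix_dataset: bool = True,  # assumed default, see lead-in
) -> List[str]:
    """Return the index patterns to query, optionally prefixed with '<dataset_name>-'."""
    # Accept a single pattern or a list of patterns, like the `index` field.
    patterns = [index] if isinstance(index, str) else list(index)
    if not indices_prefix_dataset:
        return patterns
    return [f"{dataset_name}-{pattern}" for pattern in patterns]


# "test_dataset" is a made-up dataset name used only for this example.
print(resolve_indices(["openvpn-vpn"], "test_dataset"))
# ['test_dataset-openvpn-vpn']
print(resolve_indices("openvpn-vpn", "test_dataset", indices_prefix_dataset=False))
# ['openvpn-vpn']
```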

`labels: str` `pydantic-field` `required` ¶

The list of labels to apply to log lines matching this rule

`query: Union[List[Dict[str, Any]], Dict[str, Any]]` `pydantic-field` `required` ¶

The query or queries used to identify the log lines to label.

`type_field: str` `pydantic-field` `required` ¶

The rule type as passed in from the config
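
To make the roles of `query`, `filter_` and `exclude` concrete, the sketch below combines them into a single Elasticsearch `bool` query. The mapping to `must`, `filter` and `must_not` is an assumption made for illustration; the helper names `as_list` and `build_bool_query` are hypothetical and the library may assemble its request differently.

```python
from typing import Any, Dict, List, Optional, Union

Clauses = Union[List[Dict[str, Any]], Dict[str, Any]]


def as_list(clauses: Optional[Clauses]) -> List[Dict[str, Any]]:
    """Normalize a single clause, a list of clauses, or None to a list."""
    if clauses is None:
        return []
    return [clauses] if isinstance(clauses, dict) else list(clauses)


def build_bool_query(
    query: Clauses,
    filter_: Optional[Clauses] = None,
    exclude: Optional[Clauses] = None,
) -> Dict[str, Any]:
    """Combine the rule fields into one bool query (assumed mapping)."""
    return {
        "bool": {
            "must": as_list(query),          # clauses that identify the rows
            "filter": as_list(filter_),      # clauses that narrow the rows
            "must_not": as_list(exclude),    # clauses that remove rows
        }
    }


# Values taken from the example rule above.
body = build_bool_query(
    query=[{"match": {"source.ip": "192.42.0.255"}}],
    filter_={
        "range": {
            "@timestamp": {
                "gte": "2021-03-23 20:31:00+00:00",
                "lte": "2021-03-23 21:13:52+00:00",
            }
        }
    },
)
print(body["bool"]["must"])
# [{'match': {'source.ip': '192.42.0.255'}}]
```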

EQL Sequence Rule (`elasticsearch.sequence`)¶

Applies labels to a sequence of log events defined by an EQL query.

This labeling rule is defined as an EQL query. Using this syntax it is possible to define a sequence of related events and retrieve them. All events part of retrieved sequences are then labeled.

Examples:

- type: elasticsearch.sequence
  id: attacker.webshell.upload.seq
  labels: [webshell_upload]
  description: >-
    This rule labels the web shell upload step by matching the 3 step sequence
    within the foothold phase.
  index:
    - apache_access-intranet_server
  # since we do these requests very fast
  # we need the line number as tie breaker
  tiebreaker_field: log.file.line
  by: source.address
  max_span: 2m
  filter:
    range:
      "@timestamp":
        # foothold phase start
        gte: "2021-03-23 20:31:00+00:00"
        # foothold phase stop
        lte: "2021-03-23 21:13:52+00:00"
  sequences:
    - '[ apache where event.action == "access" and url.original == "/" ]'
    - '[ apache where event.action == "access" and url.original == "/?p=5" ]'
    - '[ apache where event.action == "access" and http.request.method == "POST" and url.original == "/wp-admin/admin-ajax.php" ]'

`batch_size: int` `pydantic-field` ¶

The number of sequences to update in each batch. Cannot be larger than `max_result_window`.

`by: Union[List[str], str]` `pydantic-field` ¶

Optional global sequence by fields

`description: str` `pydantic-field` ¶

An optional description for the rule

`event_category_field: str` `pydantic-field` ¶

The field used to categorize events

`filter_: Union[List[Dict[str, Any]], Dict[str, Any]]` `pydantic-field` ¶

The filter(s) used to limit the queried documents to only those that match.

`id_: str` `pydantic-field` `required` ¶

The unique rule id

`index: Union[List[str], str]` `pydantic-field` ¶

The indices to query (by default prefixed with the dataset name)

`indices_prefix_dataset: bool` `pydantic-field` ¶

If set to true, the prefix `<DATASET.name>-` is automatically added to each index pattern. This is a convenience setting, since by default all dataset indices start with this prefix.

`labels: str` `pydantic-field` `required` ¶

The list of labels to apply to log lines matching this rule

`max_result_window: int` `pydantic-field` ¶

The max result window allowed on the elasticsearch instance

`max_span: str` `pydantic-field` ¶

Optional max time span in which a sequence must occur to be considered a match

`sequences: str` `pydantic-field` `required` ¶

Event sequences to search. Must contain at least two events.

`tiebreaker_field: str` `pydantic-field` ¶

(Optional, string) Field used to sort hits with the same timestamp in ascending order.

`timestamp_field: str` `pydantic-field` ¶

The field containing the event timestamp

`type_field: str` `pydantic-field` `required` ¶

The rule type as passed in from the config

`until: str` `pydantic-field` ¶

Optional until event marking the end of valid sequences. The until event will not be labeled.
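
The fields `by`, `max_span`, `sequences` and `until` correspond to parts of an EQL `sequence` query. The sketch below shows one way such a query string could be assembled; the function `build_eql_sequence` is hypothetical and only illustrates the EQL syntax, it is not taken from the library.

```python
from typing import List, Optional, Union


def build_eql_sequence(
    sequences: List[str],
    by: Union[List[str], str, None] = None,
    max_span: Optional[str] = None,
    until: Optional[str] = None,
) -> str:
    """Assemble an EQL sequence query from the rule fields."""
    if len(sequences) < 2:
        raise ValueError("a sequence must contain at least two events")
    head = "sequence"
    if by:
        # `by` may be a single field name or a list of field names.
        fields = [by] if isinstance(by, str) else list(by)
        head += " by " + ", ".join(fields)
    if max_span:
        head += f" with maxspan={max_span}"
    lines = [head] + [f"  {event}" for event in sequences]
    if until:
        lines.append(f"until {until}")
    return "\n".join(lines)


# Shortened version of the example rule above.
eql = build_eql_sequence(
    sequences=[
        '[ apache where event.action == "access" and url.original == "/" ]',
        '[ apache where event.action == "access" and url.original == "/?p=5" ]',
    ],
    by="source.address",
    max_span="2m",
)
print(eql.splitlines()[0])
# sequence by source.address with maxspan=2m
```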

DSL Sub Query Rule (`elasticsearch.sub_query`)¶

Labeling rule that labels the results of multiple sub queries.

This labeling rule first executes a base query to retrieve information. It then renders and executes a templated sub query for each row retrieved from the base query. The result rows of these dynamically generated sub queries are then labeled.

Note

The sub query uses Jinja2 syntax for templating. The information retrieved by the base query can be accessed through the HIT variable.

Examples:

- type: elasticsearch.sub_query
  id: attacker.foothold.apache.access_dropped
  labels:
    - attacker_http
    - foothold
  description: >-
    This rule tries to match attacker requests that we were unable to
    match to a labeled response with access log entries. Such cases can
    happen if the corresponding response gets lost in the network or
    otherwise is not sent.
  index:
    - pcap-attacker_0
  # obligatory match all
  query:
    - term:
        destination.ip: "172.16.0.217"
  filter:
    - term:
        event.category: http
    - term:
        event.action: request
    # we are looking for requests that have not been marked as attacker http yet
    # most likely they did not have a matching response due to some network error
    # or timeout
    - bool:
        must_not:
        - script:
            script:
                id: test_dataset_kyoushi_label_filter
                params:
                  labels: [attacker_http]
  sub_query:
    index:
      - apache_access-intranet_server
    query:
      - term:
          url.full: "{{ HIT.url.full }}"
      - term:
          source.address: "172.16.100.151"
    filter:
      - range:
          "@timestamp":
            # the access log entry should be after the request, but since the access log
            # does not have microseconds we drop them here as well
            gte: "{{ (HIT['@timestamp'] | as_datetime).replace(microsecond=0) }}"
            # the type of error we are looking for should create an access log entry almost immediately
            # so we keep the time frame short
            lte: "{{ ( HIT['@timestamp'] | as_datetime).replace(microsecond=0) + timedelta(seconds=1) }}"
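
The two templated range bounds in the sub query above amount to plain `datetime` arithmetic. The sketch below reproduces that arithmetic in Python, assuming that the `as_datetime` filter parses the timestamp string into a `datetime` object. The function name `sub_query_window` and the sample timestamp are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Tuple


def sub_query_window(hit_timestamp: str, seconds: int = 1) -> Tuple[str, str]:
    """Return the (gte, lte) bounds derived from a base query hit's timestamp."""
    parsed = datetime.fromisoformat(hit_timestamp)
    # The access log has no microseconds, so they are dropped here as well.
    start = parsed.replace(microsecond=0)
    # Keep the window short: the matching entry should appear almost immediately.
    end = start + timedelta(seconds=seconds)
    return str(start), str(end)


# Hypothetical timestamp inside the foothold phase.
print(sub_query_window("2021-03-23T20:45:12.734519+00:00"))
# ('2021-03-23 20:45:12+00:00', '2021-03-23 20:45:13+00:00')
```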

`description: str` `pydantic-field` ¶

An optional description for the rule

`exclude: Union[List[Dict[str, Any]], Dict[str, Any]]` `pydantic-field` ¶

Similar to filters, but used to exclude results

`filter_: Union[List[Dict[str, Any]], Dict[str, Any]]` `pydantic-field` ¶

The filter(s) used to limit the queried documents to only those that match.

`id_: str` `pydantic-field` `required` ¶

The unique rule id

`index: Union[List[str], str]` `pydantic-field` ¶

The indices to query (by default prefixed with the dataset name)

`indices_prefix_dataset: bool` `pydantic-field` ¶

If set to true, the prefix `<DATASET.name>-` is automatically added to each index pattern. This is a convenience setting, since by default all dataset indices start with this prefix.

`labels: str` `pydantic-field` `required` ¶

The list of labels to apply to log lines matching this rule

`query: Union[List[Dict[str, Any]], Dict[str, Any]]` `pydantic-field` `required` ¶

The query or queries used to identify the log lines to label.

`sub_query: QueryBase` `pydantic-field` `required` ¶

The templated sub query to use to apply the labels. Executed for each hit of the parent query.

`type_field: str` `pydantic-field` `required` ¶

The rule type as passed in from the config

DSL Parent Query Rule (`elasticsearch.parent_query`)¶

Applies the labels to all rows of a base query for which a parent query returns results.

This labeling rule first executes a base query to retrieve rows we might want to apply labels to. It then renders and executes a templated parent query for each retrieved row. The parent queries are then used to indicate whether the initial result row should be labeled. By default, result rows of the base query are labeled if the corresponding parent query returns at least one row. This minimum number is configurable, e.g., to require at least two results.

Note

The parent query uses Jinja2 syntax for templating. The information retrieved by the base query can be accessed through the HIT variable.

Examples:

- type: elasticsearch.parent_query
  id: attacker.foothold.apache.error_access
  labels:
    - attacker_http
    - foothold
  description: >-
    This rule looks for unlabeled error messages resulting from VPN server
    traffic within the attack time and tries to match it to an already labeled
    access log row.
  index:
    - apache_error-intranet_server
  query:
    match:
      source.address: "172.16.100.151"
  filter:
    # use script query to match only entries that
    # are not already tagged as attacker http in the foothold phase
    - bool:
        must_not:
        - script:
            script:
                id: test_dataset_kyoushi_label_filter
                params:
                  labels: [attacker_http]
  parent_query:
    index:
      - apache_access-intranet_server
    query:
      - term:
          url.full: "{{ HIT.url.full }}"
      - term:
          source.address: "{{ HIT.source.address }}"
    # we are looking for parents that are labeled as attacker http
      - bool:
          must:
            - script:
                script:
                  id: test_dataset_kyoushi_label_filter
                  params:
                    labels: [attacker_http]
    filter:
      - range:
          # parent must be within +-1s of potential child
          "@timestamp":
            gte: "{{ (HIT['@timestamp'] | as_datetime) - timedelta(seconds=1) }}"
            lte: "{{ ( HIT['@timestamp'] | as_datetime) + timedelta(seconds=1) }}"

`description: str` `pydantic-field` ¶

An optional description for the rule

`exclude: Union[List[Dict[str, Any]], Dict[str, Any]]` `pydantic-field` ¶

Similar to filters, but used to exclude results

`filter_: Union[List[Dict[str, Any]], Dict[str, Any]]` `pydantic-field` ¶

The filter(s) used to limit the queried documents to only those that match.

`id_: str` `pydantic-field` `required` ¶

The unique rule id

`index: Union[List[str], str]` `pydantic-field` ¶

The indices to query (by default prefixed with the dataset name)

`indices_prefix_dataset: bool` `pydantic-field` ¶

If set to true, the prefix `<DATASET.name>-` is automatically added to each index pattern. This is a convenience setting, since by default all dataset indices start with this prefix.

`labels: str` `pydantic-field` `required` ¶

The list of labels to apply to log lines matching this rule

`max_result_window: int` `pydantic-field` ¶

The max result window allowed on the elasticsearch instance

`min_match: int` `pydantic-field` ¶

The minimum number of parent matches needed for the main query to be labeled.

`parent_query: QueryBase` `pydantic-field` `required` ¶

The templated parent query to check if the labels should be applied to a query hit.

`query: Union[List[Dict[str, Any]], Dict[str, Any]]` `pydantic-field` `required` ¶

The query or queries used to identify the log lines to label.

`type_field: str` `pydantic-field` `required` ¶

The rule type as passed in from the config

`check_parent(self, parent_query, min_match, dataset_config, es)` ¶

Executes a parent query and returns whether it produced enough result rows.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `parent_query` | `QueryBase` | The parent query to execute | required |
| `min_match` | `int` | The minimum number of result rows required | required |
| `dataset_config` | `DatasetConfig` | The dataset configuration | required |
| `es` | `Elasticsearch` | The elasticsearch client object | required |

Returns:

| Type | Description |
| --- | --- |
| `bool` | `True` if the query returned >= `min_match` rows and `False` otherwise. |
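
The parent query decision can be sketched without an Elasticsearch instance. In the example below, the search backend and the template rendering are replaced by hypothetical stand-ins (`toy_search`, `toy_render`) operating on in-memory rows. The names `has_enough_parents` and `select_hits_to_label` are illustrative only and do not mirror the signature of `check_parent`.

```python
from typing import Any, Callable, Dict, List

Hit = Dict[str, Any]
Query = Dict[str, Any]
SearchFn = Callable[[Query], List[Hit]]
RenderFn = Callable[[Hit], Query]


def has_enough_parents(parent_query: Query, min_match: int, search: SearchFn) -> bool:
    """True if the parent query returned at least min_match rows."""
    return len(search(parent_query)) >= min_match


def select_hits_to_label(
    hits: List[Hit], render: RenderFn, search: SearchFn, min_match: int = 1
) -> List[Hit]:
    """Keep the base query hits whose rendered parent query has enough results."""
    return [h for h in hits if has_enough_parents(render(h), min_match, search)]


# Stand-in for already labeled access log rows (made-up data).
labeled_access_rows = [{"url.full": "/a"}, {"url.full": "/a"}, {"url.full": "/b"}]


def toy_search(query: Query) -> List[Hit]:
    """Stand-in for an Elasticsearch search: match rows on a single term."""
    wanted = query["term"]["url.full"]
    return [row for row in labeled_access_rows if row["url.full"] == wanted]


def toy_render(hit: Hit) -> Query:
    """Stand-in for rendering the templated parent query with the HIT variable."""
    return {"term": {"url.full": hit["url.full"]}}


error_rows = [{"url.full": "/a"}, {"url.full": "/b"}, {"url.full": "/c"}]
print(select_hits_to_label(error_rows, toy_render, toy_search))
# [{'url.full': '/a'}, {'url.full': '/b'}]
print(select_hits_to_label(error_rows, toy_render, toy_search, min_match=2))
# [{'url.full': '/a'}]
```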
