Pcap module

This module defines utility functions for working with and converting PCAP files.

convert_pcap_to_ecs(pcap, dest, tls_keylog=None, tshark_bin=None, remove_index_messages=False, remove_filtered=False, packet_summary=False, packet_details=True, read_filter=None, protocol_match_filter=None, protocol_match_filter_parent=None)

Converts the given pcap file into an Elasticsearch-compatible format.

Calling convert_pcap_to_ecs(pcap, dest) is equivalent to

tshark -r pcap -T ek > dest

Note

See https://www.wireshark.org/docs/man-pages/tshark.html#j-protocol-match-filter for details on the match filters and other options.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pcap | Path | The pcap file to convert. | required |
| dest | Path | The destination file. | required |
| tls_keylog | Optional[pathlib.Path] | TLS keylog file used to decrypt TLS on the fly. | None |
| tshark_bin | Optional[pathlib.Path] | Path to your tshark binary (common paths are searched if not supplied). | None |
| remove_index_messages | bool | Whether the Elasticsearch bulk API index messages should be stripped from the output file. Useful when using Logstash or similar instead of the bulk API. | False |
| remove_filtered | bool | Whether to remove filtered fields from the event dicts. | False |
| packet_summary | bool | Whether the packet summaries should be included (-P option). | False |
| packet_details | bool | Whether the packet details should be included (-V option). When packet_summary=False, details are always included. | True |
| read_filter | Optional[str] | The read filter to use when reading the pcap file; useful to reduce the number of packets (-Y option). | None |
| protocol_match_filter | Optional[str] | Display filter for protocols and their fields (-J option). Parent and child nodes are included for all matches; lower-level protocols must be added explicitly. | None |
| protocol_match_filter_parent | Optional[str] | Display filter for protocols and their fields (-j option). Only parent nodes are included. | None |
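
To make the effect of remove_index_messages more concrete, the following is a minimal sketch of how bulk API index lines can be dropped from a stream of `ek` lines. It reuses the regular expression from the source code of this function; the helper name `strip_index_lines` and the sample lines are assumptions made for illustration and are not real tshark output.

```python
import re

# Same pattern as in the source code of convert_pcap_to_ecs:
# it matches the bulk API action lines that precede each event.
INDEX_LINE = re.compile(r'{"index":{"_index":".*","_type":".*"}}')


def strip_index_lines(lines):
    """Yield only the lines that are not bulk API index actions.

    Hypothetical helper, used here for illustration only.
    """
    for line in lines:
        if not INDEX_LINE.match(line):
            yield line


# Illustrative sample lines (assumed shape, not captured from tshark).
sample = [
    '{"index":{"_index":"packets-2021-01-01","_type":"doc"}}\n',
    '{"timestamp":"1609459200000","layers":{"frame":{}}}\n',
]
events = list(strip_index_lines(sample))
```

With the index lines removed, each remaining line is a standalone JSON event, which is the shape that line-oriented shippers such as Logstash typically expect.
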
Source code in dataset/pcap.py
def convert_pcap_to_ecs(
    pcap: Path,
    dest: Path,
    tls_keylog: Optional[Path] = None,
    tshark_bin: Optional[Path] = None,
    remove_index_messages: bool = False,
    remove_filtered: bool = False,
    packet_summary: bool = False,
    packet_details: bool = True,
    read_filter: Optional[str] = None,
    protocol_match_filter: Optional[str] = None,
    protocol_match_filter_parent: Optional[str] = None,
):
    """Converts the given pcap file into an Elasticsearch-compatible format.

    Calling `convert_pcap_to_ecs(pcap, dest)` is equivalent to
    ```
    tshark -r pcap -T ek > dest
    ```

    !!! Note
        See https://www.wireshark.org/docs/man-pages/tshark.html#j-protocol-match-filter
        for details on the match filters and other options.

    Args:
        pcap: The pcap file to convert
        dest: The destination file
        tls_keylog: TLS keylog file to decrypt TLS on the fly.
        tshark_bin: Path to your tshark binary (searches in common paths if not supplied)
        remove_index_messages: If the elasticsearch bulk API index messages should be stripped from the output file.
                               Useful when using logstash or similar instead of the bulk API.
        remove_filtered: Remove filtered fields from the event dicts.
        packet_summary: If the packet summaries should be included (-P option).
        packet_details: If the packet details should be included (-V option).
                        When packet_summary=False, details are always included.
        read_filter: The read filter to use when reading the pcap file; useful to reduce the number of packets (-Y option).
        protocol_match_filter: Display filter for protocols and their fields (-J option).
                               Parent and child nodes are included for all matches; lower-level protocols must be added explicitly.
        protocol_match_filter_parent: Display filter for protocols and their fields (-j option). Only parent nodes are included.
    """
    # set the path to the tshark binary: use the argument or search in common paths
    tshark_path = (
        tshark_bin.absolute() if tshark_bin is not None else get_process_path()
    )
    tshark_version = get_tshark_version(tshark_path)
    args = [tshark_path, "-r", str(pcap.absolute()), "-T", "ek"]

    # configure tls keylog file for decryption
    if tls_keylog is not None:
        keylog_pref = (
            "tls.keylog_file"
            # all ssl prefs were renamed to tls with wireshark 3.0
            if tshark_version >= LooseVersion("3.0")
            else "ssl.keylog_file"
        )
        args.extend(["-o", f"{keylog_pref}:{tls_keylog.absolute()}"])

    if packet_summary:
        args.append("-P")

    if packet_details:
        args.append("-V")

    if read_filter is not None:
        args.extend(["-Y", read_filter])

    if protocol_match_filter is not None:
        args.extend(["-J", protocol_match_filter])

    if protocol_match_filter_parent is not None:
        args.extend(["-j", protocol_match_filter_parent])

    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    # regex used to skip all index lines from the bulk format
    index_regex = re.compile(r'{"index":{"_index":".*","_type":".*"}}')
    with open(dest, "w") as dest_file:
        assert proc.stdout is not None, "TShark process stdout should be available"
        for line in io.TextIOWrapper(proc.stdout, encoding="utf-8", errors="replace"):
            # when remove index is true discard all index lines
            if not remove_index_messages or not index_regex.match(line):
                if remove_filtered:
                    line = pcap_ecs_remove_filtered(line)
                dest_file.write(line)
    # ensure tshark process has finished
    proc.wait()
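
The mapping from keyword arguments to tshark command-line options can be easier to follow in isolation. The sketch below mirrors the argument-building part of the source above without starting a process. The function name `build_tshark_args` and the `tshark_major` parameter are assumptions made for this example; the real function resolves the binary path and version through its own helpers, which are not shown on this page.

```python
from pathlib import Path
from typing import List, Optional


def build_tshark_args(
    pcap: Path,
    tshark: str = "tshark",  # assumed binary name, normally resolved by the module
    tls_keylog: Optional[Path] = None,
    tshark_major: int = 3,  # hypothetical stand-in for the detected version
    packet_summary: bool = False,
    packet_details: bool = True,
    read_filter: Optional[str] = None,
    protocol_match_filter: Optional[str] = None,
    protocol_match_filter_parent: Optional[str] = None,
) -> List[str]:
    """Build the tshark argument list without running tshark."""
    args = [tshark, "-r", str(pcap), "-T", "ek"]
    if tls_keylog is not None:
        # the ssl.* preferences were renamed to tls.* with Wireshark 3.0
        pref = "tls.keylog_file" if tshark_major >= 3 else "ssl.keylog_file"
        args += ["-o", f"{pref}:{tls_keylog}"]
    if packet_summary:
        args.append("-P")
    if packet_details:
        args.append("-V")
    if read_filter is not None:
        args += ["-Y", read_filter]
    if protocol_match_filter is not None:
        args += ["-J", protocol_match_filter]
    if protocol_match_filter_parent is not None:
        args += ["-j", protocol_match_filter_parent]
    return args


cmd = build_tshark_args(
    Path("capture.pcap"),
    tls_keylog=Path("keys.log"),
    read_filter="http",
)
```

Keeping the argument construction separate from process handling makes it possible to inspect or log the exact command before running it.
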

pcap_ecs_remove_filtered(line)

Removes any useless filtered keys from an ek JSON line.

Depending on the display and read filters used, the tshark conversion process adds filtered keys to fields that are being filtered by read or display filters. These markers do not add any value and even break the PCAP field mapping. As such, we check for them and remove any we find.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| line | str | The ek JSON line | required |

Returns:

| Type | Description |
| --- | --- |
| str | The modified JSON line |

Source code in dataset/pcap.py
def pcap_ecs_remove_filtered(line: str) -> str:
    """Removes any useless filtered keys from an `ek` JSON line.

    Depending on the display and read filters used, the tshark
    conversion process adds `filtered` keys to fields that
    are being filtered by read or display filters. These markers
    do not add any value and even break the PCAP field mapping.
    As such, we check for them and remove any we find.

    Args:
        line: The `ek` JSON line

    Returns:
        The modified JSON line
    """
    data = ujson.loads(line)
    # we need to re-add the line break that gets lost due to json load
    return ujson.dumps(__pcap_ecs_remove_filtered(data, object())) + "\n"
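
The private helper `__pcap_ecs_remove_filtered` is not shown on this page, so its exact behaviour is unknown here. As a rough illustration of the general idea only, the sketch below recursively drops every key named `filtered` from a parsed `ek` line. The function names, the use of the standard `json` module instead of `ujson`, and the sample line are all assumptions; the real helper also takes a second argument that this sketch does not model.

```python
import json
from typing import Any


def remove_filtered(node: Any) -> Any:
    """Recursively drop keys named "filtered" (hypothetical sketch)."""
    if isinstance(node, dict):
        return {
            key: remove_filtered(value)
            for key, value in node.items()
            if key != "filtered"
        }
    if isinstance(node, list):
        return [remove_filtered(item) for item in node]
    # scalars (str, int, None, ...) are returned unchanged
    return node


def remove_filtered_from_line(line: str) -> str:
    """Parse one JSON line, clean it, and restore the trailing line break."""
    return json.dumps(remove_filtered(json.loads(line))) + "\n"


# Illustrative input (assumed shape, not captured from tshark).
line = '{"layers": {"ip": {"filtered": "ip", "ip_src": "10.0.0.1"}}}\n'
cleaned = remove_filtered_from_line(line)
```

As in the source above, the line break has to be re-added after serialisation because parsing the JSON discards it.
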