Building a Mini EDR with eBPF 2: Improving File Access Detection

In part 1, I described the MVP structure: observe Linux syscalls through tracepoints, pass events created by an eBPF program to user space through a BPF ring buffer, then process them with a rule engine and a JSON Lines writer. The flow was simple.

Linux syscall
  -> tracepoint
  -> eBPF program
  -> BPF ring buffer
  -> user-space loader
  -> rule engine
  -> JSONL log file

This post does not rebuild that structure. It improves file access detection on top of the existing MVP. The target is openat.

In the MVP, I observed three syscalls: execve, openat, and connect. Among them, openat was used to detect access to /etc/shadow. The current eBPF program attaches to tracepoint/syscalls/sys_enter_openat, reads the filename and flags at syscall entry, and sends an event immediately.

The current event structure also reflects that design. It stores common process metadata along with path and flags, but it does not store the syscall return value after completion.

struct event {
    unsigned long long timestamp_ns;
    unsigned int event_type;
    unsigned int pid;
    unsigned int ppid;
    unsigned int uid;
    unsigned int gid;
    char comm[TASK_COMM_LEN];
    char parent_comm[TASK_COMM_LEN];
    char path[EDR_PATH_LEN];
    char cmdline[EDR_CMDLINE_LEN];
    unsigned long long flags;
    unsigned int dst_ip;
    unsigned short dst_port;
    unsigned int address_family;
    unsigned long long mnt_ns;
    unsigned long long pid_ns;
};

This structure was enough for the MVP. The fact that /etc/shadow appeared as a syscall argument is already worth investigating. But in security logs, an attempt and a success have different meanings. In part 2, I connect syscall enter and exit to record that difference.

Why Enter Alone Is Not Enough

sys_enter_openat is the point where the process enters the openat syscall. At this point, I can see which path the user is trying to open and which flags were passed.

For example, suppose a normal user runs this command.

cat /etc/shadow

The current MVP can record this as access to /etc/shadow, because the openat argument contains /etc/shadow. That is why the original rule was named SHADOW_READ.

But this event does not mean that reading /etc/shadow succeeded. For a normal user, the open usually fails because of permissions. The kernel checks access rights, file existence, LSM policy, and other conditions before deciding whether the syscall succeeds or fails. That decision is available at syscall exit, not at syscall entry.

So the current SHADOW_READ is not strictly a detection of a successful /etc/shadow read. It is closer to detecting an attempt to open /etc/shadow.

This distinction matters when interpreting logs. A failed access attempt still has value. The fact that someone tried to open a sensitive file is worth investigating. But a case where the process actually receives a file descriptor is a stronger signal. For the same /etc/shadow event, a failed command from a normal user and a successful root access should naturally have different severity.

So the goal of part 2 is to move openat detection from attempt-based logging to result-aware logging. At sys_enter_openat, I temporarily store the path and flags. At sys_exit_openat, I check the return value and emit one combined result event. A non-negative return value is a file descriptor, so the open succeeded. A negative return value is a negated errno, so the open failed.

After this improvement, the log should say not only who touched the name /etc/shadow, but also whether that access actually succeeded.

Connecting Enter and Exit

To know the result of openat, I need sys_exit_openat. The problem is that the exit tracepoint alone does not provide the filename and flags. Its context carries only the syscall number and the return value, so the original path argument cannot simply be read again at exit.

So I need a small state store between enter and exit. At enter, I store the path and flags. At exit, I find the same syscall flow and combine it with the return value.

For this implementation, I add a BPF hash map.

struct pending_openat {
    char path[EDR_PATH_LEN];
    unsigned long long flags;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 8192);
    __type(key, u64);
    __type(value, struct pending_openat);
} pending_openat SEC(".maps");

The key is pid_tgid.

u64 pid_tgid = bpf_get_current_pid_tgid();

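The upper 32 bits are the TGID, which user space calls the process ID, and the lower 32 bits are the kernel PID, which identifies a single thread. I do not key on the process ID alone because multiple threads in the same process can call openat at the same time, and their pending entries would overwrite each other. Using the full pid_tgid value matches enter and exit per thread.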
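As a small illustration of that layout, the helpers below split a pid_tgid value into its two halves and rebuild it. This is plain user-space C rather than part of the eBPF program, and the helper names tgid_of, tid_of, and make_pid_tgid are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; the names are not from the project. */

/* Upper 32 bits: TGID, the process ID as user space sees it. */
static inline uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: kernel PID, which identifies one thread. */
static inline uint32_t tid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}

/* Rebuild the 64-bit value in the layout bpf_get_current_pid_tgid() uses. */
static inline uint64_t make_pid_tgid(uint32_t tgid, uint32_t tid)
{
    return ((uint64_t)tgid << 32) | tid;
}
```

Two threads of the same process share the upper half but differ in the lower half, so they produce different map keys.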

The flow is this.

sys_enter_openat
  -> read filename and flags
  -> store in pending_openat[pid_tgid]

sys_exit_openat
  -> look up pending_openat[pid_tgid]
  -> read retval
  -> combine path, flags, and retval into EVENT_OPENAT
  -> send through ring buffer
  -> delete pending_openat[pid_tgid]

As a diagram, it looks like this.

    flowchart TD
    A[sys_enter_openat] --> B[read path and flags]
    B --> C[store in pending_openat map]
    D[sys_exit_openat] --> E[look up pending_openat map]
    E --> F[check retval]
    F --> G[create openat result event]
    G --> H[BPF ring buffer]
    H --> I[user-space rule engine]
    I --> J[JSONL log]

With this change, I no longer send an event immediately at enter. An openat event is sent once at exit. That way, a single log can contain path, flags, retval, and success together.

Of course, storing state creates more things to care about. A value inserted into the map at enter must be deleted at exit. If cleanup is missed, stale pending values remain and can confuse later interpretation. I also need to handle cases where the exit side cannot find a pending value. I should not assume every kernel event flow will always match my expectations.

So the core of this change is not simply adding one more tracepoint. It is managing short-lived state inside the kernel to turn one openat syscall into a result event.
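To make the store-then-take lifecycle concrete, here is a user-space model in plain C. It is not the eBPF code: a fixed array stands in for the BPF hash map, and the names pending_model, pending_store, pending_take, MODEL_PATH_LEN, and MODEL_SLOTS are hypothetical. The point is only the shape of the logic: enter stores, exit looks up and deletes in one step, and an exit without a matching enter is reported instead of assumed away.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_PATH_LEN 256  /* hypothetical stand-in for EDR_PATH_LEN */
#define MODEL_SLOTS    64   /* hypothetical stand-in for max_entries */

struct pending_model {
    int      in_use;
    uint64_t key;                  /* pid_tgid of the calling thread */
    char     path[MODEL_PATH_LEN];
    uint64_t flags;
};

static struct pending_model slots[MODEL_SLOTS];

/* Enter side: remember path and flags. Returns 0, or -1 if the table is full. */
static int pending_store(uint64_t key, const char *path, uint64_t flags)
{
    struct pending_model *slot = NULL;

    for (int i = 0; i < MODEL_SLOTS; i++) {
        if (slots[i].in_use && slots[i].key == key) {
            slot = &slots[i];      /* overwrite a stale entry for this thread */
            break;
        }
        if (!slots[i].in_use && !slot)
            slot = &slots[i];      /* remember the first free slot */
    }
    if (!slot)
        return -1;

    slot->in_use = 1;
    slot->key = key;
    slot->flags = flags;
    strncpy(slot->path, path, MODEL_PATH_LEN - 1);
    slot->path[MODEL_PATH_LEN - 1] = '\0';
    return 0;
}

/* Exit side: copy the entry out and delete it. Returns -1 for an unmatched exit. */
static int pending_take(uint64_t key, struct pending_model *out)
{
    for (int i = 0; i < MODEL_SLOTS; i++) {
        if (slots[i].in_use && slots[i].key == key) {
            *out = slots[i];
            slots[i].in_use = 0;   /* delete before any later step can fail */
            return 0;
        }
    }
    return -1;
}
```

Deleting the entry inside pending_take, before the event is built or sent, means a later failure cannot leave a stale value behind.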

Extending the Event Structure

Even if enter and exit are connected, user space cannot receive the result unless the event structure has fields for it. So I add fields to struct event in include/events.h to represent the openat result.

struct event {
    unsigned long long timestamp_ns;
    unsigned int event_type;
    unsigned int pid;
    unsigned int ppid;
    unsigned int uid;
    unsigned int gid;
    char comm[TASK_COMM_LEN];
    char parent_comm[TASK_COMM_LEN];
    char path[EDR_PATH_LEN];
    char cmdline[EDR_CMDLINE_LEN];
    unsigned long long flags;
    unsigned int dst_ip;
    unsigned short dst_port;
    unsigned int address_family;
    unsigned long long mnt_ns;
    unsigned long long pid_ns;
    long long retval;
    int error_code;
    unsigned char success;
};

There are three new fields.

  • retval: the raw syscall return value
  • error_code: positive errno value when the syscall fails
  • success: whether the syscall succeeded

Linux syscalls usually return a non-negative value on success and a negative errno on failure. For openat, a successful return value is a file descriptor. For example, retval of 3 means the file was opened and fd 3 was returned.

If retval is -13, it is a failure. The errno value 13 corresponds to EACCES, so it can be interpreted as a permission failure. In the log, I keep the negative raw value in retval and store the easier-to-read positive errno in error_code.

retval >= 0 -> success=true,  error_code=0
retval < 0  -> success=false, error_code=-retval
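The mapping above can be sketched as a small C function. The struct openat_result and the function name classify_retval are hypothetical and only mirror the three new fields. In the project these values live directly in struct event.

```c
/* Hypothetical mirror of the three result fields added to struct event. */
struct openat_result {
    long long     retval;      /* raw syscall return value */
    int           error_code;  /* positive errno on failure, 0 on success */
    unsigned char success;     /* 1 if retval is a file descriptor */
};

/* Derive error_code and success from the raw return value. */
static struct openat_result classify_retval(long long retval)
{
    struct openat_result r;

    r.retval = retval;
    r.success = retval >= 0;
    r.error_code = r.success ? 0 : (int)-retval;
    return r;
}
```

With this sketch, a retval of -13 yields error_code 13 and success 0, and a retval of 3 yields error_code 0 and success 1.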

I keep success as a separate field to make the rule engine and log queries simple. I do not want to interpret the sign of retval every time I query JSONL.

jq -c 'select(.event_type == "openat" and .success == true)' log/events.jsonl

These fields are currently meaningful only for openat. But putting them in the common event structure lets the JSON writer and rule engine continue to use one event format. It keeps the MVP structure simple while adding only the result information that is needed.

Extending JSON Lines Output

Even if the kernel fills retval, error_code, and success, they are not useful unless JSONL writes them. So I update src/json_writer.c to include these three fields for openat events.

For execve and connect, these fields are not meaningful yet, so I do not print them for every event. I print them only when event_type is EVENT_OPENAT.

{"event_type":"openat","path":"/etc/shadow","flags":0,"retval":-13,"error_code":13,"success":false}

A successful event looks like this.

{"event_type":"openat","path":"/etc/shadow","flags":0,"retval":3,"error_code":0,"success":true}

Now the JSONL log can show whether /etc/shadow access failed as an attempt or actually received a file descriptor. For example, I can query only successful openat events like this.

jq -c 'select(.event_type == "openat" and .success == true)' log/events.jsonl

I can also query permission failures by error_code.

jq -c 'select(.event_type == "openat" and .error_code == 13)' log/events.jsonl

This change also prepares the rule engine for more accurate decisions. Before, it only looked at the /etc/shadow path and attached SHADOW_READ. Now, even for the same path, it can separate attempts from successes based on the success field.
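As a rough sketch of the writer side, the function below formats only the three result fields as a JSON fragment. How src/json_writer.c is actually structured is not shown in this post, so the function name format_openat_result, its signature, and the leading comma (which assumes the fragment is appended after earlier fields) are all assumptions.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the openat result fields as a JSON fragment.
 * The caller is assumed to invoke it only when event_type is EVENT_OPENAT.
 * Returns what snprintf returns: the length the fragment needs.
 */
static int format_openat_result(char *buf, size_t len, long long retval,
                                int error_code, unsigned char success)
{
    return snprintf(buf, len,
                    ",\"retval\":%lld,\"error_code\":%d,\"success\":%s",
                    retval, error_code, success ? "true" : "false");
}
```

Writing success as a bare JSON boolean rather than 0 or 1 is what lets the jq filters above compare against true directly.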

Rule Policy Change

The old rule name was SHADOW_READ. But as described above, that name does not accurately describe the behavior. When only openat enter was observed, the program did not know whether reading actually succeeded. Even after connecting exit, what the program observes directly is not the read syscall, but the result of opening the file.

So I split the rule into two rules.

SHADOW_OPEN_ATTEMPT: failed openat attempt for /etc/shadow
SHADOW_OPEN_SUCCESS: successful openat for /etc/shadow

The config file follows the same split.

"shadow_open_attempt": {
  "enabled": true,
  "severity": "medium",
  "paths": ["/etc/shadow"]
},
"shadow_open_success": {
  "enabled": true,
  "severity": "high",
  "paths": ["/etc/shadow"]
}

The policy is simple. If a process tries to open /etc/shadow but fails, I record SHADOW_OPEN_ATTEMPT with medium severity. A failed attempt is worth investigating, but it does not mean the file contents were obtained.

If success=true, I record SHADOW_OPEN_SUCCESS with high severity. In this case, the syscall return value is a file descriptor, so it means the sensitive file was successfully opened.

The rule engine checks success first, then failed attempts.

event_type == openat
  -> path == /etc/shadow
  -> success == true  -> SHADOW_OPEN_SUCCESS
  -> success == false -> SHADOW_OPEN_ATTEMPT
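The decision above could be written in C roughly as follows. This is a sketch, not the project's rule engine: the function name match_shadow_rule and the MODEL_EVENT_* values are hypothetical, and the path is hardcoded here where the real engine reads it from the config file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical event type values; the real ones come from include/events.h. */
enum {
    MODEL_EVENT_EXECVE  = 1,
    MODEL_EVENT_OPENAT  = 2,
    MODEL_EVENT_CONNECT = 3
};

/* Return the matching rule name, or NULL when no shadow rule applies. */
static const char *match_shadow_rule(unsigned int event_type,
                                     const char *path,
                                     unsigned char success)
{
    if (event_type != MODEL_EVENT_OPENAT)
        return NULL;
    if (strcmp(path, "/etc/shadow") != 0)
        return NULL;
    return success ? "SHADOW_OPEN_SUCCESS" : "SHADOW_OPEN_ATTEMPT";
}
```

Because success and failure are mutually exclusive, one openat event can match at most one of the two rules.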

I changed the names to avoid misleading the log reader. SHADOW_READ is short and familiar, but it implies stronger behavior than the implementation actually observes. Since the goal of this post is to separate attempts from successes, the rule names should express that difference directly.

Improving the Test Script

Now that the rule is split, tests should trigger both a failed attempt and a successful access. The old scripts/test_shadow_read.sh only ran a normal-user attempt to read /etc/shadow. Now it triggers two cases in order.

cat /etc/shadow >/dev/null 2>&1 || true
sudo -n cat /etc/shadow >/dev/null 2>&1 || true

The first command is the normal-user failure case. It is usually recorded with retval=-13, error_code=13, and success=false. This event should match SHADOW_OPEN_ATTEMPT.

The second command is the root-permission success case. For test automation, it uses sudo -n. The -n option prevents sudo from asking for a password. If the environment does not allow non-interactive sudo, sudo exits with an error instead of running cat, the trailing || true swallows that failure, and the success case is effectively skipped.

The log can be checked like this.

jq -c 'select(.path == "/etc/shadow")' log/events.jsonl

The expected failed event looks like this.

{"event_type":"openat","path":"/etc/shadow","retval":-13,"error_code":13,"success":false,"rule":"SHADOW_OPEN_ATTEMPT","severity":"medium"}

In an environment where non-interactive sudo works, a success event is also recorded.

{"event_type":"openat","path":"/etc/shadow","retval":3,"error_code":0,"success":true,"rule":"SHADOW_OPEN_SUCCESS","severity":"high"}

Errno values can differ depending on distribution, permissions, and security policy. So the key fields to check in tests are success, rule, and severity.

When I test as a normal user, a failed attempt is recorded like this.

{"timestamp":11318339737214,"event_type":"openat","pid":100982,"ppid":100978,"uid":1000,"gid":1000,"comm":"cat","parent_comm":"bash","cmdline":"","path":"/etc/shadow","flags":0,"retval":-13,"error_code":13,"success":false,"dst_ip":null,"dst_port":null,"address_family":0,"mnt_ns":4026531832,"pid_ns":4026531836,"rule":"SHADOW_OPEN_ATTEMPT","severity":"medium"}

In the previous MVP, this event would have been recorded simply as SHADOW_READ. Now retval=-13 and success=false make it clear that the file was not actually opened.

Remaining Limitations

This change makes openat detection more precise, but it does not complete file access detection. In fact, connecting enter and exit makes the remaining problems more visible.

First, the program still observes only openat. There are many ways to open files depending on kernel and libc behavior. Other syscalls such as open, openat2, and creat are not handled yet. The goal of this post is not to cover all file access, but to narrow one specific limitation in the MVP's openat handling.

Second, path interpretation is still simple. The program records the filename string passed as the syscall argument. Absolute paths such as /etc/shadow are easy to interpret, but relative paths combined with dirfd are different. As the name suggests, openat can open a relative path based on a directory file descriptor. In that case, reconstructing the final absolute path from the string alone is difficult.

Third, connecting enter and exit has a state-management cost. Values stored in the pending_openat map must be deleted at exit. Even when ring buffer reserve fails, cleanup must still happen. The exit handler also needs to handle cases where no pending value is found. This is much more delicate than simply adding one tracepoint.

Fourth, the program does not inspect file contents. It only records whether /etc/shadow was opened, not what was read from it. Building a security tool does not mean sensitive data should be written to logs. In this project, I keep only behavior metadata and do not handle sensitive file contents.

Fifth, success=true does not automatically mean an attack succeeded. Root may open /etc/shadow during normal administration. This log is not a final attack verdict. It is a signal that raises investigation priority. That is why the rule is named SHADOW_OPEN_SUCCESS, not something that declares an attack.

Part 2 Summary

The MVP from part 1 could record file access attempts. But because it only observed sys_enter_openat, it could not tell whether the access actually succeeded. Even when a /etc/shadow event appeared, the log could not distinguish a failed attempt from a successful file descriptor return.

In this post, I solved that by connecting sys_enter_openat and sys_exit_openat. At enter, the program stores path and flags in a pending_openat BPF hash map. At exit, it finds the pending value with the same pid_tgid and combines it with retval. As a result, an openat event becomes a result event containing path, flags, retval, error_code, and success.

In user space, I added retval, error_code, and success to the JSONL output. I also split the rule policy from one SHADOW_READ rule into SHADOW_OPEN_ATTEMPT and SHADOW_OPEN_SUCCESS. Failed attempts are recorded as medium, and successful opens are recorded as high.

With this improvement, the mini EDR can now say not only who touched a sensitive filename, but also whether that access actually succeeded.

In the next post, I move to reducing false positives on the network side. The current SUSPICIOUS_CONNECT only looks at the process name and whether the destination is an external IPv4 address. In part 3, I enrich the same connect event with command line and allowlist context in user space, then interpret it more carefully.

GitHub Repository

The code for this post is available at ebpf-guard part-2.