Building a Small Rust eBPF EDR 3: Process Lifecycle Telemetry
This post covers process lifecycle telemetry in rand-guard. Process events are one of the most basic signals in an EDR. To interpret file or network activity, the runtime needs to know which process ran, what its parent was, and when it exited.
rand-guard does not treat process events as execution logs alone. It keeps a userspace process table and uses it to enrich later events with process context.
Collection Targets
The process-related config in config.example.toml looks like this.
[process]
enabled = true
hooks = ["execve", "fork", "exit", "execveat"]
collect_args = false
collect_env = false
collect_cwd = false
The currently implemented process hooks are execve, execveat, fork, and exit. Command line arguments, environment variables, and current working directory are not collected yet. This is an intentional limit. Handling long argv or env values in eBPF introduces verifier constraints, length limits, and privacy concerns. The MVP focuses on collecting process identity and lifecycle data reliably.
Tracepoint Choice
The userspace runtime attaches tracepoints based on the configured hooks. The key tracepoints are:
- sched:sched_process_exec
- sched:sched_process_fork
- sched:sched_process_exit
- syscalls:sys_enter_execve
- syscalls:sys_enter_execveat
sched_process_exec is a scheduler tracepoint that fires after an exec has succeeded. It can provide the executed filename and process metadata. sys_enter_execve and sys_enter_execveat fire at syscall entry and are used to tell which syscall started the exec.
Why use both? With only sched_process_exec, the runtime can see that an exec occurred, but it is harder to tell whether the source was execve or execveat. With only the syscall entry tracepoints, the runtime sees the attempt before the kernel knows whether the exec will succeed, so it is awkward to represent a completed exec. Userspace therefore briefly stores the syscall source in the process table and links it to the exec event that follows.
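The relationship between configured hooks and tracepoints can be sketched as a small lookup. This is an illustration only, not the rand-guard source: the Tracepoint type and both function names are hypothetical, and the choice to attach sched_process_exec whenever either exec hook is enabled is an assumption.

```rust
/// A kernel tracepoint identified by category and name.
/// Hypothetical type, not taken from rand-guard.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Tracepoint {
    pub category: &'static str,
    pub name: &'static str,
}

/// Map one configured hook name to the tracepoints it needs.
/// Assumption: both exec hooks also need sched_process_exec.
pub fn tracepoints_for_hook(hook: &str) -> Vec<Tracepoint> {
    const EXEC: Tracepoint = Tracepoint { category: "sched", name: "sched_process_exec" };
    match hook {
        "execve" => vec![EXEC, Tracepoint { category: "syscalls", name: "sys_enter_execve" }],
        "execveat" => vec![EXEC, Tracepoint { category: "syscalls", name: "sys_enter_execveat" }],
        "fork" => vec![Tracepoint { category: "sched", name: "sched_process_fork" }],
        "exit" => vec![Tracepoint { category: "sched", name: "sched_process_exit" }],
        // Unknown hooks attach nothing in this sketch.
        _ => Vec::new(),
    }
}

/// Build the full attach list for a set of hooks, keeping the first
/// occurrence of each tracepoint so shared ones are attached once.
pub fn resolve_tracepoints(hooks: &[&str]) -> Vec<Tracepoint> {
    let mut out: Vec<Tracepoint> = Vec::new();
    for hook in hooks {
        for tp in tracepoints_for_hook(hook) {
            if !out.contains(&tp) {
                out.push(tp);
            }
        }
    }
    out
}
```

With the four hooks from the sample config, this resolves to the five tracepoints listed above, with sched_process_exec attached once.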
Process Table
ProcessTable in crates/user/src/process_table.rs stores process records keyed by (pid, tid). A record contains:
- pid, tid, ppid
- uid, gid
- comm
- exe_path
- first_seen, last_seen
- exit_timestamp, exited
- pending_source
The process table is not just a cache. It is the center of enrichment. File and network events can include the current process comm from eBPF, but context such as parent pid and executable path is better added in userspace.
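As a rough picture of the record shape, here is a minimal sketch. The field names come from the list above, but every type, the ProcessKey alias, and the insert and get methods are assumptions for illustration; the real definitions in crates/user/src/process_table.rs may differ.

```rust
use std::collections::HashMap;

/// Table key: (pid, tid). The alias name is hypothetical.
pub type ProcessKey = (u32, u32);

/// One process record. Field names follow the post; types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct ProcessRecord {
    pub pid: u32,
    pub tid: u32,
    pub ppid: u32,
    pub uid: u32,
    pub gid: u32,
    pub comm: String,
    pub exe_path: Option<String>,
    pub first_seen: u64,
    pub last_seen: u64,
    pub exit_timestamp: Option<u64>,
    pub exited: bool,
    pub pending_source: Option<String>,
}

/// Minimal table: a map from (pid, tid) to the latest known record.
#[derive(Debug, Default)]
pub struct ProcessTable {
    records: HashMap<ProcessKey, ProcessRecord>,
}

impl ProcessTable {
    /// Insert a record, replacing any existing record for the same key.
    pub fn insert(&mut self, record: ProcessRecord) {
        self.records.insert((record.pid, record.tid), record);
    }

    /// Look up the record for a (pid, tid) pair, if it is still cached.
    pub fn get(&self, pid: u32, tid: u32) -> Option<&ProcessRecord> {
        self.records.get(&(pid, tid))
    }
}
```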
The flow looks like this.
flowchart TD
A[exec syscall tracepoint] --> B[store pending source]
C[sched_process_exec] --> D[update process table]
E[sched_process_fork] --> F[insert child process]
G[sched_process_exit] --> H[mark process exited]
D --> I[enrich later file/network events]
F --> I
H --> I
Exec Source Correlation
The source of execve and execveat arrives as an ExecSyscallEvent. This event is not written directly to output. It is handled as an internal event. Userspace calls ProcessTable::set_pending_source and stores the source string under (pid, tid).
When sched_process_exec arrives next, ProcessTable::update_from_exec consumes the pending source and inserts it into the normalized ProcessStart event. It appears in output as the source field.
{"event_type":"process_start","pid":123,"comm":"true","source":"execve"}
This is not a full multi-event correlation engine. It is a small piece of state for one purpose: attaching the syscall source to the process start event.
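The two-step correlation can be sketched as follows. The method names set_pending_source and update_from_exec come from the post, but their signatures, the ProcessStart fields, and the use of a separate pending map are assumptions made to keep the example small.

```rust
use std::collections::HashMap;

/// Normalized process start event. Only a few fields are shown.
#[derive(Debug, PartialEq)]
pub struct ProcessStart {
    pub pid: u32,
    pub tid: u32,
    pub comm: String,
    /// "execve" or "execveat" when a matching syscall entry was seen.
    pub source: Option<String>,
}

/// Reduced table that only tracks pending exec sources.
#[derive(Default)]
pub struct ProcessTable {
    pending: HashMap<(u32, u32), String>,
}

impl ProcessTable {
    /// Called for the internal syscall entry event: remember the source.
    pub fn set_pending_source(&mut self, pid: u32, tid: u32, source: &str) {
        self.pending.insert((pid, tid), source.to_string());
    }

    /// Called for sched_process_exec: consume the pending source, if any,
    /// and attach it to the normalized event.
    pub fn update_from_exec(&mut self, pid: u32, tid: u32, comm: &str) -> ProcessStart {
        let source = self.pending.remove(&(pid, tid));
        ProcessStart { pid, tid, comm: comm.to_string(), source }
    }
}
```

Because the pending entry is removed when it is consumed, a later exec on the same (pid, tid) without a matching syscall entry produces an event with no source rather than a stale one.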
Fork and Parent Relationship
sched_process_fork provides parent and child information. rand-guard turns that into a normalized ProcessRelationship event.
{"event_type":"process_relationship","parent_pid":100,"parent_comm":"bash","child_pid":101,"child_comm":"bash"}
This event can also be used by process rules. For example, the sample config includes a rule that looks for a web server spawning a shell.
[[rules]]
id = "PROC-001"
name = "Shell spawned by web server"
enabled = true
type = "process"
severity = "high"
action = "alert"
parent_names = ["nginx", "apache2", "httpd"]
process_names = ["sh", "bash", "dash"]
This rule is a matcher over a single relationship event. It does not prove a web shell. But if a web server process creates a shell child, it is a signal worth investigating.
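A matcher of this kind can be sketched in a few lines. The struct names and the exact-match semantics on comm are assumptions; the post only says that the rule is evaluated against a single relationship event.

```rust
/// Normalized relationship event, with the fields shown in the JSON above.
pub struct ProcessRelationship {
    pub parent_pid: u32,
    pub parent_comm: String,
    pub child_pid: u32,
    pub child_comm: String,
}

/// Subset of a process rule relevant to matching. Hypothetical shape.
pub struct ProcessRule {
    pub id: String,
    pub enabled: bool,
    pub parent_names: Vec<String>,
    pub process_names: Vec<String>,
}

impl ProcessRule {
    /// True when the rule is enabled and both the parent and the child
    /// comm appear in the rule's name lists (exact string match assumed).
    pub fn matches(&self, ev: &ProcessRelationship) -> bool {
        self.enabled
            && self.parent_names.iter().any(|n| n == &ev.parent_comm)
            && self.process_names.iter().any(|n| n == &ev.child_comm)
    }
}
```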
Exit and Eviction
When sched_process_exit arrives, the process table does not delete the record immediately. It marks the record as exited = true. This keeps state available briefly after exit and also makes health/debug behavior easier to reason about.
The table cannot grow forever. The config includes limits.
[performance]
max_process_cache_entries = 5000
max_pending_exec_sources = 500
When the record count exceeds the limit, exited records are removed first, followed by the oldest records. This policy is simple, but good enough for the MVP. There is a tradeoff: once a process is evicted, later file or network events may lose enrichment. That limit can be observed through health record fields such as process cache size and eviction count.
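The eviction order described above can be sketched like this. The Entry type and the evict function are hypothetical, and using last_seen to decide which record is oldest is an assumption; the post does not say which timestamp the real table uses.

```rust
use std::collections::HashMap;

/// Reduced record holding only what the eviction order needs.
#[derive(Debug, Clone)]
pub struct Entry {
    pub last_seen: u64,
    pub exited: bool,
}

/// Shrink the table to at most `max_entries` records.
/// Exited records go first, then live records with the oldest last_seen.
/// Returns the number of evicted records.
pub fn evict(records: &mut HashMap<(u32, u32), Entry>, max_entries: usize) -> usize {
    if records.len() <= max_entries {
        return 0;
    }
    let excess = records.len() - max_entries;

    // Collect (key, exited, last_seen) so the map can be mutated afterwards.
    let mut candidates: Vec<((u32, u32), bool, u64)> = records
        .iter()
        .map(|(key, entry)| (*key, entry.exited, entry.last_seen))
        .collect();

    // `!exited` is false for exited records, so they sort to the front.
    // The key is the final tiebreaker to keep the order deterministic.
    candidates.sort_by_key(|&(key, exited, last_seen)| (!exited, last_seen, key));

    for (key, _, _) in candidates.into_iter().take(excess) {
        records.remove(&key);
    }
    excess
}
```

Sorting the whole table on every overflow is simple rather than fast; a real implementation could keep a count of evictions here to feed the health record.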
How Process Events Help Other Events
The value of process telemetry is mostly in context. Knowing that /etc/systemd/system/demo.service was modified is useful, but not enough. The meaning changes depending on which process modified it, what its parent process was, and what executable path it had.
Network events are similar. A connection to port 4444 is less important than knowing which process made that connection. The false positive profile changes depending on whether it was nc, python, or a normal agent.
That is why rand-guard keeps the process table at the center of userspace. It does not try to collect all context at once in eBPF. Instead, it uses lifecycle events to enrich later telemetry in userspace.
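A minimal sketch of that enrichment step, assuming a file event that carries only pid, tid, comm, and path from eBPF. All type and function names here are hypothetical. When the process has already been evicted, the enriched fields are simply left empty.

```rust
use std::collections::HashMap;

/// Context kept in userspace for a process. Hypothetical subset.
pub struct ProcessContext {
    pub ppid: u32,
    pub exe_path: String,
}

/// Raw file event as it might arrive from eBPF.
pub struct FileEvent {
    pub pid: u32,
    pub tid: u32,
    pub comm: String,
    pub path: String,
}

/// File event after userspace enrichment.
#[derive(Debug, PartialEq)]
pub struct EnrichedFileEvent {
    pub pid: u32,
    pub comm: String,
    pub path: String,
    pub ppid: Option<u32>,
    pub exe_path: Option<String>,
}

/// Attach parent pid and executable path when the process is still cached.
pub fn enrich(
    table: &HashMap<(u32, u32), ProcessContext>,
    ev: FileEvent,
) -> EnrichedFileEvent {
    let ctx = table.get(&(ev.pid, ev.tid));
    EnrichedFileEvent {
        pid: ev.pid,
        comm: ev.comm,
        path: ev.path,
        ppid: ctx.map(|c| c.ppid),
        exe_path: ctx.map(|c| c.exe_path.clone()),
    }
}
```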
Demo
After the agent is running, execute a simple process.
/bin/true
The exact output depends on the environment, but records like these can appear.
{"event_type":"process_start","comm":"true","source":"execve"}
{"event_type":"process_exit","comm":"true"}
Real records include additional fields such as timestamp, pid, tid, uid, gid, ppid, exe_path, and truncation flags.
Current Limits
The current process telemetry does not collect argv, env, or cwd. That means command-line based rules and environment-based detections are not implemented. The process table is a bounded cache, so old process context can be evicted. The project also does not defend against a root attacker stopping the agent or changing its config.
That tradeoff is intentional. The priority is to collect lifecycle events reliably and use them to enrich file and network telemetry.
Part 3 Summary
This post covered process lifecycle telemetry.
- Process events are collected around execve, execveat, fork, and exit.
- Syscall source is connected to sched_process_exec using a small piece of state.
- The userspace process table is the basis for enriching later file and network events.
- The process cache has limits and an eviction policy.
The next post covers file telemetry and persistence-sensitive detections. It explains how open, write, rename, and unlink events lead to visibility into systemd service and cron path modifications.