Exploring the Security Boundaries of the Linux eBPF Verifier Through Experiments

28th Apr 2026
14 min read
Tags:
Linux Kernel,
eBPF,
Analysis

eBPF lets us run user-defined programs inside the Linux kernel. Its biggest advantage is that it allows us to extend kernel behavior without directly modifying the kernel itself, in areas such as packet processing, tracing, observability, and security policy enforcement.

But running code inside the kernel is inherently risky. If invalid pointer accesses, invalid memory reads, non-terminating programs, or invalid helper calls were allowed, they could break kernel stability and security.

That is why Linux does not execute eBPF programs immediately. Before a program is loaded into the kernel, the eBPF verifier checks it first. Only programs accepted by the verifier can run inside the kernel. Programs rejected by the verifier are not loaded. In that sense, the verifier is not just a pre-flight checker. It is a security boundary between eBPF programs and the kernel.

This post records a small set of experiments that explore that boundary. The goal is not to explain every internal detail of the verifier. Instead, we will look at real verifier reject logs and trace what kind of conditions the verifier treats as safety conditions, and why certain programs are rejected.

In one sentence:

The rejects observed in this experiment were not evidence of excessive verifier conservatism. They were mostly the result of basic memory safety boundaries being enforced, such as context access, nullability, and map value bounds.

Experiment Setup

The experiments were run against a QEMU/KVM-based guest kernel. The eBPF test programs were built on the host, but the actual program loading was performed against the guest kernel. Since verifier behavior depends heavily on the kernel version and configuration, all results were judged based on the guest kernel, not the host environment.

You can think of all test programs here as small eBPF programs. Some are safe programs that should be accepted. Others intentionally contain invalid accesses that the verifier should reject. Each test was kept deliberately small so that we could observe how the verifier handles one specific pattern.

For each test, I recorded whether the program load succeeded, the verifier log, the number of processed instructions, stack depth, total/peak states, and the reject category. Here, processed insns is the number of instructions processed as reported in the verifier log. stack depth is the stack usage depth computed by the verifier. total/peak states is used as an auxiliary signal for understanding how many states the verifier tracked.

The main kernel used in this post is v7.0-rc6-20-g9147566d8016. The commit is 9147566d801602c9e7fc7f85e989735735bf38ba. Later in the post, I also compare the results with v6.6.

Baseline Results

I first ran a set of smoke tests and an initial synthetic benchmark suite. There were 9 tests in total, and every test behaved as expected.

The test names are just labels used for the experiment. verifier-pass is the smallest passing case. verifier-fail is a failing case that intentionally performs an invalid context access. The synth-* tests are synthetic benchmarks designed to isolate specific verifier conditions.

Item	Value
Total tests	9
Expected accept cases	6
Expected reject cases	3
Actually passed	9
Actually failed	0

The per-pattern results were as follows.

Test	Pattern	Expected	Actual	Processed insns	Stack depth	Reject category
`verifier-pass`	smallest valid program	accept	accept	2	0	none
`verifier-fail`	disallowed context access	reject	reject	1	0	invalid memory access
`synth-map-missing-null-check`	map value access without null-check	reject	reject	8	8	null-check reasoning failure
`synth-map-value-bounds-fail`	out-of-bounds map value access	reject	reject	10	8	map value bounds failure
`tracepoint-smoke`	tracepoint-based runtime smoke test	accept	accept	11	8	none
`synth-map-null-check`	map value access after null-check	accept	accept	13	8	none
`synth-branch-fanout`	increased conditional branches	accept	accept	20	8	none
`synth-stack-access`	stack access	accept	accept	26	64	none
`synth-bounded-loop`	bounded loop	accept	accept	31	0	none

The important point is not merely that every test behaved as expected. The more interesting point is that the three rejected cases were rejected for different reasons. A verifier rejection is not just a generic failure. It is the result of failing one of several safety conditions.

In this experiment, three reject boundaries stood out.

Boundary	Representative test	Key log message
context access	`verifier-fail`	`invalid bpf_context access`
nullability	`synth-map-missing-null-check`	`map_value_or_null`
map value bounds	`synth-map-value-bounds-fail`	`invalid access to map value`

Let us look at them one by one.

Boundary 1: Invalid Context Access

The simplest rejected case is verifier-fail. This test is an eBPF socket filter program that attempts to access a disallowed offset in its context. The exact code is not important here. The key point is that the program tries to read from a context location that is not allowed for the current program type. The core verifier log message is:

invalid bpf_context access off=80 size=4

Each eBPF program type receives a different kind of context. A tracepoint program, a socket filter program, and an XDP program all see different contexts. The fields and offsets that can be accessed are also defined per context.

verifier-fail crosses that boundary. It tries to read from a location that is not allowed for the socket context, so the verifier rejects the program load. This is hard to interpret as a conservatism issue. It is a clear case of memory safety enforcement that prevents the program from reading an invalid kernel memory location.

The first boundary, then, is the context access boundary. An eBPF program cannot freely reinterpret the context of the program type it is attached to. The verifier restricts access based on the context layout and allowed offsets.

Boundary 2: Null-Check After Map Lookup

The second case is null-checking after a map lookup. The two tests are:

synth-map-null-check
synth-map-missing-null-check

Both programs use bpf_map_lookup_elem. The difference is whether the program checks the return value for null before using it. In pseudo-code, the passing version looks like this:

value = bpf_map_lookup_elem(&map, &key);
if (!value)
    return 0;
return value->field;

With the null-check, the program is accepted. In contrast, the following shape fails:

value = bpf_map_lookup_elem(&map, &key);
return value->field;

synth-map-null-check, which performs the null-check, passed the verifier. synth-map-missing-null-check, which dereferences the map value immediately without checking for null, was rejected. The key log message was:

R0 invalid mem access 'map_value_or_null'

bpf_map_lookup_elem returns a pointer to the map value if the key exists, but it can return null if the key is not found. The verifier does not treat the return value as a safe pointer immediately. Right after the helper call, the return value has the state map_value_or_null.

If the code explicitly checks for null, the verifier can refine the pointer state to map_value on the non-null path. From that point on, the program can read or write the map value. Without the null-check, however, the verifier still sees a possible null value. The moment the program tries to dereference a pointer that might be null, it is rejected.

The lesson is simple. It is not enough for the developer to believe that “this key should always exist.” The verifier must be able to prove it. In eBPF code, safety conditions need to be visible in the structure of the code.

Boundary 3: Map Value Bounds

The third rejected case is synth-map-value-bounds-fail. This program performs a null-check after map lookup, but then tries to read from an offset outside the map value. Simplified, it is like trying to read 8 bytes starting at offset 8 from an 8-byte map value. Since offset 8 is already the end of the value, the access goes out of bounds.

The key verifier log message was:

invalid access to map value, value_size=8 off=8 size=8
R0 min value is outside of the allowed memory range

This case is important because it shows that null-checking and bounds checking are separate conditions. Just because the map lookup result is non-null does not mean the program can read arbitrary bytes from the map value. If the map value is 8 bytes and the program reads 8 bytes at offset 8, the access range goes past the end of the value.

Map access has at least two safety conditions.

Condition	Meaning
nullability	the program must prove that the map lookup result is not null
bounds	the access offset and size must stay within the map value size

synth-map-null-check satisfied the first condition. synth-map-value-bounds-fail also satisfied the first condition, but failed the second one. The verifier logs show these failures as different messages.

Looking at these three rejected cases, the verifier’s security boundary is not a single accept/reject line. It is a combination of smaller boundaries, such as context access, pointer nullability, and map value bounds.

A Quick Look at Verifier Cost Signals

Although the main purpose of this experiment was to study verifier security boundaries, a few tests also showed signals related to verifier cost.

Test	Pattern	Processed insns	Stack depth
`synth-branch-fanout`	branch fan-out	20	8
`synth-stack-access`	stack access	26	64
`synth-bounded-loop`	bounded loop	31	0

In the first run, synth-bounded-loop had the largest number of processed instructions. But there is an important caveat: the verifier does not analyze the C source code directly. It analyzes the final BPF bytecode. A loop in the C source does not necessarily appear as the same source-level loop to the verifier. Compiler optimizations and unrolling can change the final instruction stream.

synth-stack-access is also worth noting. It had fewer processed instructions than synth-bounded-loop, but its stack depth was the largest at 64. The verifier log included detailed written/live state for stack slots. This suggests that stack access patterns can affect verifier stack-state tracking and log size, not just instruction count.

That said, these results are not enough to claim which pattern is the most expensive for the verifier. To make that claim, we would need a matrix experiment that gradually increases branch count, loop bounds, and stack usage. At this stage, the conclusion is more modest:

To understand verifier behavior, looking only at the C code is not enough. You need to inspect the final BPF instruction stream and the verifier log as well.

What Realistic Micro-Programs Showed

Synthetic benchmarks are useful because they isolate one boundary at a time. A single null-check difference or a single bounds difference is easy to control. But real eBPF programs usually combine map lookup, map update, helper calls, and compound conditionals.

So I also ran a few realistic micro-programs. These are not production workloads copied as-is. They are small programs that compress common structures found in real eBPF code.

Test	Pattern	Result	Processed insns	Total/peak states	Stack depth
`realistic-helper-call`	helper call and scalar arithmetic	accept	7	0/0	0
`realistic-map-update`	map lookup, null-check, and update	accept	26	2/2	16
`realistic-compound-filter`	filter-like compound conditions	accept	28	3/3	0

realistic-helper-call was relatively simple. In contrast, realistic-map-update and realistic-compound-filter showed total/peak states more clearly. The compound condition program recorded total_states=3 and peak_states=3. The map lookup/update combination recorded total_states=2 and peak_states=2.

This result shows that synthetic benchmarks and realistic micro-programs serve different purposes. Synthetic benchmarks are good for isolating one boundary. Realistic micro-programs are better for observing how the verifier tracks paths and states in code shapes closer to what developers actually write.

The two types of tests are complementary, not competing. If you want to understand a specific verifier rule, a synthetic case is useful. If you want to understand the code shapes developers run into in practice, realistic micro-programs are necessary.

Do Results Change Across Kernel Versions?

I ran the same 12 tests on v6.6 and v7.0-rc6-20-g9147566d8016. The 12 tests consist of the 9 baseline tests above plus the 3 realistic micro-programs. The short answer is that, for this test set, there was no accept/reject difference.

Test	Expected	v6.6	latest-rc
`realistic-compound-filter`	accept	accept	accept
`realistic-helper-call`	accept	accept	accept
`realistic-map-update`	accept	accept	accept
`synth-bounded-loop`	accept	accept	accept
`synth-branch-fanout`	accept	accept	accept
`synth-map-missing-null-check`	reject	reject	reject
`synth-map-null-check`	accept	accept	accept
`synth-map-value-bounds-fail`	reject	reject	reject
`synth-stack-access`	accept	accept	accept
`tracepoint-smoke`	accept	accept	accept
`verifier-fail`	reject	reject	reject
`verifier-pass`	accept	accept	accept

However, the quantitative metrics were not identical. For example, synth-map-null-check processed 14 instructions on v6.6, but 13 on latest-rc. tracepoint-smoke processed 12 instructions on v6.6, but 11 on latest-rc. Some tests also had different stack depths: 4 on v6.6 and 8 on latest-rc.

In summary:

Comparison item	Result
accept/reject	identical on `v6.6` and latest-rc
processed insns	differed in some tests
stack depth	differed in some tests
conclusion	pass/fail alone is not enough to explain verifier differences

From a developer’s perspective, whether the program loads may be the most important question. But if you want to analyze verifier behavior, pass/fail is not enough. Even when the final decision is the same, internal accounting, stack depth calculation, log formatting, and processed instruction counts can differ across kernel versions.

This is why eBPF experiment results should always record the kernel version and commit. Saying “it passed on my machine” is not enough. When comparing verifier behavior or writing a bug report, it is better to include the kernel version, program type, and verifier log together.

So, Is the Verifier Conservative?

One of the initial questions was whether the verifier rejects programs that are likely safe in practice. The eBPF verifier has to prioritize soundness, so it is necessarily conservative to some degree. This can lead to safe-looking but rejected programs.

However, the rejected programs in this test set were all close to clear safety violations.

Test	Reject reason	Interpretation
`verifier-fail`	disallowed context offset access	invalid memory access blocked
`synth-map-missing-null-check`	dereference of a possibly null map value	nullability violation
`synth-map-value-bounds-fail`	access outside map value size	bounds violation

So these results should not be used to claim that the verifier is excessively conservative. This experiment shows how the verifier applies basic memory safety boundaries, rather than proving excessive conservatism.

In fact, these results are better understood as a baseline for future experiments. To find safe-looking but rejected cases, we need programs where a human can see a safety invariant, but the verifier cannot prove that invariant. Possible candidates include cases where scalar range reasoning is insufficient, packet bounds checks are spread across complex control flow, or helper preconditions are difficult for the verifier to follow from the code structure.

Summary

The experiment identified three basic verifier boundaries.

Boundary	Representative log	Meaning
context access	`invalid bpf_context access`	program-type-specific context access restriction
nullability	`map_value_or_null`	tracking the nullability of map lookup results
map value bounds	`invalid access to map value`	map values can only be accessed within their defined size

The experiment also showed that synthetic benchmarks and realistic micro-programs provide different kinds of information. Synthetic benchmarks are useful for isolating specific boundaries. Realistic micro-programs are better for observing state tracking in code shapes closer to real development patterns. The cross-version comparison also showed that even when accept/reject results are identical, quantitative metrics such as processed instruction count and stack depth can differ.

When working with the eBPF verifier, the practical lessons are:

Check map lookup results for null before using them.
Access map values only within their defined value size.
When debugging verifier rejects, first find the key verifier log message.
Do not look only at the C source; also inspect the verifier log and the BPF instruction stream.
Record the kernel version and commit together with the result.
Look beyond pass/fail. Processed instructions, states, and stack depth can reveal useful information.

As a next step, a small comparison experiment that varies branch count, loop bounds, and stack usage would make the verifier cost signals clearer. After that, safe-looking but rejected cases should be designed separately. That is where we can start discussing verifier conservatism more concretely.