Exploring the Security Boundaries of the Linux eBPF Verifier Through Experiments

eBPF lets us run user-defined programs inside the Linux kernel. Its biggest advantage is that it allows us to extend kernel behavior without directly modifying the kernel itself, in areas such as packet processing, tracing, observability, and security policy enforcement.

But running code inside the kernel is inherently risky. If invalid pointer accesses, invalid memory reads, non-terminating programs, or invalid helper calls were allowed, they could break kernel stability and security.

That is why Linux does not execute eBPF programs immediately. Before a program is loaded into the kernel, the eBPF verifier checks it first. Only programs accepted by the verifier can run inside the kernel. Programs rejected by the verifier are not loaded. In that sense, the verifier is not just a pre-flight checker. It is a security boundary between eBPF programs and the kernel.

This post records a small set of experiments that explore that boundary. The goal is not to explain every internal detail of the verifier. Instead, we will look at real verifier reject logs and trace what kind of conditions the verifier treats as safety conditions, and why certain programs are rejected.

In one sentence:

The rejects observed in this experiment were not evidence of excessive verifier conservatism. They were mostly the result of basic memory safety boundaries being enforced, such as context access, nullability, and map value bounds.

Experiment Setup

The experiments were run against a QEMU/KVM-based guest kernel. The eBPF test programs were built on the host, but the actual program loading was performed against the guest kernel. Since verifier behavior depends heavily on the kernel version and configuration, all results were judged based on the guest kernel, not the host environment.

You can think of all test programs here as small eBPF programs. Some are safe programs that should be accepted. Others intentionally contain invalid accesses that the verifier should reject. Each test was kept deliberately small so that we could observe how the verifier handles one specific pattern.

For each test, I recorded whether the program load succeeded, the verifier log, the number of processed instructions, stack depth, total/peak states, and the reject category. Here, processed insns is the number of instructions processed as reported in the verifier log. stack depth is the stack usage depth computed by the verifier. total/peak states is used as an auxiliary signal for understanding how many states the verifier tracked.

The main kernel used in this post is v7.0-rc6-20-g9147566d8016. The commit is 9147566d801602c9e7fc7f85e989735735bf38ba. Later in the post, I also compare the results with v6.6.

Baseline Results

I first ran a set of smoke tests and an initial synthetic benchmark suite. There were 9 tests in total, and every test behaved as expected.

The test names are just labels used for the experiment. verifier-pass is the smallest passing case. verifier-fail is a failing case that intentionally performs an invalid context access. The synth-* tests are synthetic benchmarks designed to isolate specific verifier conditions.

ItemValue
Total tests9
Expected accept cases6
Expected reject cases3
Actually passed9
Actually failed0

The per-pattern results were as follows.

TestPatternExpectedActualProcessed insnsStack depthReject category
verifier-passsmallest valid programacceptaccept20none
verifier-faildisallowed context accessrejectreject10invalid memory access
synth-map-missing-null-checkmap value access without null-checkrejectreject88null-check reasoning failure
synth-map-value-bounds-failout-of-bounds map value accessrejectreject108map value bounds failure
tracepoint-smoketracepoint-based runtime smoke testacceptaccept118none
synth-map-null-checkmap value access after null-checkacceptaccept138none
synth-branch-fanoutincreased conditional branchesacceptaccept208none
synth-stack-accessstack accessacceptaccept2664none
synth-bounded-loopbounded loopacceptaccept310none

The important point is not merely that every test behaved as expected. The more interesting point is that the three rejected cases were rejected for different reasons. A verifier rejection is not just a generic failure. It is the result of failing one of several safety conditions.

In this experiment, three reject boundaries stood out.

BoundaryRepresentative testKey log message
context accessverifier-failinvalid bpf_context access
nullabilitysynth-map-missing-null-checkmap_value_or_null
map value boundssynth-map-value-bounds-failinvalid access to map value

Let us look at them one by one.

Boundary 1: Invalid Context Access

The simplest rejected case is verifier-fail. This test is an eBPF socket filter program that attempts to access a disallowed offset in its context. The exact code is not important here. The key point is that the program tries to read from a context location that is not allowed for the current program type. The core verifier log message is:

invalid bpf_context access off=80 size=4

Each eBPF program type receives a different kind of context. A tracepoint program, a socket filter program, and an XDP program all see different contexts. The fields and offsets that can be accessed are also defined per context.

verifier-fail crosses that boundary. It tries to read from a location that is not allowed for the socket context, so the verifier rejects the program load. This is hard to interpret as a conservatism issue. It is a clear case of memory safety enforcement that prevents the program from reading an invalid kernel memory location.

The first boundary, then, is the context access boundary. An eBPF program cannot freely reinterpret the context of the program type it is attached to. The verifier restricts access based on the context layout and allowed offsets.

Boundary 2: Null-Check After Map Lookup

The second case is null-checking after a map lookup. The two tests are:

  • synth-map-null-check
  • synth-map-missing-null-check

Both programs use bpf_map_lookup_elem. The difference is whether the program checks the return value for null before using it. In pseudo-code, the passing version looks like this:

value = bpf_map_lookup_elem(&map, &key);
if (!value)
    return 0;
return value->field;

With the null-check, the program is accepted. In contrast, the following shape fails:

value = bpf_map_lookup_elem(&map, &key);
return value->field;

synth-map-null-check, which performs the null-check, passed the verifier. synth-map-missing-null-check, which dereferences the map value immediately without checking for null, was rejected. The key log message was:

R0 invalid mem access 'map_value_or_null'

bpf_map_lookup_elem returns a pointer to the map value if the key exists, but it can return null if the key is not found. The verifier does not treat the return value as a safe pointer immediately. Right after the helper call, the return value has the state map_value_or_null.

If the code explicitly checks for null, the verifier can refine the pointer state to map_value on the non-null path. From that point on, the program can read or write the map value. Without the null-check, however, the verifier still sees a possible null value. The moment the program tries to dereference a pointer that might be null, it is rejected.

The lesson is simple. It is not enough for the developer to believe that “this key should always exist.” The verifier must be able to prove it. In eBPF code, safety conditions need to be visible in the structure of the code.

Boundary 3: Map Value Bounds

The third rejected case is synth-map-value-bounds-fail. This program performs a null-check after map lookup, but then tries to read from an offset outside the map value. Simplified, it is like trying to read 8 bytes starting at offset 8 from an 8-byte map value. Since offset 8 is already the end of the value, the access goes out of bounds.

The key verifier log message was:

invalid access to map value, value_size=8 off=8 size=8
R0 min value is outside of the allowed memory range

This case is important because it shows that null-checking and bounds checking are separate conditions. Just because the map lookup result is non-null does not mean the program can read arbitrary bytes from the map value. If the map value is 8 bytes and the program reads 8 bytes at offset 8, the access range goes past the end of the value.

Map access has at least two safety conditions.

ConditionMeaning
nullabilitythe program must prove that the map lookup result is not null
boundsthe access offset and size must stay within the map value size

synth-map-null-check satisfied the first condition. synth-map-value-bounds-fail also satisfied the first condition, but failed the second one. The verifier logs show these failures as different messages.

Looking at these three rejected cases, the verifier’s security boundary is not a single accept/reject line. It is a combination of smaller boundaries, such as context access, pointer nullability, and map value bounds.

A Quick Look at Verifier Cost Signals

Although the main purpose of this experiment was to study verifier security boundaries, a few tests also showed signals related to verifier cost.

TestPatternProcessed insnsStack depth
synth-branch-fanoutbranch fan-out208
synth-stack-accessstack access2664
synth-bounded-loopbounded loop310

In the first run, synth-bounded-loop had the largest number of processed instructions. But there is an important caveat: the verifier does not analyze the C source code directly. It analyzes the final BPF bytecode. A loop in the C source does not necessarily appear as the same source-level loop to the verifier. Compiler optimizations and unrolling can change the final instruction stream.

synth-stack-access is also worth noting. It had fewer processed instructions than synth-bounded-loop, but its stack depth was the largest at 64. The verifier log included detailed written/live state for stack slots. This suggests that stack access patterns can affect verifier stack-state tracking and log size, not just instruction count.

That said, these results are not enough to claim which pattern is the most expensive for the verifier. To make that claim, we would need a matrix experiment that gradually increases branch count, loop bounds, and stack usage. At this stage, the conclusion is more modest:

To understand verifier behavior, looking only at the C code is not enough. You need to inspect the final BPF instruction stream and the verifier log as well.

What Realistic Micro-Programs Showed

Synthetic benchmarks are useful because they isolate one boundary at a time. A single null-check difference or a single bounds difference is easy to control. But real eBPF programs usually combine map lookup, map update, helper calls, and compound conditionals.

So I also ran a few realistic micro-programs. These are not production workloads copied as-is. They are small programs that compress common structures found in real eBPF code.

TestPatternResultProcessed insnsTotal/peak statesStack depth
realistic-helper-callhelper call and scalar arithmeticaccept70/00
realistic-map-updatemap lookup, null-check, and updateaccept262/216
realistic-compound-filterfilter-like compound conditionsaccept283/30

realistic-helper-call was relatively simple. In contrast, realistic-map-update and realistic-compound-filter showed total/peak states more clearly. The compound condition program recorded total_states=3 and peak_states=3. The map lookup/update combination recorded total_states=2 and peak_states=2.

This result shows that synthetic benchmarks and realistic micro-programs serve different purposes. Synthetic benchmarks are good for isolating one boundary. Realistic micro-programs are better for observing how the verifier tracks paths and states in code shapes closer to what developers actually write.

The two types of tests are complementary, not competing. If you want to understand a specific verifier rule, a synthetic case is useful. If you want to understand the code shapes developers run into in practice, realistic micro-programs are necessary.

Do Results Change Across Kernel Versions?

I ran the same 12 tests on v6.6 and v7.0-rc6-20-g9147566d8016. The 12 tests consist of the 9 baseline tests above plus the 3 realistic micro-programs. The short answer is that, for this test set, there was no accept/reject difference.

TestExpectedv6.6latest-rc
realistic-compound-filteracceptacceptaccept
realistic-helper-callacceptacceptaccept
realistic-map-updateacceptacceptaccept
synth-bounded-loopacceptacceptaccept
synth-branch-fanoutacceptacceptaccept
synth-map-missing-null-checkrejectrejectreject
synth-map-null-checkacceptacceptaccept
synth-map-value-bounds-failrejectrejectreject
synth-stack-accessacceptacceptaccept
tracepoint-smokeacceptacceptaccept
verifier-failrejectrejectreject
verifier-passacceptacceptaccept

However, the quantitative metrics were not identical. For example, synth-map-null-check processed 14 instructions on v6.6, but 13 on latest-rc. tracepoint-smoke processed 12 instructions on v6.6, but 11 on latest-rc. Some tests also had different stack depths: 4 on v6.6 and 8 on latest-rc.

In summary:

Comparison itemResult
accept/rejectidentical on v6.6 and latest-rc
processed insnsdiffered in some tests
stack depthdiffered in some tests
conclusionpass/fail alone is not enough to explain verifier differences

From a developer’s perspective, whether the program loads may be the most important question. But if you want to analyze verifier behavior, pass/fail is not enough. Even when the final decision is the same, internal accounting, stack depth calculation, log formatting, and processed instruction counts can differ across kernel versions.

This is why eBPF experiment results should always record the kernel version and commit. Saying “it passed on my machine” is not enough. When comparing verifier behavior or writing a bug report, it is better to include the kernel version, program type, and verifier log together.

So, Is the Verifier Conservative?

One of the initial questions was whether the verifier rejects programs that are likely safe in practice. The eBPF verifier has to prioritize soundness, so it is necessarily conservative to some degree. This can lead to safe-looking but rejected programs.

However, the rejected programs in this test set were all close to clear safety violations.

TestReject reasonInterpretation
verifier-faildisallowed context offset accessinvalid memory access blocked
synth-map-missing-null-checkdereference of a possibly null map valuenullability violation
synth-map-value-bounds-failaccess outside map value sizebounds violation

So these results should not be used to claim that the verifier is excessively conservative. This experiment shows how the verifier applies basic memory safety boundaries, rather than proving excessive conservatism.

In fact, these results are better understood as a baseline for future experiments. To find safe-looking but rejected cases, we need programs where a human can see a safety invariant, but the verifier cannot prove that invariant. Possible candidates include cases where scalar range reasoning is insufficient, packet bounds checks are spread across complex control flow, or helper preconditions are difficult for the verifier to follow from the code structure.

Summary

The experiment identified three basic verifier boundaries.

BoundaryRepresentative logMeaning
context accessinvalid bpf_context accessprogram-type-specific context access restriction
nullabilitymap_value_or_nulltracking the nullability of map lookup results
map value boundsinvalid access to map valuemap values can only be accessed within their defined size

The experiment also showed that synthetic benchmarks and realistic micro-programs provide different kinds of information. Synthetic benchmarks are useful for isolating specific boundaries. Realistic micro-programs are better for observing state tracking in code shapes closer to real development patterns. The cross-version comparison also showed that even when accept/reject results are identical, quantitative metrics such as processed instruction count and stack depth can differ.

When working with the eBPF verifier, the practical lessons are:

  • Check map lookup results for null before using them.
  • Access map values only within their defined value size.
  • When debugging verifier rejects, first find the key verifier log message.
  • Do not look only at the C source; also inspect the verifier log and the BPF instruction stream.
  • Record the kernel version and commit together with the result.
  • Look beyond pass/fail. Processed instructions, states, and stack depth can reveal useful information.

As a next step, a small comparison experiment that varies branch count, loop bounds, and stack usage would make the verifier cost signals clearer. After that, safe-looking but rejected cases should be designed separately. That is where we can start discussing verifier conservatism more concretely.