Why 12 eBPF Partial-Init Variants That Looked Equivalent Produced Different Verification Results

I wrote an eBPF program that put a local struct on the stack, initialized only some of its fields, and then caused it to be read. At first, I assumed the verifier would obviously reject code like this. To a human reader, some bytes were clearly left unwritten, and it was already well known that the eBPF verifier applies a strict write-before-read rule to the stack.

But the actual experiment unfolded differently. When I loaded a tracepoint program as root, partial-init reads passed far more permissively than I had assumed. Only after tracing the verifier code path did I confirm that the result was tied to allow_uninit_stack and the CAP_PERFMON path. So I rebuilt the main experiment around the socket + verifier_runner_noperfmon + -O2 context. Only then did the differences I originally expected around partial-init, branch join, and read range reproduce consistently.

This article asks three questions.

  1. At what granularity does the verifier judge whether a stack object is initialized?
  2. After a branch rejoins, how does initializedness information survive, and how does it disappear?
  3. Why do the actual read range and the shape of the code change the verification result?

The point is not that partial initialization is always bad. Even when code looks almost equivalent to a human reader, the verifier views the program through the range of initialized bytes and per-path state. So even if two snippets look similar at the source level, the verdict can still differ.

Background

The verifier does not directly understand C source. It examines the CFG of compiled BPF instructions and tracks register types and stack state along each possible path. The stack is no exception. The decision hinges on which offsets actually receive stores, which bytes are initialized, and how state can be merged after branches.

In this article, I keep coming back to four things.

  1. Which stack offsets actually received writes.
  2. Which byte ranges are in an initialized state.
  3. What information survives at a join point after branching.
  4. Whether the actual read range goes beyond the initialized range.

Without this viewpoint, it is easy to keep interpreting the code only in terms of C semantics. With it, the verifier's behavior becomes explainable.
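As a minimal sketch of this viewpoint, the userspace C below models a stack frame as one flag per byte: a store marks bytes as written, and a read passes only if every byte in its range was written. It is a toy model under stated assumptions, not the verifier's data structure; the names toy_stack, toy_store, and toy_read_ok and the 64-byte frame size are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define FRAME_SIZE 64  /* toy frame size, chosen only for this sketch */

/* One flag per stack byte: index 0 models fp-1, index 63 models fp-64. */
struct toy_stack {
    bool init[FRAME_SIZE];
};

static void toy_stack_reset(struct toy_stack *s)
{
    memset(s->init, 0, sizeof(s->init));
}

/* Map a negative fp-relative byte offset to an array index. */
static int toy_index(int fp_off)
{
    int idx = -fp_off - 1;

    assert(idx >= 0 && idx < FRAME_SIZE);
    return idx;
}

/* Record a store of `size` bytes starting at fp+off (off is negative). */
static void toy_store(struct toy_stack *s, int off, int size)
{
    for (int i = 0; i < size; i++)
        s->init[toy_index(off + i)] = true;
}

/* A read passes only if every byte in [off, off+size) was stored. */
static bool toy_read_ok(const struct toy_stack *s, int off, int size)
{
    for (int i = 0; i < size; i++)
        if (!s->init[toy_index(off + i)])
            return false;
    return true;
}
```

With a single toy_store(&s, -24, 8), toy_read_ok(&s, -24, 8) holds while toy_read_ok(&s, -16, 8) does not. Nothing in the model knows about structs or fields, only about offsets and sizes.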

Why I Had To Reset the Control Variables After the Preliminary Experiment

Before the main experiment, the preliminary run went off track once. I first built partial-init cases A-F as tracepoint programs. But when I loaded them through a root path, partial-init read cases like B-E passed unexpectedly. Even after changing helpers and keeping direct stack reads through noinline consumer functions, the overall outcome did not change much.

The cause was the verifier's allow_uninit_stack path. In kernel/bpf/verifier.c, the verifier environment is initialized with the following value.

env->allow_uninit_stack = bpf_allow_uninit_stack(env->prog->aux->token);

And in include/linux/bpf.h, that value is effectively tied to CAP_PERFMON.

static inline bool bpf_allow_uninit_stack(const struct bpf_token *token)
{
    return bpf_token_capable(token, CAP_PERFMON);
}

The stack-read check also contains the following condition.

if (type == STACK_INVALID && env->allow_uninit_stack)
    continue;

In other words, even if the bytes being read are STACK_INVALID, the verifier does not immediately treat that as an error when allow_uninit_stack is enabled. The surprising success I saw at first was not a verifier bug, but privilege-dependent behavior.
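The effect of that condition can be sketched in plain userspace C. The enum and function names below are hypothetical stand-ins rather than kernel symbols, and the loop mirrors only the shape of the quoted check, assuming that bytes already marked as initialized are accepted unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for per-byte stack slot states (names are made up). */
enum toy_slot_type { TOY_STACK_INVALID, TOY_STACK_MISC, TOY_STACK_ZERO };

/*
 * Returns true if a read covering `n` slot bytes would be accepted.
 * Only the shape of the quoted condition is modeled here.
 */
static bool toy_stack_read_allowed(const enum toy_slot_type *slots, int n,
                                   bool allow_uninit_stack)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (slots[i] == TOY_STACK_MISC || slots[i] == TOY_STACK_ZERO)
            continue;   /* initialized byte: always fine */
        if (slots[i] == TOY_STACK_INVALID && allow_uninit_stack)
            continue;   /* uninitialized, but tolerated in this context */
        return false;   /* uninitialized and not tolerated: invalid read */
    }
    return true;
}
```

The same eight unwritten bytes are accepted with allow_uninit_stack set and rejected without it, which is the privilege-dependent split described above.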

I include this preliminary experiment only to explain why the main baseline had to change; it is not one of the main cases. The tables and analysis in the main body should all be read in the socket + verifier_runner_noperfmon + -O2 context.

Main Experiment Design

The main experiment consists of one baseline and a family of 12 partial-init variants. The baseline context is as follows.

  • Kernel version: Linux 7.0.0-rc6-00020-g9147566d8016
  • clang/LLVM version: 22.1.3
  • bpftool version: bpftool v7.7.0, libbpf v1.7 (host side)
  • Program type: socket
  • Loader: verifier_runner_noperfmon
  • Optimization level: -O2

The shared struct used across the experiment is below.

struct partial_init_event {
    __u64 pid_tgid;
    __u64 ts;
    __u32 cpu;
    __u32 tag;
};
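Because the verifier logs later in this article speak in frame-pointer offsets rather than field names, it helps to pin down the layout first. The sketch below is a userspace stand-in (struct toy_event, using uint64_t/uint32_t in place of __u64/__u32). It assumes the object sits at fp-24, which is where the logs in this article place it; a different build could choose another slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the article's struct; the name is hypothetical. */
struct toy_event {
    uint64_t pid_tgid;  /* bytes  0..7  */
    uint64_t ts;        /* bytes  8..15 */
    uint32_t cpu;       /* bytes 16..19 */
    uint32_t tag;       /* bytes 20..23 */
};

/* Assumed stack placement, taken from the R1=fp[0]-24 seen in the logs. */
#define TOY_OBJ_BASE (-24)

/* Translate a field offset within the struct to an fp-relative offset. */
static int toy_fp_off(size_t field_off)
{
    assert(field_off < sizeof(struct toy_event));
    return TOY_OBJ_BASE + (int)field_off;
}
```

Under that assumption, toy_fp_off(offsetof(struct toy_event, ts)) is -16 and the cpu field lands at -8, which are the two offsets that appear in the failing log lines.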

I fixed the read pattern into two forms.

static __noinline int partial_init_consume_full(struct partial_init_event *event)
{
    volatile struct partial_init_event *src = event;
    struct partial_init_event copy;

    copy.pid_tgid = src->pid_tgid;
    copy.ts = src->ts;
    copy.cpu = src->cpu;
    copy.tag = src->tag;

    return (__u32)copy.pid_tgid + (__u32)copy.ts + copy.cpu + copy.tag;
}

static __noinline int partial_init_consume_first(struct partial_init_event *event)
{
    volatile struct partial_init_event *src = event;
    __u64 pid_tgid;

    pid_tgid = src->pid_tgid;
    return (__u32)pid_tgid;
}

consume_full() reads the whole struct, while consume_first() reads only the first field. This contrast let me separate whether the issue was partial initialization itself, or the actual read range.

Variant Set

This article uses the following 12 cases.

case | code shape                                           | observation point
-----+------------------------------------------------------+------------------------------------------
A    | zero-init + full read                                | baseline
B    | initialize one field + full read                     | whole read range and uninitialized bytes
C    | initialize two fields + full read                    | later fields left uninitialized
D    | initialize on only one branch + full read            | branch join
E    | branch A/B initialize different fields + full read   | merge of per-path initializedness
F    | initialize one field + narrow read                   | effect of shrinking the read range
G    | copy to a temporary after partial init               | effect of temporary copy
H    | partial init inside an inline function               | effect of inlining
I    | early return structure                               | CFG rewrite
J    | nested if structure                                  | nested join
K    | full field-by-field initialization without zero-init | is full init enough?
L    | overwrite some fields after memset                   | effect of zeroed remainder

Summary Table of Results

The actual results are easiest to read when summarized in a table first.

case | code shape                                              | expectation | actual | key log line                            | initial reading
-----+---------------------------------------------------------+-------------+--------+-----------------------------------------+------------------------------------------------
A    | zero-init + full read                                   | pass        | pass   | processed 18 insns                      | baseline
B    | initialize one field + full read                        | fail        | fail   | invalid read from stack off -16+0 size 8 | partial-init full read
C    | initialize two fields + full read                       | fail        | fail   | invalid read from stack off -8+0 size 4  | later field uninitialized
D    | initialize on only one branch + full read               | fail        | fail   | invalid read from stack off -16+0 size 8 | effect of branch join
E    | different fields initialized on each branch + full read | fail        | fail   | invalid read from stack off -16+0 size 8 | merge of per-path initializedness failed
F    | partial init + narrow read                              | pass        | pass   | processed 8 insns                       | effect of read range
G    | pass through a temporary copy                           | fail        | fail   | invalid read from stack off -16+0 size 8 | not hidden by temporary copy
H    | initialize inside inline function                       | fail        | fail   | invalid read from stack off -16+0 size 8 | same issue even when inlined
I    | early return                                            | fail        | fail   | invalid read from stack off -16+0 size 8 | rejected even after CFG rewrite
J    | nested if                                               | fail        | fail   | invalid read from stack off -16+0 size 8 | rejected even with nested join
K    | full field-by-field initialization                      | pass        | pass   | processed 18 insns                      | passes without zero-init if fully initialized
L    | overwrite after memset                                  | pass        | pass   | processed 17 insns                      | zeroed remainder is sufficient

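This table is the center of the article. Rather than walking through all 12 cases at full length, the sections below dig into B, E, and F, which between them cover the full read over a partial init, the branch join, and the narrow-read contrast.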

Case B: Initializing Only Some Fields, Then Reading the Whole Object

Case B is the simplest partial-init failure. It fills only pid_tgid, then partial_init_consume_full() reads pid_tgid, ts, cpu, and tag in sequence. That makes it the fastest way to show what it means for a read to go beyond the initialized byte range.

Why It Clashed With Intuition

That this case fails matches intuition: to a human reader, ts, cpu, and tag are plainly left untouched. What is less intuitive is where the verifier draws the line. It does not object to the partial initialization as such. It objects at the first instruction that reads a byte nobody wrote, so the key point is how far the program actually reads.

Verifier Log

7: (79) r2 = *(u64 *)(r1 +0)          ; frame1: R1=fp[0]-24 R2=0x1111111111111111
; copy.ts = src->ts; @ partial_init_defs.h:35
8: (79) r0 = *(u64 *)(r1 +8)
invalid read from stack off -16+0 size 8

The first 8-byte read of pid_tgid succeeds, but the very next 8-byte read of ts immediately triggers invalid read from stack.

Key Instructions

0: (18) r1 = 0x1111111111111111
2: (7b) *(u64 *)(r10 -24) = r1
3: (bf) r1 = r10
4: (07) r1 += -24
5: (85) call pc+1
7: (79) r2 = *(u64 *)(r1 +0)
8: (79) r0 = *(u64 *)(r1 +8)

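There is only one store, an 8-byte write to fp-24. The consumer is compiled to read r1 +0, r1 +8, r1 +16, and r1 +20 in order, although the listing stops at r1 +8 because that is where the verifier rejects the program. The written range and the read range do not match.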

Interpretation

In this case, only the first 8 bytes holding pid_tgid are initialized. The next 8 bytes for ts, the following 4 bytes for cpu, and the last 4 bytes for tag are still not proven initialized. Since partial_init_consume_full() compiles into a whole-struct read pattern, the verifier demands proof that the full read range is initialized. That proof does not exist, so the program is rejected.

Conclusion

Case B is the simplest demonstration that the issue is not merely "partial init exists," but that the actual read range goes beyond the initialized range.
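To make the mismatch concrete, the following sketch replays a list of stores and then a list of reads against a byte mask and reports the first read that touches an unwritten byte. It is a userspace toy, not verifier code; toy_access, toy_bytes, and toy_first_bad_read are hypothetical names, and the offsets assume the object starts at fp-24 as in the logs.

```c
#include <assert.h>
#include <stdint.h>

/* One stack access: fp-relative offset (negative) and size in bytes. */
struct toy_access {
    int off;
    int size;
};

/* Bit i of the mask models the byte at fp-(i+1); 32 bytes suffice here. */
static uint32_t toy_bytes(int off, int size)
{
    uint32_t mask = 0;

    assert(off < 0 && size > 0 && off + size <= 0 && -off <= 32);
    for (int i = 0; i < size; i++)
        mask |= UINT32_C(1) << (-(off + i) - 1);
    return mask;
}

/* Returns the index of the first read touching an unwritten byte, or -1. */
static int toy_first_bad_read(const struct toy_access *stores, int n_stores,
                              const struct toy_access *reads, int n_reads)
{
    uint32_t init = 0;

    for (int i = 0; i < n_stores; i++)
        init |= toy_bytes(stores[i].off, stores[i].size);

    for (int i = 0; i < n_reads; i++) {
        uint32_t need = toy_bytes(reads[i].off, reads[i].size);

        if ((init & need) != need)
            return i;
    }
    return -1;
}
```

Feeding it Case B (one 8-byte store at -24, then reads at -24, -16, -8, and -4) stops at the second read, offset -16 with size 8, the same offset and size as in the log line. With two stores, as in Case C, it stops at offset -8 with size 4 instead.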

Case E: Different Fields Initialized on Branch A and Branch B

Case E is the best case for explaining branch join. One path initializes only pid_tgid, while the other initializes only ts. To a human reader, that can feel like "at least one of them got filled," but that is not enough for a full read.

Why It Clashed With Intuition

One path fills pid_tgid, and the other fills ts, so at a glance it is tempting to think that "some part of the struct got initialized either way." But the verifier does not infer human intent. After the join, it can safely preserve only facts guaranteed on every path.

Verifier Log

14: (79) r2 = *(u64 *)(r1 +0)         ; frame1: R1=fp[0]-24 R2=0x1111111111111111
; copy.ts = src->ts; @ partial_init_defs.h:35
15: (79) r0 = *(u64 *)(r1 +8)
invalid read from stack off -16+0 size 8

At the point of the full read, the ts read at r1 +8 is blocked. The fact that ts was initialized on one path is not enough to prove it across all paths.

Key Instructions

0: (61) r1 = *(u32 *)(r1 +0)
1: (54) w1 &= 1
2: (16) if w1 == 0x0 goto pc+4
3: (18) r1 = 0x1111111111111111
5: (7b) *(u64 *)(r10 -24) = r1
6: (05) goto pc+3
10: (bf) r1 = r10
11: (07) r1 += -24
12: (85) call pc+1
15: (79) r0 = *(u64 *)(r1 +8)

Before the branch, the program checks skb->len & 1. If the low bit is set, it writes pid_tgid to fp-24; otherwise it jumps to instruction 7 and writes ts to fp-16 (instructions 7-9 are omitted from the listing above). After that, the shared path calls partial_init_consume_full() and reads the whole object.

Interpretation

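On branch A, only pid_tgid is initialized. On branch B, only ts is initialized. The verifier follows each path into the shared call with that path's own stack state, and neither path can prove that all 24 bytes are initialized. In the log above, the failing path is the one that wrote pid_tgid (R2 still holds 0x1111111111111111), and it is rejected at the ts read. After the join, the verifier cannot rely on the fact that "the whole struct is initialized"; the only facts that hold on every incoming path are much weaker. That is why the full read is rejected.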

Conclusion

Case E is the key example for explaining how initializedness is merged after a branch join. It shows most clearly that a fact true on only one path is not proof for the merged control flow.
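One way to picture the join is as a bitwise AND over per-path byte masks. This is a mental model only: as far as I understand it, the kernel verifier explores paths one at a time instead of computing an intersection, but the outcome for a read after the join is the same, since the read must be provable on every path that reaches it. The names toy_bytes, toy_join, and toy_covers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit i of the mask models the byte at fp-(i+1). */
static uint32_t toy_bytes(int off, int size)
{
    uint32_t mask = 0;

    assert(off < 0 && size > 0 && off + size <= 0 && -off <= 32);
    for (int i = 0; i < size; i++)
        mask |= UINT32_C(1) << (-(off + i) - 1);
    return mask;
}

/* Only bytes written on both incoming paths survive the join. */
static uint32_t toy_join(uint32_t path_a, uint32_t path_b)
{
    return path_a & path_b;
}

/* True if every byte the read needs is in the initialized set. */
static bool toy_covers(uint32_t init, uint32_t need)
{
    return (init & need) == need;
}
```

For Case E, path A contributes toy_bytes(-24, 8) and path B contributes toy_bytes(-16, 8). Their join is empty, so nothing covers the 24-byte read. Even the union of the two masks would fall short, because cpu and tag are written on neither path.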

Case F: Only Some Fields Initialized, but the Read Is Narrow

Case F is the opposite reference point. It fills only pid_tgid and then reads only that field through partial_init_consume_first(). This case passes normally.

Why It Matters

Its shape is almost the same as Case B, but the result is different. That makes it the best contrast for showing that the read range is the real issue.

Verifier Log

7: frame1: R1=fp[0]-24 R10=fp0
; pid_tgid = src->pid_tgid; @ partial_init_defs.h:47
7: (79) r0 = *(u64 *)(r1 +0)          ; frame1: R0=0x1111111111111111 R1=fp[0]-24
8: (95) exit

This time, the program reads only r1 +0 and exits immediately. Instead of rejection, it finishes normally.

Key Instructions

0: (18) r1 = 0x1111111111111111
2: (7b) *(u64 *)(r10 -24) = r1
3: (bf) r1 = r10
4: (07) r1 += -24
5: (85) call pc+1
7: (79) r0 = *(u64 *)(r1 +0)
8: (95) exit

There is still only one store to fp-24, but there is also only one read from r1 +0. The initialized range and the read range match exactly.

Interpretation

The struct as a whole is still only partially initialized, but the actual read range the verifier needs to validate is only the 8 bytes of pid_tgid. That range is already initialized, so the program passes.

Conclusion

Case F is the shortest summary of the article's core claim. Partial initialization itself is not forbidden. The real question is whether the read range seen by the verifier goes beyond the initialized range.

How Initializedness Information Is Merged After a Branch

Without an account of what happens to a field initialized on only one side of a branch, even a long list of cases stays just a collection of examples.

It can be summarized in three sentences.

  1. The verifier tracks state separately for each path.
  2. At a join point, it can safely rely only on facts proven on every path.
  3. Therefore, bytes initialized on only some paths cannot be treated as initialized after the join.

Cases D and E form the pair that illustrates this principle. E in particular shows most clearly that even if one path initializes pid_tgid and the other initializes ts, that still does not guarantee a full read after the join.
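At the source level, the principle suggests writing every byte before the branch so that both paths arrive at the join with the same initialized range. The sketch below shows that shape in plain userspace C. It is not one of the 12 measured cases and was not run through the verifier. It resembles Case L (overwrite after memset), which passed, but whether a given compiler keeps the zeroing stores is an assumption. The struct name, the function names, and the 0x2222... constant are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the article's struct; the name is hypothetical. */
struct toy_event {
    uint64_t pid_tgid;
    uint64_t ts;
    uint32_t cpu;
    uint32_t tag;
};

/* Reads every field, like the full-read consumer in the article. */
static uint32_t toy_consume_full(const struct toy_event *e)
{
    assert(e != NULL);
    return (uint32_t)e->pid_tgid + (uint32_t)e->ts + e->cpu + e->tag;
}

/* Case E shape, but with all 24 bytes written before the branch. */
static uint32_t toy_case_e_zeroed(uint32_t len)
{
    struct toy_event ev;

    memset(&ev, 0, sizeof(ev));            /* common to both paths */
    if (len & 1)
        ev.pid_tgid = 0x1111111111111111ULL;
    else
        ev.ts = 0x2222222222222222ULL;     /* made-up value */

    return toy_consume_full(&ev);
}
```

Here the branch only overwrites bytes that are already initialized, so the fact "all 24 bytes were written" holds on both paths and survives the join.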

The Actual Read Range and the Stack Object Model

This is where many people get stuck. If they only intend to use one field, why does the whole object seem to need initialization? In this main experiment, I used direct stack reads rather than helper size checks, but the core point is the same. If the range the verifier considers actually read goes beyond the initialized range, the program is rejected.

So the best way to read this experiment is through the following contrast.

  1. B: partial-init + full read -> fail
  2. F: partial-init + narrow read -> pass

In other words, the partial-init problem is not just about how many fields were written. It should be understood as a question of which byte range of the stack object belongs to the actual read set.

Why This Is Not a Verifier Bug

At first, the verifier felt like it was behaving strangely. But after following the code, this result looked more like a case where the verifier was applying its own model consistently.

In short, the result comes down to this.

  1. The verifier does not infer C-level intent.
  2. The verifier looks only at instructions and per-path state.
  3. A full read and a narrow read are different events.
  4. After a branch join, only commonly proven initializedness survives.
  5. Depending on capability context, even the stack-read decision path can change.

The diagnostics can still feel lacking. Even if the error message is technically correct, it is often hard to see immediately why the state ended up that way at that exact point. But that is different from saying the verifier's decision itself was wrong.

Five Practical Rules

After this experiment, the practical rules also became fairly simple.

  1. When a stack object will be read, check the actual read range first.
  2. If there is a full read under partial initialization, inspect which bytes are missing by offset.
  3. For branch-specific initialization, assume that only common facts survive after the join.
  4. Even if the source code looks similar, inspect the instruction shape directly.
  5. Do not stop at the verifier log alone; also trace the verifier code path.
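Rule 2 can be done by hand, but a small helper makes the bookkeeping explicit. The sketch below takes one written/unwritten flag per byte of an object and lists the unwritten gaps as fp-relative offset and size pairs. It is a userspace aid with hypothetical names (toy_gap, toy_find_gaps), and the caller has to supply both the flags and the object's base offset, for example by reading the stores out of the instruction dump.

```c
#include <assert.h>
#include <stdbool.h>

/* One run of unwritten bytes, expressed as an fp-relative range. */
struct toy_gap {
    int off;
    int size;
};

/*
 * init[i] says whether byte i of the object was written.
 * Stores up to max_gaps gaps and returns the total number found.
 */
static int toy_find_gaps(const bool *init, int obj_size, int base_off,
                         struct toy_gap *gaps, int max_gaps)
{
    int n = 0;
    int i = 0;

    assert(obj_size >= 0 && max_gaps >= 0);
    while (i < obj_size) {
        if (init[i]) {
            i++;
            continue;
        }
        int start = i;              /* first unwritten byte of this gap */
        while (i < obj_size && !init[i])
            i++;
        if (n < max_gaps) {
            gaps[n].off = base_off + start;
            gaps[n].size = i - start;
        }
        n++;
    }
    return n;
}
```

For Case B, with only the first 8 of 24 bytes written and a base of -24, this reports a single gap at offset -16 with size 16, which covers ts, cpu, and tag.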

Remaining Risks and Follow-Up Experiments

These conclusions are also based on results observed under one specific kernel and one specific toolchain. So the following risks still need to be stated.

  1. Even for the same C code, different clang optimizations can change the instruction shape.
  2. The verifier's detailed behavior and log wording can change across kernel versions.
  3. Helper-based reads and direct stack reads can fail at different points.
  4. Under different program types or capability contexts, the same pattern may behave differently.

For follow-up experiments, I am considering the following.

  1. Compare the same code under -O0 and -O2.
  2. Compare helper-based reads and direct stack reads in the same table.
  3. Repeat the same cases on different kernel versions.
  4. Split out more complex nested control flow for branch joins.

Conclusion

This article is not meant to be just a collection of partial-init failure cases. Its aim is to work backward from 12 similar-looking programs to the verifier's model of stack initializedness and branch joins.

Even when code looks almost equivalent to a human reader, the verifier does not see it that way. The verifier sees the range of initialized bytes and the state on each path. And when the actual read range goes beyond what can be proven initialized, it rejects the program. That is exactly what B, E, and F each showed in this experiment.