[Linux Kernel eBPF] Analyzing bpf selftests pkt_access.c
Table of Contents
- Background
- Study Order
- prog_tests/pkt_access.c
- progs/test_pkt_access.c
- Next Step
- What This Test Is Really Trying to Verify
- Revisiting It from the Verifier's Point of View
- 1. The verifier first has to distinguish what a pointer refers to.
- 2. skb->len and skb->ifindex are not just field reads; they are tests of context preservation.
- 3. subprog1 and subprog2 ask whether the same asm means the same thing.
- 4. The large stack frame and function chain look like deliberate complexity added by the test.
- Why the TCP Access Block Is Interesting
- Rules Summarized by This Single Test
- Closing Thoughts
Background
I started this because I was told that reading selftests is a good way to study BPF.
Study Order
Analysis Order
- Read prog_tests/<name>.c first.
- Read the matching progs/<source>.c.
- First understand what user space expects through step 1.
- Then see what the BPF program actually does.
- Check the commit history to see how it evolved.
+ plus
Summary Template
- What the test claims
- Code locations the verifier is likely to be sensitive to
- Where the kernel makes its decision
- What I would change if I were patching it
prog_tests/pkt_access.c
I picked this one because an LLM recommended it, saying that "the basic flow of an eBPF selftest is revealed very briefly and clearly here."
// linux/tools/testing/selftests/bpf/prog_tests/pkt_access.c
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include <network_helpers.h>
void test_pkt_access(void)
{
const char *file = "./test_pkt_access.bpf.o";
struct bpf_object *obj;
int err, prog_fd;
LIBBPF_OPTS(bpf_test_run_opts, topts,
.data_in = &pkt_v4,
.data_size_in = sizeof(pkt_v4),
.repeat = 100000,
);
err = bpf_prog_test_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
if (CHECK_FAIL(err))
return;
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "ipv4 test_run_opts err");
ASSERT_OK(topts.retval, "ipv4 test_run_opts retval");
topts.data_in = &pkt_v6;
topts.data_size_in = sizeof(pkt_v6);
topts.data_size_out = 0; /* reset from last call */
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "ipv6 test_run_opts err");
ASSERT_OK(topts.retval, "ipv6 test_run_opts retval");
bpf_object__close(obj);
}
Detailed Analysis
const char *file = "./test_pkt_access.bpf.o";
struct bpf_object *obj;
int err, prog_fd;
LIBBPF_OPTS(bpf_test_run_opts, topts,
.data_in = &pkt_v4,
.data_size_in = sizeof(pkt_v4),
.repeat = 100000,
);
// network_helpers.h
/* ipv4 test vector */
struct ipv4_packet {
struct ethhdr eth;
struct iphdr iph;
struct tcphdr tcp;
} __packed;
extern struct ipv4_packet pkt_v4;
LIBBPF_OPTS is a libbpf macro for declaring and initializing an option struct.
Here it declares struct bpf_test_run_opts topts and sets the listed fields; fields that are not listed are left zeroed. (topts is the bundle of settings for one test run.)
pkt_v4 is the IPv4 test packet declared in network_helpers.h. The options say to run the BPF program 100000 times with that packet as input.
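To make the option-struct pattern concrete, here is a minimal user-space sketch. It is not libbpf's macro: DEMO_OPTS, struct demo_run_opts, and demo_pkt are hypothetical names invented for illustration. As I understand it, the real macro also records the struct size in an sz field, and the sketch imitates that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical options struct; field names echo the ones used in the test. */
struct demo_run_opts {
	size_t sz;                 /* struct size, recorded by the macro */
	const void *data_in;       /* input packet */
	unsigned int data_size_in; /* input packet length in bytes */
	int repeat;                /* how many times to run the program */
	unsigned int retval;       /* not listed by the caller, so it stays 0 */
};

/* Hypothetical macro: declare NAME, record its size, zero all unlisted fields. */
#define DEMO_OPTS(TYPE, NAME, ...) \
	struct TYPE NAME = { .sz = sizeof(struct TYPE), __VA_ARGS__ }

/* Stand-in for a test packet; the contents do not matter here. */
static const unsigned char demo_pkt[54] = {0};

struct demo_run_opts demo_make_opts(void)
{
	DEMO_OPTS(demo_run_opts, topts,
		.data_in = demo_pkt,
		.data_size_in = sizeof(demo_pkt),
		.repeat = 100000,
	);
	return topts;
}

void demo_opts_check(void)
{
	struct demo_run_opts o = demo_make_opts();

	assert(o.sz == sizeof(struct demo_run_opts));
	assert(o.retval == 0); /* unlisted field was zero-initialized */
}
```

The designated initializers are what let the caller name only the fields it cares about, which is why the test can later overwrite just data_in and data_size_in for the IPv6 run.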
err = bpf_prog_test_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
if (CHECK_FAIL(err))
return;
What bpf_prog_test_load does:
It loads the object file as a BPF_PROG_TYPE_SCHED_CLS program, that is, a tc classifier.
If it succeeds, it stores the object handle in obj and the program FD in prog_fd.
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "ipv4 test_run_opts err");
ASSERT_OK(topts.retval, "ipv4 test_run_opts retval");
This runs the BPF program we just loaded with the input packet in topts (pkt_v4).
ASSERT_OK passes when the value is 0 and fails otherwise. So the test passes only if both the return value of bpf_prog_test_run_opts (err) and the BPF program's own return value (topts.retval) are 0.
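It helps to line up the numeric values behind that retval check. The sketch below copies the three action codes from linux/pkt_cls.h under DEMO_ names so it builds standalone; demo_run_passed is a hypothetical helper that mirrors the two ASSERT_OK calls, not part of the selftest.

```c
#include <assert.h>

/* Same numeric values as TC_ACT_UNSPEC, TC_ACT_OK and TC_ACT_SHOT. */
enum {
	DEMO_TC_ACT_UNSPEC = -1,
	DEMO_TC_ACT_OK = 0,
	DEMO_TC_ACT_SHOT = 2,
};

/* Hypothetical helper: both the syscall result and the program's
 * return value have to be 0 for the test to pass. */
int demo_run_passed(int err, int retval)
{
	return err == 0 && retval == 0;
}

void demo_retval_check(void)
{
	assert(demo_run_passed(0, DEMO_TC_ACT_OK));
	assert(!demo_run_passed(0, DEMO_TC_ACT_SHOT));
}
```

Since only TC_ACT_OK is 0, a passing run means the BPF program reached its single TC_ACT_OK return, the one guarded by tcp->urg_ptr == 123. Falling through to TC_ACT_UNSPEC would fail the test just like TC_ACT_SHOT.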
topts.data_in = &pkt_v6;
topts.data_size_in = sizeof(pkt_v6);
topts.data_size_out = 0; /* reset from last call */
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "ipv6 test_run_opts err");
ASSERT_OK(topts.retval, "ipv6 test_run_opts retval");
bpf_object__close(obj);
This part switches the input to pkt_v6, resets data_size_out, and runs the same program again.
bpf_object__close closes the BPF object and frees its resources.
progs/test_pkt_access.c
From const char *file = "./test_pkt_access.bpf.o";, we can tell that the source is progs/test_pkt_access.c.
// linux/tools/testing/selftests/bpf/progs/test_pkt_access.c
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2017 Facebook
*/
#include <stddef.h>
#include <string.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/in.h>
#include <linux/tcp.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
#include "bpf_misc.h"
/* llvm will optimize both subprograms into exactly the same BPF assembly
*
* Disassembly of section .text:
*
* 0000000000000000 test_pkt_access_subprog1:
* ; return skb->len * 2;
* 0: 61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0)
* 1: 64 00 00 00 01 00 00 00 w0 <<= 1
* 2: 95 00 00 00 00 00 00 00 exit
*
* 0000000000000018 test_pkt_access_subprog2:
* ; return skb->len * val;
* 3: 61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0)
* 4: 64 00 00 00 01 00 00 00 w0 <<= 1
* 5: 95 00 00 00 00 00 00 00 exit
*
* Which makes it an interesting test for BTF-enabled verifier.
*/
static __attribute__ ((noinline))
int test_pkt_access_subprog1(volatile struct __sk_buff *skb)
{
return skb->len * 2;
}
static __attribute__ ((noinline))
int test_pkt_access_subprog2(int val, volatile struct __sk_buff *skb)
{
return skb->len * val;
}
#define MAX_STACK (512 - 2 * 32)
__attribute__ ((noinline))
int get_skb_len(struct __sk_buff *skb)
{
volatile char buf[MAX_STACK] = {};
__sink(buf[MAX_STACK - 1]);
return skb->len;
}
__attribute__ ((noinline))
int get_constant(long val)
{
return val - 122;
}
int get_skb_ifindex(int, struct __sk_buff *skb, int);
__attribute__ ((noinline))
int test_pkt_access_subprog3(int val, struct __sk_buff *skb)
{
return get_skb_len(skb) * get_skb_ifindex(val, skb, get_constant(123));
}
__attribute__ ((noinline))
int get_skb_ifindex(int val, struct __sk_buff *skb, int var)
{
volatile char buf[MAX_STACK] = {};
__sink(buf[MAX_STACK - 1]);
return skb->ifindex * val * var;
}
__attribute__ ((noinline))
int test_pkt_write_access_subprog(struct __sk_buff *skb, __u32 off)
{
void *data = (void *)(long)skb->data;
void *data_end = (void *)(long)skb->data_end;
struct tcphdr *tcp = NULL;
if (off > sizeof(struct ethhdr) + sizeof(struct ipv6hdr))
return -1;
tcp = data + off;
if (tcp + 1 > data_end)
return -1;
/* make modification to the packet data */
tcp->check++;
return 0;
}
SEC("tc")
int test_pkt_access(struct __sk_buff *skb)
{
void *data_end = (void *)(long)skb->data_end;
void *data = (void *)(long)skb->data;
struct ethhdr *eth = (struct ethhdr *)(data);
struct tcphdr *tcp = NULL;
__u8 proto = 255;
__u64 ihl_len;
if (eth + 1 > data_end)
return TC_ACT_SHOT;
if (eth->h_proto == bpf_htons(ETH_P_IP)) {
struct iphdr *iph = (struct iphdr *)(eth + 1);
if (iph + 1 > data_end)
return TC_ACT_SHOT;
ihl_len = iph->ihl * 4;
proto = iph->protocol;
tcp = (struct tcphdr *)((void *)(iph) + ihl_len);
} else if (eth->h_proto == bpf_htons(ETH_P_IPV6)) {
struct ipv6hdr *ip6h = (struct ipv6hdr *)(eth + 1);
if (ip6h + 1 > data_end)
return TC_ACT_SHOT;
ihl_len = sizeof(*ip6h);
proto = ip6h->nexthdr;
tcp = (struct tcphdr *)((void *)(ip6h) + ihl_len);
}
if (test_pkt_access_subprog1(skb) != skb->len * 2)
return TC_ACT_SHOT;
if (test_pkt_access_subprog2(2, skb) != skb->len * 2)
return TC_ACT_SHOT;
if (test_pkt_access_subprog3(3, skb) != skb->len * 3 * skb->ifindex)
return TC_ACT_SHOT;
if (tcp) {
if (test_pkt_write_access_subprog(skb, (void *)tcp - data))
return TC_ACT_SHOT;
if (((void *)(tcp) + 20) > data_end || proto != 6)
return TC_ACT_SHOT;
barrier(); /* to force ordering of checks */
if (((void *)(tcp) + 18) > data_end)
return TC_ACT_SHOT;
if (tcp->urg_ptr == 123)
return TC_ACT_OK;
}
return TC_ACT_UNSPEC;
}
Big Picture First
test_pkt_access() checks whether the incoming packet is IPv4 or IPv6, calculates the TCP header location, and then verifies:
- whether skb->len access works correctly inside subprograms
- whether skb->ifindex access still works correctly across a call chain
- whether the program can actually modify the TCP checksum field in packet data
- whether TCP header boundary checks work correctly
If any of these fail, it returns TC_ACT_SHOT, which means the packet is dropped.
Functions
static __attribute__ ((noinline))
int test_pkt_access_subprog1(volatile struct __sk_buff *skb)
{
return skb->len * 2;
}
- This is a test of reading the skb->len field.
- It simply returns twice the packet length.
/* llvm will optimize both subprograms into exactly the same BPF assembly
*
* Disassembly of section .text:
*
* 0000000000000000 test_pkt_access_subprog1:
* ; return skb->len * 2;
* 0: 61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0)
* 1: 64 00 00 00 01 00 00 00 w0 <<= 1
* 2: 95 00 00 00 00 00 00 00 exit
*
* 0000000000000018 test_pkt_access_subprog2:
* ; return skb->len * val;
* 3: 61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0)
* 4: 64 00 00 00 01 00 00 00 w0 <<= 1
* 5: 95 00 00 00 00 00 00 00 exit
*
* Which makes it an interesting test for BTF-enabled verifier.
*/
static __attribute__ ((noinline))
int test_pkt_access_subprog2(int val, volatile struct __sk_buff *skb)
{
return skb->len * val;
}
- This tests skb->len access plus argument passing.
- If val = 2, it produces the same result as test_pkt_access_subprog1.
- As the comment says, this tests whether the BTF-enabled verifier correctly handles different source functions that end up looking like the same assembly.
#define MAX_STACK (512 - 2 * 32)
__attribute__ ((noinline))
int get_skb_len(struct __sk_buff *skb)
{
volatile char buf[MAX_STACK] = {};
__sink(buf[MAX_STACK - 1]);
return skb->len;
}
- This function uses a large stack frame.
- It tests whether skb->len can still be accessed correctly in that case.
- The BPF stack limit is 512 bytes, and MAX_STACK is 512 - 2 * 32 = 448 bytes, so it leaves 64 bytes of room.
- Here, __sink prevents buf from being optimized away.
In other words, it checks whether reading skb fields still works correctly even when the stack usage is high.
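The "2 * 32" in MAX_STACK reads like a budget for two more frames. The sketch below is a toy model of that budget, not kernel code. It rests on two assumptions of mine: that the 512-byte limit applies to the combined depth of a call chain, and that each frame is rounded up to a multiple of 32 bytes before summing. The 8-byte caller frames are made-up numbers.

```c
#include <assert.h>

#define DEMO_MAX_BPF_STACK 512        /* limit quoted in the text */
#define DEMO_MAX_STACK (512 - 2 * 32) /* same arithmetic as MAX_STACK */

/* Assumption: frames are rounded up to 32-byte multiples. */
static int demo_round_up_32(int bytes)
{
	return (bytes + 31) / 32 * 32;
}

/* Toy check: does a chain of stack frames fit in the combined budget? */
int demo_chain_fits(const int *frames, int n)
{
	int depth = 0;

	for (int i = 0; i < n; i++)
		depth += demo_round_up_32(frames[i]);
	return depth <= DEMO_MAX_BPF_STACK;
}

void demo_stack_check(void)
{
	/* Hypothetical chain: main -> subprog3 -> get_skb_len */
	int chain[] = { 8, 8, DEMO_MAX_STACK };

	assert(demo_chain_fits(chain, 3)); /* 32 + 32 + 448 == 512 */
}
```

Under those assumptions, a 448-byte buffer plus two small caller frames lands exactly on the limit, which would explain why the test did not simply use 512.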
__attribute__ ((noinline))
int get_constant(long val)
{
return val - 122;
}
- This function returns 1 when given 123.
- The main point is to make the call chain more complex and exercise verifier and call-handling paths.
__attribute__ ((noinline))
int get_skb_ifindex(int val, struct __sk_buff *skb, int var)
{
volatile char buf[MAX_STACK] = {};
__sink(buf[MAX_STACK - 1]);
return skb->ifindex * val * var;
}
- This tests access to the skb->ifindex field.
- It uses a large stack and passes three arguments.
- It checks whether the verifier correctly tracks context through a call chain.
__attribute__ ((noinline))
int test_pkt_access_subprog3(int val, struct __sk_buff *skb)
{
return get_skb_len(skb) * get_skb_ifindex(val, skb, get_constant(123));
}
- This simply ties together the functions we looked at so far.
- It returns skb->len * skb->ifindex * val, because get_constant(123) only contributes a factor of 1.
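The arithmetic of this call chain can be checked in plain user-space C. This is only a mirror of the three helpers for working through the numbers; struct demo_skb is a hypothetical stand-in for the two __sk_buff fields involved, and the field values are made up.

```c
#include <assert.h>

/* Hypothetical stand-in for the two context fields the test reads. */
struct demo_skb {
	unsigned int len;
	unsigned int ifindex;
};

static int demo_get_constant(long val)
{
	return val - 122; /* 123 -> 1 */
}

static int demo_get_skb_len(const struct demo_skb *skb)
{
	return skb->len;
}

static int demo_get_skb_ifindex(int val, const struct demo_skb *skb, int var)
{
	return skb->ifindex * val * var;
}

int demo_subprog3(int val, const struct demo_skb *skb)
{
	return demo_get_skb_len(skb) *
	       demo_get_skb_ifindex(val, skb, demo_get_constant(123));
}

void demo_chain_check(void)
{
	struct demo_skb skb = { .len = 54, .ifindex = 1 };

	/* Same comparison the main program makes with val == 3. */
	assert(demo_subprog3(3, &skb) == (int)(skb.len * 3 * skb.ifindex));
}
```

With len = 54 and ifindex = 1, both sides come out to 162, which is the equality the main program checks against skb->len * 3 * skb->ifindex.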
__attribute__ ((noinline))
int test_pkt_write_access_subprog(struct __sk_buff *skb, __u32 off)
{
void *data = (void *)(long)skb->data;
void *data_end = (void *)(long)skb->data_end;
struct tcphdr *tcp = NULL;
if (off > sizeof(struct ethhdr) + sizeof(struct ipv6hdr))
return -1;
tcp = data + off;
if (tcp + 1 > data_end)
return -1;
/* make modification to the packet data */
tcp->check++;
return 0;
}
- This tests write access to packet data.
- It gets the packet bounds from skb->data and skb->data_end.
- If off is larger than sizeof(struct ethhdr) + sizeof(struct ipv6hdr), it fails.
- It treats data + off as a TCP header pointer.
- If tcp + 1 > data_end, it fails. That means the entire TCP header must lie within packet bounds.
- tcp->check++ increments the TCP checksum field, which is an actual write.
- It returns 0 on success.
So this is a test of whether packet data can be written safely inside a subprogram.
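The same "bound the offset, prove the range, then write" sequence can be sketched in user space. This is an illustration of the pattern, not BPF code: demo_write_access and the DEMO_ constants are hypothetical, the header sizes are the usual 14, 40, and 20 bytes, and the range check compares lengths instead of pointers to stay within defined C behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_HLEN 14      /* sizeof(struct ethhdr) */
#define DEMO_IPV6_HLEN 40     /* sizeof(struct ipv6hdr) */
#define DEMO_TCP_HLEN 20      /* sizeof(struct tcphdr) */
#define DEMO_TCP_CHECK_OFF 16 /* offset of check inside the TCP header */

int demo_write_access(uint8_t *data, uint8_t *data_end, uint32_t off)
{
	uint16_t check;

	/* 1. Bound the offset so it cannot be arbitrarily large. */
	if (off > DEMO_ETH_HLEN + DEMO_IPV6_HLEN)
		return -1;
	/* 2. Same meaning as `tcp + 1 > data_end`: the whole header must fit. */
	if ((size_t)(data_end - data) < (size_t)off + DEMO_TCP_HLEN)
		return -1;
	/* 3. Only now touch the bytes: the equivalent of tcp->check++. */
	memcpy(&check, data + off + DEMO_TCP_CHECK_OFF, sizeof(check));
	check++;
	memcpy(data + off + DEMO_TCP_CHECK_OFF, &check, sizeof(check));
	return 0;
}

void demo_write_check(void)
{
	uint8_t pkt[74] = {0};

	assert(demo_write_access(pkt, pkt + sizeof(pkt), 34) == 0);
	assert(demo_write_access(pkt, pkt + 40, 34) == -1); /* truncated */
}
```

In the BPF version the order of steps 1 and 2 is not a style choice. The verifier has to see both checks on the path leading to the write, otherwise the program is rejected at load time.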
SEC("tc")
int test_pkt_access(struct __sk_buff *skb)
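- This is the entry point of the BPF program. SEC("tc") places it in the tc section; in this test it is not attached to a real interface but run through bpf_prog_test_run_opts.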
if (eth + 1 > data_end)
return TC_ACT_SHOT;
The entire Ethernet header must fit within the packet.
if (eth->h_proto == bpf_htons(ETH_P_IP)) {
struct iphdr *iph = (struct iphdr *)(eth + 1);
if (iph + 1 > data_end)
return TC_ACT_SHOT;
ihl_len = iph->ihl * 4;
proto = iph->protocol;
tcp = (struct tcphdr *)((void *)(iph) + ihl_len);
}
- This handles IPv4 packets.
- It treats the bytes after Ethernet as an iphdr.
- It checks that the minimum IP header is in bounds.
- It computes the actual header length as ihl * 4.
- It stores protocol.
- It calculates the TCP header location.
else if (eth->h_proto == bpf_htons(ETH_P_IPV6)) {
struct ipv6hdr *ip6h = (struct ipv6hdr *)(eth + 1);
if (ip6h + 1 > data_end)
return TC_ACT_SHOT;
ihl_len = sizeof(*ip6h);
proto = ip6h->nexthdr;
tcp = (struct tcphdr *)((void *)(ip6h) + ihl_len);
}
- This handles IPv6 packets.
- It treats the bytes after Ethernet as an ipv6hdr.
- The IPv6 base header has a fixed size, so sizeof is enough.
- It stores the next-header value in proto.
- It calculates the TCP header location.
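To put numbers on the two header-length computations, here is a small worked example. The helper names are hypothetical; the sizes are the standard 14-byte Ethernet header and 40-byte IPv6 base header.

```c
#include <assert.h>

#define DEMO_ETH_HLEN 14  /* sizeof(struct ethhdr) */
#define DEMO_IPV6_HLEN 40 /* sizeof(struct ipv6hdr) */

/* IPv4: ihl counts 32-bit words, so the header is ihl * 4 bytes long. */
unsigned int demo_tcp_off_v4(unsigned int ihl)
{
	return DEMO_ETH_HLEN + ihl * 4;
}

/* IPv6: the base header has a fixed size. */
unsigned int demo_tcp_off_v6(void)
{
	return DEMO_ETH_HLEN + DEMO_IPV6_HLEN;
}

void demo_offset_check(void)
{
	assert(demo_tcp_off_v4(5) == 34); /* IPv4 without options */
	assert(demo_tcp_off_v6() == 54);
}
```

One consequence worth noticing: test_pkt_write_access_subprog rejects any offset above 14 + 40 = 54, so an IPv4 header with ihl of 11 or more (offset 58 and up) would be turned away there even if the packet itself were long enough.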
if (test_pkt_access_subprog1(skb) != skb->len * 2)
return TC_ACT_SHOT;
- This checks whether the skb->len value read in the subprogram matches the skb->len seen in the main function.
- It is effectively checking call handling, register passing, and BTF context handling.
if (test_pkt_access_subprog2(2, skb) != skb->len * 2)
return TC_ACT_SHOT;
- This checks whether a subprogram with argument passing also works correctly.
- It also checks whether the verifier still behaves correctly even if the resulting BPF assembly matches the previous function.
if (test_pkt_access_subprog3(3, skb) != skb->len * 3 * skb->ifindex)
return TC_ACT_SHOT;
- This validates multiple paths through several functions at once.
if (tcp) {
if (test_pkt_write_access_subprog(skb, (void *)tcp - data))
return TC_ACT_SHOT;
if (((void *)(tcp) + 20) > data_end || proto != 6)
return TC_ACT_SHOT;
barrier(); /* to force ordering of checks */
if (((void *)(tcp) + 18) > data_end)
return TC_ACT_SHOT;
if (tcp->urg_ptr == 123)
return TC_ACT_OK;
}
- This block is entered only if the packet was parsed as IPv4 or IPv6 and the TCP location could be computed.
- It tests whether packet writes are allowed by the verifier and at runtime.
- Since the minimum TCP header is 20 bytes, it checks that this range is inside the packet and also checks proto against 6 (IPPROTO_TCP) to make sure the packet is actually TCP.
- barrier prevents the compiler from reordering the checks in an unexpected way.
- urg_ptr is a field inside the TCP header, so it separately confirms that reading that field is safe.
- The reason both the +20 and +18 checks are present is to test check ordering and the verifier's packet-range reasoning.
Next Step
Now that all the functions have been analyzed, it is worth stepping back and focusing more on the question of "why?"
What This Test Is Really Trying to Verify
At first glance, this looks like a simple test that feeds in IPv4 and IPv6 packets and checks whether TC_ACT_OK comes back. But the real point is much deeper. This test is trying to verify the following:
- Whether __sk_buff field access is safe even inside subprograms
- Whether the verifier still behaves correctly when different C functions are optimized into the same BPF asm
- Whether large stack frames and multi-stage function calls are still handled correctly
- Whether packet data read/write boundary checks are tracked correctly
- Whether all of this works for both IPv4 and IPv6
Revisiting It from the Verifier's Point of View
Above, I walked through the code function by function to confirm what pkt_access does. Now I want to revisit the same code from another angle: what facts does the verifier need to know in order to allow this program?
1. The verifier first has to distinguish what a pointer refers to.
For the eBPF verifier, what matters is not just the value itself, but its kind. In this test code, two major categories appear:
- context field access such as skb->len and skb->ifindex
- packet data pointer access such as data, data_end, and tcp
So the verifier has to keep tracking whether something is a field read from __sk_buff context or a pointer into the actual packet buffer.
The key point here is that this test intentionally mixes both kinds of code together to check whether validation of one kind of access destabilizes the other.
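One way to picture the two categories is as a tag carried alongside every register. The sketch below is a deliberately tiny toy model, not the verifier's data structures: the enum names only loosely echo the kernel's register types, and the rule for context loads is simplified to "always allowed".

```c
#include <assert.h>

/* Toy register kinds, loosely echoing the verifier's naming. */
enum demo_reg_kind {
	DEMO_SCALAR,        /* plain number, cannot be dereferenced */
	DEMO_PTR_TO_CTX,    /* pointer to the __sk_buff context */
	DEMO_PTR_TO_PACKET, /* pointer into packet data */
};

struct demo_reg {
	enum demo_reg_kind kind;
	unsigned int proven_range; /* bytes proven in bounds (packet only) */
};

/* Toy rule: may `size` bytes be loaded at `off` through this register? */
int demo_load_allowed(const struct demo_reg *r, unsigned int off,
		      unsigned int size)
{
	switch (r->kind) {
	case DEMO_PTR_TO_CTX:
		return 1; /* simplification: assume off names a valid field */
	case DEMO_PTR_TO_PACKET:
		return off + size <= r->proven_range;
	default:
		return 0;
	}
}

void demo_kind_check(void)
{
	struct demo_reg ctx = { DEMO_PTR_TO_CTX, 0 };
	struct demo_reg pkt = { DEMO_PTR_TO_PACKET, 0 };

	assert(demo_load_allowed(&ctx, 0, 4));  /* like skb->len */
	assert(!demo_load_allowed(&pkt, 0, 2)); /* no bounds check yet */
}
```

The point of the model is only that the same load instruction is judged by different rules depending on the tag, which is why skb->len needs no comparison against data_end while eth->h_proto does.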
2. skb->len and skb->ifindex are not just field reads; they are tests of context preservation.
At a glance, reading fields like skb->len and ifindex looks easy. But from the verifier's point of view, it is not that simple.
- They are read directly in the main function.
- They are also read inside subprograms with additional arguments.
- They are read inside functions that use large stack frames.
- They are still read after flowing through several functions and calculations.
So what the verifier really has to confirm here is: "Is this function still executing on a valid __sk_buff * context?"
That is why this is a test of whether the meaning of context access is preserved even when subprogram calls are involved.
3. subprog1 and subprog2 ask whether the same asm means the same thing.
As mentioned earlier, the source comment shows two different C functions being optimized into identical BPF asm. Note that in the disassembly test_pkt_access_subprog2 loads skb->len from r1, even though skb is the second parameter in its C signature; the val argument, which is always 2 at the only call site, appears to have been optimized away.
So the compiled code and the C-level signature no longer line up, and the verifier still has to validate each subprogram correctly without losing track of which function it is and what its arguments are.
That is the core point.
4. The large stack frame and function chain look like deliberate complexity added by the test.
The large stack frame is there to ask: "A simple straight-line program may pass, but does the verifier become overly conservative once the structure gets only a little more complex?"
Why the TCP Access Block Is Interesting
The most interesting part of this test is the final TCP access block.
if (tcp) {
if (test_pkt_write_access_subprog(skb, (void *)tcp - data))
return TC_ACT_SHOT;
if (((void *)(tcp) + 20) > data_end || proto != 6)
return TC_ACT_SHOT;
barrier(); /* to force ordering of checks */
if (((void *)(tcp) + 18) > data_end)
return TC_ACT_SHOT;
if (tcp->urg_ptr == 123)
return TC_ACT_OK;
}
- First, it computes a packet pointer.
- Then it proves that the pointer is within packet bounds.
- Only after that does it read the actual field.
What matters is writing the code in an order where the verifier can track that the read is safe.
Why keep bounds checks that look redundant?
These two checks look redundant:
if (((void *)(tcp) + 20) > data_end || proto != 6)
return TC_ACT_SHOT;
barrier();
if (((void *)(tcp) + 18) > data_end)
return TC_ACT_SHOT;
if (tcp->urg_ptr == 123)
return TC_ACT_OK;
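If tcp + 20 <= data_end has already been confirmed, then tcp + 18 <= data_end is a logically weaker check. It is also not enough on its own: urg_ptr is a 2-byte field at offset 18 of the TCP header, so reading it needs bytes 18 and 19, which only the earlier +20 check covers.
The weaker check is still left in place because this test is not about minimizing checks for a human reader. It appears to check that the verifier keeps the larger range it has already proved when a later, smaller check follows.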
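The expected behavior can be written down as a toy model: passing a bounds check can only grow the proven range, never shrink it. This is my own sketch of the property under test, not how the kernel implements range tracking, and all names are hypothetical.

```c
#include <assert.h>

/* Toy state: bytes past `tcp` that earlier checks have proven readable. */
struct demo_pkt_state {
	unsigned int proven;
};

/* Falling through `if (tcp + n > data_end) return ...;` proves n bytes.
 * A weaker check must not shrink what is already known. */
void demo_pass_check(struct demo_pkt_state *st, unsigned int n)
{
	if (n > st->proven)
		st->proven = n;
}

int demo_can_read(const struct demo_pkt_state *st, unsigned int off,
		  unsigned int size)
{
	return off + size <= st->proven;
}

void demo_range_check(void)
{
	struct demo_pkt_state st = { 0 };

	demo_pass_check(&st, 20); /* the +20 check */
	demo_pass_check(&st, 18); /* the weaker +18 check */
	assert(demo_can_read(&st, 18, 2)); /* tcp->urg_ptr: bytes 18..19 */
}
```

In this model a state that had only passed the +18 check could not read urg_ptr, so if the real verifier forgot the +20 result after seeing the +18 comparison, the program would fail to load.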
Rules Summarized by This Single Test
In one sentence, the lesson is this:
In eBPF, what matters is not just reading and writing packets, but writing the code in a form where the verifier can follow the proof that the access is safe.
- Context access must preserve context even when the call structure becomes complex.
- For packet pointers, proof comes before use.
- Code that looks identical to a human may still mean something different to the verifier.
Closing Thoughts
This was my first time looking at a BPF-related test, and this simple test already contains a lot.