[Linux Kernel eBPF] Finding bugs with an LLM - 1
Why I tried this
I had been hearing for a while that using LLMs to find bugs could actually be worthwhile. I kept thinking I should try it someday, and this time I finally did.
What happened
I spent a fair amount of time writing AGENTS.md and tightening the prompt because I was worried the model would slip past the guardrails. It turned out to be easier than I expected.
I hooked GPT-5.4 high up to OpenCode, pointed it at the entire Linux kernel tree, and let it think for about ten minutes. It came back with a handful of supposed bugs.
Code the LLM flagged as a bug: net_namespace.c
// kernel/bpf/net_namespace.c
int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
	struct bpf_prog_array *run_array;
	enum netns_bpf_attach_type type;
	struct bpf_prog *attached;
	struct net *net;
	int ret;
	if (attr->target_fd || attr->attach_flags || attr->replace_bpf_fd)
		return -EINVAL;
	type = to_netns_bpf_attach_type(attr->attach_type);
	if (type < 0)
		return -EINVAL;
	net = current->nsproxy->net_ns;
	mutex_lock(&netns_bpf_mutex);
	/* Attaching prog directly is not compatible with links */
	if (!list_empty(&net->bpf.links[type])) {
		ret = -EEXIST;
		goto out_unlock;
	}
	switch (type) {
	case NETNS_BPF_FLOW_DISSECTOR:
		ret = flow_dissector_bpf_prog_attach_check(net, prog);
		break;
	default:
		ret = -EINVAL;
		break;
	}
	if (ret)
		goto out_unlock;
	attached = net->bpf.progs[type];
	if (attached == prog) {
		/* The same program cannot be attached twice */
		ret = -EINVAL;
		goto out_unlock;
	}
	run_array = rcu_dereference_protected(net->bpf.run_array[type],
					      lockdep_is_held(&netns_bpf_mutex));
	if (run_array) {
		WRITE_ONCE(run_array->items[0].prog, prog);
	} else {
		run_array = bpf_prog_array_alloc(1, GFP_KERNEL);
		if (!run_array) {
			ret = -ENOMEM;
			goto out_unlock;
		}
		run_array->items[0].prog = prog;
		rcu_assign_pointer(net->bpf.run_array[type], run_array);
	}
	net->bpf.progs[type] = prog;
	if (attached)
		bpf_prog_put(attached);
out_unlock:
	mutex_unlock(&netns_bpf_mutex);
	return ret;
}
The model's claim was that ret could be returned without being initialized, which would make the syscall return a garbage value.
At first glance, that sounded plausible. I asked how I could verify it, and that led me to KMSAN (the Kernel Memory Sanitizer), which detects uses of uninitialized memory at runtime and is therefore well suited to catching this sort of issue.
I built a kernel with KMSAN enabled and ran the selftest that triggers this path, flow_dissector_reattach.c.
The test passed without any trouble.
After taking a closer look at the code, I understood why this was not actually a bug.
What I learned
// include/linux/bpf-netns.h
static inline enum netns_bpf_attach_type
to_netns_bpf_attach_type(enum bpf_attach_type attach_type)
{
	switch (attach_type) {
	case BPF_FLOW_DISSECTOR:
		return NETNS_BPF_FLOW_DISSECTOR;
	case BPF_SK_LOOKUP:
		return NETNS_BPF_SK_LOOKUP;
	default:
		return NETNS_BPF_INVALID;
	}
}
There is a helper called to_netns_bpf_attach_type, and it only ever returns one of these values:
NETNS_BPF_FLOW_DISSECTOR
NETNS_BPF_SK_LOOKUP
NETNS_BPF_INVALID
type = to_netns_bpf_attach_type(attr->attach_type);
That result goes into type.
switch (type) {
case NETNS_BPF_FLOW_DISSECTOR:
	ret = flow_dissector_bpf_prog_attach_check(net, prog);
	break;
default:
	ret = -EINVAL;
	break;
}
if (ret)
	goto out_unlock;
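If type is NETNS_BPF_FLOW_DISSECTOR, then ret gets the return value of flow_dissector_bpf_prog_attach_check().
Otherwise, ret is set to -EINVAL. In practice that means NETNS_BPF_SK_LOOKUP, because NETNS_BPF_INVALID has already been rejected by the earlier type < 0 check.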
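To make that concrete, here is a small userspace sketch of the same mapping, early check, and switch. It is only a model: the enum names and values are stand-ins I made up for illustration (the only property it relies on is that the "invalid" value is negative, as the type < 0 check implies), check_ret is a hypothetical parameter standing in for the result of flow_dissector_bpf_prog_attach_check(), and the links check that sits between the two steps is left out.

```c
#include <assert.h>
#include <errno.h>

/* Stand-ins for the kernel enums; names and values are assumptions. */
enum attach_type { FLOW_DISSECTOR_ATTACH, SK_LOOKUP_ATTACH, OTHER_ATTACH };
enum netns_type {
	NETNS_INVALID = -1,
	NETNS_FLOW_DISSECTOR = 0,
	NETNS_SK_LOOKUP = 1,
};

/* Same shape as to_netns_bpf_attach_type(): three possible results. */
static enum netns_type to_netns_type(enum attach_type t)
{
	switch (t) {
	case FLOW_DISSECTOR_ATTACH:
		return NETNS_FLOW_DISSECTOR;
	case SK_LOOKUP_ATTACH:
		return NETNS_SK_LOOKUP;
	default:
		return NETNS_INVALID;
	}
}

/* check_ret stands in for what the attach check would return. */
int ret_after_switch(enum attach_type t, int check_ret)
{
	int ret;
	int type = to_netns_type(t);

	if (type < 0)
		return -EINVAL; /* invalid types never reach the switch */

	switch (type) {
	case NETNS_FLOW_DISSECTOR:
		ret = check_ret;
		break;
	default:
		ret = -EINVAL; /* in this model, only NETNS_SK_LOOKUP lands here */
		break;
	}
	return ret; /* assigned on both branches of the switch */
}

/* Usage: every input produces a defined value. */
void demo_switch(void)
{
	assert(ret_after_switch(FLOW_DISSECTOR_ATTACH, 0) == 0);
	assert(ret_after_switch(FLOW_DISSECTOR_ATTACH, -EEXIST) == -EEXIST);
	assert(ret_after_switch(SK_LOOKUP_ATTACH, 0) == -EINVAL);
	assert(ret_after_switch(OTHER_ATTACH, 0) == -EINVAL);
}
```

Because the switch has a default branch, there is no value of type for which ret is skipped.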
// net/core/flow_dissector.c
int flow_dissector_bpf_prog_attach_check(struct net *net,
					 struct bpf_prog *prog)
{
	enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
	if (net == &init_net) {
		/* BPF flow dissector in the root namespace overrides
		 * any per-net-namespace one. When attaching to root,
		 * make sure we don't have any BPF program attached
		 * to the non-root namespaces.
		 */
		struct net *ns;
		for_each_net(ns) {
			if (ns == &init_net)
				continue;
			if (rcu_access_pointer(ns->bpf.run_array[type]))
				return -EEXIST;
		}
	} else {
		/* Make sure root flow dissector is not attached
		 * when attaching to the non-root namespace.
		 */
		if (rcu_access_pointer(init_net.bpf.run_array[type]))
			return -EEXIST;
	}
	return 0;
}
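And flow_dissector_bpf_prog_attach_check() also returns a defined value on every path: either -EEXIST or 0. The one jump to out_unlock before the switch sets ret to -EEXIST first, and every jump after it either assigns ret again (-EINVAL for a duplicate program, -ENOMEM for a failed allocation) or leaves it at the value the switch gave it. On the success path, ret is simply the 0 returned by the check. So ret is never returned uninitialized.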
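One way to convince yourself of this without booting a KMSAN kernel is to model the branch structure in userspace and walk every combination of conditions. The sketch below is my own simplification, not kernel code: struct attach_path, model_attach(), and the SET_RET macro are hypothetical names, each condition is reduced to a flag, and the early returns before the mutex is taken are left out because they never touch ret.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bundle of the conditions that steer the function. */
struct attach_path {
	bool links_present;  /* !list_empty(&net->bpf.links[type]) */
	bool flow_dissector; /* type == NETNS_BPF_FLOW_DISSECTOR */
	int check_ret;       /* result of the attach check: 0 or -EEXIST */
	bool same_prog;      /* attached == prog */
	bool have_run_array; /* run_array was already allocated */
	bool alloc_fails;    /* bpf_prog_array_alloc() returned NULL */
};

/* Record every write to ret so the model can show one always happens. */
#define SET_RET(v) do { ret = (v); *assigned = true; } while (0)

int model_attach(const struct attach_path *p, bool *assigned)
{
	int ret = 0; /* initialized only so the model never reads garbage */

	*assigned = false;
	if (p->links_present) {
		SET_RET(-EEXIST);
		goto out_unlock;
	}
	if (p->flow_dissector)
		SET_RET(p->check_ret);
	else
		SET_RET(-EINVAL);
	if (ret)
		goto out_unlock;
	if (p->same_prog) {
		SET_RET(-EINVAL);
		goto out_unlock;
	}
	if (!p->have_run_array && p->alloc_fails) {
		SET_RET(-ENOMEM);
		goto out_unlock;
	}
	/* Success: ret still holds the 0 assigned from the check. */
out_unlock:
	return ret;
}

/* Walk all 2^6 combinations of the six conditions above. */
int count_unassigned_paths(void)
{
	int unassigned = 0;

	for (int bits = 0; bits < 64; bits++) {
		struct attach_path p = {
			.links_present = (bits & 1) != 0,
			.flow_dissector = (bits & 2) != 0,
			.check_ret = (bits & 4) ? -EEXIST : 0,
			.same_prog = (bits & 8) != 0,
			.have_run_array = (bits & 16) != 0,
			.alloc_fails = (bits & 32) != 0,
		};
		bool assigned;

		model_attach(&p, &assigned);
		if (!assigned)
			unassigned++;
	}
	return unassigned;
}

/* Usage: no combination reaches the return without a write to ret. */
void check_model(void)
{
	assert(count_unassigned_paths() == 0);
}
```

The model only proves something about itself, of course. It is useful as a way to read the real function branch by branch, not as a replacement for the KMSAN run.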
Takeaways
I learned about KMSAN.
Kernel code is not easy reading.
LLMs are still pretty dumb. Or maybe just GPT.
I even cross-checked it. I still do not know why the model decided this was a bug, but I did end up learning something new from the exercise.