[Linux Kernel eBPF] Finding bugs with an LLM - 1

Why I tried this

I had been hearing for a while that using LLMs to find bugs could actually be worthwhile. I kept thinking I should try it someday, and this time I finally did.

What happened

I spent a fair amount of time writing AGENTS.md and tightening the prompt because I was worried the model would slip past the guardrails. It turned out to be easier than I expected.

I hooked GPT-5.4 high up to OpenCode, pointed it at the entire Linux kernel tree, and let it think for about ten minutes. It came back with a handful of supposed bugs.

Code the LLM flagged as a bug: net_namespace.c

// kernel/bpf/net_namespace.c
int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
	struct bpf_prog_array *run_array;
	enum netns_bpf_attach_type type;
	struct bpf_prog *attached;
	struct net *net;
	int ret;

	if (attr->target_fd || attr->attach_flags || attr->replace_bpf_fd)
		return -EINVAL;

	type = to_netns_bpf_attach_type(attr->attach_type);
	if (type < 0)
		return -EINVAL;

	net = current->nsproxy->net_ns;
	mutex_lock(&netns_bpf_mutex);

	/* Attaching prog directly is not compatible with links */
	if (!list_empty(&net->bpf.links[type])) {
		ret = -EEXIST;
		goto out_unlock;
	}

	switch (type) {
	case NETNS_BPF_FLOW_DISSECTOR:
		ret = flow_dissector_bpf_prog_attach_check(net, prog);
		break;
	default:
		ret = -EINVAL;
		break;
	}
	if (ret)
		goto out_unlock;

	attached = net->bpf.progs[type];
	if (attached == prog) {
		/* The same program cannot be attached twice */
		ret = -EINVAL;
		goto out_unlock;
	}

	run_array = rcu_dereference_protected(net->bpf.run_array[type],
					      lockdep_is_held(&netns_bpf_mutex));
	if (run_array) {
		WRITE_ONCE(run_array->items[0].prog, prog);
	} else {
		run_array = bpf_prog_array_alloc(1, GFP_KERNEL);
		if (!run_array) {
			ret = -ENOMEM;
			goto out_unlock;
		}
		run_array->items[0].prog = prog;
		rcu_assign_pointer(net->bpf.run_array[type], run_array);
	}

	net->bpf.progs[type] = prog;
	if (attached)
		bpf_prog_put(attached);

out_unlock:
	mutex_unlock(&netns_bpf_mutex);

	return ret;
}

The model's claim was that ret could be returned without being initialized, which would make the syscall return a garbage value.

At first glance, that sounded plausible. I asked how I could verify it, and that led me to KMSAN (Kernel Memory Sanitizer), which detects uses of uninitialized memory in the kernel and is well suited to catching this sort of issue.

I built a kernel with KMSAN enabled and ran the selftest that triggers this path, flow_dissector_reattach.c.

The test passed without any trouble.

After taking a closer look at the code, I understood why this was not actually a bug.

What I learned

// include/linux/bpf-netns.h
static inline enum netns_bpf_attach_type
to_netns_bpf_attach_type(enum bpf_attach_type attach_type)
{
	switch (attach_type) {
	case BPF_FLOW_DISSECTOR:
		return NETNS_BPF_FLOW_DISSECTOR;
	case BPF_SK_LOOKUP:
		return NETNS_BPF_SK_LOOKUP;
	default:
		return NETNS_BPF_INVALID;
	}
}

There is a helper called to_netns_bpf_attach_type, and it only ever returns one of these values:

  • NETNS_BPF_FLOW_DISSECTOR
  • NETNS_BPF_SK_LOOKUP
  • NETNS_BPF_INVALID

type = to_netns_bpf_attach_type(attr->attach_type);

That result goes into type.
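
The early "if (type < 0)" check in netns_bpf_prog_attach() only works as a filter if the invalid marker is negative. Here is a minimal user-space sketch of that idea. The enum names and numeric values are stand-ins I made up for illustration, not copies of the kernel headers:

```c
/* User-space stand-ins. Names and numeric values are assumptions for this
 * sketch, not copies of the kernel headers. */
enum sketch_attach_type {
	SKETCH_ATTACH_FLOW_DISSECTOR,
	SKETCH_ATTACH_SK_LOOKUP,
	SKETCH_ATTACH_OTHER,		/* anything the helper does not know */
};

enum sketch_netns_type {
	SKETCH_NETNS_INVALID = -1,	/* assumed negative so "type < 0" catches it */
	SKETCH_NETNS_FLOW_DISSECTOR = 0,
	SKETCH_NETNS_SK_LOOKUP = 1,
};

/* Same shape as to_netns_bpf_attach_type(): every input maps to a value. */
static enum sketch_netns_type
to_sketch_netns_type(enum sketch_attach_type attach_type)
{
	switch (attach_type) {
	case SKETCH_ATTACH_FLOW_DISSECTOR:
		return SKETCH_NETNS_FLOW_DISSECTOR;
	case SKETCH_ATTACH_SK_LOOKUP:
		return SKETCH_NETNS_SK_LOOKUP;
	default:
		return SKETCH_NETNS_INVALID;
	}
}

/* Mirrors the early "if (type < 0) return -EINVAL;" filter.
 * Returns 1 if the attach type survives the check, 0 otherwise. */
static int sketch_type_ok(enum sketch_attach_type attach_type)
{
	return (int)to_sketch_netns_type(attach_type) >= 0;
}
```

Calling sketch_type_ok() with the two known attach types yields 1, and any other input yields 0, so only two values of type can ever reach the code after the check.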

	switch (type) {
	case NETNS_BPF_FLOW_DISSECTOR:
		ret = flow_dissector_bpf_prog_attach_check(net, prog);
		break;
	default:
		ret = -EINVAL;
		break;
	}
	if (ret)
		goto out_unlock;

If type is NETNS_BPF_FLOW_DISSECTOR, then ret gets the return value of flow_dissector_bpf_prog_attach_check().

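Any other value of type lands in the default case, where ret is set to -EINVAL. Either way, ret has been assigned by the time the switch ends.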

// net/core/flow_dissector.c
int flow_dissector_bpf_prog_attach_check(struct net *net,
					 struct bpf_prog *prog)
{
	enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;

	if (net == &init_net) {
		/* BPF flow dissector in the root namespace overrides
		 * any per-net-namespace one. When attaching to root,
		 * make sure we don't have any BPF program attached
		 * to the non-root namespaces.
		 */
		struct net *ns;

		for_each_net(ns) {
			if (ns == &init_net)
				continue;
			if (rcu_access_pointer(ns->bpf.run_array[type]))
				return -EEXIST;
		}
	} else {
		/* Make sure root flow dissector is not attached
		 * when attaching to the non-root namespace.
		 */
		if (rcu_access_pointer(init_net.bpf.run_array[type]))
			return -EEXIST;
	}

	return 0;
}

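And flow_dissector_bpf_prog_attach_check() also returns a defined value on every path: either -EEXIST or 0. Back in netns_bpf_prog_attach(), the two early exits return -EINVAL directly, every goto out_unlock is preceded by an assignment to ret, and the success path still holds the 0 that came back from the check. So in the end, ret is never left uninitialized.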
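
To convince myself, it helps to strip the function down to its control flow. The following is a rough user-space model, not kernel code: struct attach_scenario and model_attach() are hypothetical names, and each field simply stands in for the kernel state that decides one branch. ret is left uninitialized on purpose, as in the original, so that every exit has to assign it:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state behind each decision point. */
struct attach_scenario {
	bool bad_attr;		/* target_fd, attach_flags or replace_bpf_fd set */
	int type;		/* negative: invalid, 0: flow dissector, 1: sk_lookup */
	bool links_present;	/* !list_empty(&net->bpf.links[type]) */
	int check_result;	/* result of flow_dissector_bpf_prog_attach_check() */
	bool same_prog;		/* attached == prog */
	bool have_run_array;	/* a run_array is already installed */
	bool alloc_fails;	/* bpf_prog_array_alloc() returned NULL */
};

static int model_attach(const struct attach_scenario *s)
{
	int ret;	/* deliberately not initialized, as in the original */

	/* The two early exits return directly and never touch ret. */
	if (s->bad_attr)
		return -EINVAL;
	if (s->type < 0)
		return -EINVAL;

	if (s->links_present) {
		ret = -EEXIST;
		goto out_unlock;
	}

	/* Both arms assign ret, so it is defined from here on. */
	switch (s->type) {
	case 0:
		ret = s->check_result;
		break;
	default:
		ret = -EINVAL;
		break;
	}
	if (ret)
		goto out_unlock;

	if (s->same_prog) {
		ret = -EINVAL;
		goto out_unlock;
	}

	if (!s->have_run_array && s->alloc_fails) {
		ret = -ENOMEM;
		goto out_unlock;
	}

	/* Success: ret still holds the 0 assigned in the switch. */

out_unlock:
	return ret;
}
```

Feeding it a scenario with type 0 and nothing else set returns 0, type 1 returns -EINVAL through the default arm, and each failure flag returns the error assigned just before its goto. There is no combination of inputs that reaches the final return without passing through an assignment.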

Takeaways

I learned about KMSAN.

Kernel code is not easy reading.

LLMs are still pretty dumb. Or maybe just GPT.

I even cross-checked the claim, and I still do not know why the model decided this was a bug. Even so, I ended up learning something new from the exercise.