xdp: the express data path
if you've ever wondered how cloudflare drops 10 million packets per second per server during a ddos attack, the answer starts with three letters: xdp.
the problem with the kernel networking stack
the linux networking stack is incredibly capable. it handles routing, firewalling, connection tracking, qos, and a dozen other things. but all that capability has a cost: every packet that enters your system walks through a long chain of processing before your application ever sees it.
for most workloads, this is fine. for high-throughput packet processing — ddos mitigation, load balancing, traffic monitoring — it's a bottleneck. by the time the kernel has allocated an sk_buff, run it through netfilter hooks, and pushed it up the stack, you've burned through thousands of cpu cycles on a packet you might just want to drop.
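to put "thousands of cycles" in perspective, here's a back-of-envelope budget. the numbers are illustrative assumptions, not measurements: a 10 gbit/s link, minimum-size 64-byte frames plus 20 bytes of preamble/inter-frame gap on the wire, and a 3 ghz core:

```c
/* illustrative assumptions, not measurements */
static const double LINK_BPS  = 10e9;           /* 10 gbit/s link */
static const double WIRE_BITS = (64 + 20) * 8;  /* min frame + preamble/ifg */
static const double CPU_HZ    = 3e9;            /* 3 ghz core */

/* packets per second at line rate with minimum-size frames */
double line_rate_pps(void) { return LINK_BPS / WIRE_BITS; }

/* cycle budget per packet on a single core */
double cycles_per_packet(void) { return CPU_HZ / line_rate_pps(); }
```

that works out to roughly 14.9 million packets per second and a budget of about 200 cycles per packet per core. a stack path that costs thousands of cycles simply can't keep up at line rate, which is why a drop decision at the driver matters.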
enter xdp
xdp (express data path) lets you run ebpf programs at the earliest possible point in the networking stack — right when the network driver receives a packet, before the kernel allocates any data structures for it. your program gets raw packet data and returns a verdict:
XDP_DROP — drop the packet. gone. no sk_buff, no processing.
XDP_PASS — continue normal kernel processing.
XDP_TX — bounce the packet back out the same interface.
XDP_REDIRECT — forward to another interface, cpu, or socket.
XDP_ABORTED — drop + trace point for debugging.
that's it. five actions. the simplicity is the point. you make a decision on the raw packet bytes and the kernel respects it immediately.
a minimal xdp program
here's a basic xdp program that drops all udp traffic on port 9999:
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>
SEC("xdp")
int drop_udp_9999(struct xdp_md *ctx) {
void *data = (void *)(long)ctx->data;
void *data_end = (void *)(long)ctx->data_end;
// parse ethernet header
struct ethhdr *eth = data;
if ((void *)(eth + 1) > data_end)
return XDP_PASS;
if (eth->h_proto != __constant_htons(ETH_P_IP))
return XDP_PASS;
// parse ip header
struct iphdr *ip = (void *)(eth + 1);
if ((void *)(ip + 1) > data_end)
return XDP_PASS;
if (ip->protocol != IPPROTO_UDP)
return XDP_PASS;
// parse udp header at the ip header's actual length (ihl counts 4-byte words)
if (ip->ihl < 5)
return XDP_PASS;
struct udphdr *udp = (void *)ip + (ip->ihl * 4);
if ((void *)(udp + 1) > data_end)
return XDP_PASS;
if (udp->dest == __constant_htons(9999))
return XDP_DROP;
return XDP_PASS;
}
every pointer arithmetic operation is bounds-checked against data_end. this isn't optional — the ebpf verifier will reject your program if you access memory without proving it's within the packet bounds. the verifier is strict, and that's exactly what you want when you're running code inside the kernel.
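the same bounds-check-then-access pattern can be exercised in plain userspace c, which is a handy way to unit-test parsing logic before fighting the verifier. this is a sketch with hand-rolled packed structs — struct eth, ip4, udp4, and the classify helper are made up for illustration, not kernel types:

```c
#include <arpa/inet.h>
#include <stdint.h>

enum verdict { VERDICT_PASS, VERDICT_DROP };

/* hand-rolled wire-format structs (illustrative, not kernel headers) */
struct eth  { uint8_t dst[6], src[6]; uint16_t proto; } __attribute__((packed));
struct ip4  { uint8_t ver_ihl, tos; uint16_t tot_len, id, frag_off;
              uint8_t ttl, proto; uint16_t csum;
              uint32_t saddr, daddr; } __attribute__((packed));
struct udp4 { uint16_t sport, dport, len, csum; } __attribute__((packed));

/* mirrors the xdp program: every access is bounds-checked first */
enum verdict classify(const uint8_t *data, const uint8_t *data_end)
{
    const struct eth *e = (const struct eth *)data;
    if ((const uint8_t *)(e + 1) > data_end)
        return VERDICT_PASS;
    if (e->proto != htons(0x0800))              /* not ipv4 */
        return VERDICT_PASS;

    const struct ip4 *ip = (const struct ip4 *)(e + 1);
    if ((const uint8_t *)(ip + 1) > data_end)
        return VERDICT_PASS;
    uint8_t ihl = ip->ver_ihl & 0x0f;           /* length in 4-byte words */
    if (ihl < 5 || ip->proto != 17)             /* malformed, or not udp */
        return VERDICT_PASS;

    const struct udp4 *udp =
        (const struct udp4 *)((const uint8_t *)ip + ihl * 4);
    if ((const uint8_t *)(udp + 1) > data_end)
        return VERDICT_PASS;

    return udp->dport == htons(9999) ? VERDICT_DROP : VERDICT_PASS;
}
```

the discipline is identical to the xdp program: compute a pointer, compare it against data_end, and only then dereference.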
driver modes
xdp programs can run in three modes, each with different tradeoffs:
native xdp — the program runs inside the network driver itself, before the kernel sees the packet. this is the fast path. requires driver support (most modern nics have it: mlx5, i40e, ixgbe, virtio_net).
offloaded xdp — the program runs on the nic hardware. your cpu never touches the packet. only supported by smartnics like netronome. insanely fast but limited in what ebpf features you can use.
generic xdp — fallback mode that works with any driver but runs later in the stack. useful for development and testing, but you lose the performance advantages.
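the mode is selected at attach time. with ip link, the flag names are xdpdrv (native), xdpoffload (hardware), and xdpgeneric (generic); the bare xdp flag lets the kernel pick the best available mode. eth0 and drop_udp.o are placeholders here, and these need root and a real interface:

```shell
# native mode: fails if the driver lacks xdp support
ip link set dev eth0 xdpdrv obj drop_udp.o sec xdp

# offloaded mode: fails unless the nic can run the program in hardware
ip link set dev eth0 xdpoffload obj drop_udp.o sec xdp

# generic mode: works with any driver, but slower
ip link set dev eth0 xdpgeneric obj drop_udp.o sec xdp
```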
xdp in practice
loading an xdp program onto an interface is straightforward with ip link:
# compile the bpf program
clang -O2 -target bpf -c drop_udp.c -o drop_udp.o
# attach to interface
ip link set dev eth0 xdpgeneric obj drop_udp.o sec xdp
# check it's loaded
ip link show eth0
# detach
ip link set dev eth0 xdpgeneric off
for anything beyond toy examples, you probably want a loader like libbpf or a higher-level framework like cilium/ebpf (go) or aya (rust). they handle map creation, program loading, and lifecycle management so you're not manually calling bpf syscalls.
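as a rough sketch of what a libbpf-based loader looks like (error handling trimmed, eth0 as a placeholder interface; drop_udp_9999 matches the program name from the example above):

```c
#include <bpf/libbpf.h>
#include <linux/if_link.h>   /* XDP_FLAGS_* */
#include <net/if.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* open and load the compiled object */
    struct bpf_object *obj = bpf_object__open_file("drop_udp.o", NULL);
    if (!obj || bpf_object__load(obj))
        return 1;

    struct bpf_program *prog =
        bpf_object__find_program_by_name(obj, "drop_udp_9999");
    int prog_fd = bpf_program__fd(prog);
    int ifindex = if_nametoindex("eth0");  /* placeholder interface */

    /* XDP_FLAGS_SKB_MODE = generic; use XDP_FLAGS_DRV_MODE for native */
    if (bpf_xdp_attach(ifindex, prog_fd, XDP_FLAGS_SKB_MODE, NULL))
        return 1;

    printf("attached, ctrl-c to exit\n");
    pause();

    bpf_xdp_detach(ifindex, XDP_FLAGS_SKB_MODE, NULL);
    return 0;
}
```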
performance
the numbers speak for themselves. representative figures from published benchmarks, on commodity hardware with native xdp:
XDP_DROP: ~24 million packets/sec (single core)
XDP_TX: ~12 million packets/sec (single core)
iptables: ~2-3 million packets/sec (comparable rules)
that's a 10x improvement over iptables for packet filtering. the difference comes from skipping the entire kernel networking stack — no sk_buff allocation, no netfilter traversal, no socket lookup.
when to use xdp
xdp is not a replacement for the kernel networking stack. it's a scalpel for specific high-performance use cases:
ddos mitigation — drop malicious traffic before it consumes resources. this is the canonical use case.
load balancing — facebook's katran uses xdp to load-balance traffic at l4. no userspace involved.
traffic monitoring — sample or mirror packets at line rate without impacting the forwarding path.
firewalling — when iptables can't keep up and you need programmable packet filtering at scale.
closing thoughts
xdp is one of those technologies that makes you rethink what's possible in the kernel. the combination of ebpf's safety guarantees with driver-level performance means you can run custom packet processing logic inside the kernel without writing a kernel module and without risking a crash.
if you're working on anything that touches high-volume network traffic, xdp is worth learning. start with generic mode, get comfortable with the ebpf verifier's constraints, and then move to native mode when you need the performance. the learning curve is real but the payoff is massive.