Contact
- Hagen Paul Pfeifer
- http://jauu.net
- hagen@jauu.net (encrypted preferred)
- KeyId: 0x98350C22
- Telephone: +49 174 5455209
UDP TX Buffer Behaviour and Egress Queueing
- Published in: networking
- | Time: 23:13:58 CEST
- | SHA1: 8fc2eb54d24556684f8747742a01c2df8cea1b40
UDP sockets, if corking is disabled, always push packets directly to the
lower layers. In the case of IPv4, ip_output() forwards the packet to
Netfilter, where the firewall rules are applied. ip_finish_output() then calls
ip_finish_output2(), which in turn calls neigh_hh_output(). This function puts
the cached layer 2 Ethernet header in front of the skb and finally calls
dev_queue_xmit().
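The effect of corking can be observed from user space. The following is a minimal sketch, assuming Linux, a working loopback interface and the UDP_CORK socket option described in udp(7). The helper name first_datagram_len() is made up for this example. It sends two 2-byte chunks over a connected UDP socket and reports the size of the first datagram the receiver sees.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <netinet/udp.h>   /* UDP_CORK (Linux-specific) */
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Send "ab" and "cd" over loopback, optionally corked, and return the size
 * of the first datagram the receiver sees (or -1 on error or timeout).
 */
int first_datagram_len(int cork)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    char buf[16];
    int on = 1, off = 0, len = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    assert(cork == 0 || cork == 1);
    if (rx < 0 || tx < 0)
        goto out;

    /* Bind the receiver to an ephemeral loopback port and look it up. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
        connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* With the cork set, the kernel accumulates data instead of sending. */
    if (cork && setsockopt(tx, IPPROTO_UDP, UDP_CORK, &on, sizeof(on)) < 0)
        goto out;
    if (send(tx, "ab", 2, 0) != 2 || send(tx, "cd", 2, 0) != 2)
        goto out;
    /* Removing the cork flushes the accumulated data as one datagram. */
    if (cork && setsockopt(tx, IPPROTO_UDP, UDP_CORK, &off, sizeof(off)) < 0)
        goto out;

    len = (int)recv(rx, buf, sizeof(buf), 0);
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return len;
}
```

Without corking each send() leaves the socket as its own datagram, so the helper should report 2. With corking both chunks are merged and pushed down when the cork is removed, so it should report 4.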
dev_queue_xmit() queues the packet in the local egress queue. The default
is a FIFO queue (pfifo_fast), but more sophisticated queueing strategies are
available and often selected.
Before the packet is actually enqueued, dev_queue_xmit() linearizes the skb
if necessary, computes the checksum if necessary, and finally calls
enqueue(), which places the packet into the queue. This function fails if the
queue is (temporarily) deactivated or an overflow happens. In the common case
the function returns NET_XMIT_SUCCESS. Note that the standard queue has a
shortcut: if the queue length is 0, that is, no packet is queued, the packet
is scheduled directly via sch_direct_xmit().
If something goes wrong (somewhere in the driver), qdisc_restart() tries
several times to push the packet to the NIC until the NIC accepts it. If this
fails, the one-element queue is saved and a SOFTIRQ is raised on the local
CPU, in the hope that the NIC is able to accept the packet the next time the
SOFTIRQ is executed.
If the queue supports no shortcut, or the queue already contains at least one
element, the packet must be enqueued via qdisc_enqueue_root(). This enqueue
function is scheduler specific. For FIFO queueing the packet is added at the
end of the list; more complicated queues implement more sophisticated
queueing.
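The FIFO case can be modelled in a few lines of user-space C. This is only a toy sketch and not kernel code: the toy_fifo type, the fixed limit of four packets and the TOY_XMIT_* names are assumptions made for illustration, with the two return values chosen to mirror NET_XMIT_SUCCESS and NET_XMIT_DROP.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes mirroring the kernel's NET_XMIT_SUCCESS / NET_XMIT_DROP. */
enum { TOY_XMIT_SUCCESS = 0, TOY_XMIT_DROP = 1 };

#define TOY_QUEUE_LIMIT 4      /* hypothetical queue limit */

struct toy_fifo {
    int pkts[TOY_QUEUE_LIMIT]; /* packet ids stand in for skbs */
    size_t head;               /* index of the oldest packet */
    size_t len;                /* number of queued packets */
};

/* Append at the tail; refuse the packet when the queue is full (overflow). */
int toy_fifo_enqueue(struct toy_fifo *q, int pkt)
{
    assert(q != NULL);
    if (q->len == TOY_QUEUE_LIMIT)
        return TOY_XMIT_DROP;
    q->pkts[(q->head + q->len) % TOY_QUEUE_LIMIT] = pkt;
    q->len++;
    return TOY_XMIT_SUCCESS;
}

/* Remove the oldest packet; returns 0 and stores it in *pkt, -1 if empty. */
int toy_fifo_dequeue(struct toy_fifo *q, int *pkt)
{
    assert(q != NULL && pkt != NULL);
    if (q->len == 0)
        return -1;
    *pkt = q->pkts[q->head];
    q->head = (q->head + 1) % TOY_QUEUE_LIMIT;
    q->len--;
    return 0;
}
```

Packets leave the queue in the order they were added, and the fifth enqueue on a full queue is rejected, which corresponds to the overflow case described above.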
dev_queue_xmit() {
    netif_needs_gso();
    skb_needs_linearize();
    if (q->enqueue) {
        if (TCQ_F_CAN_BYPASS && qdisc_qlen(q) == 0) {
            /* shortcut: the queue is empty, transmit directly */
            if (sch_direct_xmit())
                __qdisc_run(q);
            return NET_XMIT_SUCCESS;
        } else {
            /* scheduler-specific enqueue, then try to drain the queue */
            ret = qdisc_enqueue_root();
            qdisc_run(q);
            return ret;
        }
    }
}
__qdisc_run() {
    while (qdisc_restart()) {
        /* give up the CPU and let the SOFTIRQ continue later */
        if (need_resched() || too_often_restarted) {
            raise_softirq_irqoff(NET_TX_SOFTIRQ);
            break;
        }
    }
}
qdisc_restart() tries to call dev_hard_start_xmit() immediately to put the
packet on the NIC TX descriptor ring, if possible. __qdisc_run() schedules
the SOFTIRQ if it is not already scheduled.
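The interplay between the drain loop and the SOFTIRQ can be illustrated with a small user-space model. It is a simplified sketch under stated assumptions, not the kernel implementation: toy_txq, toy_restart() and toy_run() are hypothetical names, and a plain budget counter stands in for the need_resched() and restart checks.

```c
#include <assert.h>
#include <stddef.h>

struct toy_txq {
    int queued;    /* packets waiting in the queue */
    int ring_free; /* free slots in the NIC TX descriptor ring */
    int softirq;   /* set when remaining work is deferred to the SOFTIRQ */
};

/* Move one packet from the queue to the ring; 1 means "call me again". */
int toy_restart(struct toy_txq *t)
{
    assert(t != NULL);
    if (t->queued == 0)
        return 0;
    if (t->ring_free == 0) {
        t->softirq = 1; /* NIC busy: retry later from the SOFTIRQ */
        return 0;
    }
    t->queued--;
    t->ring_free--;
    return t->queued > 0;
}

/* Drain the queue, but defer to the SOFTIRQ once the budget is used up. */
void toy_run(struct toy_txq *t, int budget)
{
    assert(budget > 0);
    while (toy_restart(t)) {
        if (--budget == 0) {
            t->softirq = 1; /* stands in for raising NET_TX_SOFTIRQ */
            break;
        }
    }
}
```

In this model a short queue is drained completely in the caller's context, while a long queue or a full TX ring leaves packets behind and marks the SOFTIRQ as pending so that transmission continues later.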
Finally, note that the return code of dev_queue_xmit() makes no statement
about whether the packet will actually be transmitted. Subsequent congestion
or the queue policy can still decide to drop the packet. A successful return
code (NET_XMIT_SUCCESS) only signals that the enqueuing was successful.
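A similar caveat is visible one level up, at the socket API: a successful sendto() only means that the datagram was handed down the stack. The following is a minimal sketch, assuming Linux and a working loopback interface. The helper name send_to_loopback() and the choice of port 9 in the usage note are assumptions made for this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one datagram to 127.0.0.1:port and return sendto()'s result. */
long send_to_loopback(unsigned short port, const void *data, size_t len)
{
    struct sockaddr_in dst;
    long ret;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    assert(data != NULL);
    if (fd < 0)
        return -1;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* Success here says nothing about whether anyone receives the data. */
    ret = (long)sendto(fd, data, len, 0,
                       (struct sockaddr *)&dst, sizeof(dst));
    close(fd);
    return ret;
}
```

Calling send_to_loopback(9, "ping", 4) is expected to return 4 even when no process listens on that port, so the caller cannot infer delivery from the return value.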
IPv6 is similar, except that neighbor resolution is done by the IPv6 Neighbor
Discovery (ND) mechanism.
Last but not least, some network devices have no associated queue. The
loopback device and all kinds of pseudo tunnel devices are common examples.
For these devices dev_hard_start_xmit() is called directly instead of placing
the packet in a queue.