The compiler is failing to draw the connection between
slice cap and slice len, so is missing some obvious BCE opportunities.
Give it a hint by making the cap equal to the length.
The generated code is smaller and cleaner, and a bit faster.
name old time/op new time/op delta
Decode/tcp4-8 12.2ns ± 1% 11.6ns ± 3% -5.31% (p=0.000 n=28+29)
Decode/tcp6-8 12.5ns ± 2% 11.9ns ± 2% -4.84% (p=0.000 n=30+30)
Decode/udp4-8 11.5ns ± 1% 11.1ns ± 1% -3.11% (p=0.000 n=25+24)
Decode/udp6-8 11.8ns ± 3% 11.4ns ± 1% -3.08% (p=0.000 n=30+26)
Decode/icmp4-8 11.0ns ± 3% 10.6ns ± 1% -3.38% (p=0.000 n=25+30)
Decode/icmp6-8 11.4ns ± 1% 11.1ns ± 2% -2.29% (p=0.000 n=27+30)
Decode/igmp-8 10.3ns ± 0% 10.0ns ± 1% -3.26% (p=0.000 n=19+23)
Decode/unknown-8 8.68ns ± 1% 8.38ns ± 1% -3.55% (p=0.000 n=28+29)
The packet filter still rejects all IPv6, but decodes enough from v6
packets to do something smarter in a followup.
name time/op
Decode/tcp4-8 28.8ns ± 2%
Decode/tcp6-8 20.6ns ± 1%
Decode/udp4-8 28.2ns ± 1%
Decode/udp6-8 20.0ns ± 6%
Decode/icmp4-8 21.7ns ± 2%
Decode/icmp6-8 14.1ns ± 2%
Decode/unknown-8 9.43ns ± 2%
Signed-off-by: David Anderson <danderson@tailscale.com>