The remaining flake occurs due to a mysterious packet loss. This
doesn't affect normal tailscaled operations, so until I track down
where the loss occurs and fix it, the flaky test is going to be
lenient about packet loss (but not about whether the spray logic
worked).
Signed-off-by: David Anderson <danderson@tailscale.com>
It previously passed incorrectly due to bugs. With those fixed,
it becomes flaky for 2 reasons. One of them is the wireguard handshake
race, which can eat the 1st sprayed packet and prevent roamAddr
discovery. This change fixes that failure, by spreading the test
traffic out enough that additional spraying occurs.
Signed-Off-By: David Anderson <danderson@tailscale.com>
The previous code would skip the DERP short-circuit if roamAddr
was set, which is not what we wanted. More generally, hitting
any of the fast path conditions is a direct return, so we can
just have 3 standalone branches rather than 'else if' stuff.
Signed-Off-By: David Anderson <danderson@tailscale.com>
The effect is subtle: when we're not spraying packets, and have not yet
figured out a curAddr, and we're not spraying, we end up sending to
whatever the first IP is in the iteration order. In English, that
means "when we have no idea where to send packets, and we've given
up on sending to everyone, just send to the first addr we see in
the list."
This is, in general, what we want, because the addrs are in sorted
preference order, low to high, and DERP is the least preferred
destination. So, when we have no idea where to send, send to DERP,
right?
... Except for very historical reasons, appendDests iterated through
addresses in _reverse_ order, most preferred to least preferred.
crawshaw@ believes this was part of the earliest handshaking
algorithm magicsock had, where it slowly iterated through possible
destinations and poked handshakes to them one at a time.
Anyway, because of this historical reverse iteration, in the case
described above of "we have no idea where to send", the code would
end up sending to the _most_ preferred candidate address, rather
than the _least_ preferred. So when in doubt, we'd end up firing
packets into the blackhole of some LAN address that doesn't work,
and connectivity would not work.
This case only comes up if all your non-DERP connectivity options
have failed, so we more or less failed to detect it because we
didn't have a pathological test box deployed. Worse, codependent
bug 2839854994 made DERP accidentally
work sometimes anyway by incorrectly exploiting roamAddr behavior,
albeit at the cost of making DERP traffic symmetric. In fixing
DERP to once again be asymmetric, we effectively removed the
bandaid that was concealing this bug.
Signed-Off-By: David Anderson <danderson@tailscale.com>
DERP traffic is asymmetric by design, with nodes always sending
to their peer's home DERP server. However, if roamAddr is set,
magicsock will always push data there, rather than let DERP
server selection do its thing, so we end up accidentally
creating a symmetric flow.
Signed-Off-By: David Anderson <danderson@tailscale.com>
I started to write a full DNS caching resolver and I realized it was
overkill and wouldn't work on Windows even in Go 1.14 yet, so I'm
doing this tiny one instead for now, just for all our netcheck STUN
derp lookups, and connections to DERP servers. (This will be caching a
exactly 8 DNS entries, all ours.)
Fixes#145 (can be better later, of course)
The value predates the introduction of AddrSet which replaces
the index by tracking curAddr directly.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
This avoids a non-obvious data race, where the JSON decoder ends
up creating do-nothing writes into global variables.
==================
WARNING: DATA RACE
Write at 0x0000011e1860 by goroutine 201:
tailscale.com/wgengine/packet.(*IP).UnmarshalJSON()
/home/crawshaw/repo/corp/oss/wgengine/packet/packet.go:83 +0x2d9
encoding/json.(*decodeState).literalStore()
/home/crawshaw/go/go/src/encoding/json/decode.go:877 +0x445e
...
encoding/json.Unmarshal()
/home/crawshaw/go/go/src/encoding/json/decode.go:107 +0x1de
tailscale.com/control/controlclient.(*Direct).decodeMsg()
/home/crawshaw/repo/corp/oss/control/controlclient/direct.go:615 +0x1ab
tailscale.com/control/controlclient.(*Direct).PollNetMap()
/home/crawshaw/repo/corp/oss/control/controlclient/direct.go:525 +0x1053
tailscale.com/control/controlclient.(*Client).mapRoutine()
/home/crawshaw/repo/corp/oss/control/controlclient/auto.go:428 +0x3a6
Previous read at 0x0000011e1860 by goroutine 86:
tailscale.com/wgengine/filter.matchIPWithoutPorts()
/home/crawshaw/repo/corp/oss/wgengine/filter/match.go:108 +0x91
tailscale.com/wgengine/filter.(*Filter).runIn()
/home/crawshaw/repo/corp/oss/wgengine/filter/filter.go:147 +0x3c6
tailscale.com/wgengine/filter.(*Filter).RunIn()
/home/crawshaw/repo/corp/oss/wgengine/filter/filter.go:127 +0xb0
tailscale.com/wgengine.(*userspaceEngine).SetFilter.func1()
/home/crawshaw/repo/corp/oss/wgengine/userspace.go:390 +0xfc
github.com/tailscale/wireguard-go/device.(*Device).RoutineDecryption()
/home/crawshaw/repo/corp/wireguard-go/device/receive.go:295 +0xa1f
For #112
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
Because wgLock is held while some wireguard-go methods run,
trying to hold wgLock during HandshakeDone potentially creates
lock cycles between wgengine and internals of wireguard-go.
Arguably wireguard-go should call HandshakeDone in a new goroutine,
but until its API promises that, don't make any assumptions here.
Maybe for #110.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
For 3 seconds after a successful handshake, wgengine will send a
ping packet every 300ms to its peer. This ensures the spray logic
in magicsock has something to spray.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
In particular, this is designed to catch the case where a
HandshakeInitiation packet is sent out but the intermediate NATs
have not been primed, so the packet passes over DERP.
In that case, the HandshakeResponse also comes back over DERP,
and the connection proceeds via DERP without ever trying to punch
through the NAT.
With this change, the HandshakeResponse (which was sprayed out
and so primed one NAT) triggers an UpdateDst, which triggers
the extra spray logic.
(For this to work, there has to be an initial supply of packets
to send on to a peer for the three seconds following a handshake.
The source of these packets is left as a future exercise.)
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
This is the first, and easier, part of incremental wireguard-go
reconfiguration. It means that a new node appearing on the
network does not cause all existing nodes to re-handshake with
the other nodes they are talking to.
(This code has been running on hello.ipn.dev for a few weeks and
peers have successfully reconnected to it through many network
map updates.)
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
And make the monitor package portable with no-op implementations on
unsupported operating systems.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
* make RouterGen return an error, not take both tunname and tundev
* also remove RouteGen taking a wireguard/device.Device; currently unused
* remove derp parameter (it'll work differently)
* unexport NewUserspaceRouter in per-OS impls, add documented wrapper
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The MTU is currently set when creating the tun device,
elsewhere in the code. Maybe someday we'll want some kind
of per-platform MTU configuration here, but not in the
short-medium term.
Signed-off-by: David Anderson <dave@natulte.net>