cmd/tailscaled: don't block ipnserver startup behind engine init on Windows

With this change, the ipnserver's safesocket.Listen (the localhost
tcp.Listen) happens right away, before any synchronous
TUN/DNS/Engine/etc setup work, which might be slow, especially on
early boot on Windows.

Because the safesocket.Listen starts up early, that means localhost
TCP dials (the safesocket.Connect from the GUI) complete successfully
and thus the GUI avoids the MessageBox error. (I verified that
pacifies it, even without a Listener.Accept; I'd feared that Windows
localhost was maybe special and avoided the normal listener backlog).

Once the GUI can then connect immediately without errors, the various
timeouts then matter less, because the backend is no longer trying to
race against the GUI's timeout. So keep retrying on errors for a
minute, or 10 minutes if the system just booted in the past 10
minutes.

This should fix the problem with Windows 10 desktops auto-logging in
and starting the Tailscale frontend which was then showing a
MessageBox error about failing to connect to tailscaled, which was
slow coming up because the Windows networking stack wasn't up
yet. Fingers crossed.

Fixes #1313 (previously #1187, etc)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit is contained in:
Brad Fitzpatrick
2021-04-19 13:45:55 -07:00
committed by Brad Fitzpatrick
parent 7d8f082ff7
commit 91c9c33036
2 changed files with 69 additions and 89 deletions

View File

@@ -592,6 +592,7 @@ func (s *server) writeToClients(b []byte) {
// Run runs a Tailscale backend service.
// The getEngine func is called repeatedly, once per connection, until it returns an engine successfully.
func Run(ctx context.Context, logf logger.Logf, logid string, getEngine func() (wgengine.Engine, error), opts Options) error {
getEngine = getEngineUntilItWorksWrapper(getEngine)
runDone := make(chan struct{})
defer close(runDone)
@@ -652,46 +653,6 @@ func Run(ctx context.Context, logf logger.Logf, logid string, getEngine func() (
eng, err := getEngine()
if err != nil {
logf("ipnserver: initial getEngine call: %v", err)
// Issue 1187: on Windows, in unattended mode,
// sometimes we try 5 times and fail to create the
// engine before the system's ready. Hack until the
// bug if fixed properly: if we're running in
// unattended mode on Windows, keep trying forever,
// waiting for the machine to be ready (networking to
// come up?) and then dial our own safesocket TCP
// listener to wake up the usual mechanism that lets
// us surface getEngine errors to UI clients. (We
// don't want to just call getEngine in a loop without
// the listener.Accept, as we do want to handle client
// connections so we can tell them about errors)
bootRaceWaitForEngine, bootRaceWaitForEngineCancel := context.WithTimeout(context.Background(), time.Minute)
if runtime.GOOS == "windows" && opts.AutostartStateKey != "" {
logf("ipnserver: in unattended mode, waiting for engine availability")
getEngine = getEngineUntilItWorksWrapper(getEngine)
// Wait for it to be ready.
go func() {
defer bootRaceWaitForEngineCancel()
t0 := time.Now()
for {
time.Sleep(10 * time.Second)
if _, err := getEngine(); err != nil {
logf("ipnserver: unattended mode engine load: %v", err)
continue
}
c, err := net.Dial("tcp", listen.Addr().String())
logf("ipnserver: engine created after %v; waking up Accept: Dial error: %v", time.Since(t0).Round(time.Second), err)
if err == nil {
c.Close()
}
break
}
}()
} else {
bootRaceWaitForEngineCancel()
}
for i := 1; ctx.Err() == nil; i++ {
c, err := listen.Accept()
if err != nil {
@@ -699,7 +660,6 @@ func Run(ctx context.Context, logf logger.Logf, logid string, getEngine func() (
bo.BackOff(ctx, err)
continue
}
<-bootRaceWaitForEngine.Done()
logf("ipnserver: try%d: trying getEngine again...", i)
eng, err = getEngine()
if err == nil {