Turns out, it's annoying to have to wait the entire interval
before getting any monitorable data, especially for very long
interval probes like hourly/daily checks.
Signed-off-by: David Anderson <danderson@tailscale.com>
Due to a bug in Go (golang/go#51778), cmd/go doesn't warn about your
Go version being older than the go.mod's declared Go version in that
case that package loading fails before the build starts, such as when
you use packages that are only in the current version of Go, like our
use of net/netip.
This change works around that Go bug by adding build tags and a
pre-Go1.18-only file that will cause Go 1.17 and earlier to fail like:
$ ~/sdk/go1.17/bin/go install ./cmd/tailscaled
# tailscale.com/cmd/tailscaled
./required_version.go:11:2: undefined: you_need_Go_1_18_to_compile_Tailscale
note: module requires Go 1.18
Change-Id: I39f5820de646703e19dde448dd86a7022252f75c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Incidentally, simplify the go generate CI workflow, by
marking the dnsfallback update non-hermetic (so CI will
skip it) rather than manually filter it out of `go list`.
Updates #4194
Signed-off-by: David Anderson <danderson@tailscale.com>
The docs say:
Note that while correct uses of TryLock do exist, they are rare,
and use of TryLock is often a sign of a deeper problem in a particular use of mutexes.
Rare code! Or bad code! Who can tell!
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Also make IPPrefixSliceOf use Slice[netaddr.IPPrefix] as it also
provides additional functions besides the standard ones provided by
Slice[T].
Signed-off-by: Maisem Ali <maisem@tailscale.com>
There is a Cosmic Background level of DERP Unreachability,
with individual nodes or regions becoming unreachable briefly
and returning a short time later. This is due to hosting provider
outages or just the Internet sloshing about.
Returning a 500 error pages a human. Being awoken at 3am for
a transient error is annoying.
For relatively small levels of badness don't page a human,
just post to Slack. If the outage impacts a significant fraction
of the DERP fleet, then page a human.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
It includes a fix to allow us to use Go 1.18.
We can now remove our Tailscale-only build tags.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The certstore code is impacted by golang/go#51726.
The Tailscale Go toolchain fork contains a temporary workaround,
so it can compile it. Once the upstream toolchain can compile certstore,
presumably in Go 1.18.1, we can revert this change.
Note that depaware runs with the upstream toolchain.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
As of Go 1.18, the register ABI list includes arm64, amd64,
ppc64, and ppc64le. This is a large enough percentage of the
architectures that it's not worth explaining.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This is required for staticcheck to process code
using Go 1.18.
This puts us on a random commit on the bleeding edge
of staticcheck, which isn't great, but there don't
appear to have been any releases yet that support 1.18.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The version string changed slightly. Adapt.
And always check the current Go version to prevent future
accidental regressions. I would have missed this one had
I not explicitly manually checked it.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
A new flag --conflict=(skip|overwrite|rename) lets users specify
what to do when receiving files that match a same-named file in
the target directory.
Updates #3548
Signed-off-by: David Eger <david.eger@gmail.com>
- Remove the expanded module files, as Go can likely expand the zips
faster than tar can expand the extra copies.
- Add the go-build cache.
- Remove the extra restore key to avoid extra cache lookups on miss.
Signed-off-by: James Tucker <james@tailscale.com>
Co-authored-by: James Tucker <james@tailscale.com>
Still not sure the exact rules of how/when/who's supposed to set
these, but this works for now on making them match. Baby steps.
Will research more and adjust later.
Updates #4146 (but not enough to fix it, something's still wrong)
Updates #3802
Change-Id: I496d8cd7e31d45fe9ede88fc8894f35dc096de67
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We need to be able to provide the ability for the GUI clients to resolve and set
the exit node IP from an untrusted string, thus enabling the ability to specify
that information via enterprise policy.
This patch moves the relevant code out of the handler for `tailscale up`,
into a method on `Prefs` that may then be called by GUI clients.
We also update tests accordingly.
Updates https://github.com/tailscale/corp/issues/4239
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
Enable use of command line arguments with tailscale cli on gokrazy. Before
this change using arguments like "up" would cause tailscale cli to be
repeatedly restarted by gokrazy process supervisor.
We never want to have gokrazy restart tailscale cli, even if user would
manually start the process.
Expected usage is that user creates files:
flags/tailscale.com/cmd/tailscale/flags.txt:
up
flags/tailscale.com/cmd/tailscaled/flags.txt:
--statedir=/perm/tailscaled/
--tun=userspace-networking
Then tailscale prints URL for user to log in with browser.
Alternatively it should be possible to use up with auth key to allow
unattended gokrazy installs.
Signed-off-by: Joonas Kuorilehto <joneskoo@derbian.fi>
Currently `Write` returns the number of ciphertext bytes written.
According to the docs for io.Writer, Write should return the amount
of bytes consumed from the input.
```
// Write writes len(p) bytes from p to the underlying data stream.
// It returns the number of bytes written from p (0 <= n <= len(p))
// and any error encountered that caused the write to stop early.
// Write must return a non-nil error if it returns n < len(p).
// Write must not modify the slice data, even temporarily.
Write(p []byte) (n int, err error)
```
Fixes#4126
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Customer reported an issue where the connections were not closing, and
would instead just stay open. This commit makes it so that we close out
the connection regardless of what error we see. I've verified locally
that it fixes the issue, we should add a test for this.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Fix regression from 21069124db caught by tests in another repo.
The HTTP/2 Transport that was being returned had a ConnPool that never
dialed.
Updates #3488
Change-Id: I3184d6393813448ae143d37ece14eb732334c05f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We want to close the connection after a minute of inactivity,
not heartbeat once a minute to keep it alive forever.
Updates #3488
Change-Id: I4b5275e8d1f2528e13de2d54808773c70537db91
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And flesh out docs on the --http-port flag.
Change-Id: If9d42665f67409082081cb9a25ad74e98869337b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This was just cleanup for an ancient version of Tailscale. Any such machines
have upgraded since then.
Change-Id: Iadcde05b37c2b867f92e02ec5d2b18bf2b8f653a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And add a CapabilityVersion type, primarily for documentation.
This makes MapRequest.Version, RegisterRequest.Version, and
SetDNSRequest.Version all use the same version, which will avoid
confusing in the future if Register or SetDNS ever changed their
semantics on Version change. (Currently they're both always 1)
This will requre a control server change to allow a
SetDNSRequest.Version value other than 1 to be deployed first.
Change-Id: I073042a216e0d745f52ee2dbc45cf336b9f84b7c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In the future we'll probably want to run the "tailscale web"
server instead, but for now stop the infinite restart loop.
See https://gokrazy.org/userguide/process-interface/ for details.
Updates #1866
Change-Id: I4133a5fdb859b848813972620495865727fe397a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
One of the current few steps to run Tailscale on gokrazy is to
specify the --tun=userspace-networking flag:
https://gokrazy.org/userguide/install/tailscale/
Instead, make it the default for now. Later we can change the
default to kernel mode if available and fall back to userspace
mode like Synology, once #391 is done.
Likewise, set default paths for Gokrazy, as its filesystem hierarchy
is not the Linux standard one. Instead, use the conventional paths as
documented at https://gokrazy.org/userguide/install/tailscale/.
Updates #1866
RELNOTE=default to userspace-networking mode on gokrazy
Change-Id: I3766159a294738597b4b30629d2860312dbb7609
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If it's in a non-standard table, as it is on Unifi UDM Pro, apparently.
Updates #4038 (probably fixes, but don't have hardware to verify)
Change-Id: I2cb9a098d8bb07d1a97a6045b686aca31763a937
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Otherwise it would log warnings about an empty file.
```
stores.go:138: store.NewFileStore("/tmp/3777352782"): file empty; treating it like a missing file [warning]
```
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Also move KubeStore and MemStore into their own package.
RELNOTE: tsnet now supports providing a custom ipn.StateStore.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
When I deployed server-side changes, I put the upgrade handler at /ts2021
instead of /switch. We could move the server to /switch, but ts2021 seems
more specific and better, but I don't feel strongly.
Updates #3488
Change-Id: Ifbf8ea60a815fd2fa1bfbe1b7af1ac2a27218354
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Turns out we're pretty good already at init-time work in tailscaled.
The regexp/syntax shows up but it's hard to get rid of that; zstd even
uses regexp. *shrug*
Change-Id: I856aca056dcb7489f5fc22ef07f55f34ddf19bd6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For ssh and maybe windows service babysitter later.
Updates #3802
Change-Id: I7492b98df98971b3fb72d148ba92c2276cca491f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For local dev testing initially. Product-wise, it'll probably only be
workable on the two unsandboxed builds.
Updates #3802
Change-Id: Ic352f966e7fb29aff897217d79b383131bf3f92b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And add a private context type in the process.
Updates #3802
Change-Id: I257187f4cfb0f2248d95b81c1dfe0911ef203b60
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
So it's not confused for a context.Context and we can add contexts
later and not look like we have two.
Updates #3802
Change-Id: Icf229ae2c020d173f3cbf09a13ccd03a60cbb85e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I introduced a bug in 8fe503057d when unifying oneConnListener
implementations.
The NewOneConnListenerFrom API was easy to misuse (its Close method
closes the underlying Listener), and we did (via http.Serve, which
closes the listener after use, which meant we were close the peerapi's
listener, even though we only wanted its Addr)
Instead, combine those two constructors into one and pass in the Addr
explicitly, without delegating through to any Listener.
Change-Id: I061d7e5f842e0cada416e7b2dd62100d4f987125
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The MSI installer sets a special sentinel value that we can use to detect it.
I also removed the code that bails out when the installation path is not
`Program Files`, as both the NSIS and MSI installers permit the user to install
to a different path.
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
This unbreaks some downstream users of tailscale who end up
with build errors from importing a v0 indirect dependency.
Signed-off-by: David Anderson <danderson@tailscale.com>
Don't make users map their system's "caddy" (or whatever) system user
to its userid. We can do that. Support either a uid or a username.
RELNOTE=TS_PERMIT_CERT_UID can contain a uid or username
Change-Id: I7451b537a5e118b818addf1353882291d5f0d07f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And also reject attempts to use other users.
Updates #3802
Change-Id: Iddc85f6ea2dba17d12be66a50408d24c1f92833e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
e.g. the change to ipnlocal in this commit ultimately logs out:
{"logtail":{"client_time":"2022-02-17T20:40:30.511381153-08:00","server_time":"2022-02-18T04:40:31.057771504Z"},"type":"Hostinfo","val":{"GoArch":"amd64","Hostname":"tsdev","IPNVersion":"1.21.0-date.20220107","OS":"linux","OSVersion":"Debian 11.2 (bullseye); kernel=5.10.0-10-amd64"},"v":1}
Change-Id: I668646b19aeae4a2fed05170d7b279456829c844
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Otherwise omitempty doesn't work.
This is wire-compatible with a non-pointer type, so switching
is safe, now and in the future.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
(The name SSH_HostKeys is bad but SSHHostKeys is worse.)
Updates #3802
Change-Id: I2a889019c9e8b065b668dd58140db4fcab868a91
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Make tailssh ask LocalBackend for the SSH hostkeys, as we'll need to
distribute them to peers.
For now only the hacky use-same-as-actual-host mode is implemented.
Updates #3802
Change-Id: I819dcb25c14e42e6692c441186c1dc744441592b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
That way humans don't have to remember which is correct.
RELNOTE=--auth-key is the new --authkey, but --authkey still works
Updates tailscale/corp#3486
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
And log it when provided in map responses.
The test uses the date on which I joined Tailscale. :)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Still largely incomplete, but in a better home now.
Updates #3802
Change-Id: I46c5ffdeb12e306879af801b06266839157bc624
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We need to capture some tailnet-related information for some Docker
features we're building. This exposes the tailnet name and MagicDNS
information via `tailscale status --json`.
Fixestailscale/corp#3670
Signed-off-by: Ross Zurowski <ross@rosszurowski.com>
If we've already connected to a certain name's IP in the past, don't
assume the problem was DNS related. That just puts unnecessarily load
on our bootstrap DNS servers during regular restarts of Tailscale
infrastructure components.
Also, if we do do a bootstrap DNS lookup and it gives the same IP(s)
that we already tried, don't try them again.
Change-Id: I743e8991a7f957381b8e4c1508b8e9d0df1782fe
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When running this script against a totally fresh out of the box Debian
11 image, sometimes it will fail to run because it doesn't have a
package list cached. This patch adds an `apt-get update` to ensure that
the local package cache is up to date.
Signed-off-by: Xe Iaso <xe@tailscale.com>
Our previous Hostinfo logging was all as a side effect of telling
control. And it got marked as verbose (as it was)
This adds a one-time Hostinfo logging that's not verbose, early in
start-up.
Change-Id: I1896222b207457b9bb12ffa7cf361761fa4d3b3a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Spell hamster correctly, and add the name of a teeny tiny type of
hamster, the Roborovski dwarf hamster.
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
No behavior changes (intended, at least).
This is in prep for future changes to this package, which would get
too complicated in the current style.
Change-Id: Ic260f8e34ae2f64f34819d4a56e38bee8d8ac5ce
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This TODO was both added and fixed in 506c727e3.
As I recall, I wasn't originally going to do it because it seemed
annoying, so I wrote the TODO, but then I felt bad about it and just
did it, but forgot to remove the TODO.
Change-Id: I8f3514809ad69b447c62bfeb0a703678c1aec9a3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I was about to add a third copy, so unify them now instead.
Change-Id: I3b93896aa1249b1250a6b1df4829d57717f2311a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For analysis of log spam.
Bandwidth is ~unchanged from had we not stripped the "[vN] " from
text; it just gets restructed intot he new "v":N, field. I guess it
adds one byte.
Updates #1548
Change-Id: Ie00a4e0d511066a33d10dc38d765d92b0b044697
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The strconv errors already stringified with the same.
Change-Id: I6938c5653e9aafa6d9028d45fc26e39eb9ccbaea
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The TODO is done. Magicsock doesn't require any endpoints to create an
*endpoint now. Verified both in code and empirically: I can use the
env knob and access everything.
Change-Id: I4fe7ed5b11c5c5e94b21ef3d77be149daeab998a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If multiple certificates match when selecting a certificate, use the one
issued the most recently (as determined by the NotBefore timestamp).
This also adds some tests for the function that performs that
comparison.
Updates tailscale/coral#6
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
The commit b9c92b90db earlier today
caused a regression of serving an empty map always, as it was
JSON marshalling an atomic.Value instead of the DNS entries map
it just built.
Change-Id: I9da3eeca132c6324462dedeaa7d002908557384b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Didn't help enough. We are setting another header anyway. Restore it.
This reverts commit 60abeb027b.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Avoid some work when D-Bus isn't running.
Change-Id: I6f89bb75fdb24c13f61be9b400610772756db1ef
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If systemd-resolved is enabled but not running (or not yet running,
such as early boot) and resolv.conf is old/dangling, we weren't
detecting systemd-resolved.
This moves its ping earlier, which will trigger it to start up and
write its file.
Updates #3362 (likely fixes)
Updates #3531 (likely fixes)
Change-Id: I6392944ac59f600571c43b8f7a677df224f2beed
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
No one really cares. Its cost outweighs its usefulness.
name old time/op new time/op delta
HandleBootstrapDNS-10 105ns ± 4% 65ns ± 2% -37.68% (p=0.000 n=15+14)
name old alloc/op new alloc/op delta
HandleBootstrapDNS-10 416B ± 0% 0B -100.00% (p=0.000 n=15+15)
name old allocs/op new allocs/op delta
HandleBootstrapDNS-10 3.00 ± 0% 0.00 -100.00% (p=0.000 n=15+15)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Do json formatting once, rather than on every request.
Use an atomic.Value.
name old time/op new time/op delta
HandleBootstrapDNS-10 6.35µs ± 0% 0.10µs ± 4% -98.35% (p=0.000 n=14+15)
name old alloc/op new alloc/op delta
HandleBootstrapDNS-10 3.20kB ± 0% 0.42kB ± 0% -86.99% (p=0.000 n=12+15)
name old allocs/op new allocs/op delta
HandleBootstrapDNS-10 41.0 ± 0% 3.0 ± 0% -92.68% (p=0.000 n=15+15)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
A large influx of new connections can bring down DERP
since it spins off a new goroutine for each connection,
where each routine may do significant amount of work
(e.g., allocating memory and crunching numbers for TLS crypto).
The momentary spike can cause the process to OOM.
This commit sets the groundwork for limiting connections,
but leaves the limit at infinite by default.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
It makes the most sense to have all our utility functions reside in one place.
There was nothing in corp that could not reasonably live in OSS.
I also updated `StartProcessAsChild` to no longer depend on `futureexec`,
thus reducing the amount of code that needed migration. I tested this change
with `tswin` and it is working correctly.
I have a follow-up PR to remove the corresponding code from corp.
The migrated code was mostly written by @alexbrainman.
Sourced from corp revision 03e90cfcc4dd7b8bc9b25eb13a26ec3a24ae0ef9
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
This patch adds new functions to be used when accessing system policies,
and revises callers to use the new functions. They first attempt the new
registry path for policies, and if that fails, attempt to fall back to the
legacy path.
We keep non-policy variants of these functions because we should be able to
retain the ability to read settings from locations that are not exposed to
sysadmins for group policy edits.
The remaining changes will be done in corp.
Updates https://github.com/tailscale/tailscale/issues/3584
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
We don't use it anyway, so be explicit that we're not using it.
Change-Id: Iec953271ef0169a2e227811932f5b65b479624af
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Recent linuxmint releases now use VERSION_CODENAME for
a linuxmint release (like "uma") and set UBUNTU_CODENAME to
the Ubuntu release they branched from.
Tested in a linuxmint 20.2 VM.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
tailscaled was using 100% CPU on a machine with ~1M lines, 100MB+
of /proc/net/route data.
Two problems: in likelyHomeRouterIPLinux, we didn't stop reading the
file once we found the default route (which is on the first non-header
line when present). Which meant it was finding the answer and then
parsing 100MB over 1M lines unnecessarily. Second was that if the
default route isn't present, it'd read to the end of the file looking
for it. If it's not in the first 1,000 lines, it ain't coming, or at
least isn't worth having. (it's only used for discovering a potential
UPnP/PMP/PCP server, which is very unlikely to be present in the
environment of a machine with a ton of routes)
Change-Id: I2c4a291ab7f26aedc13885d79237b8f05c2fd8e4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It was broken on Windows:
Error: util\winutil\winutil_windows.go:15:7: regBase redeclared in this block
Error: D:\a\tailscale\tailscale\util\winutil\winutil_notwindows.go:7:17: previous declaration
Error: util\winutil\winutil_windows.go:29:6: getRegString redeclared in this block
Error: D:\a\tailscale\tailscale\util\winutil\winutil_notwindows.go:9:40: previous declaration
Error: util\winutil\winutil_windows.go:47:6: getRegInteger redeclared in this block
Error: D:\a\tailscale\tailscale\util\winutil\winutil_notwindows.go:11:48: previous declaration
Error: util\winutil\winutil_windows.go:77:6: isSIDValidPrincipal redeclared in this block
Error: D:\a\tailscale\tailscale\util\winutil\winutil_notwindows.go:13:38: previous declaration
Change-Id: Ib1ce4b647f5711547840c736b933a6c42bf09583
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Our current workaround made the user check too lax, thus allowing deleted
users. This patch adds a helper function to winutil that checks that the
uid's SID represents a valid Windows security principal.
Now if `lookupUserFromID` determines that the SID is invalid, we simply
propagate the error.
Updates https://github.com/tailscale/tailscale/issues/869
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
We're finding a bunch of host operating systems/firewalls interact poorly
with peerapi. We either get ICMP errors from the host or users need to run
commands to allow the peerapi port:
https://github.com/tailscale/tailscale/issues/3842#issuecomment-1025133727
... even though the peerapi should be an internal implementation detail.
Rather than fight the host OS & firewalls, this change handles the
server side of peerapi entirely in netstack (except on iOS), so it
never makes its way to the host OS where it might be messed with. Two
main downsides are:
1) netstack isn't as fast, but we don't really need speed for peerapi.
And actually, with fewer trips to/from the kernel, we might
actually make up for some of the netstack performance loss by
staying in userspace.
2) tcpdump / Wireshark etc packet captures will no longer see the peerapi
traffic. Oh well. Crawshaw's been wanting to add packet capture server
support to tailscaled, so we'll probably do that sooner now.
A future change might also then use peerapi for the client-side
(except on iOS).
Updates #3842 (probably fixes, as well as many exit node issues I bet)
Change-Id: Ibc25edbb895dc083d1f07bd3cab614134705aa39
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Also fix a somewhat related printing bug in the process where
some paths would print "Success." inconsistently even
when there otherwise was no output (in the EditPrefs path)
Fixes#3830
Updates #3702 (which broke it once while trying to fix it)
Change-Id: Ic51e14526ad75be61ba00084670aa6a98221daa5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Now that Go 1.17 has module graph pruning
(https://go.dev/doc/go1.17#go-command), we should be able to use
upstream netstack without breaking our private repo's build
that then depends on the tailscale.com Go module.
This is that experiment.
Updates #1518 (the original bug to break out netstack to own module)
Updates #2642 (this updates netstack, but doesn't remove workaround)
Change-Id: I27a252c74a517053462e5250db09f379de8ac8ff
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Salamanders also have no scales. I checked the interweb, and there
doesn't seem to be any subspecies that would let us claim that
*some* salamanders are scaley.
But they are tailey, for sure.
Signed-off-by: David Anderson <danderson@tailscale.com>
So you can run Caddy etc as a non-root user and let it have access to
get certs.
Updates caddyserver/caddy#4541
Change-Id: Iecc5922274530e2b00ba107d4b536580f374109b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
So Linux/etc CLI users get helpful advice to run tailscale
with --operator=$USER when they try to 'tailscale file {cp,get}'
but are mysteriously forbidden.
Signed-off-by: David Eger <eger@google.com>
Signed-off-by: David Eger <david.eger@gmail.com>
Disabled by default.
To use, run tailscaled with:
TS_SSH_ALLOW_LOGIN=you@bar.com
And enable with:
$ TAILSCALE_USE_WIP_CODE=true tailscale up --ssh=true
Then ssh [any-user]@[your-tailscale-ip] for a root bash shell.
(both the "root" and "bash" part are temporary)
Updates #3802
Change-Id: I268f8c3c95c8eed5f3231d712a5dc89615a406f0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
A new package can also later record/report which knobs are checked and
set. It also makes the code cleaner & easier to grep for env knobs.
Change-Id: Id8a123ab7539f1fadbd27e0cbeac79c2e4f09751
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Mudpuppies are salamanders, and as such have tails but no scales.
The management apologizes for the error.
Signed-off-by: David Anderson <danderson@tailscale.com>
Currently only search domains are stored. This was an oversight
(under?) on my part.
As things are now, when MagicDNS is on and "Override local DNS" is
off, the dns forwarder has to timeout before names resolve. This
introduces a pretty annoying lang that makes everything feel
extremely slow. You will also see an error: "upstream nameservers
not set".
I tested with "Override local DNS" on and off. In both situations
things seem to function as expected (and quickly).
Signed-off-by: Aaron Bieber <aaron@bolddaemon.com>
This fixes a deadlock on shutdown.
One goroutine is waiting to send on c.derpRecvCh before unlocking c.mu.
The other goroutine is waiting to lock c.mu before receiving from c.derpRecvCh.
#3736 has a more detailed explanation of the sequence of events.
Fixes#3736
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
-W is milliseconds on darwin, not seconds, and empirically it's
milliseconds after a 1 second base.
Change-Id: I2520619e6699d9c505d9645ce4dfee4973555227
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
With this change, the client can obtain the initial handshake message
separately from the rest of the handshake, for embedding into another
protocol. This enables things like RTT reduction by stuffing the
handshake initiation message into an HTTP header.
Similarly, the server API optionally accepts a pre-read Noise initiation
message, in addition to reading the message directly off a net.Conn.
Updates #3488
Signed-off-by: David Anderson <danderson@tailscale.com>
This test set the bar too high.
Just a couple of missed timers was enough to fail.
Change the test to more of a sanity check.
While we're here, run it for just 1s instead of 5s.
Prior to this change, on a 13" M1 MPB, with
stress -p 512 ./rate.test -test.run=QPS
I saw 90%+ failures.
After this change, I'm at 30k runs with no failures yet.
Fixes#3733
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Go 1.17 added a HandshakeContext func to take care of timeouts during
TLS handshaking, so switch from our homegrown goroutine implementation
to the standard way.
Signed-off-by: David Anderson <danderson@tailscale.com>
Cancelling the context makes the timeout goroutine race with the write that
reports a successful TLS handshake, so you can end up with a successful TLS
handshake that mysteriously reports that it timed out after ~0s in flight.
The context is always canceled and cleaned up as the function exits, which
happens mere microseconds later, so just let function exit clean up and
thereby avoid races.
Signed-off-by: David Anderson <danderson@tailscale.com>
This started as an attempt to placate GitHub's code scanner,
but it's also probably generally a good idea.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Turning this on at the beginning of the 1.21.x dev cycle, for 1.22.
Updates #150
Change-Id: I1de567cfe0be3df5227087de196ab88e60c9eb56
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The GitHub code scanner flagged this as a security vulnerability.
I don't believe it was, but I couldn't convince myself of it 100%.
Err on the safe side and use html/template to generate the HTML,
with all necessary escaping.
Fixestailscale/corp#2698
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
On Synology, the /etc/resolv.conf has tabs in it, which this
resolv.conf parser (we have two, sigh) didn't handle.
Updates #3710
Change-Id: I86f8e09ad1867ee32fa211e85c382a27191418ea
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The --reset shouldn't imply that a Backend.Start is necessary. With
this, it can do a Backend.EditPrefs instead, which then doesn't do all
the heavy work that Start does. Also, Start on Windows behaves
slightly differently than Linux etc in some cases because of tailscaled
running in client mode on Windows (where the GUI supplies the prefs).
Fixes#3702
Change-Id: I75c9f08d5e0052bf623074030a3a7fcaa677abf6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Tailscale seems to be breaking WSL configurations lately. Until we
understand what changed, turn off Tailscale's involvement by default
and make it opt-in.
Updates #2815
Change-Id: I9977801f8debec7d489d97761f74000a4a33f71b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
OpenBSD 6.9 and up has a daemon which handles nameserver configuration. This PR
teaches the OpenBSD dns manager to check if resolvd is being used. If it is, it
will use the route(8) command to tell resolvd to add the Tailscale dns entries
to resolv.conf
Signed-off-by: Aaron Bieber <aaron@bolddaemon.com>
The rest of our workflows use v2.1.4.
For reasons I do not understand, we must set GOPATH here.
Maybe the GitHub Action builds come with GOPATH already set?
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Except for the super verbose packet-level dumps. Keep those disabled
by default with a const.
Updates #2642
Change-Id: Ia9eae1677e8b3fe6f457a59e44896a335d95d547
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
From Maisem's code review feedback where he mashed the merge
button by mistake.
Change-Id: I55abce036a6c25dc391250514983125dda10126c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This code was copied in a few places (Windows, Android), so unify it
and add tests.
Change-Id: Id0510c0f5974761365a2045279d1fb498feca11e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The blockForeverConn was only using its sync.Cond one side. Looks like it
was just forgotten.
Fixes#3671
Change-Id: I4ed0191982cdd0bfd451f133139428a4fa48238c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Bigger changes coming later, but this should improve things a bit in
the meantime.
Rationale:
* 2 minutes -> 45 seconds: 2 minutes was overkill and never considered
phones/battery at the time. It was totally arbitrary. 45 seconds is
also arbitrary but is less than 2 minutes.
* heartbeat from 2 seconds to 3 seconds: in practice this meant two
packets per second (2 pings and 2 pongs every 2 seconds) because the
other side was also pinging us every 2 seconds on their own.
That's just overkill. (see #540 too)
So in the worst case before: when we sent a single packet (say: a DNS
packet), we ended up sending 61 packets over 2 minutes: the 1 DNS
query and then then 60 disco pings (2 minutes / 2 seconds) & received
the same (1 DNS response + 60 pongs). Now it's 15. In 1.22 we plan to
remove this whole timer-based heartbeat mechanism entirely.
The 5 seconds to 6.5 seconds change is just stretching out that
interval so you can still miss two heartbeats (other 3 + 3 seconds
would be greater than 5 seconds). This means that if your peer moves
without telling you, you can have a path out for 6.5 seconds
now instead of 5 seconds before disco finds a new one. That will also
improve in 1.22 when we start doing UDP+DERP at the same time
when confidence starts to go down on a UDP path.
Updates #3363
Change-Id: Ic2314bbdaf42edcdd7103014b775db9cf4facb47
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I apparently only did HTTP before, not HTTPS.
Updates tailscale/corp#1327
Change-Id: I7d5265a0a25fcab5b142c8c3f21a0920f6cae39f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Fixes#3660
RELNOTE=MagicDNS now works over IPv6 when CGNAT IPv4 is disabled.
Change-Id: I001e983df5feeb65289abe5012dedd177b841b45
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
But still support hello.ipn.dev for a bit.
Updates tailscale/corp#1327
Change-Id: Iab59cca0b260d69858af16f4e42677e54f9fe54a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And delete the unused code in net/dns/resolver/neterr_*.go.
Change-Id: Ibe62c486bacce2733eb9968c96a98cbbdb2758bd
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Treat UDP send EPERM errors as a lost UDP packet, not something super
fatal. That's just the Linux firewall preventing it from going out.
And add a leaf package net/neterror for that (and future) policy that
all three packages can share, with tests.
Updates #3619
Change-Id: Ibdb838c43ee9efe70f4f25f7fc7fdf4607ba9c1d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Only if the source address isn't on the currently active interface or
a ping of the DERP server fails.
Updates #3619
Change-Id: I6bf06503cff4d781f518b437c8744ac29577acc8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It was pretty ill-defined before and mostly for logging. But I wanted
to start depending on it, so define what it is and make Windows match
the other operating systems, without losing the log output we had
before. (and add tests for that)
Change-Id: I0fbbba1cfc67a265d09dd6cb738b73f0f6005247
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
So magicsock can later ask a DERP connection whether its source IP
would've changed if it reconnected.
Updates #3619
Change-Id: Ibc8810340c511d6786b60c78c1a61c09f5800e40
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Continuing work in 434af15a04, to make it possible for magicsock to
probe whether a DERP server is still there.
Updates #3619
Change-Id: I366a77c27e93b876734e64f445b85ef01eb590f2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In prep for a future change to have client ping derp connections
when their state is questionable, rather than aggressively tearing
them down and doing a heavy reconnect when their state is unknown.
We already support ping/pong in the other direction (servers probing
clients) so we already had the two frame types, but I'd never finished
this direction.
Updates #3619
Change-Id: I024b815d9db1bc57c20f82f80f95fb55fc9e2fcc
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We only tracked the transport type (UDP vs DERP), not what they were.
Change-Id: Ia4430c1c53afd4634e2d9893d96751a885d77955
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Don't just ignore them. See if this makes them calm down.
Updates #3363
Change-Id: Id1d66308e26660d26719b2538b577522a1e36b63
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
To convince me it's not as alloc-y as it looks.
Change-Id: I503a0cc267268a23d2973dfde9833c420be4e868
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is for use by the Windows GUI client to log via when an
exit node is in use, so the logs don't go out via the exit node and
instead go directly, like tailscaled's. The dialer tried to do that
in the unprivileged GUI by binding to a specific interface, but the
"Internet Kill Switch" installed by tailscaled for exit nodes
precludes that from working and instead the GUI fails to dial out.
So, go through tailscaled (with a CONNECT request) instead.
Fixestailscale/corp#3169
Change-Id: I17a8efdc1d4b8fed53a29d1c19995592b651b215
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The intent of the updateIPs code is to add & remove IP addresses
to netstack based on what we get from the netmap.
But netstack itself adds 255.255.255.255/32 apparently and we always
fight it (and it adds it back?). So stop fighting it.
Updates #2642 (maybe fixes? maybe.)
Change-Id: I37cb23f8e3f07a42a1a55a585689ca51c2be7c60
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The new /keys endpoint allows you to list API and machine auth keys.
You can also create machine auth key.
It currently does not support creating another API key.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
This moves the Windows-only initialization of the filelogger into
logpolicy. Previously we only did it when babysitting the tailscaled
subprocess, but this meant that log messages from the service itself
never made it to disk. Examples that weren't logged to disk:
* logtail unable to dial out,
* DNS flush messages from the service
* svc.ChangeRequest messages (#3581)
This is basically the same fix as #3571 but staying in the Logf type,
and avoiding build-tagged file (which wasn't quite a goal, but
happened and seemed nice)
Fixes#3570
Co-authored-by: Aaron Klotz <aaron@tailscale.com>
Change-Id: Iacd80c4720b7218365ec80ae143339d030842702
Make shrinkDefaultRoute a pure function.
Instead of calling interfaceRoutes, accept that information as parameters.
Hard-code those parameters in TestShrinkDefaultRoute.
Fixes#3580
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
One option was to just hide "offline" in the text output, but that
doesn't fix the JSON output.
The next option was to lie and say it's online in the JSON (which then
fixes the "offline" in the text output).
But instead, this sets the self node's "Online" to whether we're in an
active map poll.
Fixes#3564
Change-Id: I9b379989bd14655198959e37eec39bb570fb814a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
testNodes have a reference to a testing.TB via their env.
Use it instead of making the caller pass theirs.
We did this in some methods but not others; finish the job.
This simplifies the call sites.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
magicsock was hanging onto its netmap on logout,
which caused tailscale status to display partial
information about a bunch of zombie peers.
After logout, there should be no peers.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
If you're using -verbose-tailscaled, you're doing in-the-weeds debugging,
so you probably want the verbose output.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
I'm sick of this flaking. Even if this isn't the right fix, it
stops the alert fatigue.
Updates #3020
Change-Id: I4001c127d78f1056302f7741adec34210a72ee61
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And it updates the build tag style on a couple files.
Change-Id: I84478d822c8de3f84b56fa1176c99d2ea5083237
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I broke it in 1.17.x sometime while rewiring some logs stuff,
mostly in 0653efb092 (but with a handful
of logs-related changes around that time)
Fixestailscale/corp#3265
Change-Id: Icb5c07412dc6d55f1d9244c5d0b51dceca6a7e34
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The existing code relied on the Go build cache to avoid
needless work when obtaining the tailscale binaries.
For non-obvious reasons, the binaries were getting re-linked
every time, which added 600ms or so on my machine to every test.
Instead, build the binaries exactly once, on demand.
This reduces the time to run 'go test -count=5' from 34s to 10s
on my machine.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
After apt install, Kali Linux had not enabled nor started
the tailscaled systemd service. Add a quirks mode to enable
and start it after apt install for debian platforms.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
One of the most annoying parts of using the Tailscale CLI on Windows
and the macOS GUI is that Tailscale's GUIs default to running with
"Route All" (accept all non-exitnode subnet routes) but the CLI--being
originally for Linux--uses the Linux default, which is to not accept
subnets.
Which means if a Windows user does, e.g.:
tailscale up --advertise-exit-node
Or:
tailscale up --shields-up
... then it'd warn about reverting the --accept-routes option, which the user
never explicitly used.
Instead, make the CLI's default match the platform/GUI's default.
Change-Id: I15c804b3d9b0266e9ca8651e0c09da0f96c9ef8d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
on error.
While debugging a customer issue where the firewallTweaker was failing
the only message we have is `router: firewall: error adding
Tailscale-Process rule: exit status 1` which is not really helpful.
This will help diagnose firewall tweaking failures.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
fee2d9fad added support for cmd/tailscale to connect to IPNExtension.
It came in two parts: If no socket was provided, dial IPNExtension first,
and also, if dialing the socket failed, fall back to IPNExtension.
The second half of that support caused the integration tests to fail
when run on a machine that was also running IPNExtension.
The integration tests want to wait until the tailscaled instances
that they spun up are listening. They do that by dialing the new
instance. But when that dial failed, it was falling back to IPNExtension,
so it appeared (incorrectly) that tailscaled was running.
Hilarity predictably ensued.
If a user (or a test) explicitly provides a socket to dial,
it is a reasonable assumption that they have a specific tailscaled
in mind and don't want to fall back to IPNExtension.
It is certainly true of the integration tests.
Instead of adding a bool to Connect, split out the notion of a
connection strategy. For now, the implementation remains the same,
but with the details hidden a bit. Later, we can improve that.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This is enough to handle the DNS queries as generated by Go's
net package (which our HTTP/SOCKS client uses), and the responses
generated by the ExitDNS DoH server.
This isn't yet suitable for putting on 100.100.100.100 where a number
of different DNS clients would hit it, as this doesn't yet do
EDNS0. It might work, but it's untested and likely incomplete.
Likewise, this doesn't handle anything about truncation, as the
exchanges are entirely in memory between Go or DoH. That would also
need to be handled later, if/when it's hooked up to 100.100.100.100.
Updates #3507
Change-Id: I1736b0ad31eea85ea853b310c52c5e6bf65c6e2a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It's been a bunch of releases now since the TailscaleIPs slice
replacement was added.
Change-Id: I3bd80e1466b3d9e4a4ac5bedba8b4d3d3e430a03
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It will be used for ICMPv6 next, so pass in the proto.
Also, use the ipproto constants rather than hardcoding the mysterious
number.
Change-Id: I57b68bdd2d39fff75f82affe955aff9245de246b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Allow users of CallbackRouter to supply a GetBaseConfig
implementation. This is expected to be used on Android,
which currently lacks both a) platform support for
Split-DNS and b) a way to retrieve the current DNS
servers.
iOS/macOS also use the CallbackRouter but have platform
support for SplitDNS, so don't need getBaseConfig.
Updates https://github.com/tailscale/tailscale/issues/2116
Updates https://github.com/tailscale/tailscale/issues/988
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
The caller of func run said:
// No need to log; the func already did
But that wasn't true. Some return paths didn't log.
So instead, return rich errors and have func main do the logging,
so we can't miss anything in the future.
Prior to this, safesocket.Listen for instance was causing tailscaled
to os.Exit(1) on failure without any clue as to why.
Change-Id: I9d71cc4d73d0fed4aa1b1902cae199f584f25793
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Given our development cycle, we'll instead do big-bang updates
after every release, to give time for all the updates to soak in
unstable.
This does _not_ disable dependabot security-critical PRs.
Signed-off-by: David Anderson <danderson@tailscale.com>
To make ExitDNS cheaper.
Might not finish client-side support in December before 1.20, but at
least server support can start rolling out ahead of clients being
ready for it.
Tested with curl against peerapi.
Updates #1713
Change-Id: I676fed5fb1aef67e78c542a3bc93bddd04dd11fe
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If the user has a "Taildrop" shared folder on startup and
the "tailscale" system user has read/write access to it,
then the user can "tailscale file cp" to their NAS.
Updates #2179 (would be fixes, but not super ideal/easy yet)
Change-Id: I68e59a99064b302abeb6d8cc84f7d2a09f764990
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And simplify, unexport some tsdial/netstack stuff in the the process.
Fixes#3475
Change-Id: I186a5a5cbd8958e25c075b4676f7f6e70f3ff76e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The control plane is currently still eating it.
Updates #1713
Change-Id: I66a0698599d6794ab1302f9585bf29e38553c884
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Before:
failed to connect to local tailscaled (which appears to be running). Got error: Get "http://local-tailscaled.sock/localapi/v0/status": EOF
After:
failed to connect to local tailscaled (which appears to be running as IPNExtension, pid 2118). Got error: Get "http://local-tailscaled.sock/localapi/v0/status": EOF
This was useful just now, as it made it clear that tailscaled I thought
I was connecting to might not in fact be running; there was
a second tailscaled running that made the error message slightly misleading.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
It was using the wrong prefs (intended vs current) to map the current
exit node ID to an IP.
Fixes#3480
Change-Id: I9f117d99a84edddb4cd1cb0df44a2f486abde6c2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If you're online, let tailscale up --exit-node=NAME map NAME to its IP.
We don't store the exit node name server-side in prefs, avoiding
the concern raised earlier.
Fixes#3062
Change-Id: Ieea5ceec1a30befc67e9d6b8a530b3cb047b6b40
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This starts to refactor tsdial.Dialer's name resolution to have
different stages: in-memory MagicDNS vs system resolution. A future
change will plug in ExitDNS resolution.
This also plumbs a Dialer into netstack and unexports the dnsMap
internals.
And it removes some of the async AddNetworkMapCallback usage and
replaces it with synchronous updates of the Dialer's netmap
from LocalBackend, since the LocalBackend has the Dialer too.
Updates #3475
Change-Id: Idcb7b1169878c74f0522f5151031ccbc49fe4cb4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Without this, enabling an exit node immediately blackholes all traffic,
but doesn't correctly let it flow to the exit node until the next netmap
update.
Fixes#3447
Signed-off-by: David Anderson <danderson@tailscale.com>
With this, I'm able to send a Taildrop file (using "tailscale file cp")
from a Linux machine running --tun=userspace-networking.
Updates #2179
Change-Id: I4e7a4fb0fbda393e4fb483adb06b74054a02cfd0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In prep for moving stuff out of LocalBackend.
Change-Id: I9725aa9c3ebc7275f8c40e040b326483c0340127
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Not done yet, but this move more of the outbound dial special casing
from random packages into tsdial, which aspires to be the one unified
place for all outbound dialing shenanigans.
Then this plumbs it all around, so everybody is ultimately
holding on to the same dialer.
As of this commit, macOS/iOS using an exit node should be able to
reach to the exit node's DoH DNS proxy over peerapi, doing the sockopt
to stay within the Network Extension.
A number of steps remain, including but limited to:
* move a bunch more random dialing stuff
* make netstack-mode tailscaled be able to use exit node's DNS proxy,
teaching tsdial's resolver to use it when an exit node is in use.
Updates #1713
Change-Id: I1e8ee378f125421c2b816f47bc2c6d913ddcd2f5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The behavior was changed in March (in 7f174e84e6)
but that change forgot to update these docs.
Change-Id: I79c0301692c1d13a4a26641cc5144baf48ec1360
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For now this just deletes the net/socks5/tssocks implementation (and
the DNSMap stuff from wgengine/netstack) and moves it into net/tsdial.
Then initialize a Dialer early in tailscaled, currently only use for the
outbound and SOCKS5 proxies. It will be plumbed more later. Notably, it
needs to get down into the DNS forwarder for exit node DNS forwading
in netstack mode. But it will also absorb all the peerapi setsockopt
and netns Dial and tlsdial complexity too.
Updates #1713
Change-Id: Ibc6d56ae21a22655b2fa1002d8fc3f2b2ae8b6df
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We often need both a log function and a context.
We can do this by adding the log function as a context value.
This commit adds helper glue to make that easy.
It is designed to allow incremental adoption.
Updates tailscale/corp#3138
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The block-write and block-read tests are both flaky,
because each assumes it can get a normal read/write
completed within 10ms. This isn’t always true.
We can’t increase the timeouts, because that slows down the test.
However, we don’t need to issue a regular read/write for this test.
The immediately preceding tests already test this code,
using a far more generous timeout.
Remove the extraneous read/write.
This drops the failure rate from 1 per 20,000 to undetectable
on my machine.
While we’re here, fix a typo in a debug print statement.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Without the continue, we might overwrite our current meta
with a zero meta.
Log the error, so that we can check for anything unexpected.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
So Taildrop sends work even if the local tailscaled is running in
netstack mode, as it often is on Synology, etc.
Updates #2179 (which is primarily about receiving, but both important)
Change-Id: I9bd1afdc8d25717e0ab6802c7cf2f5e0bd89a3b2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Don't be a DoH DNS server to peers unless the Tailnet admin has permitted
that peer autogroup:internet access.
Updates #1713
Change-Id: Iec69360d8e4d24d5187c26904b6a75c1dabc8979
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I probably broke it when SCTP support was added but nothing apparently
ever used NewAllowAllForTest so it wasn't noticed when it broke.
Change-Id: Ib5a405be233d53cb7fcc61d493ae7aa2d1d590a2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If IP forwarding is disabled globally, but enabled per-interface on all interfaces,
don't complain. If only some interfaces have forwarding enabled, warn that some
subnet routing/exit node traffic may not work.
Fixes#1586
Signed-off-by: David Anderson <danderson@tailscale.com>
It's a basic "deny everything" policy, since DERP's HTTP
server is very uninteresting from a browser POV. But it
stops every security scanner under the sun from reporting
"dangerously configured" HTTP servers.
Updates tailscale/corp#3119
Signed-off-by: David Anderson <danderson@tailscale.com>
Android doesn't use logpolicy and currently has enough
unique stuff about its logging that makes it difficult to
do so. For example, its logsDir comes from Gio.
Export NewLogtailTransport to let Android use it.
Updates https://github.com/tailscale/tailscale/issues/3046
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Currently, comments in resolv.conf cause our parser to fail,
with error messages like:
ParseIP("192.168.0.100 # comment"): unexpected character (at " # comment")
Fix that.
Noticed while looking through logs.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We were missing an argument here.
Also, switch to %q, in case anything weird
is happening with these strings.
Updates tailscale/corp#461
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
When this happens, it is incredibly noisy in the logs.
It accounts for about a third of all remaining
"unexpected" log lines from a recent investigation.
It's not clear that we know how to fix this,
we have a functioning workaround,
and we now have a (cheap and efficient) metric for this
that we can use for measurements.
So reduce the logging to approximately once per minute.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This limits the output to a single IP address.
RELNOTE=tailscale ip now has a -1 flag (TODO: update docs to use it)
Fixes#1921
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
These were supposed to be part of
3b541c833e but I guess I forgot to "git
add" them. Whoops.
Updates #3307
Change-Id: I8c768a61ec7102a01799e81dc502a22399b9e9f0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
One of the most common "unexpected" log lines is:
"network state changed, but stringification didn't"
One way that this can occur is if an interesting interface
(non-Tailscale, has interesting IP address)
gains or loses an uninteresting IP address (link local or loopback).
The fact that the interface is interesting is enough for EqualFiltered
to inspect it. The fact that an IP address changed is enough for
EqualFiltered to declare that the interfaces are not equal.
But the State.String method reasonably declines to print any
uninteresting IP addresses. As a result, the network state appears
to have changed, but the stringification did not.
The String method is correct; nothing interesting happened.
This change fixes this by adding an IP address filter to EqualFiltered
in addition to the interface filter. This lets the network monitor
ignore the addition/removal of uninteresting IP addresses.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Linux-only for now, to avoid having to figure out why
powershell doesn't like my shell scripting. (Not that I blame it.)
That'll be enough to catch most regressions.
Fixes#1083
Co-authored-by: Aaron Klotz <aaron@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The Windows BOOL type is an int32. We were using a bool,
which is a one byte wide. This could be responsible for the
ERROR_INVALID_PARAMETER errors we were seeing for calls to
WinHttpGetProxyForUrl.
We manually checked all other existing Windows syscalls
for similar mistakes and did not find any.
Updates #879
Co-authored-by: Aaron Klotz <aaron@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We replace the cmd.exe invocation with RtlGetNtVersionNumbers for the first
three fields. On Windows 10+, we query for the fourth field which is available
via the registry.
The fourth field is not really documented anywhere; Firefox has been querying
it successfully since Windows 10 was released, so we can be pretty confident in
its longevity at this point.
Fixes https://github.com/tailscale/tailscale/issues/1478
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
There are lots of lines in the logs of the form:
portmapper: unexpected PMP probe response: {OpCode:128 ResultCode:3
SecondsSinceEpoch:NNN MappingValidSeconds:0 InternalPort:0
ExternalPort:0 PublicAddr:0.0.0.0}
ResultCode 3 here means a network failure, e.g. the NAT box itself has
not obtained a DHCP lease. This is not an indication that something
is wrong in the Tailscale client, so use different wording here
to reflect that. Keep logging, so that we can analyze and debug
the reasons that PMP probes fail.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Lets the systemd-resolved OSConfigurator report health changes
for out of band config resyncs.
Updates #3327
Signed-off-by: David Anderson <danderson@tailscale.com>
In rare circumstances (tailscale/corp#3016), the PublicKey
and Endpoints can diverge.
This by itself doesn't cause any harm, but our early exit
in response did, because it prevented us from recovering from it.
Remove the early exit.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
At some point since filelogger was added on Windows, the log hierarchy
above it changed such that a log.Printf writes to filelogger and includes
the log package's own date. But then filelogger adds another.
Rather than debug everything above and risk removing the prefix when
run by tailscaled, instead just remove the log package's prefix
very late right before we go to add the filelogger's own.
Change-Id: I9db518f42c603ef83017f74827270f124fdf5c14
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Tailscale 1.18 uses netlink instead of the "ip" command to program the
Linux kernel.
The old way was kept primarily for tests, but this also adds a
TS_DEBUG_USE_IP_COMMAND environment knob to force the old way
temporarily for debugging anybody who might have problems with the
new way in 1.18.
Updates #391
Change-Id: I0236fbfda6c9c05dcb3554fcc27ec0c86456efd9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
endpoint.discoKey is protected by endpoint.mu.
endpoint.sendDiscoMessage was reading it without holding the lock.
This showed up in a CI failure and is readily reproducible locally.
The fix is in two parts.
First, for Conn.enqueueCallMeMaybe, eliminate the one-line helper method endpoint.sendDiscoMessage; call Conn.sendDiscoMessage directly.
This makes it more natural to read endpoint.discoKey in a context
in which endpoint.mu is already held.
Second, for endpoint.sendDiscoPing, explicitly pass the disco key
as an argument. Again, this makes it easier to read endpoint.discoKey
in a context in which endpoint.mu is already held.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
I believe that this should eliminate the flakiness.
If GitHub CI manages to be even slower that can be believed
(and I can believe a lot at this point),
then we should roll this back and make some more invasive changes.
Updates #654Fixes#3247 (I hope)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We can do the "maybe delete" check unilaterally:
In the case of an insert, both oldDiscoKey
and ep.discoKey will be the zero value.
And since we don't use pi again, we can skip
giving it a name, which makes scoping clearer.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
wgengine/wgcfg: introduce wgcfg.NewDevice helper to disable roaming
at all call sites (one real plus several tests).
Fixestailscale/corp#3016.
Signed-off-by: David Anderson <danderson@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Don't set all the *.arpa. reverse DNS lookup domains if systemd-resolved
is old and can't handle them.
Fixes#3188
Change-Id: I283f8ce174daa8f0a972ac7bfafb6ff393dde41d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It was a mess of flags. Use subcommands under "debug" instead.
And document loudly that it's not a stable interface.
Change-Id: Idcc58f6a6cff51f72cb5565aa977ac0cc30c3a03
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And annotate magicsock as a start.
And add localapi and debug handlers with the Prometheus-format
exporter.
Updates #3307
Change-Id: I47c5d535fe54424741df143d052760387248f8d3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Was done as part of e6fbc0cd54 for ssh
work, but wasn't committed yet. Including it here both to minimize the
ssh diff size, and because I need it for a separate change.
Change-Id: If6eb54a2ca7150ace96488ed14582c2c05ca3422
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
More work towards removing the massive ipnserver.Run and ipnserver.Options
and making composable pieces.
Work remains. (The getEngine retry loop on Windows complicates things.)
For now some duplicate code exists. Once the Windows side is fixed
to either not need the retry loop or to move the retry loop into a
custom wgengine.Engine wrapper, then we can unify tailscaled_windows.go
too.
Change-Id: If84d16e3cd15b54ead3c3bb301f27ae78d055f80
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Fixes regression from 81cabf48ec which made
all map errors be sent to the frontend UI.
Fixes#3230
Change-Id: I7f142c801c7d15e268a24ddf901c3e6348b6729c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For debugging Synology. Like the existing goroutines handler, in that
it's owner-only.
Change-Id: I852f0626be8e1c0b6794c1e062111d14adc3e6ac
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In DeviceConfig, we did not close r after calling FromUAPI.
If FromUAPI returned early due to an error, then it might
not have read all the data that IpcGetOperation wanted to write.
As a result, IpcGetOperation could hang, as in #3220.
We were also closing the wrong end of the pipe after IpcSetOperation
in ReconfigDevice.
To ensure that we get all available information to diagnose
such a situation, include all errors anytime something goes wrong.
This should fix the immediate crashing problem in #3220.
We'll then need to figure out why IpcGetOperation was failing.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
github.com/go-multierror/multierror served us well.
But we need a few feature from it (implement Is),
and it's not worth maintaining a fork of such a small module.
Instead, I did a clean room implementation inspired by its API.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Using temporary netlink fork in github.com/tailscale/netlink until we
get the necessary changes upstream in either vishvananda/netlink
or jsimonetti/rtnetlink.
Updates #391
Change-Id: I6e1de96cf0750ccba53dabff670aca0c56dffb7c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Even if not in use. We plan to use it for more stuff later.
(not for iOS or macOS-GUIs yet; only tailscaled)
Change-Id: Idaef719d2a009be6a39f158fd8f57f8cca68e0ee
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This leaves behind a type alias and associated constructor, to allow
for gradual switchover.
Updates #3206.
Signed-off-by: David Anderson <danderson@tailscale.com>
Temporary until #3206 goes away, but having changed the marshal/unmarshal
implementation I got nervous about the new one doing the correct thing.
Thankfully, the test says it does.
Signed-off-by: David Anderson <danderson@tailscale.com>
(Fix to 31e4f60047)
The 31e4f60047 change accidentally
made it always prepend the VERSION.txt, even when it was already
link-stamped properly.
Updates #81
Change-Id: I6cdcff096c25d92d566ad3ac1de5771c7384daea
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
At least until js/wasm starts using browser LocalStorage or something.
But for the foreseeable future, any login from a browser should
be considered ephemeral as the tab can close at any time and lose
the wireguard key, never to be seen again.
Updates #3157
Change-Id: I6c410d86dc7f9f233c3edd623313d9dee2085aac
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Pull out the list of policy routing rules to a data structure
now shared between the add & delete paths, but to also be shared
by the netlink paths in a future change.
Updates #391
Change-Id: I119ab1c246f141d639006c808b61c585c3d67924
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
There are a few remaining uses of testing.AllocsPerRun:
Two in which we only log the number of allocations,
and one in which dynamically calculate the allocations
target based on a different AllocsPerRun run.
This also allows us to tighten the "no allocs"
test in wgengine/filter.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
testing.AllocsPerRun measures the total allocations performed
by the entire program while repeatedly executing a function f.
If some unrelated part of the rest of the program happens to
allocate a lot during that period, you end up with a test failure.
Ideally, the rest of the program would be silent while
testing.AllocsPerRun executes.
Realistically, that is often unachievable.
AllocsPerRun attempts to mitigate this by setting GOMAXPROCS to 1,
but that doesn't prevent other code from running;
it only makes it less likely.
You can also mitigate this by passing a large iteration count to
AllocsPerRun, but that is unreliable and needlessly expensive.
Unlike most of package testing, AllocsPerRun doesn't use any
toolchain magic, so we can just write a replacement.
One wild idea is to change how we count mallocs.
Instead of using runtime.MemStats, turn on memory profiling with a
memprofilerate of 1. Discard all samples from the profile whose stack
does not contain testing.AllocsPerRun. Count the remaining samples to
determine the number of mallocs.
That's fun, but overkill.
Instead, this change adds a simple API that attempts to get f to
run at least once with a target number of allocations.
This is useful when you know that f should allocate consistently.
We can then assume that any iterations with too many allocations
are probably due to one-time costs or background noise.
This suits most uses of AllocsPerRun.
Ratcheting tests tend to be significantly less flaky,
because they are biased towards success.
They can also be faster, because they can exit early,
once success has been reached.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Anybody using that one old, unreleased version of Tailscale from over
a year ago should've rebooted their machine by now to get various
non-Tailscale security updates. :)
Change-Id: If9e043cb008b20fcd6ddfd03756b3b23a9d7aeb5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
So js/wasm clients can log in for a bit using regular Gmail/GitHub auth
without using an ephemeral key but still have their node cleaned up
when they're done.
Updates #3157
Change-Id: I49e3d14e9d355a9b8bff0ea810b0016bfe8d47f2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The image is pulled using tailscale/tailscale:latest, and can be run using tailscale/tailscale
Signed-off-by: Michael Stapelberg <michael@stapelberg.de>
Temporary measure until we switch to Go 1.18.
$ go run ./cmd/tailscale version
1.17.0-date.20211022
go version: go1.17
Updates #81
Change-Id: Ic82ebffa5f46789089e5fb9810b3f29e36a47f1a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Complete with converters to all the other types that represent a
node key today, so the new type can gradually subsume old ones.
Updates #3206
Signed-off-by: David Anderson <danderson@tailscale.com>
So future refactors can only deal with a net.Listener and
be unconcerned with their caller's (Windows-specific) struggles.
Change-Id: I0af588b9a769ab65c59b0bd21f8a0c99abfa1784
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I'll keep ipnserver.Run for compatibility, but it'll be a wrapper
around several smaller pieces. (more testable too)
For now, start untangling some things in preparation.
Plan is to have to have a constructor for the just-exported
ipnserver.Server type that takes a LocalBackend and can
accept (in a new method) on a provided listener.
Change-Id: Ide73aadaac1a82605c97a2af1321d0d8f60b2a8c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It's all opaque, there's no constructor, and no exported
methods, so it's useless at this point, but this is one
small refactoring step.
Change-Id: Id961e8880cf0c84f1a0a989eefff48ecb3735add
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Now that we multicast the SSDP query, we can get IGD offers from
devices other than the current device's default gateway. We don't want
to accidentally bind ourselves to those.
Updates #3197
Signed-off-by: David Anderson <danderson@tailscale.com>
So js/wasm can override where those go, without implementing
an *os.File pipe pair, etc.
Updates #3157
Change-Id: I14ba954d9f2349ff15b58796d95ecb1367e8ba3a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And the derper change to add a CORS endpoint for latency measurement.
And a little magicsock change to cut down some log spam on js/wasm.
Updates #3157
Change-Id: I5fd9e6f5098c815116ddc8ac90cbcd0602098a48
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Otherwise random browser requests to /derp cause log spam.
Change-Id: I7bdf991d2106f0323868e651156c788a877a90d5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
There are /etc/resolv.conf files out there where resolvconf wrote
the file but pointed to systemd-resolved as the nameserver.
We're better off handling those as systemd-resolved.
> # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
> # DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
> # 127.0.0.53 is the systemd-resolved stub resolver.
> # run "systemd-resolve --status" to see details about the actual nameservers.
Fixes https://github.com/tailscale/tailscale/issues/3026
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
In some containers, /etc/resolv.conf is a bind-mount from outside the container.
This prevents renaming to or from /etc/resolv.conf, because it's on a different
filesystem from linux's perspective. It also prevents removing /etc/resolv.conf,
because doing so would break the bind-mount.
If we find ourselves within this environment, fall back to using copy+delete when
renaming to /etc/resolv.conf, and copy+truncate when renaming from /etc/resolv.conf.
Fixes#3000
Co-authored-by: Denton Gentry <dgentry@tailscale.com>
Signed-off-by: David Anderson <danderson@tailscale.com>
Just something I ran across while debugging an unrelated failure. This
is not in response to any bug/issue.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Be DERP-only for now. (WebRTC can come later :))
Updates #3157
Change-Id: I56ebb3d914e37e8f4ab651306fd705b817ca381c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Now that peerMap tracks the set of nodes for a DiscoKey.
Updates #3088
Change-Id: I927bf2bdfd2b8126475f6b6acc44bc799fcb489f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
utils/winutil/vss contains just enough COM wrapping to query the Volume Shadow Copy service for snapshots.
WalkSnapshotsForLegacyStateDir is the friendlier interface that adds awareness of our actual use case,
mapping the snapshots and locating our legacy state directory.
Updates #3011
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
Moving this information into a centralized place so that it is accessible to
code in subsequent commits.
Updates #3011
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
Continuation of 2aa5df7ac1, remove nil
check because it can never be nil. (It previously was able to be nil.)
Change-Id: I59cd9ad611dbdcbfba680ed9b22e841b00c9d5e6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This adds new fields (currently unused) to discoInfo to track what the
last verified (unambiguous) NodeKey a DiscoKey last mapped to, and
when.
Then on CallMeMaybe, Pong and on most Pings, we update the mapping
from DiscoKey to the current NodeKey for that DiscoKey.
Updates #3088
Change-Id: Idc4261972084dec71cf8ec7f9861fb9178eb0a4d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This lets clients quickly (sub-millisecond within a local LAN) map
from an ambiguous disco key to a node key without waiting for a
CallMeMaybe (over relatively high latency DERP).
Updates #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The "go generate" command blindly looks for "//go:generate" anywhere
in the file regardless of whether it is truly a comment.
Prevent this false positive in cloner.go by mangling the string
to look less like "//go:generate".
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
https://github.com/tailscale/tailscale/pull/3014 added a
rebind on STUN failure, which means there can now be a
tailscale.com/wgengine/magicsock.(*RebindingUDPConn).ReadFromNetaddr
in progress at the end of the test waiting for a STUN
response which will never arrive.
This causes a test flake due to the resource leak in those
cases where the Conn decided to rebind. For whatever reason,
it mostly flakes with Windows.
If the Conn is closed, don't Rebind after a send error.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Renames only; continuation of earlier 8049063d35
These kept confusing me while working on #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The one remaining caller of peerMap.endpointForDiscoKey was making the
improper assumption that there's exactly 1 node with a given DiscoKey
in the network. That was the cause of #3088.
Now that all the other callers have been updated to not use
endpointForDiscoKey, there's no need to try to keep maintaining that
prone-to-misuse index.
Updates #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
A DiscoKey maps 1:n to endpoints. When we get a disco pong, we don't
necessarily know which endpoint sent it to us. Ask them all. There
will only usually be 1 (and in rare circumstances 2). So it's easier
to ask all two rather than building new maps from the random ping TxID
to its endpoint.
Updates #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We can reply to a ping without knowing which exact node it's from. As
long as it's in our netmap, it's safe to reply. If there's more than
one node with that discokey, it doesn't matter who we're relpying to.
Updates #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
As more prep for removing the false assumption that you're able to
map from DiscoKey to a single peer, move the lastPingFrom and lastPingTime
fields from the endpoint type to a new discoInfo type, effectively upgrading
the old sharedDiscoKey map (which only held a *[32]byte nacl precomputed key
as its value) to discoInfo which then includes that naclbox key.
Then start plumbing it into handlePing in prep for removing the need
for handlePing to take an endpoint parameter.
Updates #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The pass just after in this method handles cleaning up sharedDiscoKey.
No need to do it wrong (assuming DiscoKey => 1 node) earlier.
Updates #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It's not valid to assume that a discokey is globally unique.
This removes the first two of the four callers.
Updates #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Keep the now-redundant github.ref branch check for
the future, in case we want to change the policy for main vs
release-branch again later. Save somebody the YAML debugging
time.
Emit a go:generate pragma with the full set of flags passed to cloner.
This allows the user to simply run "go generate" at the location
of the generate file to reproduce the file.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
From https://github.com/tailscale/tailscale/pull/1919 with
edits by bradfitz@.
This change introduces a new storage provider for the state file. It
allows users to leverage AWS SSM parameter store natively within
tailscaled, like:
$ tailscaled --state=arn:aws:ssm:eu-west-1:123456789:parameter/foo
Known limitations:
- it is not currently possible to specific a custom KMS key ID
RELNOTE=tailscaled on Linux supports using AWS SSM for state
Edits-By: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Maxime VISONNEAU <maxime.visonneau@gmail.com>
Also shorten "[FR]:" to "FR:" to save precious subject line space.
I don't mind a prefix to distinguish feature requests, but the majority
of cases are bugs. Let's preserve as many chars as possible for the
specific topic when looking at subject lines in gmail.
(Now, if only it wouldn't include [tailscale/tailscale] on every
message...)
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
When a DNS server claims to be unable or unwilling to handle a request,
instead of passing that refusal along to the client, just treat it as
any other error trying to connect to the DNS server. This prevents DNS
requests from failing based on if a server can respond with a transient
error before another server is able to give an actual response. DNS
requests only failing *sometimes* is really hard to find the cause of
(#1033).
Signed-off-by: Smitty <me@smitop.com>
We added the initial handling only for macOS and iOS.
With 1.16.0 now released, suppress forwarding DNS-SD
on all platforms to test it through the 1.17.x cycle.
Updates #2442
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
On iOS (and possibly other platforms), sometimes our UDP socket would
get stuck in a state where it was bound to an invalid interface (or no
interface) after a network reconfiguration. We can detect this by
actually checking the error codes from sending our STUN packets.
If we completely fail to send any STUN packets, we know something is
very broken. So on the next STUN attempt, let's rebind the UDP socket
to try to correct any problems.
This fixes a problem where iOS would sometimes get stuck using DERP
instead of direct connections until the backend was restarted.
Fixes#2994
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
This feature wasn't working until I realized that we also need to opt into
the events. MSDN wasn't so generous as to make this easy to deduce.
Updates #2956
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
A couple of gnarly assumptions in this code, as always with the async
message thing.
UI button is based on the DNS settings in the admin panel.
Co-authored-by: Maisem Ali <maisem@tailscale.com>
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
iOS and Android no longer use these. They both now (as of today)
use the hostinfo.SetFoo setters instead.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Turns out the iOS client has been only sending the OS version it first
started at. This whole hostinfo-via-prefs mechanism was never a good idea.
Start removing it.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This config update will let tailscale use bencher without worrying about the bencher check appearing as failed due to a benchmark regressing.
Updates #2938
Signed-off-by: Nathan Dias <nathan@orijtech.com>
I forgot to include this file in the earlier
7cf8ec8108 commit.
This exists purely to keep "go mod tidy" happy.
Updates #1609
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Lot of people have been hitting this.
Now it says:
$ tailscale cert tsdev.corp.ts.net
Access denied: cert access denied
Use 'sudo tailscale cert' or 'tailscale up --operator=$USER' to not require root.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We still try the host's x509 roots first, but if that fails (like if
the host is old), we fall back to using LetsEncrypt's root and
retrying with that.
tlsdial was used in the three main places: logs, control, DERP. But it
was missing in dnsfallback. So added it there too, so we can run fine
now on a machine with no DNS config and no root CAs configured.
Also, move SSLKEYLOGFILE support out of DERP. tlsdial is the logical place
for that support.
Fixes#1609
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
DNSSEC is an availability issue, as recently demonstrated by the
Slack issue, with limited security advantage. DoH on the other hand
is a critical security upgrade. This change adds DoH support for the
non-DNSSEC endpoints of Quad9.
https://www.quad9.net/service/service-addresses-and-features#unsec
Signed-off-by: Filippo Valsorda <hi@filippo.io>
It was in the wrong filter direction before, per CPU profiles
we now have.
Updates #1526 (maybe fixes? time will tell)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The old name invited confusion:
* is this the HTTP proxy to use ourselves? (no, that's
via an environment variable, per proxy conventions)
* is this for LetsEncrypt https-to-localhost-http
proxying? (no, that'll come later)
So rename to super verbose --outbound-http-proxy-listen
before the 1.16.0 release to make it clear what it is.
It listens (serves) and it's for outbound, not inbound.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For the service, all we need to do is handle the `svc.SessionChange` command.
Upon receipt of a `windows.WTS_SESSION_UNLOCK` event, we fire off a goroutine to flush the DNS cache.
(Windows expects responses to service requests to be quick, so we don't want to do that synchronously.)
This is gated on an integral registry value named `FlushDNSOnSessionUnlock`,
whose value we obtain during service initialization.
(See [this link](https://docs.microsoft.com/en-us/windows/win32/api/winsvc/nc-winsvc-lphandler_function_ex) for information re: handling `SERVICE_CONTROL_SESSIONCHANGE`.)
Fixes#2956
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
This helper allows us to retrieve `DWORD` and `QWORD` values from the Tailscale key in the Windows registry.
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
This adds support for tailscaled to be an HTTP proxy server.
It shares the same backend dialing code as the SOCK5 server, but the
client protocol is HTTP (including CONNECT), rather than SOCKS.
Fixes#2289
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This fixes "tailscale cert" on Synology where the var directory is
typically like /volume2/@appdata/Tailscale, or any other tailscaled
user who specifies a non-standard state file location.
This is a interim fix on the way to #2932.
Fixes#2927
Updates #2932
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In a56520c3c7 dependabot attempted to bump
the setup-go action version. It appears to work for most builders, but
not the self-hosted VM builder. Revert for now.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
We unconditionally set appropriate perms on the statefile dir.
We look at the basename of the statefile dir, and if it is "tailscale", then
we set perms as appropriate.
Fixes#2925
Updates #2856
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
Because the macOS CLI runs in the sandbox, including the filesystem,
so users would be confused that -cpu-profile=prof.cpu succeeds but doesn't
write to their current directory, but rather in some random Library/Containers
directory somewhere on the machine (which varies depending on the Mac build
type: App Store vs System Extension)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This was already possible on Linux if you ran tailscaled with --debug
(which runs net/http/pprof), but it requires the user have the Go
toolchain around.
Also, it wasn't possible on macOS, as there's no way to run the IPNExtension
with a debug server (it doesn't run tailscaled).
And on Windows it's super tedious: beyond what users want to do or
what we want to explain.
Instead, put it in "tailscale debug" so it works and works the same on
all platforms. Then we can ask users to run it when we're debugging something
and they can email us the output files.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
pfSense stores its SSL certificate and key in the PHP config.
We wrote PHP code to pull the two out of the PHP config and
into environment variables before running "tailscale web".
The pfSense web UI is served over https, we need "tailscale web"
to also support https in order to put it in an <iframe>.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
There are two reasons this can't ever go to actual logs,
but rewrite it to make it happy.
Fixestailscale/corp#2695
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
ProgramData has a permissive ACL. For us to safely store machine-wide
state information, we must set a more restrictive ACL on our state directory.
We set the ACL so that only talescaled's user (ie, LocalSystem) and the
Administrators group may access our directory.
We must include Administrators to ensure that logs continue to be easily
accessible; omitting that group would force users to use special tools to
log in interactively as LocalSystem, which is not ideal.
(Note that the ACL we apply matches the ACL that was used for LocalSystem's
AppData\Local).
There are two cases where we need to reset perms: One is during migration
from the old location to the new. The second case is for clean installations
where we are creating the file store for the first time.
Updates #2856
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
tailscale-ipn.exe (the GUI) shouldn't use C:\ProgramData.
Also, migrate the earlier misnamed wg32/wg64 conf files if they're present.
(That was stopped in 2db877caa3, but the
files exist from fresh 1.14 installs)
Updates #2856
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Windows has a public dns.Flush used in router_windows.go.
However that won't work for platforms like Linux, where
we need a different flush mechanism for resolved versus
other implementations.
We're instead adding a FlushCaches method to the dns Manager,
which can be made to work on all platforms as needed.
Fixes https://github.com/tailscale/tailscale/issues/2132
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
C:\WINDOWS\system32\config\systemprofile\AppData\Local\
is frequently cleared for almost any reason: Windows updates,
System Restore, even various System Cleaner utilities.
The server-state.conf file in AppData\Local could be deleted
at any time, which would break login until the node is removed
from the Admin Panel allowing it to create a new key.
Carefully copy any AppData state to ProgramData at startup.
If copying the state fails, continue to use AppData so at
least there will be connectivity. If there is no state,
use ProgramData.
We also migrate the log.conf file. Very old versions of
Tailscale named the EXE tailscale-ipn, so the log conf was
tailscale-ipn.log.conf and more recent versions preserved
this filename and cmdName in logs. In this migration we
always update the filename to
c:\ProgramData\Tailscale\tailscaled.log.conf
Updates https://github.com/tailscale/tailscale/issues/2856
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
So if the control plane knows that something's broken about the node, it can
include problem(s) in MapResponse and "tailscale status" will show it.
(and GUIs in the future, as it's in ipnstate.Status/JSON)
This also bumps the MapRequest.Version, though it's not strictly
required. Doesn't hurt.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The fully qualified name of the type is thisPkg.tname,
so write the args like that too.
Suggested-by: Joe Tsai <joetsai@digital-static.net>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
And in the process, fix a bug:
The fmt formatting was being applied by writef,
not fmt.Sprintf, thus emitting a MISSING string.
And there's no guarantee that fmt will be imported
in the generated code.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Change from a single-case type switch to a type assertion
with an early return.
That exposes that the name arg to gen is unneeded.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This is a package for shared utilities used in doing codegen programs.
The inaugural API is for writing gofmt'd code to a file.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
LocalBackend.Shutdown's docs say:
> The backend can no longer be used after Shutdown returns.
Nevertheless, TestStateMachine blithely calls Shutdown, talks some smack,
and continues on, expecting things to work. Other uses of Shutdown
in the codebase are as intended.
Things mostly kinda work anyway, except that the wgengine.Engine has been
shut down, so calls to Reconfig fail. Those get logged:
> local.go:603: wgengine status error: engine closing; no status
but otherwise ignored.
However, the Reconfig failure caused one fewer call to pause/unpause
than normal. Now the assertCalls lines match the equivalent ones
earlier in the test.
I don't see an obvious correct replacement for Shutdown in the context
of this test; I'm not sure entirely what it is trying to accomplish.
It is possible that many of the tests remaining after the prior call
to Shutdown are now extraneous. They don't harm anything, though,
so err on the side of safety and leave them for now.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Use helpers and variadic functions to make the call sites
a lot easier to read, since they occur a lot.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Concurrent calls to LocalBackend.setWgengineStatus
could result in some of the status updates being dropped.
This was exacerbated by 92077ae78c,
which increases the probability of concurrent status updates,
causing test failures (tailscale/corp#2579).
It's going to take a bit of work to fix this test.
The ipnlocal state machine is difficult to reason about,
particularly in the face of concurrency.
We could fix the test trivially by throwing a new mutex around
setWgengineStatus to serialize calls to it,
but I'd like to at least try to do better than cosmetics.
In the meantime, commit the test.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We don't want to force ourselves to update the DERP list
every time we want to cut a new release.
Having an outdated DERP list on release branches is OK.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Spelling out the command to run for every type
means that changing the command makes for a large, repetitive diff.
Stop doing that.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
These "weird" port lines show up in logs frequently.
They're the result of uninteresting races,
and they're not actionable. Remove the noise.
Remove the isLoopbackAddr case to placate staticcheck.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
On about 1 out of 500 runs, TestSendFreeze failed:
derp_test.go:416: bob: unexpected message type derp.PeerGoneMessage
Closing alice before bob created a race.
If bob closed promptly, the test passed.
If bob closed slowly, and alice's disappearance caused
bob to receive a PeerGoneMessage before closing, the test failed.
Deflake the test by closing bob first.
With this fix, the test passed 12,000 times locally.
Fixes#2668
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Real goal is to eliminate some allocs in the STUN path, but that requires
work in the standard library.
See comments in #2783.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Avoid splitting fields in the common case. Field splitting was 84% of
the overall CPU.
name old time/op new time/op delta
ParsePorts-6 33.3ms ± 2% 6.3ms ± 4% -80.97% (p=0.000 n=9+10)
name old alloc/op new alloc/op delta
ParsePorts-6 520B ±79% 408B ± 0% -21.49% (p=0.046 n=10+8)
name old allocs/op new allocs/op delta
ParsePorts-6 7.00 ± 0% 7.00 ± 0% ~ (all equal)
Updates tailscale/corp#2566
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Notably, it no longer allocates proportional to the number of open
sockets on the machine. Any alloc reduction numbers are a little
contrived with such a reduction but e.g. on a machine with 50,000
connections open:
name old time/op new time/op delta
ParsePorts-6 57.7ms ± 6% 32.8ms ± 3% -43.04% (p=0.000 n=9+10)
name old alloc/op new alloc/op delta
ParsePorts-6 24.0MB ± 0% 0.0MB ± 0% -100.00% (p=0.000 n=10+9)
name old allocs/op new allocs/op delta
ParsePorts-6 100k ± 0% 0k ± 0% -99.99% (p=0.000 n=10+10)
Updates tailscale/corp#2566
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The earlier 382b349c54 was too late,
as engine creation itself needed to listen on things.
Fixes#2827
Updates #2822
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Updates #2781 (might even fix it, but its real issue is that
SetPrivateKey starts a ReSTUN goroutines which then logs, and
that bug and data race existed prior to MemLogger existing)
Add a mode control for derp server, and add a "manual" mode
to get derp server certificate. Under manual mode, certificate
is searched in the directory given by "--cert-dir". Certificate
should in PEM format, and use "hostname.{key,crt}" as filename.
If no hostname is used, search by the hostname given for listen.
Fixes#2794
Signed-off-by: SilverBut <SilverBut@users.noreply.github.com>
In prep for other bug fixes & tests. It's hard to test when it was
intermingled into LocalBackend.authReconfig.
Now it's a pure function.
And rename variable 'uc' (user config?) to the since idiomatic
'prefs'.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We currently plumb full URLs for DNS resolvers from the control server
down to the client. But when we pass the values into the net/dns
package, we throw away any URL that isn't a bare IP. This commit
continues the plumbing, and gets the URL all the way to the built in
forwarder. (It stops before plumbing URLs into the OS configurations
that can handle them.)
For #2596
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
And in the process, fix the related confusing error messages from
pinging your own IP or hostname.
Fixes#2803
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
AFAICT this was always present, the log read mid-execution was never safe.
But it seems like the recent magicsock refactoring made the race much
more likely.
Signed-off-by: David Anderson <danderson@tailscale.com>
Reported on IRC: in an edge case, you can end up with a directManager DNS
manager and --accept-dns=false, in which case we should do nothing, but
actually end up restarting resolved whenever the netmap changes, even though
the user told us to not manage DNS.
Signed-off-by: David Anderson <danderson@tailscale.com>
Reported on IRC: a resolv.conf that contained two entries for
"nameserver 127.0.0.53", which defeated our "is resolved actually
in charge" check. Relax that check to allow any number of nameservers,
as long as they're all 127.0.0.53.
Signed-off-by: David Anderson <danderson@tailscale.com>
* Revert "Revert "types/key: add MachinePrivate and MachinePublic.""
This reverts commit 61c3b98a24.
Signed-off-by: David Anderson <danderson@tailscale.com>
* types/key: add ControlPrivate, with custom serialization.
ControlPrivate is just a MachinePrivate that serializes differently
in JSON, to be compatible with how the Tailscale control plane
historically serialized its private key.
Signed-off-by: David Anderson <danderson@tailscale.com>
Plumb throughout the codebase as a replacement for the mixed use of
tailcfg.MachineKey and wgkey.Private/Public.
Signed-off-by: David Anderson <danderson@tailscale.com>
Our code is not vulnerable to the issue in question: it only happens in the decompression
path for untrusted inputs, and we only use xz as part of mkpkg, which is write-only
and operates on trusted build system outputs to construct deb and rpm packages.
Still, it's nice to keep the dependabot dashboard clean.
Signed-off-by: David Anderson <danderson@tailscale.com>
cmd/derper: listen on host of flag server addr for port 80 and 3478
When using custom derp on the server with multiple IP addresses,
we would like to bind derp 80, 443 and stun 3478 to a certain IP.
derp command provides flag `-a` to customize which address to bind
for port 443. But port :80 and :3478 were hard-coded.
Fixes#2767
Signed-off-by: Li Chuangbo <im@chuangbo.li>
I have seen this once in the VM test (caused by an EOF, I believe on
shutdown) that didn't need to cause the test to fail.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
And add health check errors to ipnstate.Status (tailscale status --json).
Updates #2746
Updates #2775
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It was useful early in development when disco clients were the
exception and tailscale logs were noisier than today, but now
non-disco is the exception.
Updates #2752
Signed-off-by: David Anderson <danderson@tailscale.com>
Having removed magicconn.Start, there's no need to synchronize startup
of other things to it any more.
Signed-off-by: David Anderson <danderson@tailscale.com>
Over time, other magicsock refactors have made Start effectively a
no-op, except that some other functions choose to panic if called
before Start.
Signed-off-by: David Anderson <danderson@tailscale.com>
The tests build fine on other Unix's, they just can't run there.
But there is already a t.Skip by default, so `go test` ends up
working fine elsewhere and checks the code compiles.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
At "Starting", the DERP connection isn't yet up. After the first netmap
and DERP connect, then it transitions into "Running".
Fixes#2708
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
So people can use the package for whois checks etc without version
skew errors.
The earlier change faa891c1f2 for #1905
was a bit too aggressive.
Fixes#2757
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This uses a neat little tool to dump the output of DNS queries to
standard out. This is the first end-to-end test of DNS that runs against
actual linux systems. The /etc/resolv.conf test may look superflous,
however this will help for correlating system state if one of the DNS
tests fails.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
A public key should only have max one connection to a given
DERP node (or really: one connection to a node in a region).
But if people clone their machine keys (e.g. clone their VM, Raspbery
Pi SD card, etc), then we can get into a situation where a public key
is connected multiple times.
Originally, the DERP server handled this by just kicking out a prior
connections whenever a new one came. But this led to reconnect fights
where 2+ nodes were in hard loops trying to reconnect and kicking out
their peer.
Then a909d37a59 tried to add rate
limiting to how often that dup-kicking can happen, but empirically it
just doesn't work and ~leaks a bunch of goroutines and TCP
connections, tying them up for hour+ while more and more accumulate
and waste memory. Mostly because we were doing a time.Sleep forever
while not reading from their TCP connections.
Instead, just accept multiple connections per public key but track
which is the most recent. And if two both are writing back & forth,
then optionally disable them both. That last part is only enabled in
tests for now. The current default policy is just last-sender-wins
while we gather the next round of stats.
Updates #2751
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Fix a few test printing issues when tests fail.
Qemu console output is super useful when something is wrong in the
harness and we cannot even bring up the tests.
Also useful for figuring out where all the time goes in tests.
A little noisy, but not too noisy as long as you're only running one VM
as part of the tests, which is my plan.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
Also remove extra distros for now.
We can bring them back later if useful.
Though our most important distros are these two Ubuntu, debian stable,
and Raspbian (not currently supported).
And before doing more Linux, we should do Windows.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
We were returning an error almost, but not quite like errConnClosed in
a single codepath, which could still trip the panic on reconfig in the
test logic.
Signed-off-by: David Anderson <danderson@tailscale.com>
Our prod code doesn't eagerly handshake, because our disco layer enables
on-demand handshaking. Configuring both peers to eagerly handshake leads
to WireGuard handshake races that make TestTwoDevicePing flaky.
Signed-off-by: David Anderson <danderson@tailscale.com>
It only existed to override one test-only behavior with a
different test-only behavior, in both cases working around
an annoying feature of our CI environments. Instead, handle
that weirdness entirely in the test code, with a tweaked
TestOnlyPacketListener that gets injected.
Signed-off-by: David Anderson <danderson@tailscale.com>
The docstring said it was meant for use in tests, but it's specifically a
special codepath that is _only_ used in tests, so make the claim stronger.
Signed-off-by: David Anderson <danderson@tailscale.com>
Instead of using the legacy codepath, teach discoEndpoint to handle
peers that have a home DERP, but no disco key. We can still communicate
with them, but only over DERP.
Signed-off-by: David Anderson <danderson@tailscale.com>
Unfortunately this test fails on certain architectures.
The problem comes down to inconsistencies in the Go escape analysis
where specific variables are marked as escaping on certain architectures.
The variables escaping to the heap are unfortunately in crypto/sha256,
which makes it impossible to fixthis locally in deephash.
For now, fix the test by compensating for the allocations that
occur from calling sha256.digest.Sum.
See golang/go#48055
Fixes#2727
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
This test is highly dependent on the accuracy of OS timers.
Reduce the number of failures by decreasing the required
accuracy from 0.999 to 0.995.
Also, switch from repeated time.Sleep to using a time.Ticker
for improved accuracy.
Updates #2727
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
The VM test has two tailscaled instances running and interleaves the
logs. Without a prefix it is impossible to figure out what is going on.
It might be even better to include the [ABCD] node prefix here as well.
Unfortunately lots of interesting logs happen before tailscaled has a
node key, so it wouldn't be a replacement for a short ID.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
By default httptest listens only on the loopback adapter.
Instead, listen on the IP the user asked for.
The VM test needs this, as it wants to start DERP and STUN
servers on the host that can be reached by guest VMs.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
* The right web address for configuring API keys seems to have changed
* Minor clarification on how basic authentication works (it's illustrated in the examples later, but can't hurt to be precise)
Signed-off-by: William Lachance <wlach@protonmail.com>
Currently we do not set the env variables for `go list ./...` resulting
in errors like
```
build constraints exclude all Go files in
/home/runner/work/tailscale/tailscale/chirp
```
Signed-off-by: Maisem Ali <maisem@tailscale.com>
It wasn't using the right metric. Apparently you're supposed to sum the route
metric and interface metric. Whoops.
While here, optimize a few little things too, not that this code
should be too hot.
Fixes#2707 (at least; probably dups but I'm failing to find)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
To be scraped in the Go expvar JSON format, as a string is involved.
For a future tool to record when processes restarted exactly, and at
what version.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If a peer is connected to multiple nodes in a region (so
multiForwarder is in use) and then a node restarts and re-sends all
its additions, this bug about whether an element is in the
multiForwarder could cause a one-time flip in the which peer node we
forward to. Note a huge deal, but not written as intended.
Thanks to @lewgun for the bug report in #2141.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This log is quite verbose, it was only to be left in for one
unstable build to help debug a user issue.
This reverts commit 1dd2552032.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
This is useful for manual performance testing
of networks with many nodes.
I imagine it'll grow more knobs over time.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Intended to help in resolving customer issue with
DNS caching.
We currently exec `ipconfig /flushdns` from two
places:
- SetDNS(), which logs before invoking
- here in router_windows, which doesn't
We'd like to see a positive indication in logs that flushdns
is being run.
As this log is expected to be spammy, it is proposed to
leave this in just long enough to do an unstable 1.13.x build
and then revert it. They won't run an unsigned image that
I build.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
The number of peers we have will be pretty stable across time.
Allocate roughly the right slice size.
This reduces memory usage when there are many peers.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Two optimizations.
Use values instead of pointers.
We were using pointers to make track the "peer in progress" easier.
It's not too hard to do it manually, though.
Make two passes through the data, so that we can size our
return value accurately from the beginning.
This is cheap enough compared to the allocation,
which grows linearly in the number of peers,
that it is worth doing.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The netmaps can get really large.
Printing, processing, and uploading them is expensive.
Only print the header on an ongoing basis.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The number of packet filters can grow very large,
so this log entry can be very large.
We can get the packet filter server-side,
so reduce verbosity here to just the number of filters present.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The code goes to some effort to send a single JSON object
when there's only a single line and a JSON array when there
are multiple lines.
It makes the code more complex and more expensive;
when we add a second line, we have to use a second buffer
to duplicate the first one after adding a leading square brackets.
The savings come to two bytes. Instead, always send an array.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Scanning log lines is a frequent source of allocations.
Pre-allocate a re-usable buffer.
This still doesn't help when there are giant log lines.
Those will still be problematic from an iOS memory perspective.
For more on that, see https://github.com/tailscale/corp/issues/2423.
(For those who cannot follow that link, it is a discussion
of particular problematic types of log lines for
particular categories of customers. The "categories of customers"
part is the reason that it is a private issue.)
There is also a latent bug here. If we ever encounter
a log line longer than bufio.MaxScanTokenSize,
then bufio.Scan will return an error,
and we'll truncate the file and discard the rest of the log.
That's not good, but bufio.MaxScanTokenSize is really big,
so it probably doesn't matter much in practice now.
Unfortunately, it does prevent us from easily capping the potential
memory usage here, on pain of losing log entries.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Prior to Go 1.16, iOS used GOOS=darwin,
so we had to distinguish macOS from iOS during GOARCH.
We now require Go 1.16 in our go.mod, so we can simplify.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Now that we have the easier-to-parse go:build build tags,
it is straightforward to simplify them. Yay.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Mostly so the Linux one can use Linux-specific stuff in package
syscall and not use os/exec for uname for portability.
But also it helps deps a tiny bit on iOS.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Not even close to usable or well integrated yet, but submitting this before
it bitrots or I lose it.
Updates #1235
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This logs some basic statistics for UPnP, so that tailscale can better understand what routers
are being used and how to connect to them.
Signed-off-by: julianknodt <julianknodt@gmail.com>
This adds a PCP test to the IGD test server, by hardcoding in a few observed packets from
Denton's box.
Signed-off-by: julianknodt <julianknodt@gmail.com>
We want to use tsweb to format Prometheus-style metrics from
our temporary golang.org/x/net/http2 fork, but we don't want http2
to depend on the tailscale.com module to use the concrete type
tailscale.com/metrics.LabelMap. Instead, let a expvar.Map be used
instead of it's annotated sufficiently in its name.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
rsc.io/goversion is really expensive.
Running version.ReadExe on tailscaled on darwin
allocates 47k objects, almost 11mb.
All we want is the module info. For that, all we need to do
is scan through the binary looking for the magic start/end strings
and then grab the bytes in between them.
We can do that easily and quickly with nothing but a 64k buffer.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
And use dynamic port numbers in tests, as Linux on GitHub Actions and
Windows in general have things running on these ports.
Co-Author: Julian Knodt <julianknodt@gmail.com>
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Previously, we hashed the question and combined it with the original
txid which was useful when concurrent queries were multiplexed on a
single local source port. We encountered some situations where the DNS
server canonicalizes the question in the response (uppercase converted
to lowercase in this case), which resulted in responses that we couldn't
match to the original request due to hash mismatches. This includes a
new test to cover that situation.
Fixes#2597
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
Before we didn't detect it properly. Since Oracle Linux is diet centos,
we can just make the centos logic detect Oracle linux and everything
should be fine.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
PCP handles external IPs by allowing the client to specify them in the packet, which is more
explicit than requiring 2 packets from PMP, so allow for future changes to add it in easily.
Signed-off-by: julianknodt <julianknodt@gmail.com>
Still very much a prototype (hard-coded IPs, etc) but should be
non-invasive enough to submit at this point and iterate from here.
Updates #2589
Co-Author: David Crawshaw <crawshaw@tailscale.com>
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Prior to Tailscale 1.12 it detected UPnP on any port.
Starting with Tailscale 1.11.x, it stopped detecting UPnP on all ports.
Then start plumbing its discovered Location header port number to the
code that was assuming port 5000.
Fixes#2109
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This was the proximate cause of #2579.
#2582 is a deeper fix, but this will remain
as a footgun, so may as well fix it too.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The index for every struct field or slice element and
the number of fields for the struct is unncessary.
The hashing of Go values is unambiguous because every type (except maps)
encodes in a parsable manner. So long as we know the type information,
we could theoretically decode every value (except for maps).
At a high level:
* numbers are encoded as fixed-width records according to precision.
* strings (and AppendTo output) are encoded with a fixed-width length,
followed by the contents of the buffer.
* slices are prefixed by a fixed-width length, followed by the encoding
of each value. So long as we know the type of each element, we could
theoretically decode each element.
* arrays are encoded just like slices, but elide the length
since it is determined from the Go type.
* maps are encoded first with a byte indicating whether it is a cycle.
If a cycle, it is followed by a fixed-width index for the pointer,
otherwise followed by the SHA-256 hash of its contents. The encoding of maps
is not decodeable, but a SHA-256 hash is sufficient to avoid ambiguities.
* interfaces are encoded first with a byte indicating whether it is nil.
If not nil, it is followed by a fixed-width index for the type,
and then the encoding for the underlying value. Having the type be encoded
first ensures that the value could theoretically be decoded next.
* pointers are encoded first with a byte indicating whether it is
1) nil, 2) a cycle, or 3) newly seen. If a cycle, it is followed by
a fixed-width index for the pointer. If newly seen, it is followed by
the encoding for the pointed-at value.
Removing unnecessary details speeds up hashing:
name old time/op new time/op delta
Hash-8 76.0µs ± 1% 55.8µs ± 2% -26.62% (p=0.000 n=10+10)
HashMapAcyclic-8 61.9µs ± 0% 62.0µs ± 0% ~ (p=0.666 n=9+9)
TailcfgNode-8 10.2µs ± 1% 7.5µs ± 1% -26.90% (p=0.000 n=10+9)
HashArray-8 1.07µs ± 1% 0.70µs ± 1% -34.67% (p=0.000 n=10+9)
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
For testing pfSense clients "behind" pfSense on Digital Ocean where
the main interface still exists. This is easier for debugging.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Instead of hashing the humanly formatted forms of a number,
hash the native machine bits of the integers themselves.
There is a small performance gain for this:
name old time/op new time/op delta
Hash-8 75.7µs ± 1% 76.0µs ± 2% ~ (p=0.315 n=10+9)
HashMapAcyclic-8 63.1µs ± 3% 61.3µs ± 1% -2.77% (p=0.000 n=10+10)
TailcfgNode-8 10.3µs ± 1% 10.2µs ± 1% -1.48% (p=0.000 n=10+10)
HashArray-8 1.07µs ± 1% 1.05µs ± 1% -1.79% (p=0.000 n=10+10)
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
The swapping of bufio.Writer between hasher and mapHasher is subtle.
Just embed a hasher in mapHasher to avoid complexity here.
No notable change in performance:
name old time/op new time/op delta
Hash-8 76.7µs ± 1% 77.0µs ± 1% ~ (p=0.182 n=9+10)
HashMapAcyclic-8 62.4µs ± 1% 62.5µs ± 1% ~ (p=0.315 n=10+9)
TailcfgNode-8 10.3µs ± 1% 10.3µs ± 1% -0.62% (p=0.004 n=10+9)
HashArray-8 1.07µs ± 1% 1.06µs ± 1% -0.98% (p=0.001 n=8+9)
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
This is a simplified rate limiter geared for exactly our needs:
A fast, mono.Time-based rate limiter for use in tstun.
It was generated by stripping down the x/time/rate rate limiter
to just our needs and switching it to use mono.Time.
It removes one time.Now call per packet.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
magicsock makes multiple calls to Now per packet.
Move to mono.Now. Changing some of the calls to
use package mono has a cascading effect,
causing non-per-packet call sites to also switch.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
There's a call to Now once per packet.
Move to mono.Now.
Though the current implementation provides high precision,
we document it to be coarse, to preserve the ability
to switch to a coarse monotonic time later.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Package mono provides a fast monotonic time.
Its primary advantage is that it is fast:
It is approximately twice as fast as time.Now.
This is because time.Now uses two clock calls,
one for wall time and one for monotonic time.
We ask for the current time 4-6 times per network packet.
At ~50ns per call to time.Now, that's enough to show
up in CPU profiles.
Package mono is a first step towards addressing that.
It is designed to be a near drop-in replacement for package time.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Go 1.17 switches to a register ABI on amd64 platforms.
Part of that switch is that go and defer calls use an argument-less
closure, which allocates. This means that we have an extra
alloc in some DNS work. That's unfortunate but not a showstopper,
and I don't see a clear path to fixing it.
The other performance benefits from the register ABI will all
but certainly outweigh this extra alloc.
Fixes#2545
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The kr/pty module moved to creack/pty per the kr/pty README[1].
creack/pty brings in support for a number of OS/arch combos that
are lacking in kr/pty.
Run `go mod tidy` while here.
[1] https://github.com/kr/pty/blob/master/README.md
Signed-off-by: Aaron Bieber <aaron@bolddaemon.com>
I don't know how to get access to a real packet. Basing this commit
entirely off:
+------------+--------------+------------------------------+
| Field Name | Field Type | Description |
+------------+--------------+------------------------------+
| NAME | domain name | MUST be 0 (root domain) |
| TYPE | u_int16_t | OPT (41) |
| CLASS | u_int16_t | requestor's UDP payload size |
| TTL | u_int32_t | extended RCODE and flags |
| RDLEN | u_int16_t | length of all RDATA |
| RDATA | octet stream | {attribute,value} pairs |
+------------+--------------+------------------------------+
From https://datatracker.ietf.org/doc/html/rfc6891#section-6.1.2
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
The handoff between tstun.Wrap's Read and poll methods
is one of the per-packet hotspots. It shows up in pprof.
Making outbound buffered increases throughput.
It is hard to measure exactly how much, because the numbers
are highly variable, but I'd estimate it at about 1%,
using the best observed max throughput across three runs.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The handoff between tstun.Wrap's Read and poll methods
is one of the per-packet hotspots. It shows up in pprof.
Making outbound buffered increases throughput.
It is hard to measure exactly how much, because the numbers
are highly variable, but I'd estimate it at about 1%,
using the best observed max throughput across three runs.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Tested manually with:
$ go test -v ./net/dnscache/ -dial-test=bogusplane.dev.tailscale.com:80
Where bogusplane has three A records, only one of which works.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
A previously added metric which was float64 was being ignored in tsweb, because it previously
only accepted int64 and ints. It can be handled in the same way as ints.
Signed-off-by: julianknodt <julianknodt@gmail.com>
Instead of blasting away at all upstream resolvers at the same time,
make a timing plan upon reconfiguration and have each upstream have an
associated start delay, depending on the overall forwarding config.
So now if you have two or four upstream Google or Cloudflare DNS
servers (e.g. two IPv4 and two IPv6), we now usually only send a
query, not four.
This is especially nice on iOS where we start fewer DoH queries and
thus fewer HTTP/1 requests (because we still disable HTTP/2 on iOS),
fewer sockets, fewer goroutines, and fewer associated HTTP buffers,
etc, saving overall memory burstiness.
Fixes#2436
Updates tailscale/corp#2250
Updates tailscale/corp#2238
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add a place to hang state in a future change for #2436.
For now this just simplifies the send signature without
any functional change.
Updates #2436
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The previous algorithm used a map of all visited pointers.
The strength of this approach is that it quickly prunes any nodes
that we have ever visited before. The detriment of the approach
is that pruning is heavily dependent on the order that pointers
were visited. This is especially relevant for hashing a map
where map entries are visited in a non-deterministic manner,
which would cause the map hash to be non-deterministic
(which defeats the point of a hash).
This new algorithm uses a stack of all visited pointers,
similar to how github.com/google/go-cmp performs cycle detection.
When we visit a pointer, we push it onto the stack, and when
we leave a pointer, we pop it from the stack.
Before visiting a pointer, we first check whether the pointer exists
anywhere in the stack. If yes, then we prune the node.
The detriment of this approach is that we may hash a node more often
than before since we do not prune as aggressively.
The set of visited pointers up until any node is only the
path of nodes up to that node and not any other pointers
that may have been visited elsewhere. This provides us
deterministic hashing regardless of visit order.
We can now delete hashMapFallback and associated complexity,
which only exists because the previous approach was non-deterministic
in the presence of cycles.
This fixes a failure of the old algorithm where obviously different
values are treated as equal because the pruning was too aggresive.
See https://github.com/tailscale/tailscale/issues/2443#issuecomment-883653534
The new algorithm is slightly slower since it prunes less aggresively:
name old time/op new time/op delta
Hash-8 66.1µs ± 1% 68.8µs ± 1% +4.09% (p=0.000 n=19+19)
HashMapAcyclic-8 63.0µs ± 1% 62.5µs ± 1% -0.76% (p=0.000 n=18+19)
TailcfgNode-8 9.79µs ± 2% 9.88µs ± 1% +0.95% (p=0.000 n=19+17)
HashArray-8 643ns ± 1% 653ns ± 1% +1.64% (p=0.000 n=19+19)
However, a slower but more correct algorithm seems
more favorable than a faster but incorrect algorithm.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
This prevents centos tests from timing out because sshd does reverse dns
lookups on every session being established instead of doing it once on
the acutal ssh connection being established. This is odd. Appending this
to the sshd config and restarting it seems to fix it though.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
TCP was done in 662fbd4a09.
This does the same for UDP.
Tested by hand. Integration tests will have to come later. I'd wanted
to do it in this commit, but the SOCKS5 server needed for interop
testing between two userspace nodes doesn't yet support UDP and I
didn't want to invent some whole new userspace packet injection
interface at this point, as SOCKS seems like a better route, but
that's its own bug.
Fixes#2302
RELNOTE=netstack mode can now UDP relay to subnets
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
A Go interface may hold any number of different concrete types.
Just because two underlying values hash to the same thing
does not mean the two values are identical if they have different
concrete types. As such, include the type in the hash.
Previously, this was incorrectly returning the internal port, and using that with the external
exposed IP when it did not use WANIPConnection2. In the case when we must provide a port, we
return it instead.
Noticed this while implementing the integration test for upnp.
Signed-off-by: julianknodt <julianknodt@gmail.com>
Seed the hash upon first use with the current time.
This ensures that the stability of the hash is bounded within
the lifetime of one program execution.
Hopefully, this prevents future bugs where someone assumes that
this hash is stable.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Filch doesn't like having multiple processes competing
for the same log files (#937).
Parallel integration tests were all using the same log files.
Add a TS_LOGS_DIR env var that the integration test can use
to use separate log files per test.
Fixes#2269
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Prep for #1591 which will need to make Linux's router react to changes
that the link monitor observes.
The router package already depended on the monitor package
transitively. Now it's explicit.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Instead of logging lsof execution failures to stdout,
incorporate them into the returned error.
While we're here, make it clear that the file
success case always returns a nil error.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The maximum unix domain socket path length on darwin is 104 bytes,
including the trailing NUL.
On my machine, the path created by some newly added tests (6eecf3c9)
was too long, resulting in cryptic test failures.
Shorten the names of the tests, and add a check to make
the diagnosis easier next time.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The fact that Hash returns a [sha256.Size]byte leaks details about
the underlying hash implementation. This could very well be any other
hashing algorithm with a possible different block size.
Abstract this implementation detail away by declaring an opaque type
that is comparable. While we are changing the signature of UpdateHash,
rename it to just Update to reduce stutter (e.g., deephash.Update).
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
It was a huge chunk of the overall log output and made debugging
difficult. Omit and summarize the spammy *.arpa parts instead.
Fixestailscale/corp#2066 (to which nobody had opinions, so)
With this, I can now:
* install Tailscale
* stop the GUI
* net stop Tailscale
* net start Tailscale
* tailscale up --unattended
(where the middle three steps simulate what would happen on a Windows
Server Core machine without a GUI)
Fixes#2137
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
To unify the Windows service and non-service/non-Windows paths a bit.
And provides a way to make Linux act like Windows for testing.
(notably, for testing the fix to #2137)
One perhaps visible change of this is that tailscaled.exe when run in
cmd.exe/powershell (not as a Windows Service) no longer uses the
"_daemon" autostart key. But in addition to being naturally what falls
out of this change, that's also what Windows users would likely want,
as otherwise the unattended mode user is ignored when the "_daemon"
autostart key is specified. Notably, this would let people debug what
their normally-run-as-a-service tailscaled is doing, even when they're
running in Unattended Mode.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Adds TS_DEBUG_UP_FLAG_GOOS for integration tests to make "tailscale
up" act like other OSes.
For an upcoming change to test #2137.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We have different deps depending on the platform.
If we pick a privileged platform, we'll miss some deps.
If we use the union of all platforms, the integration test
won't compile on some platforms, because it'll import
packages that don't compile on that platform.
Give in to the madness and give each platform its own deps file.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The earlier 2ba36c294b started listening
for ip rule changes and only cared about DELRULE events, buts its subscription
included all rule events, including new ones, which meant we were then
catching our own ip rule creations and logging about how they were unknown.
Stop that log spam.
Updates #1591
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For debugging & working on #1591 where certain versions of systemd-networkd
delete Tailscale's ip rule entries.
Updates #1591
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This moves the distribution definitions into a maintainable hujson file
instead of just existing as constants in `distros.go`. Comments are
maintained from the inline definitions.
This uses jennifer[1] for hygenic source tree creation. This allows us
to generate a unique top-level test for each VM run. This should
hopefully help make the output of `go test` easier to read.
This also separates each test out into its own top-level test so that we
can better track the time that each distro takes. I really wish there
was a way to have the `test_codegen.go` file _always_ run as a part of
the compile process instead of having to rely on people remembering to
run `go generate`, but I am limited by my tools.
This will let us remove the `-distro-regex` flag and use `go test -run`
to pick which distros are run.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
Add in UPnP portmapping, using goupnp library in order to get the UPnP client and run the
portmapping functions. This rips out anywhere where UPnP used to be in portmapping, and has a
flow separate from PMP and PCP.
RELNOTE=portmapper now supports UPnP mappings
Fixes#682
Updates #2109
Signed-off-by: julianknodt <julianknodt@gmail.com>
Recognize Cloudflare, Google, Quad9 which are by far the
majority of upstream DNS servers that people use.
RELNOTE=MagicDNS now uses DNS-over-HTTPS when querying popular upstream resolvers,
so DNS queries aren't sent in the clear over the Internet.
Updates #915 (might fix it?)
Updates #988 (gets us closer, if it fixes Android)
Updates #74 (not yet configurable, but progress)
Updates #2056 (not yet configurable, dup of #74?)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Added the net/speedtest package that contains code for starting up a
speedtest server and a client. The speedtest command for starting a
client takes in a duration for the speedtest as well as the host and
port of the speedtest server to connect to. The speedtest command for
starting a server takes in a host:port pair to listen on.
Signed-off-by: Aaditya Chaudhary <32117362+AadityaChaudhary@users.noreply.github.com>
Apparently this test was flaking because I critically misunderstood how
the kernel buffers UDP packets for senders. I'm trying to send more UDP
packets and will see if that helps.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
This test used to try to run this only once, but this variant of the
test attempts to run `tailscale status` up to 6 times in a loop with
exponential backoff.
This fixes the flakiness found in previous instances of this test.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
This adds some convenient defaults for -c, so that user-provided DERPs require less command line
flags.
Signed-off-by: julianknodt <julianknodt@gmail.com>
With netns handling localhost now, existing tests no longer
need special handling. The tests set up their connections to
localhost, and the connections work without fuss.
Remove the special handling for tests.
Also remove the hostinfo.TestCase support, since this was
the only use of it. It can be added back later if really
needed, but it would be better to try to make tests work
without special cases.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
netns_linux checked whether "ip rule" could run to determine
whether to use SO_MARK for network namespacing. However in
Linux environments which lack CAP_NET_ADMIN, such as various
container runtimes, the "ip rule" command succeeds but SO_MARK
fails due to lack of permission. SO_BINDTODEVICE would work in
these environments, but isn't tried.
In addition to running "ip rule" check directly whether SO_MARK
works or not. Among others, this allows Microsoft Azure App
Service and AWS App Runner to work.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Connections to a control server or log server on localhost,
used in a number of tests, are working right now because the
calls to SO_MARK in netns fail for non-root but then we ignore
the failure when running in tests.
Unfortunately that failure in SO_MARK also affects container
environments without CAP_NET_ADMIN, breaking Tailscale
connectivity. We're about to fix netns to recognize when SO_MARK
doesn't work and use SO_BINDTODEVICE instead. Doing so makes
tests fail, as their sockets now BINDTODEVICE of the default
route and cannot connect to localhost.
Add support to skip namespacing for localhost connections,
which Darwin and Windows already do. This is not conditional
on running within a test, if you tell tailscaled to connect
to localhost it will automatically use a non-namespaced
socket to do so.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Several other AWS services like App Run and Lightsail Containers
appear to be layers atop Fargate, to the point that we cannot easily
tell them apart from within the container. Contacting the metadata
service would distinguish them, but doing that from inside tailscaled
seems uncalled for. Just report them as Fargate.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Treat automated tests as their own, unique environment
rather than the type of container they are running in.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
The localapi was double-unescaping: once by net/http populating
the URL, and once by ourselves later. We need to start with the raw
escaped URL if we're doing it ourselves.
Started to write a test but it got invasive. Will have to add those
tests later in a commit that's not being cherry-picked to a release
branch.
Fixes#2288
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
logBufWriter had no serialization.
It just so happens that none of its users currently ever log concurrently.
Make it safe for concurrent use.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Regression from 6d10655dc3, which added
UpdatePrefs but didn't write it out to disk.
I'd planned on adding tests to state_test.go which is why I'd earlier
added 46896a9311 to prepare for making
such persistence tests easier to write, but turns out state_test.go
didn't even test UpdatePrefs, so I'm staying out of there.
Instead, this is tested using integration tests.
Fixes#2321
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
As Brad suggested, mem.RO allows for a lot of easy perf gains. There were also some smaller
changes outside of mem.RO, such as using hex.Decode instead of hex.DecodeString.
```
name old time/op new time/op delta
FromUAPI-8 14.7µs ± 3% 12.3µs ± 4% -16.58% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
FromUAPI-8 9.52kB ± 0% 7.04kB ± 0% -26.05% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
FromUAPI-8 77.0 ± 0% 29.0 ± 0% -62.34% (p=0.008 n=5+5)
```
Signed-off-by: julianknodt <julianknodt@gmail.com>
Adds a benchmark for FromUAPI in wgcfg.
It appears that it's not actually that slow, the main allocations are from the scanner and new
config.
Updates #1912.
Signed-off-by: julianknodt <julianknodt@gmail.com>
My spatial memory functions poorly with large files and the vms_test.go
file recently surpassed the point where it functions adequately. This
patch splits up vms_test.go into more files to make my spatial memory
function like I need it to.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
It was once believed that it might be useful. It wasn't. We never used it.
Remove it so we don't slowly leak memory.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In theory, some of the other table-driven tests could be moved into this
form now but I didn't want to disturb too much good test code.
Includes a commented-out test for #2384 that is currently failing.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
The DERPTestPort int meant two things before: which port to use, and
whether to disable TLS verification. Users would like to set the port
without disabling TLS, so break it into two options.
Updates #1264
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
To avoid the generated nixos disk images from becoming immune from the
GC, I delete the symlink to the nix store at the end of tests.
`t.Cleanup` runs at the end of a test. I changed this part of the code
to have a separate timer for how long it takes to run NixOS builds, but
I did that by using a subtest. This means that it was creating the NixOS
image, deleting its symlink and then trying to use that symlink to find
the resulting disk image, making the whole thing ineffectual.
This was a mistake. I am reverting this change made in
https://github.com/tailscale/tailscale/pull/2360 to remove this layer of
subtesting.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
This allows the test to be run inside a mounted filesystem,
which I'm doing now as a I develop on a linux VM.
Fixes#2367.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This tests incoming and outgoing UDP traffic. It would test incoming UDP
traffic however our socks server doesn't seem to allow for connecting to
destinations over UDP. When the socks server gets that support the
incoming test should pass without issue.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
This adapts the existing in-process logcatcher from tstest/integration
into a public type and uses it on the side of testcontrol. This also
fixes a bug in the Alpine Linux OpenRC unit that makes every value in
`/etc/default/tailscaled` exported into tailscaled's environment, a-la
systemd [Service].EnviromentFile.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
This does a few things:
1. Rewrites the tests so that we get a log of what individual tests
failed at the end of a test run.
2. Adds a test that runs an HTTP server via the tester tailscale node and
then has the VMs connect to that over Tailscale.
3. Dials the VM over Tailscale and ensures it answers SSH requests.
4. Other minor framework refactoring.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
Oracle Linux[1] is a CentOS fork. It is not very special. I am adding it
to the integration jungle because I am adding it to pkgs and the website
directions.
[1]: https://www.oracle.com/linux/
Signed-off-by: Christine Dodrill <xe@tailscale.com>
To remove some multi-case selects, we intentionally allowed
sends on closed channels (cc23049cd2).
However, we also introduced concurrent sends and closes,
which is a data race.
This commit fixes the data race. The mutexes here are uncontended,
and thus very cheap.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
It was caching too aggressively, as it didn't see our deps due to our
running "go install tailscaled" as a child process.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
version.sh was removed in commit 5088af68. Use `build_dist.sh shellvars`
to provide version information instead.
Signed-off-by: Irshad Pananilath <pmirshad+code@gmail.com>
This makes sure `tailscale status` and `tailscale ping` works. It also
switches goexpect to use a batch instead of manually banging out each
line, which makes the tests so much easier to read.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
Running hex.Encode(b, b) is a bad idea.
The first byte of input will overwrite the first two bytes of output.
Subsequent bytes have no impact on the output.
Not related to today's IPv6 bug, but...wh::ps.
This caused us to spuriously ignore some wireguard config updates.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Calculate whether the packet is injected directly,
rather than via an else branch.
Unify the exit paths. It is easier here than duplicating them.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Every TUN Read went through several multi-case selects.
We know from past experience with wireguard-go that these are slow
and cause scheduler churn.
The selects served two purposes: they separated errors from data and
gracefully handled shutdown. The first is fairly easy to replace by sending
errors and data over a single channel. The second, less so.
We considered a few approaches: Intricate webs of channels,
global condition variables. They all get ugly fast.
Instead, let's embrace the ugly and handle shutdown ungracefully.
It's horrible, but the horror is simple and localized.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The implementation of the preview function has changed since the
API was documented, update the document to match.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
We can't access b.netMap without holding b.mu.
We already grabbed it earlier in the function with the lock held.
Introduced in Nov 2020 in 7ea809897d.
Discovered during stress testing.
Apparently it's a pretty rare?
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
For instance, ephemeral nodes with only IPv6 addresses can now
SOCKS5-dial out to names like "foo" and resolve foo's IPv6 address
rather than foo's IPv4 address and get a "no route"
(*tcpip.ErrNoRoute) error from netstack's dialer.
Per https://github.com/tailscale/tailscale/issues/2268#issuecomment-870027626
which is only part of the isuse.
Updates #2268
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We also have to make a one-off change to /etc/wsl.conf to stop every
invocation of wsl.exe clobbering the /etc/resolv.conf. This appears to
be a safe change to make permanently, as even though the resolv.conf is
constantly clobbered, it is always the same stable internal IP that is
set as a nameserver. (I believe the resolv.conf clobbering predates the
MS stub resolver.)
Tested on WSL2, should work for WSL1 too.
Fixes#775
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
This is preliminary work for using the directManager as
part of a wslManager on windows, where in addition to configuring
windows we'll use wsl.exe to edit the linux file system and modify the
system resolv.conf.
The pinholeFS is a little funky, but it's designed to work through
simple unix tools via wsl.exe without invoking bash. I would not have
thought it would stand on its own like this, but it turns out it's
useful for writing a test for the directManager.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
After allowing for custom DERP maps, it's convenient to be able to see their latency in
netcheck. This adds a query to the local tailscaled for the current DERPMap.
Updates #1264
Signed-off-by: julianknodt <julianknodt@gmail.com>
Turns out we never reliably log the control plane URL a client connects
to. Do it here, and include the server public key, which might
inadvertently tell us something interesting some day.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
This is an experiment to see how often this test would fail if we run it
on every commit. This depends on #2145 to fix a flaky part of the test.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
Okay, so, at a high level testing NixOS is a lot different than
other distros due to NixOS' determinism. Normally NixOS wants packages to
be defined in either an overlay, a custom packageOverrides or even
yolo-inline as a part of the system configuration. This is going to have
us take a different approach compared to other distributions. The overall
plan here is as following:
1. make the binaries as normal
2. template in their paths as raw strings to the nixos system module
3. run `nixos-generators -f qcow -o $CACHE_DIR/tailscale/nixos/version -c generated-config.nix`
4. pass that to the steps that make the virtual machine
It doesn't really make sense for us to use a premade virtual machine image
for this as that will make it harder to deterministically create the image.
Nix commands generate a lot of output, so their output is hidden behind the
`-verbose-nix-output` flag.
This unfortunately makes this test suite have a hard dependency on
Nix/NixOS, however the test suite has only ever been run on NixOS (and I
am not sure if it runs on other distros at all), so this probably isn't too
big of an issue.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
This has been bothering me for a while, but everytime I run format from the root directory
it also formats this file. I didn't want to add it to my other PRs but it's annoying to have to
revert it every time.
Signed-off-by: julianknodt <julianknodt@gmail.com>
Move derpmap.Prod to a static JSON file (go:generate'd) instead,
to make its role explicit. And add a TODO about making dnsfallback
use an update-over-time DERP map file instead of a baked-in one.
Updates #1264
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Previously this test would reach out to the public DERP servers in order
to help machines connect with eachother. This is not ideal given our
plans to run these tests completely disconnected from the internet. This
patch introduces an in-process DERP server running on its own randomly
assigned HTTP port.
Updates #1988
Signed-off-by: Christine Dodrill <xe@tailscale.com>
Occasionally the test framework would fail with a timeout due to a
virtual machine not phoning home in time. This seems to be happen
whenever qemu can't bind the VNC or SSH ports for a virtual machine.
This was fixed by taking the following actions:
1. Don't listen on VNC unless the `-use-vnc` flag is passed, this
removes the need to listen on VNC at all in most cases. The option to
use VNC is still left in for debugging virtual machines, but removing
this makes it easier to deal with (VNC uses this odd system of
"displays" that are mapped to ports above 5900, and qemu doesn't
offer a decent way to use a normal port number, so we just disable
VNC by default as a compromise).
2. Use a (hopefully) inactive port for SSH. In an ideal world I'd just
have the VM's SSH port be exposed via a Unix socket, however the QEMU
documentation doesn't really say if you can do this or not. While I
do more research, this stopgap will have to make do.
3. Strictly tie more VM resource lifetimes to the tests themselves.
Previously the disk image layers for virtual machines were only
cleaned up at the end of the test and existed in the parent
test-scoped temporary folder. This can make your tmpfs run out of
space, which is not ideal. This should minimize the use of temporary
storage as much as I know how to.
4. Strictly tie the qemu process lifetime to the lifetime of the test
using testing.T#Cleanup. Previously it used a defer statement to
clean up the qemu process, however if the tests timed out this defer
was not run. This left around an orphaned qemu process that had to be
killed manually. This change ensures that all qemu processes exit
when their relevant tests finish.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
Fix regression from 19c3e6cc9e
which made the locking coarser.
Found while debugging #2245, which ended up looking like a tswin/Windows
issue where Crawshaw had blocked cmd.exe's output.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This uses a debug envvar to optionally disable filter logging rate
limits by setting the environment variable
TS_DEBUG_FILTER_RATE_LIMIT_LOGS to "all", and if it matches,
the code will effectively disable the limits on the log rate by
setting the limit to 1 millisecond. This should make sure that all
filter logs will be captured.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
This change (subject to some limitations) looks for the EDNS OPT record
in queries and responses, clamping the size field to fit within our DNS
receive buffer. If the size field is smaller than the DNS receive buffer
then it is left unchanged.
I think we will eventually need to transition to fully processing the
DNS queries to handle all situations, but this should cover the most
common case.
Mostly fixes#2066
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
This adds a flag to the DERP server which specifies to verify clients through a local
tailscaled. It is opt-in, so should not affect existing clients, and is mainly intended for
users who want to run their own DERP servers. It assumes there is a local tailscaled running and
will attempt to hit it for peer status information.
Updates #1264
Signed-off-by: julianknodt <julianknodt@gmail.com>
Unused so far, but eventually we'll want this for SOCKS5 UDP binds (we
currently only do TCP with SOCKS5), and also for #2102 for forwarding
MagicDNS upstream to Tailscale IPs over netstack.
Updates #2102
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Windows 8.1 incorrectly handles search paths on an interface with no
associated resolver, so we have to provide a full primary DNS config
rather than use Windows 8.1's nascent-but-present NRPT functionality.
Fixes#2237.
Signed-off-by: David Anderson <danderson@tailscale.com>
This adds a flag to derp maps which specifies that default Tailscale DERP servers should not be
used. If true and there are entries in this map, it indicates that the entries in this map
should take precedent and not hit any of tailscale's DERP servers.
This change is backwards compatible, as the default behavior should be false.
Updates #1264
Signed-off-by: julianknodt <julianknodt@gmail.com>
In order to clone DERPMaps, it was necessary to extend the cloner so that it supports
nested pointers inside of maps which are also cloneable. This also adds cloning for DERPRegions
and DERPNodes because they are on DERPMap's maps.
Signed-off-by: julianknodt <julianknodt@gmail.com>
The only connectivity an AWS Lambda container has is an IPv4 link-local
169.254.x.x address using NAT:
12: vtarget_1@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc noqueue state UP group default qlen 1000
link/ether 7e:1c:3f:00:00:00 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet 169.254.79.1/32 scope global vtarget_1
valid_lft forever preferred_lft forever
If there are no other IPv4/v6 addresses available, and we are running
in AWS Lambda, allow IPv4 169.254.x.x addresses to be used.
----
Similarly, a Google Cloud Run container's only connectivity is
a Unique Local Address fddf:3978:feb1:d745::c001/128.
If there are no other addresses available then allow IPv6
Unique Local Addresses to be used.
We actually did this in an earlier release, but now refactor it to
work the same way as the IPv4 link-local support is being done.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Before it was using the local address and port, so fix that.
The fields in the response from `ss` are:
State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port, Process
Signed-off-by: julianknodt <julianknodt@gmail.com>
This adds a handler on the DERP server for logging bytes send and received by clients of the
server, by holding open a connection and recording if there is a difference between the number
of bytes sent and received. It sends a JSON marshalled object if there is an increase in the
number of bytes.
Signed-off-by: julianknodt <julianknodt@gmail.com>
Split out of Denton's #2164, to make that diff smaller to review.
This change has no behavior changes.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Previously we used t.Logf indirectly via package log. This worked, but
it was not ideal for our needs. It could cause the streams of output to
get crossed. This change uses a logger.FuncWriter every place log.Output
was previously used, which will more correctly write log information to
the right test output stream.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
It's possible to install a configuration that passes our current checks
for systemd-resolved, without actually pointing to systemd-resolved. In
that case, we end up programming DNS in resolved, but that config never
applies to any name resolution requests on the system.
This is quite a far-out edge case, but there's a simple additional check
we can do: if the header comment names systemd-resolved, there should be
a single nameserver in resolv.conf pointing to 127.0.0.53. If not, the
configuration should be treated as an unmanaged resolv.conf.
Fixes#2136.
Signed-off-by: David Anderson <danderson@tailscale.com>
The dependency is a "soft" ordering dependency only, meaning that
tailscaled will start after those services if those services were
going to be run anyway, but doesn't force either of them to run.
That's why it's safe to specify this dependency unconditionally,
even for systems that don't run those services.
Updates #2127.
Signed-off-by: David Anderson <danderson@tailscale.com>
AWS Lambda uses Docker containers but does not
have the string "docker" in its /proc/1/cgroup.
Infer AWS Lambda via the environment variables
it sets.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
It would be useful to know the time that packets spend inside of a queue before they are sent
off, as that can be indicative of the load the server is handling (and there was also an
existing TODO). This adds a simple exponential moving average metric to track the average packet
queue duration.
Changes during review:
Add CAS loop for recording queue timing w/ expvar.Func, rm snake_case, annotate in milliseconds,
convert
Signed-off-by: julianknodt <julianknodt@gmail.com>
Alpine Linux[1] is a minimal Linux distribution built around musl libc.
It boots very quickly, requires very little ram and is as close as you
can get to an ideal citizen for testing Tailscale on musl. Alpine has a
Tailscale package already[2], but this patch also makes it easier for us
to provide an Alpine Linux package off of pkgs in the future.
Alpine only offers Tailscale on the rolling-release edge branch.
[1]: https://alpinelinux.org/
[2]: https://pkgs.alpinelinux.org/packages?name=tailscale&branch=edge
Updates #1988
Signed-off-by: Christine Dodrill <xe@tailscale.com>
This fails pretty reliably with a lot of output now showing what's
happening:
TS_DEBUG_MAP=1 go test --failfast -v -run=Ping -race -count=20 ./tstest/integration --verbose-tailscaled
I haven't dug into the details yet, though.
Updates #2079
The route creation for the `tun` device was augmented in #1469 but
didn't account for adding IPv4 vs. IPv6 routes. There are 2 primary
changes as a result:
* Ensure that either `-inet` or `-inet6` was used in the
[`route(8)`](https://man.openbsd.org/route) command
* Use either the `localAddr4` or `localAddr6` for the gateway argument
depending which destination network is being added
The basis for the approach is based on the implementation from
`router_userspace_bsd.go`, including the `inet()` helper function.
Fixes#2048
References #1469
Signed-off-by: Fletcher Nichol <fnichol@nichol.ca>
This raises the maximum DNS response message size from 512 to 4095. This
should be large enough for almost all situations that do not need TCP.
We still do not recognize EDNS, so we will still forward requests that
claim support for a larger response size than 4095 (that will be solved
later). For now, when a response comes back that is too large to fit in
our receive buffer, we now set the truncation flag in the DNS header,
which is an improvement from before but will prompt attempts to use TCP
which isn't supported yet.
On Windows, WSARecvFrom into a buffer that's too small returns an error
in addition to the data. On other OSes, the extra data is silently
discarded. In this case, we prefer the latter so need to catch the error
on Windows.
Partially addresses #1123
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
This runner is in my homelab while we muse about a better, more
permanent home for these tests.
Updates #1988
Signed-off-by: Christine Dodrill <xe@tailscale.com>
This makes integration tests pull pristine VM images from Amazon S3 if
they don't exist on disk. If the S3 fetch fails, it will fall back to
grabbing the image from the public internet. The VM images on the public
internet are known to be updated without warning and thusly change their
SHA256 checksum. This is not ideal for a test that we want to be able to
fire and forget, then run reliably for a very long time.
This requires an AWS profile to be configured at the default path. The
S3 bucket is rigged so that the requester pays. The VM images are
currently about 6.9 gigabytes. Please keep this in mind when running
these tests on your machine.
Documentation was added to the integration test folder to aid others in
running these tests on their machine.
Some wording in the logs of the tests was altered.
Updates #1988
Signed-off-by: Christine Dodrill <xe@tailscale.com>
Some downstream distros eval'd version/version.sh to get at the shell variables
within their own build process. They can now `./build_dist.sh shellvars` to get
those.
Fixes#2058.
Signed-off-by: David Anderson <dave@natulte.net>
It is a bit faster.
But more importantly, it matches upstream byte-for-byte,
which ensures there'll be no corner cases in which we disagree.
name old time/op new time/op delta
SetPeers-8 3.58µs ± 0% 3.16µs ± 2% -11.74% (p=0.016 n=4+5)
name old alloc/op new alloc/op delta
SetPeers-8 2.53kB ± 0% 2.53kB ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
SetPeers-8 99.0 ± 0% 99.0 ± 0% ~ (all equal)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The image downloads can take a significant amount of time for the tests.
This creates a new test that will download every distro image into the
local cache in parallel, optionally matching the distribution regex.
Updates #1988
Signed-off-by: Christine Dodrill <xe@tailscale.com>
I've run into a couple issues where the tests time out while a VM image
is being downloaded, making the cache poisoned for the next run. This
moves the hash checking into its own function and calls it much sooner
in the testing chain. If the hash check fails, the OS is redownloaded.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
Most of the time qemu will output nothing when it is running. This is
expected behavior. However when qemu is unable to start due to some
problem, it prints that to either stdout or stderr. Previously this
output wasn't being captured. This patch captures that output to aid in
debugging qemu issues.
Updates #1988
Signed-off-by: Christine Dodrill <xe@tailscale.com>
We were crashing on in initPeerAPIListener when called from
authReconfig when b.netMap is nil. But authReconfig already returns
before the call to initPeerAPIListener when b.netMap is nil, but it
releases the b.mu mutex before calling initPeerAPIListener which
reacquires it and assumes it's still nil.
The only thing that can be setting it to nil is setNetMapLocked, which
is called by ResetForClientDisconnect, Logout/logout, or Start, all of
which can happen during an authReconfig.
So be more defensive.
Fixes#1996
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We used to use "redo" for that, but it was pretty vague.
Also, fix the build tags broken in interfaces_default_route_test.go from
a9745a0b68, moving those Linux-specific
tests to interfaces_linux_test.go.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Previously this built the binaries for every distro. This is a bit
overkill given we are using static binaries. This patch makes us only
build once.
There was also a weird issue with how processes were being managed.
Previously we just killed qemu with Process.Kill(), however that was
leaving behind zombies. This has been mended to not only kill qemu but
also waitpid() the process so it doesn't become a zombie.
Updates #1988
Signed-off-by: Christine Dodrill <xe@tailscale.com>
The OpenSUSE 15.1 image we are using (and conseqentially the only one
that is really available easily given it is EOL) has cloud-init
hardcoded to use the OpenStack metadata thingy. Other OpenSUSE Leap
images function fine with the NoCloud backend, but this one seems to
just not work with it. No bother, we can just pretend to be OpenStack.
Thanks to Okami for giving me an example OpenStack configuration seed
image.
Updates #1988
Signed-off-by: Christine Dodrill <xe@tailscale.com>
Arch is a bit of a weirder distro, however as a side effect it is much
more of a systemd purist experience. Adding it to our test suite will
make sure that we are working in the systemd happy path.
Updates #1988
Signed-off-by: Christine Dodrill <xe@tailscale.com>
This distro is about to be released. OpenSUSE has historically had the
least coverage for functional testing, so this may prove useful in the
future.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
DNS names consist of labels, but outside of length limits, DNS
itself permits any content within the labels. Some records require
labels to conform to hostname limitations (which is what we implemented
before), but not all.
Fixes#2024.
Signed-off-by: David Anderson <danderson@tailscale.com>
Instead of testing all the VMs at once when they are all ready, this
patch changes the testing logic so that the vms are tested as soon as
they register with testcontrol. Also limit the amount of VM ram used at
once with the `-ram-limit` flag. That uses a semaphore to guard resource
use.
Also document CentOS' sins.
Updates #1988
Signed-off-by: Christine Dodrill <xe@tailscale.com>
The resulting empty Prefs had AllowSingleHosts=false and
Routeall=false, so that on iOS if you did these steps:
- Login and leave running
- Terminate the frontend
- Restart the frontend (fast path restart, missing prefs)
- Set WantRunning=false
- Set WantRunning=true
...then you would have Tailscale running, but with no routes. You would
also accidentally disable the ExitNodeID/IP prefs (symptom: the current
exit node setting didn't appear in the UI), but since nothing
else worked either, you probably didn't notice.
The fix was easy enough. It turns out we already knew about the
problem, so this also fixes one of the BUG entries in state_test.
Fixes: #1918 (BUG-1) and some as-yet-unreported bugs with exit nodes.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
Previously, there was no server round trip required to log out, so when
you asked ipnlocal to Logout(), it could clear the netmap immediately
and switch to NeedsLogin state.
In v1.8, we added a true Logout operation. ipn.Logout() would trigger
an async cc.StartLogout() and *also* immediately switch to NeedsLogin.
Unfortunately, some frontends would see NeedsLogin and immediately
trigger a new StartInteractiveLogin() operation, before the
controlclient auth state machine actually acted on the Logout command,
thus accidentally invalidating the entire logout operation, retaining
the netmap, and violating the user's expectations.
Instead, add a new LogoutFinished signal from controlclient
(paralleling LoginFinished) and, upon starting a logout, don't update
the ipn state machine until it's received.
Updates: #1918 (BUG-2)
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
If you set `-distro-regex` to match a subset of distros, only those
distros will be tested. Ex:
$ go test -run-vm-tests -distro-regex='opensuse'
Signed-off-by: Christine Dodrill <xe@tailscale.com>
Don't try to do heuristics on the name. Use the net/interfaces package
which we already have to do this sort of stuff.
Fixes#2011
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Instead of pulling packages from pkgs.tailscale.com, we should use the
tailscale binaries that are local to this git commit. This exposes a bit
of the integration testing stack in order to copy the binaries
correctly.
This commit also bumps our version of github.com/pkg/sftp to the latest
commit.
If you run into trouble with yaml, be sure to check out the
commented-out alpine linux image complete with instructions on how to
use it.
Updates #1988
Signed-off-by: Christine Dodrill <xe@tailscale.com>
netaddr allocated at the time this was written. No longer.
name old time/op new time/op delta
TailscaleServiceAddr-4 5.46ns ± 4% 1.83ns ± 3% -66.52% (p=0.008 n=5+5)
A bunch of the others can probably be simplified too, but this
was the only one with just an IP and not an IPPrefix.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Previously we spewed a lot of output to stdout and stderr, even when
`-v` wasn't set. This is sub-optimal for various reasons. This patch
shunts that output to test logs so it only shows up when `-v` is set.
Updates #1988
Signed-off-by: Christine Dodrill <xe@tailscale.com>
The cyolosecurity fork of certstore did not update its module name and
thus can only be used with a replace directive. This interferes with
installing using `go install` so I created a tailscale fork with an
updated module name.
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
Instead of relying on a libvirtd bridge address that you probably won't
have on your system.
Updates #1988
Signed-off-by: Christine Dodrill <xe@tailscale.com>
On clean installs we didn't set use iptables, but during upgrades it
looks like we could use old prefs that directed us to go into the iptables
paths that might fail on Synology.
Updates #1995Fixestailscale/tailscale-synology#57 (I think)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This will spin up a few vms and then try and make them connect to a
testcontrol server.
Updates #1988
Signed-off-by: Christine Dodrill <xe@tailscale.com>
When tailscaled starts up, these lines run:
func run() error {
// ...
pol := logpolicy.New("tailnode.log.tailscale.io")
pol.SetVerbosityLevel(args.verbose)
// ...
}
If there are old log entries present, they immediate start getting uploaded. This races with the call to pol.SetVerbosityLevel.
This manifested itself as a test failure in tailscale.com/tstest/integration
when run with -race:
WARNING: DATA RACE
Read at 0x00c0001bc970 by goroutine 24:
tailscale.com/logtail.(*Logger).Write()
/Users/josh/t/corp/oss/logtail/logtail.go:517 +0x27c
log.(*Logger).Output()
/Users/josh/go/ts/src/log/log.go:184 +0x2b8
log.Printf()
/Users/josh/go/ts/src/log/log.go:323 +0x94
tailscale.com/logpolicy.newLogtailTransport.func1()
/Users/josh/t/corp/oss/logpolicy/logpolicy.go:509 +0x36c
net/http.(*Transport).dial()
/Users/josh/go/ts/src/net/http/transport.go:1168 +0x238
net/http.(*Transport).dialConn()
/Users/josh/go/ts/src/net/http/transport.go:1606 +0x21d0
net/http.(*Transport).dialConnFor()
/Users/josh/go/ts/src/net/http/transport.go:1448 +0xe4
Previous write at 0x00c0001bc970 by main goroutine:
tailscale.com/logtail.(*Logger).SetVerbosityLevel()
/Users/josh/t/corp/oss/logtail/logtail.go:131 +0x98
tailscale.com/logpolicy.(*Policy).SetVerbosityLevel()
/Users/josh/t/corp/oss/logpolicy/logpolicy.go:463 +0x60
main.run()
/Users/josh/t/corp/oss/cmd/tailscaled/tailscaled.go:178 +0x50
main.main()
/Users/josh/t/corp/oss/cmd/tailscaled/tailscaled.go:163 +0x71c
Goroutine 24 (running) created at:
net/http.(*Transport).queueForDial()
/Users/josh/go/ts/src/net/http/transport.go:1417 +0x4d8
net/http.(*Transport).getConn()
/Users/josh/go/ts/src/net/http/transport.go:1371 +0x5b8
net/http.(*Transport).roundTrip()
/Users/josh/go/ts/src/net/http/transport.go:585 +0x7f4
net/http.(*Transport).RoundTrip()
/Users/josh/go/ts/src/net/http/roundtrip.go:17 +0x30
net/http.send()
/Users/josh/go/ts/src/net/http/client.go:251 +0x4f0
net/http.(*Client).send()
/Users/josh/go/ts/src/net/http/client.go:175 +0x148
net/http.(*Client).do()
/Users/josh/go/ts/src/net/http/client.go:717 +0x1d0
net/http.(*Client).Do()
/Users/josh/go/ts/src/net/http/client.go:585 +0x358
tailscale.com/logtail.(*Logger).upload()
/Users/josh/t/corp/oss/logtail/logtail.go:367 +0x334
tailscale.com/logtail.(*Logger).uploading()
/Users/josh/t/corp/oss/logtail/logtail.go:289 +0xec
Rather than complicate the logpolicy API,
allow the verbosity to be adjusted concurrently.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Pull in the latest version of wireguard-windows.
Switch to upstream wireguard-go.
This requires reverting all of our import paths.
Unfortunately, this has to happen at the same time.
The wireguard-go change is very low risk,
as that commit matches our fork almost exactly.
(The only changes are import paths, CI files, and a go.mod entry.)
So if there are issues as a result of this commit,
the first place to look is wireguard-windows changes.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We repeat many peers each time we call SetPeers.
Instead of constructing strings for them from scratch every time,
keep strings alive across iterations.
name old time/op new time/op delta
SetPeers-8 3.58µs ± 1% 2.41µs ± 1% -32.60% (p=0.000 n=9+10)
name old alloc/op new alloc/op delta
SetPeers-8 2.53kB ± 0% 1.30kB ± 0% -48.73% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
SetPeers-8 99.0 ± 0% 16.0 ± 0% -83.84% (p=0.000 n=10+10)
We could reduce alloc/op 12% and allocs/op 23% if strs had
type map[string]strCache instead of map[string]*strCache,
but that wipes out the execution time impact.
Given that re-use is the most common scenario, let's optimize for it.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
e66d4e4c81 added AppendTo methods
to some key types. Their marshaled form is longer than 64 bytes.
name old time/op new time/op delta
Hash-8 15.5µs ± 1% 14.8µs ± 1% -4.17% (p=0.000 n=9+9)
name old alloc/op new alloc/op delta
Hash-8 1.18kB ± 0% 0.47kB ± 0% -59.87% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Hash-8 12.0 ± 0% 6.0 ± 0% -50.00% (p=0.000 n=10+10)
This is still a bit worse than explicitly handling the types,
but much nicer.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
All netaddr types that we are concerned with now implement AppendTo.
Use the AppendTo method if available, and remove all references to netaddr.
This is slower but cleaner, and more readily re-usable by others.
name old time/op new time/op delta
Hash-8 12.6µs ± 0% 14.8µs ± 1% +18.05% (p=0.000 n=8+10)
HashMapAcyclic-8 21.4µs ± 1% 21.9µs ± 1% +2.39% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
Hash-8 408B ± 0% 408B ± 0% ~ (p=1.000 n=10+10)
HashMapAcyclic-8 1.00B ± 0% 1.00B ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
Hash-8 6.00 ± 0% 6.00 ± 0% ~ (all equal)
HashMapAcyclic-8 0.00 0.00 ~ (all equal)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
These exist so we can use the optimized MapIter APIs
while still working with released versions of Go.
They're pretty simple, but some docs won't hurt.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Reduce to just a single external endpoint.
Convert from a variadic number of interfaces to a slice there.
name old time/op new time/op delta
Hash-8 14.4µs ± 0% 14.0µs ± 1% -3.08% (p=0.000 n=9+9)
name old alloc/op new alloc/op delta
Hash-8 873B ± 0% 793B ± 0% -9.16% (p=0.000 n=9+6)
name old allocs/op new allocs/op delta
Hash-8 18.0 ± 0% 14.0 ± 0% -22.22% (p=0.000 n=10+10)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Slightly slower, but lots less garbage.
We will recover the speed lost in a follow-up commit.
name old time/op new time/op delta
Hash-8 13.5µs ± 1% 14.3µs ± 0% +5.84% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
Hash-8 1.46kB ± 0% 0.87kB ± 0% -40.10% (p=0.000 n=7+10)
name old allocs/op new allocs/op delta
Hash-8 43.0 ± 0% 18.0 ± 0% -58.14% (p=0.000 n=10+10)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This requires changes to the Go toolchain.
The changes are upstream at https://golang.org/cl/320929.
They haven't been pulled into our fork yet.
No need to allocate new iteration scratch values for every map.
name old time/op new time/op delta
Hash-8 13.6µs ± 0% 13.5µs ± 0% -1.01% (p=0.008 n=5+5)
HashMapAcyclic-8 21.2µs ± 1% 21.1µs ± 2% ~ (p=0.310 n=5+5)
name old alloc/op new alloc/op delta
Hash-8 1.58kB ± 0% 1.46kB ± 0% -7.60% (p=0.008 n=5+5)
HashMapAcyclic-8 152B ± 0% 128B ± 0% -15.79% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
Hash-8 49.0 ± 0% 43.0 ± 0% -12.24% (p=0.008 n=5+5)
HashMapAcyclic-8 4.00 ± 0% 2.00 ± 0% -50.00% (p=0.008 n=5+5)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
To get the benefit of this optimization requires help from the Go toolchain.
The changes are upstream at https://golang.org/cl/320929,
and have been pulled into the Tailscale fork at
728ecc58fd.
It also requires building with the build tag tailscale_go.
name old time/op new time/op delta
Hash-8 14.0µs ± 0% 13.6µs ± 0% -2.88% (p=0.008 n=5+5)
HashMapAcyclic-8 24.3µs ± 1% 21.2µs ± 1% -12.47% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Hash-8 2.16kB ± 0% 1.58kB ± 0% -27.01% (p=0.008 n=5+5)
HashMapAcyclic-8 2.53kB ± 0% 0.15kB ± 0% -93.99% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
Hash-8 77.0 ± 0% 49.0 ± 0% -36.36% (p=0.008 n=5+5)
HashMapAcyclic-8 202 ± 0% 4 ± 0% -98.02% (p=0.008 n=5+5)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
setkey
The acyclic map code interacts badly with netaddr.IPs.
One of the netaddr.IP fields is an *intern.Value,
and we use a few sentinel values.
Those sentinel values make many of the netaddr data structures appear cyclic.
One option would be to replace the cycle-detection code with
a Floyd-Warshall style algorithm. The downside is that this will take
longer to detect cycles, particularly if the cycle is long.
This problem is exacerbated by the fact that the acyclic cycle detection
code shares a single visited map for the entire data structure,
not just the subsection of the data structure localized to the map.
Unfortunately, the extra allocations and work (and code) to use per-map
visited maps make this option not viable.
Instead, continue to special-case netaddr data types.
name old time/op new time/op delta
Hash-8 22.4µs ± 0% 14.0µs ± 0% -37.59% (p=0.008 n=5+5)
HashMapAcyclic-8 23.8µs ± 0% 24.3µs ± 1% +1.75% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Hash-8 2.49kB ± 0% 2.16kB ± 0% ~ (p=0.079 n=4+5)
HashMapAcyclic-8 2.53kB ± 0% 2.53kB ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
Hash-8 86.0 ± 0% 77.0 ± 0% -10.47% (p=0.008 n=5+5)
HashMapAcyclic-8 202 ± 0% 202 ± 0% ~ (all equal)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Hash and xor each entry instead, then write final xor'ed result.
name old time/op new time/op delta
Hash-4 33.6µs ± 4% 34.6µs ± 3% +3.03% (p=0.013 n=10+9)
name old alloc/op new alloc/op delta
Hash-4 1.86kB ± 0% 1.77kB ± 0% -5.10% (p=0.000 n=10+9)
name old allocs/op new allocs/op delta
Hash-4 51.0 ± 0% 49.0 ± 0% -3.92% (p=0.000 n=10+10)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
At the start of a dev cycle we'll upgrade all dependencies.
Done with:
$ for Dep in $(cat go.mod | perl -ne '/(\S+) v/ and print "$1\n"'); do go get $Dep@upgrade; done
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Our wireguard-go fork used different values from upstream for
package device's memory limits on iOS.
This was the last blocker to removing our fork.
These values are now vars rather than consts for iOS.
c27ff9b9f6
Adjust them on startup to our preferred values.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Typical maps in production are considerably longer.
This helps benchmarks more accurately reflect the costs per key
vs the costs per map in deephash.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
A couple of code paths in ipnserver use a NewBackendServer with a nil
backend just to call the callback with an encapsulated error message.
This covers a panic case seen in logs.
For #1920
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
This leads to a cleaner separation of intent vs. implementation
(Routes is now the only place specifying who handles DNS requests),
and allows for cleaner expression of a configuration that creates
MagicDNS records without serving them to the OS.
Signed-off-by: David Anderson <danderson@tailscale.com>
* Added new Addresses / AllowedIPs fields to testcontrol when creating new &tailcfg.Node
Signed-off-by: Simeng He <simeng@tailscale.com>
* Added single node test to check Addresses and AllowedIPs
Signed-off-by: Simeng He <simeng@tailscale.com>
Co-authored-by: Simeng He <simeng@tailscale.com>
The script detects one of the supported OS/version combos, and issues
the right install instructions for it.
Co-authored-by: Christine Dodrill <xe@tailscale.com>
Signed-off-by: David Anderson <danderson@tailscale.com>
If --until-direct is set, the goal is to make a direct connection.
If we failed at that, say so, and exit with an error.
RELNOTE=tailscale ping --until-direct (the default) now exits with
a non-zero exit code if no direct connection was established.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This code path is very tricky since it was originally designed for the
"re-authenticate to refresh my keys" use case, which didn't want to
lose the original session even if the refresh cycle failed. This is why
it acts differently from the Logout(); Login(); case.
Maybe that's too fancy, considering that it probably never quite worked
at all, for switching between users without logging out first. But it
works now.
This was more invasive than I hoped, but the necessary fixes actually
removed several other suspicious BUG: lines from state_test.go, so I'm
pretty confident this is a significant net improvement.
Fixestailscale/corp#1756.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
If the engine was shutting down from a previous session
(e.closing=true), it would return an error code when trying to get
status. In that case, ipnlocal would never unblock any callers that
were waiting on the status.
Not sure if this ever happened in real life, but I accidentally
triggered it while writing a test.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
magicsock.Conn.ParseEndpoint requires a peer's public key,
disco key, and legacy ip/ports in order to do its job.
We currently accomplish that by:
* adding the public key in our wireguard-go fork
* encoding the disco key as magic hostname
* using a bespoke comma-separated encoding
It's a bit messy.
Instead, switch to something simpler: use a json-encoded struct
containing exactly the information we need, in the form we use it.
Our wireguard-go fork still adds the public key to the
address when it passes it to ParseEndpoint, but now the code
compensating for that is just a couple of simple, well-commented lines.
Once this commit is in, we can remove that part of the fork
and remove the compensating code.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
The new code is ugly, but much faster and leaner.
name old time/op new time/op delta
SetPeers-8 7.81µs ± 1% 3.59µs ± 1% -54.04% (p=0.000 n=9+10)
name old alloc/op new alloc/op delta
SetPeers-8 7.68kB ± 0% 2.53kB ± 0% -67.08% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
SetPeers-8 237 ± 0% 99 ± 0% -58.23% (p=0.000 n=10+10)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Because it showed up on hello profiles.
Cycle through some moderate-sized sets of peers.
This should cover the "small tweaks to netmap"
and the "up/down cycle" cases.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Yes, it printed, but that was an implementation detail for hashing.
And coming optimization will make it print even less.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Not that it matters, but we were missing a close parens.
It's cheap, so add it.
name old time/op new time/op delta
Hash-8 6.64µs ± 0% 6.67µs ± 1% +0.42% (p=0.008 n=9+10)
name old alloc/op new alloc/op delta
Hash-8 1.54kB ± 0% 1.54kB ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
Hash-8 37.0 ± 0% 37.0 ± 0% ~ (all equal)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The struct field names don't change within a single run,
so they are irrelevant. Use the field index instead.
name old time/op new time/op delta
Hash-8 6.52µs ± 0% 6.64µs ± 0% +1.91% (p=0.000 n=6+9)
name old alloc/op new alloc/op delta
Hash-8 1.67kB ± 0% 1.54kB ± 0% -7.66% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Hash-8 53.0 ± 0% 37.0 ± 0% -30.19% (p=0.000 n=10+10)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
These show up a lot in our data structures.
name old time/op new time/op delta
Hash-8 11.5µs ± 1% 7.8µs ± 1% -32.17% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
Hash-8 1.98kB ± 0% 1.67kB ± 0% -15.73% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Hash-8 82.0 ± 0% 53.0 ± 0% -35.37% (p=0.000 n=10+10)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The sha256 hash writer doesn't implement WriteString.
(See https://github.com/golang/go/issues/38776.)
As a consequence, we end up converting many strings to []byte.
Wrapping a bufio.Writer around the hash writer lets us
avoid these conversions by using WriteString.
Using a bufio.Writer is, perhaps surprisingly, almost as cheap as using unsafe.
The reason is that the sha256 writer does internal buffering,
but doesn't do any when handed larger writers.
Using a bufio.Writer merely shifts the data copying from one buffer
to a different one.
Using a concrete type for Print and print cuts 10% off of the execution time.
name old time/op new time/op delta
Hash-8 15.3µs ± 0% 11.5µs ± 0% -24.84% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
Hash-8 2.82kB ± 0% 1.98kB ± 0% -29.57% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Hash-8 140 ± 0% 82 ± 0% -41.43% (p=0.000 n=10+10)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
deepprint currently accounts for 15% of allocs in tailscaled.
This is a useful benchmark to have.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
On benchmark completion, we shut down the wgengine.
If we happen to poll for status during shutdown,
we get an "engine closing" error.
It doesn't hurt anything; ignore it.
Fixestailscale/corp#1776
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
interfaces.Tailscale only returns an interface if it has at least one Tailscale
IP assigned to it. In the resolved DNS manager, when we're called upon to tear
down DNS config, the interface no longer has IPs.
Instead, look up the interface index on construction and reuse it throughout
the daemon lifecycle.
Fixes#1892.
Signed-off-by: David Anderson <dave@natulte.net>
If nobody is connected to the IPN bus, don't burn CPU & waste
allocations (causing more GC) by encoding netmaps for nobody.
This will notably help hello.ipn.dev.
Updates tailscale/corp#1773
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Without any synchronization here, the "first packet" callback can
be delayed indefinitely, while other work continues.
Since the callback starts the benchmark timer, this could skew results.
Worse, if the benchmark manages to complete before the benchmark timer begins,
it'll cause a data race with the benchmark shutdown performed by package testing.
That is what is reported in #1881.
This is a bit unfortunate, in that it means that users of TrafficGen have
to be careful to keep this callback speedy and lightweight and to avoid deadlocks.
Fixes#1881
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
It is possible to get multiple status callbacks from an Engine.
We need to wait for at least one from each Engine.
Without limiting to one per Engine,
wait.Wait can exit early or can panic due to a negative counter.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This reduces the speed with which these benchmarks exhaust their supply fds.
Not to zero unfortunately, but it's still helpful when doing long runs.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This function accounted for ~1% of all allocs by tailscaled.
It is trivial to improve, so may as well.
name old time/op new time/op delta
KeyMarshalText-8 197ns ± 0% 47ns ± 0% -76.12% (p=0.016 n=4+5)
name old alloc/op new alloc/op delta
KeyMarshalText-8 200B ± 0% 80B ± 0% -60.00% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
KeyMarshalText-8 5.00 ± 0% 1.00 ± 0% -80.00% (p=0.008 n=5+5)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The old way was way too fragile and had felt like it had more special
cases than normal cases. (see #1874, #1860, #1834, etc) It became very
obvious the old algorithm didn't work when we made the output be
pretty and try to show the user the command they need to run in
5ecc7c7200 for #1746)
The new algorithm is to map the prefs (current and new) back to flags
and then compare flags. This nicely handles the OS-specific flags and
the n:1 and 1:n flag:pref cases.
No change in the existing already-massive test suite, except some ordering
differences (the missing items are now sorted), but some new tests are
added for behavior that was broken before. In particular, it now:
* preserves non-pref boolean flags set to false, and preserves exit
node IPs (mapping them back from the ExitNodeID pref, as well as
ExitNodeIP),
* doesn't ignore --advertise-exit-node when doing an EditPrefs call
(#1880)
* doesn't lose the --operator on the non-EditPrefs paths (e.g. with
--force-reauth, or when the backend was not in state Running).
Fixes#1880
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Needed for the "up checker" to map back from exit node stable IDs (the
ipn.Prefs.ExitNodeID) back to an IP address in error messages.
But also previously requested so people can use it to then make API
calls. The upcoming "tailscale admin" subcommand will probably need it
too.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This reverts commit 7d16c8228b.
I have no idea how I ended up here. The bug I was fixing with this change
fails to reproduce on Ubuntu 18.04 now, and this change definitely does
break 20.04, 20.10, and Debian Buster. So, until we can reliably reproduce
the problem this was meant to fix, reverting.
Part of #1875
Signed-off-by: David Anderson <dave@natulte.net>
The --advertise-routes and --advertise-exit-node flags both mutating
one pref is the gift that keeps on giving.
I need to rewrite the this up warning code to first map prefs back to
flag values and then just compare flags instead of comparing prefs,
but this is the minimal fix for now.
This also includes work on the tests, to make them easier to write
(and more accurate), by letting you write the flag args directly and
have that parse into the upArgs/MaskedPrefs directly, the same as the
code, rather than them being possibly out of sync being written by
hand.
Fixes https://twitter.com/EXPbits/status/1390418145047887877
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Fields rename only.
Part of the general effort to make our code agnostic about endpoint formatting.
It's just a name, but it will soon be a misleading one; be more generic.
Do this as a separate commit because it generates a lot of whitespace changes.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Upstream wireguard-go renamed the interface method
from CreateEndpoint to ParseEndpoint.
I missed some comments. Fix them.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
By using conn.NewDefaultBind, this test requires that our endpoints
be comprehensible to wireguard-go. Instead, use a no-op bind that
treats endpoints as opaque strings.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Instead of calling ParseHex, do the hex.Decode directly.
name old time/op new time/op delta
UnmarshalJSON-8 86.9ns ± 0% 42.6ns ± 0% -50.94% (p=0.000 n=15+14)
name old alloc/op new alloc/op delta
UnmarshalJSON-8 128B ± 0% 0B -100.00% (p=0.000 n=15+15)
name old allocs/op new allocs/op delta
UnmarshalJSON-8 2.00 ± 0% 0.00 -100.00% (p=0.000 n=15+15)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Legacy endpoints (addrSet) currently reconstruct their dst string when requested.
Instead, store the dst string we were given to begin with.
In addition to being simpler and cheaper, this makes less code
aware of how to interpret endpoint strings.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Prefer the error from the actual wireguard-go device method call,
not {To,From}UAPI, as those tend to be less interesting I/O errors.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
When wireguard-go's UAPI interface fails with an error, ReconfigDevice hangs.
Fix that by buffering the channel and closing the writer after the call.
The code now matches the corresponding code in DeviceConfig, where I got it right.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
It is unused, and has been since early Feb 2021 (Tailscale 1.6).
We can't get delete the DeviceOptions entirely yet;
first #1831 and #1839 need to go in, along with some wireguard-go changes.
Deleting this chunk of code now will make the later commits more clearly correct.
Pingers can now go too.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The earlier eb06ec172f fixed
the flaky SSH issue (tailscale/corp#1725) by making sure that packets
addressed to Tailscale IPs in hybrid netstack mode weren't delivered
to netstack, but another issue remained:
All traffic handled by netstack was also potentially being handled by
the host networking stack, as the filter hook returned "Accept", which
made it keep processing. This could lead to various random racey chaos
as a function of OS/firewalls/routes/etc.
Instead, once we inject into netstack, stop our caller's packet
processing.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Whenever we dropped a packet due to ACLs, wireguard-go was logging:
Failed to write packet to TUN device: packet dropped by filter
Instead, just lie to wireguard-go and pretend everything is okay.
Fixes#1229
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Upstream wireguard-go renamed the interface method
from CreateEndpoint to ParseEndpoint.
I updated the log call site but not the allowlist.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
#1817 removed the only place in our CI where we executed our benchmark code.
Fix that by executing it everywhere.
The benchmarks are generally cheap and fast,
so this should add minimal overhead.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
To prevent issues like #1786, run staticcheck on the primary GOOSes:
linux, mac, and windows.
Windows also has a fair amount of GOARCH-specific code.
If we ever have GOARCH staticcheck failures on other GOOSes,
we can expand the test matrix further.
This requires installing the staticcheck binary so that
we can execute it with different GOOSes.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
This is needed because the original opts.Prefs field was at some point
subverted for use in frontend->backend state migration for backward
compatibility on some platforms. We still need that feature, but we
also need the feature of providing the full set of prefs from
`tailscale up`, *not* including overwriting the prefs.Persist keys, so
we can't use the original field from `tailscale up`.
`tailscale up` had attempted to compensate for that by doing SetPrefs()
before Start(), but that violates the ipn.Backend contract, which says
you should call Start() before anything else (that's why it's called
Start()). As a result, doing SetPrefs({ControlURL=...,
WantRunning=true}) would cause a connection to the *previous* control
server (because WantRunning=true), and then connect to the *new*
control server only after running Start().
This problem may have been avoided before, but only by pure luck.
It turned out to be relatively harmless since the connection to the old
control server was immediately closed and replaced anyway, but it
created a race condition that could have caused spurious notifications
or rejected keys if the server responded quickly.
As already covered by existing TODOs, a better fix would be to have
Start() get out of the business of state migration altogether. But
we're approaching a release so I want to make the minimum possible fix.
Fixes#1840.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
We were over-eager in running tailscale in GUI mode.
f42ded7acf fixed that by
checking for a variety of shell-ish env vars and using those
to force us into CLI mode.
However, for reasons I don't understand, those shell env vars
are present when Xcode runs Tailscale.app on my machine.
(I've changed no configs, modified nothing on a brand new machine.)
Work around that by adding an additional "only in GUI mode" check.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
I was going to write a test for this using the tstest/integration test
stuff, but the testcontrol implementation isn't quite there yet (it
always registers nodes and doesn't provide AuthURLs). So, manually
tested for now.
Fixes#1843
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Per discussion, we want to have only one test assertion library,
and we want to start by exploring quicktest.
This was a mostly mechanical translation.
I think we could make this nicer by defining a few helper
closures at the beginning of the test. Later.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
This fixes#1833 in two ways:
* stop setting NoSNAT on non-Linux. It only matters on Linux and the flag
is hidden on non-Linux, but the code was still setting it. Because of
that, the new pref-reverting safety checks were failing when it was
changing.
* Ignore the two Linux-only prefs changing on non-Linux.
Fixes#1833
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
There's no need to warn that it was not provided on the command line
after doing a sequence of up; logout; up --args. If you're asking for
tailscale to be up, you always mean that you prefer LoggedOut to become
false.
Fixes#1828
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
Prior to wireguard-go using printf-style logging,
all wireguard-go logging occurred using format string "%s".
We fixed that but continued to use %s when we rewrote
peer identifiers into Tailscale style.
This commit removes that %sl, which makes rate limiting work correctly.
As a happy side-benefit, it should generate less garbage.
Instead of replacing all wireguard-go peer identifiers
that might occur anywhere in a fully formatted log string,
assume that they only come from args.
Check all args for things that look like *device.Peers
and replace them with appropriately reformatted strings.
There is a variety of ways that this could go wrong
(unusual format verbs or modifiers, peer identifiers
occurring as part of a larger printed object, future API changes),
but none of them occur now, are likely to be added,
or would be hard to work around if they did.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
The "stop phrases" we use all occur in wireguard-go in the format string.
We can avoid doing a bunch of fmt.Sprintf work when they appear.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
This removes the NewLocalBackendWithClientGen constructor added in
b4d04a065f and instead adds
LocalBackend.SetControlClientGetterForTesting, mirroring
LocalBackend.SetHTTPTestClient. NewLocalBackendWithClientGen was
weird in being exported but taking an unexported type. This was noted
during code review:
https://github.com/tailscale/tailscale/pull/1818#discussion_r623155669
which ended in:
"I'll leave it for y'all to clean up if you find some way to do it elegantly."
This is more idiomatic.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Without this, macOS would fail to display its menu state correctly if you
started it while !WantRunning. It relies on the netmap in order to show
the logged-in username.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
There was logic that would make a "down" tailscale backend (ie.
!WantRunning) refuse to do any network activity. Unfortunately, this
makes the macOS and iOS UI unable to render correctly if they start
while !WantRunning.
Now that we have Prefs.LoggedOut, use that instead. So `tailscale down`
will still allow the controlclient to connect its authroutine, but
pause the maproutine. `tailscale logout` will entirely stop all
activity.
This new behaviour is not obviously correct; it's a bit annoying that
`tailsale down` doesn't terminate all activity like you might expect.
Maybe we should redesign the UI code to render differently when
disconnected, and then revert this change.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
EditPrefs should be just a wrapper around the action of changing prefs,
but someone had added a side effect of calling Login() sometimes. The
side effect happened *after* running the state machine, which would
sometimes result in us going into NeedsLogin immediately before calling
cc.Login().
This manifested as the macOS app not being able to Connect if you
launched it with LoggedOut=false and WantRunning=false. Trying to
Connect() would sent us to the NeedsLogin state instead.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
- Switch to our own simpler token bucket, since x/time/rate is missing
necessary stuff (can't provide your own time func; can't check the
current bucket contents) and it's overkill anyway.
- Add tests that actually include advancing time.
- Don't remove the rate limit on a message until there's enough room to
print at least two more of them. When we do, we'll also print how
many we dropped, as a contextual reminder that some were previously
lost. (This is more like how the Linux kernel does it.)
- Reformat the [RATE LIMITED] messages to be shorter, and to not
corrupt original message. Instead, we print the message, then print
its format string.
- Use %q instead of \"%s\", for more accurate parsing later, if the
format string contained quotes.
Fixes#1772
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
A very long unit test that verifies the way the controlclient and
ipn.Backend interact.
This is a giant sequential test of the state machine. The test passes,
but only because it's asserting all the wrong behaviour. I marked all
the behaviour I think is wrong with BUG comments, and several
additional test opportunities with TODO.
Note: the new test supercedes TestStartsInNeedsLoginState, which was
checking for incorrect behaviour (although the new test still checks
for the same incorrect behaviour) and assumed .Start() would converge
before returning, which it happens to do, but only for this very
specific case, for the current implementation. You're supposed to wait
for the notifications.
Updates: tailscale/corp#1660
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
With this change, shared node names resolve correctly on split DNS-supporting
operating systems.
Fixestailscale/corp#1706
Signed-off-by: David Anderson <danderson@tailscale.com>
Only minimal tailscale + tailscaled for now.
And a super minimal in-memory logcatcher.
No control ... yet.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Pointer receivers used with MarshalJSON are code rakes.
https://github.com/golang/go/issues/22967https://github.com/dominikh/go-tools/issues/911
I just stepped on one, and it hurt. Turn it over.
While we're here, optimize the code a bit.
name old time/op new time/op delta
MarshalJSON-8 184ns ± 0% 44ns ± 0% -76.03% (p=0.000 n=20+19)
name old alloc/op new alloc/op delta
MarshalJSON-8 184B ± 0% 80B ± 0% -56.52% (p=0.000 n=20+20)
name old allocs/op new allocs/op delta
MarshalJSON-8 4.00 ± 0% 1.00 ± 0% -75.00% (p=0.000 n=20+20)
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
For historical reasons, we ended up with two near-duplicate
copies of curve25519 key types, one in the wireguard-go module
(wgcfg) and one in the tailscale module (types/wgkey).
Then we moved wgcfg to the tailscale module.
We can now remove the wgcfg key type in favor of wgkey.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
One of the consequences of the bind refactoring in 6f23087175
is that attempting to bind an IPv6 socket will always
result in c.pconn6.pconn being non-nil.
If the bind fails, it'll be set to a placeholder packet conn
that blocks forever.
As a result, we can always run ReceiveIPv6 and health check it.
This removes IPv4/IPv6 asymmetry and also will allow health checks
to detect any IPv6 receive func failures.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
It must be an IP address; enforce that at the type level.
Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
We had two separate code paths for the initial UDP listener bind
and any subsequent rebinds.
IPv6 got left out of the rebind code.
Rather than duplicate it there, unify the two code paths.
Then improve the resulting code:
* Rebind had nested listen attempts to try the user-specified port first,
and then fall back to :0 if that failed. Convert that into a loop.
* Initial bind tried only the user-specified port.
Rebind tried the user-specified port and 0.
But there are actually three ports of interest:
The one the user specified, the most recent port in use, and 0.
We now try all three in order, as appropriate.
* In the extremely rare case in which binding to port 0 fails,
use a dummy net.PacketConn whose reads block until close.
This will keep the wireguard-go receive func goroutine alive.
As a pleasant side-effect of this, if we decide that
we need to resuscitate #1796, it will now be much easier.
Fixes#1799
Co-authored-by: David Anderson <danderson@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
Assume it'll stay at 0 forever, so hard-code it
and delete code conditional on it being non-0.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
It was set to context.Background by all callers, for the same reasons.
Set it locally instead, to simplify call sites.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
For when we need to tweak behavior or errors as a function of which of
3 macOS Tailscale variants we're using. (more accessors coming later
as needed)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The old implementation knew too much about how wireguard-go worked.
As a result, it missed genuine problems that occurred due to unrelated bugs.
This fourth attempt to fix the health checks takes a black box approach.
A receive func is healthy if one (or both) of these conditions holds:
* It is currently running and blocked.
* It has been executed recently.
The second condition is required because receive functions
are not continuously executing. wireguard-go calls them and then
processes their results before calling them again.
There is a theoretical false positive if wireguard-go go takes
longer than one minute to process the results of a receive func execution.
If that happens, we have other problems.
Updates #1790
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
They were not doing their job.
They need yet another conceptual re-think.
Start by clearing the decks.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
We had a long-standing bug in which our TUN events channel
was being received from simultaneously in two places.
The first is wireguard-go.
At wgengine/userspace.go:366, we pass e.tundev to wireguard-go,
which starts a goroutine (RoutineTUNEventReader)
that receives from that channel and uses events to adjust the MTU
and bring the device up/down.
At wgengine/userspace.go:374, we launch a goroutine that
receives from e.tundev, logs MTU changes, and triggers
state updates when up/down changes occur.
Events were getting delivered haphazardly between the two of them.
We don't really want wireguard-go to receive the up/down events;
we control the state of the device explicitly by calling device.Up.
And the userspace.go loop MTU logging duplicates logging that
wireguard-go does when it received MTU updates.
So this change splits the single TUN events channel into up/down
and other (aka MTU), and sends them to the parties that ought
to receive them.
I'm actually a bit surprised that this hasn't caused more visible trouble.
If a down event went to wireguard-go but the subsequent up event
went to userspace.go, we could end up with the wireguard-go device disappearing.
I believe that this may also (somewhat accidentally) be a fix for #1790.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
The intention was always that files only get written to *.partial
files and renamed at the end once fully received, but somewhere in the
process that got lost in buffered mode and *.partial files were only
being used in direct receive mode. This fix prevents WaitingFiles
from returning files that are still being transferred.
Updates tailscale/corp#1626
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If DeleteFile fails on Windows due to another process (anti-virus,
probably) having our file open, instead leave a marker file that the
file is logically deleted, and remove it from API calls and clean it
up lazily later.
Updates tailscale/corp#1626
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The old decay-based one took a while to converge. This new one (based
very loosely on TCP BBR) seems to converge quickly on what seems to be
the best speed.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
This tries to generate traffic at a rate that will saturate the
receiver, without overdoing it, even in the event of packet loss. It's
unrealistically more aggressive than TCP (which will back off quickly
in case of packet loss) but less silly than a blind test that just
generates packets as fast as it can (which can cause all the CPU to be
absorbed by the transmitter, giving an incorrect impression of how much
capacity the total system has).
Initial indications are that a syscall about every 10 packets (TCP bulk
delivery) is roughly the same speed as sending every packet through a
channel. A syscall per packet is about 5x-10x slower than that.
The whole tailscale wireguard-go + magicsock + packet filter
combination is about 4x slower again, which is better than I thought
we'd do, but probably has room for improvement.
Note that in "full" tailscale, there is also a tundev read/write for
every packet, effectively doubling the syscall overhead per packet.
Given these numbers, it seems like read/write syscalls are only 25-40%
of the total CPU time used in tailscale proper, so we do have
significant non-syscall optimization work to do too.
Sample output:
$ GOMAXPROCS=2 go test -bench . -benchtime 5s ./cmd/tailbench
goos: linux
goarch: amd64
pkg: tailscale.com/cmd/tailbench
cpu: Intel(R) Core(TM) i7-4785T CPU @ 2.20GHz
BenchmarkTrivialNoAlloc/32-2 56340248 93.85 ns/op 340.98 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkTrivialNoAlloc/124-2 57527490 99.27 ns/op 1249.10 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkTrivialNoAlloc/1024-2 52537773 111.3 ns/op 9200.39 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkTrivial/32-2 41878063 135.6 ns/op 236.04 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkTrivial/124-2 41270439 138.4 ns/op 896.02 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkTrivial/1024-2 36337252 154.3 ns/op 6635.30 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkBlockingChannel/32-2 12171654 494.3 ns/op 64.74 MB/s 0 %lost 1791 B/op 0 allocs/op
BenchmarkBlockingChannel/124-2 12149956 507.8 ns/op 244.17 MB/s 0 %lost 1792 B/op 1 allocs/op
BenchmarkBlockingChannel/1024-2 11034754 528.8 ns/op 1936.42 MB/s 0 %lost 1792 B/op 1 allocs/op
BenchmarkNonlockingChannel/32-2 8960622 2195 ns/op 14.58 MB/s 8.825 %lost 1792 B/op 1 allocs/op
BenchmarkNonlockingChannel/124-2 3014614 2224 ns/op 55.75 MB/s 11.18 %lost 1792 B/op 1 allocs/op
BenchmarkNonlockingChannel/1024-2 3234915 1688 ns/op 606.53 MB/s 3.765 %lost 1792 B/op 1 allocs/op
BenchmarkDoubleChannel/32-2 8457559 764.1 ns/op 41.88 MB/s 5.945 %lost 1792 B/op 1 allocs/op
BenchmarkDoubleChannel/124-2 5497726 1030 ns/op 120.38 MB/s 12.14 %lost 1792 B/op 1 allocs/op
BenchmarkDoubleChannel/1024-2 7985656 1360 ns/op 752.86 MB/s 13.57 %lost 1792 B/op 1 allocs/op
BenchmarkUDP/32-2 1652134 3695 ns/op 8.66 MB/s 0 %lost 176 B/op 3 allocs/op
BenchmarkUDP/124-2 1621024 3765 ns/op 32.94 MB/s 0 %lost 176 B/op 3 allocs/op
BenchmarkUDP/1024-2 1553750 3825 ns/op 267.72 MB/s 0 %lost 176 B/op 3 allocs/op
BenchmarkTCP/32-2 11056336 503.2 ns/op 63.60 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkTCP/124-2 11074869 533.7 ns/op 232.32 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkTCP/1024-2 8934968 671.4 ns/op 1525.20 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkWireGuardTest/32-2 1403702 4547 ns/op 7.04 MB/s 14.37 %lost 467 B/op 3 allocs/op
BenchmarkWireGuardTest/124-2 780645 7927 ns/op 15.64 MB/s 1.537 %lost 420 B/op 3 allocs/op
BenchmarkWireGuardTest/1024-2 512671 11791 ns/op 86.85 MB/s 0.5206 %lost 411 B/op 3 allocs/op
PASS
ok tailscale.com/wgengine/bench 195.724s
Updates #414.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
NetworkManager fixed the bug that forced us to use NetworkManager
if it's programming systemd-resolved, and in the same release also
made NetworkManager ignore DNS settings provided for unmanaged
interfaces... Which breaks what we used to do. So, with versions
1.26.6 and above, we MUST NOT use NetworkManager to indirectly
program systemd-resolved, but thankfully we can talk to resolved
directly and get the right outcome.
Fixes#1788
Signed-off-by: David Anderson <danderson@tailscale.com>
The existing implementation was completely, embarrassingly conceptually broken.
We aren't able to see whether wireguard-go's receive function goroutines
are running or not. All we can do is model that based on what we have done.
This commit fixes that model.
Fixes#1781
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
Avery reported a sub-ms health transition from "receiveIPv4 not running" to "ok".
To avoid these transient false-positives, be more precise about
the expected lifetime of receive funcs. The problematic case is one in which
they were started but exited prior to a call to connBind.Close.
Explicitly represent started vs running state, taking care with the order of updates.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
The connection failure diagnostic code was never updated enough for
exit nodes, so disable its misleading output when the node it picks
(incorrectly) to diagnose is only an exit node.
Fixes#1754
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The new "tailscale up" checks previously didn't protect against
--advertise-exit-node being omitted in the case that
--advertise-routes was also provided. It wasn't done before because
there is no corresponding pref for "--advertise-exit-node"; it's a
helper flag that augments --advertise-routes. But that's an
implementation detail and we can still help users. We just have to
special case that pref and look whether the current routes include
both the v4 and v6 /0 routes.
Fixes#1767
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This doesn't make --operator implicit (which we might do in the
future), but it at least doesn't require repeating it in the future
when it already matches $USER.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It was getting cleared on notify.
Document that authURL is cleared on notify and add a new field that
isn't, using the new field for the JSON status.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I've spent two days searching for a theoretical wireguard-go bug
around receive functions exiting early.
I've found many bugs, but none of the flavor we're looking for.
Restore wireguard-go's logging around starting and stopping receive functions,
so that we can definitively rule in or out this particular theory.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
I see a bunch of these in some logs I'm looking at,
separated only by a few seconds.
Log the error so we can tell what's going on here.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
These were getting rate-limited for nodes with many peers.
Consolate the output into single lines, which are nicer anyway.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
With this change, the ipnserver's safesocket.Listen (the localhost
tcp.Listen) happens right away, before any synchronous
TUN/DNS/Engine/etc setup work, which might be slow, especially on
early boot on Windows.
Because the safesocket.Listen starts up early, that means localhost
TCP dials (the safesocket.Connect from the GUI) complete successfully
and thus the GUI avoids the MessageBox error. (I verified that
pacifies it, even without a Listener.Accept; I'd feared that Windows
localhost was maybe special and avoided the normal listener backlog).
Once the GUI can then connect immediately without errors, the various
timeouts then matter less, because the backend is no longer trying to
race against the GUI's timeout. So keep retrying on errors for a
minute, or 10 minutes if the system just booted in the past 10
minutes.
This should fix the problem with Windows 10 desktops auto-logging in
and starting the Tailscale frontend which was then showing a
MessageBox error about failing to connect to tailscaled, which was
slow coming up because the Windows networking stack wasn't up
yet. Fingers crossed.
Fixes#1313 (previously #1187, etc)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This change implements Windows version of install-system-daemon and
uninstall-system-daemon subcommands. When running the commands the
user will install or remove Tailscale Windows service.
Updates #1232
Signed-off-by: Alex Brainman <alex.brainman@gmail.com>
This used to not be necessary, because MagicDNS always did full proxying.
But with split DNS, we need to know which names to route to our resolver,
otherwise reverse lookups break.
This captures the entire CGNAT range, as well as our Tailscale ULA.
Signed-off-by: David Anderson <danderson@tailscale.com>
Otherwise, the existence of authoritative domains forces full
DNS proxying even when no other DNS config is present.
Signed-off-by: David Anderson <danderson@tailscale.com>
Logout used to be a no-op, so the ipnserver previously synthensized a Logout
on disconnect. Now that Logout actually invalidates the node key that was
forcing all GUI closes to log people out.
Instead, add a method to LocalBackend to specifically mean "the
Windows GUI closed, please forget all the state".
Fixestailscale/corp#1591 (ignoring the notification issues, tracked elsewhere)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Let caller (macOS) do it so Finder progress bar can be dismissed
without races.
Updates tailscale/corp#1575
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We were accidentally logging oldPort -> oldPort.
Log oldPort as well as c.port; if we failed to get the preferred port
in a previous rebind, oldPort might differ from c.port.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
On macOS, we link the CLI into the GUI executable so it can be included in
the Mac App Store build.
You then need to run it like:
/Applications/Tailscale.app/Contents/MacOS/Tailscale <command>
But our old detection of whether you're running that Tailscale binary
in CLI mode wasn't accurate and often bit people. For instance, when
they made a typo, it then launched in GUI mode and broke their
existing GUI connection (starting a new IPNExtension) and took down
their network.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It used to just store received files URL-escaped on disk, but that was
a half done lazy implementation, and pushed the burden to callers to
validate and write things to disk in an unescaped way.
Instead, do all the validation in the receive handler and only
accept filenames that are UTF-8 and in the intersection of valid
names that all platforms support.
Fixestailscale/corp#1594
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
So the NetworkMap-from-incremental-MapResponses can be tested easily.
And because direct.go was getting too big.
No change in behavior at this point. Just movement.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The ipn.NewPrefs func returns a populated ipn.Prefs for historical
reasons. It's not used or as important as it once was, but it hasn't
yet been removed. Meanwhile, it contains some default values that are
used on some platforms. Notably, for this bug (#1725), Windows/Mac use
its Prefs.RouteAll true value (to accept subnets), but Linux users
have always gotten a "false" value for that, because that's what
cmd/tailscale's CLI default flag is _for all operating systems_. That
meant that "tailscale up" was rightfully reporting that the user was
changing an implicit setting: RouteAll was changing from true with
false with the user explicitly saying so.
An obvious fix might be to change ipn.NewPrefs to return
Prefs.RouteAll == false on some platforms, but the logic is
complicated by darwin: we want RouteAll true on windows, android, ios,
and the GUI mac app, but not the CLI tailscaled-on-macOS mode. But
even if we used build tags (e.g. the "redo" build tag) to determine
what the default is, that then means we have duplicated and differing
"defaults" between both the CLI up flags and ipn.NewPrefs. Furthering
that complication didn't seem like a good idea.
So, changing the NewPrefs defaults is too invasive at this stage of
the release, as is removing the NewPrefs func entirely.
Instead, tweak slightly the semantics of the ipn.Prefs.ControlURL
field. This now defines that a ControlURL of the empty string means
both "we're uninitialized" and also "just use the default".
Then, once we have the "empty-string-means-unintialized" semantics,
use that to suppress "tailscale up"'s recent implicit-setting-revert
checking safety net, if we've never initialized Tailscale yet.
And update/add tests.
Fixes#1725
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Will add more tests later but this locks in all the existing warnings
and errors at least, and some of the existing non-error behavior.
Mostly I want this to exist before I actually fix#1725.
Updates #1725
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And fix PeerSeenChange bug where it was ignored unless there were
other peer changes.
Updates tailscale/corp#1574
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Track endpoints internally with a new tailcfg.Endpoint type that
includes a typed netaddr.IPPort (instead of just a string) and
includes a type for how that endpoint was discovered (STUN, local,
etc).
Use []tailcfg.Endpoint instead of []string internally.
At the last second, send it to the control server as the existing
[]string for endpoints, but also include a new parallel
MapRequest.EndpointType []tailcfg.EndpointType, so the control server
can start filtering out less-important endpoint changes from
new-enough clients. Notably, STUN-discovered endpoints can be filtered
out from 1.6+ clients, as they can discover them amongst each other
via CallMeMaybe disco exchanges started over DERP. And STUN endpoints
change a lot, causing a lot of MapResposne updates. But portmapped
endpoints are worth keeping for now, as they they work right away
without requiring the firewall traversal extra RTT dance.
End result will be less control->client bandwidth. (despite negligible
increase in client->control bandwidth)
Updates tailscale/corp#1543
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
They were scattered/duplicated in misc places before.
It can't be in the client package itself for circular dep reasons.
This new package is basically tailcfg but for localhost
communications, instead of to control.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This changes the behavior of "tailscale up".
Previously "tailscale up" always did a new Start and reset all the settings.
Now "tailscale up" with no flags just brings the world [back] up.
(The opposite of "tailscale down").
But with flags, "tailscale up" now only is allowed to change
preferences if they're explicitly named in the flags. Otherwise it's
an error. Or you need to use --reset to explicitly nuke everything.
RELNOTE=tailscale up change
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Some paths already didn't. And in the future I hope to shut all the
notify funcs down end-to-end when nothing is connected (as in the
common case in tailscaled). Then we can save some JSON encoding work.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We've been slowly making Start less special and making IPN a
multi-connection "watch" bus of changes, but this Start specialness
had remained.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Clear LLMNR and mdns flags, update reasoning for our settings,
and set our override priority harder than before when we want
to be primary resolver.
Signed-off-by: David Anderson <danderson@tailscale.com>
Debian resolvconf is not legacy, it's alive and well,
just historically before the other implementations.
Signed-off-by: David Anderson <danderson@tailscale.com>
On FreeBSD, we add the interface IP as a /48 to work around a kernel
bug, so we mustn't then try to add a /48 route to the Tailscale ULA,
since that will fail as a dupe.
Signed-off-by: David Anderson <danderson@tailscale.com>
It was only Linux and BSDs before, but now with netstack mode, it also works on
Windows and darwin. It's not worth limiting it to certain platforms.
Tailscaled itself can complain/fail if it doesn't like the settings
for the mode/OS it's operating under.
Updates #707
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This allows split-DNS configurations to not break clients on OSes that
haven't yet been ported to understand split DNS, by falling back to quad-9
as a global resolver when handed an "impossible to implement"
split-DNS config.
Part of #953. Needs to be removed before shipping 1.8.
Signed-off-by: David Anderson <danderson@tailscale.com>
With this change, all OSes can sort-of do split DNS, except that the
default upstream is hardcoded to 8.8.8.8 pending further plumbing.
Additionally, Windows 8-10 can do split DNS fully correctly, without
the 8.8.8.8 hack.
Part of #953.
Signed-off-by: David Anderson <danderson@tailscale.com>
When searching for the matching client identity, the returned
certificate chain was accidentally set to that of the last identity
returned by the certificate store instead of the one corresponding to
the selected identity.
Also, add some extra error checking for invalid certificate chains, just
in case.
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
We already had SetNotifyCallback elsewhere on controlclient, so use
that name.
Baby steps towards some CLI refactor work.
Updates tailscale/tailscale#1436
It seems that all the setups that support split DNS understand
this distinction, and it's an important one when translating
high-level configuration.
Part of #953.
Signed-off-by: David Anderson <danderson@tailscale.com>
Correctly reports that Win7 cannot do split DNS, and has a helper to
discover the "base" resolvers for the system.
Part of #953
Signed-off-by: David Anderson <danderson@tailscale.com>
OS implementations are going to support split DNS soon.
Until they're all in place, hardcode Primary=true to get
the old behavior.
Signed-off-by: David Anderson <danderson@tailscale.com>
This is usually the same as the requested interface, but on some
unixes can vary based on device number allocation, and on Windows
it's the GUID instead of the pretty name, since everything relating
to configuration wants the GUID.
Signed-off-by: David Anderson <danderson@tailscale.com>
wgengine/router.CallbackRouter needs to support both the Router
and OSConfigurator interfaces, so the setters can't both be called
Set.
Signed-off-by: David Anderson <danderson@tailscale.com>
It existed to work around the frequent opening and closing
of the conn.Bind done by wireguard-go.
The preceding commit removed that behavior,
so we can simply close the connections
when we are done with them.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We don't use the port that wireguard-go passes to us (via magicsock.connBind.Open).
We ignore it entirely and use the port we selected.
When we tell wireguard-go that we're changing the listen_port,
it calls connBind.Close and then connBind.Open.
And in the meantime, it stops calling the receive functions,
which means that we stop receiving and processing UDP and DERP packets.
And that is Very Bad.
That was never a problem prior to b3ceca1dd7,
because we passed the SkipBindUpdate flag to our wireguard-go fork,
which told wireguard-go not to re-bind on listen_port changes.
That commit eliminated the SkipBindUpdate flag.
We could write a bunch of code to work around the gap.
We could add background readers that process UDP and DERP packets when wireguard-go isn't.
But it's simpler to never create the conditions in which wireguard-go rebinds.
The other scenario in which wireguard-go re-binds is device.Down.
Conveniently, we never call device.Down. We go from device.Up to device.Close,
and the latter only when we're shutting down a magicsock.Conn completely.
Rubber-ducked-by: Avery Pennarun <apenwarr@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The shim implements both network and DNS configurators,
and feeds both into a single callback that receives
both configs.
Signed-off-by: David Anderson <danderson@tailscale.com>
Upstream wireguard-go has changed its receive model.
NewDevice now accepts a conn.Bind interface.
The conn.Bind is stateless; magicsock.Conns are stateful.
To work around this, we add a connBind type that supports
cheap teardown and bring-up, backed by a Conn.
The new conn.Bind allows us to specify a set of receive functions,
rather than having to shoehorn everything into ReceiveIPv4 and ReceiveIPv6.
This lets us plumbing DERP messages directly into wireguard-go,
instead of having to mux them via ReceiveIPv4.
One consequence of the new conn.Bind layer is that
closing the wireguard-go device is now indistinguishable
from the routine bring-up and tear-down normally experienced
by a conn.Bind. We thus have to explicitly close the magicsock.Conn
when the close the wireguard-go device.
One downside of this change is that we are reliant on wireguard-go
to call receiveDERP to process DERP messages. This is fine for now,
but is perhaps something we should fix in the future.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The common Linux start-up path (fallback file defined but not
existing) was missing the log print of initializing Prefs. The code
was too twisty. Simplify a bit.
Updates #1573
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
They need some rework to do the right thing, in the meantime the direct
and resolvconf managers will work out.
The resolved implementation was never selected due to control-side settings.
The networkmanager implementation mostly doesn't get selected due to
unforeseen interactions with `resolvconf` on many platforms.
Both implementations also need rework to support the various routing modes
they're capable of.
Signed-off-by: David Anderson <danderson@tailscale.com>
It's only use to skip some optional initialization during cleanup,
but that work is very minor anyway, and about to change drastically.
Signed-off-by: David Anderson <danderson@tailscale.com>
It's currently unused, and no longer makes sense with the upcoming
DNS infrastructure. Keep it in tailcfg for now, since we need protocol
compat for a bit longer.
Signed-off-by: David Anderson <danderson@tailscale.com>
Old macOS clients required we populate this field to a non-null
value so we were unable to remove this field before.
Instead, keep the field but change its type to a custom empty struct
that can marshal/unmarshal JSON. And lock it in with a test.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The bool was already called useNetstack at the caller.
isUserspace (to mean netstack) is confusing next to wgengine.NewUserspaceEngine, as that's
a different type of 'userspace'.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The code is not obviously better or worse, but this makes the little warning
triangle in my editor go away, and the distraction removal is worth it.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Google Cloud Run does not implement NETLINK_ROUTE RTMGRP.
If initialization of the netlink socket or group membership
fails, fall back to a polling implementation.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
The resolver still only supports a single upstream config, and
ipn/wgengine still have to split up the DNS config, but this moves
closer to unifying the DNS configs.
As a handy side-effect of the refactor, IPv6 MagicDNS records exist
now.
Signed-off-by: David Anderson <danderson@tailscale.com>
They're only used internally and in tests, and have surprising
semantics in that they only resolve MagicDNS names, not upstream
resolver queries.
Signed-off-by: David Anderson <danderson@tailscale.com>
This adds a new ipn.MaskedPrefs embedding a ipn.Prefs, along with a
bunch of "has bits", kept in sync with tests & reflect.
Then it adds a Prefs.ApplyEdits(MaskedPrefs) method.
Then the ipn.Backend interface loses its weirdo SetWantRunning(bool)
method (that I added in 483141094c for "tailscale down")
and replaces it with EditPrefs (alongside the existing SetPrefs for now).
Then updates 'tailscale down' to use EditPrefs instead of SetWantRunning.
In the future, we can use this to do more interesting things with the
CLI, reconfiguring only certain properties without the reset-the-world
"tailscale up".
Updates #1436
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We were going to remove this in Tailscale 1.3 but forgot.
This means Tailscale 1.8 users won't be able to downgrade to Tailscale
1.0, but that's fine.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Adding a subcommand which prints and logs a log marker. This should help
diagnose any issues that users face.
Fixes#1466
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Instead of having the CLI check whether IP forwarding is enabled, ask
tailscaled. It has a better idea. If it's netstack, for instance, the
sysctl values don't matter. And it's possible that only the daemon has
permission to know.
Fixes#1626
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The call to appendEndpoint updates cpeer.Endpoints.
Then it is overwritten in the next line.
The only errors from appendEndpoint occur when
the host/port pair is malformed, but that cannot happen.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
IPv6 Unique Local Addresses are sometimes used with Network
Prefix Translation to reach the Internet. In that respect
their use is similar to the private IPv4 address ranges
10/8, 172.16/12, and 192.168/16.
Treat them as sufficient for AnyInterfaceUp(), but specifically
exclude Tailscale's own IPv6 ULA prefix to avoid mistakenly
trying to bootstrap Tailscale using Tailscale.
This helps in supporting Google Cloud Run, where the addresses
are 169.254.8.1/32 and fddf:3978:feb1:d745::c001/128 on eth1.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
It can end up executing an a new goroutine,
at which point instead of immediately stopping test execution, it hangs.
Since this is unexpected anyway, panic instead.
As a bonus, it makes call sites nicer and removes a kludge comment.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Without this, `tailscale status` ignores the --socket flag on macOS and
always talks to the IPNExtension, even if you wanted it to inspect a
userspace tailscaled.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
So we have a documented & tested way to check whether we're in
netstack mode. To be used by future commits.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For discovery when an explicit hostname/IP is known. We'll still
also send it via control for finding peers by a list.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The concrete type being encoded changed from a value to pointer
earlier and this was never adjusted.
(People don't frequently use TS_DEBUG_MAP to see requests, so it went
unnoticed until now.)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
"Fake" doesn't mean a lot any more, given that many components
of the engine can be faked out, including in valid production
configurations like userspace-networking.
Signed-off-by: David Anderson <danderson@tailscale.com>
This makes setup more explicit in prod codepaths, without
requiring a bunch of arguments or helpers for tests and
userspace mode.
Signed-off-by: David Anderson <danderson@tailscale.com>
The Windows CI machine experiences significant random execution delays.
For example, in this code from watchdog.go:
done := make(chan bool)
go func() {
start := time.Now()
mu.Lock()
There was a 500ms delay from initializing done to locking mu.
This test checks that we receive a sufficient number of events quickly enough.
In the face of random 500ms delays, unsurprisingly, the test fails.
There's not much principled we can do about it.
We could build a system of retries or attempt to detect these random delays,
but that game isn't worth the candle.
Skip the test.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This works around the close syscall being slow.
We can revert this if we find a fix or if Apple makes close fast again.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The tstun packagen contains both constructors for generic tun
Devices, and a wrapper that provides additional functionality.
Signed-off-by: David Anderson <danderson@tailscale.com>
IPv4 and IPv6 both work remotely, but IPv6 doesn't yet work from the
machine itself due to routing mysteries.
Untested yet on iOS, but previous prototype worked on iOS, so should
work the same.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Now callers (wgengine/monitor) don't need to mutate the state to remove
boring interfaces before calling State.Equal. Instead, the methods
to remove boring interfaces from the State are removed, as is
the reflect-using Equal method itself, and in their place is
a new EqualFiltered method that takes a func predicate to match
interfaces to compare.
And then the FilterInteresting predicate is added for use
with EqualFiltered to do the job that that wgengine/monitor
previously wanted.
Now wgengine/monitor can keep the full interface state around,
including the "boring" interfaces, which we'll need for peerapi on
macOS/iOS to bind to the interface index of the utunN device.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We have it already but threw it away. But macOS/iOS code will
be needing the interface index, so hang on to it.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
control/controlclient: sign RegisterRequest
Some customers wish to verify eligibility for devices to join their
tailnets using machine identity certificates. TLS client certs could
potentially fulfill this role but the initial customer for this feature
has technical requirements that prevent their use. Instead, the
certificate is loaded from the Windows local machine certificate store
and uses its RSA public key to sign the RegisterRequest message.
There is room to improve the flexibility of this feature in future and
it is currently only tested on Windows (although Darwin theoretically
works too), but this offers a reasonable starting place for now.
Updates tailscale/coral#6
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
So we can empty import the guts of cmd/tailscaled from another
module for go mod tidy reasons.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This adds an easy and portable way for us to document how to get
your Tailscale IP address.
$ tailscale ip
100.74.70.3
fd7a:115c:a1e0:ab12:4843:cd96:624a:4603
$ tailscale ip -4
100.74.70.3
$ tailscale ip -6
fd7a:115c:a1e0:ab12:4843:cd96:624a:4603
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
e.g.
$ tailscale ping 1.1.1.1
exit node found but not enabled
$ tailscale ping 10.2.200.2
node "tsbfvlan2" found, but not using its 10.2.200.0/24 route
$ sudo tailscale up --accept-routes
$ tailscale ping 10.2.200.2
pong from tsbfvlan2 (100.124.196.94) via 10.2.200.34:41641 in 1ms
$ tailscale ping mon.ts.tailscale.com
pong from monitoring (100.88.178.64) via DERP(sfo) in 83ms
pong from monitoring (100.88.178.64) via DERP(sfo) in 21ms
pong from monitoring (100.88.178.64) via [2604:a880:4:d1::37:d001]:41641 in 22ms
This necessarily moves code up from magicsock to wgengine, so we can
look at the actual wireguard config.
Fixes#1564
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add proto to flowtrack.Tuple.
Add types/ipproto leaf package to break a cycle.
Server-side ACL work remains.
Updates #1516
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Mash up some code from ffcli and std's flag package to make a default
usage func that's super explicit for those not familiar with the Go
style flags. Only show double hyphens in usage text (but still accept both),
and show default values, and only show the proper usage of boolean flags.
Fixes#1353Fixes#1529
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
"public IP" is defined as an IP address configured on the exit node
itself that isn't in the list of forbidden ranges (RFC1918, CGNAT,
Tailscale).
Fixes#1522.
Signed-off-by: David Anderson <danderson@tailscale.com>
The direct client already logs it in JSON form. Then it's immediately
logged again in an unformatted dump, so this removes that unformatted
one.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This reverts the revert commit 84aba349d9.
And changes us to use inet.af/netstack.
Updates #1518
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In f45a9e291b (2021-03-04), I tried to bump CurrentMapRequestVersion
to 12 but only documented the meaning of 12 but forgot to actually
increase it from 11.
Mapver 11 was added in ea49b1e811 (2021-03-03).
Fix this in its own commit so we can cherry-pick it to the 1.6 release
branch.
Should help iOS battery life on NEProvider.wake/skip events
with useless route updates that shouldn't cause re-STUNs.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We strip them control-side anyway, and we already strip IPv4 link
local, so there's no point uploading them. And iOS has a ton of them,
which results in somewhat silly amount of traffic in the MapRequest.
We'll be doing same-LAN-inter-tailscaled link-local traffic a
different way, with same-LAN discovery.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
To atone for 1d7f9d5b4a, the revert of 4224b3f731.
At least it's fast again, even if it's shelling out to cmd.exe (once now).
Updates #1478
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
gVisor fixed their google/gvisor#1446 so we can include gVisor mode
on 32-bit machines.
A few minor upstream API changes, as normal.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We basically already had the RIB-parsing Go code for this in both
net/interfaces and wgengine/monitor, for other reasons.
Fixes#1426Fixes#1471
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This change makes it impossible to set your own IP address as the exit node for this system.
Fixes#1489
Signed-off-by: Christine Dodrill <xe@tailscale.com>
So a region can be used if needed, but won't be STUN-probed or used as
its home.
This gives us another possible debugging mechanism for #1310, or can
be used as a short-term measure against DERP flip-flops for people
equidistant between regions if our hysteresis still isn't good enough.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
IP forwarding is not required when advertising a machine's local IPs
over Tailscale.
Fixes#1435.
Signed-off-by: David Anderson <danderson@tailscale.com>
This reverts commit 08949d4ef1.
I think this code was aspirational. There's no code that sets up the
appropriate NAT code using pfctl/etc. See #911 and #1475.
Updates #1475
Updates #911
No server support yet, but we want Tailscale 1.6 clients to be able to respond
to them when the server can do it.
Updates #1310
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
There was a logical race where Conn.Rebind could acquire the
RebindingUDPConn mutex, close the connection, fail to rebind, release
the mutex, and then because the mutex was no longer held, ReceiveIPv4
wouldn't retry reads that failed with net.ErrClosed, letting that
error back to wireguard-go, which would then stop running that receive
IP goroutine.
Instead, keep the RebindingUDPConn mutex held for the entirety of the
replacement in all cases.
Updates tailscale/corp#1289
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
interfaces.State.String tries to print a concise summary of the
network state, removing any interfaces that don't have any or any
interesting IP addresses. On macOS and iOS, for instance, there are a
ton of misc things.
But the link monitor based its are-there-changes decision on
interfaces.State.Equal, which just used reflect.DeepEqual, including
comparing all the boring interfaces. On macOS, when turning wifi on or off, there
are a ton of misc boring interface changes, resulting in hitting an earlier
check I'd added on suspicion this was happening:
[unexpected] network state changed, but stringification didn't
This fixes that by instead adding a new
interfaces.State.RemoveUninterestingInterfacesAndAddresses method that
does, uh, that. Then use that in the monitor. So then when Equal is
used later, it's DeepEqualing the already-cleaned version with only
interesting interfaces.
This makes cmd/tailscaled debug --monitor much less noisy.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Windows was only running the localapi on the debug port which was a
stopgap at the time while doing peercreds work. Removed that, and
wired it up correctly, with some more docs.
More clean-up to do after 1.6, moving the localhost TCP auth code into
the peercreds package. But that's too much for now, so the docs will
have to suffice, even if it's at a bit of an awkward stage with the
newly-renamed "NotWindows" field, which still isn't named well, but
it's better than its old name of "Unknown" which hasn't been accurate
since unix sock peercreds work anyway.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
So the control server can test whether a client's actually present.
Most clients are over HTTP/2, so these pings (to the same host) are
super cheap.
This mimics the earlier goroutine dump mechanism.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Previously the CLI could only find the HTTP auth token when running
the CLI outside the sandbox, not like
/Applications/Tailscale.app/Contents/MacOS/Tailscale when that was
from the App Store.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The debub subcommand was moved in
6254efb9ef because the monitor brought
in tons of dependencies to the cmd/tailscale binary, but there wasn't
any need to remove the whole subcommand itself.
Add it back, with a tool to dump the local daemon's goroutines.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Not beautiful, but I'm debugging connectivity problems on
NEProvider.sleep+wake and need more clues.
Updates #1426
Updates tailscale/corp#1289
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is important because some of those v6 sockets are actually
dual-stacked sockets, so this is our only chance of discovering
some services.
Fixes#1443.
Signed-off-by: David Anderson <danderson@tailscale.com>
And if we have over 10,000 CGNAT routes, just route the entire
CGNAT range. (for the hello test server)
Fixes#1450
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The Engine.LinkChange method was recently removed in
e3df29d488 while misremembering how
Android's link state mechanism worked.
Rather than do some last minute rearchitecting of link state on
Android before Tailscale 1.6, restore the old Engine.LinkChange hook
for now so the Android client doesn't need any changes. But change how
it's implemented to instead inject an event into the link monitor.
Fixes#1427
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is necessary because either protocol can be disabled globally by a
Windows registry policy, at which point trying to touch that address
family results in "Element not found" errors. This change skips programming
address families that Windows tell us are unavailable.
Fixes#1396.
Signed-off-by: David Anderson <danderson@tailscale.com>
Not great, but lets people working on new ports get going more quickly
without having to do everything up front.
As the link monitor is getting used more, I felt bad having a useless
implementation.
Updates #815
Updates #1427
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
DefaultRouteInterface was previously guarded by build tags such that
it was only accessible to tailscaled-on-macos, but there was no reason
for that. It runs fine in the sandbox and gives better default info,
so merge its file into interfaces_darwin.go.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We used to allow that, but now it just crashes.
Separately I need to figure out why it got into this path at all,
which is #1416.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Tor has a location-hidden service feature that enables users to host services
from inside the Tor network. Each of these gets a unique DNS name that ends with
.onion. As it stands now, if a misbehaving application somehow manages to make
a .onion DNS request to our DNS server, we will forward that to the DNS server,
which could leak that to malicious third parties. See the recent bug Brave had
with this[1] for more context.
RFC 7686 suggests that name resolution APIs and libraries MUST respond with
NXDOMAIN unless they can actually handle Tor lookups. We can't handle .onion
lookups, so we reject them.
[1]: https://twitter.com/albinowax/status/1362737949872431108Fixestailscale/corp#1351
Signed-off-by: Christine Dodrill <xe@tailscale.com>
Part of overall effort to clean up, unify, use link monitoring more,
and make Tailscale quieter when all networks are down. This is especially
bad on macOS where we can get killed for not being polite it seems.
(But we should be polite in any case)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Don't use os.NewFile or (*os.File).Close on the AF_ROUTE socket. It
apparently does weird things to the fd and at least doesn't seem to
close it. Just use the unix package.
The test doesn't actually fail reliably before the fix, though. It
was an attempt. But this fixes the integration tests.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Prior to e3df29d488, the Engine.SetLinkChangeCallback fired
immediately, even if there was no change. The ipnlocal code apparently
depended on that, and it broke integration tests (which live in
another repo). So mimic the old behavior and call the ipnlocal
callback immediately at init.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is a fork of wireguard-windows's firewall package, with
the firewall rules adjusted to better line up with tailscale's
needs.
The package was taken from commit 3cc76ed5f222ec82748ef3bd8c41d4b059e28cdb
in our fork of wireguard-go.
Signed-off-by: David Anderson <danderson@tailscale.com>
Gets it out of wgengine so the Engine isn't responsible for being a
callback registration hub for it.
This also removes the Engine.LinkChange method, as it's no longer
necessary. The monitor tells us about changes; it doesn't seem to
need any help. (Currently it was only used by Swift, but as of
14dc790137 we just do the same from Go)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And add a --socks5-server flag.
And fix a race in SOCKS5 replies where the response header was written
concurrently with the copy from the backend.
Co-authored with Naman Sood.
Updates #707
Updates #504
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Previously tailscaled on macOS was running "/sbin/route monitor" as a
child process, but child processes aren't allowed in the Network
Extension / App Store sandbox. Instead, just do what "/sbin/route monitor"
itself does: unix.Socket(unix.AF_ROUTE, unix.SOCK_RAW, 0) and read that.
We also parse it now, but don't do anything with the parsed results yet.
We will over time, as we have with Linux netlink messages over time.
Currently any message is considered a signal to poll and see what changed.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Currently it assumes exactly 1 registered callback. This changes it to
support 0, 1, or more than 1.
This is a step towards plumbing wgengine/monitor into more places (and
moving some of wgengine's interface state fetching into monitor in a
later step)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For option (d) of #1405.
For an HTTPS request of /bootstrap-dns, this returns e.g.:
{
"log.tailscale.io": [
"2600:1f14:436:d603:342:4c0d:2df9:191b",
"34.210.105.16"
],
"login.tailscale.com": [
"2a05:d014:386:203:f8b4:1d5a:f163:e187",
"3.121.18.47"
]
}
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
UIs need to see the full unedited netmap in order to know what exit nodes they
can offer to the user.
Signed-off-by: David Anderson <danderson@tailscale.com>
* move probing out of netcheck into new net/portmapper package
* use PCP ANNOUNCE op codes for PCP discovery, rather than causing
short-lived (sub-second) side effects with a 1-second-expiring map +
delete.
* track when we heard things from the router so we can be less wasteful
in querying the router's port mapping services in the future
* use portmapper from magicsock to map a public port
Fixes#1298Fixes#1080Fixes#1001
Updates #864
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Importing the non-main package was missing some dependencies that
"go mod tidy" would then cleanup. Also added a non-ignore build tag to
avoid other tools getting upset about importing a main package.
Signed-off-by: Filippo Valsorda <hi@filippo.io>
$ GOOS=openbsd GOARCH=arm64 go install tailscale.com/cmd/...@latest
pkg/mod/github.com/kr/pty@v1.1.4-0.20190131011033-7dc38fb350b1/pty_openbsd.go:24:10: undefined: ptmget
pkg/mod/github.com/kr/pty@v1.1.4-0.20190131011033-7dc38fb350b1/pty_openbsd.go:25:34: undefined: ioctl_PTMGET
"go mod tidy" did some unrelated work in go.sum, maybe because it was
not run with Go 1.16 before.
Signed-off-by: Filippo Valsorda <hi@filippo.io>
This makes cidrDiff do as much as possible before failing, and makes a
delete of an already-deleted rule be a no-op. We should never do this
ourselves, but other things on the system can, and this should help us
recover a bit.
Also adds the start of root-requiring tests.
TODO: hook into wgengine/monitor and notice when routes are changed
behind our back, and invalidate our routes map and re-read from
kernel (via the ip command) at least on the next reconfig call.
Updates tailscale/corp#1338
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And open up socket permissions like Linux, now that we know who
connections are from.
This uses the new inet.af/peercred that supports Linux and Darwin at
the moment.
Fixes#1347Fixes#1348
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Tangentially related to #987, #177, #594, #925, #505
Motivated by rebooting a launchd-controlled tailscaled and it going
into SetNetworkUp(false) mode immediately because there really is no
network up at system boot, but then it got stuck in that paused state
forever, without a monitor implementation.
The interface.State logging tried to only log interfaces which had
interesting IPs, but the what-is-interesting checks differed between
the code that gathered the interface names to print and the printing
of their addresses.
When a handshake race occurs, a queued data packet can get lost.
TestTwoDevicePing expected that the very first data packet would arrive.
This caused occasional flakes.
Change TestTwoDevicePing to repeatedly re-send packets
and succeed when one of them makes it through.
This is acceptable (vs making WireGuard not drop the packets)
because this only affects communication with extremely old clients.
And those extremely old clients will eventually connect,
because the kernel will retry sends on timeout.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We modified the standard net package to not allocate a *net.UDPAddr
during a call to (*net.UDPConn).ReadFromUDP if the caller's use
of the *net.UDPAddr does not cause it to escape.
That is https://golang.org/cl/291390.
This is the companion change to magicsock.
There are two changes required.
First, call ReadFromUDP instead of ReadFrom, if possible.
ReadFrom returns a net.Addr, which is an interface, which always allocates.
Second, reduce the lifetime of the returned *net.UDPAddr.
We do this by immediately converting it into a netaddr.IPPort.
We left the existing RebindingUDPConn.ReadFrom method in place,
as it is required to satisfy the net.PacketConn interface.
With the upstream change and both of these fixes in place,
we have removed one large allocation per packet received.
name old time/op new time/op delta
ReceiveFrom-8 16.7µs ± 5% 16.4µs ± 8% ~ (p=0.310 n=5+5)
name old alloc/op new alloc/op delta
ReceiveFrom-8 112B ± 0% 64B ± 0% -42.86% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
ReceiveFrom-8 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.008 n=5+5)
Co-authored-by: Sonia Appasamy <sonia@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
addrSet maintained duplicate lists of netaddr.IPPorts and net.UDPAddrs.
Unify to use the netaddr type only.
This makes (*Conn).ReceiveIPvN a bit uglier,
but that'll be cleaned up in a subsequent commit.
This is preparatory work to remove an allocation from ReceiveIPv4.
Co-authored-by: Sonia Appasamy <sonia@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
I based my estimation of the required timeout based on locally
observed behavior. But CI machines are worse than my local machine.
16s was enough to reduce flakiness but not eliminate it. Bump it up again.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Build tags have been updated to build native Apple M1 binaries, existing build
tags for ios have been changed from darwin,arm64 to ios,arm64.
With this change, running go build cmd/tailscale{,d}/tailscale{,d}.go on an Apple
machine with the new processor works and resulting binaries show the expected
architecture, e.g. tailscale: Mach-O 64-bit executable arm64.
Tested using go version go1.16beta1 darwin/arm64.
Updates #943
Signed-off-by: moncho <50428+moncho@users.noreply.github.com>
It only affects 'go install ./...', etc, and only on darwin/arm64 (M1 Macs) where
the go-ole package doesn't compile.
No need to build it.
Updates #943
This was in place because retrieved allowed_ips was very expensive.
Upstream changed the data structure to make them cheaper to compute.
This commit is an experiment to find out whether they're now cheap enough.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We removed the "fast retry" code from our wireguard-go fork.
As a result, pings can take longer to transit when retries are required.
Allow that.
Fixes#1277
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The fix can make this test run unconditionally.
This moves code from 5c619882bc for
testability but doesn't fix it yet. The #1282 problem remains (when I
wrote its wake-up mechanism, I forgot there were N DERP readers
funneling into 1 UDP reader, and the code just isn't correct at all
for that case).
Also factor out some test helper code from BenchmarkReceiveFrom.
The refactoring in magicsock.go for testability should have no
behavior change.
If no exit node is specified, the filter must still run to remove
offered default routes from all peers.
Signed-off-by: David Anderson <danderson@tailscale.com>
This one alone doesn't modify the global dependency map much
(depaware.txt if anything looks slightly worse), but it leave
controlclient as only containing NetworkMap:
bradfitz@tsdev:~/src/tailscale.com/ipn$ grep -F "controlclient." *.go
backend.go: NetMap *controlclient.NetworkMap // new netmap received
fake_test.go: b.notify(Notify{NetMap: &controlclient.NetworkMap{}})
fake_test.go: b.notify(Notify{NetMap: &controlclient.NetworkMap{}})
handle.go: netmapCache *controlclient.NetworkMap
handle.go:func (h *Handle) NetMap() *controlclient.NetworkMap {
Once that goes into a leaf package, then ipn doesn't depend on
controlclient at all, and then the client gets smaller.
Updates #1278
Upstream wireguard-go decided to use errors.Is(err, net.ErrClosed)
instead of checking the error string.
It also provided an unsafe linknamed version of net.ErrClosed
for clients running Go 1.15. Switch to that.
This reduces the time required for the wgengine/magicsock tests
on my machine from ~35s back to the ~13s it was before
456cf8a376.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Magicsock started dropping all traffic internally when Tailscale is
shut down, to avoid spurious wireguard logspam. This made the benchmark
not receive anything. Setting a dummy private key is sufficient to get
magicsock to pass traffic for benchmarking purposes.
Fixes#1270.
Signed-off-by: David Anderson <danderson@tailscale.com>
And move a couple other types down into leafier packages.
Now cmd/tailscale doesn't bring in netlink, magicsock, wgengine, etc.
Fixes#1181
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Unused for now, but I want to backport this commit to 1.4 so 1.6 can
start sending these and then at least 1.4 logs will stringify nicely.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This reverts commit da4ec54756.
Since v6 got disabled for Windows nodes, I need the debug flag back
to figure out why it was broken.
Signed-off-by: David Anderson <danderson@tailscale.com>
Use tb.Cleanup to simplify both the API and the implementation.
One behavior change: When the number of goroutines shrinks, don't log.
I've never found these logs to be useful, and they frequently add noise.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The code was using a C "int", which is a signed 32-bit integer.
That means some valid IP addresses were negative numbers.
(In particular, the default router address handed out by AT&T
fiber: 192.168.1.254. No I don't know why they do that.)
A negative number is < 255, and so was treated by the Go code
as an error.
This fixes the unit test failure:
$ go test -v -run=TestLikelyHomeRouterIPSyscallExec ./net/interfaces
=== RUN TestLikelyHomeRouterIPSyscallExec
interfaces_darwin_cgo_test.go:15: syscall() = invalid IP, false, netstat = 192.168.1.254, true
--- FAIL: TestLikelyHomeRouterIPSyscallExec (0.00s)
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
Previously we disabled v6 support if the disable_policy knob was
missing in /proc, but some kernels support policy routing without
exposing the toggle. So instead, treat disable_policy absence as a
"maybe", and make the direct `ip -6 rule` probing a bit more
elaborate to compensate.
Fixes#1241.
Signed-off-by: David Anderson <danderson@tailscale.com>
This is mostly code movement from the wireguard-go repo.
Most of the new wgcfg package corresponds to the wireguard-go wgcfg package.
wgengine/wgcfg/device{_test}.go was device/config{_test}.go.
There were substantive but simple changes to device_test.go to remove
internal package device references.
The API of device.Config (now wgcfg.DeviceConfig) grew an error return;
we previously logged the error and threw it away.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We log lines like this:
c.logf("[v1] magicsock: disco: %v->%v (%v, %v) sent %v", c.discoShort, dstDisco.ShortString(), dstKey.ShortString(), derpStr(dst.String()), disco.MessageSummary(m))
The leading [v1] causes it to get unintentionally rate limited.
Until we have a proper fix, work around it.
Fixes#1216
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Signed-off-by: Sonia Appasamy <sonia@tailscale.com>
Consolidates the node display name logic from each of the clients into
tailcfg.Node. UI clients can use these names directly, rather than computing
them independently.
On Windows, configureInterface starts a goroutine reconfiguring the
Windows firewall.
But if configureInterface fails later, that goroutine kept running and
likely failing forever, spamming logs. Make it stop quietly if its
launching goroutine filed.
Rewrite log lines on the fly, based on the set of known peers.
This enables us to use upstream wireguard-go logging,
but maintain the Tailscale-style peer public key identifiers
that the rest of our systems (and people) expect.
Fixes#1183
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Continuation of earlier two umask changes,
5611f290eb and
d6e9fb1df0.
This change mostly affects us, running tailscaled as root by hand (wit
a umask of 0077), not under systemd. End users running tailscaled
under systemd won't have a umask.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Also, don't try to use IPv6 LinkLocalUnicast addresses for now. Like endpoints
exchanged with control, we share them but don't yet use them.
Updates #1172
c8c493f3d9 made it always say
`created=false` which scared me when I saw it, as that would've implied
things were broken much worse. Fortunately the logging was just wrong.
DstToString is used in two places in wireguard-go: Logging and uapi.
We are switching to use uapi for wireguard-go config.
To preserve existing behavior, we need the full set of addrs.
And for logging, having the full set of addrs seems useful.
(The Addrs method itself is slated for removal. When that happens,
the implementation will move to DstToString.)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
To save CPU and wakeups, don't run the DERP cleanup timer regularly
unless there is a non-home DERP connection open.
Also eliminates the goroutine, moving to a time.AfterFunc.
Updates #1034
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This reverts commit 08baa17d9a.
It caused deadlocks due to lock ordering violations.
It was not the right fix, and thus should simply be reverted
while we look for the right fix (if we haven't already found it
in the interim; we've fixed other logging-after-test issues).
Fixes#1161
context.cancelCtx.Done involves a mutex and isn't as cheap as I
previously assumed. Convert the donec method into a struct field and
store the channel value once. Our one magicsock.Conn gets one pointer
larger, but it cuts ~1% of the CPU time of the ReceiveFrom benchmark
and removes a bubble from the --svg output :)
+ add a test for parseAndRemoveLogLevel()
+ add a test for drainPendingMessages()
+ test JSON log encoding including several special cases
Other tests frequently send logs but a) don't check the result and
b) do so by happenstance, such that the code in encode() was not
consistently being exercised and leading to spurious changes in
code coverage. These tests attempt to more systematically test
the logging function.
This is the second attempt to add these tests, the first attempt
(in https://github.com/tailscale/tailscale/pull/1114) had two issues:
1. httptest.NewServer creates multiple goroutine handlers, and
logtail uses goroutines to upload, but the first version had no
locking in the server to guard this.
Moved data handling into channels to get synchronization.
2. The channel to notify the test of the arrival of data had a depth
of 1, in cases where the Logger sent multiple uploads it would
block the server.
This resulted in the first iteration of these tests being flaky,
and we reverted it.
This new version of the tests has passed with
go test -race -count=10000
and seems solid.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
This test serves two purposes:
+ check that Write() returns an error if the tstun has been
closed.
+ ensure that the close-related code in tstun is exercised in
a test case. We were getting spurious code coverage adds/drops
based on timing of when the test case finished.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Windows has a low resolution timer.
Some of the tests assumed that unblock takes effect immediately.
Consider:
t := time.Now()
elapsed := time.Now().After(t)
It seems plausible that elapsed should always be true.
However, with a low resolution timer, that might fail.
Change time.Now().After to !time.Now().Before,
so that unblocking always takes effect immediately.
Fixes#873.
22507adf54 stopped relying on
our fork of wireguard-go's UpdateDst callback.
As a result, we can unwind that code,
and the extra return value of ReceiveIPv{4,6}.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
TwoDevicePing is explicitly testing the behavior of the legacy codepath, everything
else is happy to assume that code no longer exists.
Signed-off-by: David Anderson <danderson@tailscale.com>
Previously, this benchmark relied on behavior of the legacy
receive codepath, which I changed in 22507adf. With this
change, the benchmark instead relies on the new active discovery
path.
Signed-off-by: David Anderson <danderson@tailscale.com>
This prevents us from continuing to do unnecessary work
(including logging) after the connection has closed.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This adds a new IP Protocol type, TSMP on protocol number 99 for
sending inter-tailscale messages over WireGuard, currently just for
why a peer rejects TCP SYNs (ACL rejection, shields up, and in the
future: nothing listening, something listening on that port but wrong
interface, etc)
Updates #1094
Updates tailscale/corp#1185
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This partially reverts d6e9fb1df0, which modified the permissions
on the tailscaled Unix socket and thus required "sudo tailscale" even
for "tailscale status".
Instead, open the permissions back up (on Linux only) but have the
server look at the peer creds and only permit read-only actions unless
you're root.
In the future we'll also have a group that can do mutable actions.
On OpenBSD and FreeBSD, the permissions on the socket remain locked
down to 0600 from d6e9fb1df0.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Commit 68ddf1 removed code that reads
`SOFTWARE\Tailscale IPN\SearchList` registry value. But the commit
left code that writes that value.
So now this package writes and never reads the value.
Remove the code to stop pointless work.
Updates #853
Signed-off-by: Alex Brainman <alex.brainman@gmail.com>
This eliminates a dependency on wgcfg.Endpoint,
as part of the effort to eliminate our wireguard-go fork.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This makes connectivity between ancient and new tailscale nodes slightly
worse in some cases, but only in cases where the ancient version would
likely have failed to get connectivity anyway.
Signed-off-by: David Anderson <danderson@tailscale.com>
Here's an example log line in the new format:
[RATE LIMITED] format string "open-conn-track: timeout opening %v; no associated peer node" (example: "open-conn-track: timeout opening ([ip] => [ip]); no associated peer node")
This should make debugging logging issues a bit easier, and give more
context as to why something was rate limited. This change was proposed
in a comment on #1110.
Signed-off-by: Smitty <me@smitop.com>
This reverts commit e4f53e9b6f.
At least two of these tests are flakey, reverting until they can be
made more robust.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
This is what every other DNS resolver I could find does, so tsdns
should do it to. This also helps avoid weird error messages about
non-existent records being unimplemented, and thus fixes#848.
Signed-off-by: Smitty <me@smitop.com>
* logtail: test parseAndRemoveLogLevel()
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
* logtail: test JSON log encoding.
Expand TestUploadMessages to also exercise the encoding functions
in logtail, like JSON logging and timestamps.
Other tests frequently send logs but a) don't check the result and
b) do so by happenstance, such that the lines in encode() were not
consistently being exercised and leading to spurious changes in
code coverage.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
* logtail: add a test for drainPendingMessages
Make the client buffer some messages before the upload server
becomes available.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
* logtail: use %q, raw strings, and io.WriteString
%q escapes binary characters for us.
raw strings avoid so much backslash escaping
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Right now TestFastShutdown tries to upload logs to localhost:1234,
which will most likely respond with an error. However if one has an
actual service running on port 1234, it would receive a connection
attempting to POST every time the unit test runs.
Start a local server and direct the upload there instead.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
In sendDiscoMessage there is a check of whether the connection is
closed, which is not being reliably exercised by other tests.
This shows up in code coverage reports, the lines of code in
sendDiscoMessage are alternately added and subtracted from
code coverage.
Add a test to specifically exercise and verify this code path.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Start an HTTP server to accept POST requests, and upload some logs to
it. Check that uploaded logs were received.
Code in logtail:drainPending was not being reliably exercised by other
tests. This shows up in code coverage reports, as lines of code in
drainPending are alternately added and subtracted from code coverage.
This test will reliably exercise and verify this code.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
All cases in lessThan are not reliably exercised by other tests.
This shows up in code coverage metrics as lines in lessThan are
alternately added and removed from coverage.
Add a test case to systematically test all conditions.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
In derpWriteChanOfAddr when we call derphttp.NewRegionClient(),
there is a check of whether the connection is already errored and
if so it returns before grabbing the lock. The lock might already
be held and would be a deadlock.
This corner case is not being reliably exercised by other tests.
This shows up in code coverage reports, the lines of code in
derpWriteChanOfAddr are alternately added and subtracted from
code coverage.
Add a test to specifically exercise this code path, and verify that
it doesn't deadlock.
This is the best tradeoff I could come up with:
+ the moment code calls Err() to check if there is an error, we
grab the lock to make sure it would deadlock if it tries to grab
the lock itself.
+ if a new call to Err() is added in this code path, only the
first one will be covered and the rest will not be tested.
+ this test doesn't verify whether code is checking for Err() in
the right place, which ideally I guess it would.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Users in Amsterdam (as one example) were flipping back and forth
between equidistant London & Frankfurt relays too much.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
netaddr.IP no longer allocates, so don't need a cache or all its associated
code/complexity.
This totally removes groupcache/lru from the deps.
Also go mod tidy.
* wengine/netstack: bump gvisor to latest version
Signed-off-by: Naman Sood <naman@tailscale.com>
* update dependencies
Signed-off-by: Naman Sood <naman@tailscale.com>
* Don't change hardcoded IP
Signed-off-by: Naman Sood <naman@tailscale.com>
Not usefully functional yet (mostly a proof of concept), but getting
it submitted for some work @namansood is going to do atop this.
Updates #707
Updates #634
Updates #48
Updates #835
* show DNS name over hostname, removing domain's common MagicDNS suffix.
only show hostname if there's no DNS name.
but still show shared devices' MagicDNS FQDN.
* remove nerdy low-level details by default: endpoints, DERP relay,
public key. They're available in JSON mode still for those who need
them.
* only show endpoint or DERP relay when it's active with the goal of
making debugging easier. (so it's easier for users to understand
what's happening) The asterisks are gone.
* remove Tx/Rx numbers by default for idle peers; only show them when
there's traffic.
* include peers' owner login names
* add CLI option to not show peers (matching --self=true, --peers= also
defaults to true)
* sort by DNS/host name, not public key
* reorder columns
If any goroutine continues to use the logger in TestLocalLogLines
after the test finishes, the test panics.
The culprit for this was wireguard-go; the previous commit fixed that.
This commit adds suspenders: When the test is done, make logging calls
into no-ops.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The log lines that wireguard-go prints as it starts
and stops its worker routines are mostly noise.
They also happen after other work is completed,
which causes failures in some of the log testing packages.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This appears to have been the intent of the previous code,
but in practice, it only returned A records.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
To be honest I'm not fond of Golden Bytes tests like this, but
not so much as to want to rewrite the whole test. The DNS byte
format is essentially immutable at this point, the encoded bytes
aren't going to change. The rest of the test assumptions about
hostnames might, but we can fix that when it comes.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Previously, any change to endpoints or hostinfo (or hostinfo's
netinfo) would result in the long-running map request HTTP stream
being torn down and restarted, losing all compression context along
with it.
This change makes us instead send a lite map request (OmitPeers: true,
Stream: false) that doesn't subscribe to anything, and then the
coordination server knows to not close other streams for that node
when it recives a lite request.
Fixestailscale/corp#797
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
+ we don't need an exactly accurate count of the number of times each
time ran. Remove -covermode, the default "set" will be fine to just
track whether a given line ran at all.
+ add -benchtime=1x. We only need to run the benchmarks once.
+ -bench=. to match any character.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
We include -bench because some parts of the codebase, like
smallzstd, do not have regular unit tests but do have very
good benchmark tests that covers all functions.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
eccc167 introduced closeHandle which opened the handle,
but never closed it.
Windows handles should be closed.
Updates #921
Signed-off-by: Alex Brainman <alex.brainman@gmail.com>
Research in issue #1063 uncovered why tailscaled would fail with
ProtectClock enabled (it implicitly enabled DevicePolicy=closed).
This knowledge in turn also opens the door for locking down /dev
further, e.g. explicitly setting DevicePolicy=strict (instead of
closed), and making /dev private for the unit.
Additional possible future (or downstream) lockdown that can be done
is setting `PrivateDevices=true` (with `BindPaths=/dev/net/`), however,
systemd 233 or later is required for this, and tailscaled currently need
to work for systemd down to version 215.
Closes https://github.com/tailscale/tailscale/issues/1063
Signed-off-by: Frederik “Freso” S. Olesen <freso.dk@gmail.com>
Previously the client had heuristics to calculate which DNS search domains
to set, based on the peers' names. Unfortunately that prevented us from
doing some things we wanted to do server-side related to node sharing.
So, bump MapRequest.Version to 9 to signal that the client only uses the
explicitly configured DNS search domains and doesn't augment it with its own
list.
Updates tailscale/corp#1026
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is a replacement for the key-related parts
of the wireguard-go wgcfg package.
This is almost a straight copy/paste from the wgcfg package.
I have slightly changed some of the exported functions and types
to avoid stutter, added and tweaked some comments,
and removed some now-unused code.
To avoid having wireguard-go depend on this new package,
wgcfg will keep its key types.
We translate into and out of those types at the last minute.
These few remaining uses will be eliminated alongside
the rest of the wgcfg package.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
While not a full capability lockdown of the systemd unit, this still
improves sandboxing and security of the running process a good deal.
Signed-off-by: Frederik “Freso” S. Olesen <freso.dk@gmail.com>
The fallthrough happened to work in controlclient already due to the
/etc/os-release PRETTY_NAME default, but make it explicit so it
doesn't look like an accident.
Also add it to version/distro, even though nothing needs it yet.
The windows key timeout is longer than the wgengine watchdog timeout,
which means we never reach the timeout, instead the process exits.
Reduce the timeout so if we do hit it, at least the process continues.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
On Win10, there's a hardcoded GUID and this works.
On Win7, this GUID changes and we need to ask the tun for its
LUID and convert that from the GUID.
This commit uses the computed GUID that is placed in InterfaceName.
Diagnosed by Jason Donnenfeld. (Thanks!)
Log levels can now be specified with "[v1] " or "[v2] " substrings
that are then stripped and filtered at the final logger. This follows
our existing "[unexpected]" etc convention and doesn't require a
wholesale reworking of our logging at the moment.
cmd/tailscaled then gets a new --verbose=N flag to take a log level
that controls what gets logged to stderr (and thus systemd, syslog,
etc). Logtail is unaffected by --verbose.
This commit doesn't add annotations to any existing log prints. That
is in the next commit.
Updates #924
Updates #282
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For now, the server will only send v6 configuration to mapversion 8 clients
as part of an early-adopter program, while we verify that the functionality
is robust.
Signed-off-by: David Anderson <danderson@tailscale.com>
In practice, we already provide IPv6 endpoint addresses via netcheck,
and that address is likely to match a local address anyway (i.e. no NAT66).
The comment at that piece of the code mentions needing to figure out a
good priority ordering, but that only applies to non-active-discovery
clients, who already don't do anything with IPv6 addresses.
Signed-off-by: David Anderson <danderson@tailscale.com>
Lazy wg configuration now triggers if a peer has only endpoint
addresses (/32 for IPv4, /128 for IPv6). Subnet routers still
trigger eager configuration to avoid the need for a CIDR match
in the hot packet path.
Signed-off-by: David Anderson <danderson@tailscale.com>
This caused some confusion in issue #460, since usually raw format
strings aren't printed directly. Hopefully by directly logging that
they are intended to be raw format strings, this will be more clear.
Rate limited format strings now look like:
[RATE LIMITED] format string "control: sendStatus: %s: %v"
Closes#460.
Signed-off-by: Smitty <me@smitop.com>
The previous code used a lot of whole-function variables and shared
behavior that only triggered based on prior action from a single codepath.
Instead of that, move the small amounts of "shared" code into each switch
case.
Signed-off-by: David Anderson <danderson@tailscale.com>
Before, tailscaled would log every 10 seconds when the periodic noteRecvActivity
call happens. This is noisy, but worse it's misleading, because the message
suggests that the disco code is starting a lazy config run for a missing peer,
whereas in fact it's just an internal piece of keepalive logic.
With this change, we still log when going from 0->1 tunnel for the peer, but
not every 10s thereafter.
Signed-off-by: David Anderson <danderson@tailscale.com>
Addresses #964
Still to be done:
- Figure out the correct logging lines in util/systemd
- Figure out if we need to slip the systemd.Status function anywhere
else
- Log util/systemd errors? (most of the errors are of the "you cannot do
anything about this, but it might be a bad idea to crash the program if
it errors" kind)
Assistance in getting this over the finish line would help a lot.
Signed-off-by: Christine Dodrill <me@christine.website>
util/systemd: rename the nonlinux file to appease the magic
Signed-off-by: Christine Dodrill <me@christine.website>
util/systemd: fix package name
Signed-off-by: Christine Dodrill <me@christine.website>
util/systemd: fix review feedback from @mdlayher
Signed-off-by: Christine Dodrill <me@christine.website>
cmd/tailscale{,d}: update depaware manifests
Signed-off-by: Christine Dodrill <me@christine.website>
util/systemd: use sync.Once instead of func init
Signed-off-by: Christine Dodrill <me@christine.website>
control/controlclient: minor review feedback fixes
Signed-off-by: Christine Dodrill <me@christine.website>
{control,ipn,systemd}: fix review feedback
Signed-off-by: Christine Dodrill <me@christine.website>
review feedback fixes
Signed-off-by: Christine Dodrill <me@christine.website>
ipn: fix sprintf call
Signed-off-by: Christine Dodrill <me@christine.website>
ipn: make staticcheck less sad
Signed-off-by: Christine Dodrill <me@christine.website>
ipn: print IP address in connected status
Signed-off-by: Christine Dodrill <me@christine.website>
ipn: review feedback
Signed-off-by: Christine Dodrill <me@christine.website>
final fixups
Signed-off-by: Christine Dodrill <me@christine.website>
Upgrading staticcheck upgraded golang.org/x/sync
(one downside of mixing our tools in with our regular go.mod),
which introduced a new dependency via
https://go-review.googlesource.com/c/sync/+/251677
That CL could and probably should be written without runtime/debug,
but it's not clear to me that that is better at this moment
than simply accepting the additional package as a dependency.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
These ignore built files that don't exist anymore, and just serve
to clutter up the .gitignore file. (I was initially confused when
I saw those lines, since I (correctly) thought that the only
Tailscale binaries were tailscale and tailscaled):
- taillogin was removed in d052586
- relaynode was removed in a56e853
Signed-off-by: Smitty <me@smitop.com>
This is a repeat of commit 3aa68cd397
which was lost in a rework of version.sh.
git worktrees have a .git file rather than a .git directory, so building
in a worktree caused version.sh to generate an error.
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
After mapver 5's incremental netmap updates & user profiles, much of
the remaining bandwidth for streamed MapResponses were redundant,
unchanged PacketFilters. So make MapRequest.Version 6 mean that nil
means unchanged from the previous value.
Noticed these in MapResponses to clients.
MachineAuthorized was set true, but once we fix the coordination server
to zero out that field, then it can be omittted.
Add content length hints to headers.
The server can use these hints to more efficiently select buffers.
Stop attempting to compress tiny requests.
The bandwidth savings are negligible (and sometimes negative!),
and it makes extra work for the server.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
It appears some users have corrupted pref.conf files. Have LoadPrefs
treat these files as non-existent. This way tailscale will make user
login, and not crash.
Fixes#954
Signed-off-by: Alex Brainman <alex.brainman@gmail.com>
The cloner tool adds static checks that the Clone methods are up to
date, so failing to update Clone causes a compiler error.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
The cornerstone API is a more memory-efficient Unmarshal.
The savings come from re-using a json.Decoder.
BenchmarkUnmarshal-8 4016418 288 ns/op 8 B/op 1 allocs/op
BenchmarkStdUnmarshal-8 4189261 283 ns/op 184 B/op 2 allocs/op
It also includes a Bytes type to reduce allocations
when unmarshalling a non-hex-encoded JSON string into a []byte.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
likelyHomeRouterIPDarwinSyscall iterates through the list of routes,
looking for a private gateway, returning the first one it finds.
likelyHomeRouterIPDarwinExec does the same thing,
except that it returns the last one it finds.
As a result, when there are multiple gateways,
TestLikelyHomeRouterIPSyscallExec fails.
(At least, I think that that is what is happening;
I am going inferring from observed behavior.)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Please check if your bug is [already filed](https://github.com/tailscale/tailscale/issues).
Have an urgent issue? Let us know by emailing us at <support@tailscale.com>.
- type:textarea
id:what-happened
attributes:
label:What is the issue?
description:What happened? What did you expect to happen?
placeholder:oh no
validations:
required:true
- type:textarea
id:steps
attributes:
label:Steps to reproduce
description:What are the steps you took that hit this issue?
validations:
required:false
- type:textarea
id:changes
attributes:
label:Are there any recent changes that introduced the issue?
description:If so, what are those changes?
validations:
required:false
- type:dropdown
id:os
attributes:
label:OS
description:What OS are you using? You may select more than one.
multiple:true
options:
- Linux
- macOS
- Windows
- iOS
- Android
- Synology
- Other
validations:
required:false
- type:input
id:os-version
attributes:
label:OS version
description:What OS version are you using?
placeholder:e.g., Debian 11.0, macOS Big Sur 11.6, Synology DSM 7
validations:
required:false
- type:input
id:ts-version
attributes:
label:Tailscale version
description:What Tailscale version are you using?
placeholder:e.g., 1.14.4
validations:
required:false
- type:input
id:bug-report
attributes:
label:Bug report
description:Please run [`tailscale bugreport`](https://tailscale.com/kb/1080/cli/?q=Cli#bugreport) and share the bug identifier. The identifier is a random string which allows Tailscale support to locate your account and gives a point to focus on when looking for errors.
@@ -8,11 +8,12 @@ Private WireGuard® networks made easy
This repository contains all the open source Tailscale client code and
the `tailscaled` daemon and `tailscale` CLI tool. The `tailscaled`
daemon runs primarily on Linux; it also works to varying degrees on
FreeBSD, OpenBSD, Darwin, and Windows.
daemon runs on Linux, Windows and [macOS](https://tailscale.com/kb/1065/macos-variants/), and to varying degrees on FreeBSD, OpenBSD, and Darwin. (The Tailscale iOS and Android apps use this repo's code, but this repo doesn't contain the mobile GUI code.)
The Android app is at https://github.com/tailscale/tailscale-android
The Synology package is at https://github.com/tailscale/tailscale-synology
## Using
We serve packages for a variety of distros at
@@ -43,7 +44,7 @@ If your distro has conventions that preclude the use of
distro's way, so that bug reports contain useful version information.
We only guarantee to support the latest Go release and any Go beta or
release candidate builds (currently Go 1.15) in module mode. It might
release candidate builds (currently Go 1.18) in module mode. It might
work in earlier Go versions or in GOPATH mode, but we're making no
dev=flag.Bool("dev",false,"run in localhost development mode")
addr=flag.String("a",":443","server address")
addr=flag.String("a",":443","server HTTPS listen address, in form \":port\", \"ip:port\", or for IPv6 \"[ip]:port\". If the IP is omitted, it defaults to all interfaces.")
httpPort=flag.Int("http-port",80,"The port on which to serve HTTP. Set to -1 to disable. The listener is bound to the same IP (if any) as specified in the -a flag.")
stunPort=flag.Int("stun-port",3478,"The UDP port on which to serve STUN. The listener is bound to the same IP (if any) as specified in the -a flag.")
configPath=flag.String("c","","config file path")
certMode=flag.String("certmode","letsencrypt","mode for getting a cert. possible options: manual, letsencrypt")
certDir=flag.String("certdir",tsweb.DefaultCertDir("derper-certs"),"directory to store LetsEncrypt certs, if addr's port is :443")
hostname=flag.String("hostname","derp.tailscale.com","LetsEncrypt host name, if addr's port is :443")
logCollection=flag.String("logcollection","","If non-empty, logtail collection to log to")
runSTUN=flag.Bool("stun",false,"also run a STUN server")
runSTUN=flag.Bool("stun",true,"whether to run a STUN server. It will bind to the same IP (if any) as the --addr flag value.")
meshPSKFile=flag.String("mesh-psk-file",defaultMeshPSKFile(),"if non-empty, path to file containing the mesh pre-shared key file. It should contain some hex string; whitespace is trimmed.")
meshWith=flag.String("mesh-with","","optional comma-separated list of hostnames to mesh with; the server's own hostname can be in the list")
bootstrapDNS=flag.String("bootstrap-dns-names","","optional comma-separated list of hostnames to make available at /bootstrap-dns")
verifyClients=flag.Bool("verify-clients",false,"verify clients to this DERP server through a local tailscaled instance.")
acceptConnLimit=flag.Float64("accept-connection-limit",math.Inf(+1),"rate limit for accepting new connection")
acceptConnBurst=flag.Int("accept-connection-burst",math.MaxInt,"burst limit for accepting new connection")
// Copyright (c) 2022 Tailscale Inc & AUTHORS All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Admin commands.
packagecli
import(
"bytes"
"context"
"encoding/json"
"errors"
"flag"
"fmt"
"io"
"net/http"
"net/url"
"os"
"strings"
"github.com/peterbourgon/ff/v3/ffcli"
"tailscale.com/client/tailscale"
)
varadminCmd=&ffcli.Command{
Name:"admin",
Exec:runAdmin,
LongHelp:`"tailscale admin" contains admin commands to manage a Tailscale network.`,
FlagSet:(func()*flag.FlagSet{
fs:=newFlagSet("admin")
fs.StringVar(&adminArgs.apiBase,"api-server","https://api.tailscale.com","which Tailscale server instance to use. Ignored when --token-file is empty.")
fs.StringVar(&adminArgs.tokenFile,"token-file","","if non-empty, filename containing API token to use. If empty, authentication is done via the active Tailscale control plane connection.")
fs.StringVar(&adminArgs.tailnet,"tailnet","","Tailnet to query or edit. Required if token-file is used. Must be blank if token-file is blank, in which case the tailnet used is the same as the active tailnet.")
fs.BoolVar(&asJSON,"json",false,"if true, return ACL is JSON format. The default of false means to use the original HuJSON JSON superset form that allows comments and trailing commas.")
// Copyright (c) 2021 Tailscale Inc & AUTHORS All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
packagecli
import(
"bytes"
"context"
"crypto/tls"
"flag"
"fmt"
"log"
"net/http"
"os"
"strings"
"github.com/peterbourgon/ff/v3/ffcli"
"tailscale.com/atomicfile"
"tailscale.com/client/tailscale"
"tailscale.com/ipn"
"tailscale.com/version"
)
varcertCmd=&ffcli.Command{
Name:"cert",
Exec:runCert,
ShortHelp:"get TLS certs",
ShortUsage:"cert [flags] <domain>",
FlagSet:(func()*flag.FlagSet{
fs:=newFlagSet("cert")
fs.StringVar(&certArgs.certFile,"cert-file","","output cert file or \"-\" for stdout; defaults to DOMAIN.crt if --cert-file and --key-file are both unset")
fs.StringVar(&certArgs.keyFile,"key-file","","output cert file or \"-\" for stdout; defaults to DOMAIN.key if --cert-file and --key-file are both unset")
fs.BoolVar(&certArgs.serve,"serve-demo",false,"if true, serve on port :443 using the cert as a demo, instead of writing out the files to disk")
returnfmt.Errorf("failed to connect to local tailscaled process; is the Tailscale service running?")
case"darwin":
returnfmt.Errorf("failed to connect to local Tailscale service; is Tailscale running?")
case"linux":
returnfmt.Errorf("failed to connect to local tailscaled; it doesn't appear to be running (sudo systemctl start tailscaled ?)")
}
returnfmt.Errorf("failed to connect to local tailscaled process; it doesn't appear to be running")
}
returnfmt.Errorf("failed to connect to local tailscaled (which appears to be running as %v, pid %v). Got error: %w",foundProc.Executable(),foundProc.Pid(),origErr)
flag.Var(flagtype.PortValue(&args.port,magicsock.DefaultPort),"port","UDP port to listen on for WireGuard and peer-to-peer traffic; 0 means automatically select")
flag.StringVar(&args.statepath,"state",paths.DefaultTailscaledStateFile(),"path of state file")
flag.StringVar(&args.socksAddr,"socks5-server","",`optional [ip]:port to run a SOCK5 server (e.g. "localhost:1080")`)
flag.StringVar(&args.httpProxyAddr,"outbound-http-proxy-listen","",`optional [ip]:port to run an outbound HTTP proxy (e.g. "localhost:8080")`)
flag.StringVar(&args.tunname,"tun",defaultTunName(),`tunnel interface name; use "userspace-networking" (beta) to not use TUN`)
flag.Var(flagtype.PortValue(&args.port,0),"port","UDP port to listen on for WireGuard and peer-to-peer traffic; 0 means automatically select")
flag.StringVar(&args.statepath,"state",paths.DefaultTailscaledStateFile(),"absolute path of state file; use 'kube:<secret-name>' to use Kubernetes secrets or 'arn:aws:ssm:...' to store in AWS SSM; use 'mem:' to not store state and register as an emphemeral node. If empty and --statedir is provided, the default is <statedir>/tailscaled.state")
flag.StringVar(&args.statedir,"statedir","","path to directory for storage of config state, TLS certs, temporary incoming Taildrop files, etc. If empty, it's derived from --state when possible.")
flag.StringVar(&args.socketpath,"socket",paths.DefaultTailscaledSocket(),"path of the service unix socket")
flag.StringVar(&args.birdSocketPath,"bird-socket","","path of the bird unix socket")
flag.BoolVar(&printVersion,"version",false,"print version information and exit")
err:=fixconsole.FixConsoleIfNeeded()
iferr!=nil{
log.Fatalf("fixConsoleOutput: %v",err)
iflen(os.Args)>1{
sub:=os.Args[1]
iffp,ok:=subCommands[sub];ok{
if*fp==nil{
log.SetFlags(0)
log.Fatalf("%s not available on %v",sub,runtime.GOOS)
log.Fatalf("tailscaled is broken on macOS with go1.18 due to upstream bug https://github.com/golang/go/issues/51759; use 1.18.1+ or Tailscale's Go fork")
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.