Compare commits

..

48 Commits

Author SHA1 Message Date
Lee Briggs
d998cf0837 cmd/containerboot: introduce TS_STATE env var
Fixes #12180
Fixed #13409

Signed-off-by: Lee Briggs <lee@leebriggs.co.uk>
2024-12-13 11:57:24 +00:00
Brad Fitzpatrick
73128e2523 ssh/tailssh: remove unused public key support
When we first made Tailscale SSH, we assumed people would want public
key support soon after. Turns out that hasn't been the case; people
love the Tailscale identity authentication and check mode.

In light of CVE-2024-45337, just remove all our public key code to not
distract people, and to make the code smaller. We can always get it
back from git if needed.

Updates tailscale/corp#25131
Updates golang/go#70779

Co-authored-by: Percy Wegmann <percy@tailscale.com>
Change-Id: I87a6e79c2215158766a81942227a18b247333c22
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-12-12 11:16:55 -08:00
Adrian Dewhurst
716cb37256 util/dnsname: use vizerror for all errors
The errors emitted by util/dnsname are all written at least moderately
friendly and none of them emit sensitive information. They should be
safe to display to end users.

Updates tailscale/corp#9025

Change-Id: Ic58705075bacf42f56378127532c5f28ff6bfc89
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
2024-12-12 10:29:36 -05:00
Joe Tsai
c9188d7760 types/bools: add IfElse (#14272)
The IfElse function is equivalent to the ternary (c ? a : b) operator
in many other languages like C. Unfortunately, this function
cannot perform short-circuit evaluation like in many other languages,
but this is a restriction that's not much different
than the pre-existing cmp.Or function.

The argument against ternary operators in Go is that
nested ternary operators become unreadable
(e.g., (c1 ? (c2 ? a : b) : (c2 ? x : y))).
But a single layer of ternary expressions can sometimes
make code much more readable.

Having the bools.IfElse function gives code authors the
ability to decide whether use of this is more readable or not.
Obviously, code authors will need to be judicious about
their use of this helper function.
Readability is more of an art than a science.

Updates #cleanup

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2024-12-11 10:55:33 -08:00
Joe Tsai
0045860060 types/iox: add function types for Reader and Writer (#14366)
Throughout our codebase we have types that only exist only
to implement an io.Reader or io.Writer, when it would have been
simpler, cleaner, and more readable to use an inlined function literal
that closes over the relevant types.

This is arguably more readable since it keeps the semantic logic
in place rather than have it be isolated elsewhere.

Note that a function literal that closes over some variables
is semantic equivalent to declaring a struct with fields and
having the Read or Write method mutate those fields.

Updates #cleanup

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2024-12-11 10:55:21 -08:00
Irbe Krumina
6e552f66a0 cmd/containerboot: don't attempt to patch a Secret field without permissions (#14365)
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-11 14:58:44 +00:00
Tom Proctor
f1ccdcc713 cmd/k8s-operator,k8s-operator: operator integration tests (#12792)
This is the start of an integration/e2e test suite for the tailscale operator.
It currently only tests two major features, ingress proxy and API server proxy,
but we intend to expand it to cover more features over time. It also only
supports manual runs for now. We intend to integrate it into CI checks in a
separate update when we have planned how to securely provide CI with the secrets
required for connecting to a test tailnet.

Updates #12622

Change-Id: I31e464bb49719348b62a563790f2bc2ba165a11b
Co-authored-by: Irbe Krumina <irbe@tailscale.com>
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-12-11 14:48:57 +00:00
Irbe Krumina
fa655e6ed3 cmd/containerboot: add more tests, check that egress service config only set on kube (#14360)
Updates tailscale/tailscale#14357

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-11 12:59:42 +00:00
Irbe Krumina
0cc071f154 cmd/containerboot: don't attempt to write kube Secret in non-kube environments (#14358)
Updates tailscale/tailscale#14354

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-11 10:56:12 +00:00
Bjorn Neergaard
8b1d01161b cmd/containerboot: guard kubeClient against nil dereference (#14357)
A method on kc was called unconditionally, even if was not initialized,
leading to a nil pointer dereference when TS_SERVE_CONFIG was set
outside Kubernetes.

Add a guard symmetric with other uses of the kubeClient.

Fixes #14354.

Signed-off-by: Bjorn Neergaard <bjorn@neersighted.com>
2024-12-11 09:52:56 +00:00
dependabot[bot]
d54cd59390 .github: Bump github/codeql-action from 3.27.1 to 3.27.6 (#14332)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.27.1 to 3.27.6.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](4f3212b617...aa57810251)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-10 15:15:11 -07:00
dependabot[bot]
fa28b024d6 .github: Bump actions/cache from 4.1.2 to 4.2.0 (#14331)
Bumps [actions/cache](https://github.com/actions/cache) from 4.1.2 to 4.2.0.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](6849a64899...1bd1e32a3b)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-10 14:32:04 -07:00
Mario Minardi
ea3d0bcfd4 prober,derp/derphttp: make dev-mode DERP probes work without TLS (#14347)
Make dev-mode DERP probes work without TLS. Properly dial port `3340`
when not using HTTPS when dialing nodes in `derphttp_client`. Skip
verifying TLS state in `newConn` if we are not running a prober.

Updates tailscale/corp#24635

Signed-off-by: Percy Wegmann <percy@tailscale.com>
Co-authored-by: Percy Wegmann <percy@tailscale.com>
2024-12-10 10:51:03 -07:00
Mike O'Driscoll
24b243c194 derp: add env var setting server send queue depth (#14334)
Use envknob to configure the per client send
queue depth for the derp server.

Fixes tailscale/corp#24978

Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
2024-12-10 08:58:27 -05:00
Tom Proctor
06c5e83c20 hostinfo: fix testing in container (#14330)
Previously this unit test failed if it was run in a container. Update the assert
to focus on exactly the condition we are trying to assert: the package type
should only be 'container' if we use the build tag.

Updates #14317

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-12-09 20:42:10 +00:00
Mike O'Driscoll
c2761162a0 cmd/stunc: enforce read timeout deadline (#14309)
Make argparsing use flag for adding a new
parameter that requires parsing.

Enforce a read timeout deadline waiting for response
from the stun server provided in the args. Otherwise
the program will never exit.

Fixes #14267

Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
2024-12-06 14:27:52 -05:00
Nick Khyl
f817860079 VERSION.txt: this is v1.79.0
Signed-off-by: Nick Khyl <nickk@tailscale.com>
2024-12-06 11:25:12 -06:00
Percy Wegmann
06a82f416f cmd,{get-authkey,tailscale}: remove unnecessary scope qualifier from OAuth clients
OAuth clients that were used to generate an auth_key previously
specified the scope 'device'. 'device' is not an actual scope,
the real scope is 'devices'. The resulting OAuth token ended up
including all scopes from the specified OAuth client, so the code
was able to successfully create auth_keys.

It's better not to hardcode a scope here anyway, so that we have
the flexibility of changing which scope(s) are used in the future
without having to update old clients.

Since the qualifier never actually did anything, this commit simply
removes it.

Updates tailscale/corp#24934

Signed-off-by: Percy Wegmann <percy@tailscale.com>
2024-12-06 09:29:07 -06:00
Brad Fitzpatrick
dc6728729e health: fix TestHealthMetric to pass on release branch
Fixes #14302

Change-Id: I9fd893a97711c72b713fe5535f2ccb93fadf7452
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-12-05 15:50:56 -08:00
Joe Tsai
a482dc037b logpolicy: cleanup options API and allow setting http.Client (#11503)
This package grew organically over time and
is an awful mix of explicitly declared options and
globally set parameters via environment variables and
other subtle effects.

Add a new Options and TransportOptions type to
allow for the creation of a Policy or http.RoundTripper
with some set of options.
The options struct avoids the need to add yet more
NewXXX functions for every possible combination of
ordered arguments.

The goal of this refactor is to allow specifying the http.Client
to use with the Policy.

Updates tailscale/corp#18177

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2024-12-05 15:50:24 -08:00
Andrew Lytvynov
66aa774167 cmd/gitops-pusher: default previousEtag to controlEtag (#14296)
If previousEtag is empty, then we assume control ACLs were not modified
manually and push the local ACLs. Instead, we defaulted to localEtag
which would be different if local ACLs were different from control.

AFAIK this was always buggy, but never reported?

Fixes #14295

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2024-12-05 15:00:54 -08:00
James Tucker
b37a478cac go.mod: bump x/net and dependencies
Pulling in upstream fix for #14201.

Updates #14201

Signed-off-by: James Tucker <james@tailscale.com>
2024-12-05 14:35:15 -08:00
Brad Fitzpatrick
87546a5edf cmd/derper: allow absent SNI when using manual certs and IP literal for hostname
Updates #11776

Change-Id: I81756415feb630da093833accc3074903ebd84a7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-12-05 09:56:48 -08:00
Irbe Krumina
614c612643 net/netcheck: preserve STUN port defaulting to 3478 (#14289)
Updates tailscale/tailscale#14287

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-05 13:21:03 +00:00
Tom Proctor
df94a14870 cmd/k8s-operator: don't error for transient failures (#14073)
Every so often, the ProxyGroup and other controllers lose an optimistic locking race
with other controllers that update the objects they create. Stop treating
this as an error event, and instead just log an info level log line for it.

Fixes #14072

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-12-05 12:11:22 +00:00
James Tucker
7f9ebc0a83 cmd/tailscale,net/netcheck: add debug feature to force preferred DERP
This provides an interface for a user to force a preferred DERP outcome
for all future netchecks that will take precedence unless the forced
region is unreachable.

The option does not persist and will be lost when the daemon restarts.

Updates tailscale/corp#18997
Updates tailscale/corp#24755

Signed-off-by: James Tucker <james@tailscale.com>
2024-12-04 16:52:56 -08:00
Brad Fitzpatrick
74069774be net/tstun: remove tailscaled_outbound_dropped_packets_total reason=acl metric for now
Updates #14280

Change-Id: Idff102b3d7650fc9dfbe0c340168806bdf542d76
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-12-04 08:55:54 -08:00
Irbe Krumina
2aac916888 cmd/{containerboot,k8s-operator},kube/kubetypes: kube Ingress L7 proxies only advertise HTTPS endpoint when ready (#14171)
cmd/containerboot,kube/kubetypes,cmd/k8s-operator: detect if Ingress is created in a tailnet that has no HTTPS

This attempts to make Kubernetes Operator L7 Ingress setup failures more explicit:
- the Ingress resource now only advertises HTTPS endpoint via status.ingress.loadBalancer.hostname when/if the proxy has succesfully loaded serve config
- the proxy attempts to catch cases where HTTPS is disabled for the tailnet and logs a warning

Updates tailscale/tailscale#12079
Updates tailscale/tailscale#10407

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-04 12:00:04 +00:00
Irbe Krumina
aa43388363 cmd/k8s-operator: fix a bunch of status equality checks (#14270)
Updates tailscale/tailscale#14269

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-04 06:46:51 +00:00
Oliver Rahner
cbf1a4efe9 cmd/k8s-operator/deploy/chart: allow reading OAuth creds from a CSI driver's volume and annotating operator's Service account (#14264)
cmd/k8s-operator/deploy/chart: allow reading OAuth creds from a CSI driver's volume and annotating operator's Service account

Updates #14264

Signed-off-by: Oliver Rahner <o.rahner@dke-data.com>
2024-12-03 17:00:40 +00:00
Tom Proctor
efdfd54797 cmd/k8s-operator: avoid port collision with metrics endpoint (#14185)
When the operator enables metrics on a proxy, it uses the port 9001,
and in the near future it will start using 9002 for the debug endpoint
as well. Make sure we don't choose ports from a range that includes
9001 so that we never clash. Setting TS_SOCKS5_SERVER, TS_HEALTHCHECK_ADDR_PORT,
TS_OUTBOUND_HTTP_PROXY_LISTEN, and PORT could also open arbitrary ports,
so we will need to document that users should not choose ports from the
10000-11000 range for those settings.

Updates #13406

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-12-03 15:02:42 +00:00
Irbe Krumina
9f9063e624 cmd/k8s-operator,k8s-operator,go.mod: optionally create ServiceMonitor (#14248)
* cmd/k8s-operator,k8s-operator,go.mod: optionally create ServiceMonitor

Adds a new spec.metrics.serviceMonitor field to ProxyClass.
If that's set to true (and metrics are enabled), the operator
will create a Prometheus ServiceMonitor for each proxy to which
the ProxyClass applies.
Additionally, create a metrics Service for each proxy that has
metrics enabled.

Updates tailscale/tailscale#11292

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-03 12:35:25 +00:00
Irbe Krumina
eabb424275 cmd/k8s-operator,docs/k8s: run tun mode proxies in privileged containers (#14262)
We were previously relying on unintended behaviour by runc where
all containers where by default given read/write/mknod permissions
for tun devices.
This behaviour was removed in https://github.com/opencontainers/runc/pull/3468
and released in runc 1.2.
Containerd container runtime, used by Docker and majority of Kubernetes distributions
bumped runc to 1.2 in 1.7.24 https://github.com/containerd/containerd/releases/tag/v1.7.24
thus breaking our reference tun mode Tailscale Kubernetes manifests and Kubernetes
operator proxies.

This PR changes the all Kubernetes container configs that run Tailscale in tun mode
to privileged. This should not be a breaking change because all these containers would
run in a Pod that already has a privileged init container.

Updates tailscale/tailscale#14256
Updates tailscale/tailscale#10814

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-12-03 07:01:14 +00:00
KevinLiang10
3f54572539 IPN: Update ServeConfig to accept configuration for Services.
This commit updates ServeConfig to allow configuration to Services (VIPServices for now) via Serve.
The scope of this commit is only adding the Services field to ServeConfig. The field doesn't actually
allow packet flowing yet. The purpose of this commit is to unblock other work on k8s end.

Updates #22953

Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
2024-12-02 17:35:31 -05:00
Brad Fitzpatrick
8d0c690f89 net/netcheck: clean up ICMP probe AddrPort lookup
Fixes #14200

Change-Id: Ib086814cf63dda5de021403fe1db4fb2a798eaae
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-12-02 09:28:00 -08:00
Tom Proctor
24095e4897 cmd/containerboot: serve health on local endpoint (#14246)
* cmd/containerboot: serve health on local endpoint

We introduced stable (user) metrics in #14035, and `TS_LOCAL_ADDR_PORT`
with it. Rather than requiring users to specify a new addr/port
combination for each new local endpoint they want the container to
serve, this combines the health check endpoint onto the local addr/port
used by metrics if `TS_ENABLE_HEALTH_CHECK` is used instead of
`TS_HEALTHCHECK_ADDR_PORT`.

`TS_LOCAL_ADDR_PORT` now defaults to binding to all interfaces on 9002
so that it works more seamlessly and with less configuration in
environments other than Kubernetes, where the operator always overrides
the default anyway. In particular, listening on localhost would not be
accessible from outside the container, and many scripted container
environments do not know the IP address of the container before it's
started. Listening on all interfaces allows users to just set one env
var (`TS_ENABLE_METRICS` or `TS_ENABLE_HEALTH_CHECK`) to get a fully
functioning local endpoint they can query from outside the container.

Updates #14035, #12898

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-12-02 12:18:09 +00:00
Brad Fitzpatrick
a68efe2088 cmd/checkmetrics: add command for checking metrics against kb
This commit adds a command to validate that all the metrics that
are registring in the client are also present in a path or url.

It is intended to be ran from the KB against the latest version of
tailscale.

Updates tailscale/corp#24066
Updates tailscale/corp#22075

Co-Authored-By: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-12-02 10:30:46 +01:00
Irbe Krumina
13faa64c14 cmd/k8s-operator: always set stateful filtering to false (#14216)
Updates tailscale/tailscale#12108

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-11-29 15:44:58 +00:00
Irbe Krumina
44c8892c18 Makefile,./build_docker.sh: update kube operator image build target name (#14251)
Updates tailscale/corp#24540
Updates tailscale/tailscale#12914

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-11-29 15:32:18 +00:00
Irbe Krumina
f8587e321e cmd/k8s-operator: fix port name change bug for egress ProxyGroup proxies (#14247)
Ensure that the ExternalName Service port names are always synced to the
ClusterIP Service, to fix a bug where if users created a Service with
a single unnamed port and later changed to 1+ named ports, the operator
attempted to apply an invalid multi-port Service with an unnamed port.
Also, fixes a small internal issue where not-yet Service status conditons
were lost on a spec update.

Updates tailscale/tailscale#10102

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-11-29 10:37:25 +00:00
Kristoffer Dalby
61dd2662ec tsnet: remove flaky test marker from metrics
Updates #13420

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-11-28 15:00:26 +01:00
Kristoffer Dalby
caba123008 wgengine/magicsock: packet/bytes metrics should not count disco
Updates #13420

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-11-28 15:00:26 +01:00
Kristoffer Dalby
225d8f5a88 tsnet: validate sent data in metrics test
Updates #13420

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-11-28 15:00:26 +01:00
Kristoffer Dalby
e55899386b tsnet: split bytes and routes metrics tests
Updates #13420

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-11-28 15:00:26 +01:00
Kristoffer Dalby
06d929f9ac tsnet: send less data in metrics integration test
this commit reduced the amount of data sent in the metrics
data integration test from 10MB to 1MB.

On various machines 10MB was quite flaky, while 1MB has not failed
once on 10000 runs.

Updates #13420

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-11-28 15:00:26 +01:00
Kristoffer Dalby
41e56cedf8 health: move health metrics test to health_test
Updates #13420

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-11-28 15:00:26 +01:00
Joe Tsai
bac3af06f5 logtail: avoid bytes.Buffer allocation (#11858)
Re-use a pre-allocated bytes.Buffer struct and
shallow the copy the result of bytes.NewBuffer into it
to avoid allocating the struct.

Note that we're only reusing the bytes.Buffer struct itself
and not the underling []byte temporarily stored within it.

Updates #cleanup
Updates tailscale/corp#18514
Updates golang/go#67004

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2024-11-27 11:18:04 -08:00
Anton Tolchanov
bb80f14ff4 ipn/localapi: count localapi requests to metric endpoints
Updates tailscale/corp#22075

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2024-11-27 09:25:06 +00:00
102 changed files with 3705 additions and 1126 deletions

View File

@@ -55,7 +55,7 @@ jobs:
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@4f3212b61783c3c68e8309a0f18a699764811cda # v3.27.1
uses: github/codeql-action/init@aa578102511db1f4524ed59b8cc2bae4f6e88195 # v3.27.6
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
@@ -66,7 +66,7 @@ jobs:
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@4f3212b61783c3c68e8309a0f18a699764811cda # v3.27.1
uses: github/codeql-action/autobuild@aa578102511db1f4524ed59b8cc2bae4f6e88195 # v3.27.6
# Command-line programs to run using the OS shell.
# 📚 https://git.io/JvXDl
@@ -80,4 +80,4 @@ jobs:
# make release
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@4f3212b61783c3c68e8309a0f18a699764811cda # v3.27.1
uses: github/codeql-action/analyze@aa578102511db1f4524ed59b8cc2bae4f6e88195 # v3.27.6

View File

@@ -80,7 +80,7 @@ jobs:
- name: checkout
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Restore Cache
uses: actions/cache@6849a6489940f00c2f30c0fb92c6274307ccb58a # v4.1.2
uses: actions/cache@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
with:
# Note: unlike the other setups, this is only grabbing the mod download
# cache, rather than the whole mod directory, as the download cache
@@ -159,7 +159,7 @@ jobs:
cache: false
- name: Restore Cache
uses: actions/cache@6849a6489940f00c2f30c0fb92c6274307ccb58a # v4.1.2
uses: actions/cache@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
with:
# Note: unlike the other setups, this is only grabbing the mod download
# cache, rather than the whole mod directory, as the download cache
@@ -260,7 +260,7 @@ jobs:
- name: checkout
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Restore Cache
uses: actions/cache@6849a6489940f00c2f30c0fb92c6274307ccb58a # v4.1.2
uses: actions/cache@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
with:
# Note: unlike the other setups, this is only grabbing the mod download
# cache, rather than the whole mod directory, as the download cache
@@ -319,7 +319,7 @@ jobs:
- name: checkout
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Restore Cache
uses: actions/cache@6849a6489940f00c2f30c0fb92c6274307ccb58a # v4.1.2
uses: actions/cache@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
with:
# Note: unlike the other setups, this is only grabbing the mod download
# cache, rather than the whole mod directory, as the download cache
@@ -367,7 +367,7 @@ jobs:
- name: checkout
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Restore Cache
uses: actions/cache@6849a6489940f00c2f30c0fb92c6274307ccb58a # v4.1.2
uses: actions/cache@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
with:
# Note: unlike the other setups, this is only grabbing the mod download
# cache, rather than the whole mod directory, as the download cache

View File

@@ -100,7 +100,7 @@ publishdevoperator: ## Build and publish k8s-operator image to location specifie
@test "${REPO}" != "ghcr.io/tailscale/tailscale" || (echo "REPO=... must not be ghcr.io/tailscale/tailscale" && exit 1)
@test "${REPO}" != "tailscale/k8s-operator" || (echo "REPO=... must not be tailscale/k8s-operator" && exit 1)
@test "${REPO}" != "ghcr.io/tailscale/k8s-operator" || (echo "REPO=... must not be ghcr.io/tailscale/k8s-operator" && exit 1)
TAGS="${TAGS}" REPOS=${REPO} PLATFORM=${PLATFORM} PUSH=true TARGET=operator ./build_docker.sh
TAGS="${TAGS}" REPOS=${REPO} PLATFORM=${PLATFORM} PUSH=true TARGET=k8s-operator ./build_docker.sh
publishdevnameserver: ## Build and publish k8s-nameserver image to location specified by ${REPO}
@test -n "${REPO}" || (echo "REPO=... required; e.g. REPO=ghcr.io/${USER}/tailscale" && exit 1)
@@ -116,7 +116,6 @@ sshintegrationtest: ## Run the SSH integration tests in various Docker container
GOOS=linux GOARCH=amd64 ./tool/go build -o ssh/tailssh/testcontainers/tailscaled ./cmd/tailscaled && \
echo "Testing on ubuntu:focal" && docker build --build-arg="BASE=ubuntu:focal" -t ssh-ubuntu-focal ssh/tailssh/testcontainers && \
echo "Testing on ubuntu:jammy" && docker build --build-arg="BASE=ubuntu:jammy" -t ssh-ubuntu-jammy ssh/tailssh/testcontainers && \
echo "Testing on ubuntu:mantic" && docker build --build-arg="BASE=ubuntu:mantic" -t ssh-ubuntu-mantic ssh/tailssh/testcontainers && \
echo "Testing on ubuntu:noble" && docker build --build-arg="BASE=ubuntu:noble" -t ssh-ubuntu-noble ssh/tailssh/testcontainers && \
echo "Testing on alpine:latest" && docker build --build-arg="BASE=alpine:latest" -t ssh-alpine-latest ssh/tailssh/testcontainers

View File

@@ -1 +1 @@
1.77.0
1.79.0

View File

@@ -54,7 +54,7 @@ case "$TARGET" in
--annotations="${ANNOTATIONS}" \
/usr/local/bin/containerboot
;;
operator)
k8s-operator)
DEFAULT_REPOS="tailscale/k8s-operator"
REPOS="${REPOS:-${DEFAULT_REPOS}}"
go run github.com/tailscale/mkctr \

View File

@@ -493,6 +493,17 @@ func (lc *LocalClient) DebugAction(ctx context.Context, action string) error {
return nil
}
// DebugActionBody invokes a debug action with a body parameter, such as
// "debug-force-prefer-derp".
// These are development tools and subject to change or removal over time.
func (lc *LocalClient) DebugActionBody(ctx context.Context, action string, rbody io.Reader) error {
body, err := lc.send(ctx, "POST", "/localapi/v0/debug?action="+url.QueryEscape(action), 200, rbody)
if err != nil {
return fmt.Errorf("error %w: %s", err, body)
}
return nil
}
// DebugResultJSON invokes a debug action and returns its result as something JSON-able.
// These are development tools and subject to change or removal over time.
func (lc *LocalClient) DebugResultJSON(ctx context.Context, action string) (any, error) {

View File

@@ -0,0 +1,131 @@
// Copyright (c) Tailscale Inc & AUTHORS
// SPDX-License-Identifier: BSD-3-Clause
// checkmetrics validates that all metrics in the tailscale client-metrics
// are documented in a given path or URL.
package main
import (
"context"
"flag"
"fmt"
"io"
"log"
"net/http"
"net/http/httptest"
"os"
"strings"
"time"
"tailscale.com/ipn/store/mem"
"tailscale.com/tsnet"
"tailscale.com/tstest/integration/testcontrol"
"tailscale.com/util/httpm"
)
var (
kbPath = flag.String("kb-path", "", "filepath to the client-metrics knowledge base")
kbUrl = flag.String("kb-url", "", "URL to the client-metrics knowledge base page")
)
func main() {
flag.Parse()
if *kbPath == "" && *kbUrl == "" {
log.Fatalf("either -kb-path or -kb-url must be set")
}
var control testcontrol.Server
ts := httptest.NewServer(&control)
defer ts.Close()
td, err := os.MkdirTemp("", "testcontrol")
if err != nil {
log.Fatal(err)
}
defer os.RemoveAll(td)
// tsnet is used not used as a Tailscale client, but as a way to
// boot up Tailscale, have all the metrics registered, and then
// verifiy that all the metrics are documented.
tsn := &tsnet.Server{
Dir: td,
Store: new(mem.Store),
UserLogf: log.Printf,
Ephemeral: true,
ControlURL: ts.URL,
}
if err := tsn.Start(); err != nil {
log.Fatal(err)
}
defer tsn.Close()
log.Printf("checking that all metrics are documented, looking for: %s", tsn.Sys().UserMetricsRegistry().MetricNames())
if *kbPath != "" {
kb, err := readKB(*kbPath)
if err != nil {
log.Fatalf("reading kb: %v", err)
}
missing := undocumentedMetrics(kb, tsn.Sys().UserMetricsRegistry().MetricNames())
if len(missing) > 0 {
log.Fatalf("found undocumented metrics in %q: %v", *kbPath, missing)
}
}
if *kbUrl != "" {
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
kb, err := getKB(ctx, *kbUrl)
if err != nil {
log.Fatalf("getting kb: %v", err)
}
missing := undocumentedMetrics(kb, tsn.Sys().UserMetricsRegistry().MetricNames())
if len(missing) > 0 {
log.Fatalf("found undocumented metrics in %q: %v", *kbUrl, missing)
}
}
}
func readKB(path string) (string, error) {
b, err := os.ReadFile(path)
if err != nil {
return "", fmt.Errorf("reading file: %w", err)
}
return string(b), nil
}
func getKB(ctx context.Context, url string) (string, error) {
req, err := http.NewRequestWithContext(ctx, httpm.GET, url, nil)
if err != nil {
return "", fmt.Errorf("creating request: %w", err)
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
return "", fmt.Errorf("getting kb page: %w", err)
}
if resp.StatusCode != http.StatusOK {
return "", fmt.Errorf("unexpected status code: %d", resp.StatusCode)
}
b, err := io.ReadAll(resp.Body)
if err != nil {
return "", fmt.Errorf("reading body: %w", err)
}
return string(b), nil
}
func undocumentedMetrics(b string, metrics []string) []string {
var missing []string
for _, metric := range metrics {
if !strings.Contains(b, metric) {
missing = append(missing, metric)
}
}
return missing
}

View File

@@ -7,7 +7,6 @@ package main
import (
"log"
"net"
"net/http"
"sync"
)
@@ -23,29 +22,29 @@ type healthz struct {
func (h *healthz) ServeHTTP(w http.ResponseWriter, r *http.Request) {
h.Lock()
defer h.Unlock()
if h.hasAddrs {
w.Write([]byte("ok"))
} else {
http.Error(w, "node currently has no tailscale IPs", http.StatusInternalServerError)
http.Error(w, "node currently has no tailscale IPs", http.StatusServiceUnavailable)
}
}
// runHealthz runs a simple HTTP health endpoint on /healthz, listening on the
// provided address. A containerized tailscale instance is considered healthy if
func (h *healthz) update(healthy bool) {
h.Lock()
defer h.Unlock()
if h.hasAddrs != healthy {
log.Println("Setting healthy", healthy)
}
h.hasAddrs = healthy
}
// healthHandlers registers a simple health handler at /healthz.
// A containerized tailscale instance is considered healthy if
// it has at least one tailnet IP address.
func runHealthz(addr string, h *healthz) {
lis, err := net.Listen("tcp", addr)
if err != nil {
log.Fatalf("error listening on the provided health endpoint address %q: %v", addr, err)
}
mux := http.NewServeMux()
func healthHandlers(mux *http.ServeMux) *healthz {
h := &healthz{}
mux.Handle("GET /healthz", h)
log.Printf("Running healthcheck endpoint at %s/healthz", addr)
hs := &http.Server{Handler: mux}
go func() {
if err := hs.Serve(lis); err != nil {
log.Fatalf("failed running health endpoint: %v", err)
}
}()
return h
}

View File

@@ -9,30 +9,56 @@ import (
"context"
"encoding/json"
"fmt"
"log"
"net/http"
"net/netip"
"os"
"tailscale.com/kube/kubeapi"
"tailscale.com/kube/kubeclient"
"tailscale.com/kube/kubetypes"
"tailscale.com/tailcfg"
)
// storeDeviceID writes deviceID to 'device_id' data field of the named
// Kubernetes Secret.
func storeDeviceID(ctx context.Context, secretName string, deviceID tailcfg.StableNodeID) error {
s := &kubeapi.Secret{
Data: map[string][]byte{
"device_id": []byte(deviceID),
},
}
return kc.StrategicMergePatchSecret(ctx, secretName, s, "tailscale-container")
// kubeClient is a wrapper around Tailscale's internal kube client that knows how to talk to the kube API server. We use
// this rather than any of the upstream Kubernetes client libaries to avoid extra imports.
type kubeClient struct {
kubeclient.Client
stateSecret string
canPatch bool // whether the client has permissions to patch Kubernetes Secrets
}
// storeDeviceEndpoints writes device's tailnet IPs and MagicDNS name to fields
// 'device_ips', 'device_fqdn' of the named Kubernetes Secret.
func storeDeviceEndpoints(ctx context.Context, secretName string, fqdn string, addresses []netip.Prefix) error {
func newKubeClient(root string, stateSecret string) (*kubeClient, error) {
if root != "/" {
// If we are running in a test, we need to set the root path to the fake
// service account directory.
kubeclient.SetRootPathForTesting(root)
}
var err error
kc, err := kubeclient.New("tailscale-container")
if err != nil {
return nil, fmt.Errorf("Error creating kube client: %w", err)
}
if (root != "/") || os.Getenv("TS_KUBERNETES_READ_API_SERVER_ADDRESS_FROM_ENV") == "true" {
// Derive the API server address from the environment variables
// Used to set http server in tests, or optionally enabled by flag
kc.SetURL(fmt.Sprintf("https://%s:%s", os.Getenv("KUBERNETES_SERVICE_HOST"), os.Getenv("KUBERNETES_SERVICE_PORT_HTTPS")))
}
return &kubeClient{Client: kc, stateSecret: stateSecret}, nil
}
// storeDeviceID writes deviceID to 'device_id' data field of the client's state Secret.
func (kc *kubeClient) storeDeviceID(ctx context.Context, deviceID tailcfg.StableNodeID) error {
s := &kubeapi.Secret{
Data: map[string][]byte{
kubetypes.KeyDeviceID: []byte(deviceID),
},
}
return kc.StrategicMergePatchSecret(ctx, kc.stateSecret, s, "tailscale-container")
}
// storeDeviceEndpoints writes device's tailnet IPs and MagicDNS name to fields 'device_ips', 'device_fqdn' of client's
// state Secret.
func (kc *kubeClient) storeDeviceEndpoints(ctx context.Context, fqdn string, addresses []netip.Prefix) error {
var ips []string
for _, addr := range addresses {
ips = append(ips, addr.Addr().String())
@@ -44,16 +70,28 @@ func storeDeviceEndpoints(ctx context.Context, secretName string, fqdn string, a
s := &kubeapi.Secret{
Data: map[string][]byte{
"device_fqdn": []byte(fqdn),
"device_ips": deviceIPs,
kubetypes.KeyDeviceFQDN: []byte(fqdn),
kubetypes.KeyDeviceIPs: deviceIPs,
},
}
return kc.StrategicMergePatchSecret(ctx, secretName, s, "tailscale-container")
return kc.StrategicMergePatchSecret(ctx, kc.stateSecret, s, "tailscale-container")
}
// storeHTTPSEndpoint writes an HTTPS endpoint exposed by this device via 'tailscale serve' to the client's state
// Secret. In practice this will be the same value that gets written to 'device_fqdn', but this should only be called
// when the serve config has been successfully set up.
func (kc *kubeClient) storeHTTPSEndpoint(ctx context.Context, ep string) error {
s := &kubeapi.Secret{
Data: map[string][]byte{
kubetypes.KeyHTTPSEndpoint: []byte(ep),
},
}
return kc.StrategicMergePatchSecret(ctx, kc.stateSecret, s, "tailscale-container")
}
// deleteAuthKey deletes the 'authkey' field of the given kube
// secret. No-op if there is no authkey in the secret.
func deleteAuthKey(ctx context.Context, secretName string) error {
func (kc *kubeClient) deleteAuthKey(ctx context.Context) error {
// m is a JSON Patch data structure, see https://jsonpatch.com/ or RFC 6902.
m := []kubeclient.JSONPatch{
{
@@ -61,7 +99,7 @@ func deleteAuthKey(ctx context.Context, secretName string) error {
Path: "/data/authkey",
},
}
if err := kc.JSONPatchResource(ctx, secretName, kubeclient.TypeSecrets, m); err != nil {
if err := kc.JSONPatchResource(ctx, kc.stateSecret, kubeclient.TypeSecrets, m); err != nil {
if s, ok := err.(*kubeapi.Status); ok && s.Code == http.StatusUnprocessableEntity {
// This is kubernetes-ese for "the field you asked to
// delete already doesn't exist", aka no-op.
@@ -72,22 +110,19 @@ func deleteAuthKey(ctx context.Context, secretName string) error {
return nil
}
var kc kubeclient.Client
func initKubeClient(root string) {
if root != "/" {
// If we are running in a test, we need to set the root path to the fake
// service account directory.
kubeclient.SetRootPathForTesting(root)
// storeCapVerUID stores the current capability version of tailscale and, if provided, UID of the Pod in the tailscale
// state Secret.
// These two fields are used by the Kubernetes Operator to observe the current capability version of tailscaled running in this container.
func (kc *kubeClient) storeCapVerUID(ctx context.Context, podUID string) error {
capVerS := fmt.Sprintf("%d", tailcfg.CurrentCapabilityVersion)
d := map[string][]byte{
kubetypes.KeyCapVer: []byte(capVerS),
}
var err error
kc, err = kubeclient.New("tailscale-container")
if err != nil {
log.Fatalf("Error creating kube client: %v", err)
if podUID != "" {
d[kubetypes.KeyPodUID] = []byte(podUID)
}
if (root != "/") || os.Getenv("TS_KUBERNETES_READ_API_SERVER_ADDRESS_FROM_ENV") == "true" {
// Derive the API server address from the environment variables
// Used to set http server in tests, or optionally enabled by flag
kc.SetURL(fmt.Sprintf("https://%s:%s", os.Getenv("KUBERNETES_SERVICE_HOST"), os.Getenv("KUBERNETES_SERVICE_PORT_HTTPS")))
s := &kubeapi.Secret{
Data: d,
}
return kc.StrategicMergePatchSecret(ctx, kc.stateSecret, s, "tailscale-container")
}

View File

@@ -21,7 +21,7 @@ func TestSetupKube(t *testing.T) {
cfg *settings
wantErr bool
wantCfg *settings
kc kubeclient.Client
kc *kubeClient
}{
{
name: "TS_AUTHKEY set, state Secret exists",
@@ -29,14 +29,14 @@ func TestSetupKube(t *testing.T) {
AuthKey: "foo",
KubeSecret: "foo",
},
kc: &kubeclient.FakeClient{
kc: &kubeClient{stateSecret: "foo", Client: &kubeclient.FakeClient{
CheckSecretPermissionsImpl: func(context.Context, string) (bool, bool, error) {
return false, false, nil
},
GetSecretImpl: func(context.Context, string) (*kubeapi.Secret, error) {
return nil, nil
},
},
}},
wantCfg: &settings{
AuthKey: "foo",
KubeSecret: "foo",
@@ -48,14 +48,14 @@ func TestSetupKube(t *testing.T) {
AuthKey: "foo",
KubeSecret: "foo",
},
kc: &kubeclient.FakeClient{
kc: &kubeClient{stateSecret: "foo", Client: &kubeclient.FakeClient{
CheckSecretPermissionsImpl: func(context.Context, string) (bool, bool, error) {
return false, true, nil
},
GetSecretImpl: func(context.Context, string) (*kubeapi.Secret, error) {
return nil, &kubeapi.Status{Code: 404}
},
},
}},
wantCfg: &settings{
AuthKey: "foo",
KubeSecret: "foo",
@@ -67,14 +67,14 @@ func TestSetupKube(t *testing.T) {
AuthKey: "foo",
KubeSecret: "foo",
},
kc: &kubeclient.FakeClient{
kc: &kubeClient{stateSecret: "foo", Client: &kubeclient.FakeClient{
CheckSecretPermissionsImpl: func(context.Context, string) (bool, bool, error) {
return false, false, nil
},
GetSecretImpl: func(context.Context, string) (*kubeapi.Secret, error) {
return nil, &kubeapi.Status{Code: 404}
},
},
}},
wantCfg: &settings{
AuthKey: "foo",
KubeSecret: "foo",
@@ -87,14 +87,14 @@ func TestSetupKube(t *testing.T) {
AuthKey: "foo",
KubeSecret: "foo",
},
kc: &kubeclient.FakeClient{
kc: &kubeClient{stateSecret: "foo", Client: &kubeclient.FakeClient{
CheckSecretPermissionsImpl: func(context.Context, string) (bool, bool, error) {
return false, false, nil
},
GetSecretImpl: func(context.Context, string) (*kubeapi.Secret, error) {
return nil, &kubeapi.Status{Code: 403}
},
},
}},
wantCfg: &settings{
AuthKey: "foo",
KubeSecret: "foo",
@@ -111,11 +111,11 @@ func TestSetupKube(t *testing.T) {
AuthKey: "foo",
KubeSecret: "foo",
},
kc: &kubeclient.FakeClient{
kc: &kubeClient{stateSecret: "foo", Client: &kubeclient.FakeClient{
CheckSecretPermissionsImpl: func(context.Context, string) (bool, bool, error) {
return false, false, errors.New("broken")
},
},
}},
wantErr: true,
},
{
@@ -127,14 +127,14 @@ func TestSetupKube(t *testing.T) {
wantCfg: &settings{
KubeSecret: "foo",
},
kc: &kubeclient.FakeClient{
kc: &kubeClient{stateSecret: "foo", Client: &kubeclient.FakeClient{
CheckSecretPermissionsImpl: func(context.Context, string) (bool, bool, error) {
return false, true, nil
},
GetSecretImpl: func(context.Context, string) (*kubeapi.Secret, error) {
return nil, &kubeapi.Status{Code: 404}
},
},
}},
},
{
// Interactive login using URL in Pod logs
@@ -145,28 +145,28 @@ func TestSetupKube(t *testing.T) {
wantCfg: &settings{
KubeSecret: "foo",
},
kc: &kubeclient.FakeClient{
kc: &kubeClient{stateSecret: "foo", Client: &kubeclient.FakeClient{
CheckSecretPermissionsImpl: func(context.Context, string) (bool, bool, error) {
return false, false, nil
},
GetSecretImpl: func(context.Context, string) (*kubeapi.Secret, error) {
return &kubeapi.Secret{}, nil
},
},
}},
},
{
name: "TS_AUTHKEY not set, state Secret contains auth key, we do not have RBAC to patch it",
cfg: &settings{
KubeSecret: "foo",
},
kc: &kubeclient.FakeClient{
kc: &kubeClient{stateSecret: "foo", Client: &kubeclient.FakeClient{
CheckSecretPermissionsImpl: func(context.Context, string) (bool, bool, error) {
return false, false, nil
},
GetSecretImpl: func(context.Context, string) (*kubeapi.Secret, error) {
return &kubeapi.Secret{Data: map[string][]byte{"authkey": []byte("foo")}}, nil
},
},
}},
wantCfg: &settings{
KubeSecret: "foo",
},
@@ -177,14 +177,14 @@ func TestSetupKube(t *testing.T) {
cfg: &settings{
KubeSecret: "foo",
},
kc: &kubeclient.FakeClient{
kc: &kubeClient{stateSecret: "foo", Client: &kubeclient.FakeClient{
CheckSecretPermissionsImpl: func(context.Context, string) (bool, bool, error) {
return true, false, nil
},
GetSecretImpl: func(context.Context, string) (*kubeapi.Secret, error) {
return &kubeapi.Secret{Data: map[string][]byte{"authkey": []byte("foo")}}, nil
},
},
}},
wantCfg: &settings{
KubeSecret: "foo",
AuthKey: "foo",
@@ -194,9 +194,9 @@ func TestSetupKube(t *testing.T) {
}
for _, tt := range tests {
kc = tt.kc
kc := tt.kc
t.Run(tt.name, func(t *testing.T) {
if err := tt.cfg.setupKube(context.Background()); (err != nil) != tt.wantErr {
if err := tt.cfg.setupKube(context.Background(), kc); (err != nil) != tt.wantErr {
t.Errorf("settings.setupKube() error = %v, wantErr %v", err, tt.wantErr)
}
if diff := cmp.Diff(*tt.cfg, *tt.wantCfg); diff != "" {

View File

@@ -52,11 +52,17 @@
// ${TS_CERT_DOMAIN}, it will be replaced with the value of the available FQDN.
// It cannot be used in conjunction with TS_DEST_IP. The file is watched for changes,
// and will be re-applied when it changes.
// - TS_HEALTHCHECK_ADDR_PORT: if specified, an HTTP health endpoint will be
// served at /healthz at the provided address, which should be in form [<address>]:<port>.
// If not set, no health check will be run. If set to :<port>, addr will default to 0.0.0.0
// The health endpoint will return 200 OK if this node has at least one tailnet IP address,
// otherwise returns 503.
// - TS_HEALTHCHECK_ADDR_PORT: deprecated, use TS_ENABLE_HEALTH_CHECK instead and optionally
// set TS_LOCAL_ADDR_PORT. Will be removed in 1.82.0.
// - TS_LOCAL_ADDR_PORT: the address and port to serve local metrics and health
// check endpoints if enabled via TS_ENABLE_METRICS and/or TS_ENABLE_HEALTH_CHECK.
// Defaults to [::]:9002, serving on all available interfaces.
// - TS_ENABLE_METRICS: if true, a metrics endpoint will be served at /metrics on
// the address specified by TS_LOCAL_ADDR_PORT. See https://tailscale.com/kb/1482/client-metrics
// for more information on the metrics exposed.
// - TS_ENABLE_HEALTH_CHECK: if true, a health check endpoint will be served at /healthz on
// the address specified by TS_LOCAL_ADDR_PORT. The health endpoint will return 200
// OK if this node has at least one tailnet IP address, otherwise returns 503.
// NB: the health criteria might change in the future.
// - TS_EXPERIMENTAL_VERSIONED_CONFIG_DIR: if specified, a path to a
// directory that containers tailscaled config in file. The config file needs to be
@@ -99,6 +105,7 @@ import (
"log"
"math"
"net"
"net/http"
"net/netip"
"os"
"os/signal"
@@ -114,6 +121,7 @@ import (
"tailscale.com/client/tailscale"
"tailscale.com/ipn"
kubeutils "tailscale.com/k8s-operator"
"tailscale.com/kube/kubetypes"
"tailscale.com/tailcfg"
"tailscale.com/types/logger"
"tailscale.com/types/ptr"
@@ -160,9 +168,13 @@ func main() {
bootCtx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
var kc *kubeClient
if cfg.InKubernetes {
initKubeClient(cfg.Root)
if err := cfg.setupKube(bootCtx); err != nil {
kc, err = newKubeClient(cfg.Root, cfg.KubeSecret)
if err != nil {
log.Fatalf("error initializing kube client: %v", err)
}
if err := cfg.setupKube(bootCtx, kc); err != nil {
log.Fatalf("error setting up for running on Kubernetes: %v", err)
}
}
@@ -178,12 +190,32 @@ func main() {
}
defer killTailscaled()
if cfg.LocalAddrPort != "" && cfg.MetricsEnabled {
m := &metrics{
lc: client,
debugEndpoint: cfg.DebugAddrPort,
var healthCheck *healthz
if cfg.HealthCheckAddrPort != "" {
mux := http.NewServeMux()
log.Printf("Running healthcheck endpoint at %s/healthz", cfg.HealthCheckAddrPort)
healthCheck = healthHandlers(mux)
close := runHTTPServer(mux, cfg.HealthCheckAddrPort)
defer close()
}
if cfg.localMetricsEnabled() || cfg.localHealthEnabled() {
mux := http.NewServeMux()
if cfg.localMetricsEnabled() {
log.Printf("Running metrics endpoint at %s/metrics", cfg.LocalAddrPort)
metricsHandlers(mux, client, cfg.DebugAddrPort)
}
runMetrics(cfg.LocalAddrPort, m)
if cfg.localHealthEnabled() {
log.Printf("Running healthcheck endpoint at %s/healthz", cfg.LocalAddrPort)
healthCheck = healthHandlers(mux)
}
close := runHTTPServer(mux, cfg.LocalAddrPort)
defer close()
}
if cfg.EnableForwardingOptimizations {
@@ -292,12 +324,18 @@ authLoop:
}
}
// Remove any serve config and advertised HTTPS endpoint that may have been set by a previous run of
// containerboot, but only if we're providing a new one.
if cfg.ServeConfigPath != "" {
// Remove any serve config that may have been set by a previous run of
// containerboot, but only if we're providing a new one.
log.Printf("serve proxy: unsetting previous config")
if err := client.SetServeConfig(ctx, new(ipn.ServeConfig)); err != nil {
log.Fatalf("failed to unset serve config: %v", err)
}
if hasKubeStateStore(cfg) {
if err := kc.storeHTTPSEndpoint(ctx, ""); err != nil {
log.Fatalf("failed to update HTTPS endpoint in tailscale state: %v", err)
}
}
}
if hasKubeStateStore(cfg) && isTwoStepConfigAuthOnce(cfg) {
@@ -305,11 +343,17 @@ authLoop:
// authkey is no longer needed. We don't strictly need to
// wipe it, but it's good hygiene.
log.Printf("Deleting authkey from kube secret")
if err := deleteAuthKey(ctx, cfg.KubeSecret); err != nil {
if err := kc.deleteAuthKey(ctx); err != nil {
log.Fatalf("deleting authkey from kube secret: %v", err)
}
}
if hasKubeStateStore(cfg) {
if err := kc.storeCapVerUID(ctx, cfg.PodUID); err != nil {
log.Fatalf("storing capability version and UID: %v", err)
}
}
w, err = client.WatchIPNBus(ctx, ipn.NotifyInitialNetMap|ipn.NotifyInitialState)
if err != nil {
log.Fatalf("rewatching tailscaled for updates after auth: %v", err)
@@ -329,12 +373,9 @@ authLoop:
certDomain = new(atomic.Pointer[string])
certDomainChanged = make(chan bool, 1)
h = &healthz{} // http server for the healthz endpoint
healthzRunner = sync.OnceFunc(func() { runHealthz(cfg.HealthCheckAddrPort, h) })
triggerWatchServeConfigChanges sync.Once
)
if cfg.ServeConfigPath != "" {
go watchServeConfigChanges(ctx, cfg.ServeConfigPath, certDomainChanged, certDomain, client)
}
var nfr linuxfw.NetfilterRunner
if isL3Proxy(cfg) {
nfr, err = newNetfilterRunner(log.Printf)
@@ -435,7 +476,7 @@ runLoop:
// fails.
deviceID := n.NetMap.SelfNode.StableID()
if hasKubeStateStore(cfg) && deephash.Update(&currentDeviceID, &deviceID) {
if err := storeDeviceID(ctx, cfg.KubeSecret, n.NetMap.SelfNode.StableID()); err != nil {
if err := kc.storeDeviceID(ctx, n.NetMap.SelfNode.StableID()); err != nil {
log.Fatalf("storing device ID in Kubernetes Secret: %v", err)
}
}
@@ -508,8 +549,11 @@ runLoop:
resetTimer(false)
backendAddrs = newBackendAddrs
}
if cfg.ServeConfigPath != "" && len(n.NetMap.DNS.CertDomains) != 0 {
cd := n.NetMap.DNS.CertDomains[0]
if cfg.ServeConfigPath != "" {
cd := certDomainFromNetmap(n.NetMap)
if cd == "" {
cd = kubetypes.ValueNoHTTPS
}
prev := certDomain.Swap(ptr.To(cd))
if prev == nil || *prev != cd {
select {
@@ -551,17 +595,21 @@ runLoop:
// TODO (irbekrm): instead of using the IP and FQDN, have some other mechanism for the proxy signal that it is 'Ready'.
deviceEndpoints := []any{n.NetMap.SelfNode.Name(), n.NetMap.SelfNode.Addresses()}
if hasKubeStateStore(cfg) && deephash.Update(&currentDeviceEndpoints, &deviceEndpoints) {
if err := storeDeviceEndpoints(ctx, cfg.KubeSecret, n.NetMap.SelfNode.Name(), n.NetMap.SelfNode.Addresses().AsSlice()); err != nil {
if err := kc.storeDeviceEndpoints(ctx, n.NetMap.SelfNode.Name(), n.NetMap.SelfNode.Addresses().AsSlice()); err != nil {
log.Fatalf("storing device IPs and FQDN in Kubernetes Secret: %v", err)
}
}
if cfg.HealthCheckAddrPort != "" {
h.Lock()
h.hasAddrs = len(addrs) != 0
h.Unlock()
healthzRunner()
if healthCheck != nil {
healthCheck.update(len(addrs) != 0)
}
if cfg.ServeConfigPath != "" {
triggerWatchServeConfigChanges.Do(func() {
go watchServeConfigChanges(ctx, cfg.ServeConfigPath, certDomainChanged, certDomain, client, kc)
})
}
if egressSvcsNotify != nil {
egressSvcsNotify <- n
}
@@ -751,3 +799,22 @@ func tailscaledConfigFilePath() string {
log.Printf("Using tailscaled config file %q to match current capability version %d", filePath, tailcfg.CurrentCapabilityVersion)
return filePath
}
func runHTTPServer(mux *http.ServeMux, addr string) (close func() error) {
ln, err := net.Listen("tcp", addr)
if err != nil {
log.Fatalf("failed to listen on addr %q: %v", addr, err)
}
srv := &http.Server{Handler: mux}
go func() {
if err := srv.Serve(ln); err != nil {
log.Fatalf("failed running server: %v", err)
}
}()
return func() error {
err := srv.Shutdown(context.Background())
return errors.Join(err, ln.Close())
}
}

View File

@@ -31,6 +31,7 @@ import (
"github.com/google/go-cmp/cmp"
"golang.org/x/sys/unix"
"tailscale.com/ipn"
"tailscale.com/kube/egressservices"
"tailscale.com/tailcfg"
"tailscale.com/tstest"
"tailscale.com/types/netmap"
@@ -57,6 +58,16 @@ func TestContainerBoot(t *testing.T) {
if err != nil {
t.Fatalf("error unmarshaling tailscaled config: %v", err)
}
serveConf := ipn.ServeConfig{TCP: map[uint16]*ipn.TCPPortHandler{80: {HTTP: true}}}
serveConfBytes, err := json.Marshal(serveConf)
if err != nil {
t.Fatalf("error unmarshaling serve config: %v", err)
}
egressSvcsCfg := egressservices.Configs{"foo": {TailnetTarget: egressservices.TailnetTarget{FQDN: "foo.tailnetxyx.ts.net"}}}
egressSvcsCfgBytes, err := json.Marshal(egressSvcsCfg)
if err != nil {
t.Fatalf("error unmarshaling egress services config: %v", err)
}
dirs := []string{
"var/lib",
@@ -73,14 +84,16 @@ func TestContainerBoot(t *testing.T) {
}
}
files := map[string][]byte{
"usr/bin/tailscaled": fakeTailscaled,
"usr/bin/tailscale": fakeTailscale,
"usr/bin/iptables": fakeTailscale,
"usr/bin/ip6tables": fakeTailscale,
"dev/net/tun": []byte(""),
"proc/sys/net/ipv4/ip_forward": []byte("0"),
"proc/sys/net/ipv6/conf/all/forwarding": []byte("0"),
"etc/tailscaled/cap-95.hujson": tailscaledConfBytes,
"usr/bin/tailscaled": fakeTailscaled,
"usr/bin/tailscale": fakeTailscale,
"usr/bin/iptables": fakeTailscale,
"usr/bin/ip6tables": fakeTailscale,
"dev/net/tun": []byte(""),
"proc/sys/net/ipv4/ip_forward": []byte("0"),
"proc/sys/net/ipv6/conf/all/forwarding": []byte("0"),
"etc/tailscaled/cap-95.hujson": tailscaledConfBytes,
"etc/tailscaled/serve-config.json": serveConfBytes,
"etc/tailscaled/egress-services-config.json": egressSvcsCfgBytes,
}
resetFiles := func() {
for path, content := range files {
@@ -101,6 +114,26 @@ func TestContainerBoot(t *testing.T) {
argFile := filepath.Join(d, "args")
runningSockPath := filepath.Join(d, "tmp/tailscaled.sock")
var localAddrPort, healthAddrPort int
for _, p := range []*int{&localAddrPort, &healthAddrPort} {
ln, err := net.Listen("tcp", ":0")
if err != nil {
t.Fatalf("Failed to open listener: %v", err)
}
if err := ln.Close(); err != nil {
t.Fatalf("Failed to close listener: %v", err)
}
port := ln.Addr().(*net.TCPAddr).Port
*p = port
}
metricsURL := func(port int) string {
return fmt.Sprintf("http://127.0.0.1:%d/metrics", port)
}
healthURL := func(port int) string {
return fmt.Sprintf("http://127.0.0.1:%d/healthz", port)
}
capver := fmt.Sprintf("%d", tailcfg.CurrentCapabilityVersion)
type phase struct {
// If non-nil, send this IPN bus notification (and remember it as the
@@ -119,6 +152,8 @@ func TestContainerBoot(t *testing.T) {
// WantFatalLog is the fatal log message we expect from containerboot.
// If set for a phase, the test will finish on that phase.
WantFatalLog string
EndpointStatuses map[string]int
}
runningNotify := &ipn.Notify{
State: ptr.To(ipn.Running),
@@ -147,6 +182,11 @@ func TestContainerBoot(t *testing.T) {
"/usr/bin/tailscaled --socket=/tmp/tailscaled.sock --state=mem: --statedir=/tmp --tun=userspace-networking",
"/usr/bin/tailscale --socket=/tmp/tailscaled.sock up --accept-dns=false",
},
// No metrics or health by default.
EndpointStatuses: map[string]int{
metricsURL(9002): -1,
healthURL(9002): -1,
},
},
{
Notify: runningNotify,
@@ -453,10 +493,11 @@ func TestContainerBoot(t *testing.T) {
{
Notify: runningNotify,
WantKubeSecret: map[string]string{
"authkey": "tskey-key",
"device_fqdn": "test-node.test.ts.net",
"device_id": "myID",
"device_ips": `["100.64.0.1"]`,
"authkey": "tskey-key",
"device_fqdn": "test-node.test.ts.net",
"device_id": "myID",
"device_ips": `["100.64.0.1"]`,
"tailscale_capver": capver,
},
},
},
@@ -546,9 +587,10 @@ func TestContainerBoot(t *testing.T) {
"/usr/bin/tailscale --socket=/tmp/tailscaled.sock set --accept-dns=false",
},
WantKubeSecret: map[string]string{
"device_fqdn": "test-node.test.ts.net",
"device_id": "myID",
"device_ips": `["100.64.0.1"]`,
"device_fqdn": "test-node.test.ts.net",
"device_id": "myID",
"device_ips": `["100.64.0.1"]`,
"tailscale_capver": capver,
},
},
},
@@ -575,10 +617,11 @@ func TestContainerBoot(t *testing.T) {
{
Notify: runningNotify,
WantKubeSecret: map[string]string{
"authkey": "tskey-key",
"device_fqdn": "test-node.test.ts.net",
"device_id": "myID",
"device_ips": `["100.64.0.1"]`,
"authkey": "tskey-key",
"device_fqdn": "test-node.test.ts.net",
"device_id": "myID",
"device_ips": `["100.64.0.1"]`,
"tailscale_capver": capver,
},
},
{
@@ -593,10 +636,11 @@ func TestContainerBoot(t *testing.T) {
},
},
WantKubeSecret: map[string]string{
"authkey": "tskey-key",
"device_fqdn": "new-name.test.ts.net",
"device_id": "newID",
"device_ips": `["100.64.0.1"]`,
"authkey": "tskey-key",
"device_fqdn": "new-name.test.ts.net",
"device_id": "newID",
"device_ips": `["100.64.0.1"]`,
"tailscale_capver": capver,
},
},
},
@@ -700,6 +744,199 @@ func TestContainerBoot(t *testing.T) {
},
},
},
{
Name: "metrics_enabled",
Env: map[string]string{
"TS_LOCAL_ADDR_PORT": fmt.Sprintf("[::]:%d", localAddrPort),
"TS_ENABLE_METRICS": "true",
},
Phases: []phase{
{
WantCmds: []string{
"/usr/bin/tailscaled --socket=/tmp/tailscaled.sock --state=mem: --statedir=/tmp --tun=userspace-networking",
"/usr/bin/tailscale --socket=/tmp/tailscaled.sock up --accept-dns=false",
},
EndpointStatuses: map[string]int{
metricsURL(localAddrPort): 200,
healthURL(localAddrPort): -1,
},
}, {
Notify: runningNotify,
},
},
},
{
Name: "health_enabled",
Env: map[string]string{
"TS_LOCAL_ADDR_PORT": fmt.Sprintf("[::]:%d", localAddrPort),
"TS_ENABLE_HEALTH_CHECK": "true",
},
Phases: []phase{
{
WantCmds: []string{
"/usr/bin/tailscaled --socket=/tmp/tailscaled.sock --state=mem: --statedir=/tmp --tun=userspace-networking",
"/usr/bin/tailscale --socket=/tmp/tailscaled.sock up --accept-dns=false",
},
EndpointStatuses: map[string]int{
metricsURL(localAddrPort): -1,
healthURL(localAddrPort): 503, // Doesn't start passing until the next phase.
},
}, {
Notify: runningNotify,
EndpointStatuses: map[string]int{
metricsURL(localAddrPort): -1,
healthURL(localAddrPort): 200,
},
},
},
},
{
Name: "metrics_and_health_on_same_port",
Env: map[string]string{
"TS_LOCAL_ADDR_PORT": fmt.Sprintf("[::]:%d", localAddrPort),
"TS_ENABLE_METRICS": "true",
"TS_ENABLE_HEALTH_CHECK": "true",
},
Phases: []phase{
{
WantCmds: []string{
"/usr/bin/tailscaled --socket=/tmp/tailscaled.sock --state=mem: --statedir=/tmp --tun=userspace-networking",
"/usr/bin/tailscale --socket=/tmp/tailscaled.sock up --accept-dns=false",
},
EndpointStatuses: map[string]int{
metricsURL(localAddrPort): 200,
healthURL(localAddrPort): 503, // Doesn't start passing until the next phase.
},
}, {
Notify: runningNotify,
EndpointStatuses: map[string]int{
metricsURL(localAddrPort): 200,
healthURL(localAddrPort): 200,
},
},
},
},
{
Name: "local_metrics_and_deprecated_health",
Env: map[string]string{
"TS_LOCAL_ADDR_PORT": fmt.Sprintf("[::]:%d", localAddrPort),
"TS_ENABLE_METRICS": "true",
"TS_HEALTHCHECK_ADDR_PORT": fmt.Sprintf("[::]:%d", healthAddrPort),
},
Phases: []phase{
{
WantCmds: []string{
"/usr/bin/tailscaled --socket=/tmp/tailscaled.sock --state=mem: --statedir=/tmp --tun=userspace-networking",
"/usr/bin/tailscale --socket=/tmp/tailscaled.sock up --accept-dns=false",
},
EndpointStatuses: map[string]int{
metricsURL(localAddrPort): 200,
healthURL(healthAddrPort): 503, // Doesn't start passing until the next phase.
},
}, {
Notify: runningNotify,
EndpointStatuses: map[string]int{
metricsURL(localAddrPort): 200,
healthURL(healthAddrPort): 200,
},
},
},
},
{
Name: "serve_config_no_kube",
Env: map[string]string{
"TS_SERVE_CONFIG": filepath.Join(d, "etc/tailscaled/serve-config.json"),
"TS_AUTHKEY": "tskey-key",
},
Phases: []phase{
{
WantCmds: []string{
"/usr/bin/tailscaled --socket=/tmp/tailscaled.sock --state=mem: --statedir=/tmp --tun=userspace-networking",
"/usr/bin/tailscale --socket=/tmp/tailscaled.sock up --accept-dns=false --authkey=tskey-key",
},
},
{
Notify: runningNotify,
},
},
},
{
Name: "serve_config_kube",
Env: map[string]string{
"KUBERNETES_SERVICE_HOST": kube.Host,
"KUBERNETES_SERVICE_PORT_HTTPS": kube.Port,
"TS_SERVE_CONFIG": filepath.Join(d, "etc/tailscaled/serve-config.json"),
},
KubeSecret: map[string]string{
"authkey": "tskey-key",
},
Phases: []phase{
{
WantCmds: []string{
"/usr/bin/tailscaled --socket=/tmp/tailscaled.sock --state=kube:tailscale --statedir=/tmp --tun=userspace-networking",
"/usr/bin/tailscale --socket=/tmp/tailscaled.sock up --accept-dns=false --authkey=tskey-key",
},
WantKubeSecret: map[string]string{
"authkey": "tskey-key",
},
},
{
Notify: runningNotify,
WantKubeSecret: map[string]string{
"authkey": "tskey-key",
"device_fqdn": "test-node.test.ts.net",
"device_id": "myID",
"device_ips": `["100.64.0.1"]`,
"https_endpoint": "no-https",
"tailscale_capver": capver,
},
},
},
},
{
Name: "egress_svcs_config_kube",
Env: map[string]string{
"KUBERNETES_SERVICE_HOST": kube.Host,
"KUBERNETES_SERVICE_PORT_HTTPS": kube.Port,
"TS_EGRESS_SERVICES_CONFIG_PATH": filepath.Join(d, "etc/tailscaled/egress-services-config.json"),
},
KubeSecret: map[string]string{
"authkey": "tskey-key",
},
Phases: []phase{
{
WantCmds: []string{
"/usr/bin/tailscaled --socket=/tmp/tailscaled.sock --state=kube:tailscale --statedir=/tmp --tun=userspace-networking",
"/usr/bin/tailscale --socket=/tmp/tailscaled.sock up --accept-dns=false --authkey=tskey-key",
},
WantKubeSecret: map[string]string{
"authkey": "tskey-key",
},
},
{
Notify: runningNotify,
WantKubeSecret: map[string]string{
"authkey": "tskey-key",
"device_fqdn": "test-node.test.ts.net",
"device_id": "myID",
"device_ips": `["100.64.0.1"]`,
"tailscale_capver": capver,
},
},
},
},
{
Name: "egress_svcs_config_no_kube",
Env: map[string]string{
"TS_EGRESS_SERVICES_CONFIG_PATH": filepath.Join(d, "etc/tailscaled/egress-services-config.json"),
"TS_AUTHKEY": "tskey-key",
},
Phases: []phase{
{
WantFatalLog: "TS_EGRESS_SERVICES_CONFIG_PATH is only supported for Tailscale running on Kubernetes",
},
},
},
}
for _, test := range tests {
@@ -796,7 +1033,26 @@ func TestContainerBoot(t *testing.T) {
return nil
})
if err != nil {
t.Fatal(err)
t.Fatalf("phase %d: %v", i, err)
}
for url, want := range p.EndpointStatuses {
err := tstest.WaitFor(2*time.Second, func() error {
resp, err := http.Get(url)
if err != nil && want != -1 {
return fmt.Errorf("GET %s: %v", url, err)
}
if want > 0 && resp.StatusCode != want {
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
return fmt.Errorf("GET %s, want %d, got %d\n%s", url, want, resp.StatusCode, string(body))
}
return nil
})
if err != nil {
t.Fatalf("phase %d: %v", i, err)
}
}
}
waitLogLine(t, 2*time.Second, cbOut, "Startup complete, waiting for shutdown signal")
@@ -955,6 +1211,12 @@ func (l *localAPI) ServeHTTP(w http.ResponseWriter, r *http.Request) {
if r.Method != "GET" {
panic(fmt.Sprintf("unsupported method %q", r.Method))
}
case "/localapi/v0/usermetrics":
if r.Method != "GET" {
panic(fmt.Sprintf("unsupported method %q", r.Method))
}
w.Write([]byte("fake metrics"))
return
default:
panic(fmt.Sprintf("unsupported path %q", r.URL.Path))
}

View File

@@ -8,8 +8,6 @@ package main
import (
"fmt"
"io"
"log"
"net"
"net/http"
"tailscale.com/client/tailscale"
@@ -64,28 +62,18 @@ func (m *metrics) handleDebug(w http.ResponseWriter, r *http.Request) {
proxy(w, r, debugURL, http.DefaultClient.Do)
}
// runMetrics runs a simple HTTP metrics endpoint at <addr>/metrics, forwarding
// metricsHandlers registers a simple HTTP metrics handler at /metrics, forwarding
// requests to tailscaled's /localapi/v0/usermetrics API.
//
// In 1.78.x and 1.80.x, it also proxies debug paths to tailscaled's debug
// endpoint if configured to ease migration for a breaking change serving user
// metrics instead of debug metrics on the "metrics" port.
func runMetrics(addr string, m *metrics) {
ln, err := net.Listen("tcp", addr)
if err != nil {
log.Fatalf("error listening on the provided metrics endpoint address %q: %v", addr, err)
func metricsHandlers(mux *http.ServeMux, lc *tailscale.LocalClient, debugAddrPort string) {
m := &metrics{
lc: lc,
debugEndpoint: debugAddrPort,
}
mux := http.NewServeMux()
mux.HandleFunc("GET /metrics", m.handleMetrics)
mux.HandleFunc("/debug/", m.handleDebug) // TODO(tomhjp): Remove for 1.82.0 release.
log.Printf("Running metrics endpoint at %s/metrics", addr)
ms := &http.Server{Handler: mux}
go func() {
if err := ms.Serve(ln); err != nil {
log.Fatalf("failed running metrics endpoint: %v", err)
}
}()
}

View File

@@ -19,6 +19,8 @@ import (
"github.com/fsnotify/fsnotify"
"tailscale.com/client/tailscale"
"tailscale.com/ipn"
"tailscale.com/kube/kubetypes"
"tailscale.com/types/netmap"
)
// watchServeConfigChanges watches path for changes, and when it sees one, reads
@@ -26,21 +28,21 @@ import (
// applies it to lc. It exits when ctx is canceled. cdChanged is a channel that
// is written to when the certDomain changes, causing the serve config to be
// re-read and applied.
func watchServeConfigChanges(ctx context.Context, path string, cdChanged <-chan bool, certDomainAtomic *atomic.Pointer[string], lc *tailscale.LocalClient) {
func watchServeConfigChanges(ctx context.Context, path string, cdChanged <-chan bool, certDomainAtomic *atomic.Pointer[string], lc *tailscale.LocalClient, kc *kubeClient) {
if certDomainAtomic == nil {
panic("cd must not be nil")
panic("certDomainAtomic must not be nil")
}
var tickChan <-chan time.Time
var eventChan <-chan fsnotify.Event
if w, err := fsnotify.NewWatcher(); err != nil {
log.Printf("failed to create fsnotify watcher, timer-only mode: %v", err)
log.Printf("serve proxy: failed to create fsnotify watcher, timer-only mode: %v", err)
ticker := time.NewTicker(5 * time.Second)
defer ticker.Stop()
tickChan = ticker.C
} else {
defer w.Close()
if err := w.Add(filepath.Dir(path)); err != nil {
log.Fatalf("failed to add fsnotify watch: %v", err)
log.Fatalf("serve proxy: failed to add fsnotify watch: %v", err)
}
eventChan = w.Events
}
@@ -59,24 +61,62 @@ func watchServeConfigChanges(ctx context.Context, path string, cdChanged <-chan
// k8s handles these mounts. So just re-read the file and apply it
// if it's changed.
}
if certDomain == "" {
continue
}
sc, err := readServeConfig(path, certDomain)
if err != nil {
log.Fatalf("failed to read serve config: %v", err)
log.Fatalf("serve proxy: failed to read serve config: %v", err)
}
if prevServeConfig != nil && reflect.DeepEqual(sc, prevServeConfig) {
continue
}
log.Printf("Applying serve config")
if err := lc.SetServeConfig(ctx, sc); err != nil {
log.Fatalf("failed to set serve config: %v", err)
validateHTTPSServe(certDomain, sc)
if err := updateServeConfig(ctx, sc, certDomain, lc); err != nil {
log.Fatalf("serve proxy: error updating serve config: %v", err)
}
if kc != nil && kc.canPatch {
if err := kc.storeHTTPSEndpoint(ctx, certDomain); err != nil {
log.Fatalf("serve proxy: error storing HTTPS endpoint: %v", err)
}
}
prevServeConfig = sc
}
}
func certDomainFromNetmap(nm *netmap.NetworkMap) string {
if len(nm.DNS.CertDomains) == 0 {
return ""
}
return nm.DNS.CertDomains[0]
}
func updateServeConfig(ctx context.Context, sc *ipn.ServeConfig, certDomain string, lc *tailscale.LocalClient) error {
// TODO(irbekrm): This means that serve config that does not expose HTTPS endpoint will not be set for a tailnet
// that does not have HTTPS enabled. We probably want to fix this.
if certDomain == kubetypes.ValueNoHTTPS {
return nil
}
log.Printf("serve proxy: applying serve config")
return lc.SetServeConfig(ctx, sc)
}
func validateHTTPSServe(certDomain string, sc *ipn.ServeConfig) {
if certDomain != kubetypes.ValueNoHTTPS || !hasHTTPSEndpoint(sc) {
return
}
log.Printf(
`serve proxy: this node is configured as a proxy that exposes an HTTPS endpoint to tailnet,
(perhaps a Kubernetes operator Ingress proxy) but it is not able to issue TLS certs, so this will likely not work.
To make it work, ensure that HTTPS is enabled for your tailnet, see https://tailscale.com/kb/1153/enabling-https for more details.`)
}
func hasHTTPSEndpoint(cfg *ipn.ServeConfig) bool {
for _, tcpCfg := range cfg.TCP {
if tcpCfg.HTTPS {
return true
}
}
return false
}
// readServeConfig reads the ipn.ServeConfig from path, replacing
// ${TS_CERT_DOMAIN} with certDomain.
func readServeConfig(path, certDomain string) (*ipn.ServeConfig, error) {

View File

@@ -44,6 +44,7 @@ type settings struct {
DaemonExtraArgs string
ExtraArgs string
InKubernetes bool
State string
UserspaceMode bool
StateDir string
AcceptDNS *bool
@@ -67,18 +68,16 @@ type settings struct {
PodIP string
PodIPv4 string
PodIPv6 string
HealthCheckAddrPort string // TODO(tomhjp): use the local addr/port instead.
PodUID string
HealthCheckAddrPort string
LocalAddrPort string
MetricsEnabled bool
HealthCheckEnabled bool
DebugAddrPort string
EgressSvcsCfgPath string
}
func configFromEnv() (*settings, error) {
defaultLocalAddrPort := ""
if v, ok := os.LookupEnv("POD_IP"); ok && v != "" {
defaultLocalAddrPort = fmt.Sprintf("%s:9002", v)
}
cfg := &settings{
AuthKey: defaultEnvs([]string{"TS_AUTHKEY", "TS_AUTH_KEY"}, ""),
Hostname: defaultEnv("TS_HOSTNAME", ""),
@@ -91,6 +90,7 @@ func configFromEnv() (*settings, error) {
DaemonExtraArgs: defaultEnv("TS_TAILSCALED_EXTRA_ARGS", ""),
ExtraArgs: defaultEnv("TS_EXTRA_ARGS", ""),
InKubernetes: os.Getenv("KUBERNETES_SERVICE_HOST") != "",
State: defaultEnv("TS_STATE", ""),
UserspaceMode: defaultBool("TS_USERSPACE", true),
StateDir: defaultEnv("TS_STATE_DIR", ""),
AcceptDNS: defaultEnvBoolPointer("TS_ACCEPT_DNS"),
@@ -105,11 +105,26 @@ func configFromEnv() (*settings, error) {
PodIP: defaultEnv("POD_IP", ""),
EnableForwardingOptimizations: defaultBool("TS_EXPERIMENTAL_ENABLE_FORWARDING_OPTIMIZATIONS", false),
HealthCheckAddrPort: defaultEnv("TS_HEALTHCHECK_ADDR_PORT", ""),
LocalAddrPort: defaultEnv("TS_LOCAL_ADDR_PORT", defaultLocalAddrPort),
MetricsEnabled: defaultBool("TS_METRICS_ENABLED", false),
LocalAddrPort: defaultEnv("TS_LOCAL_ADDR_PORT", "[::]:9002"),
MetricsEnabled: defaultBool("TS_ENABLE_METRICS", false),
HealthCheckEnabled: defaultBool("TS_ENABLE_HEALTH_CHECK", false),
DebugAddrPort: defaultEnv("TS_DEBUG_ADDR_PORT", ""),
EgressSvcsCfgPath: defaultEnv("TS_EGRESS_SERVICES_CONFIG_PATH", ""),
PodUID: defaultEnv("POD_UID", ""),
}
if cfg.State == "" {
if cfg.InKubernetes && cfg.KubeSecret != "" {
cfg.State = "kube:" + cfg.KubeSecret
} else {
cfg.State = "mem:"
}
}
if !strings.HasPrefix(cfg.State, "mem:") && !strings.HasPrefix(cfg.State, "kube:") && !strings.HasPrefix(cfg.State, "ssm:") {
return nil, fmt.Errorf("invalid TS_STATE value %q; must start with 'mem:', 'kube:', or 'ssm:'", cfg.State)
}
podIPs, ok := os.LookupEnv("POD_IPS")
if ok {
ips := strings.Split(podIPs, ",")
@@ -135,6 +150,35 @@ func configFromEnv() (*settings, error) {
}
func (s *settings) validate() error {
// Validate TS_STATE if set
if s.State != "" {
if !strings.HasPrefix(s.State, "mem:") &&
!strings.HasPrefix(s.State, "kube:") &&
!strings.HasPrefix(s.State, "ssm:") {
return fmt.Errorf("invalid TS_STATE value %q; must start with 'mem:', 'kube:', or 'ssm:'", s.State)
}
if strings.HasPrefix(s.State, "kube:") && !s.InKubernetes {
return fmt.Errorf("TS_STATE specifies Kubernetes state but the runtime environment is not Kubernetes")
}
}
// Check legacy settings and ensure no conflicts if TS_STATE is set
if s.State != "" {
if s.KubeSecret != "" {
log.Printf("[warning] TS_STATE is set; ignoring legacy TS_KUBE_SECRET")
}
if s.StateDir != "" {
log.Printf("[warning] TS_STATE is set; ignoring legacy TS_STATE_DIR")
}
} else {
// Fallback to legacy checks if TS_STATE is not set
if s.KubeSecret != "" && !s.InKubernetes {
return fmt.Errorf("TS_KUBE_SECRET is set but the runtime environment is not Kubernetes")
}
}
if s.TailscaledConfigFilePath != "" {
dir, file := path.Split(s.TailscaledConfigFilePath)
if _, err := os.Stat(dir); err != nil {
@@ -181,11 +225,12 @@ func (s *settings) validate() error {
return errors.New("TS_EXPERIMENTAL_ENABLE_FORWARDING_OPTIMIZATIONS is not supported in userspace mode")
}
if s.HealthCheckAddrPort != "" {
log.Printf("[warning] TS_HEALTHCHECK_ADDR_PORT is deprecated and will be removed in 1.82.0. Please use TS_ENABLE_HEALTH_CHECK and optionally TS_LOCAL_ADDR_PORT instead.")
if _, err := netip.ParseAddrPort(s.HealthCheckAddrPort); err != nil {
return fmt.Errorf("error parsing TS_HEALTH_CHECK_ADDR_PORT value %q: %w", s.HealthCheckAddrPort, err)
return fmt.Errorf("error parsing TS_HEALTHCHECK_ADDR_PORT value %q: %w", s.HealthCheckAddrPort, err)
}
}
if s.LocalAddrPort != "" {
if s.localMetricsEnabled() || s.localHealthEnabled() {
if _, err := netip.ParseAddrPort(s.LocalAddrPort); err != nil {
return fmt.Errorf("error parsing TS_LOCAL_ADDR_PORT value %q: %w", s.LocalAddrPort, err)
}
@@ -195,13 +240,20 @@ func (s *settings) validate() error {
return fmt.Errorf("error parsing TS_DEBUG_ADDR_PORT value %q: %w", s.DebugAddrPort, err)
}
}
if s.HealthCheckEnabled && s.HealthCheckAddrPort != "" {
return errors.New("TS_HEALTHCHECK_ADDR_PORT is deprecated and will be removed in 1.82.0, use TS_ENABLE_HEALTH_CHECK and optionally TS_LOCAL_ADDR_PORT")
}
if s.EgressSvcsCfgPath != "" && !(s.InKubernetes && s.KubeSecret != "") {
return errors.New("TS_EGRESS_SERVICES_CONFIG_PATH is only supported for Tailscale running on Kubernetes")
}
return nil
}
// setupKube is responsible for doing any necessary configuration and checks to
// ensure that tailscale state storage and authentication mechanism will work on
// Kubernetes.
func (cfg *settings) setupKube(ctx context.Context) error {
func (cfg *settings) setupKube(ctx context.Context, kc *kubeClient) error {
if cfg.KubeSecret == "" {
return nil
}
@@ -210,6 +262,7 @@ func (cfg *settings) setupKube(ctx context.Context) error {
return fmt.Errorf("some Kubernetes permissions are missing, please check your RBAC configuration: %v", err)
}
cfg.KubernetesCanPatch = canPatch
kc.canPatch = canPatch
s, err := kc.GetSecret(ctx, cfg.KubeSecret)
if err != nil {
@@ -292,6 +345,14 @@ func hasKubeStateStore(cfg *settings) bool {
return cfg.InKubernetes && cfg.KubernetesCanPatch && cfg.KubeSecret != ""
}
func (cfg *settings) localMetricsEnabled() bool {
return cfg.LocalAddrPort != "" && cfg.MetricsEnabled
}
func (cfg *settings) localHealthEnabled() bool {
return cfg.LocalAddrPort != "" && cfg.HealthCheckEnabled
}
// defaultEnv returns the value of the given envvar name, or defVal if
// unset.
func defaultEnv(name, defVal string) string {

View File

@@ -62,17 +62,22 @@ func startTailscaled(ctx context.Context, cfg *settings) (*tailscale.LocalClient
// tailscaledArgs uses cfg to construct the argv for tailscaled.
func tailscaledArgs(cfg *settings) []string {
args := []string{"--socket=" + cfg.Socket}
switch {
case cfg.InKubernetes && cfg.KubeSecret != "":
args = append(args, "--state=kube:"+cfg.KubeSecret)
if cfg.StateDir == "" {
cfg.StateDir = "/tmp"
if cfg.State != "" {
args = append(args, "--state="+cfg.State)
} else {
// Fallback logic for legacy state configuration
switch {
case cfg.InKubernetes && cfg.KubeSecret != "":
args = append(args, "--state=kube:"+cfg.KubeSecret)
if cfg.StateDir == "" {
cfg.StateDir = "/tmp"
}
fallthrough
case cfg.StateDir != "":
args = append(args, "--statedir="+cfg.StateDir)
default:
args = append(args, "--state=mem:", "--statedir=/tmp")
}
fallthrough
case cfg.StateDir != "":
args = append(args, "--statedir="+cfg.StateDir)
default:
args = append(args, "--state=mem:", "--statedir=/tmp")
}
if cfg.UserspaceMode {

View File

@@ -8,6 +8,7 @@ import (
"crypto/x509"
"errors"
"fmt"
"net"
"net/http"
"path/filepath"
"regexp"
@@ -53,8 +54,9 @@ func certProviderByCertMode(mode, dir, hostname string) (certProvider, error) {
}
type manualCertManager struct {
cert *tls.Certificate
hostname string
cert *tls.Certificate
hostname string // hostname or IP address of server
noHostname bool // whether hostname is an IP address
}
// NewManualCertManager returns a cert provider which read certificate by given hostname on create.
@@ -74,7 +76,11 @@ func NewManualCertManager(certdir, hostname string) (certProvider, error) {
if err := x509Cert.VerifyHostname(hostname); err != nil {
return nil, fmt.Errorf("cert invalid for hostname %q: %w", hostname, err)
}
return &manualCertManager{cert: &cert, hostname: hostname}, nil
return &manualCertManager{
cert: &cert,
hostname: hostname,
noHostname: net.ParseIP(hostname) != nil,
}, nil
}
func (m *manualCertManager) TLSConfig() *tls.Config {
@@ -88,7 +94,7 @@ func (m *manualCertManager) TLSConfig() *tls.Config {
}
func (m *manualCertManager) getCertificate(hi *tls.ClientHelloInfo) (*tls.Certificate, error) {
if hi.ServerName != m.hostname {
if hi.ServerName != m.hostname && !m.noHostname {
return nil, fmt.Errorf("cert mismatch with hostname: %q", hi.ServerName)
}

97
cmd/derper/cert_test.go Normal file
View File

@@ -0,0 +1,97 @@
// Copyright (c) Tailscale Inc & AUTHORS
// SPDX-License-Identifier: BSD-3-Clause
package main
import (
"crypto/ecdsa"
"crypto/elliptic"
"crypto/rand"
"crypto/tls"
"crypto/x509"
"crypto/x509/pkix"
"encoding/pem"
"math/big"
"net"
"os"
"path/filepath"
"testing"
"time"
)
// Verify that in --certmode=manual mode, we can use a bare IP address
// as the --hostname and that GetCertificate will return it.
func TestCertIP(t *testing.T) {
dir := t.TempDir()
const hostname = "1.2.3.4"
priv, err := ecdsa.GenerateKey(elliptic.P224(), rand.Reader)
if err != nil {
t.Fatal(err)
}
serialNumberLimit := new(big.Int).Lsh(big.NewInt(1), 128)
serialNumber, err := rand.Int(rand.Reader, serialNumberLimit)
if err != nil {
t.Fatal(err)
}
ip := net.ParseIP(hostname)
if ip == nil {
t.Fatalf("invalid IP address %q", hostname)
}
template := &x509.Certificate{
SerialNumber: serialNumber,
Subject: pkix.Name{
Organization: []string{"Tailscale Test Corp"},
},
NotBefore: time.Now(),
NotAfter: time.Now().Add(30 * 24 * time.Hour),
KeyUsage: x509.KeyUsageDigitalSignature,
ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
BasicConstraintsValid: true,
IPAddresses: []net.IP{ip},
}
derBytes, err := x509.CreateCertificate(rand.Reader, template, template, &priv.PublicKey, priv)
if err != nil {
t.Fatal(err)
}
certOut, err := os.Create(filepath.Join(dir, hostname+".crt"))
if err != nil {
t.Fatal(err)
}
if err := pem.Encode(certOut, &pem.Block{Type: "CERTIFICATE", Bytes: derBytes}); err != nil {
t.Fatalf("Failed to write data to cert.pem: %v", err)
}
if err := certOut.Close(); err != nil {
t.Fatalf("Error closing cert.pem: %v", err)
}
keyOut, err := os.OpenFile(filepath.Join(dir, hostname+".key"), os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0600)
if err != nil {
t.Fatal(err)
}
privBytes, err := x509.MarshalPKCS8PrivateKey(priv)
if err != nil {
t.Fatalf("Unable to marshal private key: %v", err)
}
if err := pem.Encode(keyOut, &pem.Block{Type: "PRIVATE KEY", Bytes: privBytes}); err != nil {
t.Fatalf("Failed to write data to key.pem: %v", err)
}
if err := keyOut.Close(); err != nil {
t.Fatalf("Error closing key.pem: %v", err)
}
cp, err := certProviderByCertMode("manual", dir, hostname)
if err != nil {
t.Fatal(err)
}
back, err := cp.TLSConfig().GetCertificate(&tls.ClientHelloInfo{
ServerName: "", // no SNI
})
if err != nil {
t.Fatalf("GetCertificate: %v", err)
}
if back == nil {
t.Fatalf("GetCertificate returned nil")
}
}

View File

@@ -58,7 +58,7 @@ var (
configPath = flag.String("c", "", "config file path")
certMode = flag.String("certmode", "letsencrypt", "mode for getting a cert. possible options: manual, letsencrypt")
certDir = flag.String("certdir", tsweb.DefaultCertDir("derper-certs"), "directory to store LetsEncrypt certs, if addr's port is :443")
hostname = flag.String("hostname", "derp.tailscale.com", "LetsEncrypt host name, if addr's port is :443")
hostname = flag.String("hostname", "derp.tailscale.com", "LetsEncrypt host name, if addr's port is :443. When --certmode=manual, this can be an IP address to avoid SNI checks")
runSTUN = flag.Bool("stun", true, "whether to run a STUN server. It will bind to the same IP (if any) as the --addr flag value.")
runDERP = flag.Bool("derp", true, "whether to run a DERP server. The only reason to set this false is if you're decommissioning a server but want to keep its bootstrap DNS functionality still running.")

View File

@@ -6,7 +6,6 @@ package main
import (
"bytes"
"context"
"fmt"
"net/http"
"net/http/httptest"
"strings"
@@ -138,5 +137,4 @@ func TestTemplate(t *testing.T) {
if !strings.Contains(str, "Debug info") {
t.Error("Output is missing debug info")
}
fmt.Println(buf.String())
}

View File

@@ -46,7 +46,6 @@ func main() {
ClientID: clientID,
ClientSecret: clientSecret,
TokenURL: baseURL + "/api/v2/oauth/token",
Scopes: []string{"device"},
}
ctx := context.Background()

View File

@@ -58,8 +58,8 @@ func apply(cache *Cache, client *http.Client, tailnet, apiKey string) func(conte
}
if cache.PrevETag == "" {
log.Println("no previous etag found, assuming local file is correct and recording that")
cache.PrevETag = localEtag
log.Println("no previous etag found, assuming the latest control etag")
cache.PrevETag = controlEtag
}
log.Printf("control: %s", controlEtag)
@@ -105,8 +105,8 @@ func test(cache *Cache, client *http.Client, tailnet, apiKey string) func(contex
}
if cache.PrevETag == "" {
log.Println("no previous etag found, assuming local file is correct and recording that")
cache.PrevETag = localEtag
log.Println("no previous etag found, assuming the latest control etag")
cache.PrevETag = controlEtag
}
log.Printf("control: %s", controlEtag)
@@ -148,8 +148,8 @@ func getChecksums(cache *Cache, client *http.Client, tailnet, apiKey string) fun
}
if cache.PrevETag == "" {
log.Println("no previous etag found, assuming local file is correct and recording that")
cache.PrevETag = Shuck(localEtag)
log.Println("no previous etag found, assuming control etag")
cache.PrevETag = Shuck(controlEtag)
}
log.Printf("control: %s", controlEtag)

View File

@@ -10,6 +10,7 @@ import (
"fmt"
"net/netip"
"slices"
"strings"
"sync"
"time"
@@ -35,6 +36,7 @@ import (
const (
reasonConnectorCreationFailed = "ConnectorCreationFailed"
reasonConnectorCreating = "ConnectorCreating"
reasonConnectorCreated = "ConnectorCreated"
reasonConnectorInvalid = "ConnectorInvalid"
@@ -113,7 +115,7 @@ func (a *ConnectorReconciler) Reconcile(ctx context.Context, req reconcile.Reque
setStatus := func(cn *tsapi.Connector, _ tsapi.ConditionType, status metav1.ConditionStatus, reason, message string) (reconcile.Result, error) {
tsoperator.SetConnectorCondition(cn, tsapi.ConnectorReady, status, reason, message, cn.Generation, a.clock, logger)
var updateErr error
if !apiequality.Semantic.DeepEqual(oldCnStatus, cn.Status) {
if !apiequality.Semantic.DeepEqual(oldCnStatus, &cn.Status) {
// An error encountered here should get returned by the Reconcile function.
updateErr = a.Client.Status().Update(ctx, cn)
}
@@ -134,17 +136,24 @@ func (a *ConnectorReconciler) Reconcile(ctx context.Context, req reconcile.Reque
}
if err := a.validate(cn); err != nil {
logger.Errorf("error validating Connector spec: %w", err)
message := fmt.Sprintf(messageConnectorInvalid, err)
a.recorder.Eventf(cn, corev1.EventTypeWarning, reasonConnectorInvalid, message)
return setStatus(cn, tsapi.ConnectorReady, metav1.ConditionFalse, reasonConnectorInvalid, message)
}
if err = a.maybeProvisionConnector(ctx, logger, cn); err != nil {
logger.Errorf("error creating Connector resources: %w", err)
reason := reasonConnectorCreationFailed
message := fmt.Sprintf(messageConnectorCreationFailed, err)
a.recorder.Eventf(cn, corev1.EventTypeWarning, reasonConnectorCreationFailed, message)
return setStatus(cn, tsapi.ConnectorReady, metav1.ConditionFalse, reasonConnectorCreationFailed, message)
if strings.Contains(err.Error(), optimisticLockErrorMsg) {
reason = reasonConnectorCreating
message = fmt.Sprintf("optimistic lock error, retrying: %s", err)
err = nil
logger.Info(message)
} else {
a.recorder.Eventf(cn, corev1.EventTypeWarning, reason, message)
}
return setStatus(cn, tsapi.ConnectorReady, metav1.ConditionFalse, reason, message)
}
logger.Info("Connector resources synced")
@@ -189,6 +198,7 @@ func (a *ConnectorReconciler) maybeProvisionConnector(ctx context.Context, logge
isExitNode: cn.Spec.ExitNode,
},
ProxyClassName: proxyClass,
proxyType: proxyTypeConnector,
}
if cn.Spec.SubnetRouter != nil && len(cn.Spec.SubnetRouter.AdvertiseRoutes) > 0 {
@@ -233,27 +243,27 @@ func (a *ConnectorReconciler) maybeProvisionConnector(ctx context.Context, logge
return err
}
_, tsHost, ips, err := a.ssr.DeviceInfo(ctx, crl)
dev, err := a.ssr.DeviceInfo(ctx, crl, logger)
if err != nil {
return err
}
if tsHost == "" {
logger.Debugf("no Tailscale hostname known yet, waiting for connector pod to finish auth")
if dev == nil || dev.hostname == "" {
logger.Debugf("no Tailscale hostname known yet, waiting for Connector Pod to finish auth")
// No hostname yet. Wait for the connector pod to auth.
cn.Status.TailnetIPs = nil
cn.Status.Hostname = ""
return nil
}
cn.Status.TailnetIPs = ips
cn.Status.Hostname = tsHost
cn.Status.TailnetIPs = dev.ips
cn.Status.Hostname = dev.hostname
return nil
}
func (a *ConnectorReconciler) maybeCleanupConnector(ctx context.Context, logger *zap.SugaredLogger, cn *tsapi.Connector) (bool, error) {
if done, err := a.ssr.Cleanup(ctx, logger, childResourceLabels(cn.Name, a.tsnamespace, "connector")); err != nil {
if done, err := a.ssr.Cleanup(ctx, logger, childResourceLabels(cn.Name, a.tsnamespace, "connector"), proxyTypeConnector); err != nil {
return false, fmt.Errorf("failed to cleanup Connector resources: %w", err)
} else if !done {
logger.Debugf("Connector cleanup not done yet, waiting for next reconcile")

View File

@@ -378,7 +378,7 @@ tailscale.com/cmd/k8s-operator dependencies: (generated by github.com/tailscale/
k8s.io/api/storage/v1beta1 from k8s.io/client-go/applyconfigurations/storage/v1beta1+
k8s.io/api/storagemigration/v1alpha1 from k8s.io/client-go/applyconfigurations/storagemigration/v1alpha1+
k8s.io/apiextensions-apiserver/pkg/apis/apiextensions from k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1
💣 k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1 from sigs.k8s.io/controller-runtime/pkg/webhook/conversion
💣 k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1 from sigs.k8s.io/controller-runtime/pkg/webhook/conversion+
k8s.io/apimachinery/pkg/api/equality from k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1+
k8s.io/apimachinery/pkg/api/errors from k8s.io/apimachinery/pkg/util/managedfields/internal+
k8s.io/apimachinery/pkg/api/meta from k8s.io/apimachinery/pkg/api/validation+

View File

@@ -35,9 +35,13 @@ spec:
{{- toYaml . | nindent 8 }}
{{- end }}
volumes:
- name: oauth
secret:
secretName: operator-oauth
- name: oauth
{{- with .Values.oauthSecretVolume }}
{{- toYaml . | nindent 10 }}
{{- else }}
secret:
secretName: operator-oauth
{{- end }}
containers:
- name: operator
{{- with .Values.operatorConfig.securityContext }}

View File

@@ -6,6 +6,10 @@ kind: ServiceAccount
metadata:
name: operator
namespace: {{ .Release.Namespace }}
{{- with .Values.operatorConfig.serviceAccountAnnotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
@@ -30,6 +34,10 @@ rules:
- apiGroups: ["tailscale.com"]
resources: ["recorders", "recorders/status"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["apiextensions.k8s.io"]
resources: ["customresourcedefinitions"]
verbs: ["get", "list", "watch"]
resourceNames: ["servicemonitors.monitoring.coreos.com"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
@@ -65,6 +73,9 @@ rules:
- apiGroups: ["rbac.authorization.k8s.io"]
resources: ["roles", "rolebindings"]
verbs: ["get", "create", "patch", "update", "list", "watch"]
- apiGroups: ["monitoring.coreos.com"]
resources: ["servicemonitors"]
verbs: ["get", "list", "update", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding

View File

@@ -3,11 +3,26 @@
# Operator oauth credentials. If set a Kubernetes Secret with the provided
# values will be created in the operator namespace. If unset a Secret named
# operator-oauth must be precreated.
# operator-oauth must be precreated or oauthSecretVolume needs to be adjusted.
# This block will be overridden by oauthSecretVolume, if set.
oauth: {}
# clientId: ""
# clientSecret: ""
# Secret volume.
# If set it defines the volume the oauth secrets will be mounted from.
# The volume needs to contain two files named `client_id` and `client_secret`.
# If unset the volume will reference the Secret named operator-oauth.
# This block will override the oauth block.
oauthSecretVolume: {}
# csi:
# driver: secrets-store.csi.k8s.io
# readOnly: true
# volumeAttributes:
# secretProviderClass: tailscale-oauth
#
## NAME is pre-defined!
# installCRDs determines whether tailscale.com CRDs should be installed as part
# of chart installation. We do not use Helm's CRD installation mechanism as that
# does not allow for upgrading CRDs.
@@ -40,6 +55,9 @@ operatorConfig:
podAnnotations: {}
podLabels: {}
serviceAccountAnnotations: {}
# eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/tailscale-operator-role
tolerations: []
affinity: {}

View File

@@ -74,6 +74,8 @@ spec:
description: |-
Setting enable to true will make the proxy serve Tailscale metrics
at <pod-ip>:9002/metrics.
A metrics Service named <proxy-statefulset>-metrics will also be created in the operator's namespace and will
serve the metrics at <service-ip>:9002/metrics.
In 1.78.x and 1.80.x, this field also serves as the default value for
.spec.statefulSet.pod.tailscaleContainer.debug.enable. From 1.82.0, both
@@ -81,6 +83,25 @@ spec:
Defaults to false.
type: boolean
serviceMonitor:
description: |-
Enable to create a Prometheus ServiceMonitor for scraping the proxy's Tailscale metrics.
The ServiceMonitor will select the metrics Service that gets created when metrics are enabled.
The ingested metrics for each Service monitor will have labels to identify the proxy:
ts_proxy_type: ingress_service|ingress_resource|connector|proxygroup
ts_proxy_parent_name: name of the parent resource (i.e name of the Connector, Tailscale Ingress, Tailscale Service or ProxyGroup)
ts_proxy_parent_namespace: namespace of the parent resource (if the parent resource is not cluster scoped)
job: ts_<proxy type>_[<parent namespace>]_<parent_name>
type: object
required:
- enable
properties:
enable:
description: If Enable is set to true, a Prometheus ServiceMonitor will be created. Enable can only be set to true if metrics are enabled.
type: boolean
x-kubernetes-validations:
- rule: '!(has(self.serviceMonitor) && self.serviceMonitor.enable && !self.enable)'
message: ServiceMonitor can only be enabled if metrics are enabled
statefulSet:
description: |-
Configuration parameters for the proxy's StatefulSet. Tailscale
@@ -1384,11 +1405,12 @@ spec:
securityContext:
description: |-
Container security context.
Security context specified here will override the security context by the operator.
By default the operator:
- sets 'privileged: true' for the init container
- set NET_ADMIN capability for tailscale container for proxies that
are created for Services or Connector.
Security context specified here will override the security context set by the operator.
By default the operator sets the Tailscale container and the Tailscale init container to privileged
for proxies created for Tailscale ingress and egress Service, Connector and ProxyGroup.
You can reduce the permissions of the Tailscale container to cap NET_ADMIN by
installing device plugin in your cluster and configuring the proxies tun device to be created
by the device plugin, see https://github.com/tailscale/tailscale/issues/10814#issuecomment-2479977752
https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context
type: object
properties:
@@ -1707,11 +1729,12 @@ spec:
securityContext:
description: |-
Container security context.
Security context specified here will override the security context by the operator.
By default the operator:
- sets 'privileged: true' for the init container
- set NET_ADMIN capability for tailscale container for proxies that
are created for Services or Connector.
Security context specified here will override the security context set by the operator.
By default the operator sets the Tailscale container and the Tailscale init container to privileged
for proxies created for Tailscale ingress and egress Service, Connector and ProxyGroup.
You can reduce the permissions of the Tailscale container to cap NET_ADMIN by
installing device plugin in your cluster and configuring the proxies tun device to be created
by the device plugin, see https://github.com/tailscale/tailscale/issues/10814#issuecomment-2479977752
https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context
type: object
properties:

View File

@@ -541,6 +541,8 @@ spec:
description: |-
Setting enable to true will make the proxy serve Tailscale metrics
at <pod-ip>:9002/metrics.
A metrics Service named <proxy-statefulset>-metrics will also be created in the operator's namespace and will
serve the metrics at <service-ip>:9002/metrics.
In 1.78.x and 1.80.x, this field also serves as the default value for
.spec.statefulSet.pod.tailscaleContainer.debug.enable. From 1.82.0, both
@@ -548,9 +550,28 @@ spec:
Defaults to false.
type: boolean
serviceMonitor:
description: |-
Enable to create a Prometheus ServiceMonitor for scraping the proxy's Tailscale metrics.
The ServiceMonitor will select the metrics Service that gets created when metrics are enabled.
The ingested metrics for each Service monitor will have labels to identify the proxy:
ts_proxy_type: ingress_service|ingress_resource|connector|proxygroup
ts_proxy_parent_name: name of the parent resource (i.e name of the Connector, Tailscale Ingress, Tailscale Service or ProxyGroup)
ts_proxy_parent_namespace: namespace of the parent resource (if the parent resource is not cluster scoped)
job: ts_<proxy type>_[<parent namespace>]_<parent_name>
properties:
enable:
description: If Enable is set to true, a Prometheus ServiceMonitor will be created. Enable can only be set to true if metrics are enabled.
type: boolean
required:
- enable
type: object
required:
- enable
type: object
x-kubernetes-validations:
- message: ServiceMonitor can only be enabled if metrics are enabled
rule: '!(has(self.serviceMonitor) && self.serviceMonitor.enable && !self.enable)'
statefulSet:
description: |-
Configuration parameters for the proxy's StatefulSet. Tailscale
@@ -1851,11 +1872,12 @@ spec:
securityContext:
description: |-
Container security context.
Security context specified here will override the security context by the operator.
By default the operator:
- sets 'privileged: true' for the init container
- set NET_ADMIN capability for tailscale container for proxies that
are created for Services or Connector.
Security context specified here will override the security context set by the operator.
By default the operator sets the Tailscale container and the Tailscale init container to privileged
for proxies created for Tailscale ingress and egress Service, Connector and ProxyGroup.
You can reduce the permissions of the Tailscale container to cap NET_ADMIN by
installing device plugin in your cluster and configuring the proxies tun device to be created
by the device plugin, see https://github.com/tailscale/tailscale/issues/10814#issuecomment-2479977752
https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context
properties:
allowPrivilegeEscalation:
@@ -2174,11 +2196,12 @@ spec:
securityContext:
description: |-
Container security context.
Security context specified here will override the security context by the operator.
By default the operator:
- sets 'privileged: true' for the init container
- set NET_ADMIN capability for tailscale container for proxies that
are created for Services or Connector.
Security context specified here will override the security context set by the operator.
By default the operator sets the Tailscale container and the Tailscale init container to privileged
for proxies created for Tailscale ingress and egress Service, Connector and ProxyGroup.
You can reduce the permissions of the Tailscale container to cap NET_ADMIN by
installing device plugin in your cluster and configuring the proxies tun device to be created
by the device plugin, see https://github.com/tailscale/tailscale/issues/10814#issuecomment-2479977752
https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context
properties:
allowPrivilegeEscalation:
@@ -4646,6 +4669,16 @@ rules:
- list
- watch
- update
- apiGroups:
- apiextensions.k8s.io
resourceNames:
- servicemonitors.monitoring.coreos.com
resources:
- customresourcedefinitions
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
@@ -4726,6 +4759,16 @@ rules:
- update
- list
- watch
- apiGroups:
- monitoring.coreos.com
resources:
- servicemonitors
verbs:
- get
- list
- update
- create
- delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role

View File

@@ -39,6 +39,4 @@ spec:
fieldRef:
fieldPath: metadata.uid
securityContext:
capabilities:
add:
- NET_ADMIN
privileged: true

View File

@@ -10,6 +10,7 @@ import (
"encoding/json"
"fmt"
"slices"
"strings"
"go.uber.org/zap"
corev1 "k8s.io/api/core/v1"
@@ -98,7 +99,15 @@ func (dnsRR *dnsRecordsReconciler) Reconcile(ctx context.Context, req reconcile.
return reconcile.Result{}, nil
}
return reconcile.Result{}, dnsRR.maybeProvision(ctx, headlessSvc, logger)
if err := dnsRR.maybeProvision(ctx, headlessSvc, logger); err != nil {
if strings.Contains(err.Error(), optimisticLockErrorMsg) {
logger.Infof("optimistic lock error, retrying: %s", err)
} else {
return reconcile.Result{}, err
}
}
return reconcile.Result{}, nil
}
// maybeProvision ensures that dnsrecords ConfigMap contains a record for the

View File

@@ -0,0 +1,108 @@
// Copyright (c) Tailscale Inc & AUTHORS
// SPDX-License-Identifier: BSD-3-Clause
package e2e
import (
"context"
"fmt"
"net/http"
"testing"
"time"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/wait"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/client/config"
kube "tailscale.com/k8s-operator"
"tailscale.com/tstest"
)
// See [TestMain] for test requirements.
func TestIngress(t *testing.T) {
if tsClient == nil {
t.Skip("TestIngress requires credentials for a tailscale client")
}
ctx := context.Background()
cfg := config.GetConfigOrDie()
cl, err := client.New(cfg, client.Options{})
if err != nil {
t.Fatal(err)
}
// Apply nginx
createAndCleanup(t, ctx, cl, &corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Name: "nginx",
Namespace: "default",
Labels: map[string]string{
"app.kubernetes.io/name": "nginx",
},
},
Spec: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "nginx",
Image: "nginx",
},
},
},
})
// Apply service to expose it as ingress
svc := &corev1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: "test-ingress",
Namespace: "default",
Annotations: map[string]string{
"tailscale.com/expose": "true",
},
},
Spec: corev1.ServiceSpec{
Selector: map[string]string{
"app.kubernetes.io/name": "nginx",
},
Ports: []corev1.ServicePort{
{
Name: "http",
Protocol: "TCP",
Port: 80,
},
},
},
}
createAndCleanup(t, ctx, cl, svc)
// TODO: instead of timing out only when test times out, cancel context after 60s or so.
if err := wait.PollUntilContextCancel(ctx, time.Millisecond*100, true, func(ctx context.Context) (done bool, err error) {
maybeReadySvc := &corev1.Service{ObjectMeta: objectMeta("default", "test-ingress")}
if err := get(ctx, cl, maybeReadySvc); err != nil {
return false, err
}
isReady := kube.SvcIsReady(maybeReadySvc)
if isReady {
t.Log("Service is ready")
}
return isReady, nil
}); err != nil {
t.Fatalf("error waiting for the Service to become Ready: %v", err)
}
var resp *http.Response
if err := tstest.WaitFor(time.Second*60, func() error {
// TODO(tomhjp): Get the tailnet DNS name from the associated secret instead.
// If we are not the first tailnet node with the requested name, we'll get
// a -N suffix.
resp, err = tsClient.HTTPClient.Get(fmt.Sprintf("http://%s-%s:80", svc.Namespace, svc.Name))
if err != nil {
return err
}
return nil
}); err != nil {
t.Fatalf("error trying to reach service: %v", err)
}
if resp.StatusCode != http.StatusOK {
t.Fatalf("unexpected status: %v; response body s", resp.StatusCode)
}
}

View File

@@ -0,0 +1,194 @@
// Copyright (c) Tailscale Inc & AUTHORS
// SPDX-License-Identifier: BSD-3-Clause
package e2e
import (
"context"
"errors"
"fmt"
"log"
"os"
"slices"
"strings"
"testing"
"github.com/go-logr/zapr"
"github.com/tailscale/hujson"
"go.uber.org/zap/zapcore"
"golang.org/x/oauth2/clientcredentials"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"sigs.k8s.io/controller-runtime/pkg/client"
logf "sigs.k8s.io/controller-runtime/pkg/log"
kzap "sigs.k8s.io/controller-runtime/pkg/log/zap"
"tailscale.com/client/tailscale"
)
const (
e2eManagedComment = "// This is managed by the k8s-operator e2e tests"
)
var (
tsClient *tailscale.Client
testGrants = map[string]string{
"test-proxy": `{
"src": ["tag:e2e-test-proxy"],
"dst": ["tag:k8s-operator"],
"app": {
"tailscale.com/cap/kubernetes": [{
"impersonate": {
"groups": ["ts:e2e-test-proxy"],
},
}],
},
}`,
}
)
// This test suite is currently not run in CI.
// It requires some setup not handled by this code:
// - Kubernetes cluster with tailscale operator installed
// - Current kubeconfig context set to connect to that cluster (directly, no operator proxy)
// - Operator installed with --set apiServerProxyConfig.mode="true"
// - ACLs that define tag:e2e-test-proxy tag. TODO(tomhjp): Can maybe replace this prereq onwards with an API key
// - OAuth client ID and secret in TS_API_CLIENT_ID and TS_API_CLIENT_SECRET env
// - OAuth client must have auth_keys and policy_file write for tag:e2e-test-proxy tag
func TestMain(m *testing.M) {
code, err := runTests(m)
if err != nil {
log.Fatal(err)
}
os.Exit(code)
}
func runTests(m *testing.M) (int, error) {
zlog := kzap.NewRaw([]kzap.Opts{kzap.UseDevMode(true), kzap.Level(zapcore.DebugLevel)}...).Sugar()
logf.SetLogger(zapr.NewLogger(zlog.Desugar()))
tailscale.I_Acknowledge_This_API_Is_Unstable = true
if clientID := os.Getenv("TS_API_CLIENT_ID"); clientID != "" {
cleanup, err := setupClientAndACLs()
if err != nil {
return 0, err
}
defer func() {
err = errors.Join(err, cleanup())
}()
}
return m.Run(), nil
}
func setupClientAndACLs() (cleanup func() error, _ error) {
ctx := context.Background()
credentials := clientcredentials.Config{
ClientID: os.Getenv("TS_API_CLIENT_ID"),
ClientSecret: os.Getenv("TS_API_CLIENT_SECRET"),
TokenURL: "https://login.tailscale.com/api/v2/oauth/token",
Scopes: []string{"auth_keys", "policy_file"},
}
tsClient = tailscale.NewClient("-", nil)
tsClient.HTTPClient = credentials.Client(ctx)
if err := patchACLs(ctx, tsClient, func(acls *hujson.Value) {
for test, grant := range testGrants {
deleteTestGrants(test, acls)
addTestGrant(test, grant, acls)
}
}); err != nil {
return nil, err
}
return func() error {
return patchACLs(ctx, tsClient, func(acls *hujson.Value) {
for test := range testGrants {
deleteTestGrants(test, acls)
}
})
}, nil
}
func patchACLs(ctx context.Context, tsClient *tailscale.Client, patchFn func(*hujson.Value)) error {
acls, err := tsClient.ACLHuJSON(ctx)
if err != nil {
return err
}
hj, err := hujson.Parse([]byte(acls.ACL))
if err != nil {
return err
}
patchFn(&hj)
hj.Format()
acls.ACL = hj.String()
if _, err := tsClient.SetACLHuJSON(ctx, *acls, true); err != nil {
return err
}
return nil
}
func addTestGrant(test, grant string, acls *hujson.Value) error {
v, err := hujson.Parse([]byte(grant))
if err != nil {
return err
}
// Add the managed comment to the first line of the grant object contents.
v.Value.(*hujson.Object).Members[0].Name.BeforeExtra = hujson.Extra(fmt.Sprintf("%s: %s\n", e2eManagedComment, test))
if err := acls.Patch([]byte(fmt.Sprintf(`[{"op": "add", "path": "/grants/-", "value": %s}]`, v.String()))); err != nil {
return err
}
return nil
}
func deleteTestGrants(test string, acls *hujson.Value) error {
grants := acls.Find("/grants")
var patches []string
for i, g := range grants.Value.(*hujson.Array).Elements {
members := g.Value.(*hujson.Object).Members
if len(members) == 0 {
continue
}
comment := strings.TrimSpace(string(members[0].Name.BeforeExtra))
if name, found := strings.CutPrefix(comment, e2eManagedComment+": "); found && name == test {
patches = append(patches, fmt.Sprintf(`{"op": "remove", "path": "/grants/%d"}`, i))
}
}
// Remove in reverse order so we don't affect the found indices as we mutate.
slices.Reverse(patches)
if err := acls.Patch([]byte(fmt.Sprintf("[%s]", strings.Join(patches, ",")))); err != nil {
return err
}
return nil
}
func objectMeta(namespace, name string) metav1.ObjectMeta {
return metav1.ObjectMeta{
Namespace: namespace,
Name: name,
}
}
func createAndCleanup(t *testing.T, ctx context.Context, cl client.Client, obj client.Object) {
t.Helper()
if err := cl.Create(ctx, obj); err != nil {
t.Fatal(err)
}
t.Cleanup(func() {
if err := cl.Delete(ctx, obj); err != nil {
t.Errorf("error cleaning up %s %s/%s: %s", obj.GetObjectKind().GroupVersionKind(), obj.GetNamespace(), obj.GetName(), err)
}
})
}
func get(ctx context.Context, cl client.Client, obj client.Object) error {
return cl.Get(ctx, client.ObjectKeyFromObject(obj), obj)
}

View File

@@ -0,0 +1,156 @@
// Copyright (c) Tailscale Inc & AUTHORS
// SPDX-License-Identifier: BSD-3-Clause
package e2e
import (
"context"
"encoding/json"
"fmt"
"strings"
"testing"
"time"
corev1 "k8s.io/api/core/v1"
rbacv1 "k8s.io/api/rbac/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/client-go/rest"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/client/config"
"tailscale.com/client/tailscale"
"tailscale.com/tsnet"
"tailscale.com/tstest"
)
// See [TestMain] for test requirements.
func TestProxy(t *testing.T) {
if tsClient == nil {
t.Skip("TestProxy requires credentials for a tailscale client")
}
ctx := context.Background()
cfg := config.GetConfigOrDie()
cl, err := client.New(cfg, client.Options{})
if err != nil {
t.Fatal(err)
}
// Create role and role binding to allow a group we'll impersonate to do stuff.
createAndCleanup(t, ctx, cl, &rbacv1.Role{
ObjectMeta: objectMeta("tailscale", "read-secrets"),
Rules: []rbacv1.PolicyRule{{
APIGroups: []string{""},
Verbs: []string{"get"},
Resources: []string{"secrets"},
}},
})
createAndCleanup(t, ctx, cl, &rbacv1.RoleBinding{
ObjectMeta: objectMeta("tailscale", "read-secrets"),
Subjects: []rbacv1.Subject{{
Kind: "Group",
Name: "ts:e2e-test-proxy",
}},
RoleRef: rbacv1.RoleRef{
Kind: "Role",
Name: "read-secrets",
},
})
// Get operator host name from kube secret.
operatorSecret := corev1.Secret{
ObjectMeta: objectMeta("tailscale", "operator"),
}
if err := get(ctx, cl, &operatorSecret); err != nil {
t.Fatal(err)
}
// Connect to tailnet with test-specific tag so we can use the
// [testGrants] ACLs when connecting to the API server proxy
ts := tsnetServerWithTag(t, ctx, "tag:e2e-test-proxy")
proxyCfg := &rest.Config{
Host: fmt.Sprintf("https://%s:443", hostNameFromOperatorSecret(t, operatorSecret)),
Dial: ts.Dial,
}
proxyCl, err := client.New(proxyCfg, client.Options{})
if err != nil {
t.Fatal(err)
}
// Expect success.
allowedSecret := corev1.Secret{
ObjectMeta: objectMeta("tailscale", "operator"),
}
// Wait for up to a minute the first time we use the proxy, to give it time
// to provision the TLS certs.
if err := tstest.WaitFor(time.Second*60, func() error {
return get(ctx, proxyCl, &allowedSecret)
}); err != nil {
t.Fatal(err)
}
// Expect forbidden.
forbiddenSecret := corev1.Secret{
ObjectMeta: objectMeta("default", "operator"),
}
if err := get(ctx, proxyCl, &forbiddenSecret); err == nil || !apierrors.IsForbidden(err) {
t.Fatalf("expected forbidden error fetching secret from default namespace: %s", err)
}
}
func tsnetServerWithTag(t *testing.T, ctx context.Context, tag string) *tsnet.Server {
caps := tailscale.KeyCapabilities{
Devices: tailscale.KeyDeviceCapabilities{
Create: tailscale.KeyDeviceCreateCapabilities{
Reusable: false,
Preauthorized: true,
Ephemeral: true,
Tags: []string{tag},
},
},
}
authKey, authKeyMeta, err := tsClient.CreateKey(ctx, caps)
if err != nil {
t.Fatal(err)
}
t.Cleanup(func() {
if err := tsClient.DeleteKey(ctx, authKeyMeta.ID); err != nil {
t.Errorf("error deleting auth key: %s", err)
}
})
ts := &tsnet.Server{
Hostname: "test-proxy",
Ephemeral: true,
Dir: t.TempDir(),
AuthKey: authKey,
}
_, err = ts.Up(ctx)
if err != nil {
t.Fatal(err)
}
t.Cleanup(func() {
if err := ts.Close(); err != nil {
t.Errorf("error shutting down tsnet.Server: %s", err)
}
})
return ts
}
func hostNameFromOperatorSecret(t *testing.T, s corev1.Secret) string {
profiles := map[string]any{}
if err := json.Unmarshal(s.Data["_profiles"], &profiles); err != nil {
t.Fatal(err)
}
key, ok := strings.CutPrefix(string(s.Data["_current-profile"]), "profile-")
if !ok {
t.Fatal(string(s.Data["_current-profile"]))
}
profile, ok := profiles[key]
if !ok {
t.Fatal(profiles)
}
return ((profile.(map[string]any))["Name"]).(string)
}

View File

@@ -64,7 +64,7 @@ func (esrr *egressSvcsReadinessReconciler) Reconcile(ctx context.Context, req re
oldStatus := svc.Status.DeepCopy()
defer func() {
tsoperator.SetServiceCondition(svc, tsapi.EgressSvcReady, st, reason, msg, esrr.clock, l)
if !apiequality.Semantic.DeepEqual(oldStatus, svc.Status) {
if !apiequality.Semantic.DeepEqual(oldStatus, &svc.Status) {
err = errors.Join(err, esrr.Status().Update(ctx, svc))
}
}()

View File

@@ -51,12 +51,12 @@ const (
labelSvcType = "tailscale.com/svc-type" // ingress or egress
typeEgress = "egress"
// maxPorts is the maximum number of ports that can be exposed on a
// container. In practice this will be ports in range [3000 - 4000). The
// container. In practice this will be ports in range [10000 - 11000). The
// high range should make it easier to distinguish container ports from
// the tailnet target ports for debugging purposes (i.e when reading
// netfilter rules). The limit of 10000 is somewhat arbitrary, the
// netfilter rules). The limit of 1000 is somewhat arbitrary, the
// assumption is that this would not be hit in practice.
maxPorts = 10000
maxPorts = 1000
indexEgressProxyGroup = ".metadata.annotations.egress-proxy-group"
)
@@ -123,7 +123,7 @@ func (esr *egressSvcsReconciler) Reconcile(ctx context.Context, req reconcile.Re
oldStatus := svc.Status.DeepCopy()
defer func() {
if !apiequality.Semantic.DeepEqual(oldStatus, svc.Status) {
if !apiequality.Semantic.DeepEqual(oldStatus, &svc.Status) {
err = errors.Join(err, esr.Status().Update(ctx, svc))
}
}()
@@ -136,9 +136,8 @@ func (esr *egressSvcsReconciler) Reconcile(ctx context.Context, req reconcile.Re
}
if !slices.Contains(svc.Finalizers, FinalizerName) {
l.Infof("configuring tailnet service") // logged exactly once
svc.Finalizers = append(svc.Finalizers, FinalizerName)
if err := esr.Update(ctx, svc); err != nil {
if err := esr.updateSvcSpec(ctx, svc); err != nil {
err := fmt.Errorf("failed to add finalizer: %w", err)
r := svcConfiguredReason(svc, false, l)
tsoperator.SetServiceCondition(svc, tsapi.EgressSvcConfigured, metav1.ConditionFalse, r, err.Error(), esr.clock, l)
@@ -157,7 +156,15 @@ func (esr *egressSvcsReconciler) Reconcile(ctx context.Context, req reconcile.Re
return res, err
}
return res, esr.maybeProvision(ctx, svc, l)
if err := esr.maybeProvision(ctx, svc, l); err != nil {
if strings.Contains(err.Error(), optimisticLockErrorMsg) {
l.Infof("optimistic lock error, retrying: %s", err)
} else {
return reconcile.Result{}, err
}
}
return res, nil
}
func (esr *egressSvcsReconciler) maybeProvision(ctx context.Context, svc *corev1.Service, l *zap.SugaredLogger) (err error) {
@@ -198,7 +205,7 @@ func (esr *egressSvcsReconciler) maybeProvision(ctx context.Context, svc *corev1
if svc.Spec.ExternalName != clusterIPSvcFQDN {
l.Infof("Configuring ExternalName Service to point to ClusterIP Service %s", clusterIPSvcFQDN)
svc.Spec.ExternalName = clusterIPSvcFQDN
if err = esr.Update(ctx, svc); err != nil {
if err = esr.updateSvcSpec(ctx, svc); err != nil {
err = fmt.Errorf("error updating ExternalName Service: %w", err)
return err
}
@@ -222,6 +229,15 @@ func (esr *egressSvcsReconciler) provision(ctx context.Context, proxyGroupName s
found := false
for _, wantsPM := range svc.Spec.Ports {
if wantsPM.Port == pm.Port && strings.EqualFold(string(wantsPM.Protocol), string(pm.Protocol)) {
// We don't use the port name to distinguish this port internally, but Kubernetes
// require that, for Service ports with more than one name each port is uniquely named.
// So we can always pick the port name from the ExternalName Service as at this point we
// know that those are valid names because Kuberentes already validated it once. Note
// that users could have changed an unnamed port to a named port and might have changed
// port names- this should still work.
// https://kubernetes.io/docs/concepts/services-networking/service/#multi-port-services
// See also https://github.com/tailscale/tailscale/issues/13406#issuecomment-2507230388
clusterIPSvc.Spec.Ports[i].Name = wantsPM.Name
found = true
break
}
@@ -246,7 +262,7 @@ func (esr *egressSvcsReconciler) provision(ctx context.Context, proxyGroupName s
if !found {
// Calculate a free port to expose on container and add
// a new PortMap to the ClusterIP Service.
if usedPorts.Len() == maxPorts {
if usedPorts.Len() >= maxPorts {
// TODO(irbekrm): refactor to avoid extra reconciles here. Low priority as in practice,
// the limit should not be hit.
return nil, false, fmt.Errorf("unable to allocate additional ports on ProxyGroup %s, %d ports already used. Create another ProxyGroup or open an issue if you believe this is unexpected.", proxyGroupName, maxPorts)
@@ -540,13 +556,13 @@ func svcNameBase(s string) string {
}
}
// unusedPort returns a port in range [3000 - 4000). The caller must ensure that
// usedPorts does not contain all ports in range [3000 - 4000).
// unusedPort returns a port in range [10000 - 11000). The caller must ensure that
// usedPorts does not contain all ports in range [10000 - 11000).
func unusedPort(usedPorts sets.Set[int32]) int32 {
foundFreePort := false
var suggestPort int32
for !foundFreePort {
suggestPort = rand.Int32N(maxPorts) + 3000
suggestPort = rand.Int32N(maxPorts) + 10000
if !usedPorts.Has(suggestPort) {
foundFreePort = true
}
@@ -714,3 +730,13 @@ func epsPortsFromSvc(svc *corev1.Service) (ep []discoveryv1.EndpointPort) {
}
return ep
}
// updateSvcSpec ensures that the given Service's spec is updated in cluster, but the local Service object still retains
// the not-yet-applied status.
// TODO(irbekrm): once we do SSA for these patch updates, this will no longer be needed.
func (esr *egressSvcsReconciler) updateSvcSpec(ctx context.Context, svc *corev1.Service) error {
st := svc.Status.DeepCopy()
err := esr.Update(ctx, svc)
svc.Status = *st
return err
}

View File

@@ -105,28 +105,40 @@ func TestTailscaleEgressServices(t *testing.T) {
condition(tsapi.ProxyGroupReady, metav1.ConditionTrue, "", "", clock),
}
})
// Quirks of the fake client.
mustUpdateStatus(t, fc, "default", "test", func(svc *corev1.Service) {
svc.Status.Conditions = []metav1.Condition{}
expectReconciled(t, esr, "default", "test")
validateReadyService(t, fc, esr, svc, clock, zl, cm)
})
t.Run("service_retain_one_unnamed_port", func(t *testing.T) {
svc.Spec.Ports = []corev1.ServicePort{{Protocol: "TCP", Port: 80}}
mustUpdate(t, fc, "default", "test", func(s *corev1.Service) {
s.Spec.Ports = svc.Spec.Ports
})
expectReconciled(t, esr, "default", "test")
// Verify that a ClusterIP Service has been created.
name := findGenNameForEgressSvcResources(t, fc, svc)
expectEqual(t, fc, clusterIPSvc(name, svc), removeTargetPortsFromSvc)
clusterSvc := mustGetClusterIPSvc(t, fc, name)
// Verify that an EndpointSlice has been created.
expectEqual(t, fc, endpointSlice(name, svc, clusterSvc), nil)
// Verify that ConfigMap contains configuration for the new egress service.
mustHaveConfigForSvc(t, fc, svc, clusterSvc, cm)
r := svcConfiguredReason(svc, true, zl.Sugar())
// Verify that the user-created ExternalName Service has Configured set to true and ExternalName pointing to the
// CluterIP Service.
svc.Status.Conditions = []metav1.Condition{
condition(tsapi.EgressSvcConfigured, metav1.ConditionTrue, r, r, clock),
}
svc.ObjectMeta.Finalizers = []string{"tailscale.com/finalizer"}
svc.Spec.ExternalName = fmt.Sprintf("%s.operator-ns.svc.cluster.local", name)
expectEqual(t, fc, svc, nil)
validateReadyService(t, fc, esr, svc, clock, zl, cm)
})
t.Run("service_add_two_named_ports", func(t *testing.T) {
svc.Spec.Ports = []corev1.ServicePort{{Protocol: "TCP", Port: 80, Name: "http"}, {Protocol: "TCP", Port: 443, Name: "https"}}
mustUpdate(t, fc, "default", "test", func(s *corev1.Service) {
s.Spec.Ports = svc.Spec.Ports
})
expectReconciled(t, esr, "default", "test")
validateReadyService(t, fc, esr, svc, clock, zl, cm)
})
t.Run("service_add_udp_port", func(t *testing.T) {
svc.Spec.Ports = append(svc.Spec.Ports, corev1.ServicePort{Port: 53, Protocol: "UDP", Name: "dns"})
mustUpdate(t, fc, "default", "test", func(s *corev1.Service) {
s.Spec.Ports = svc.Spec.Ports
})
expectReconciled(t, esr, "default", "test")
validateReadyService(t, fc, esr, svc, clock, zl, cm)
})
t.Run("service_change_protocol", func(t *testing.T) {
svc.Spec.Ports = []corev1.ServicePort{{Protocol: "TCP", Port: 80, Name: "http"}, {Protocol: "TCP", Port: 443, Name: "https"}, {Port: 53, Protocol: "TCP", Name: "tcp_dns"}}
mustUpdate(t, fc, "default", "test", func(s *corev1.Service) {
s.Spec.Ports = svc.Spec.Ports
})
expectReconciled(t, esr, "default", "test")
validateReadyService(t, fc, esr, svc, clock, zl, cm)
})
t.Run("delete_external_name_service", func(t *testing.T) {
@@ -143,6 +155,29 @@ func TestTailscaleEgressServices(t *testing.T) {
})
}
func validateReadyService(t *testing.T, fc client.WithWatch, esr *egressSvcsReconciler, svc *corev1.Service, clock *tstest.Clock, zl *zap.Logger, cm *corev1.ConfigMap) {
expectReconciled(t, esr, "default", "test")
// Verify that a ClusterIP Service has been created.
name := findGenNameForEgressSvcResources(t, fc, svc)
expectEqual(t, fc, clusterIPSvc(name, svc), removeTargetPortsFromSvc)
clusterSvc := mustGetClusterIPSvc(t, fc, name)
// Verify that an EndpointSlice has been created.
expectEqual(t, fc, endpointSlice(name, svc, clusterSvc), nil)
// Verify that ConfigMap contains configuration for the new egress service.
mustHaveConfigForSvc(t, fc, svc, clusterSvc, cm)
r := svcConfiguredReason(svc, true, zl.Sugar())
// Verify that the user-created ExternalName Service has Configured set to true and ExternalName pointing to the
// CluterIP Service.
svc.Status.Conditions = []metav1.Condition{
condition(tsapi.EgressSvcValid, metav1.ConditionTrue, "EgressSvcValid", "EgressSvcValid", clock),
condition(tsapi.EgressSvcConfigured, metav1.ConditionTrue, r, r, clock),
}
svc.ObjectMeta.Finalizers = []string{"tailscale.com/finalizer"}
svc.Spec.ExternalName = fmt.Sprintf("%s.operator-ns.svc.cluster.local", name)
expectEqual(t, fc, svc, nil)
}
func condition(typ tsapi.ConditionType, st metav1.ConditionStatus, r, msg string, clock tstime.Clock) metav1.Condition {
return metav1.Condition{
Type: string(typ),

View File

@@ -76,7 +76,15 @@ func (a *IngressReconciler) Reconcile(ctx context.Context, req reconcile.Request
return reconcile.Result{}, a.maybeCleanup(ctx, logger, ing)
}
return reconcile.Result{}, a.maybeProvision(ctx, logger, ing)
if err := a.maybeProvision(ctx, logger, ing); err != nil {
if strings.Contains(err.Error(), optimisticLockErrorMsg) {
logger.Infof("optimistic lock error, retrying: %s", err)
} else {
return reconcile.Result{}, err
}
}
return reconcile.Result{}, nil
}
func (a *IngressReconciler) maybeCleanup(ctx context.Context, logger *zap.SugaredLogger, ing *networkingv1.Ingress) error {
@@ -90,7 +98,7 @@ func (a *IngressReconciler) maybeCleanup(ctx context.Context, logger *zap.Sugare
return nil
}
if done, err := a.ssr.Cleanup(ctx, logger, childResourceLabels(ing.Name, ing.Namespace, "ingress")); err != nil {
if done, err := a.ssr.Cleanup(ctx, logger, childResourceLabels(ing.Name, ing.Namespace, "ingress"), proxyTypeIngressResource); err != nil {
return fmt.Errorf("failed to cleanup: %w", err)
} else if !done {
logger.Debugf("cleanup not done yet, waiting for next reconcile")
@@ -268,6 +276,7 @@ func (a *IngressReconciler) maybeProvision(ctx context.Context, logger *zap.Suga
Tags: tags,
ChildResourceLabels: crl,
ProxyClassName: proxyClass,
proxyType: proxyTypeIngressResource,
}
if val := ing.GetAnnotations()[AnnotationExperimentalForwardClusterTrafficViaL7IngresProxy]; val == "true" {
@@ -278,12 +287,12 @@ func (a *IngressReconciler) maybeProvision(ctx context.Context, logger *zap.Suga
return fmt.Errorf("failed to provision: %w", err)
}
_, tsHost, _, err := a.ssr.DeviceInfo(ctx, crl)
dev, err := a.ssr.DeviceInfo(ctx, crl, logger)
if err != nil {
return fmt.Errorf("failed to get device ID: %w", err)
return fmt.Errorf("failed to retrieve Ingress HTTPS endpoint status: %w", err)
}
if tsHost == "" {
logger.Debugf("no Tailscale hostname known yet, waiting for proxy pod to finish auth")
if dev == nil || dev.ingressDNSName == "" {
logger.Debugf("no Ingress DNS name known yet, waiting for proxy Pod initialize and start serving Ingress")
// No hostname yet. Wait for the proxy pod to auth.
ing.Status.LoadBalancer.Ingress = nil
if err := a.Status().Update(ctx, ing); err != nil {
@@ -292,10 +301,10 @@ func (a *IngressReconciler) maybeProvision(ctx context.Context, logger *zap.Suga
return nil
}
logger.Debugf("setting ingress hostname to %q", tsHost)
logger.Debugf("setting Ingress hostname to %q", dev.ingressDNSName)
ing.Status.LoadBalancer.Ingress = []networkingv1.IngressLoadBalancerIngress{
{
Hostname: tsHost,
Hostname: dev.ingressDNSName,
Ports: []networkingv1.IngressPortStatus{
{
Protocol: "TCP",

View File

@@ -12,6 +12,7 @@ import (
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
networkingv1 "k8s.io/api/networking/v1"
apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"sigs.k8s.io/controller-runtime/pkg/client/fake"
@@ -141,6 +142,154 @@ func TestTailscaleIngress(t *testing.T) {
expectMissing[corev1.Secret](t, fc, "operator-ns", fullName)
}
func TestTailscaleIngressHostname(t *testing.T) {
tsIngressClass := &networkingv1.IngressClass{ObjectMeta: metav1.ObjectMeta{Name: "tailscale"}, Spec: networkingv1.IngressClassSpec{Controller: "tailscale.com/ts-ingress"}}
fc := fake.NewFakeClient(tsIngressClass)
ft := &fakeTSClient{}
fakeTsnetServer := &fakeTSNetServer{certDomains: []string{"foo.com"}}
zl, err := zap.NewDevelopment()
if err != nil {
t.Fatal(err)
}
ingR := &IngressReconciler{
Client: fc,
ssr: &tailscaleSTSReconciler{
Client: fc,
tsClient: ft,
tsnetServer: fakeTsnetServer,
defaultTags: []string{"tag:k8s"},
operatorNamespace: "operator-ns",
proxyImage: "tailscale/tailscale",
},
logger: zl.Sugar(),
}
// 1. Resources get created for regular Ingress
ing := &networkingv1.Ingress{
TypeMeta: metav1.TypeMeta{Kind: "Ingress", APIVersion: "networking.k8s.io/v1"},
ObjectMeta: metav1.ObjectMeta{
Name: "test",
Namespace: "default",
// The apiserver is supposed to set the UID, but the fake client
// doesn't. So, set it explicitly because other code later depends
// on it being set.
UID: types.UID("1234-UID"),
},
Spec: networkingv1.IngressSpec{
IngressClassName: ptr.To("tailscale"),
DefaultBackend: &networkingv1.IngressBackend{
Service: &networkingv1.IngressServiceBackend{
Name: "test",
Port: networkingv1.ServiceBackendPort{
Number: 8080,
},
},
},
TLS: []networkingv1.IngressTLS{
{Hosts: []string{"default-test"}},
},
},
}
mustCreate(t, fc, ing)
mustCreate(t, fc, &corev1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: "test",
Namespace: "default",
},
Spec: corev1.ServiceSpec{
ClusterIP: "1.2.3.4",
Ports: []corev1.ServicePort{{
Port: 8080,
Name: "http"},
},
},
})
expectReconciled(t, ingR, "default", "test")
fullName, shortName := findGenName(t, fc, "default", "test", "ingress")
mustCreate(t, fc, &corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Name: fullName,
Namespace: "operator-ns",
UID: "test-uid",
},
})
opts := configOpts{
stsName: shortName,
secretName: fullName,
namespace: "default",
parentType: "ingress",
hostname: "default-test",
app: kubetypes.AppIngressResource,
}
serveConfig := &ipn.ServeConfig{
TCP: map[uint16]*ipn.TCPPortHandler{443: {HTTPS: true}},
Web: map[ipn.HostPort]*ipn.WebServerConfig{"${TS_CERT_DOMAIN}:443": {Handlers: map[string]*ipn.HTTPHandler{"/": {Proxy: "http://1.2.3.4:8080/"}}}},
}
opts.serveConfig = serveConfig
expectEqual(t, fc, expectedSecret(t, fc, opts), nil)
expectEqual(t, fc, expectedHeadlessService(shortName, "ingress"), nil)
expectEqual(t, fc, expectedSTSUserspace(t, fc, opts), removeHashAnnotation)
// 2. Ingress proxy with capability version >= 110 does not have an HTTPS endpoint set
mustUpdate(t, fc, "operator-ns", opts.secretName, func(secret *corev1.Secret) {
mak.Set(&secret.Data, "device_id", []byte("1234"))
mak.Set(&secret.Data, "tailscale_capver", []byte("110"))
mak.Set(&secret.Data, "pod_uid", []byte("test-uid"))
mak.Set(&secret.Data, "device_fqdn", []byte("foo.tailnetxyz.ts.net"))
})
expectReconciled(t, ingR, "default", "test")
ing.Finalizers = append(ing.Finalizers, "tailscale.com/finalizer")
expectEqual(t, fc, ing, nil)
// 3. Ingress proxy with capability version >= 110 advertises HTTPS endpoint
mustUpdate(t, fc, "operator-ns", opts.secretName, func(secret *corev1.Secret) {
mak.Set(&secret.Data, "device_id", []byte("1234"))
mak.Set(&secret.Data, "tailscale_capver", []byte("110"))
mak.Set(&secret.Data, "pod_uid", []byte("test-uid"))
mak.Set(&secret.Data, "device_fqdn", []byte("foo.tailnetxyz.ts.net"))
mak.Set(&secret.Data, "https_endpoint", []byte("foo.tailnetxyz.ts.net"))
})
expectReconciled(t, ingR, "default", "test")
ing.Status.LoadBalancer = networkingv1.IngressLoadBalancerStatus{
Ingress: []networkingv1.IngressLoadBalancerIngress{
{Hostname: "foo.tailnetxyz.ts.net", Ports: []networkingv1.IngressPortStatus{{Port: 443, Protocol: "TCP"}}},
},
}
expectEqual(t, fc, ing, nil)
// 4. Ingress proxy with capability version >= 110 does not have an HTTPS endpoint ready
mustUpdate(t, fc, "operator-ns", opts.secretName, func(secret *corev1.Secret) {
mak.Set(&secret.Data, "device_id", []byte("1234"))
mak.Set(&secret.Data, "tailscale_capver", []byte("110"))
mak.Set(&secret.Data, "pod_uid", []byte("test-uid"))
mak.Set(&secret.Data, "device_fqdn", []byte("foo.tailnetxyz.ts.net"))
mak.Set(&secret.Data, "https_endpoint", []byte("no-https"))
})
expectReconciled(t, ingR, "default", "test")
ing.Status.LoadBalancer.Ingress = nil
expectEqual(t, fc, ing, nil)
// 5. Ingress proxy's state has https_endpoints set, but its capver is not matching Pod UID (downgrade)
mustUpdate(t, fc, "operator-ns", opts.secretName, func(secret *corev1.Secret) {
mak.Set(&secret.Data, "device_id", []byte("1234"))
mak.Set(&secret.Data, "tailscale_capver", []byte("110"))
mak.Set(&secret.Data, "pod_uid", []byte("not-the-right-uid"))
mak.Set(&secret.Data, "device_fqdn", []byte("foo.tailnetxyz.ts.net"))
mak.Set(&secret.Data, "https_endpoint", []byte("bar.tailnetxyz.ts.net"))
})
ing.Status.LoadBalancer = networkingv1.IngressLoadBalancerStatus{
Ingress: []networkingv1.IngressLoadBalancerIngress{
{Hostname: "foo.tailnetxyz.ts.net", Ports: []networkingv1.IngressPortStatus{{Port: 443, Protocol: "TCP"}}},
},
}
expectReconciled(t, ingR, "default", "test")
expectEqual(t, fc, ing, nil)
}
func TestTailscaleIngressWithProxyClass(t *testing.T) {
// Setup
pc := &tsapi.ProxyClass{
@@ -271,3 +420,124 @@ func TestTailscaleIngressWithProxyClass(t *testing.T) {
opts.proxyClass = ""
expectEqual(t, fc, expectedSTSUserspace(t, fc, opts), removeHashAnnotation)
}
func TestTailscaleIngressWithServiceMonitor(t *testing.T) {
pc := &tsapi.ProxyClass{
ObjectMeta: metav1.ObjectMeta{Name: "metrics", Generation: 1},
Spec: tsapi.ProxyClassSpec{
Metrics: &tsapi.Metrics{
Enable: true,
ServiceMonitor: &tsapi.ServiceMonitor{Enable: true},
},
},
Status: tsapi.ProxyClassStatus{
Conditions: []metav1.Condition{{
Status: metav1.ConditionTrue,
Type: string(tsapi.ProxyClassReady),
ObservedGeneration: 1,
}}},
}
crd := &apiextensionsv1.CustomResourceDefinition{ObjectMeta: metav1.ObjectMeta{Name: serviceMonitorCRD}}
tsIngressClass := &networkingv1.IngressClass{ObjectMeta: metav1.ObjectMeta{Name: "tailscale"}, Spec: networkingv1.IngressClassSpec{Controller: "tailscale.com/ts-ingress"}}
fc := fake.NewClientBuilder().
WithScheme(tsapi.GlobalScheme).
WithObjects(pc, tsIngressClass).
WithStatusSubresource(pc).
Build()
ft := &fakeTSClient{}
fakeTsnetServer := &fakeTSNetServer{certDomains: []string{"foo.com"}}
zl, err := zap.NewDevelopment()
if err != nil {
t.Fatal(err)
}
ingR := &IngressReconciler{
Client: fc,
ssr: &tailscaleSTSReconciler{
Client: fc,
tsClient: ft,
tsnetServer: fakeTsnetServer,
defaultTags: []string{"tag:k8s"},
operatorNamespace: "operator-ns",
proxyImage: "tailscale/tailscale",
},
logger: zl.Sugar(),
}
// 1. Enable metrics- expect metrics Service to be created
ing := &networkingv1.Ingress{
TypeMeta: metav1.TypeMeta{Kind: "Ingress", APIVersion: "networking.k8s.io/v1"},
ObjectMeta: metav1.ObjectMeta{
Name: "test",
Namespace: "default",
// The apiserver is supposed to set the UID, but the fake client
// doesn't. So, set it explicitly because other code later depends
// on it being set.
UID: types.UID("1234-UID"),
Labels: map[string]string{
"tailscale.com/proxy-class": "metrics",
},
},
Spec: networkingv1.IngressSpec{
IngressClassName: ptr.To("tailscale"),
DefaultBackend: &networkingv1.IngressBackend{
Service: &networkingv1.IngressServiceBackend{
Name: "test",
Port: networkingv1.ServiceBackendPort{
Number: 8080,
},
},
},
TLS: []networkingv1.IngressTLS{
{Hosts: []string{"default-test"}},
},
},
}
mustCreate(t, fc, ing)
mustCreate(t, fc, &corev1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: "test",
Namespace: "default",
},
Spec: corev1.ServiceSpec{
ClusterIP: "1.2.3.4",
Ports: []corev1.ServicePort{{
Port: 8080,
Name: "http"},
},
},
})
expectReconciled(t, ingR, "default", "test")
fullName, shortName := findGenName(t, fc, "default", "test", "ingress")
opts := configOpts{
stsName: shortName,
secretName: fullName,
namespace: "default",
tailscaleNamespace: "operator-ns",
parentType: "ingress",
hostname: "default-test",
app: kubetypes.AppIngressResource,
enableMetrics: true,
namespaced: true,
proxyType: proxyTypeIngressResource,
}
serveConfig := &ipn.ServeConfig{
TCP: map[uint16]*ipn.TCPPortHandler{443: {HTTPS: true}},
Web: map[ipn.HostPort]*ipn.WebServerConfig{"${TS_CERT_DOMAIN}:443": {Handlers: map[string]*ipn.HTTPHandler{"/": {Proxy: "http://1.2.3.4:8080/"}}}},
}
opts.serveConfig = serveConfig
expectEqual(t, fc, expectedSecret(t, fc, opts), nil)
expectEqual(t, fc, expectedHeadlessService(shortName, "ingress"), nil)
expectEqual(t, fc, expectedMetricsService(opts), nil)
expectEqual(t, fc, expectedSTSUserspace(t, fc, opts), removeHashAnnotation)
// 2. Enable ServiceMonitor - should not error when there is no ServiceMonitor CRD in cluster
mustUpdate(t, fc, "", "metrics", func(pc *tsapi.ProxyClass) {
pc.Spec.Metrics.ServiceMonitor = &tsapi.ServiceMonitor{Enable: true}
})
expectReconciled(t, ingR, "default", "test")
// 3. Create ServiceMonitor CRD and reconcile- ServiceMonitor should get created
mustCreate(t, fc, crd)
expectReconciled(t, ingR, "default", "test")
expectEqualUnstructured(t, fc, expectedServiceMonitor(t, opts))
}

View File

@@ -0,0 +1,272 @@
// Copyright (c) Tailscale Inc & AUTHORS
// SPDX-License-Identifier: BSD-3-Clause
//go:build !plan9
package main
import (
"context"
"fmt"
"go.uber.org/zap"
corev1 "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
"sigs.k8s.io/controller-runtime/pkg/client"
tsapi "tailscale.com/k8s-operator/apis/v1alpha1"
)
const (
labelMetricsTarget = "tailscale.com/metrics-target"
// These labels get transferred from the metrics Service to the ingested Prometheus metrics.
labelPromProxyType = "ts_proxy_type"
labelPromProxyParentName = "ts_proxy_parent_name"
labelPromProxyParentNamespace = "ts_proxy_parent_namespace"
labelPromJob = "ts_prom_job"
serviceMonitorCRD = "servicemonitors.monitoring.coreos.com"
)
// ServiceMonitor contains a subset of fields of servicemonitors.monitoring.coreos.com Custom Resource Definition.
// Duplicating it here allows us to avoid importing prometheus-operator library.
// https://github.com/prometheus-operator/prometheus-operator/blob/bb4514e0d5d69f20270e29cfd4ad39b87865ccdf/pkg/apis/monitoring/v1/servicemonitor_types.go#L40
type ServiceMonitor struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata"`
Spec ServiceMonitorSpec `json:"spec"`
}
// https://github.com/prometheus-operator/prometheus-operator/blob/bb4514e0d5d69f20270e29cfd4ad39b87865ccdf/pkg/apis/monitoring/v1/servicemonitor_types.go#L55
type ServiceMonitorSpec struct {
// Endpoints defines the endpoints to be scraped on the selected Service(s).
// https://github.com/prometheus-operator/prometheus-operator/blob/bb4514e0d5d69f20270e29cfd4ad39b87865ccdf/pkg/apis/monitoring/v1/servicemonitor_types.go#L82
Endpoints []ServiceMonitorEndpoint `json:"endpoints"`
// JobLabel is the label on the Service whose value will become the value of the Prometheus job label for the metrics ingested via this ServiceMonitor.
// https://github.com/prometheus-operator/prometheus-operator/blob/bb4514e0d5d69f20270e29cfd4ad39b87865ccdf/pkg/apis/monitoring/v1/servicemonitor_types.go#L66
JobLabel string `json:"jobLabel"`
// NamespaceSelector selects the namespace of Service(s) that this ServiceMonitor allows to scrape.
// https://github.com/prometheus-operator/prometheus-operator/blob/bb4514e0d5d69f20270e29cfd4ad39b87865ccdf/pkg/apis/monitoring/v1/servicemonitor_types.go#L88
NamespaceSelector ServiceMonitorNamespaceSelector `json:"namespaceSelector,omitempty"`
// Selector is the label selector for Service(s) that this ServiceMonitor allows to scrape.
// https://github.com/prometheus-operator/prometheus-operator/blob/bb4514e0d5d69f20270e29cfd4ad39b87865ccdf/pkg/apis/monitoring/v1/servicemonitor_types.go#L85
Selector metav1.LabelSelector `json:"selector"`
// TargetLabels are labels on the selected Service that should be applied as Prometheus labels to the ingested metrics.
// https://github.com/prometheus-operator/prometheus-operator/blob/bb4514e0d5d69f20270e29cfd4ad39b87865ccdf/pkg/apis/monitoring/v1/servicemonitor_types.go#L72
TargetLabels []string `json:"targetLabels"`
}
// ServiceMonitorNamespaceSelector selects namespaces in which Prometheus operator will attempt to find Services for
// this ServiceMonitor.
// https://github.com/prometheus-operator/prometheus-operator/blob/bb4514e0d5d69f20270e29cfd4ad39b87865ccdf/pkg/apis/monitoring/v1/servicemonitor_types.go#L88
type ServiceMonitorNamespaceSelector struct {
MatchNames []string `json:"matchNames,omitempty"`
}
// ServiceMonitorEndpoint defines an endpoint of Service to scrape. We only define port here. Prometheus by default
// scrapes /metrics path, which is what we want.
type ServiceMonitorEndpoint struct {
// Port is the name of the Service port that Prometheus will scrape.
Port string `json:"port,omitempty"`
}
func reconcileMetricsResources(ctx context.Context, logger *zap.SugaredLogger, opts *metricsOpts, pc *tsapi.ProxyClass, cl client.Client) error {
if opts.proxyType == proxyTypeEgress {
// Metrics are currently not being enabled for standalone egress proxies.
return nil
}
if pc == nil || pc.Spec.Metrics == nil || !pc.Spec.Metrics.Enable {
return maybeCleanupMetricsResources(ctx, opts, cl)
}
metricsSvc := &corev1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: metricsResourceName(opts.proxyStsName),
Namespace: opts.tsNamespace,
Labels: metricsResourceLabels(opts),
},
Spec: corev1.ServiceSpec{
Selector: opts.proxyLabels,
Type: corev1.ServiceTypeClusterIP,
Ports: []corev1.ServicePort{{Protocol: "TCP", Port: 9002, Name: "metrics"}},
},
}
var err error
metricsSvc, err = createOrUpdate(ctx, cl, opts.tsNamespace, metricsSvc, func(svc *corev1.Service) {
svc.Spec.Ports = metricsSvc.Spec.Ports
svc.Spec.Selector = metricsSvc.Spec.Selector
})
if err != nil {
return fmt.Errorf("error ensuring metrics Service: %w", err)
}
crdExists, err := hasServiceMonitorCRD(ctx, cl)
if err != nil {
return fmt.Errorf("error verifying that %q CRD exists: %w", serviceMonitorCRD, err)
}
if !crdExists {
return nil
}
if pc.Spec.Metrics.ServiceMonitor == nil || !pc.Spec.Metrics.ServiceMonitor.Enable {
return maybeCleanupServiceMonitor(ctx, cl, opts.proxyStsName, opts.tsNamespace)
}
logger.Info("ensuring ServiceMonitor for metrics Service %s/%s", metricsSvc.Namespace, metricsSvc.Name)
svcMonitor, err := newServiceMonitor(metricsSvc)
if err != nil {
return fmt.Errorf("error creating ServiceMonitor: %w", err)
}
// We don't use createOrUpdate here because that does not work with unstructured types. We also do not update
// the ServiceMonitor because it is not expected that any of its fields would change. Currently this is good
// enough, but in future we might want to add logic to create-or-update unstructured types.
err = cl.Get(ctx, client.ObjectKeyFromObject(metricsSvc), svcMonitor.DeepCopy())
if apierrors.IsNotFound(err) {
if err := cl.Create(ctx, svcMonitor); err != nil {
return fmt.Errorf("error creating ServiceMonitor: %w", err)
}
return nil
}
if err != nil {
return fmt.Errorf("error getting ServiceMonitor: %w", err)
}
return nil
}
// maybeCleanupMetricsResources ensures that any metrics resources created for a proxy are deleted. Only metrics Service
// gets deleted explicitly because the ServiceMonitor has Service's owner reference, so gets garbage collected
// automatically.
func maybeCleanupMetricsResources(ctx context.Context, opts *metricsOpts, cl client.Client) error {
sel := metricsSvcSelector(opts.proxyLabels, opts.proxyType)
return cl.DeleteAllOf(ctx, &corev1.Service{}, client.InNamespace(opts.tsNamespace), client.MatchingLabels(sel))
}
// maybeCleanupServiceMonitor cleans up any ServiceMonitor created for the named proxy StatefulSet.
func maybeCleanupServiceMonitor(ctx context.Context, cl client.Client, stsName, ns string) error {
smName := metricsResourceName(stsName)
sm := serviceMonitorTemplate(smName, ns)
u, err := serviceMonitorToUnstructured(sm)
if err != nil {
return fmt.Errorf("error building ServiceMonitor: %w", err)
}
err = cl.Get(ctx, types.NamespacedName{Name: smName, Namespace: ns}, u)
if apierrors.IsNotFound(err) {
return nil // nothing to do
}
if err != nil {
return fmt.Errorf("error verifying if ServiceMonitor %s/%s exists: %w", ns, stsName, err)
}
return cl.Delete(ctx, u)
}
// newServiceMonitor takes a metrics Service created for a proxy and constructs and returns a ServiceMonitor for that
// proxy that can be applied to the kube API server.
// The ServiceMonitor is returned as Unstructured type - this allows us to avoid importing prometheus-operator API server client/schema.
func newServiceMonitor(metricsSvc *corev1.Service) (*unstructured.Unstructured, error) {
sm := serviceMonitorTemplate(metricsSvc.Name, metricsSvc.Namespace)
sm.ObjectMeta.Labels = metricsSvc.Labels
sm.ObjectMeta.OwnerReferences = []metav1.OwnerReference{*metav1.NewControllerRef(metricsSvc, corev1.SchemeGroupVersion.WithKind("Service"))}
sm.Spec = ServiceMonitorSpec{
Selector: metav1.LabelSelector{MatchLabels: metricsSvc.Labels},
Endpoints: []ServiceMonitorEndpoint{{
Port: "metrics",
}},
NamespaceSelector: ServiceMonitorNamespaceSelector{
MatchNames: []string{metricsSvc.Namespace},
},
JobLabel: labelPromJob,
TargetLabels: []string{
labelPromProxyParentName,
labelPromProxyParentNamespace,
labelPromProxyType,
},
}
return serviceMonitorToUnstructured(sm)
}
// serviceMonitorToUnstructured takes a ServiceMonitor and converts it to Unstructured type that can be used by the c/r
// client in Kubernetes API server calls.
func serviceMonitorToUnstructured(sm *ServiceMonitor) (*unstructured.Unstructured, error) {
contents, err := runtime.DefaultUnstructuredConverter.ToUnstructured(sm)
if err != nil {
return nil, fmt.Errorf("error converting ServiceMonitor to Unstructured: %w", err)
}
u := &unstructured.Unstructured{}
u.SetUnstructuredContent(contents)
u.SetGroupVersionKind(sm.GroupVersionKind())
return u, nil
}
// metricsResourceName returns name for metrics Service and ServiceMonitor for a proxy StatefulSet.
func metricsResourceName(stsName string) string {
// Maximum length of StatefulSet name if 52 chars, so this is fine.
return fmt.Sprintf("%s-metrics", stsName)
}
// metricsResourceLabels constructs labels that will be applied to metrics Service and metrics ServiceMonitor for a
// proxy.
func metricsResourceLabels(opts *metricsOpts) map[string]string {
lbls := map[string]string{
LabelManaged: "true",
labelMetricsTarget: opts.proxyStsName,
labelPromProxyType: opts.proxyType,
labelPromProxyParentName: opts.proxyLabels[LabelParentName],
}
// Include namespace label for proxies created for a namespaced type.
if isNamespacedProxyType(opts.proxyType) {
lbls[labelPromProxyParentNamespace] = opts.proxyLabels[LabelParentNamespace]
}
lbls[labelPromJob] = promJobName(opts)
return lbls
}
// promJobName constructs the value of the Prometheus job label that will apply to all metrics for a ServiceMonitor.
func promJobName(opts *metricsOpts) string {
// Include parent resource namespace for proxies created for namespaced types.
if opts.proxyType == proxyTypeIngressResource || opts.proxyType == proxyTypeIngressService {
return fmt.Sprintf("ts_%s_%s_%s", opts.proxyType, opts.proxyLabels[LabelParentNamespace], opts.proxyLabels[LabelParentName])
}
return fmt.Sprintf("ts_%s_%s", opts.proxyType, opts.proxyLabels[LabelParentName])
}
// metricsSvcSelector returns the minimum label set to uniquely identify a metrics Service for a proxy.
func metricsSvcSelector(proxyLabels map[string]string, proxyType string) map[string]string {
sel := map[string]string{
labelPromProxyType: proxyType,
labelPromProxyParentName: proxyLabels[LabelParentName],
}
// Include namespace label for proxies created for a namespaced type.
if isNamespacedProxyType(proxyType) {
sel[labelPromProxyParentNamespace] = proxyLabels[LabelParentNamespace]
}
return sel
}
// serviceMonitorTemplate returns a base ServiceMonitor type that, when converted to Unstructured, is a valid type that
// can be used in kube API server calls via the c/r client.
func serviceMonitorTemplate(name, ns string) *ServiceMonitor {
return &ServiceMonitor{
TypeMeta: metav1.TypeMeta{
Kind: "ServiceMonitor",
APIVersion: "monitoring.coreos.com/v1",
},
ObjectMeta: metav1.ObjectMeta{
Name: name,
Namespace: ns,
},
}
}
type metricsOpts struct {
proxyStsName string // name of StatefulSet for proxy
tsNamespace string // namespace in which Tailscale is installed
proxyLabels map[string]string // labels of the proxy StatefulSet
proxyType string
}
func isNamespacedProxyType(typ string) bool {
return typ == proxyTypeIngressResource || typ == proxyTypeIngressService
}

View File

@@ -9,6 +9,7 @@ import (
"context"
"fmt"
"slices"
"strings"
"sync"
_ "embed"
@@ -86,7 +87,7 @@ func (a *NameserverReconciler) Reconcile(ctx context.Context, req reconcile.Requ
return reconcile.Result{}, nil
}
logger.Info("Cleaning up DNSConfig resources")
if err := a.maybeCleanup(ctx, &dnsCfg, logger); err != nil {
if err := a.maybeCleanup(&dnsCfg); err != nil {
logger.Errorf("error cleaning up reconciler resource: %v", err)
return res, err
}
@@ -100,9 +101,9 @@ func (a *NameserverReconciler) Reconcile(ctx context.Context, req reconcile.Requ
}
oldCnStatus := dnsCfg.Status.DeepCopy()
setStatus := func(dnsCfg *tsapi.DNSConfig, conditionType tsapi.ConditionType, status metav1.ConditionStatus, reason, message string) (reconcile.Result, error) {
setStatus := func(dnsCfg *tsapi.DNSConfig, status metav1.ConditionStatus, reason, message string) (reconcile.Result, error) {
tsoperator.SetDNSConfigCondition(dnsCfg, tsapi.NameserverReady, status, reason, message, dnsCfg.Generation, a.clock, logger)
if !apiequality.Semantic.DeepEqual(oldCnStatus, dnsCfg.Status) {
if !apiequality.Semantic.DeepEqual(oldCnStatus, &dnsCfg.Status) {
// An error encountered here should get returned by the Reconcile function.
if updateErr := a.Client.Status().Update(ctx, dnsCfg); updateErr != nil {
err = errors.Wrap(err, updateErr.Error())
@@ -118,7 +119,7 @@ func (a *NameserverReconciler) Reconcile(ctx context.Context, req reconcile.Requ
msg := "invalid cluster configuration: more than one tailscale.com/dnsconfigs found. Please ensure that no more than one is created."
logger.Error(msg)
a.recorder.Event(&dnsCfg, corev1.EventTypeWarning, reasonMultipleDNSConfigsPresent, messageMultipleDNSConfigsPresent)
setStatus(&dnsCfg, tsapi.NameserverReady, metav1.ConditionFalse, reasonMultipleDNSConfigsPresent, messageMultipleDNSConfigsPresent)
setStatus(&dnsCfg, metav1.ConditionFalse, reasonMultipleDNSConfigsPresent, messageMultipleDNSConfigsPresent)
}
if !slices.Contains(dnsCfg.Finalizers, FinalizerName) {
@@ -127,11 +128,16 @@ func (a *NameserverReconciler) Reconcile(ctx context.Context, req reconcile.Requ
if err := a.Update(ctx, &dnsCfg); err != nil {
msg := fmt.Sprintf(messageNameserverCreationFailed, err)
logger.Error(msg)
return setStatus(&dnsCfg, tsapi.NameserverReady, metav1.ConditionFalse, reasonNameserverCreationFailed, msg)
return setStatus(&dnsCfg, metav1.ConditionFalse, reasonNameserverCreationFailed, msg)
}
}
if err := a.maybeProvision(ctx, &dnsCfg, logger); err != nil {
return reconcile.Result{}, fmt.Errorf("error provisioning nameserver resources: %w", err)
if strings.Contains(err.Error(), optimisticLockErrorMsg) {
logger.Infof("optimistic lock error, retrying: %s", err)
return reconcile.Result{}, nil
} else {
return reconcile.Result{}, fmt.Errorf("error provisioning nameserver resources: %w", err)
}
}
a.mu.Lock()
@@ -149,7 +155,7 @@ func (a *NameserverReconciler) Reconcile(ctx context.Context, req reconcile.Requ
dnsCfg.Status.Nameserver = &tsapi.NameserverStatus{
IP: ip,
}
return setStatus(&dnsCfg, tsapi.NameserverReady, metav1.ConditionTrue, reasonNameserverCreated, reasonNameserverCreated)
return setStatus(&dnsCfg, metav1.ConditionTrue, reasonNameserverCreated, reasonNameserverCreated)
}
logger.Info("nameserver Service does not have an IP address allocated, waiting...")
return reconcile.Result{}, nil
@@ -188,7 +194,7 @@ func (a *NameserverReconciler) maybeProvision(ctx context.Context, tsDNSCfg *tsa
// maybeCleanup removes DNSConfig from being tracked. The cluster resources
// created, will be automatically garbage collected as they are owned by the
// DNSConfig.
func (a *NameserverReconciler) maybeCleanup(ctx context.Context, dnsCfg *tsapi.DNSConfig, logger *zap.SugaredLogger) error {
func (a *NameserverReconciler) maybeCleanup(dnsCfg *tsapi.DNSConfig) error {
a.mu.Lock()
a.managedNameservers.Remove(dnsCfg.UID)
a.mu.Unlock()

View File

@@ -24,8 +24,11 @@ import (
discoveryv1 "k8s.io/api/discovery/v1"
networkingv1 "k8s.io/api/networking/v1"
rbacv1 "k8s.io/api/rbac/v1"
apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
"k8s.io/apimachinery/pkg/fields"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/rest"
toolscache "k8s.io/client-go/tools/cache"
"sigs.k8s.io/controller-runtime/pkg/builder"
"sigs.k8s.io/controller-runtime/pkg/cache"
"sigs.k8s.io/controller-runtime/pkg/client"
@@ -239,21 +242,29 @@ func runReconcilers(opts reconcilerOpts) {
nsFilter := cache.ByObject{
Field: client.InNamespace(opts.tailscaleNamespace).AsSelector(),
}
// We watch the ServiceMonitor CRD to ensure that reconcilers are re-triggered if user's workflows result in the
// ServiceMonitor CRD applied after some of our resources that define ServiceMonitor creation. This selector
// ensures that we only watch the ServiceMonitor CRD and that we don't cache full contents of it.
serviceMonitorSelector := cache.ByObject{
Field: fields.SelectorFromSet(fields.Set{"metadata.name": serviceMonitorCRD}),
Transform: crdTransformer(startlog),
}
mgrOpts := manager.Options{
// TODO (irbekrm): stricter filtering what we watch/cache/call
// reconcilers on. c/r by default starts a watch on any
// resources that we GET via the controller manager's client.
Cache: cache.Options{
ByObject: map[client.Object]cache.ByObject{
&corev1.Secret{}: nsFilter,
&corev1.ServiceAccount{}: nsFilter,
&corev1.Pod{}: nsFilter,
&corev1.ConfigMap{}: nsFilter,
&appsv1.StatefulSet{}: nsFilter,
&appsv1.Deployment{}: nsFilter,
&discoveryv1.EndpointSlice{}: nsFilter,
&rbacv1.Role{}: nsFilter,
&rbacv1.RoleBinding{}: nsFilter,
&corev1.Secret{}: nsFilter,
&corev1.ServiceAccount{}: nsFilter,
&corev1.Pod{}: nsFilter,
&corev1.ConfigMap{}: nsFilter,
&appsv1.StatefulSet{}: nsFilter,
&appsv1.Deployment{}: nsFilter,
&discoveryv1.EndpointSlice{}: nsFilter,
&rbacv1.Role{}: nsFilter,
&rbacv1.RoleBinding{}: nsFilter,
&apiextensionsv1.CustomResourceDefinition{}: serviceMonitorSelector,
},
},
Scheme: tsapi.GlobalScheme,
@@ -422,8 +433,13 @@ func runReconcilers(opts reconcilerOpts) {
startlog.Fatalf("could not create egress EndpointSlices reconciler: %v", err)
}
// ProxyClass reconciler gets triggered on ServiceMonitor CRD changes to ensure that any ProxyClasses, that
// define that a ServiceMonitor should be created, were set to invalid because the CRD did not exist get
// reconciled if the CRD is applied at a later point.
serviceMonitorFilter := handler.EnqueueRequestsFromMapFunc(proxyClassesWithServiceMonitor(mgr.GetClient(), opts.log))
err = builder.ControllerManagedBy(mgr).
For(&tsapi.ProxyClass{}).
Watches(&apiextensionsv1.CustomResourceDefinition{}, serviceMonitorFilter).
Complete(&ProxyClassReconciler{
Client: mgr.GetClient(),
recorder: eventRecorder,
@@ -1018,6 +1034,49 @@ func epsFromExternalNameService(cl client.Client, logger *zap.SugaredLogger, ns
}
}
// proxyClassesWithServiceMonitor returns an event handler that, given that the event is for the Prometheus
// ServiceMonitor CRD, returns all ProxyClasses that define that a ServiceMonitor should be created.
func proxyClassesWithServiceMonitor(cl client.Client, logger *zap.SugaredLogger) handler.MapFunc {
return func(ctx context.Context, o client.Object) []reconcile.Request {
crd, ok := o.(*apiextensionsv1.CustomResourceDefinition)
if !ok {
logger.Debugf("[unexpected] ServiceMonitor CRD handler received an object that is not a CustomResourceDefinition")
return nil
}
if crd.Name != serviceMonitorCRD {
logger.Debugf("[unexpected] ServiceMonitor CRD handler received an unexpected CRD %q", crd.Name)
return nil
}
pcl := &tsapi.ProxyClassList{}
if err := cl.List(ctx, pcl); err != nil {
logger.Debugf("[unexpected] error listing ProxyClasses: %v", err)
return nil
}
reqs := make([]reconcile.Request, 0)
for _, pc := range pcl.Items {
if pc.Spec.Metrics != nil && pc.Spec.Metrics.ServiceMonitor != nil && pc.Spec.Metrics.ServiceMonitor.Enable {
reqs = append(reqs, reconcile.Request{
NamespacedName: types.NamespacedName{Namespace: pc.Namespace, Name: pc.Name},
})
}
}
return reqs
}
}
// crdTransformer gets called before a CRD is stored to c/r cache, it removes the CRD spec to reduce memory consumption.
func crdTransformer(log *zap.SugaredLogger) toolscache.TransformFunc {
return func(o any) (any, error) {
crd, ok := o.(*apiextensionsv1.CustomResourceDefinition)
if !ok {
log.Infof("[unexpected] CRD transformer called for a non-CRD type")
return crd, nil
}
crd.Spec = apiextensionsv1.CustomResourceDefinitionSpec{}
return crd, nil
}
}
// indexEgressServices adds a local index to a cached Tailscale egress Services meant to be exposed on a ProxyGroup. The
// index is used a list filter.
func indexEgressServices(o client.Object) []string {

View File

@@ -1388,7 +1388,7 @@ func TestTailscaledConfigfileHash(t *testing.T) {
parentType: "svc",
hostname: "default-test",
clusterTargetIP: "10.20.30.40",
confFileHash: "a67b5ad3ff605531c822327e8f1a23dd0846e1075b722c13402f7d5d0ba32ba2",
confFileHash: "acf3467364b0a3ba9b8ee0dd772cb7c2f0bf585e288fa99b7fe4566009ed6041",
app: kubetypes.AppIngressProxy,
}
expectEqual(t, fc, expectedSTS(t, fc, o), nil)
@@ -1399,7 +1399,7 @@ func TestTailscaledConfigfileHash(t *testing.T) {
mak.Set(&svc.Annotations, AnnotationHostname, "another-test")
})
o.hostname = "another-test"
o.confFileHash = "888a993ebee20ad6be99623b45015339de117946850cf1252bede0b570e04293"
o.confFileHash = "d4cc13f09f55f4f6775689004f9a466723325b84d2b590692796bfe22aeaa389"
expectReconciled(t, sr, "default", "test")
expectEqual(t, fc, expectedSTS(t, fc, o), nil)
}

View File

@@ -15,6 +15,7 @@ import (
dockerref "github.com/distribution/reference"
"go.uber.org/zap"
corev1 "k8s.io/api/core/v1"
apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
apiequality "k8s.io/apimachinery/pkg/api/equality"
apierrors "k8s.io/apimachinery/pkg/api/errors"
apivalidation "k8s.io/apimachinery/pkg/api/validation"
@@ -95,14 +96,14 @@ func (pcr *ProxyClassReconciler) Reconcile(ctx context.Context, req reconcile.Re
pcr.mu.Unlock()
oldPCStatus := pc.Status.DeepCopy()
if errs := pcr.validate(pc); errs != nil {
if errs := pcr.validate(ctx, pc); errs != nil {
msg := fmt.Sprintf(messageProxyClassInvalid, errs.ToAggregate().Error())
pcr.recorder.Event(pc, corev1.EventTypeWarning, reasonProxyClassInvalid, msg)
tsoperator.SetProxyClassCondition(pc, tsapi.ProxyClassReady, metav1.ConditionFalse, reasonProxyClassInvalid, msg, pc.Generation, pcr.clock, logger)
} else {
tsoperator.SetProxyClassCondition(pc, tsapi.ProxyClassReady, metav1.ConditionTrue, reasonProxyClassValid, reasonProxyClassValid, pc.Generation, pcr.clock, logger)
}
if !apiequality.Semantic.DeepEqual(oldPCStatus, pc.Status) {
if !apiequality.Semantic.DeepEqual(oldPCStatus, &pc.Status) {
if err := pcr.Client.Status().Update(ctx, pc); err != nil {
logger.Errorf("error updating ProxyClass status: %v", err)
return reconcile.Result{}, err
@@ -111,7 +112,7 @@ func (pcr *ProxyClassReconciler) Reconcile(ctx context.Context, req reconcile.Re
return reconcile.Result{}, nil
}
func (pcr *ProxyClassReconciler) validate(pc *tsapi.ProxyClass) (violations field.ErrorList) {
func (pcr *ProxyClassReconciler) validate(ctx context.Context, pc *tsapi.ProxyClass) (violations field.ErrorList) {
if sts := pc.Spec.StatefulSet; sts != nil {
if len(sts.Labels) > 0 {
if errs := metavalidation.ValidateLabels(sts.Labels, field.NewPath(".spec.statefulSet.labels")); errs != nil {
@@ -167,6 +168,16 @@ func (pcr *ProxyClassReconciler) validate(pc *tsapi.ProxyClass) (violations fiel
}
}
}
if pc.Spec.Metrics != nil && pc.Spec.Metrics.ServiceMonitor != nil && pc.Spec.Metrics.ServiceMonitor.Enable {
found, err := hasServiceMonitorCRD(ctx, pcr.Client)
if err != nil {
pcr.logger.Infof("[unexpected]: error retrieving %q CRD: %v", serviceMonitorCRD, err)
// best effort validation - don't error out here
} else if !found {
msg := fmt.Sprintf("ProxyClass defines that a ServiceMonitor custom resource should be created, but %q CRD was not found", serviceMonitorCRD)
violations = append(violations, field.TypeInvalid(field.NewPath("spec", "metrics", "serviceMonitor"), "enable", msg))
}
}
// We do not validate embedded fields (security context, resource
// requirements etc) as we inherit upstream validation for those fields.
// Invalid values would get rejected by upstream validations at apply
@@ -174,6 +185,16 @@ func (pcr *ProxyClassReconciler) validate(pc *tsapi.ProxyClass) (violations fiel
return violations
}
func hasServiceMonitorCRD(ctx context.Context, cl client.Client) (bool, error) {
sm := &apiextensionsv1.CustomResourceDefinition{}
if err := cl.Get(ctx, types.NamespacedName{Name: serviceMonitorCRD}, sm); apierrors.IsNotFound(err) {
return false, nil
} else if err != nil {
return false, err
}
return true, nil
}
// maybeCleanup removes tailscale.com finalizer and ensures that the ProxyClass
// is no longer counted towards k8s_proxyclass_resources.
func (pcr *ProxyClassReconciler) maybeCleanup(ctx context.Context, logger *zap.SugaredLogger, pc *tsapi.ProxyClass) error {

View File

@@ -8,10 +8,12 @@
package main
import (
"context"
"testing"
"time"
"go.uber.org/zap"
apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/tools/record"
@@ -134,6 +136,25 @@ func TestProxyClass(t *testing.T) {
"Warning CustomTSEnvVar ProxyClass overrides the default value for EXPERIMENTAL_ALLOW_PROXYING_CLUSTER_TRAFFIC_VIA_INGRESS env var for tailscale container. Running with custom values for Tailscale env vars is not recommended and might break in the future."}
expectReconciled(t, pcr, "", "test")
expectEvents(t, fr, expectedEvents)
// 6. A ProxyClass with ServiceMonitor enabled and in a cluster that has not ServiceMonitor CRD is invalid
pc.Spec.Metrics = &tsapi.Metrics{Enable: true, ServiceMonitor: &tsapi.ServiceMonitor{Enable: true}}
mustUpdate(t, fc, "", "test", func(proxyClass *tsapi.ProxyClass) {
proxyClass.Spec = pc.Spec
})
expectReconciled(t, pcr, "", "test")
msg = `ProxyClass is not valid: spec.metrics.serviceMonitor: Invalid value: "enable": ProxyClass defines that a ServiceMonitor custom resource should be created, but "servicemonitors.monitoring.coreos.com" CRD was not found`
tsoperator.SetProxyClassCondition(pc, tsapi.ProxyClassReady, metav1.ConditionFalse, reasonProxyClassInvalid, msg, 0, cl, zl.Sugar())
expectEqual(t, fc, pc, nil)
expectedEvent = "Warning ProxyClassInvalid " + msg
expectEvents(t, fr, []string{expectedEvent})
// 7. A ProxyClass with ServiceMonitor enabled and in a cluster that does have the ServiceMonitor CRD is valid
crd := &apiextensionsv1.CustomResourceDefinition{ObjectMeta: metav1.ObjectMeta{Name: serviceMonitorCRD}}
mustCreate(t, fc, crd)
expectReconciled(t, pcr, "", "test")
tsoperator.SetProxyClassCondition(pc, tsapi.ProxyClassReady, metav1.ConditionTrue, reasonProxyClassValid, reasonProxyClassValid, 0, cl, zl.Sugar())
expectEqual(t, fc, pc, nil)
}
func TestValidateProxyClass(t *testing.T) {
@@ -180,7 +201,7 @@ func TestValidateProxyClass(t *testing.T) {
} {
t.Run(name, func(t *testing.T) {
pcr := &ProxyClassReconciler{}
err := pcr.validate(tc.pc)
err := pcr.validate(context.Background(), tc.pc)
valid := err == nil
if valid != tc.valid {
t.Errorf("expected valid=%v, got valid=%v, err=%v", tc.valid, valid, err)

View File

@@ -12,6 +12,7 @@ import (
"fmt"
"net/http"
"slices"
"strings"
"sync"
"github.com/pkg/errors"
@@ -45,6 +46,9 @@ const (
reasonProxyGroupReady = "ProxyGroupReady"
reasonProxyGroupCreating = "ProxyGroupCreating"
reasonProxyGroupInvalid = "ProxyGroupInvalid"
// Copied from k8s.io/apiserver/pkg/registry/generic/registry/store.go@cccad306d649184bf2a0e319ba830c53f65c445c
optimisticLockErrorMsg = "the object has been modified; please apply your changes to the latest version and try again"
)
var gaugeProxyGroupResources = clientmetric.NewGauge(kubetypes.MetricProxyGroupEgressCount)
@@ -110,7 +114,7 @@ func (r *ProxyGroupReconciler) Reconcile(ctx context.Context, req reconcile.Requ
oldPGStatus := pg.Status.DeepCopy()
setStatusReady := func(pg *tsapi.ProxyGroup, status metav1.ConditionStatus, reason, message string) (reconcile.Result, error) {
tsoperator.SetProxyGroupCondition(pg, tsapi.ProxyGroupReady, status, reason, message, pg.Generation, r.clock, logger)
if !apiequality.Semantic.DeepEqual(oldPGStatus, pg.Status) {
if !apiequality.Semantic.DeepEqual(oldPGStatus, &pg.Status) {
// An error encountered here should get returned by the Reconcile function.
if updateErr := r.Client.Status().Update(ctx, pg); updateErr != nil {
err = errors.Wrap(err, updateErr.Error())
@@ -166,9 +170,17 @@ func (r *ProxyGroupReconciler) Reconcile(ctx context.Context, req reconcile.Requ
}
if err = r.maybeProvision(ctx, pg, proxyClass); err != nil {
err = fmt.Errorf("error provisioning ProxyGroup resources: %w", err)
r.recorder.Eventf(pg, corev1.EventTypeWarning, reasonProxyGroupCreationFailed, err.Error())
return setStatusReady(pg, metav1.ConditionFalse, reasonProxyGroupCreationFailed, err.Error())
reason := reasonProxyGroupCreationFailed
msg := fmt.Sprintf("error provisioning ProxyGroup resources: %s", err)
if strings.Contains(err.Error(), optimisticLockErrorMsg) {
reason = reasonProxyGroupCreating
msg = fmt.Sprintf("optimistic lock error, retrying: %s", err)
err = nil
logger.Info(msg)
} else {
r.recorder.Eventf(pg, corev1.EventTypeWarning, reason, msg)
}
return setStatusReady(pg, metav1.ConditionFalse, reason, msg)
}
desiredReplicas := int(pgReplicas(pg))
@@ -259,6 +271,15 @@ func (r *ProxyGroupReconciler) maybeProvision(ctx context.Context, pg *tsapi.Pro
}); err != nil {
return fmt.Errorf("error provisioning StatefulSet: %w", err)
}
mo := &metricsOpts{
tsNamespace: r.tsNamespace,
proxyStsName: pg.Name,
proxyLabels: pgLabels(pg.Name, nil),
proxyType: "proxygroup",
}
if err := reconcileMetricsResources(ctx, logger, mo, proxyClass, r.Client); err != nil {
return fmt.Errorf("error reconciling metrics resources: %w", err)
}
if err := r.cleanupDanglingResources(ctx, pg); err != nil {
return fmt.Errorf("error cleaning up dangling resources: %w", err)
@@ -327,6 +348,14 @@ func (r *ProxyGroupReconciler) maybeCleanup(ctx context.Context, pg *tsapi.Proxy
}
}
mo := &metricsOpts{
proxyLabels: pgLabels(pg.Name, nil),
tsNamespace: r.tsNamespace,
proxyType: "proxygroup"}
if err := maybeCleanupMetricsResources(ctx, mo, r.Client); err != nil {
return false, fmt.Errorf("error cleaning up metrics resources: %w", err)
}
logger.Infof("cleaned up ProxyGroup resources")
r.mu.Lock()
r.proxyGroups.Remove(pg.UID)

View File

@@ -17,6 +17,7 @@ import (
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
rbacv1 "k8s.io/api/rbac/v1"
apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/tools/record"
"sigs.k8s.io/controller-runtime/pkg/client"
@@ -76,6 +77,13 @@ func TestProxyGroup(t *testing.T) {
l: zl.Sugar(),
clock: cl,
}
crd := &apiextensionsv1.CustomResourceDefinition{ObjectMeta: metav1.ObjectMeta{Name: serviceMonitorCRD}}
opts := configOpts{
proxyType: "proxygroup",
stsName: pg.Name,
parentType: "proxygroup",
tailscaleNamespace: "tailscale",
}
t.Run("proxyclass_not_ready", func(t *testing.T) {
expectReconciled(t, reconciler, "", pg.Name)
@@ -190,6 +198,27 @@ func TestProxyGroup(t *testing.T) {
expectProxyGroupResources(t, fc, pg, true, "518a86e9fae64f270f8e0ec2a2ea6ca06c10f725035d3d6caca132cd61e42a74")
})
t.Run("enable_metrics", func(t *testing.T) {
pc.Spec.Metrics = &tsapi.Metrics{Enable: true}
mustUpdate(t, fc, "", pc.Name, func(p *tsapi.ProxyClass) {
p.Spec = pc.Spec
})
expectReconciled(t, reconciler, "", pg.Name)
expectEqual(t, fc, expectedMetricsService(opts), nil)
})
t.Run("enable_service_monitor_no_crd", func(t *testing.T) {
pc.Spec.Metrics.ServiceMonitor = &tsapi.ServiceMonitor{Enable: true}
mustUpdate(t, fc, "", pc.Name, func(p *tsapi.ProxyClass) {
p.Spec.Metrics = pc.Spec.Metrics
})
expectReconciled(t, reconciler, "", pg.Name)
})
t.Run("create_crd_expect_service_monitor", func(t *testing.T) {
mustCreate(t, fc, crd)
expectReconciled(t, reconciler, "", pg.Name)
expectEqualUnstructured(t, fc, expectedServiceMonitor(t, opts))
})
t.Run("delete_and_cleanup", func(t *testing.T) {
if err := fc.Delete(context.Background(), pg); err != nil {
t.Fatal(err)
@@ -197,7 +226,7 @@ func TestProxyGroup(t *testing.T) {
expectReconciled(t, reconciler, "", pg.Name)
expectMissing[tsapi.Recorder](t, fc, "", pg.Name)
expectMissing[tsapi.ProxyGroup](t, fc, "", pg.Name)
if expected := 0; reconciler.proxyGroups.Len() != expected {
t.Fatalf("expected %d ProxyGroups, got %d", expected, reconciler.proxyGroups.Len())
}
@@ -206,6 +235,7 @@ func TestProxyGroup(t *testing.T) {
if diff := cmp.Diff(tsClient.deleted, []string{"nodeid-1", "nodeid-2", "nodeid-0"}); diff != "" {
t.Fatalf("unexpected deleted devices (-got +want):\n%s", diff)
}
expectMissing[corev1.Service](t, reconciler, "tailscale", metricsResourceName(pg.Name))
// The fake client does not clean up objects whose owner has been
// deleted, so we can't test for the owned resources getting deleted.
})

View File

@@ -15,6 +15,7 @@ import (
"net/http"
"os"
"slices"
"strconv"
"strings"
"go.uber.org/zap"
@@ -94,6 +95,12 @@ const (
podAnnotationLastSetTailnetTargetFQDN = "tailscale.com/operator-last-set-ts-tailnet-target-fqdn"
// podAnnotationLastSetConfigFileHash is sha256 hash of the current tailscaled configuration contents.
podAnnotationLastSetConfigFileHash = "tailscale.com/operator-last-set-config-file-hash"
proxyTypeEgress = "egress_service"
proxyTypeIngressService = "ingress_service"
proxyTypeIngressResource = "ingress_resource"
proxyTypeConnector = "connector"
proxyTypeProxyGroup = "proxygroup"
)
var (
@@ -122,6 +129,8 @@ type tailscaleSTSConfig struct {
Hostname string
Tags []string // if empty, use defaultTags
proxyType string
// Connector specifies a configuration of a Connector instance if that's
// what this StatefulSet should be created for.
Connector *connector
@@ -189,22 +198,30 @@ func (a *tailscaleSTSReconciler) Provision(ctx context.Context, logger *zap.Suga
}
sts.ProxyClass = proxyClass
secretName, tsConfigHash, configs, err := a.createOrGetSecret(ctx, logger, sts, hsvc)
secretName, tsConfigHash, _, err := a.createOrGetSecret(ctx, logger, sts, hsvc)
if err != nil {
return nil, fmt.Errorf("failed to create or get API key secret: %w", err)
}
_, err = a.reconcileSTS(ctx, logger, sts, hsvc, secretName, tsConfigHash, configs)
_, err = a.reconcileSTS(ctx, logger, sts, hsvc, secretName, tsConfigHash)
if err != nil {
return nil, fmt.Errorf("failed to reconcile statefulset: %w", err)
}
mo := &metricsOpts{
proxyStsName: hsvc.Name,
tsNamespace: hsvc.Namespace,
proxyLabels: hsvc.Labels,
proxyType: sts.proxyType,
}
if err = reconcileMetricsResources(ctx, logger, mo, sts.ProxyClass, a.Client); err != nil {
return nil, fmt.Errorf("failed to ensure metrics resources: %w", err)
}
return hsvc, nil
}
// Cleanup removes all resources associated that were created by Provision with
// the given labels. It returns true when all resources have been removed,
// otherwise it returns false and the caller should retry later.
func (a *tailscaleSTSReconciler) Cleanup(ctx context.Context, logger *zap.SugaredLogger, labels map[string]string) (done bool, _ error) {
func (a *tailscaleSTSReconciler) Cleanup(ctx context.Context, logger *zap.SugaredLogger, labels map[string]string, typ string) (done bool, _ error) {
// Need to delete the StatefulSet first, and delete it with foreground
// cascading deletion. That way, the pod that's writing to the Secret will
// stop running before we start looking at the Secret's contents, and
@@ -230,21 +247,21 @@ func (a *tailscaleSTSReconciler) Cleanup(ctx context.Context, logger *zap.Sugare
return false, nil
}
id, _, _, err := a.DeviceInfo(ctx, labels)
dev, err := a.DeviceInfo(ctx, labels, logger)
if err != nil {
return false, fmt.Errorf("getting device info: %w", err)
}
if id != "" {
logger.Debugf("deleting device %s from control", string(id))
if err := a.tsClient.DeleteDevice(ctx, string(id)); err != nil {
if dev != nil && dev.id != "" {
logger.Debugf("deleting device %s from control", string(dev.id))
if err := a.tsClient.DeleteDevice(ctx, string(dev.id)); err != nil {
errResp := &tailscale.ErrResponse{}
if ok := errors.As(err, errResp); ok && errResp.Status == http.StatusNotFound {
logger.Debugf("device %s not found, likely because it has already been deleted from control", string(id))
logger.Debugf("device %s not found, likely because it has already been deleted from control", string(dev.id))
} else {
return false, fmt.Errorf("deleting device: %w", err)
}
} else {
logger.Debugf("device %s deleted from control", string(id))
logger.Debugf("device %s deleted from control", string(dev.id))
}
}
@@ -257,6 +274,14 @@ func (a *tailscaleSTSReconciler) Cleanup(ctx context.Context, logger *zap.Sugare
return false, err
}
}
mo := &metricsOpts{
proxyLabels: labels,
tsNamespace: a.operatorNamespace,
proxyType: typ,
}
if err := maybeCleanupMetricsResources(ctx, mo, a.Client); err != nil {
return false, fmt.Errorf("error cleaning up metrics resources: %w", err)
}
return true, nil
}
@@ -416,40 +441,66 @@ func sanitizeConfigBytes(c ipn.ConfigVAlpha) string {
// that acts as an operator proxy. It retrieves info from a Kubernetes Secret
// labeled with the provided labels.
// Either of device ID, hostname and IPs can be empty string if not found in the Secret.
func (a *tailscaleSTSReconciler) DeviceInfo(ctx context.Context, childLabels map[string]string) (id tailcfg.StableNodeID, hostname string, ips []string, err error) {
func (a *tailscaleSTSReconciler) DeviceInfo(ctx context.Context, childLabels map[string]string, logger *zap.SugaredLogger) (dev *device, err error) {
sec, err := getSingleObject[corev1.Secret](ctx, a.Client, a.operatorNamespace, childLabels)
if err != nil {
return "", "", nil, err
return dev, err
}
if sec == nil {
return "", "", nil, nil
return dev, nil
}
pod := new(corev1.Pod)
if err := a.Get(ctx, types.NamespacedName{Namespace: sec.Namespace, Name: sec.Name}, pod); err != nil && !apierrors.IsNotFound(err) {
return dev, nil
}
return deviceInfo(sec)
return deviceInfo(sec, pod, logger)
}
func deviceInfo(sec *corev1.Secret) (id tailcfg.StableNodeID, hostname string, ips []string, err error) {
id = tailcfg.StableNodeID(sec.Data["device_id"])
// device contains tailscale state of a proxy device as gathered from its tailscale state Secret.
type device struct {
id tailcfg.StableNodeID // device's stable ID
hostname string // MagicDNS name of the device
ips []string // Tailscale IPs of the device
// ingressDNSName is the L7 Ingress DNS name. In practice this will be the same value as hostname, but only set
// when the device has been configured to serve traffic on it via 'tailscale serve'.
ingressDNSName string
}
func deviceInfo(sec *corev1.Secret, pod *corev1.Pod, log *zap.SugaredLogger) (dev *device, err error) {
id := tailcfg.StableNodeID(sec.Data[kubetypes.KeyDeviceID])
if id == "" {
return "", "", nil, nil
return dev, nil
}
dev = &device{id: id}
// Kubernetes chokes on well-formed FQDNs with the trailing dot, so we have
// to remove it.
hostname = strings.TrimSuffix(string(sec.Data["device_fqdn"]), ".")
if hostname == "" {
dev.hostname = strings.TrimSuffix(string(sec.Data[kubetypes.KeyDeviceFQDN]), ".")
if dev.hostname == "" {
// Device ID gets stored and retrieved in a different flow than
// FQDN and IPs. A device that acts as Kubernetes operator
// proxy, but whose route setup has failed might have an device
// proxy, but whose route setup has failed might have a device
// ID, but no FQDN/IPs. If so, return the ID, to allow the
// operator to clean up such devices.
return id, "", nil, nil
return dev, nil
}
if rawDeviceIPs, ok := sec.Data["device_ips"]; ok {
if err := json.Unmarshal(rawDeviceIPs, &ips); err != nil {
return "", "", nil, err
// TODO(irbekrm): we fall back to using the hostname field to determine Ingress's hostname to ensure backwards
// compatibility. In 1.82 we can remove this fallback mechanism.
dev.ingressDNSName = dev.hostname
if proxyCapVer(sec, pod, log) >= 109 {
dev.ingressDNSName = strings.TrimSuffix(string(sec.Data[kubetypes.KeyHTTPSEndpoint]), ".")
if strings.EqualFold(dev.ingressDNSName, kubetypes.ValueNoHTTPS) {
dev.ingressDNSName = ""
}
}
return id, hostname, ips, nil
if rawDeviceIPs, ok := sec.Data[kubetypes.KeyDeviceIPs]; ok {
ips := make([]string, 0)
if err := json.Unmarshal(rawDeviceIPs, &ips); err != nil {
return nil, err
}
dev.ips = ips
}
return dev, nil
}
func newAuthKey(ctx context.Context, tsClient tsClient, tags []string) (string, error) {
@@ -476,7 +527,7 @@ var proxyYaml []byte
//go:embed deploy/manifests/userspace-proxy.yaml
var userspaceProxyYaml []byte
func (a *tailscaleSTSReconciler) reconcileSTS(ctx context.Context, logger *zap.SugaredLogger, sts *tailscaleSTSConfig, headlessSvc *corev1.Service, proxySecret, tsConfigHash string, _ map[tailcfg.CapabilityVersion]ipn.ConfigVAlpha) (*appsv1.StatefulSet, error) {
func (a *tailscaleSTSReconciler) reconcileSTS(ctx context.Context, logger *zap.SugaredLogger, sts *tailscaleSTSConfig, headlessSvc *corev1.Service, proxySecret, tsConfigHash string) (*appsv1.StatefulSet, error) {
ss := new(appsv1.StatefulSet)
if sts.ServeConfig != nil && sts.ForwardClusterTrafficViaL7IngressProxy != true { // If forwarding cluster traffic via is required we need non-userspace + NET_ADMIN + forwarding
if err := yaml.Unmarshal(userspaceProxyYaml, &ss); err != nil {
@@ -818,7 +869,7 @@ func enableEndpoints(ss *appsv1.StatefulSet, metrics, debug bool) {
Value: "$(POD_IP):9002",
},
corev1.EnvVar{
Name: "TS_METRICS_ENABLED",
Name: "TS_ENABLE_METRICS",
Value: "true",
},
)
@@ -854,17 +905,10 @@ func tailscaledConfig(stsC *tailscaleSTSConfig, newAuthkey string, oldSecret *co
AcceptRoutes: "false", // AcceptRoutes defaults to true
Locked: "false",
Hostname: &stsC.Hostname,
NoStatefulFiltering: "false",
NoStatefulFiltering: "true", // Explicitly enforce default value, see #14216
AppConnector: &ipn.AppConnectorPrefs{Advertise: false},
}
// For egress proxies only, we need to ensure that stateful filtering is
// not in place so that traffic from cluster can be forwarded via
// Tailscale IPs.
// TODO (irbekrm): set it to true always as this is now the default in core.
if stsC.TailnetTargetFQDN != "" || stsC.TailnetTargetIP != "" {
conf.NoStatefulFiltering = "true"
}
if stsC.Connector != nil {
routes, err := netutil.CalcAdvertiseRoutes(stsC.Connector.routes, stsC.Connector.isExitNode)
if err != nil {
@@ -1067,3 +1111,23 @@ func nameForService(svc *corev1.Service) string {
func isValidFirewallMode(m string) bool {
return m == "auto" || m == "nftables" || m == "iptables"
}
// proxyCapVer accepts a proxy state Secret and a proxy Pod returns the capability version of a proxy Pod.
// This is best effort - if the capability version can not (currently) be determined, it returns -1.
func proxyCapVer(sec *corev1.Secret, pod *corev1.Pod, log *zap.SugaredLogger) tailcfg.CapabilityVersion {
if sec == nil || pod == nil {
return tailcfg.CapabilityVersion(-1)
}
if len(sec.Data[kubetypes.KeyCapVer]) == 0 || len(sec.Data[kubetypes.KeyPodUID]) == 0 {
return tailcfg.CapabilityVersion(-1)
}
capVer, err := strconv.Atoi(string(sec.Data[kubetypes.KeyCapVer]))
if err != nil {
log.Infof("[unexpected]: unexpected capability version in proxy's state Secret, expected an integer, got %q", string(sec.Data[kubetypes.KeyCapVer]))
return tailcfg.CapabilityVersion(-1)
}
if !strings.EqualFold(string(pod.ObjectMeta.UID), string(sec.Data[kubetypes.KeyPodUID])) {
return tailcfg.CapabilityVersion(-1)
}
return tailcfg.CapabilityVersion(capVer)
}

View File

@@ -258,7 +258,7 @@ func Test_applyProxyClassToStatefulSet(t *testing.T) {
corev1.EnvVar{Name: "TS_DEBUG_ADDR_PORT", Value: "$(POD_IP):9001"},
corev1.EnvVar{Name: "TS_TAILSCALED_EXTRA_ARGS", Value: "--debug=$(TS_DEBUG_ADDR_PORT)"},
corev1.EnvVar{Name: "TS_LOCAL_ADDR_PORT", Value: "$(POD_IP):9002"},
corev1.EnvVar{Name: "TS_METRICS_ENABLED", Value: "true"},
corev1.EnvVar{Name: "TS_ENABLE_METRICS", Value: "true"},
)
wantSS.Spec.Template.Spec.Containers[0].Ports = []corev1.ContainerPort{
{Name: "debug", Protocol: "TCP", ContainerPort: 9001},
@@ -273,7 +273,7 @@ func Test_applyProxyClassToStatefulSet(t *testing.T) {
wantSS = nonUserspaceProxySS.DeepCopy()
wantSS.Spec.Template.Spec.Containers[0].Env = append(wantSS.Spec.Template.Spec.Containers[0].Env,
corev1.EnvVar{Name: "TS_LOCAL_ADDR_PORT", Value: "$(POD_IP):9002"},
corev1.EnvVar{Name: "TS_METRICS_ENABLED", Value: "true"},
corev1.EnvVar{Name: "TS_ENABLE_METRICS", Value: "true"},
)
wantSS.Spec.Template.Spec.Containers[0].Ports = []corev1.ContainerPort{{Name: "metrics", Protocol: "TCP", ContainerPort: 9002}}
gotSS = applyProxyClassToStatefulSet(proxyClassWithMetricsDebug(true, ptr.To(false)), nonUserspaceProxySS.DeepCopy(), new(tailscaleSTSConfig), zl.Sugar())

View File

@@ -121,7 +121,15 @@ func (a *ServiceReconciler) Reconcile(ctx context.Context, req reconcile.Request
return reconcile.Result{}, a.maybeCleanup(ctx, logger, svc)
}
return reconcile.Result{}, a.maybeProvision(ctx, logger, svc)
if err := a.maybeProvision(ctx, logger, svc); err != nil {
if strings.Contains(err.Error(), optimisticLockErrorMsg) {
logger.Infof("optimistic lock error, retrying: %s", err)
} else {
return reconcile.Result{}, err
}
}
return reconcile.Result{}, nil
}
// maybeCleanup removes any existing resources related to serving svc over tailscale.
@@ -131,7 +139,7 @@ func (a *ServiceReconciler) Reconcile(ctx context.Context, req reconcile.Request
func (a *ServiceReconciler) maybeCleanup(ctx context.Context, logger *zap.SugaredLogger, svc *corev1.Service) (err error) {
oldSvcStatus := svc.Status.DeepCopy()
defer func() {
if !apiequality.Semantic.DeepEqual(oldSvcStatus, svc.Status) {
if !apiequality.Semantic.DeepEqual(oldSvcStatus, &svc.Status) {
// An error encountered here should get returned by the Reconcile function.
err = errors.Join(err, a.Client.Status().Update(ctx, svc))
}
@@ -152,7 +160,12 @@ func (a *ServiceReconciler) maybeCleanup(ctx context.Context, logger *zap.Sugare
return nil
}
if done, err := a.ssr.Cleanup(ctx, logger, childResourceLabels(svc.Name, svc.Namespace, "svc")); err != nil {
proxyTyp := proxyTypeEgress
if a.shouldExpose(svc) {
proxyTyp = proxyTypeIngressService
}
if done, err := a.ssr.Cleanup(ctx, logger, childResourceLabels(svc.Name, svc.Namespace, "svc"), proxyTyp); err != nil {
return fmt.Errorf("failed to cleanup: %w", err)
} else if !done {
logger.Debugf("cleanup not done yet, waiting for next reconcile")
@@ -191,7 +204,7 @@ func (a *ServiceReconciler) maybeCleanup(ctx context.Context, logger *zap.Sugare
func (a *ServiceReconciler) maybeProvision(ctx context.Context, logger *zap.SugaredLogger, svc *corev1.Service) (err error) {
oldSvcStatus := svc.Status.DeepCopy()
defer func() {
if !apiequality.Semantic.DeepEqual(oldSvcStatus, svc.Status) {
if !apiequality.Semantic.DeepEqual(oldSvcStatus, &svc.Status) {
// An error encountered here should get returned by the Reconcile function.
err = errors.Join(err, a.Client.Status().Update(ctx, svc))
}
@@ -256,6 +269,10 @@ func (a *ServiceReconciler) maybeProvision(ctx context.Context, logger *zap.Suga
ChildResourceLabels: crl,
ProxyClassName: proxyClass,
}
sts.proxyType = proxyTypeEgress
if a.shouldExpose(svc) {
sts.proxyType = proxyTypeIngressService
}
a.mu.Lock()
if a.shouldExposeClusterIP(svc) {
@@ -311,11 +328,11 @@ func (a *ServiceReconciler) maybeProvision(ctx context.Context, logger *zap.Suga
return nil
}
_, tsHost, tsIPs, err := a.ssr.DeviceInfo(ctx, crl)
dev, err := a.ssr.DeviceInfo(ctx, crl, logger)
if err != nil {
return fmt.Errorf("failed to get device ID: %w", err)
}
if tsHost == "" {
if dev == nil || dev.hostname == "" {
msg := "no Tailscale hostname known yet, waiting for proxy pod to finish auth"
logger.Debug(msg)
// No hostname yet. Wait for the proxy pod to auth.
@@ -324,9 +341,9 @@ func (a *ServiceReconciler) maybeProvision(ctx context.Context, logger *zap.Suga
return nil
}
logger.Debugf("setting Service LoadBalancer status to %q, %s", tsHost, strings.Join(tsIPs, ", "))
logger.Debugf("setting Service LoadBalancer status to %q, %s", dev.hostname, strings.Join(dev.ips, ", "))
ingress := []corev1.LoadBalancerIngress{
{Hostname: tsHost},
{Hostname: dev.hostname},
}
clusterIPAddr, err := netip.ParseAddr(svc.Spec.ClusterIP)
if err != nil {
@@ -334,7 +351,7 @@ func (a *ServiceReconciler) maybeProvision(ctx context.Context, logger *zap.Suga
tsoperator.SetServiceCondition(svc, tsapi.ProxyReady, metav1.ConditionFalse, reasonProxyFailed, msg, a.clock, logger)
return errors.New(msg)
}
for _, ip := range tsIPs {
for _, ip := range dev.ips {
addr, err := netip.ParseAddr(ip)
if err != nil {
continue

View File

@@ -8,6 +8,7 @@ package main
import (
"context"
"encoding/json"
"fmt"
"net/netip"
"reflect"
"strings"
@@ -21,6 +22,7 @@ import (
corev1 "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/tools/record"
"sigs.k8s.io/controller-runtime/pkg/client"
@@ -39,7 +41,10 @@ type configOpts struct {
secretName string
hostname string
namespace string
tailscaleNamespace string
namespaced bool
parentType string
proxyType string
priorityClassName string
firewallMode string
tailnetTargetIP string
@@ -56,6 +61,7 @@ type configOpts struct {
app string
shouldRemoveAuthKey bool
secretExtraData map[string][]byte
enableMetrics bool
}
func expectedSTS(t *testing.T, cl client.Client, opts configOpts) *appsv1.StatefulSet {
@@ -76,9 +82,7 @@ func expectedSTS(t *testing.T, cl client.Client, opts configOpts) *appsv1.Statef
{Name: "TS_EXPERIMENTAL_VERSIONED_CONFIG_DIR", Value: "/etc/tsconfig"},
},
SecurityContext: &corev1.SecurityContext{
Capabilities: &corev1.Capabilities{
Add: []corev1.Capability{"NET_ADMIN"},
},
Privileged: ptr.To(true),
},
ImagePullPolicy: "Always",
}
@@ -152,6 +156,29 @@ func expectedSTS(t *testing.T, cl client.Client, opts configOpts) *appsv1.Statef
Name: "TS_INTERNAL_APP",
Value: opts.app,
})
if opts.enableMetrics {
tsContainer.Env = append(tsContainer.Env,
corev1.EnvVar{
Name: "TS_DEBUG_ADDR_PORT",
Value: "$(POD_IP):9001"},
corev1.EnvVar{
Name: "TS_TAILSCALED_EXTRA_ARGS",
Value: "--debug=$(TS_DEBUG_ADDR_PORT)",
},
corev1.EnvVar{
Name: "TS_LOCAL_ADDR_PORT",
Value: "$(POD_IP):9002",
},
corev1.EnvVar{
Name: "TS_ENABLE_METRICS",
Value: "true",
},
)
tsContainer.Ports = append(tsContainer.Ports,
corev1.ContainerPort{Name: "debug", ContainerPort: 9001, Protocol: "TCP"},
corev1.ContainerPort{Name: "metrics", ContainerPort: 9002, Protocol: "TCP"},
)
}
ss := &appsv1.StatefulSet{
TypeMeta: metav1.TypeMeta{
Kind: "StatefulSet",
@@ -243,6 +270,29 @@ func expectedSTSUserspace(t *testing.T, cl client.Client, opts configOpts) *apps
{Name: "serve-config", ReadOnly: true, MountPath: "/etc/tailscaled"},
},
}
if opts.enableMetrics {
tsContainer.Env = append(tsContainer.Env,
corev1.EnvVar{
Name: "TS_DEBUG_ADDR_PORT",
Value: "$(POD_IP):9001"},
corev1.EnvVar{
Name: "TS_TAILSCALED_EXTRA_ARGS",
Value: "--debug=$(TS_DEBUG_ADDR_PORT)",
},
corev1.EnvVar{
Name: "TS_LOCAL_ADDR_PORT",
Value: "$(POD_IP):9002",
},
corev1.EnvVar{
Name: "TS_ENABLE_METRICS",
Value: "true",
},
)
tsContainer.Ports = append(tsContainer.Ports, corev1.ContainerPort{
Name: "debug", ContainerPort: 9001, Protocol: "TCP"},
corev1.ContainerPort{Name: "metrics", ContainerPort: 9002, Protocol: "TCP"},
)
}
volumes := []corev1.Volume{
{
Name: "tailscaledconfig",
@@ -337,6 +387,87 @@ func expectedHeadlessService(name string, parentType string) *corev1.Service {
}
}
func expectedMetricsService(opts configOpts) *corev1.Service {
labels := metricsLabels(opts)
selector := map[string]string{
"tailscale.com/managed": "true",
"tailscale.com/parent-resource": "test",
"tailscale.com/parent-resource-type": opts.parentType,
}
if opts.namespaced {
selector["tailscale.com/parent-resource-ns"] = opts.namespace
}
return &corev1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: metricsResourceName(opts.stsName),
Namespace: opts.tailscaleNamespace,
Labels: labels,
},
Spec: corev1.ServiceSpec{
Selector: selector,
Type: corev1.ServiceTypeClusterIP,
Ports: []corev1.ServicePort{{Protocol: "TCP", Port: 9002, Name: "metrics"}},
},
}
}
func metricsLabels(opts configOpts) map[string]string {
promJob := fmt.Sprintf("ts_%s_default_test", opts.proxyType)
if !opts.namespaced {
promJob = fmt.Sprintf("ts_%s_test", opts.proxyType)
}
labels := map[string]string{
"tailscale.com/managed": "true",
"tailscale.com/metrics-target": opts.stsName,
"ts_prom_job": promJob,
"ts_proxy_type": opts.proxyType,
"ts_proxy_parent_name": "test",
}
if opts.namespaced {
labels["ts_proxy_parent_namespace"] = "default"
}
return labels
}
func expectedServiceMonitor(t *testing.T, opts configOpts) *unstructured.Unstructured {
t.Helper()
labels := metricsLabels(opts)
name := metricsResourceName(opts.stsName)
sm := &ServiceMonitor{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Namespace: opts.tailscaleNamespace,
Labels: labels,
ResourceVersion: "1",
OwnerReferences: []metav1.OwnerReference{{APIVersion: "v1", Kind: "Service", Name: name, BlockOwnerDeletion: ptr.To(true), Controller: ptr.To(true)}},
},
TypeMeta: metav1.TypeMeta{
Kind: "ServiceMonitor",
APIVersion: "monitoring.coreos.com/v1",
},
Spec: ServiceMonitorSpec{
Selector: metav1.LabelSelector{MatchLabels: labels},
Endpoints: []ServiceMonitorEndpoint{{
Port: "metrics",
}},
NamespaceSelector: ServiceMonitorNamespaceSelector{
MatchNames: []string{opts.tailscaleNamespace},
},
JobLabel: "ts_prom_job",
TargetLabels: []string{
"ts_proxy_parent_name",
"ts_proxy_parent_namespace",
"ts_proxy_type",
},
},
}
u, err := serviceMonitorToUnstructured(sm)
if err != nil {
t.Fatalf("error converting ServiceMonitor to unstructured: %v", err)
}
return u
}
func expectedSecret(t *testing.T, cl client.Client, opts configOpts) *corev1.Secret {
t.Helper()
s := &corev1.Secret{
@@ -353,13 +484,14 @@ func expectedSecret(t *testing.T, cl client.Client, opts configOpts) *corev1.Sec
mak.Set(&s.StringData, "serve-config", string(serveConfigBs))
}
conf := &ipn.ConfigVAlpha{
Version: "alpha0",
AcceptDNS: "false",
Hostname: &opts.hostname,
Locked: "false",
AuthKey: ptr.To("secret-authkey"),
AcceptRoutes: "false",
AppConnector: &ipn.AppConnectorPrefs{Advertise: false},
Version: "alpha0",
AcceptDNS: "false",
Hostname: &opts.hostname,
Locked: "false",
AuthKey: ptr.To("secret-authkey"),
AcceptRoutes: "false",
AppConnector: &ipn.AppConnectorPrefs{Advertise: false},
NoStatefulFiltering: "true",
}
if opts.proxyClass != "" {
t.Logf("applying configuration from ProxyClass %s", opts.proxyClass)
@@ -391,11 +523,6 @@ func expectedSecret(t *testing.T, cl client.Client, opts configOpts) *corev1.Sec
routes = append(routes, prefix)
}
}
if opts.tailnetTargetFQDN != "" || opts.tailnetTargetIP != "" {
conf.NoStatefulFiltering = "true"
} else {
conf.NoStatefulFiltering = "false"
}
conf.AdvertiseRoutes = routes
bnn, err := json.Marshal(conf)
if err != nil {
@@ -508,6 +635,21 @@ func expectEqual[T any, O ptrObject[T]](t *testing.T, client client.Client, want
}
}
func expectEqualUnstructured(t *testing.T, client client.Client, want *unstructured.Unstructured) {
t.Helper()
got := &unstructured.Unstructured{}
got.SetGroupVersionKind(want.GroupVersionKind())
if err := client.Get(context.Background(), types.NamespacedName{
Name: want.GetName(),
Namespace: want.GetNamespace(),
}, got); err != nil {
t.Fatalf("getting %q: %v", want.GetName(), err)
}
if diff := cmp.Diff(got, want); diff != "" {
t.Fatalf("unexpected contents of Unstructured (-got +want):\n%s", diff)
}
}
func expectMissing[T any, O ptrObject[T]](t *testing.T, client client.Client, ns, name string) {
t.Helper()
obj := O(new(T))
@@ -650,7 +792,7 @@ func removeHashAnnotation(sts *appsv1.StatefulSet) {
func removeTargetPortsFromSvc(svc *corev1.Service) {
newPorts := make([]corev1.ServicePort, 0)
for _, p := range svc.Spec.Ports {
newPorts = append(newPorts, corev1.ServicePort{Protocol: p.Protocol, Port: p.Port})
newPorts = append(newPorts, corev1.ServicePort{Protocol: p.Protocol, Port: p.Port, Name: p.Name})
}
svc.Spec.Ports = newPorts
}

View File

@@ -11,6 +11,7 @@ import (
"fmt"
"net/http"
"slices"
"strings"
"sync"
"github.com/pkg/errors"
@@ -38,6 +39,7 @@ import (
const (
reasonRecorderCreationFailed = "RecorderCreationFailed"
reasonRecorderCreating = "RecorderCreating"
reasonRecorderCreated = "RecorderCreated"
reasonRecorderInvalid = "RecorderInvalid"
@@ -102,7 +104,7 @@ func (r *RecorderReconciler) Reconcile(ctx context.Context, req reconcile.Reques
oldTSRStatus := tsr.Status.DeepCopy()
setStatusReady := func(tsr *tsapi.Recorder, status metav1.ConditionStatus, reason, message string) (reconcile.Result, error) {
tsoperator.SetRecorderCondition(tsr, tsapi.RecorderReady, status, reason, message, tsr.Generation, r.clock, logger)
if !apiequality.Semantic.DeepEqual(oldTSRStatus, tsr.Status) {
if !apiequality.Semantic.DeepEqual(oldTSRStatus, &tsr.Status) {
// An error encountered here should get returned by the Reconcile function.
if updateErr := r.Client.Status().Update(ctx, tsr); updateErr != nil {
err = errors.Wrap(err, updateErr.Error())
@@ -119,23 +121,28 @@ func (r *RecorderReconciler) Reconcile(ctx context.Context, req reconcile.Reques
logger.Infof("ensuring Recorder is set up")
tsr.Finalizers = append(tsr.Finalizers, FinalizerName)
if err := r.Update(ctx, tsr); err != nil {
logger.Errorf("error adding finalizer: %w", err)
return setStatusReady(tsr, metav1.ConditionFalse, reasonRecorderCreationFailed, reasonRecorderCreationFailed)
}
}
if err := r.validate(tsr); err != nil {
logger.Errorf("error validating Recorder spec: %w", err)
message := fmt.Sprintf("Recorder is invalid: %s", err)
r.recorder.Eventf(tsr, corev1.EventTypeWarning, reasonRecorderInvalid, message)
return setStatusReady(tsr, metav1.ConditionFalse, reasonRecorderInvalid, message)
}
if err = r.maybeProvision(ctx, tsr); err != nil {
logger.Errorf("error creating Recorder resources: %w", err)
reason := reasonRecorderCreationFailed
message := fmt.Sprintf("failed creating Recorder: %s", err)
r.recorder.Eventf(tsr, corev1.EventTypeWarning, reasonRecorderCreationFailed, message)
return setStatusReady(tsr, metav1.ConditionFalse, reasonRecorderCreationFailed, message)
if strings.Contains(err.Error(), optimisticLockErrorMsg) {
reason = reasonRecorderCreating
message = fmt.Sprintf("optimistic lock error, retrying: %s", err)
err = nil
logger.Info(message)
} else {
r.recorder.Eventf(tsr, corev1.EventTypeWarning, reasonRecorderCreationFailed, message)
}
return setStatusReady(tsr, metav1.ConditionFalse, reason, message)
}
logger.Info("Recorder resources synced")

View File

@@ -5,24 +5,40 @@
package main
import (
"flag"
"log"
"net"
"os"
"strconv"
"time"
"tailscale.com/net/stun"
)
func main() {
log.SetFlags(0)
if len(os.Args) < 2 || len(os.Args) > 3 {
log.Fatalf("usage: %s <hostname> [port]", os.Args[0])
}
host := os.Args[1]
var host string
port := "3478"
if len(os.Args) == 3 {
port = os.Args[2]
var readTimeout time.Duration
flag.DurationVar(&readTimeout, "timeout", 3*time.Second, "response wait timeout")
flag.Parse()
values := flag.Args()
if len(values) < 1 || len(values) > 2 {
log.Printf("usage: %s <hostname> [port]", os.Args[0])
flag.PrintDefaults()
os.Exit(1)
} else {
for i, value := range values {
switch i {
case 0:
host = value
case 1:
port = value
}
}
}
_, err := strconv.ParseUint(port, 10, 16)
if err != nil {
@@ -46,6 +62,10 @@ func main() {
log.Fatal(err)
}
err = c.SetReadDeadline(time.Now().Add(readTimeout))
if err != nil {
log.Fatal(err)
}
var buf [1024]byte
n, raddr, err := c.ReadFromUDPAddrPort(buf[:])
if err != nil {

View File

@@ -175,6 +175,12 @@ var debugCmd = &ffcli.Command{
Exec: localAPIAction("pick-new-derp"),
ShortHelp: "Switch to some other random DERP home region for a short time",
},
{
Name: "force-prefer-derp",
ShortUsage: "tailscale debug force-prefer-derp",
Exec: forcePreferDERP,
ShortHelp: "Prefer the given region ID if reachable (until restart, or 0 to clear)",
},
{
Name: "force-netmap-update",
ShortUsage: "tailscale debug force-netmap-update",
@@ -577,6 +583,25 @@ func runDERPMap(ctx context.Context, args []string) error {
return nil
}
func forcePreferDERP(ctx context.Context, args []string) error {
var n int
if len(args) != 1 {
return errors.New("expected exactly one integer argument")
}
n, err := strconv.Atoi(args[0])
if err != nil {
return fmt.Errorf("expected exactly one integer argument: %w", err)
}
b, err := json.Marshal(n)
if err != nil {
return fmt.Errorf("failed to marshal DERP region: %w", err)
}
if err := localClient.DebugActionBody(ctx, "force-prefer-derp", bytes.NewReader(b)); err != nil {
return fmt.Errorf("failed to force preferred DERP: %w", err)
}
return nil
}
func localAPIAction(action string) func(context.Context, []string) error {
return func(ctx context.Context, args []string) error {
if len(args) > 0 {

View File

@@ -1157,7 +1157,6 @@ func resolveAuthKey(ctx context.Context, v, tags string) (string, error) {
ClientID: "some-client-id", // ignored
ClientSecret: clientSecret,
TokenURL: baseURL + "/api/v2/oauth/token",
Scopes: []string{"device"},
}
tsClient := tailscale.NewClient("-", nil)

View File

@@ -246,7 +246,7 @@ func (a *Dialer) dial(ctx context.Context) (*ClientConn, error) {
results[i].conn = nil // so we don't close it in the defer
return conn, nil
}
merr := multierr.New(multierr.DeduplicateContextErrors(errs)...)
merr := multierr.New(errs...)
// If we get here, then we didn't get anywhere with our dial plan; fall back to just using DNS.
a.logf("controlhttp: failed dialing using DialPlan, falling back to DNS; errs=%s", merr.Error())

View File

@@ -84,11 +84,19 @@ func init() {
}
const (
perClientSendQueueDepth = 32 // packets buffered for sending
writeTimeout = 2 * time.Second
privilegedWriteTimeout = 30 * time.Second // for clients with the mesh key
defaultPerClientSendQueueDepth = 32 // default packets buffered for sending
writeTimeout = 2 * time.Second
privilegedWriteTimeout = 30 * time.Second // for clients with the mesh key
)
func getPerClientSendQueueDepth() int {
if v, ok := envknob.LookupInt("TS_DEBUG_DERP_PER_CLIENT_SEND_QUEUE_DEPTH"); ok {
return v
}
return defaultPerClientSendQueueDepth
}
// dupPolicy is a temporary (2021-08-30) mechanism to change the policy
// of how duplicate connection for the same key are handled.
type dupPolicy int8
@@ -190,6 +198,9 @@ type Server struct {
// maps from netip.AddrPort to a client's public key
keyOfAddr map[netip.AddrPort]key.NodePublic
// Sets the client send queue depth for the server.
perClientSendQueueDepth int
clock tstime.Clock
}
@@ -377,6 +388,8 @@ func NewServer(privateKey key.NodePrivate, logf logger.Logf) *Server {
s.packetsDroppedTypeDisco = s.packetsDroppedType.Get("disco")
s.packetsDroppedTypeOther = s.packetsDroppedType.Get("other")
s.perClientSendQueueDepth = getPerClientSendQueueDepth()
return s
}
@@ -849,8 +862,8 @@ func (s *Server) accept(ctx context.Context, nc Conn, brw *bufio.ReadWriter, rem
done: ctx.Done(),
remoteIPPort: remoteIPPort,
connectedAt: s.clock.Now(),
sendQueue: make(chan pkt, perClientSendQueueDepth),
discoSendQueue: make(chan pkt, perClientSendQueueDepth),
sendQueue: make(chan pkt, s.perClientSendQueueDepth),
discoSendQueue: make(chan pkt, s.perClientSendQueueDepth),
sendPongCh: make(chan [8]byte, 1),
peerGone: make(chan peerGoneMsg),
canMesh: s.isMeshPeer(clientInfo),

View File

@@ -6,6 +6,7 @@ package derp
import (
"bufio"
"bytes"
"cmp"
"context"
"crypto/x509"
"encoding/asn1"
@@ -23,6 +24,7 @@ import (
"testing"
"time"
qt "github.com/frankban/quicktest"
"go4.org/mem"
"golang.org/x/time/rate"
"tailscale.com/disco"
@@ -1598,3 +1600,29 @@ func TestServerRepliesToPing(t *testing.T) {
}
}
}
func TestGetPerClientSendQueueDepth(t *testing.T) {
c := qt.New(t)
envKey := "TS_DEBUG_DERP_PER_CLIENT_SEND_QUEUE_DEPTH"
testCases := []struct {
envVal string
want int
}{
// Empty case, envknob treats empty as missing also.
{
"", defaultPerClientSendQueueDepth,
},
{
"64", 64,
},
}
for _, tc := range testCases {
t.Run(cmp.Or(tc.envVal, "empty"), func(t *testing.T) {
t.Setenv(envKey, tc.envVal)
val := getPerClientSendQueueDepth()
c.Assert(val, qt.Equals, tc.want)
})
}
}

View File

@@ -757,6 +757,9 @@ func (c *Client) dialNode(ctx context.Context, n *tailcfg.DERPNode) (net.Conn, e
}
dst := cmp.Or(dstPrimary, n.HostName)
port := "443"
if !c.useHTTPS() {
port = "3340"
}
if n.DERPPort != 0 {
port = fmt.Sprint(n.DERPPort)
}

View File

@@ -53,6 +53,4 @@ spec:
fieldRef:
fieldPath: metadata.uid
securityContext:
capabilities:
add:
- NET_ADMIN
privileged: true

View File

@@ -35,6 +35,4 @@ spec:
fieldRef:
fieldPath: metadata.uid
securityContext:
capabilities:
add:
- NET_ADMIN
privileged: true

View File

@@ -37,6 +37,4 @@ spec:
fieldRef:
fieldPath: metadata.uid
securityContext:
capabilities:
add:
- NET_ADMIN
privileged: true

14
go.mod
View File

@@ -95,14 +95,14 @@ require (
go.uber.org/zap v1.27.0
go4.org/mem v0.0.0-20220726221520-4f986261bf13
go4.org/netipx v0.0.0-20231129151722-fdeea329fbba
golang.org/x/crypto v0.25.0
golang.org/x/crypto v0.30.0
golang.org/x/exp v0.0.0-20240119083558-1b970713d09a
golang.org/x/mod v0.19.0
golang.org/x/net v0.27.0
golang.org/x/net v0.32.0
golang.org/x/oauth2 v0.16.0
golang.org/x/sync v0.9.0
golang.org/x/sys v0.27.0
golang.org/x/term v0.22.0
golang.org/x/sync v0.10.0
golang.org/x/sys v0.28.0
golang.org/x/term v0.27.0
golang.org/x/time v0.5.0
golang.org/x/tools v0.23.0
golang.zx2c4.com/wintun v0.0.0-20230126152724-0fa3db229ce2
@@ -386,7 +386,7 @@ require (
go.uber.org/multierr v1.11.0 // indirect
golang.org/x/exp/typeparams v0.0.0-20240314144324-c7f7c6466f7f // indirect
golang.org/x/image v0.18.0 // indirect
golang.org/x/text v0.16.0 // indirect
golang.org/x/text v0.21.0 // indirect
gomodules.xyz/jsonpatch/v2 v2.4.0 // indirect
google.golang.org/appengine v1.6.8 // indirect
google.golang.org/protobuf v1.33.0 // indirect
@@ -396,7 +396,7 @@ require (
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1
howett.net/plist v1.0.0 // indirect
k8s.io/apiextensions-apiserver v0.30.3 // indirect
k8s.io/apiextensions-apiserver v0.30.3
k8s.io/klog/v2 v2.130.1 // indirect
k8s.io/kube-openapi v0.0.0-20240228011516-70dd3763d340 // indirect
k8s.io/utils v0.0.0-20240711033017-18e509b52bc8

24
go.sum
View File

@@ -1062,8 +1062,8 @@ golang.org/x/crypto v0.1.0/go.mod h1:RecgLatLF4+eUMCP1PoPZQb+cVrJcOPbHkTkbkB9sbw
golang.org/x/crypto v0.3.0/go.mod h1:hebNnKkNXi2UzZN1eVRvBB7co0a+JxK6XbPiWVs/3J4=
golang.org/x/crypto v0.3.1-0.20221117191849-2c476679df9a/go.mod h1:hebNnKkNXi2UzZN1eVRvBB7co0a+JxK6XbPiWVs/3J4=
golang.org/x/crypto v0.7.0/go.mod h1:pYwdfH91IfpZVANVyUOhSIPZaFoJGxTFbZhFTx+dXZU=
golang.org/x/crypto v0.25.0 h1:ypSNr+bnYL2YhwoMt2zPxHFmbAN1KZs/njMG3hxUp30=
golang.org/x/crypto v0.25.0/go.mod h1:T+wALwcMOSE0kXgUAnPAHqTLW+XHgcELELW8VaDgm/M=
golang.org/x/crypto v0.30.0 h1:RwoQn3GkWiMkzlX562cLB7OxWvjH1L8xutO2WoJcRoY=
golang.org/x/crypto v0.30.0/go.mod h1:kDsLvtWBEx7MV9tJOj9bnXsPbxwJQ6csT/x4KIN4Ssk=
golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
golang.org/x/exp v0.0.0-20190306152737-a1d7652674e8/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
golang.org/x/exp v0.0.0-20190510132918-efd6b22b2522/go.mod h1:ZjyILWgesfNpC6sMxTJOJm9Kp84zZh5NQWvqDGG3Qr8=
@@ -1153,8 +1153,8 @@ golang.org/x/net v0.2.0/go.mod h1:KqCZLdyyvdV855qA2rE3GC2aiw5xGR5TEjj8smXukLY=
golang.org/x/net v0.5.0/go.mod h1:DivGGAXEgPSlEBzxGzZI+ZLohi+xUj054jfeKui00ws=
golang.org/x/net v0.6.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
golang.org/x/net v0.8.0/go.mod h1:QVkue5JL9kW//ek3r6jTKnTFis1tRmNAW2P1shuFdJc=
golang.org/x/net v0.27.0 h1:5K3Njcw06/l2y9vpGCSdcxWOYHOUk3dVNGDXN+FvAys=
golang.org/x/net v0.27.0/go.mod h1:dDi0PyhWNoiUOrAS8uXv/vnScO4wnHQO4mj9fn/RytE=
golang.org/x/net v0.32.0 h1:ZqPmj8Kzc+Y6e0+skZsuACbx+wzMgo5MQsJh9Qd6aYI=
golang.org/x/net v0.32.0/go.mod h1:CwU0IoeOlnQQWJ6ioyFrfRuomB8GKF6KbYXZVyeXNfs=
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
golang.org/x/oauth2 v0.0.0-20190226205417-e64efc72b421/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
@@ -1176,8 +1176,8 @@ golang.org/x/sync v0.0.0-20201207232520-09787c993a3a/go.mod h1:RxMgew5VJxzue5/jJ
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.9.0 h1:fEo0HyrW1GIgZdpbhCRO0PkJajUS5H9IFUztCgEo2jQ=
golang.org/x/sync v0.9.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.10.0 h1:3NQrjDixjgGwUOCaF8w2+VYHv0Ve/vGYSbdkTa98gmQ=
golang.org/x/sync v0.10.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20180905080454-ebe1bf3edb33/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20181116152217-5ac8a444bdc5/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
@@ -1239,8 +1239,8 @@ golang.org/x/sys v0.4.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.4.1-0.20230131160137-e7d7f63158de/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.27.0 h1:wBqf8DvsY9Y/2P8gAfPDEYNuS30J4lPHJxXSb/nJZ+s=
golang.org/x/sys v0.27.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.28.0 h1:Fksou7UEQUWlKvIdsqzJmUmCX3cZuD2+P3XyyzwMhlA=
golang.org/x/sys v0.28.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
golang.org/x/term v0.1.0/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
@@ -1248,8 +1248,8 @@ golang.org/x/term v0.2.0/go.mod h1:TVmDHMZPmdnySmBfhjOoOdhjzdE1h4u1VwSiw2l1Nuc=
golang.org/x/term v0.4.0/go.mod h1:9P2UbLfCdcvo3p/nzKvsmas4TnlujnuoV9hGgYzW1lQ=
golang.org/x/term v0.5.0/go.mod h1:jMB1sMXY+tzblOD4FWmEbocvup2/aLOaQEp7JmGp78k=
golang.org/x/term v0.6.0/go.mod h1:m6U89DPEgQRMq3DNkDClhWw02AUbt2daBVO4cn4Hv9U=
golang.org/x/term v0.22.0 h1:BbsgPEJULsl2fV/AT3v15Mjva5yXKQDyKf+TbDz7QJk=
golang.org/x/term v0.22.0/go.mod h1:F3qCibpT5AMpCRfhfT53vVJwhLtIVHhB9XDjfFvnMI4=
golang.org/x/term v0.27.0 h1:WP60Sv1nlK1T6SupCHbXzSaN0b9wUmsPoRS9b61A23Q=
golang.org/x/term v0.27.0/go.mod h1:iMsnZpn0cago0GOrHO2+Y7u7JPn5AylBrcoWkElMTSM=
golang.org/x/text v0.0.0-20170915032832-14c0d48ead0c/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.1-0.20180807135948-17ff2d5776d2/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
@@ -1262,8 +1262,8 @@ golang.org/x/text v0.4.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
golang.org/x/text v0.6.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
golang.org/x/text v0.7.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
golang.org/x/text v0.8.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8=
golang.org/x/text v0.16.0 h1:a94ExnEXNtEwYLGJSIUxnWoxoRz/ZcCsV63ROupILh4=
golang.org/x/text v0.16.0/go.mod h1:GhwF1Be+LQoKShO3cGOHzqOgRrGaYc9AvblQOmPVHnI=
golang.org/x/text v0.21.0 h1:zyQAAkrwaneQ066sspRyJaG9VNi/YJ1NfzcGB3hZ/qo=
golang.org/x/text v0.21.0/go.mod h1:4IBbMaMmOPCJ8SecivzSH54+73PCFmPWxNTLm+vZkEQ=
golang.org/x/time v0.0.0-20181108054448-85acf8d2951c/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
golang.org/x/time v0.0.0-20190308202827-9d24e82272b4/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
golang.org/x/time v0.0.0-20191024005414-555d28b269f0/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=

View File

@@ -331,7 +331,7 @@ func (t *Tracker) SetMetricsRegistry(reg *usermetric.Registry) {
)
t.metricHealthMessage.Set(metricHealthMessageLabel{
Type: "warning",
Type: MetricLabelWarning,
}, expvar.Func(func() any {
if t.nil() {
return 0
@@ -1283,6 +1283,8 @@ func (t *Tracker) LastNoiseDialWasRecent() bool {
return dur < 2*time.Minute
}
const MetricLabelWarning = "warning"
type metricHealthMessageLabel struct {
// TODO: break down by warnable.severity as well?
Type string

View File

@@ -7,11 +7,14 @@ import (
"fmt"
"reflect"
"slices"
"strconv"
"testing"
"time"
"tailscale.com/tailcfg"
"tailscale.com/types/opt"
"tailscale.com/util/usermetric"
"tailscale.com/version"
)
func TestAppendWarnableDebugFlags(t *testing.T) {
@@ -273,7 +276,7 @@ func TestShowUpdateWarnable(t *testing.T) {
wantShow bool
}{
{
desc: "nil CientVersion",
desc: "nil ClientVersion",
check: true,
cv: nil,
wantWarnable: nil,
@@ -348,3 +351,52 @@ func TestShowUpdateWarnable(t *testing.T) {
})
}
}
func TestHealthMetric(t *testing.T) {
unstableBuildWarning := 0
if version.IsUnstableBuild() {
unstableBuildWarning = 1
}
tests := []struct {
desc string
check bool
apply opt.Bool
cv *tailcfg.ClientVersion
wantMetricCount int
}{
// When running in dev, and not initialising the client, there will be two warnings
// by default:
// - is-using-unstable-version (except on the release branch)
// - wantrunning-false
{
desc: "base-warnings",
check: true,
cv: nil,
wantMetricCount: unstableBuildWarning + 1,
},
// with: update-available
{
desc: "update-warning",
check: true,
cv: &tailcfg.ClientVersion{RunningLatest: false, LatestVersion: "1.2.3"},
wantMetricCount: unstableBuildWarning + 2,
},
}
for _, tt := range tests {
t.Run(tt.desc, func(t *testing.T) {
tr := &Tracker{
checkForUpdates: tt.check,
applyUpdates: tt.apply,
latestVersion: tt.cv,
}
tr.SetMetricsRegistry(&usermetric.Registry{})
if val := tr.metricHealthMessage.Get(metricHealthMessageLabel{Type: MetricLabelWarning}).String(); val != strconv.Itoa(tt.wantMetricCount) {
t.Fatalf("metric value: %q, want: %q", val, strconv.Itoa(tt.wantMetricCount))
}
for _, w := range tr.CurrentState().Warnings {
t.Logf("warning: %v", w)
}
})
}
}

View File

@@ -35,8 +35,12 @@ remotes/origin/QTSFW_5.0.0`
}
}
func TestInContainer(t *testing.T) {
if got := inContainer(); !got.EqualBool(false) {
t.Errorf("inContainer = %v; want false due to absence of ts_package_container build tag", got)
func TestPackageTypeNotContainer(t *testing.T) {
var got string
if packageType != nil {
got = packageType()
}
if got == "container" {
t.Fatal("packageType = container; should only happen if build tag ts_package_container is set")
}
}

View File

@@ -1,7 +1,7 @@
// Copyright (c) Tailscale Inc & AUTHORS
// SPDX-License-Identifier: BSD-3-Clause
//go:generate go run tailscale.com/cmd/viewer -type=Prefs,ServeConfig,TCPPortHandler,HTTPHandler,WebServerConfig
//go:generate go run tailscale.com/cmd/viewer -type=Prefs,ServeConfig,ServiceConfig,TCPPortHandler,HTTPHandler,WebServerConfig
// Package ipn implements the interactions between the Tailscale cloud
// control plane and the local network stack.

View File

@@ -105,6 +105,16 @@ func (src *ServeConfig) Clone() *ServeConfig {
}
}
}
if dst.Services != nil {
dst.Services = map[string]*ServiceConfig{}
for k, v := range src.Services {
if v == nil {
dst.Services[k] = nil
} else {
dst.Services[k] = v.Clone()
}
}
}
dst.AllowFunnel = maps.Clone(src.AllowFunnel)
if dst.Foreground != nil {
dst.Foreground = map[string]*ServeConfig{}
@@ -123,11 +133,50 @@ func (src *ServeConfig) Clone() *ServeConfig {
var _ServeConfigCloneNeedsRegeneration = ServeConfig(struct {
TCP map[uint16]*TCPPortHandler
Web map[HostPort]*WebServerConfig
Services map[string]*ServiceConfig
AllowFunnel map[HostPort]bool
Foreground map[string]*ServeConfig
ETag string
}{})
// Clone makes a deep copy of ServiceConfig.
// The result aliases no memory with the original.
func (src *ServiceConfig) Clone() *ServiceConfig {
if src == nil {
return nil
}
dst := new(ServiceConfig)
*dst = *src
if dst.TCP != nil {
dst.TCP = map[uint16]*TCPPortHandler{}
for k, v := range src.TCP {
if v == nil {
dst.TCP[k] = nil
} else {
dst.TCP[k] = ptr.To(*v)
}
}
}
if dst.Web != nil {
dst.Web = map[HostPort]*WebServerConfig{}
for k, v := range src.Web {
if v == nil {
dst.Web[k] = nil
} else {
dst.Web[k] = v.Clone()
}
}
}
return dst
}
// A compilation failure here means this code must be regenerated, with the command at the top of this file.
var _ServiceConfigCloneNeedsRegeneration = ServiceConfig(struct {
TCP map[uint16]*TCPPortHandler
Web map[HostPort]*WebServerConfig
Tun bool
}{})
// Clone makes a deep copy of TCPPortHandler.
// The result aliases no memory with the original.
func (src *TCPPortHandler) Clone() *TCPPortHandler {

View File

@@ -18,7 +18,7 @@ import (
"tailscale.com/types/views"
)
//go:generate go run tailscale.com/cmd/cloner -clonefunc=false -type=Prefs,ServeConfig,TCPPortHandler,HTTPHandler,WebServerConfig
//go:generate go run tailscale.com/cmd/cloner -clonefunc=false -type=Prefs,ServeConfig,ServiceConfig,TCPPortHandler,HTTPHandler,WebServerConfig
// View returns a readonly view of Prefs.
func (p *Prefs) View() PrefsView {
@@ -195,6 +195,12 @@ func (v ServeConfigView) Web() views.MapFn[HostPort, *WebServerConfig, WebServer
})
}
func (v ServeConfigView) Services() views.MapFn[string, *ServiceConfig, ServiceConfigView] {
return views.MapFnOf(v.ж.Services, func(t *ServiceConfig) ServiceConfigView {
return t.View()
})
}
func (v ServeConfigView) AllowFunnel() views.Map[HostPort, bool] {
return views.MapOf(v.ж.AllowFunnel)
}
@@ -210,11 +216,77 @@ func (v ServeConfigView) ETag() string { return v.ж.ETag }
var _ServeConfigViewNeedsRegeneration = ServeConfig(struct {
TCP map[uint16]*TCPPortHandler
Web map[HostPort]*WebServerConfig
Services map[string]*ServiceConfig
AllowFunnel map[HostPort]bool
Foreground map[string]*ServeConfig
ETag string
}{})
// View returns a readonly view of ServiceConfig.
func (p *ServiceConfig) View() ServiceConfigView {
return ServiceConfigView{ж: p}
}
// ServiceConfigView provides a read-only view over ServiceConfig.
//
// Its methods should only be called if `Valid()` returns true.
type ServiceConfigView struct {
// ж is the underlying mutable value, named with a hard-to-type
// character that looks pointy like a pointer.
// It is named distinctively to make you think of how dangerous it is to escape
// to callers. You must not let callers be able to mutate it.
ж *ServiceConfig
}
// Valid reports whether underlying value is non-nil.
func (v ServiceConfigView) Valid() bool { return v.ж != nil }
// AsStruct returns a clone of the underlying value which aliases no memory with
// the original.
func (v ServiceConfigView) AsStruct() *ServiceConfig {
if v.ж == nil {
return nil
}
return v.ж.Clone()
}
func (v ServiceConfigView) MarshalJSON() ([]byte, error) { return json.Marshal(v.ж) }
func (v *ServiceConfigView) UnmarshalJSON(b []byte) error {
if v.ж != nil {
return errors.New("already initialized")
}
if len(b) == 0 {
return nil
}
var x ServiceConfig
if err := json.Unmarshal(b, &x); err != nil {
return err
}
v.ж = &x
return nil
}
func (v ServiceConfigView) TCP() views.MapFn[uint16, *TCPPortHandler, TCPPortHandlerView] {
return views.MapFnOf(v.ж.TCP, func(t *TCPPortHandler) TCPPortHandlerView {
return t.View()
})
}
func (v ServiceConfigView) Web() views.MapFn[HostPort, *WebServerConfig, WebServerConfigView] {
return views.MapFnOf(v.ж.Web, func(t *WebServerConfig) WebServerConfigView {
return t.View()
})
}
func (v ServiceConfigView) Tun() bool { return v.ж.Tun }
// A compilation failure here means this code must be regenerated, with the command at the top of this file.
var _ServiceConfigViewNeedsRegeneration = ServiceConfig(struct {
TCP map[uint16]*TCPPortHandler
Web map[HostPort]*WebServerConfig
Tun bool
}{})
// View returns a readonly view of TCPPortHandler.
func (p *TCPPortHandler) View() TCPPortHandlerView {
return TCPPortHandlerView{ж: p}

View File

@@ -2920,6 +2920,12 @@ func (b *LocalBackend) DebugPickNewDERP() error {
return b.sys.MagicSock.Get().DebugPickNewDERP()
}
// DebugForcePreferDERP forwards to netcheck.DebugForcePreferDERP.
// See its docs.
func (b *LocalBackend) DebugForcePreferDERP(n int) {
b.sys.MagicSock.Get().DebugForcePreferDERP(n)
}
// send delivers n to the connected frontend and any API watchers from
// LocalBackend.WatchNotifications (via the LocalAPI).
//

View File

@@ -563,6 +563,7 @@ func (h *Handler) serveLogTap(w http.ResponseWriter, r *http.Request) {
}
func (h *Handler) serveMetrics(w http.ResponseWriter, r *http.Request) {
metricDebugMetricsCalls.Add(1)
// Require write access out of paranoia that the metrics
// might contain something sensitive.
if !h.PermitWrite {
@@ -576,6 +577,7 @@ func (h *Handler) serveMetrics(w http.ResponseWriter, r *http.Request) {
// serveUserMetrics returns user-facing metrics in Prometheus text
// exposition format.
func (h *Handler) serveUserMetrics(w http.ResponseWriter, r *http.Request) {
metricUserMetricsCalls.Add(1)
h.b.UserMetricsRegistry().Handler(w, r)
}
@@ -632,6 +634,13 @@ func (h *Handler) serveDebug(w http.ResponseWriter, r *http.Request) {
}
case "pick-new-derp":
err = h.b.DebugPickNewDERP()
case "force-prefer-derp":
var n int
err = json.NewDecoder(r.Body).Decode(&n)
if err != nil {
break
}
h.b.DebugForcePreferDERP(n)
case "":
err = fmt.Errorf("missing parameter 'action'")
default:
@@ -2972,7 +2981,9 @@ var (
metricInvalidRequests = clientmetric.NewCounter("localapi_invalid_requests")
// User-visible LocalAPI endpoints.
metricFilePutCalls = clientmetric.NewCounter("localapi_file_put")
metricFilePutCalls = clientmetric.NewCounter("localapi_file_put")
metricDebugMetricsCalls = clientmetric.NewCounter("localapi_debugmetric_requests")
metricUserMetricsCalls = clientmetric.NewCounter("localapi_usermetric_requests")
)
// serveSuggestExitNode serves a POST endpoint for returning a suggested exit node.

View File

@@ -24,6 +24,23 @@ func ServeConfigKey(profileID ProfileID) StateKey {
return StateKey("_serve/" + profileID)
}
// ServiceConfig contains the config information for a single service.
// it contains a bool to indicate if the service is in Tun mode (L3 forwarding).
// If the service is not in Tun mode, the service is configured by the L4 forwarding
// (TCP ports) and/or the L7 forwarding (http handlers) information.
type ServiceConfig struct {
// TCP are the list of TCP port numbers that tailscaled should handle for
// the Tailscale IP addresses. (not subnet routers, etc)
TCP map[uint16]*TCPPortHandler `json:",omitempty"`
// Web maps from "$SNI_NAME:$PORT" to a set of HTTP handlers
// keyed by mount point ("/", "/foo", etc)
Web map[HostPort]*WebServerConfig `json:",omitempty"`
// Tun determines if the service should be using L3 forwarding (Tun mode).
Tun bool `json:",omitempty"`
}
// ServeConfig is the JSON type stored in the StateStore for
// StateKey "_serve/$PROFILE_ID" as returned by ServeConfigKey.
type ServeConfig struct {
@@ -35,6 +52,10 @@ type ServeConfig struct {
// keyed by mount point ("/", "/foo", etc)
Web map[HostPort]*WebServerConfig `json:",omitempty"`
// Services maps from service name to a ServiceConfig. Which describes the
// L3, L4, and L7 forwarding information for the service.
Services map[string]*ServiceConfig `json:",omitempty"`
// AllowFunnel is the set of SNI:port values for which funnel
// traffic is allowed, from trusted ingress peers.
AllowFunnel map[HostPort]bool `json:",omitempty"`

View File

@@ -145,7 +145,7 @@ _Appears in:_
| `image` _string_ | Container image name. By default images are pulled from<br />docker.io/tailscale/tailscale, but the official images are also<br />available at ghcr.io/tailscale/tailscale. Specifying image name here<br />will override any proxy image values specified via the Kubernetes<br />operator's Helm chart values or PROXY_IMAGE env var in the operator<br />Deployment.<br />https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#image | | |
| `imagePullPolicy` _[PullPolicy](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.3/#pullpolicy-v1-core)_ | Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always.<br />https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#image | | Enum: [Always Never IfNotPresent] <br /> |
| `resources` _[ResourceRequirements](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.3/#resourcerequirements-v1-core)_ | Container resource requirements.<br />By default Tailscale Kubernetes operator does not apply any resource<br />requirements. The amount of resources required wil depend on the<br />amount of resources the operator needs to parse, usage patterns and<br />cluster size.<br />https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#resources | | |
| `securityContext` _[SecurityContext](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.3/#securitycontext-v1-core)_ | Container security context.<br />Security context specified here will override the security context by the operator.<br />By default the operator:<br />- sets 'privileged: true' for the init container<br />- set NET_ADMIN capability for tailscale container for proxies that<br />are created for Services or Connector.<br />https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context | | |
| `securityContext` _[SecurityContext](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.3/#securitycontext-v1-core)_ | Container security context.<br />Security context specified here will override the security context set by the operator.<br />By default the operator sets the Tailscale container and the Tailscale init container to privileged<br />for proxies created for Tailscale ingress and egress Service, Connector and ProxyGroup.<br />You can reduce the permissions of the Tailscale container to cap NET_ADMIN by<br />installing device plugin in your cluster and configuring the proxies tun device to be created<br />by the device plugin, see https://github.com/tailscale/tailscale/issues/10814#issuecomment-2479977752<br />https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context | | |
| `debug` _[Debug](#debug)_ | Configuration for enabling extra debug information in the container.<br />Not recommended for production use. | | |
@@ -326,7 +326,8 @@ _Appears in:_
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enable` _boolean_ | Setting enable to true will make the proxy serve Tailscale metrics<br />at <pod-ip>:9002/metrics.<br />In 1.78.x and 1.80.x, this field also serves as the default value for<br />.spec.statefulSet.pod.tailscaleContainer.debug.enable. From 1.82.0, both<br />fields will independently default to false.<br />Defaults to false. | | |
| `enable` _boolean_ | Setting enable to true will make the proxy serve Tailscale metrics<br />at <pod-ip>:9002/metrics.<br />A metrics Service named <proxy-statefulset>-metrics will also be created in the operator's namespace and will<br />serve the metrics at <service-ip>:9002/metrics.<br />In 1.78.x and 1.80.x, this field also serves as the default value for<br />.spec.statefulSet.pod.tailscaleContainer.debug.enable. From 1.82.0, both<br />fields will independently default to false.<br />Defaults to false. | | |
| `serviceMonitor` _[ServiceMonitor](#servicemonitor)_ | Enable to create a Prometheus ServiceMonitor for scraping the proxy's Tailscale metrics.<br />The ServiceMonitor will select the metrics Service that gets created when metrics are enabled.<br />The ingested metrics for each Service monitor will have labels to identify the proxy:<br />ts_proxy_type: ingress_service\|ingress_resource\|connector\|proxygroup<br />ts_proxy_parent_name: name of the parent resource (i.e name of the Connector, Tailscale Ingress, Tailscale Service or ProxyGroup)<br />ts_proxy_parent_namespace: namespace of the parent resource (if the parent resource is not cluster scoped)<br />job: ts_<proxy type>_[<parent namespace>]_<parent_name> | | |
#### Name
@@ -836,6 +837,22 @@ _Appears in:_
| `name` _string_ | The name of a Kubernetes Secret in the operator's namespace that contains<br />credentials for writing to the configured bucket. Each key-value pair<br />from the secret's data will be mounted as an environment variable. It<br />should include keys for AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY if<br />using a static access key. | | |
#### ServiceMonitor
_Appears in:_
- [Metrics](#metrics)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enable` _boolean_ | If Enable is set to true, a Prometheus ServiceMonitor will be created. Enable can only be set to true if metrics are enabled. | | |
#### StatefulSet

View File

@@ -10,6 +10,7 @@ import (
"tailscale.com/k8s-operator/apis"
apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/runtime/schema"
@@ -39,12 +40,18 @@ func init() {
localSchemeBuilder.Register(addKnownTypes)
GlobalScheme = runtime.NewScheme()
// Add core types
if err := scheme.AddToScheme(GlobalScheme); err != nil {
panic(fmt.Sprintf("failed to add k8s.io scheme: %s", err))
}
// Add tailscale.com types
if err := AddToScheme(GlobalScheme); err != nil {
panic(fmt.Sprintf("failed to add tailscale.com scheme: %s", err))
}
// Add apiextensions types (CustomResourceDefinitions/CustomResourceDefinitionLists)
if err := apiextensionsv1.AddToScheme(GlobalScheme); err != nil {
panic(fmt.Sprintf("failed to add apiextensions.k8s.io scheme: %s", err))
}
}
// Adds the list of known types to api.Scheme.

View File

@@ -161,9 +161,12 @@ type Pod struct {
TopologySpreadConstraints []corev1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
}
// +kubebuilder:validation:XValidation:rule="!(has(self.serviceMonitor) && self.serviceMonitor.enable && !self.enable)",message="ServiceMonitor can only be enabled if metrics are enabled"
type Metrics struct {
// Setting enable to true will make the proxy serve Tailscale metrics
// at <pod-ip>:9002/metrics.
// A metrics Service named <proxy-statefulset>-metrics will also be created in the operator's namespace and will
// serve the metrics at <service-ip>:9002/metrics.
//
// In 1.78.x and 1.80.x, this field also serves as the default value for
// .spec.statefulSet.pod.tailscaleContainer.debug.enable. From 1.82.0, both
@@ -171,6 +174,20 @@ type Metrics struct {
//
// Defaults to false.
Enable bool `json:"enable"`
// Enable to create a Prometheus ServiceMonitor for scraping the proxy's Tailscale metrics.
// The ServiceMonitor will select the metrics Service that gets created when metrics are enabled.
// The ingested metrics for each Service monitor will have labels to identify the proxy:
// ts_proxy_type: ingress_service|ingress_resource|connector|proxygroup
// ts_proxy_parent_name: name of the parent resource (i.e name of the Connector, Tailscale Ingress, Tailscale Service or ProxyGroup)
// ts_proxy_parent_namespace: namespace of the parent resource (if the parent resource is not cluster scoped)
// job: ts_<proxy type>_[<parent namespace>]_<parent_name>
// +optional
ServiceMonitor *ServiceMonitor `json:"serviceMonitor"`
}
type ServiceMonitor struct {
// If Enable is set to true, a Prometheus ServiceMonitor will be created. Enable can only be set to true if metrics are enabled.
Enable bool `json:"enable"`
}
type Container struct {
@@ -206,11 +223,12 @@ type Container struct {
// +optional
Resources corev1.ResourceRequirements `json:"resources,omitempty"`
// Container security context.
// Security context specified here will override the security context by the operator.
// By default the operator:
// - sets 'privileged: true' for the init container
// - set NET_ADMIN capability for tailscale container for proxies that
// are created for Services or Connector.
// Security context specified here will override the security context set by the operator.
// By default the operator sets the Tailscale container and the Tailscale init container to privileged
// for proxies created for Tailscale ingress and egress Service, Connector and ProxyGroup.
// You can reduce the permissions of the Tailscale container to cap NET_ADMIN by
// installing device plugin in your cluster and configuring the proxies tun device to be created
// by the device plugin, see https://github.com/tailscale/tailscale/issues/10814#issuecomment-2479977752
// https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context
// +optional
SecurityContext *corev1.SecurityContext `json:"securityContext,omitempty"`

View File

@@ -319,6 +319,11 @@ func (in *Env) DeepCopy() *Env {
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *Metrics) DeepCopyInto(out *Metrics) {
*out = *in
if in.ServiceMonitor != nil {
in, out := &in.ServiceMonitor, &out.ServiceMonitor
*out = new(ServiceMonitor)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new Metrics.
@@ -526,7 +531,7 @@ func (in *ProxyClassSpec) DeepCopyInto(out *ProxyClassSpec) {
if in.Metrics != nil {
in, out := &in.Metrics, &out.Metrics
*out = new(Metrics)
**out = **in
(*in).DeepCopyInto(*out)
}
if in.TailscaleConfig != nil {
in, out := &in.TailscaleConfig, &out.TailscaleConfig
@@ -991,6 +996,21 @@ func (in *S3Secret) DeepCopy() *S3Secret {
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *ServiceMonitor) DeepCopyInto(out *ServiceMonitor) {
*out = *in
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ServiceMonitor.
func (in *ServiceMonitor) DeepCopy() *ServiceMonitor {
if in == nil {
return nil
}
out := new(ServiceMonitor)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *StatefulSet) DeepCopyInto(out *StatefulSet) {
*out = *in

View File

@@ -167,3 +167,14 @@ func DNSCfgIsReady(cfg *tsapi.DNSConfig) bool {
cond := cfg.Status.Conditions[idx]
return cond.Status == metav1.ConditionTrue && cond.ObservedGeneration == cfg.Generation
}
func SvcIsReady(svc *corev1.Service) bool {
idx := xslices.IndexFunc(svc.Status.Conditions, func(cond metav1.Condition) bool {
return cond.Type == string(tsapi.ProxyReady)
})
if idx == -1 {
return false
}
cond := svc.Status.Conditions[idx]
return cond.Status == metav1.ConditionTrue
}

View File

@@ -27,4 +27,19 @@ const (
MetricEgressServiceCount = "k8s_egress_service_resources"
MetricProxyGroupEgressCount = "k8s_proxygroup_egress_resources"
MetricProxyGroupIngressCount = "k8s_proxygroup_ingress_resources"
// Keys that containerboot writes to state file that can be used to determine its state.
// fields set in Tailscale state Secret. These are mostly used by the Tailscale Kubernetes operator to determine
// the state of this tailscale device.
KeyDeviceID string = "device_id" // node stable ID of the device
KeyDeviceFQDN string = "device_fqdn" // device's tailnet hostname
KeyDeviceIPs string = "device_ips" // device's tailnet IPs
KeyPodUID string = "pod_uid" // Pod UID
// KeyCapVer contains Tailscale capability version of this proxy instance.
KeyCapVer string = "tailscale_capver"
// KeyHTTPSEndpoint is a name of a field that can be set to the value of any HTTPS endpoint currently exposed by
// this device to the tailnet. This is used by the Kubernetes operator Ingress proxy to communicate to the operator
// that cluster workloads behind the Ingress can now be accessed via the given DNS name over HTTPS.
KeyHTTPSEndpoint string = "https_endpoint"
ValueNoHTTPS string = "no-https"
)

View File

@@ -9,6 +9,7 @@ package logpolicy
import (
"bufio"
"bytes"
"cmp"
"context"
"crypto/tls"
"encoding/json"
@@ -449,25 +450,63 @@ func tryFixLogStateLocation(dir, cmdname string, logf logger.Logf) {
}
}
// New returns a new log policy (a logger and its instance ID) for a given
// collection name.
//
// The netMon parameter is optional. It should be specified in environments where
// Tailscaled is manipulating the routing table.
//
// The logf parameter is optional; if non-nil, information logs (e.g. when
// migrating state) are sent to that logger, and global changes to the log
// package are avoided. If nil, logs will be printed using log.Printf.
// Deprecated: Use [Options.New] instead.
func New(collection string, netMon *netmon.Monitor, health *health.Tracker, logf logger.Logf) *Policy {
return NewWithConfigPath(collection, "", "", netMon, health, logf)
return Options{
Collection: collection,
NetMon: netMon,
Health: health,
Logf: logf,
}.New()
}
// NewWithConfigPath is identical to New, but uses the specified directory and
// command name. If either is empty, it derives them automatically.
//
// The netMon parameter is optional. It should be specified in environments where
// Tailscaled is manipulating the routing table.
// Deprecated: Use [Options.New] instead.
func NewWithConfigPath(collection, dir, cmdName string, netMon *netmon.Monitor, health *health.Tracker, logf logger.Logf) *Policy {
return Options{
Collection: collection,
Dir: dir,
CmdName: cmdName,
NetMon: netMon,
Health: health,
Logf: logf,
}.New()
}
// Options is used to construct a [Policy].
type Options struct {
// Collection is a required collection to upload logs under.
// Collection is a namespace for the type logs.
// For example, logs for a node use "tailnode.log.tailscale.io".
Collection string
// Dir is an optional directory to store the log configuration.
// If empty, [LogsDir] is used.
Dir string
// CmdName is an optional name of the current binary.
// If empty, [version.CmdName] is used.
CmdName string
// NetMon is an optional parameter for monitoring.
// If non-nil, it's used to do faster interface lookups.
NetMon *netmon.Monitor
// Health is an optional parameter for health status.
// If non-nil, it's used to construct the default HTTP client.
Health *health.Tracker
// Logf is an optional logger to use.
// If nil, [log.Printf] will be used instead.
Logf logger.Logf
// HTTPC is an optional client to use upload logs.
// If nil, [TransportOptions.New] is used to construct a new client
// with that particular transport sending logs to the default logs server.
HTTPC *http.Client
}
// New returns a new log policy (a logger and its instance ID).
func (opts Options) New() *Policy {
if hostinfo.IsNATLabGuestVM() {
// In NATLab Gokrazy instances, tailscaled comes up concurently with
// DHCP and the doesn't have DNS for a while. Wait for DHCP first.
@@ -495,23 +534,23 @@ func NewWithConfigPath(collection, dir, cmdName string, netMon *netmon.Monitor,
earlyErrBuf.WriteByte('\n')
}
if dir == "" {
dir = LogsDir(earlyLogf)
if opts.Dir == "" {
opts.Dir = LogsDir(earlyLogf)
}
if cmdName == "" {
cmdName = version.CmdName()
if opts.CmdName == "" {
opts.CmdName = version.CmdName()
}
useStdLogger := logf == nil
useStdLogger := opts.Logf == nil
if useStdLogger {
logf = log.Printf
opts.Logf = log.Printf
}
tryFixLogStateLocation(dir, cmdName, logf)
tryFixLogStateLocation(opts.Dir, opts.CmdName, opts.Logf)
cfgPath := filepath.Join(dir, fmt.Sprintf("%s.log.conf", cmdName))
cfgPath := filepath.Join(opts.Dir, fmt.Sprintf("%s.log.conf", opts.CmdName))
if runtime.GOOS == "windows" {
switch cmdName {
switch opts.CmdName {
case "tailscaled":
// Tailscale 1.14 and before stored state under %LocalAppData%
// (usually "C:\WINDOWS\system32\config\systemprofile\AppData\Local"
@@ -542,7 +581,7 @@ func NewWithConfigPath(collection, dir, cmdName string, netMon *netmon.Monitor,
cfgPath = paths.TryConfigFileMigration(earlyLogf, oldPath, cfgPath)
case "tailscale-ipn":
for _, oldBase := range []string{"wg64.log.conf", "wg32.log.conf"} {
oldConf := filepath.Join(dir, oldBase)
oldConf := filepath.Join(opts.Dir, oldBase)
if fi, err := os.Stat(oldConf); err == nil && fi.Mode().IsRegular() {
cfgPath = paths.TryConfigFileMigration(earlyLogf, oldConf, cfgPath)
break
@@ -555,9 +594,9 @@ func NewWithConfigPath(collection, dir, cmdName string, netMon *netmon.Monitor,
if err != nil {
earlyLogf("logpolicy.ConfigFromFile %v: %v", cfgPath, err)
}
if err := newc.Validate(collection); err != nil {
if err := newc.Validate(opts.Collection); err != nil {
earlyLogf("logpolicy.Config.Validate for %v: %v", cfgPath, err)
newc = NewConfig(collection)
newc = NewConfig(opts.Collection)
if err := newc.Save(cfgPath); err != nil {
earlyLogf("logpolicy.Config.Save for %v: %v", cfgPath, err)
}
@@ -568,31 +607,39 @@ func NewWithConfigPath(collection, dir, cmdName string, netMon *netmon.Monitor,
PrivateID: newc.PrivateID,
Stderr: logWriter{console},
CompressLogs: true,
HTTPC: &http.Client{Transport: NewLogtailTransport(logtail.DefaultHost, netMon, health, logf)},
}
if collection == logtail.CollectionNode {
if opts.Collection == logtail.CollectionNode {
conf.MetricsDelta = clientmetric.EncodeLogTailMetricsDelta
conf.IncludeProcID = true
conf.IncludeProcSequence = true
}
if envknob.NoLogsNoSupport() || testenv.InTest() {
logf("You have disabled logging. Tailscale will not be able to provide support.")
opts.Logf("You have disabled logging. Tailscale will not be able to provide support.")
conf.HTTPC = &http.Client{Transport: noopPretendSuccessTransport{}}
} else {
// Only attach an on-disk filch buffer if we are going to be sending logs.
// No reason to persist them locally just to drop them later.
attachFilchBuffer(&conf, dir, cmdName, logf)
attachFilchBuffer(&conf, opts.Dir, opts.CmdName, opts.Logf)
conf.HTTPC = opts.HTTPC
if val := getLogTarget(); val != "" {
logf("You have enabled a non-default log target. Doing without being told to by Tailscale staff or your network administrator will make getting support difficult.")
conf.BaseURL = val
u, _ := url.Parse(val)
conf.HTTPC = &http.Client{Transport: NewLogtailTransport(u.Host, netMon, health, logf)}
if conf.HTTPC == nil {
logHost := logtail.DefaultHost
if val := getLogTarget(); val != "" {
opts.Logf("You have enabled a non-default log target. Doing without being told to by Tailscale staff or your network administrator will make getting support difficult.")
conf.BaseURL = val
u, _ := url.Parse(val)
logHost = u.Host
}
conf.HTTPC = &http.Client{Transport: TransportOptions{
Host: logHost,
NetMon: opts.NetMon,
Health: opts.Health,
Logf: opts.Logf,
}.New()}
}
}
lw := logtail.NewLogger(conf, logf)
lw := logtail.NewLogger(conf, opts.Logf)
var logOutput io.Writer = lw
@@ -610,19 +657,19 @@ func NewWithConfigPath(collection, dir, cmdName string, netMon *netmon.Monitor,
log.SetOutput(logOutput)
}
logf("Program starting: v%v, Go %v: %#v",
opts.Logf("Program starting: v%v, Go %v: %#v",
version.Long(),
goVersion(),
os.Args)
logf("LogID: %v", newc.PublicID)
opts.Logf("LogID: %v", newc.PublicID)
if earlyErrBuf.Len() != 0 {
logf("%s", earlyErrBuf.Bytes())
opts.Logf("%s", earlyErrBuf.Bytes())
}
return &Policy{
Logtail: lw,
PublicID: newc.PublicID,
Logf: logf,
Logf: opts.Logf,
}
}
@@ -763,23 +810,48 @@ func dialContext(ctx context.Context, netw, addr string, netMon *netmon.Monitor,
return c, err
}
// NewLogtailTransport returns an HTTP Transport particularly suited to uploading
// logs to the given host name. See DialContext for details on how it works.
//
// The netMon parameter is optional. It should be specified in environments where
// Tailscaled is manipulating the routing table.
//
// The logf parameter is optional; if non-nil, logs are printed using the
// provided function; if nil, log.Printf will be used instead.
// Deprecated: Use [TransportOptions.New] instead.
func NewLogtailTransport(host string, netMon *netmon.Monitor, health *health.Tracker, logf logger.Logf) http.RoundTripper {
return TransportOptions{Host: host, NetMon: netMon, Health: health, Logf: logf}.New()
}
// TransportOptions is used to construct an [http.RoundTripper].
type TransportOptions struct {
// Host is the optional hostname of the logs server.
// If empty, then [logtail.DefaultHost] is used.
Host string
// NetMon is an optional parameter for monitoring.
// If non-nil, it's used to do faster interface lookups.
NetMon *netmon.Monitor
// Health is an optional parameter for health status.
// If non-nil, it's used to construct the default HTTP client.
Health *health.Tracker
// Logf is an optional logger to use.
// If nil, [log.Printf] will be used instead.
Logf logger.Logf
// TLSClientConfig is an optional TLS configuration to use.
// If non-nil, the configuration will be cloned.
TLSClientConfig *tls.Config
}
// New returns an HTTP Transport particularly suited to uploading logs
// to the given host name. See [DialContext] for details on how it works.
func (opts TransportOptions) New() http.RoundTripper {
if testenv.InTest() {
return noopPretendSuccessTransport{}
}
if netMon == nil {
netMon = netmon.NewStatic()
if opts.NetMon == nil {
opts.NetMon = netmon.NewStatic()
}
// Start with a copy of http.DefaultTransport and tweak it a bit.
tr := http.DefaultTransport.(*http.Transport).Clone()
if opts.TLSClientConfig != nil {
tr.TLSClientConfig = opts.TLSClientConfig.Clone()
}
tr.Proxy = tshttpproxy.ProxyFromEnvironment
tshttpproxy.SetTransportGetProxyConnectHeader(tr)
@@ -790,10 +862,10 @@ func NewLogtailTransport(host string, netMon *netmon.Monitor, health *health.Tra
tr.DisableCompression = true
// Log whenever we dial:
if logf == nil {
logf = log.Printf
if opts.Logf == nil {
opts.Logf = log.Printf
}
tr.DialContext = MakeDialFunc(netMon, logf)
tr.DialContext = MakeDialFunc(opts.NetMon, opts.Logf)
// We're uploading logs ideally infrequently, with specific timing that will
// change over time. Try to keep the connection open, to avoid repeatedly
@@ -815,7 +887,8 @@ func NewLogtailTransport(host string, netMon *netmon.Monitor, health *health.Tra
tr.TLSNextProto = map[string]func(authority string, c *tls.Conn) http.RoundTripper{}
}
tr.TLSClientConfig = tlsdial.Config(host, health, tr.TLSClientConfig)
host := cmp.Or(opts.Host, logtail.DefaultHost)
tr.TLSClientConfig = tlsdial.Config(host, opts.Health, tr.TLSClientConfig)
// Force TLS 1.3 since we know log.tailscale.io supports it.
tr.TLSClientConfig.MinVersion = tls.VersionTLS13

View File

@@ -213,6 +213,7 @@ type Logger struct {
procSequence uint64
flushTimer tstime.TimerController // used when flushDelay is >0
writeBuf [bufferSize]byte // owned by Write for reuse
bytesBuf bytes.Buffer // owned by appendTextOrJSONLocked for reuse
jsonDec jsontext.Decoder // owned by appendTextOrJSONLocked for reuse
shutdownStartMu sync.Mutex // guards the closing of shutdownStart
@@ -725,9 +726,16 @@ func (l *Logger) appendTextOrJSONLocked(dst, src []byte, level int) []byte {
// whether it contains the reserved "logtail" name at the top-level.
var logtailKeyOffset, logtailValOffset, logtailValLength int
validJSON := func() bool {
// TODO(dsnet): Avoid allocation of bytes.Buffer struct.
// The jsontext.NewDecoder API operates on an io.Reader, for which
// bytes.Buffer provides a means to convert a []byte into an io.Reader.
// However, bytes.NewBuffer normally allocates unless
// we immediately shallow copy it into a pre-allocated Buffer struct.
// See https://go.dev/issue/67004.
l.bytesBuf = *bytes.NewBuffer(src)
defer func() { l.bytesBuf = bytes.Buffer{} }() // avoid pinning src
dec := &l.jsonDec
dec.Reset(bytes.NewBuffer(src))
dec.Reset(&l.bytesBuf)
if tok, err := dec.ReadToken(); tok.Kind() != '{' || err != nil {
return false
}

View File

@@ -236,6 +236,10 @@ type Client struct {
// If false, the default net.Resolver will be used, with no caching.
UseDNSCache bool
// if non-zero, force this DERP region to be preferred in all reports where
// the DERP is found to be reachable.
ForcePreferredDERP int
// For tests
testEnoughRegions int
testCaptivePortalDelay time.Duration
@@ -780,6 +784,12 @@ func (o *GetReportOpts) getLastDERPActivity(region int) time.Time {
return o.GetLastDERPActivity(region)
}
func (c *Client) SetForcePreferredDERP(region int) {
c.mu.Lock()
defer c.mu.Unlock()
c.ForcePreferredDERP = region
}
// GetReport gets a report. The 'opts' argument is optional and can be nil.
// Callers are discouraged from passing a ctx with an arbitrary deadline as this
// may cause GetReport to return prematurely before all reporting methods have
@@ -1221,17 +1231,19 @@ func (c *Client) measureICMPLatency(ctx context.Context, reg *tailcfg.DERPRegion
// Try pinging the first node in the region
node := reg.Nodes[0]
// Get the IPAddr by asking for the UDP address that we would use for
// STUN and then using that IP.
//
// TODO(andrew-d): this is a bit ugly
nodeAddr := c.nodeAddr(ctx, node, probeIPv4)
if !nodeAddr.IsValid() {
if node.STUNPort < 0 {
// If STUN is disabled on a node, interpret that as meaning don't measure latency.
return 0, false, nil
}
const unusedPort = 0
stunAddrPort, ok := c.nodeAddrPort(ctx, node, unusedPort, probeIPv4)
if !ok {
return 0, false, fmt.Errorf("no address for node %v (v4-for-icmp)", node.Name)
}
ip := stunAddrPort.Addr()
addr := &net.IPAddr{
IP: net.IP(nodeAddr.Addr().AsSlice()),
Zone: nodeAddr.Addr().Zone(),
IP: net.IP(ip.AsSlice()),
Zone: ip.Zone(),
}
// Use the unique node.Name field as the packet data to reduce the
@@ -1275,6 +1287,9 @@ func (c *Client) logConciseReport(r *Report, dm *tailcfg.DERPMap) {
if r.CaptivePortal != "" {
fmt.Fprintf(w, " captiveportal=%v", r.CaptivePortal)
}
if c.ForcePreferredDERP != 0 {
fmt.Fprintf(w, " force=%v", c.ForcePreferredDERP)
}
fmt.Fprintf(w, " derp=%v", r.PreferredDERP)
if r.PreferredDERP != 0 {
fmt.Fprintf(w, " derpdist=")
@@ -1433,6 +1448,21 @@ func (c *Client) addReportHistoryAndSetPreferredDERP(rs *reportState, r *Report,
// which undoes any region change we made above.
r.PreferredDERP = prevDERP
}
if c.ForcePreferredDERP != 0 {
// If the forced DERP region probed successfully, or has recent traffic,
// use it.
_, haveLatencySample := r.RegionLatency[c.ForcePreferredDERP]
var recentActivity bool
if lastHeard := rs.opts.getLastDERPActivity(c.ForcePreferredDERP); !lastHeard.IsZero() {
now := c.timeNow()
recentActivity = lastHeard.After(rs.start)
recentActivity = recentActivity || lastHeard.After(now.Add(-PreferredDERPFrameTime))
}
if haveLatencySample || recentActivity {
r.PreferredDERP = c.ForcePreferredDERP
}
}
}
func updateLatency(m map[int]time.Duration, regionID int, d time.Duration) {
@@ -1478,8 +1508,8 @@ func (rs *reportState) runProbe(ctx context.Context, dm *tailcfg.DERPMap, probe
return
}
addr := c.nodeAddr(ctx, node, probe.proto)
if !addr.IsValid() {
addr, ok := c.nodeAddrPort(ctx, node, node.STUNPort, probe.proto)
if !ok {
c.logf("netcheck.runProbe: named node %q has no %v address", probe.node, probe.proto)
return
}
@@ -1528,12 +1558,20 @@ func (rs *reportState) runProbe(ctx context.Context, dm *tailcfg.DERPMap, probe
c.vlogf("sent to %v", addr)
}
// proto is 4 or 6
// If it returns nil, the node is skipped.
func (c *Client) nodeAddr(ctx context.Context, n *tailcfg.DERPNode, proto probeProto) (ap netip.AddrPort) {
port := cmp.Or(n.STUNPort, 3478)
// nodeAddrPort returns the IP:port to send a STUN queries to for a given node.
//
// The provided port should be n.STUNPort, which may be negative to disable STUN.
// If STUN is disabled for this node, it returns ok=false.
// The port parameter is separate for the ICMP caller to provide a fake value.
//
// proto is [probeIPv4] or [probeIPv6].
func (c *Client) nodeAddrPort(ctx context.Context, n *tailcfg.DERPNode, port int, proto probeProto) (_ netip.AddrPort, ok bool) {
var zero netip.AddrPort
if port < 0 || port > 1<<16-1 {
return
return zero, false
}
if port == 0 {
port = 3478
}
if n.STUNTestIP != "" {
ip, err := netip.ParseAddr(n.STUNTestIP)
@@ -1546,7 +1584,7 @@ func (c *Client) nodeAddr(ctx context.Context, n *tailcfg.DERPNode, proto probeP
if proto == probeIPv6 && ip.Is4() {
return
}
return netip.AddrPortFrom(ip, uint16(port))
return netip.AddrPortFrom(ip, uint16(port)), true
}
switch proto {
@@ -1554,20 +1592,20 @@ func (c *Client) nodeAddr(ctx context.Context, n *tailcfg.DERPNode, proto probeP
if n.IPv4 != "" {
ip, _ := netip.ParseAddr(n.IPv4)
if !ip.Is4() {
return
return zero, false
}
return netip.AddrPortFrom(ip, uint16(port))
return netip.AddrPortFrom(ip, uint16(port)), true
}
case probeIPv6:
if n.IPv6 != "" {
ip, _ := netip.ParseAddr(n.IPv6)
if !ip.Is6() {
return
return zero, false
}
return netip.AddrPortFrom(ip, uint16(port))
return netip.AddrPortFrom(ip, uint16(port)), true
}
default:
return
return zero, false
}
// The default lookup function if we don't set UseDNSCache is to use net.DefaultResolver.
@@ -1609,13 +1647,13 @@ func (c *Client) nodeAddr(ctx context.Context, n *tailcfg.DERPNode, proto probeP
addrs, err := lookupIPAddr(ctx, n.HostName)
for _, a := range addrs {
if (a.Is4() && probeIsV4) || (a.Is6() && !probeIsV4) {
return netip.AddrPortFrom(a, uint16(port))
return netip.AddrPortFrom(a, uint16(port)), true
}
}
if err != nil {
c.logf("netcheck: DNS lookup error for %q (node %q region %v): %v", n.HostName, n.Name, n.RegionID, err)
}
return
return zero, false
}
func regionHasDERPNode(r *tailcfg.DERPRegion) bool {

View File

@@ -201,6 +201,7 @@ func TestAddReportHistoryAndSetPreferredDERP(t *testing.T) {
steps []step
homeParams *tailcfg.DERPHomeParams
opts *GetReportOpts
forcedDERP int // if non-zero, force this DERP to be the preferred one
wantDERP int // want PreferredDERP on final step
wantPrevLen int // wanted len(c.prev)
}{
@@ -366,12 +367,65 @@ func TestAddReportHistoryAndSetPreferredDERP(t *testing.T) {
wantPrevLen: 2,
wantDERP: 1, // diff is 11ms, but d2 is greater than 2/3s of d1
},
{
name: "forced_two",
steps: []step{
{time.Second, report("d1", 2, "d2", 3)},
{2 * time.Second, report("d1", 4, "d2", 3)},
},
forcedDERP: 2,
wantPrevLen: 2,
wantDERP: 2,
},
{
name: "forced_two_unavailable",
steps: []step{
{time.Second, report("d1", 2, "d2", 1)},
{2 * time.Second, report("d1", 4)},
},
forcedDERP: 2,
wantPrevLen: 2,
wantDERP: 1,
},
{
name: "forced_two_no_probe_recent_activity",
steps: []step{
{time.Second, report("d1", 2)},
{2 * time.Second, report("d1", 4)},
},
opts: &GetReportOpts{
GetLastDERPActivity: mkLDAFunc(map[int]time.Time{
1: startTime,
2: startTime.Add(time.Second),
}),
},
forcedDERP: 2,
wantPrevLen: 2,
wantDERP: 2,
},
{
name: "forced_two_no_probe_no_recent_activity",
steps: []step{
{time.Second, report("d1", 2)},
{PreferredDERPFrameTime + time.Second, report("d1", 4)},
},
opts: &GetReportOpts{
GetLastDERPActivity: mkLDAFunc(map[int]time.Time{
1: startTime,
2: startTime,
}),
},
forcedDERP: 2,
wantPrevLen: 2,
wantDERP: 1,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
fakeTime := startTime
c := &Client{
TimeNow: func() time.Time { return fakeTime },
TimeNow: func() time.Time { return fakeTime },
ForcePreferredDERP: tt.forcedDERP,
}
dm := &tailcfg.DERPMap{HomeParams: tt.homeParams}
rs := &reportState{
@@ -887,8 +941,8 @@ func TestNodeAddrResolve(t *testing.T) {
c.UseDNSCache = tt
t.Run("IPv4", func(t *testing.T) {
ap := c.nodeAddr(ctx, dn, probeIPv4)
if !ap.IsValid() {
ap, ok := c.nodeAddrPort(ctx, dn, dn.STUNPort, probeIPv4)
if !ok {
t.Fatal("expected valid AddrPort")
}
if !ap.Addr().Is4() {
@@ -902,8 +956,8 @@ func TestNodeAddrResolve(t *testing.T) {
t.Skipf("IPv6 may not work on this machine")
}
ap := c.nodeAddr(ctx, dn, probeIPv6)
if !ap.IsValid() {
ap, ok := c.nodeAddrPort(ctx, dn, dn.STUNPort, probeIPv6)
if !ok {
t.Fatal("expected valid AddrPort")
}
if !ap.Addr().Is6() {
@@ -912,8 +966,8 @@ func TestNodeAddrResolve(t *testing.T) {
t.Logf("got IPv6 addr: %v", ap)
})
t.Run("IPv6 Failure", func(t *testing.T) {
ap := c.nodeAddr(ctx, dnV4Only, probeIPv6)
if ap.IsValid() {
ap, ok := c.nodeAddrPort(ctx, dnV4Only, dn.STUNPort, probeIPv6)
if ok {
t.Fatalf("expected no addr but got: %v", ap)
}
t.Logf("correctly got invalid addr")

View File

@@ -876,9 +876,10 @@ func (t *Wrapper) filterPacketOutboundToWireGuard(p *packet.Parsed, pc *peerConf
if filt.RunOut(p, t.filterFlags) != filter.Accept {
metricPacketOutDropFilter.Add(1)
t.metrics.outboundDroppedPacketsTotal.Add(usermetric.DropLabels{
Reason: usermetric.ReasonACL,
}, 1)
// TODO(#14280): increment a t.metrics.outboundDroppedPacketsTotal here
// once we figure out & document what labels to use for multicast,
// link-local-unicast, IP fragments, etc. But they're not
// usermetric.ReasonACL.
return filter.Drop, gro
}

View File

@@ -453,7 +453,7 @@ func TestFilter(t *testing.T) {
assertMetricPackets(t, "inACL", 3, metricInboundDroppedPacketsACL)
assertMetricPackets(t, "inError", 0, metricInboundDroppedPacketsErr)
assertMetricPackets(t, "outACL", 1, metricOutboundDroppedPacketsACL)
assertMetricPackets(t, "outACL", 0, metricOutboundDroppedPacketsACL)
}
func assertMetricPackets(t *testing.T, metricName string, want, got int64) {

View File

@@ -597,18 +597,22 @@ func newConn(ctx context.Context, dm *tailcfg.DERPMap, n *tailcfg.DERPNode, isPr
if err != nil {
return nil, err
}
cs, ok := dc.TLSConnectionState()
if !ok {
dc.Close()
return nil, errors.New("no TLS state")
}
if len(cs.PeerCertificates) == 0 {
dc.Close()
return nil, errors.New("no peer certificates")
}
if cs.ServerName != n.HostName {
dc.Close()
return nil, fmt.Errorf("TLS server name %q != derp hostname %q", cs.ServerName, n.HostName)
// Only verify TLS state if this is a prober.
if isProber {
cs, ok := dc.TLSConnectionState()
if !ok {
dc.Close()
return nil, errors.New("no TLS state")
}
if len(cs.PeerCertificates) == 0 {
dc.Close()
return nil, errors.New("no peer certificates")
}
if cs.ServerName != n.HostName {
dc.Close()
return nil, fmt.Errorf("TLS server name %q != derp hostname %q", cs.ServerName, n.HostName)
}
}
errc := make(chan error, 1)

View File

@@ -10,7 +10,6 @@ import (
"bytes"
"context"
"crypto/rand"
"encoding/base64"
"encoding/json"
"errors"
"fmt"
@@ -45,7 +44,6 @@ import (
"tailscale.com/util/clientmetric"
"tailscale.com/util/httpm"
"tailscale.com/util/mak"
"tailscale.com/util/slicesx"
)
var (
@@ -80,16 +78,14 @@ type server struct {
logf logger.Logf
tailscaledPath string
pubKeyHTTPClient *http.Client // or nil for http.DefaultClient
timeNow func() time.Time // or nil for time.Now
timeNow func() time.Time // or nil for time.Now
sessionWaitGroup sync.WaitGroup
// mu protects the following
mu sync.Mutex
activeConns map[*conn]bool // set; value is always true
fetchPublicKeysCache map[string]pubKeyCacheEntry // by https URL
shutdownCalled bool
mu sync.Mutex
activeConns map[*conn]bool // set; value is always true
shutdownCalled bool
}
func (srv *server) now() time.Time {
@@ -204,7 +200,6 @@ func (srv *server) OnPolicyChange() {
//
// Do the user auth
// - NoClientAuthHandler
// - PublicKeyHandler (only if NoClientAuthHandler returns errPubKeyRequired)
//
// Once auth is done, the conn can be multiplexed with multiple sessions and
// channels concurrently. At which point any of the following can be called
@@ -234,10 +229,9 @@ type conn struct {
finalAction *tailcfg.SSHAction // set by doPolicyAuth or resolveNextAction
finalActionErr error // set by doPolicyAuth or resolveNextAction
info *sshConnInfo // set by setInfo
localUser *userMeta // set by doPolicyAuth
userGroupIDs []string // set by doPolicyAuth
pubKey gossh.PublicKey // set by doPolicyAuth
info *sshConnInfo // set by setInfo
localUser *userMeta // set by doPolicyAuth
userGroupIDs []string // set by doPolicyAuth
acceptEnv []string
// mu protects the following fields.
@@ -268,9 +262,6 @@ func (c *conn) isAuthorized(ctx ssh.Context) error {
action := c.currentAction
for {
if action.Accept {
if c.pubKey != nil {
metricPublicKeyAccepts.Add(1)
}
return nil
}
if action.Reject || action.HoldAndDelegate == "" {
@@ -293,10 +284,6 @@ func (c *conn) isAuthorized(ctx ssh.Context) error {
// policy.
var errDenied = errors.New("ssh: access denied")
// errPubKeyRequired is returned by NoClientAuthCallback to make the client
// resort to public-key auth; not user visible.
var errPubKeyRequired = errors.New("ssh publickey required")
// NoClientAuthCallback implements gossh.NoClientAuthCallback and is called by
// the ssh.Server when the client first connects with the "none"
// authentication method.
@@ -305,13 +292,12 @@ var errPubKeyRequired = errors.New("ssh publickey required")
// starting it afresh). It returns an error if the policy evaluation fails, or
// if the decision is "reject"
//
// It either returns nil (accept) or errPubKeyRequired or errDenied
// (reject). The errors may be wrapped.
// It either returns nil (accept) or errDenied (reject). The errors may be wrapped.
func (c *conn) NoClientAuthCallback(ctx ssh.Context) error {
if c.insecureSkipTailscaleAuth {
return nil
}
if err := c.doPolicyAuth(ctx, nil /* no pub key */); err != nil {
if err := c.doPolicyAuth(ctx); err != nil {
return err
}
if err := c.isAuthorized(ctx); err != nil {
@@ -332,8 +318,6 @@ func (c *conn) nextAuthMethodCallback(cm gossh.ConnMetadata, prevErrors []error)
switch {
case c.anyPasswordIsOkay:
nextMethod = append(nextMethod, "password")
case slicesx.LastEqual(prevErrors, errPubKeyRequired):
nextMethod = append(nextMethod, "publickey")
}
// The fake "tailscale" method is always appended to next so OpenSSH renders
@@ -353,41 +337,20 @@ func (c *conn) fakePasswordHandler(ctx ssh.Context, password string) bool {
return c.anyPasswordIsOkay
}
// PublicKeyHandler implements ssh.PublicKeyHandler is called by the
// ssh.Server when the client presents a public key.
func (c *conn) PublicKeyHandler(ctx ssh.Context, pubKey ssh.PublicKey) error {
if err := c.doPolicyAuth(ctx, pubKey); err != nil {
// TODO(maisem/bradfitz): surface the error here.
c.logf("rejecting SSH public key %s: %v", bytes.TrimSpace(gossh.MarshalAuthorizedKey(pubKey)), err)
return err
}
if err := c.isAuthorized(ctx); err != nil {
return err
}
c.logf("accepting SSH public key %s", bytes.TrimSpace(gossh.MarshalAuthorizedKey(pubKey)))
return nil
}
// doPolicyAuth verifies that conn can proceed with the specified (optional)
// pubKey. It returns nil if the matching policy action is Accept or
// HoldAndDelegate. If pubKey is nil, there was no policy match but there is a
// policy that might match a public key it returns errPubKeyRequired. Otherwise,
// it returns errDenied.
func (c *conn) doPolicyAuth(ctx ssh.Context, pubKey ssh.PublicKey) error {
// doPolicyAuth verifies that conn can proceed.
// It returns nil if the matching policy action is Accept or
// HoldAndDelegate. Otherwise, it returns errDenied.
func (c *conn) doPolicyAuth(ctx ssh.Context) error {
if err := c.setInfo(ctx); err != nil {
c.logf("failed to get conninfo: %v", err)
return errDenied
}
a, localUser, acceptEnv, err := c.evaluatePolicy(pubKey)
a, localUser, acceptEnv, err := c.evaluatePolicy()
if err != nil {
if pubKey == nil && c.havePubKeyPolicy() {
return errPubKeyRequired
}
return fmt.Errorf("%w: %v", errDenied, err)
}
c.action0 = a
c.currentAction = a
c.pubKey = pubKey
c.acceptEnv = acceptEnv
if a.Message != "" {
if err := ctx.SendAuthBanner(a.Message); err != nil {
@@ -448,7 +411,6 @@ func (srv *server) newConn() (*conn, error) {
ServerConfigCallback: c.ServerConfig,
NoClientAuthHandler: c.NoClientAuthCallback,
PublicKeyHandler: c.PublicKeyHandler,
PasswordHandler: c.fakePasswordHandler,
Handler: c.handleSessionPostSSHAuth,
@@ -516,34 +478,6 @@ func (c *conn) mayForwardLocalPortTo(ctx ssh.Context, destinationHost string, de
return false
}
// havePubKeyPolicy reports whether any policy rule may provide access by means
// of a ssh.PublicKey.
func (c *conn) havePubKeyPolicy() bool {
if c.info == nil {
panic("havePubKeyPolicy called before setInfo")
}
// Is there any rule that looks like it'd require a public key for this
// sshUser?
pol, ok := c.sshPolicy()
if !ok {
return false
}
for _, r := range pol.Rules {
if c.ruleExpired(r) {
continue
}
if mapLocalUser(r.SSHUsers, c.info.sshUser) == "" {
continue
}
for _, p := range r.Principals {
if len(p.PubKeys) > 0 && c.principalMatchesTailscaleIdentity(p) {
return true
}
}
}
return false
}
// sshPolicy returns the SSHPolicy for current node.
// If there is no SSHPolicy in the netmap, it returns a debugPolicy
// if one is defined.
@@ -620,117 +554,19 @@ func (c *conn) setInfo(ctx ssh.Context) error {
}
// evaluatePolicy returns the SSHAction and localUser after evaluating
// the SSHPolicy for this conn. The pubKey may be nil for "none" auth.
func (c *conn) evaluatePolicy(pubKey gossh.PublicKey) (_ *tailcfg.SSHAction, localUser string, acceptEnv []string, _ error) {
// the SSHPolicy for this conn.
func (c *conn) evaluatePolicy() (_ *tailcfg.SSHAction, localUser string, acceptEnv []string, _ error) {
pol, ok := c.sshPolicy()
if !ok {
return nil, "", nil, fmt.Errorf("tailssh: rejecting connection; no SSH policy")
}
a, localUser, acceptEnv, ok := c.evalSSHPolicy(pol, pubKey)
a, localUser, acceptEnv, ok := c.evalSSHPolicy(pol)
if !ok {
return nil, "", nil, fmt.Errorf("tailssh: rejecting connection; no matching policy")
}
return a, localUser, acceptEnv, nil
}
// pubKeyCacheEntry is the cache value for an HTTPS URL of public keys (like
// "https://github.com/foo.keys")
type pubKeyCacheEntry struct {
lines []string
etag string // if sent by server
at time.Time
}
const (
pubKeyCacheDuration = time.Minute // how long to cache non-empty public keys
pubKeyCacheEmptyDuration = 15 * time.Second // how long to cache empty responses
)
func (srv *server) fetchPublicKeysURLCached(url string) (ce pubKeyCacheEntry, ok bool) {
srv.mu.Lock()
defer srv.mu.Unlock()
// Mostly don't care about the size of this cache. Clean rarely.
if m := srv.fetchPublicKeysCache; len(m) > 50 {
tooOld := srv.now().Add(pubKeyCacheDuration * 10)
for k, ce := range m {
if ce.at.Before(tooOld) {
delete(m, k)
}
}
}
ce, ok = srv.fetchPublicKeysCache[url]
if !ok {
return ce, false
}
maxAge := pubKeyCacheDuration
if len(ce.lines) == 0 {
maxAge = pubKeyCacheEmptyDuration
}
return ce, srv.now().Sub(ce.at) < maxAge
}
func (srv *server) pubKeyClient() *http.Client {
if srv.pubKeyHTTPClient != nil {
return srv.pubKeyHTTPClient
}
return http.DefaultClient
}
// fetchPublicKeysURL fetches the public keys from a URL. The strings are in the
// the typical public key "type base64-string [comment]" format seen at e.g.
// https://github.com/USER.keys
func (srv *server) fetchPublicKeysURL(url string) ([]string, error) {
if !strings.HasPrefix(url, "https://") {
return nil, errors.New("invalid URL scheme")
}
ce, ok := srv.fetchPublicKeysURLCached(url)
if ok {
return ce.lines, nil
}
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
if err != nil {
return nil, err
}
if ce.etag != "" {
req.Header.Add("If-None-Match", ce.etag)
}
res, err := srv.pubKeyClient().Do(req)
if err != nil {
return nil, err
}
defer res.Body.Close()
var lines []string
var etag string
switch res.StatusCode {
default:
err = fmt.Errorf("unexpected status %v", res.Status)
srv.logf("fetching public keys from %s: %v", url, err)
case http.StatusNotModified:
lines = ce.lines
etag = ce.etag
case http.StatusOK:
var all []byte
all, err = io.ReadAll(io.LimitReader(res.Body, 4<<10))
if s := strings.TrimSpace(string(all)); s != "" {
lines = strings.Split(s, "\n")
}
etag = res.Header.Get("Etag")
}
srv.mu.Lock()
defer srv.mu.Unlock()
mak.Set(&srv.fetchPublicKeysCache, url, pubKeyCacheEntry{
at: srv.now(),
lines: lines,
etag: etag,
})
return lines, err
}
// handleSessionPostSSHAuth runs an SSH session after the SSH-level authentication,
// but not necessarily before all the Tailscale-level extra verification has
// completed. It also handles SFTP requests.
@@ -832,18 +668,6 @@ func (c *conn) expandDelegateURLLocked(actionURL string) string {
).Replace(actionURL)
}
func (c *conn) expandPublicKeyURL(pubKeyURL string) string {
if !strings.Contains(pubKeyURL, "$") {
return pubKeyURL
}
loginName := c.info.uprof.LoginName
localPart, _, _ := strings.Cut(loginName, "@")
return strings.NewReplacer(
"$LOGINNAME_EMAIL", loginName,
"$LOGINNAME_LOCALPART", localPart,
).Replace(pubKeyURL)
}
// sshSession is an accepted Tailscale SSH session.
type sshSession struct {
ssh.Session
@@ -894,7 +718,7 @@ func (c *conn) newSSHSession(s ssh.Session) *sshSession {
// isStillValid reports whether the conn is still valid.
func (c *conn) isStillValid() bool {
a, localUser, _, err := c.evaluatePolicy(c.pubKey)
a, localUser, _, err := c.evaluatePolicy()
c.vlogf("stillValid: %+v %v %v", a, localUser, err)
if err != nil {
return false
@@ -1277,9 +1101,9 @@ func (c *conn) ruleExpired(r *tailcfg.SSHRule) bool {
return r.RuleExpires.Before(c.srv.now())
}
func (c *conn) evalSSHPolicy(pol *tailcfg.SSHPolicy, pubKey gossh.PublicKey) (a *tailcfg.SSHAction, localUser string, acceptEnv []string, ok bool) {
func (c *conn) evalSSHPolicy(pol *tailcfg.SSHPolicy) (a *tailcfg.SSHAction, localUser string, acceptEnv []string, ok bool) {
for _, r := range pol.Rules {
if a, localUser, acceptEnv, err := c.matchRule(r, pubKey); err == nil {
if a, localUser, acceptEnv, err := c.matchRule(r); err == nil {
return a, localUser, acceptEnv, true
}
}
@@ -1296,7 +1120,7 @@ var (
errInvalidConn = errors.New("invalid connection state")
)
func (c *conn) matchRule(r *tailcfg.SSHRule, pubKey gossh.PublicKey) (a *tailcfg.SSHAction, localUser string, acceptEnv []string, err error) {
func (c *conn) matchRule(r *tailcfg.SSHRule) (a *tailcfg.SSHAction, localUser string, acceptEnv []string, err error) {
defer func() {
c.vlogf("matchRule(%+v): %v", r, err)
}()
@@ -1326,9 +1150,7 @@ func (c *conn) matchRule(r *tailcfg.SSHRule, pubKey gossh.PublicKey) (a *tailcfg
return nil, "", nil, errUserMatch
}
}
if ok, err := c.anyPrincipalMatches(r.Principals, pubKey); err != nil {
return nil, "", nil, err
} else if !ok {
if !c.anyPrincipalMatches(r.Principals) {
return nil, "", nil, errPrincipalMatch
}
return r.Action, localUser, r.AcceptEnv, nil
@@ -1345,30 +1167,20 @@ func mapLocalUser(ruleSSHUsers map[string]string, reqSSHUser string) (localUser
return v
}
func (c *conn) anyPrincipalMatches(ps []*tailcfg.SSHPrincipal, pubKey gossh.PublicKey) (bool, error) {
func (c *conn) anyPrincipalMatches(ps []*tailcfg.SSHPrincipal) bool {
for _, p := range ps {
if p == nil {
continue
}
if ok, err := c.principalMatches(p, pubKey); err != nil {
return false, err
} else if ok {
return true, nil
if c.principalMatchesTailscaleIdentity(p) {
return true
}
}
return false, nil
}
func (c *conn) principalMatches(p *tailcfg.SSHPrincipal, pubKey gossh.PublicKey) (bool, error) {
if !c.principalMatchesTailscaleIdentity(p) {
return false, nil
}
return c.principalMatchesPubKey(p, pubKey)
return false
}
// principalMatchesTailscaleIdentity reports whether one of p's four fields
// that match the Tailscale identity match (Node, NodeIP, UserLogin, Any).
// This function does not consider PubKeys.
func (c *conn) principalMatchesTailscaleIdentity(p *tailcfg.SSHPrincipal) bool {
ci := c.info
if p.Any {
@@ -1388,42 +1200,6 @@ func (c *conn) principalMatchesTailscaleIdentity(p *tailcfg.SSHPrincipal) bool {
return false
}
func (c *conn) principalMatchesPubKey(p *tailcfg.SSHPrincipal, clientPubKey gossh.PublicKey) (bool, error) {
if len(p.PubKeys) == 0 {
return true, nil
}
if clientPubKey == nil {
return false, nil
}
knownKeys := p.PubKeys
if len(knownKeys) == 1 && strings.HasPrefix(knownKeys[0], "https://") {
var err error
knownKeys, err = c.srv.fetchPublicKeysURL(c.expandPublicKeyURL(knownKeys[0]))
if err != nil {
return false, err
}
}
for _, knownKey := range knownKeys {
if pubKeyMatchesAuthorizedKey(clientPubKey, knownKey) {
return true, nil
}
}
return false, nil
}
func pubKeyMatchesAuthorizedKey(pubKey ssh.PublicKey, wantKey string) bool {
wantKeyType, rest, ok := strings.Cut(wantKey, " ")
if !ok {
return false
}
if pubKey.Type() != wantKeyType {
return false
}
wantKeyB64, _, _ := strings.Cut(rest, " ")
wantKeyData, _ := base64.StdEncoding.DecodeString(wantKeyB64)
return len(wantKeyData) > 0 && bytes.Equal(pubKey.Marshal(), wantKeyData)
}
func randBytes(n int) []byte {
b := make([]byte, n)
if _, err := rand.Read(b); err != nil {
@@ -1749,7 +1525,6 @@ func envEq(a, b string) bool {
var (
metricActiveSessions = clientmetric.NewGauge("ssh_active_sessions")
metricIncomingConnections = clientmetric.NewCounter("ssh_incoming_connections")
metricPublicKeyAccepts = clientmetric.NewCounter("ssh_publickey_accepts") // accepted subset of ssh_publickey_connections
metricTerminalAccept = clientmetric.NewCounter("ssh_terminalaction_accept")
metricTerminalReject = clientmetric.NewCounter("ssh_terminalaction_reject")
metricTerminalMalformed = clientmetric.NewCounter("ssh_terminalaction_malformed")

View File

@@ -10,7 +10,6 @@ import (
"context"
"crypto/ed25519"
"crypto/rand"
"crypto/sha256"
"encoding/json"
"errors"
"fmt"
@@ -229,7 +228,7 @@ func TestMatchRule(t *testing.T) {
info: tt.ci,
srv: &server{logf: t.Logf},
}
got, gotUser, gotAcceptEnv, err := c.matchRule(tt.rule, nil)
got, gotUser, gotAcceptEnv, err := c.matchRule(tt.rule)
if err != tt.wantErr {
t.Errorf("err = %v; want %v", err, tt.wantErr)
}
@@ -348,7 +347,7 @@ func TestEvalSSHPolicy(t *testing.T) {
info: tt.ci,
srv: &server{logf: t.Logf},
}
got, gotUser, gotAcceptEnv, match := c.evalSSHPolicy(tt.policy, nil)
got, gotUser, gotAcceptEnv, match := c.evalSSHPolicy(tt.policy)
if match != tt.wantMatch {
t.Errorf("match = %v; want %v", match, tt.wantMatch)
}
@@ -1129,89 +1128,6 @@ func parseEnv(out []byte) map[string]string {
return e
}
func TestPublicKeyFetching(t *testing.T) {
var reqsTotal, reqsIfNoneMatchHit, reqsIfNoneMatchMiss int32
ts := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
atomic.AddInt32((&reqsTotal), 1)
etag := fmt.Sprintf("W/%q", sha256.Sum256([]byte(r.URL.Path)))
w.Header().Set("Etag", etag)
if v := r.Header.Get("If-None-Match"); v != "" {
if v == etag {
atomic.AddInt32(&reqsIfNoneMatchHit, 1)
w.WriteHeader(304)
return
}
atomic.AddInt32(&reqsIfNoneMatchMiss, 1)
}
io.WriteString(w, "foo\nbar\n"+string(r.URL.Path)+"\n")
}))
ts.StartTLS()
defer ts.Close()
keys := ts.URL
clock := &tstest.Clock{}
srv := &server{
pubKeyHTTPClient: ts.Client(),
timeNow: clock.Now,
}
for range 2 {
got, err := srv.fetchPublicKeysURL(keys + "/alice.keys")
if err != nil {
t.Fatal(err)
}
if want := []string{"foo", "bar", "/alice.keys"}; !reflect.DeepEqual(got, want) {
t.Errorf("got %q; want %q", got, want)
}
}
if got, want := atomic.LoadInt32(&reqsTotal), int32(1); got != want {
t.Errorf("got %d requests; want %d", got, want)
}
if got, want := atomic.LoadInt32(&reqsIfNoneMatchHit), int32(0); got != want {
t.Errorf("got %d etag hits; want %d", got, want)
}
clock.Advance(5 * time.Minute)
got, err := srv.fetchPublicKeysURL(keys + "/alice.keys")
if err != nil {
t.Fatal(err)
}
if want := []string{"foo", "bar", "/alice.keys"}; !reflect.DeepEqual(got, want) {
t.Errorf("got %q; want %q", got, want)
}
if got, want := atomic.LoadInt32(&reqsTotal), int32(2); got != want {
t.Errorf("got %d requests; want %d", got, want)
}
if got, want := atomic.LoadInt32(&reqsIfNoneMatchHit), int32(1); got != want {
t.Errorf("got %d etag hits; want %d", got, want)
}
if got, want := atomic.LoadInt32(&reqsIfNoneMatchMiss), int32(0); got != want {
t.Errorf("got %d etag misses; want %d", got, want)
}
}
func TestExpandPublicKeyURL(t *testing.T) {
c := &conn{
info: &sshConnInfo{
uprof: tailcfg.UserProfile{
LoginName: "bar@baz.tld",
},
},
}
if got, want := c.expandPublicKeyURL("foo"), "foo"; got != want {
t.Errorf("basic: got %q; want %q", got, want)
}
if got, want := c.expandPublicKeyURL("https://example.com/$LOGINNAME_LOCALPART.keys"), "https://example.com/bar.keys"; got != want {
t.Errorf("localpart: got %q; want %q", got, want)
}
if got, want := c.expandPublicKeyURL("https://example.com/keys?email=$LOGINNAME_EMAIL"), "https://example.com/keys?email=bar@baz.tld"; got != want {
t.Errorf("email: got %q; want %q", got, want)
}
c.info = new(sshConnInfo)
if got, want := c.expandPublicKeyURL("https://example.com/keys?email=$LOGINNAME_EMAIL"), "https://example.com/keys?email="; got != want {
t.Errorf("on empty: got %q; want %q", got, want)
}
}
func TestAcceptEnvPair(t *testing.T) {
tests := []struct {
in string

View File

@@ -152,7 +152,8 @@ type CapabilityVersion int
// - 107: 2024-10-30: add App Connector to conffile (PR #13942)
// - 108: 2024-11-08: Client sends ServicesHash in Hostinfo, understands c2n GET /vip-services.
// - 109: 2024-11-18: Client supports filtertype.Match.SrcCaps (issue #12542)
const CurrentCapabilityVersion CapabilityVersion = 109
// - 110: 2024-12-12: removed never-before-used Tailscale SSH public key support (#14373)
const CurrentCapabilityVersion CapabilityVersion = 110
type StableID string
@@ -2525,16 +2526,13 @@ type SSHPrincipal struct {
Any bool `json:"any,omitempty"` // if true, match any connection
// TODO(bradfitz): add StableUserID, once that exists
// PubKeys, if non-empty, means that this SSHPrincipal only
// matches if one of these public keys is presented by the user.
// UnusedPubKeys was public key support. It never became an official product
// feature and so as of 2024-12-12 is being removed.
// This stub exists to remind us not to re-use the JSON field name "pubKeys"
// in the future if we bring it back with different semantics.
//
// As a special case, if len(PubKeys) == 1 and PubKeys[0] starts
// with "https://", then it's fetched (like https://github.com/username.keys).
// In that case, the following variable expansions are also supported
// in the URL:
// * $LOGINNAME_EMAIL ("foo@bar.com" or "foo@github")
// * $LOGINNAME_LOCALPART (the "foo" from either of the above)
PubKeys []string `json:"pubKeys,omitempty"`
// Deprecated: do not use. It does nothing.
UnusedPubKeys []string `json:"pubKeys,omitempty"`
}
// SSHAction is how to handle an incoming connection.

View File

@@ -556,17 +556,17 @@ func (src *SSHPrincipal) Clone() *SSHPrincipal {
}
dst := new(SSHPrincipal)
*dst = *src
dst.PubKeys = append(src.PubKeys[:0:0], src.PubKeys...)
dst.UnusedPubKeys = append(src.UnusedPubKeys[:0:0], src.UnusedPubKeys...)
return dst
}
// A compilation failure here means this code must be regenerated, with the command at the top of this file.
var _SSHPrincipalCloneNeedsRegeneration = SSHPrincipal(struct {
Node StableNodeID
NodeIP string
UserLogin string
Any bool
PubKeys []string
Node StableNodeID
NodeIP string
UserLogin string
Any bool
UnusedPubKeys []string
}{})
// Clone makes a deep copy of ControlDialPlan.

View File

@@ -1260,19 +1260,21 @@ func (v *SSHPrincipalView) UnmarshalJSON(b []byte) error {
return nil
}
func (v SSHPrincipalView) Node() StableNodeID { return v.ж.Node }
func (v SSHPrincipalView) NodeIP() string { return v.ж.NodeIP }
func (v SSHPrincipalView) UserLogin() string { return v.ж.UserLogin }
func (v SSHPrincipalView) Any() bool { return v.ж.Any }
func (v SSHPrincipalView) PubKeys() views.Slice[string] { return views.SliceOf(v.ж.PubKeys) }
func (v SSHPrincipalView) Node() StableNodeID { return v.ж.Node }
func (v SSHPrincipalView) NodeIP() string { return v.ж.NodeIP }
func (v SSHPrincipalView) UserLogin() string { return v.ж.UserLogin }
func (v SSHPrincipalView) Any() bool { return v.ж.Any }
func (v SSHPrincipalView) UnusedPubKeys() views.Slice[string] {
return views.SliceOf(v.ж.UnusedPubKeys)
}
// A compilation failure here means this code must be regenerated, with the command at the top of this file.
var _SSHPrincipalViewNeedsRegeneration = SSHPrincipal(struct {
Node StableNodeID
NodeIP string
UserLogin string
Any bool
PubKeys []string
Node StableNodeID
NodeIP string
UserLogin string
Any bool
UnusedPubKeys []string
}{})
// View returns a readonly view of ControlDialPlan.

View File

@@ -38,7 +38,6 @@ import (
"golang.org/x/net/proxy"
"tailscale.com/client/tailscale"
"tailscale.com/cmd/testwrapper/flakytest"
"tailscale.com/health"
"tailscale.com/ipn"
"tailscale.com/ipn/store/mem"
"tailscale.com/net/netns"
@@ -822,16 +821,6 @@ func TestUDPConn(t *testing.T) {
}
}
// testWarnable is a Warnable that is used within this package for testing purposes only.
var testWarnable = health.Register(&health.Warnable{
Code: "test-warnable-tsnet",
Title: "Test warnable",
Severity: health.SeverityLow,
Text: func(args health.Args) string {
return args[health.ArgError]
},
})
func parseMetrics(m []byte) (map[string]float64, error) {
metrics := make(map[string]float64)
@@ -905,9 +894,11 @@ func sendData(logf func(format string, args ...any), ctx context.Context, bytesC
for {
got := make([]byte, bytesCount)
n, err := conn.Read(got)
if n != bytesCount {
logf("read %d bytes, want %d", n, bytesCount)
if err != nil {
allReceived <- fmt.Errorf("failed reading packet, %s", err)
return
}
got = got[:n]
select {
case <-stopReceive:
@@ -915,13 +906,17 @@ func sendData(logf func(format string, args ...any), ctx context.Context, bytesC
default:
}
if err != nil {
allReceived <- fmt.Errorf("failed reading packet, %s", err)
return
}
total += n
logf("received %d/%d bytes, %.2f %%", total, bytesCount, (float64(total) / (float64(bytesCount)) * 100))
// Validate the received bytes to be the same as the sent bytes.
for _, b := range string(got) {
if b != 'A' {
allReceived <- fmt.Errorf("received unexpected byte: %c", b)
return
}
}
if total == bytesCount {
break
}
@@ -947,15 +942,135 @@ func sendData(logf func(format string, args ...any), ctx context.Context, bytesC
return nil
}
func TestUserMetrics(t *testing.T) {
flakytest.Mark(t, "https://github.com/tailscale/tailscale/issues/13420")
tstest.ResourceCheck(t)
func TestUserMetricsByteCounters(t *testing.T) {
ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
defer cancel()
controlURL, _ := startControl(t)
s1, s1ip, _ := startServer(t, ctx, controlURL, "s1")
defer s1.Close()
s2, s2ip, _ := startServer(t, ctx, controlURL, "s2")
defer s2.Close()
lc1, err := s1.LocalClient()
if err != nil {
t.Fatal(err)
}
lc2, err := s2.LocalClient()
if err != nil {
t.Fatal(err)
}
// Force an update to the netmap to ensure that the metrics are up-to-date.
s1.lb.DebugForceNetmapUpdate()
s2.lb.DebugForceNetmapUpdate()
// Wait for both nodes to have a peer in their netmap.
waitForCondition(t, "waiting for netmaps to contain peer", 90*time.Second, func() bool {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
status1, err := lc1.Status(ctx)
if err != nil {
t.Logf("getting status: %s", err)
return false
}
status2, err := lc2.Status(ctx)
if err != nil {
t.Logf("getting status: %s", err)
return false
}
return len(status1.Peers()) > 0 && len(status2.Peers()) > 0
})
// ping to make sure the connection is up.
res, err := lc2.Ping(ctx, s1ip, tailcfg.PingICMP)
if err != nil {
t.Fatalf("pinging: %s", err)
}
t.Logf("ping success: %#+v", res)
mustDirect(t, t.Logf, lc1, lc2)
// 1 megabytes
bytesToSend := 1 * 1024 * 1024
// This asserts generates some traffic, it is factored out
// of TestUDPConn.
start := time.Now()
err = sendData(t.Logf, ctx, bytesToSend, s1, s2, s1ip, s2ip)
if err != nil {
t.Fatalf("Failed to send packets: %v", err)
}
t.Logf("Sent %d bytes from s1 to s2 in %s", bytesToSend, time.Since(start).String())
ctxLc, cancelLc := context.WithTimeout(context.Background(), 5*time.Second)
defer cancelLc()
metrics1, err := lc1.UserMetrics(ctxLc)
if err != nil {
t.Fatal(err)
}
parsedMetrics1, err := parseMetrics(metrics1)
if err != nil {
t.Fatal(err)
}
// Allow the metrics for the bytes sent to be off by 15%.
bytesSentTolerance := 1.15
t.Logf("Metrics1:\n%s\n", metrics1)
// Verify that the amount of data recorded in bytes is higher or equal to the data sent
inboundBytes1 := parsedMetrics1[`tailscaled_inbound_bytes_total{path="direct_ipv4"}`]
if inboundBytes1 < float64(bytesToSend) {
t.Errorf(`metrics1, tailscaled_inbound_bytes_total{path="direct_ipv4"}: expected higher (or equal) than %d, got: %f`, bytesToSend, inboundBytes1)
}
// But ensure that it is not too much higher than the data sent.
if inboundBytes1 > float64(bytesToSend)*bytesSentTolerance {
t.Errorf(`metrics1, tailscaled_inbound_bytes_total{path="direct_ipv4"}: expected lower than %f, got: %f`, float64(bytesToSend)*bytesSentTolerance, inboundBytes1)
}
metrics2, err := lc2.UserMetrics(ctx)
if err != nil {
t.Fatal(err)
}
parsedMetrics2, err := parseMetrics(metrics2)
if err != nil {
t.Fatal(err)
}
t.Logf("Metrics2:\n%s\n", metrics2)
// Verify that the amount of data recorded in bytes is higher or equal than the data sent.
outboundBytes2 := parsedMetrics2[`tailscaled_outbound_bytes_total{path="direct_ipv4"}`]
if outboundBytes2 < float64(bytesToSend) {
t.Errorf(`metrics2, tailscaled_outbound_bytes_total{path="direct_ipv4"}: expected higher (or equal) than %d, got: %f`, bytesToSend, outboundBytes2)
}
// But ensure that it is not too much higher than the data sent.
if outboundBytes2 > float64(bytesToSend)*bytesSentTolerance {
t.Errorf(`metrics2, tailscaled_outbound_bytes_total{path="direct_ipv4"}: expected lower than %f, got: %f`, float64(bytesToSend)*bytesSentTolerance, outboundBytes2)
}
}
func TestUserMetricsRouteGauges(t *testing.T) {
// Windows does not seem to support or report back routes when running in
// userspace via tsnet. So, we skip this check on Windows.
// TODO(kradalby): Figure out if this is correct.
if runtime.GOOS == "windows" {
t.Skipf("skipping on windows")
}
ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
defer cancel()
controlURL, c := startControl(t)
s1, s1ip, s1PubKey := startServer(t, ctx, controlURL, "s1")
s2, s2ip, _ := startServer(t, ctx, controlURL, "s2")
s1, _, s1PubKey := startServer(t, ctx, controlURL, "s1")
defer s1.Close()
s2, _, _ := startServer(t, ctx, controlURL, "s2")
defer s2.Close()
s1.lb.EditPrefs(&ipn.MaskedPrefs{
Prefs: ipn.Prefs{
@@ -984,24 +1099,11 @@ func TestUserMetrics(t *testing.T) {
t.Fatal(err)
}
// ping to make sure the connection is up.
res, err := lc2.Ping(ctx, s1ip, tailcfg.PingICMP)
if err != nil {
t.Fatalf("pinging: %s", err)
}
t.Logf("ping success: %#+v", res)
ht := s1.lb.HealthTracker()
ht.SetUnhealthy(testWarnable, health.Args{"Text": "Hello world 1"})
// Force an update to the netmap to ensure that the metrics are up-to-date.
s1.lb.DebugForceNetmapUpdate()
s2.lb.DebugForceNetmapUpdate()
wantRoutes := float64(2)
if runtime.GOOS == "windows" {
wantRoutes = 0
}
// Wait for the routes to be propagated to node 1 to ensure
// that the metrics are up-to-date.
@@ -1013,31 +1115,11 @@ func TestUserMetrics(t *testing.T) {
t.Logf("getting status: %s", err)
return false
}
if runtime.GOOS == "windows" {
// Windows does not seem to support or report back routes when running in
// userspace via tsnet. So, we skip this check on Windows.
// TODO(kradalby): Figure out if this is correct.
return true
}
// Wait for the primary routes to reach our desired routes, which is wantRoutes + 1, because
// the PrimaryRoutes list will contain a exit node route, which the metric does not count.
return status1.Self.PrimaryRoutes != nil && status1.Self.PrimaryRoutes.Len() == int(wantRoutes)+1
})
mustDirect(t, t.Logf, lc1, lc2)
// 10 megabytes
bytesToSend := 10 * 1024 * 1024
// This asserts generates some traffic, it is factored out
// of TestUDPConn.
start := time.Now()
err = sendData(t.Logf, ctx, bytesToSend, s1, s2, s1ip, s2ip)
if err != nil {
t.Fatalf("Failed to send packets: %v", err)
}
t.Logf("Sent %d bytes from s1 to s2 in %s", bytesToSend, time.Since(start).String())
ctxLc, cancelLc := context.WithTimeout(context.Background(), 5*time.Second)
defer cancelLc()
metrics1, err := lc1.UserMetrics(ctxLc)
@@ -1045,19 +1127,11 @@ func TestUserMetrics(t *testing.T) {
t.Fatal(err)
}
status1, err := lc1.Status(ctxLc)
if err != nil {
t.Fatal(err)
}
parsedMetrics1, err := parseMetrics(metrics1)
if err != nil {
t.Fatal(err)
}
// Allow the metrics for the bytes sent to be off by 15%.
bytesSentTolerance := 1.15
t.Logf("Metrics1:\n%s\n", metrics1)
// The node is advertising 4 routes:
@@ -1075,33 +1149,11 @@ func TestUserMetrics(t *testing.T) {
t.Errorf("metrics1, tailscaled_approved_routes: got %v, want %v", got, want)
}
// Validate the health counter metric against the status of the node
if got, want := parsedMetrics1[`tailscaled_health_messages{type="warning"}`], float64(len(status1.Health)); got != want {
t.Errorf("metrics1, tailscaled_health_messages: got %v, want %v", got, want)
}
// Verify that the amount of data recorded in bytes is higher or equal to the
// 10 megabytes sent.
inboundBytes1 := parsedMetrics1[`tailscaled_inbound_bytes_total{path="direct_ipv4"}`]
if inboundBytes1 < float64(bytesToSend) {
t.Errorf(`metrics1, tailscaled_inbound_bytes_total{path="direct_ipv4"}: expected higher (or equal) than %d, got: %f`, bytesToSend, inboundBytes1)
}
// But ensure that it is not too much higher than the 10 megabytes sent.
if inboundBytes1 > float64(bytesToSend)*bytesSentTolerance {
t.Errorf(`metrics1, tailscaled_inbound_bytes_total{path="direct_ipv4"}: expected lower than %f, got: %f`, float64(bytesToSend)*bytesSentTolerance, inboundBytes1)
}
metrics2, err := lc2.UserMetrics(ctx)
if err != nil {
t.Fatal(err)
}
status2, err := lc2.Status(ctx)
if err != nil {
t.Fatal(err)
}
parsedMetrics2, err := parseMetrics(metrics2)
if err != nil {
t.Fatal(err)
@@ -1118,23 +1170,6 @@ func TestUserMetrics(t *testing.T) {
if got, want := parsedMetrics2["tailscaled_approved_routes"], 0.0; got != want {
t.Errorf("metrics2, tailscaled_approved_routes: got %v, want %v", got, want)
}
// Validate the health counter metric against the status of the node
if got, want := parsedMetrics2[`tailscaled_health_messages{type="warning"}`], float64(len(status2.Health)); got != want {
t.Errorf("metrics2, tailscaled_health_messages: got %v, want %v", got, want)
}
// Verify that the amount of data recorded in bytes is higher or equal than the
// 10 megabytes sent.
outboundBytes2 := parsedMetrics2[`tailscaled_outbound_bytes_total{path="direct_ipv4"}`]
if outboundBytes2 < float64(bytesToSend) {
t.Errorf(`metrics2, tailscaled_outbound_bytes_total{path="direct_ipv4"}: expected higher (or equal) than %d, got: %f`, bytesToSend, outboundBytes2)
}
// But ensure that it is not too much higher than the 10 megabytes sent.
if outboundBytes2 > float64(bytesToSend)*bytesSentTolerance {
t.Errorf(`metrics2, tailscaled_outbound_bytes_total{path="direct_ipv4"}: expected lower than %f, got: %f`, float64(bytesToSend)*bytesSentTolerance, outboundBytes2)
}
}
func waitForCondition(t *testing.T, msg string, waitTime time.Duration, f func() bool) {

28
types/bools/bools.go Normal file
View File

@@ -0,0 +1,28 @@
// Copyright (c) Tailscale Inc & AUTHORS
// SPDX-License-Identifier: BSD-3-Clause
// Package bools contains the [Compare] and [Select] functions.
package bools
// Compare compares two boolean values as if false is ordered before true.
func Compare[T ~bool](x, y T) int {
switch {
case x == false && y == true:
return -1
case x == true && y == false:
return +1
default:
return 0
}
}
// IfElse is a ternary operator that returns trueVal if condExpr is true
// otherwise it returns falseVal.
// IfElse(c, a, b) is roughly equivalent to (c ? a : b) in languages like C.
func IfElse[T any](condExpr bool, trueVal T, falseVal T) T {
if condExpr {
return trueVal
} else {
return falseVal
}
}

View File

@@ -19,3 +19,12 @@ func TestCompare(t *testing.T) {
t.Errorf("Compare(true, true) = %v, want 0", got)
}
}
func TestIfElse(t *testing.T) {
if got := IfElse(true, 0, 1); got != 0 {
t.Errorf("IfElse(true, 0, 1) = %v, want 0", got)
}
if got := IfElse(false, 0, 1); got != 1 {
t.Errorf("IfElse(false, 0, 1) = %v, want 1", got)
}
}

View File

@@ -1,17 +0,0 @@
// Copyright (c) Tailscale Inc & AUTHORS
// SPDX-License-Identifier: BSD-3-Clause
// Package bools contains the bools.Compare function.
package bools
// Compare compares two boolean values as if false is ordered before true.
func Compare[T ~bool](x, y T) int {
switch {
case x == false && y == true:
return -1
case x == true && y == false:
return +1
default:
return 0
}
}

23
types/iox/io.go Normal file
View File

@@ -0,0 +1,23 @@
// Copyright (c) Tailscale Inc & AUTHORS
// SPDX-License-Identifier: BSD-3-Clause
// Package iox provides types to implement [io] functionality.
package iox
// TODO(https://go.dev/issue/21670): Deprecate or remove this functionality
// once the Go language supports implementing an 1-method interface directly
// using a function value of a matching signature.
// ReaderFunc implements [io.Reader] using the underlying function value.
type ReaderFunc func([]byte) (int, error)
func (f ReaderFunc) Read(b []byte) (int, error) {
return f(b)
}
// WriterFunc implements [io.Writer] using the underlying function value.
type WriterFunc func([]byte) (int, error)
func (f WriterFunc) Write(b []byte) (int, error) {
return f(b)
}

39
types/iox/io_test.go Normal file
View File

@@ -0,0 +1,39 @@
// Copyright (c) Tailscale Inc & AUTHORS
// SPDX-License-Identifier: BSD-3-Clause
package iox
import (
"bytes"
"io"
"testing"
"testing/iotest"
"tailscale.com/util/must"
)
func TestCopy(t *testing.T) {
const testdata = "the quick brown fox jumped over the lazy dog"
src := testdata
bb := new(bytes.Buffer)
if got := must.Get(io.Copy(bb, ReaderFunc(func(b []byte) (n int, err error) {
n = copy(b[:min(len(b), 7)], src)
src = src[n:]
if len(src) == 0 {
err = io.EOF
}
return n, err
}))); int(got) != len(testdata) {
t.Errorf("copy = %d, want %d", got, len(testdata))
}
var dst []byte
if got := must.Get(io.Copy(WriterFunc(func(b []byte) (n int, err error) {
dst = append(dst, b...)
return len(b), nil
}), iotest.OneByteReader(bb))); int(got) != len(testdata) {
t.Errorf("copy = %d, want %d", got, len(testdata))
}
if string(dst) != testdata {
t.Errorf("copy = %q, want %q", dst, testdata)
}
}

View File

@@ -5,9 +5,9 @@
package dnsname
import (
"errors"
"fmt"
"strings"
"tailscale.com/util/vizerror"
)
const (
@@ -36,7 +36,7 @@ func ToFQDN(s string) (FQDN, error) {
totalLen += 1 // account for missing dot
}
if totalLen > maxNameLength {
return "", fmt.Errorf("%q is too long to be a DNS name", s)
return "", vizerror.Errorf("%q is too long to be a DNS name", s)
}
st := 0
@@ -54,7 +54,7 @@ func ToFQDN(s string) (FQDN, error) {
//
// See https://github.com/tailscale/tailscale/issues/2024 for more.
if len(label) == 0 || len(label) > maxLabelLength {
return "", fmt.Errorf("%q is not a valid DNS label", label)
return "", vizerror.Errorf("%q is not a valid DNS label", label)
}
st = i + 1
}
@@ -97,23 +97,23 @@ func (f FQDN) Contains(other FQDN) bool {
// ValidLabel reports whether label is a valid DNS label.
func ValidLabel(label string) error {
if len(label) == 0 {
return errors.New("empty DNS label")
return vizerror.New("empty DNS label")
}
if len(label) > maxLabelLength {
return fmt.Errorf("%q is too long, max length is %d bytes", label, maxLabelLength)
return vizerror.Errorf("%q is too long, max length is %d bytes", label, maxLabelLength)
}
if !isalphanum(label[0]) {
return fmt.Errorf("%q is not a valid DNS label: must start with a letter or number", label)
return vizerror.Errorf("%q is not a valid DNS label: must start with a letter or number", label)
}
if !isalphanum(label[len(label)-1]) {
return fmt.Errorf("%q is not a valid DNS label: must end with a letter or number", label)
return vizerror.Errorf("%q is not a valid DNS label: must end with a letter or number", label)
}
if len(label) < 2 {
return nil
}
for i := 1; i < len(label)-1; i++ {
if !isdnschar(label[i]) {
return fmt.Errorf("%q is not a valid DNS label: contains invalid character %q", label, label[i])
return vizerror.Errorf("%q is not a valid DNS label: contains invalid character %q", label, label[i])
}
}
return nil

View File

@@ -6,7 +6,6 @@
package multierr
import (
"context"
"errors"
"slices"
"strings"
@@ -135,40 +134,3 @@ func Range(err error, fn func(error) bool) bool {
}
return true
}
// DeduplicateContextErrors returns a new slice of errors with at most one
// occurrence of each [context.Canceled] or [context.DeadlineExceeded], if one
// or more of them are present in the input slice.
//
// All other non-nil errors are returned as-is; nil errors are skipped.
func DeduplicateContextErrors(errs []error) []error {
// preserve nil/non-nil distinction
if errs == nil {
return nil
} else if len(errs) == 0 {
return []error{}
}
var (
ret []error
sawCanceled, sawDeadline bool
)
for _, err := range errs {
if err == nil {
continue
}
if errors.Is(err, context.Canceled) {
if sawCanceled {
continue
}
sawCanceled = true
} else if errors.Is(err, context.DeadlineExceeded) {
if sawDeadline {
continue
}
sawDeadline = true
}
ret = append(ret, err)
}
return ret
}

View File

@@ -4,7 +4,6 @@
package multierr_test
import (
"context"
"errors"
"fmt"
"io"
@@ -108,60 +107,6 @@ func TestRange(t *testing.T) {
})), want)
}
func TestDeduplicateContextErrors(t *testing.T) {
testError := errors.New("test error")
tests := []struct {
name string
input []error
want []error
}{
{name: "nil", input: nil, want: nil},
{name: "empty", input: []error{}, want: []error{}},
{name: "single", input: []error{testError}, want: []error{testError}},
{
name: "duplicate_non_context",
input: []error{testError, testError},
want: []error{testError, testError},
},
{
name: "single_context",
input: []error{context.Canceled},
want: []error{context.Canceled},
},
{
name: "duplicate_context",
input: []error{testError, context.Canceled, context.Canceled},
want: []error{testError, context.Canceled},
},
{
name: "duplicate_context_mixed",
input: []error{
testError,
context.Canceled,
context.Canceled,
testError,
context.DeadlineExceeded,
context.DeadlineExceeded,
},
want: []error{
testError,
context.Canceled,
testError,
context.DeadlineExceeded,
},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := multierr.DeduplicateContextErrors(tt.input)
if diff := cmp.Diff(tt.want, got, cmpopts.EquateErrors()); diff != "" {
t.Errorf("DeduplicateContextErrors() mismatch (-want +got):\n%s", diff)
}
})
}
}
var sink error
func BenchmarkEmpty(b *testing.B) {

Some files were not shown because too many files have changed in this diff Show More