Very strange problem with testing connections #154

@precisionpete

I am not so sure this is a problem with wgctrl as much as maybe just an understanding problem on my part... Hopefully someone can give me some ideas on how to solve it.

I have a WireGuard config management app written in Go. The purpose of the app is to manage the p2p connections between peers, and if they change, reconfigure. The code uses golang.zx2c4.com/wireguard/wgctrl and does not shell out to wg. This is running on Ubuntu 24.04 but should be portable.

The flow I am having trouble with goes like this:

  • on some occasional interval, do the following:
  • test if the connection is using a relay peer (remote SecureIP not found in AllowedIPs)
  • test if there is a recent wg handshake (a direct connection may be possible)
  • test to see if a direct connection would work (add the remote SecureIP to the AllowedIPs)
  • test the connection by doing an http get/post to the remote SecureIP, including some retries
  • if the connection is successful, leave it in direct mode with the SecureIP in the AllowedIPs
  • if not, remove SecureIP from AllowedIPs and let it relay again
  • revisit the test again in the future.

And the problem is...

  • If I mock this up in a bash script, it works
  • If I write a simple Go test program using wgctrl etc, it works
  • If I do the test in a goroutine in the test program, it works
  • If I run it in my larger app, in a goroutine but without concurrency, it works
  • If I run it in my larger app concurrently, it fails sometimes (includes mutex locking)

The thing is that it is somewhat intermittent. Mostly, it will fail, but sometimes, it will work. I've tried all kinds of different methods but can't seem to solve this. Also...

  • IP routing does not change. In each case, there is an IP route to the wg0 interface.
  • setting the AllowedIPs seems to work because I read them back and compare before continuing.
  • wg show says the AllowedIPs are correct while the test is going on.
  • basically, the http connection times out, as does a traceroute
  • if I walk away and come back later, it will eventually fail back to direct

Obviously, there is something going on behind the scenes I am not aware of. And the key seems to be the concurrency. With one parallel goroutine, it works. As soon as I try 2 routines (on a 20-core CPU) it has problems. It almost seems like there is some kind of a phantom network session going on where I am changing the parameters of the current session, but the connection test is using something else?

Any and all comments are welcome. Example code below...

// SetAllowedIPs sets the allowed IPs for a given peer.
func SetAllowedIPs(ifaceName string, publicKey string, allowedIPs []net.IPNet) error {
	wgMu.Lock()
	defer wgMu.Unlock()

	wgc, err := wgctrl.New()
	if err != nil {
		return fmt.Errorf("failed to create wgctrl client: %v", err)
	}
	defer wgc.Close()

	key, err := wgtypes.ParseKey(publicKey)
	if err != nil {
		return fmt.Errorf("failed to parse public key: %v", err)
	}

	cfg := wgtypes.Config{
		Peers: []wgtypes.PeerConfig{
			{
				PublicKey:         key,
				UpdateOnly:        true,
				ReplaceAllowedIPs: true,
				AllowedIPs:        allowedIPs,
			},
		},
	}

	err = wgc.ConfigureDevice(ifaceName, cfg)
	if err != nil {
		return fmt.Errorf("failed to configure device: %v", err)
	}

	logrus.Debugf("Successfully set allowed IPs for peer %s to %s", publicKey, allowedIPs)
	return nil
}

Simplest test...

// TCPing attempts to establish a TCP connection to check IPv4 host reachability
func TCPing(host string, port int, timeout time.Duration) (bool, float64) {
	address := net.JoinHostPort(host, fmt.Sprintf("%d", port))
	dialer := &net.Dialer{
		Timeout: timeout,
	}

	st := time.Now()
	conn, err := dialer.Dial("tcp4", address)
	duration := time.Since(st).Seconds()
	if err != nil {
		return false, duration
	}

	defer conn.Close()
	return true, duration
}
