Juniper/go-netconf

panic: WaitForFunc failed

songtianyi opened this issue · 26 comments

It paniced when i execute rpc in a single session.

package main

import (
	"fmt"
	"golang.org/x/crypto/ssh"
	"log"

	"github.com/Juniper/go-netconf/netconf"
)

func doRpc(s *netconf.Session, xml string) {
	reply, err := s.Exec(netconf.RawMethod(xml))
	if err != nil {
		panic(err)
	}
	fmt.Printf("Reply: %+v", reply)
}

func main() {
	sshConfig := &ssh.ClientConfig{
		User: "root",
		Auth: []ssh.AuthMethod{ssh.Password("r00ttest")},
		HostKeyCallback: ssh.InsecureIgnoreHostKey(),
	}

	s, err := netconf.DialSSH("172.16.240.189", sshConfig)

	if err != nil {
		log.Fatal(err)
	}

	defer s.Close()

	fmt.Println(s.ServerCapabilities)
	fmt.Println(s.SessionID)

	//xml := "<get-config><source><running/></source></get-config>"
	xml := "<rpc><edit-config><target>candidate</target><system><host-name>hehe</host-name></system></edit-config></rpc>"

	// Sends raw XML
	doRpc(s, xml)
	xml = "<rpc></commit></rpc>"
	doRpc(s, xml)
}

any update on this issue?

Do you have the panic stacktrace?

Yes,
FAILURE:
2017/05/24 13:05:33 WaitForFunc failed
exit status 1
On NSO, I actually see the connection.
netconf id=25 new ssh session for user "vmsnso" from

Also usually netconf runs on a different port on Junos. 830 by default

See https://www.juniper.net/documentation/en_US/junos/topics/topic-map/netconf-ssh-connection.html

yea, I'm connecting on port 830

Odd. WaitForFunc should never be called.

Here's what I'm using
s, err := netconf.DialSSHTimeout("...:830", sshConfig, duration_Milliseconds)

I had a similar problem but the same error message. The problem on my side surfaced while using Netopeer2, by default Netopeer2 server closes the ssh connection after 20 seconds of inactivity. So if I used a session after being inactive/waiting for 20 seconds I would get "WaitForFunc failed".

Would it help if I send you a dockerfile and a bash script which would reproduce the bug?

Yes, that would be helpful at least to help isolate the problem

Pull and run the docker container with Netopeer2.

docker pull sysrepo/sysrepo-netopeer2:latest
docker run -p 830:830 --name sysrepo --rm sysrepo/sysrepo-netopeer2:latest

The go code.

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/sartura/go-netconf/netconf"
)

func main() {
	auth := netconf.SSHConfigPassword("netconf", string("netconf"))
	s, err := netconf.DialSSH("localhost", auth)
	if err != nil {
		log.Fatal(err)
	}

	defer s.Close()

	fmt.Println(s.ServerCapabilities)
	fmt.Println(s.SessionID)

	time.Sleep(21 * time.Second)

	// Sends raw XML
	reply, err := s.Exec(netconf.MethodGetConfig("startup"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("Reply: %+v", reply)
}

If you comment out time.Sleep(21 * time.Second) the code will work.

Also one quick note, I used go-netconf from the sartura repository becaus it has the two patches that are needed for working with Netopeer2. They are in the pull request #25. If you use github.com/Juniper/go-netconf/netconf you will get WaitForFunc failed.

If you need any further assistance please let me know.

For me this issue seems has nothing to do with remote peer resetting the connection (like Netopeer2), verifying tcpdump on port 830 I can see that there is no network activity when encountering this issue during the very first RPC send (the session can be established successfully). The SSH TCP connection on port 830 is intact when it happens.

Used github.com/sartura/go-netconf/netconf and it worked right away. I am actually not using Netopeer2 as Netconf server. so this issue seems to be more widespread than just for Netopeer2.

Exec() calls transport.Receive() which calls WaitForBytes() which calls WaitForFunc()

This panic seems to be the result of getting a message back from the netconf server which does not end with the expected message separator. WaitForFunc() will read an EOF and break the loop, causing the panic/error. Once way to patch it would be something like this:

func (t *transportBasicIO) WaitForFunc(f func([]byte) (int, error)) ([]byte, error) {
	var out bytes.Buffer
	buf := make([]byte, 4096)
	pos := 0
	for {
		n, err := t.Read(buf[pos : pos+(len(buf)/2)])
		if err != nil {
			if err != io.EOF {
				return nil, err
			}
			// Handle EOF but no message separator to mark
			// the end of the message
			if n == 0 {
				out.Write(buf[0:pos])
				return out.Bytes(), nil
			}
			break
		}
		if n > 0 {
			end, err := f(buf[0 : pos+n])
			if err != nil {
				return nil, err
			}
			if end > -1 {
				out.Write(buf[0:end])
				return out.Bytes(), nil
			}
			if pos > 0 {
				out.Write(buf[0:pos])
				copy(buf, buf[pos:pos+n])
			}
			pos = n
		}
	}
	return nil, fmt.Errorf("WaitForFunc failed")
}

At the very least the error returned from the server should now come back in the reply. Its not likely to be a valid netconf message though. The error handling needs to be improved.

Even after I applied this patch, the problem is that subsequent operations also fail with the same error. Do I need to reset the session connection?

I'm afraid I'm unable to spend a lot of time helping to debug your specific problem with the package. In my experience with this its likely you are not even getting a valid netconf response from the server. Did you dump the response message to confirm the server is not returning an error?

You can enable debug logging in the package by including the standard lib log package and adding something like this to the top of your code before making any calls to go-netconf:

ncLogger := netconf.NewStdLog(log.New(os.Stdout, "NCDEBUG", log.Lshortfile), netconf.LogDebug)
netconf.SetLog(ncLogger)

I also noticed that the default capabilities string defined in transport.go for this package seems to contradict RFC 4741. In transport.go we have:

var DefaultCapabilities = []string{
	"urn:ietf:params:xml:ns:netconf:base:1.0",
}

While in the RFC it indicates the string should be:

var DefaultCapabilities = []string{
	"urn:ietf:params:netconf:base:1.0",
}

The server I was using returned an error until that was corrected. It also required some namespaces which I hacked in - you can see them by looking at the fork in my repo. Its not pretty but it worked for me.

I hope this helps you to resolve your problem. I appreciate the original authors work on this package but it is unfortunate he was not able to finish it.

thanks for the inputs @eriksejr . I still do see this issue with all the mods suggested. For now, I have resorted to resetting the client connection when I see this error.

@eriksejr I dont see netconf.NewStdLog can you confirm this is available in the public release?

Thanks,
Damian.

@damianoneill I believe the developer who has taken over this project removed or re-wrote the logging code in one of his commits. So this method of enabling the logging is no longer valid.

Just noticed it seems to be a function of v0.1 not v0.1.1.

Can someone explain the difference between these two version?

@eriksejr thanks for the reply.

@GIC-de The reason for asking about the logging, I am seeing the same issue with WaitForFunc error being propagated. I have a client connecting to a Device which I know to be running netopeer2.

I am creating a connection and keeping this open, I send repeated get / get-config operations with no delay down the connection to place it under load. After a small number of requests ~10 I get the WaitForFunc error propagated, all further calls to the session result in the same error.

Has anyone seen similiar behaviour?

On the logging of payloads, this would seem a useful feature? Is it supported any other way?

The WaitForFunc failed error is caused only if read returns EOF which should be the case if the session was closed only. I would propose to change the error message to session closed.

The original code from this issue is reproducible if tested against JunOS. In this case JunOS closes the the session caused by malformed/invalid rpc.

I think this has been fixed. Either way this is pretty old. Going to close for now.

This has not been fixed and is reproducible with Junos 21.2R0 image on a vSRX platform. Appreciate considering #30 (comment) that helps distinguish between EOF and missing expected separator.

There is a lot wrong with this code. Juniper has pretty much abandoned this project (I haven't worked for juniper for almost a decade now). I have started work on a rewrite, but I would accept not breaking PRs.

@eriksejr are you interested in submitting a PR with your suggested fix? I am glad it helped a bit and it didn't break anything.

@inspiregates Thanks for the offer. I think I did this so informally because it was more of a hack job and I no longer use this code for anything nor do I have access to anything to test it against. So I'm afraid I have no more to offer this project at this time.