ericlagergren/go-coreutils

Make tools importable

Opened this issue · 24 comments

mvdan commented

Hi Eric! I am developing a shell package - see https://github.com/mvdan/sh.

One of its components is an interpreter. That means I have to implement the shell builtins like echo and cd. One of the big wins of that library is that Go packages that used to need bash installed can simply drop that dependency and use the shell package as a replacement, statically linked into their binary.

However, that breaks down quite easily on systems that don't have coreutils installed. Lots of shell scripts out in the wild depend on coreutils programs like cat, rm and wc. This is why I opened mvdan/sh#93 - to add them to the interpreter as a sort of second layer of builtins.

However, as you probably know very well, adding even just some of them is a ton of work, which is why I've been looking around for implementations of coreutils.

I could use upstream or the popular implementation in Rust, but that would mean somehow bundling their binaries into the final binary: something nasty like embedding them as assets at compile time and unpacking them onto the filesystem at run time.

But that's not the case with Go, since I can simply import Go packages. Then, the only roadblock that I see is that your tools (nice job, by the way!) are not importable - they are all main packages.

Have you given thought to adding a common interface for all the tools? For example, similar to what os/exec does:

type Ctx struct {
        Dir    string
        GetEnv func(string) string
        Stdin  io.Reader
        Stdout io.Writer
        Stderr io.Writer
}

func Run(c Ctx, name string, args ...string) error

Then one could do something like coreutils.Run(Ctx{...}, "wc", "somefile").

If you have any input, or would like any help to implement this, do let me know.

ericlagergren commented

Yes, I have. I've started turning some of them into libraries, but I've been more focused on my decimal library as of late. I plan to spend more time on this library once v3.0 of my decimal package drops, which should be whenever trig functions are added.

If you'd like to help in any way you're more than welcome. I'm down to finally make this library useful and help you out!

mvdan commented

Great to hear that! I won't submit a PR right away, as this would require quite a bit of design and refactoring, and I'm not familiar with this codebase. It would likely save everyone time if you took a look at it first.

When you start working on this or have a design/prototype, do let me know and I'll be happy to help - be it reviews, testing, or coding.

ericlagergren commented

Sounds good. I might fiddle around with it a bit today. If you don't hear from me in a week or so, feel free to ping me. I don't mind being bothered. I'm glad somebody's getting use out of this library!

So, I spent a little while and sketched out an implementation using wc:

Example:

// +build ignore

package main

import (
	"os"

	"github.com/ericlagergren/go-coreutils/coreutils"

	_ "github.com/ericlagergren/go-coreutils/wc"
)

func main() {
	ctx := coreutils.Ctx{
		Stdin:  os.Stdin,
		Stdout: os.Stdout,
		Stderr: os.Stderr,
	}
	coreutils.Run(ctx, "wc", "-l", "cmd.go")
}

mvdan commented

Did you forget to commit the coreutils package? I'm also not a big fan of the coreutils/coreutils import path :) Perhaps you could simply use the root package, or something else like coreutils/exec.

I would also need Dir in the context struct, similar to what's in the os/exec package. Otherwise, every tool is forced to use the process's current directory, which is no good for my interpreter.

Otherwise looks good!

mvdan commented

Yes, this is similar to what I was thinking. Registering the commands sounds fine. Ping me when there's a working version I can test out :)

ericlagergren commented

OK, here's what I meant to commit the other day: 8b35c72

mvdan commented

Trying it out now, getting this build error on linux/amd64:

# github.com/ericlagergren/go-coreutils/wc/internal/sys
../../../../land/src/github.com/ericlagergren/go-coreutils/wc/internal/sys/fadv_unix.go:7:22: Fadvise redeclared in this block
        previous declaration at ../../../../land/src/github.com/ericlagergren/go-coreutils/wc/internal/sys/fadv.go:5:21

ericlagergren commented

Oh. Just a goofed-up build tag inside wc/internal/sys/fadv.go: it should be a comma, not a space. Fadvise isn't a requirement anyway; it just theoretically speeds up reading a file by letting the kernel know the expected read pattern.
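In the legacy `// +build` syntax, a space between terms means OR and a comma means AND (the newer `//go:build` lines spell these `||` and `&&`). The standard go/build/constraint package can evaluate either form, which makes it easy to check which files a platform would pick up. The tags below are hypothetical; the actual constraint in fadv.go isn't shown in this thread:

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// matches reports whether a build-constraint line would select a file when
// exactly the given tags (for example GOOS and GOARCH) are satisfied.
func matches(line string, tags ...string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] }), nil
}

func main() {
	// Hypothetical constraints for a fallback file that should build only
	// where neither linux nor darwin applies.
	lines := []string{
		"// +build !linux !darwin",     // space: !linux OR !darwin
		"// +build !linux,!darwin",     // comma: !linux AND !darwin
		"//go:build !linux && !darwin", // newer syntax for the comma form
	}
	for _, line := range lines {
		ok, err := matches(line, "linux", "amd64")
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-30s selected on linux/amd64: %v\n", line, ok)
	}
}
```

With the space form, the fallback file is also selected on linux (because `!darwin` holds there), so it would clash with the Unix implementation and produce a "redeclared" error like the one above.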

mvdan commented

Thanks, now it builds. It behaves differently from GNU wc, though. For example, wc -c somefile gives \t<number>\n instead of just <number>\n. And prog | wc gives -\n instead of \t<number>\t<number>\t<number>\n.

Do you happen to have tests that check input/output of your implementations versus GNU's?

mvdan commented

Also, if you have more time, here's another suggestion to add to the common context - a context.Context. This has multiple advantages, such as setting a timeout or being able to cancel. For most programs that won't be very useful, but imagine sleep, cp, or dd.

ericlagergren commented

> It behaves differently from GNU wc, though.

It does? What version of coreutils are you running? Mine's identical to coreutils 8.29.

$ go run m.go
25986317 /Users/ericlagergren/out2.s
0:1 /tmp $ gwc -c /Users/ericlagergren/out2.s
25986317 /Users/ericlagergren/out2.s
0:1 /tmp $ go run m.go > go.txt; gwc -c /Users/ericlagergren/out2.s > gnu.txt; diff go.txt gnu.txt
0:1 /tmp $

> Do you happen to have tests that check input/output of your implementations versus GNU's?

For some, yeah. wc does.

I like the context.Context idea.

mvdan commented

Simpler example:

$ wc --version
wc (GNU coreutils) 8.28
$ wc /dev/null
      0       0       0 /dev/null
$ cat /dev/null | wc
      0       0       0
$ cat /dev/null | wc -c
0

Unless I got something very wrong in my prototype, your implementation seems to always include the filename (even when it reads from stdin), and when given no flags, it doesn't seem to print those three numbers. That's what I meant by the examples above.

ericlagergren commented

Gotcha. One of the goals of this project is to be byte-for-byte identical to GNU, but sometimes there are good reasons not to be. For example, coreutils is meant to run on VAX and stuff, so there's lots of weird edge-case code, and sometimes it goes from A -> B -> C -> D to do something that Go (because it can abstract more and doesn't need to support machines from the '80s) can do by going straight from A -> D, if that makes sense.

For example, GNU wc pads every number to a minimum width of 7, unless it can't stat the input (i.e., it's not a regular file); then it just prints the number with no padding.
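Judging by the outputs mvdan posted above (width 7 for `wc /dev/null` and for a pipe with no flags, no padding for `cat /dev/null | wc -c`), the rule seems a little more specific. As I read GNU's wc.c (its compute_number_width logic), a single operand printing a single count is not padded at all; otherwise the width is the number of digits in the combined size of the regular-file operands, raised to at least 7 if any operand is not a regular file. A sketch of that reading, with a hypothetical operand type standing in for the stat results:

```go
package main

import "fmt"

// operand is a hypothetical stand-in for the stat result of one wc input.
type operand struct {
	statOK  bool  // stat (or fstat for stdin) succeeded
	regular bool  // the operand is a regular file
	size    int64 // file size, only meaningful for regular files
}

// numberWidth sketches the padding rule GNU wc appears to use.
func numberWidth(ops []operand, counts int) int {
	width := 1
	// One operand printing one count (e.g. `wc -c`) is never padded.
	if len(ops) == 0 || (len(ops) == 1 && counts == 1) {
		return width
	}
	minWidth := 1
	var total int64
	for _, op := range ops {
		if !op.statOK {
			continue
		}
		if op.regular {
			total += op.size
		} else {
			minWidth = 7 // pipes, terminals, /dev/null, ...
		}
	}
	// Enough digits for the combined size of the regular files.
	for ; total >= 10; total /= 10 {
		width++
	}
	if width < minWidth {
		width = minWidth
	}
	return width
}

func main() {
	devNull := []operand{{statOK: true}} // character device, not regular
	w := numberWidth(devNull, 3)
	fmt.Printf("%*d %*d %*d /dev/null\n", w, 0, w, 0, w, 0) // as in `wc /dev/null`

	pipe := []operand{{statOK: true}} // stdin is a pipe, not regular
	fmt.Printf("%*d\n", numberWidth(pipe, 1), 0) // as in `cat /dev/null | wc -c`
}
```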

It should be easy enough to make byte-for-byte perfect.

mvdan commented

Thanks - your recent changes make sense. Now my tests almost pass; the only problem is that when reading from stdin it still prints a trailing space: wc -c <somefile prints 8 \n. Other than that, all tests should now pass :)

mvdan commented

Sounds good. Note that I certainly don't need all the tools at once; the original issue was only about some of the common ones. This will act as an overlay on top of a real os/exec call, so in most environments coreutils will be installed and available anyway.

Even if only one or a few tools are importable as libraries, that's plenty for the interpreter to start using them.

mvdan commented

Basically any that are used frequently in shell scripts; rm, cp, mv, mkdir, ls, touch and chmod are perhaps the most common ones.

  • rm
  • cp
  • mv
  • mkdir
  • ls
  • touch
  • chmod
  • wc

mvdan commented

For those of you who saw this thread, I'm trying to coordinate with a different project now :) u-root/u-root#2527

ericlagergren commented

Sorry :)

mvdan commented

Certainly not trying to dig up old stuff or put blame - I also have some semi-abandoned projects due to lack of free time and energy :) Just want to point others who might still be interested towards more recent developments.