chrislusf/glow

Read size invalid argument - expected data input?

andrewrt opened this issue · 7 comments

I have Glow running on one machine just fine, but I run into problems when simulating a glow cluster on my local machine via:
glow master --address 0.0.0.0:8930
glow agent --dir="/Users/andrew/Desktop/GlowFolder" --port=8931 --master="0.0.0.0:8930" --memory=4096 --clean.restart --cpu.level=4

and then starting the app via:
myapp -glow -glow.leader="0.0.0.0:8930"

  1. If I don't have my executable in Desktop/GlowFolder, I get an error saying Failed to start command ./myapp under /Users/andrewt/Desktop/GlowFolder: fork/exec ./myapp: no such file or directory
    I thought the --dir flag was just for temporary files; do I need to copy the app binary to that folder as well?

  2. Read size:
    If I run from a folder containing myapp's binary, the job starts, but the glow agent outputs the following error:
    2017/03/21 09:41:24 Read size from -ct-0-ds-0-shard-4 offset 1054782852: read /Users/andrew/Desktop/GlowFolder/-ct-0-ds-0-shard-4-8931.dat: invalid argument
    How is the read size determined, and what is it expected to be?

Here's the code...

package main

import (
	"flag"
	"fmt"

	"bitbucket.org/myapp/db"
	"bitbucket.org/myapp/model"
	"bitbucket.org/myapp/ptmath"

	_ "github.com/chrislusf/glow/driver"
	_ "github.com/chrislusf/glow/flow"
)

//GroupResult - grouping result
type GroupResult struct {
	LeaderCandidate *model.PtSet
	GroupSize       int
}

//Group2Test - input for grouping test
type Group2Test struct {
	LeaderCandidate *model.PtSet
	PtSet           []model.PtSet
}

func main() {
	flag.Parse()

	//get PointSets from DB
	ptSets := db.GetAvailablePointSets()

	//Convert points to a different Coordinate System before running analysis:
	for idx, ptSet := range ptSets {
		conversionMatrix := ptmath.GetConversionMatrix(ptSet)
		var xyzSlice []model.XyzPt
		for _, pt := range ptSet.PtSourceSlice {
			xyz := ptmath.CalculateXYZ(*conversionMatrix, pt)
			xyzSlice = append(xyzSlice, *xyz)
		}
		ptSets[idx].XyzSlice = xyzSlice // assumes model.PtSet exports this field; it must be exported to be set here and to survive serialization
	}

	//map-reduce to find the biggest group:
	bestGroupLeader := loadCandidates(ptSets)
	if bestGroupLeader != nil {
		fmt.Println("\nBest candidate ID:", bestGroupLeader.LeaderCandidate.ID, ", GroupSize:", bestGroupLeader.GroupSize)
	}

}

var f = flow.New()

func loadCandidates(ptSets []model.PtSet) *GroupResult {
	var bestCandidate *GroupResult

	f.Source(func(out chan Group2Test) {
		// NOTE: every row emitted here carries the full ptSets slice
		// alongside one candidate, so the data volume grows quadratically
		// with the input; this is a likely source of the huge .dat files.
		for idx := range ptSets {
			out <- Group2Test{&ptSets[idx], ptSets}
		}
	}, /*len(ptSets)*/ 10).Map(func(g2Test Group2Test) GroupResult {
		return loadCandidateGroupSize(g2Test.LeaderCandidate, g2Test.PtSet)
	}).Reduce(func(x GroupResult, y GroupResult) GroupResult {
		// keep the larger of the two groups
		if x.GroupSize > y.GroupSize {
			return x
		}
		return y
	}).Map(func(winner GroupResult) {
		fmt.Println("\nBest ID:", winner.LeaderCandidate.ID, ", GroupSize:", winner.GroupSize)
		// side-effect capture: only populated when this final Map runs in
		// the driver process; in distributed mode it may remain nil.
		bestCandidate = &winner
	}).Run()

	return bestCandidate
}

func loadCandidateGroupSize(leaderCandidate *model.PtSet, ptSets []model.PtSet) GroupResult {
	//count how many point sets have all points within some distance of leaderCandidate
	setSize := len(leaderCandidate.XyzSlice)
	limit := 5.0 // assumes ptmath.Distance returns float64
	groupSize := 0

	for _, ptSet := range ptSets {
		maxDist := 0.0
		for idx := 0; idx < setSize; idx++ {
			ptDist := ptmath.Distance(ptSet.XyzSlice[idx], leaderCandidate.XyzSlice[idx])
			if ptDist > maxDist {
				maxDist = ptDist
			}
		}
		if maxDist < limit {
			groupSize++
		}
	}

	return GroupResult{
		LeaderCandidate: leaderCandidate,
		GroupSize:       groupSize,
	}
}
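
One detail worth flagging in the code above: struct fields that cross between agents must be exported, which is why the snippet assumes an exported XyzSlice field on model.PtSet. Assuming glow serializes rows with encoding/gob (an assumption, not confirmed in this thread), a lowercase field like xyzSlice would be silently dropped in transit. A minimal standalone demonstration:

package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type pair struct {
	Exported   int // survives gob encoding
	unexported int // silently dropped: gob only encodes exported fields
}

func main() {
	var buf bytes.Buffer
	_ = gob.NewEncoder(&buf).Encode(pair{Exported: 1, unexported: 2})

	var out pair
	_ = gob.NewDecoder(&buf).Decode(&out)
	fmt.Printf("%+v\n", out) // prints {Exported:1 unexported:0}
}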
  3. Agent1 appears to be creating dozens and dozens of 1 GB .dat files, while
    Agent2 and Agent3 (started with the provided etc/ shell script, running my own code) created a few small files and then finished.

How can I avoid this file blowup on Agent1?

[edit]: it looks like the agent bearing the brunt of the work (and thus the .dat files) changes from run to run.
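
One likely contributor to the blowup: each Group2Test row from the Source carries the entire ptSets slice, so the intermediate data grows quadratically with the input (see the comment in the Source above). Independently of that, if the skew comes from shard placement, repartitioning after the Source may spread rows more evenly across agents. A sketch, assuming glow's Dataset exposes a Partition(shardCount) step (check your version's API); the pipeline here is a toy stand-in, not the original job:

package main

import (
	"flag"
	"fmt"

	_ "github.com/chrislusf/glow/driver"
	"github.com/chrislusf/glow/flow"
)

func main() {
	flag.Parse()

	flow.New().Source(func(out chan int) {
		for i := 0; i < 100; i++ {
			out <- i
		}
	}, 10).
		Partition(3). // assumed API: rehash rows across 3 shards so one agent does not own most of the data
		Map(func(x int) int {
			return x * x
		}).
		Reduce(func(a, b int) int {
			return a + b
		}).
		Map(func(sum int) {
			fmt.Println("sum of squares:", sum)
		}).Run()
}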

I don't remember the details of glow anymore. Please use gleam. It also has pure Go support.

Thanks for the response, Chris. I had previously attempted to port the above to gleam, but was having some difficulty setting up the f.Source and f.Map calls. How might one do this with the above (i.e. preprocessed data rather than a file/DB)?

Thanks again!
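
For what it's worth, here is a rough sketch of how that gleam port might start, assuming gleam's in-memory Slices source, gio mapper registration, and the Top/Printlnf steps from its README (the APIs may have shifted since). The encoded point data is a placeholder: because gleam mappers run as separate processes, each row has to be self-contained rather than closing over driver memory:

package main

import (
	"github.com/chrislusf/gleam/flow"
	"github.com/chrislusf/gleam/gio"
)

// A registered mapper runs in its own process and sees only its input row,
// not the driver's memory, so each row must carry everything it needs.
var GroupSizeMapper = gio.RegisterMapper(groupSize)

func main() {
	gio.Init() // must run first: lets this same binary act as a mapper process

	// Hypothetical preprocessed input: one row per candidate,
	// carrying a candidate ID and its encoded point data.
	rows := [][]interface{}{
		{1, "...encoded points..."},
		{2, "...encoded points..."},
	}

	flow.New("largest group").
		Slices(rows).                      // in-memory source instead of a file/DB
		Map("groupSize", GroupSizeMapper). // emits (candidateID, groupSize)
		Top("best", 1, flow.OrderBy(2, false)).
		Printlnf("best candidate %v, group size %v").
		Run()
}

func groupSize(row []interface{}) error {
	id := row[0]
	// decode row[1] and compute the real group size here
	size := 0 // placeholder
	gio.Emit(id, size)
	return nil
}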

@andrewrt This may help you.

Hi liuluheng,
thanks for the link!

In there, an io.Writer is still used in the source (as opposed to the channel loading you can do in glow).

Does this mean you can basically trick the writer into acting like a channel?
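
Roughly, yes: a custom source hands you an io.Writer, and nothing stops you from draining a Go channel inside it, encoding each value as one row. A minimal sketch of that bridge, assuming gleam's util.NewRow/Row.WriteTo helpers (the exact Source signature varies across gleam versions, so the helper below is hypothetical glue, not a drop-in):

package channelsource

import (
	"io"

	"github.com/chrislusf/gleam/util"
)

// DrainToWriter makes an io.Writer behave like a channel sink:
// every value received on ch is encoded as one gleam row on w,
// so a writer-based source can be fed from ordinary Go code.
func DrainToWriter(ch <-chan int, w io.Writer) error {
	for v := range ch {
		if err := util.NewRow(util.Now(), v).WriteTo(w); err != nil {
			return err
		}
	}
	return nil
}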