To run the annotator, write GFF3 files, or create suffix array files, type:
julia chloe.jl --help
# or for a specific command e.g.
julia chloe.jl annotate --help
(See installing dependencies below)
For example:
julia chloe.jl annotate testfa/*.fa
This will create .sff files in the testfa directory.
This annotator is available online at: https://chloe.plantenergy.edu.au
There is a Project.toml file that contains all the project dependencies.
To actually add these dependencies, type
julia bin/deps.jl
or run
import Pkg
(open("Project.toml") |> Pkg.TOML.parse)["deps"] |> keys |> collect |> Pkg.add
I really don't know why there isn't a command for this... :(
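For what it's worth, Pkg.instantiate() (after Pkg.activate(".")) is the usual way to install everything a Project.toml lists. If you would rather keep the Pkg.add approach, here is a minimal sketch of the same one-liner using the TOML standard library (assuming Julia 1.6 or later). The name project_deps is a hypothetical helper, not part of Chloë.

```julia
using TOML  # standard library in Julia 1.6 and later (assumption: a recent julia)

# Hypothetical helper: return the sorted package names listed under [deps]
# in a Project.toml file, or an empty list if there is no [deps] table.
function project_deps(path::AbstractString)
    project = TOML.parsefile(path)
    deps = get(project, "deps", Dict{String,Any}())
    return sort!(collect(keys(deps)))
end

# Usage (from the repo directory):
#   import Pkg
#   Pkg.add(project_deps("Project.toml"))
```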
You can install Chloë as a julia package too.
Start julia and type ] to get the package manager prompt. Then type:
]dev {path/to/chloe/repo/directory}
This will make an entry for Chloë in the Manifest for julia.
Now get julia to compile it by typing import Chloe at the julia prompt.
You can easily remove Chloë as a package with:
]rm Chloe
Installing Chloë as a (local) package allows you to take advantage of julia's precompilation.
You can of course use julia's Distributed package.
Start julia with 3 workers and load code:
JULIA_NUM_THREADS=8 julia -p 3 -L src/remote.jl
Now you can type:
using Distributed
# just read reference Data on remote workers
@everywhere workers() begin
global REFS = readDefaultReferences()
end
# get a fasta file
fasta = IOBuffer(read("testfa/NC_020019.1.fa", String))
# note that REFS is not defined locally in the REPL!
r = @spawnat :any annotate_one(REFS, fasta)
io, uid = fetch(r)
sff = String(take!(io))
# this works too.., just tell Chloe the filename
r = @spawnat :any annotate_one(REFS, "testfa/NC_020019.1.fa")
r = @spawnat :any annotate_one(REFS, "testfa/NC_020019.1.fa", "write_to_this_file.sff")
This also works:
using Distributed
addprocs(3)
@everywhere workers() begin
include("src/remote.jl")
REFS = readDefaultReferences()
end
fasta = IOBuffer(read("testfa/NC_020019.1.fa", String))
io, uid = fetch(@spawnat :any annotate_one(REFS, fasta))
# get chloe sff as a string
sff = String(take!(io))
# *OR*
sff_filename, uid = fetch(@spawnat :any annotate_one(REFS, "testfa/NC_020019.1.fa", nothing))
# sff_filename is where chloe wrote the data:
# in this case NC_020019.1.sff in the local directory
# instead of `nothing` specify an actual filename.
If you have installed Chloë as a (local) package then you can use:
using Distributed
addprocs(4)
@everywhere workers() begin
using Chloe
global REFS = readDefaultReferences()
end
# Note that neither REFS nor annotate_one is defined in the REPL
# ...but all is still good.
r = @spawnat :any annotate_one(REFS, "testfa/NC_020019.1.fa")
# etc...
This takes advantage of the precompilation of julia packages. Also you don't need to be in the repo directory!
Running the chloe server. In a terminal type:
JULIA_NUM_THREADS=8 julia distributed.jl --level=info --workers=4 \
--broker=ipc:///tmp/chloe-client
(As of version 1.4, julia refuses to use more threads than the number of CPUs on your machine. Check that number with Sys.CPU_THREADS in julia, or with python -c 'import multiprocessing as m; print(m.cpu_count())'.)
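If you are unsure whether the JULIA_NUM_THREADS setting took effect, a quick check along these lines can be pasted into the julia prompt. It is only a sketch; it reports values and does not change anything.

```julia
# Compare the requested thread count with what julia actually started with.
requested = get(ENV, "JULIA_NUM_THREADS", "(unset)")
println("JULIA_NUM_THREADS  = ", requested)
println("Threads.nthreads() = ", Threads.nthreads())
println("Sys.CPU_THREADS    = ", Sys.CPU_THREADS)
```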
In another terminal start julia:
using JuliaWebAPI
i = APIInvoker("ipc:///tmp/chloe-client");
apicall(i, "ping") # ping the server to see if it is listening.
# fasta and output should be relative to the server's
# working directory, or specify absolute path names!
# (yes, "chloe" should be "annotate" but...)
ret = apicall(i, "chloe", fastafile, outputfile) # outputfile is optional
code, data = ret["code"], ret["data"]
@assert code === 200
# actual filename written and total elapsed
# time in ms to annotate
sff_fname, elapsed_ms = data["filename"], data["elapsed"]
# to terminate the server cleanly (after finishing any work)
apicall(i, "exit")
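The code/data check above gets repeated for every call, so it may be worth wrapping. The following is a minimal sketch that assumes only the response shape shown above (a dictionary with "code" and "data" keys); unwrap_response is a hypothetical helper name, not part of Chloë or JuliaWebAPI.

```julia
# Hypothetical helper: return the "data" payload of a server response,
# or throw an error if the server reported anything other than code 200.
function unwrap_response(ret::AbstractDict)
    code = ret["code"]
    if code != 200
        error("server returned code $code: $(get(ret, "data", nothing))")
    end
    return ret["data"]
end

# Usage with a live server:
#   data = unwrap_response(apicall(i, "chloe", fastafile))
#   sff_fname, elapsed_ms = data["filename"], data["elapsed"]
```

Throwing an error rather than using @assert keeps the server's message visible to the caller.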
The actual production configuration uses distributed.jl (for threading issues) and runs the server as a client of a DEALER/ROUTER server (see bin/broker.py or src/broker.jl and the Makefile). It connects to the DEALER end on tcp://127.0.0.1:9467. The chloe website connects to ipc:///tmp/chloe-client, which is the ROUTER end of the broker. In this setup you can run multiple chloe servers connecting to the same DEALER.
Update: you can now run a broker with julia as julia src/broker.jl or specify --broker=URL to distributed.jl. No python required. (It is best to use -b default to select this project's default endpoint, ipc:///tmp/chloe-client.)
The worker processes can be made to share the reference data using memory-mapped data files. You can create these by running:
julia chloe.jl mmap reference_1116/*.fa
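To illustrate why memory mapping lets workers share data, here is a generic sketch using julia's Mmap standard library. It does not use Chloë's actual file format or code; the temporary file and its contents are stand-ins.

```julia
using Mmap

# Write some stand-in "reference" bytes to a temporary file.
path, io = mktemp()
reference = collect(codeunits("ACGTACGTTTGACA"))
write(io, reference)
close(io)

# Map the file read-only. The array is backed by the OS page cache, so
# processes that map the same file can share one physical copy of the data
# instead of each reading it into their own memory.
mapped = open(path, "r") do f
    Mmap.mmap(f, Vector{UInt8}, filesize(f))
end

println(String(copy(mapped)))
```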
The Chloë server can be run remotely through a ssh tunnel.
On the remote server, git clone ... the chloe github repo and download the julia runtime (natch!).
Then install all chloe package dependencies globally (see above).
Then -- on your puny laptop -- you can run something like:
ssh you@bigserver -t -o ExitOnForwardFailure=yes -L 9467:127.0.0.1:9467 \
'cd /path/to/chloe;
JULIA_NUM_THREADS={BIGNUM} /path/to/bin/julia --startup-file=no --color=yes \
distributed.jl --broker=tcp://127.0.0.1:9467 -l info --workers=4'
The port 9467 is an entirely random port number (but hopefully unused both on the remote server and locally). The broker port must match the ssh port specified by -L. {BIGNUM} is the enormous number of CPUs your server has ;).
Since the remote server has no access to the local filesystem, you need to use annotate instead of chloe to annotate your fasta files, e.g.:
using JuliaWebAPI
i = APIInvoker("tcp://127.0.0.1:9467")
# read in the entire fasta file
fasta = read("testfa/NC_020019.1.fa", String)
ret = apicall(i, "annotate", fasta)
code, data = ret["code"], ret["data"]
@assert code === 200
sff = data["sff"] # sff file as a string
# terminate the server
apicall(i, "exit")
Nothing interesting beyond here....
Install package with
import Pkg
Pkg.add(Pkg.PackageSpec(url="https://github.com/arabidopsis/chloe.git"))
To stop julia vomiting unhelpful stacktraces when you press Ctrl-C, run julia with --handle-signals=no. Don't know what it does, but distributed.jl will just exit on Ctrl-C.
But don't send a kill -INT: this will not clean up the background broker (if it's running).
See:
Possibly useful REPL packages
- add Revise: reload edited files within REPL
- add OhMyREPL: pretty print code
- @code_warntype f(): check type system
- add ProfileView: ProfileView.jl
from stackoverflow:
There is no way to send a subset of the methods in a package to another machine. Very often methods refer to other types and functions in the same module, so the system would have to at least send all dependencies as well. That could work, but the bigger problem is deciding whose responsibility it is to distribute code, and when. For example, initially your library might decide to send itself (or parts of itself) to other nodes, but then the user might later want to do a parallel map of your library functions, such that the whole library is needed on every node. This gets very complex, so it is far simpler for everybody just to load all needed code on all nodes as early as possible.
This is only really of interest when using the add_worker method, which tries to add new workers to the running server. If the server was started by loading Chloe as a package then you can't add new workers by just sending the required code: the new worker seems to be expecting a Chloe module. Use distributed.jl if you want to expand workers dynamically.
- Ian Small: ian.small@uwa.edu.au
- Ian Castleden: ian.castleden@uwa.edu.au