This is a StdLib service to perform exhaustive, ungapped DNA sequence alignment in a massively parallel fashion. It makes use of the NtSeq Node.js package to perform alignment and derive results.
You can find this service on StdLib: keith/sequence, or GitHub: keithwhor/stdlib-sequence.
You can provide any genome sequence you want with the seq parameter, though by default the sequence is the first 1,000,000 nucleotides of the E. coli K12 genome.
f('keith/sequence')({q: 'GATTACACAT', count: 2}, (err, result) => {
// do something with result
});
Also available at https://keith.stdlib.com/sequence?q=GATTACACAT&count=2
Returns:
{
"results": [
{
"position": 474308,
"sequence": "GATTACGCAT",
"mask": "GATTAC-CAT",
"cover": "GATTACRCAT"
},
{
"position": 965004,
"sequence": "GATTGCACAT",
"mask": "GATT-CACAT",
"cover": "GATTRCACAT"
}
]
}
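To make "exhaustive, ungapped alignment" concrete, here is a minimal sketch of the idea in plain JavaScript. It is not the service's implementation (that relies on NtSeq): the function name ungappedAlign and the toy target sequence are hypothetical, and unlike NtSeq this sketch only compares exact characters rather than degenerate nucleotide codes.

```javascript
// Naive exhaustive, ungapped alignment: slide the query along the sequence
// and count identical nucleotides at every offset. No gaps are ever inserted.
function ungappedAlign(query, seq, count = 1) {
  const hits = [];
  for (let pos = 0; pos + query.length <= seq.length; pos++) {
    let matches = 0;
    for (let i = 0; i < query.length; i++) {
      if (query[i] === seq[pos + i]) matches++;
    }
    hits.push({
      position: pos,
      matches,
      sequence: seq.slice(pos, pos + query.length)
    });
  }
  // Best matches first
  hits.sort((a, b) => b.matches - a.matches);
  return hits.slice(0, count);
}

// Toy target sequence (made up for illustration)
const top = ungappedAlign('GATTACACAT', 'CCGATTACGCATCC', 1);
console.log(top);
```

Every offset is scored, so the work grows with the query length times the sequence length.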
You can use this service via HTTPS:
https://keith.stdlib.com/sequence?q=ATGC
Via the StdLib CLI "f" command:
$ npm install lib -g
$ f keith/sequence --q ATGC
Or in a Node.js / Browser-based project using the "f" package:
const f = require('f');
f('keith/sequence')({q: 'ATGC'}, (err, result) => {
// handle result
});
This service accepts a few parameters:
q: Query sequence for alignment (what you're searching for)
seq: Genome sequence to search through
count: Number of results to return (default 1)
repeat: Number of times to repeat the query sequence
stats: Show search statistics
https://keith.stdlib.com/sequence?q=GATTACACAT&count=1&stats
{
"results": [
{
"position": 474308,
"sequence": "GATTACGCAT",
"mask": "GATTAC-CAT",
"cover": "GATTACRCAT"
}
],
"stats": {
"length": {
"q": 10,
"seq": 1000000
},
"workers": 0,
"time": {
"total": 1791,
"prepare": 83,
"map": 722,
"reduce": 0,
"sort": 986
}
}
}
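If you are assembling request URLs by hand, a small helper along these lines may be convenient. This is only a sketch: buildSequenceUrl is a hypothetical name, and it assumes that flag-style parameters such as stats are sent as a bare key, as in the example URL above.

```javascript
// Hypothetical helper: builds a request URL from the documented parameters.
const BASE_URL = 'https://keith.stdlib.com/sequence';

function buildSequenceUrl(params) {
  const parts = [];
  for (const [key, value] of Object.entries(params)) {
    if (value === true) {
      // Bare flag, e.g. "stats"
      parts.push(encodeURIComponent(key));
    } else if (value !== false && value !== undefined && value !== null) {
      parts.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
    }
  }
  return parts.length ? `${BASE_URL}?${parts.join('&')}` : BASE_URL;
}

const url = buildSequenceUrl({ q: 'GATTACACAT', count: 1, stats: true });
console.log(url);
```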
results is an Array of results, ordered by number of matches.
result.position is the position of the matching sequence in the target genome.
result.sequence is the sequence of the match, beginning at result.position.
result.mask is the "sequence mask" from NtSeq, i.e. the "pessimistic" sequence that represents the intersection of both query and target sequences.
result.cover is the "sequence cover" from NtSeq, i.e. the "optimistic" sequence that represents the union of both query and target sequences.
stats is an Object showing general query statistics; it is only provided if the stats query parameter is set.
stats.length shows the length of each function input (q and seq).
stats.workers shows the parallelization amount (0 means no workers were dispatched).
stats.time shows a breakdown of time (in ms) spent in each step.
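The mask and cover fields can be understood as a bitwise AND and OR over nucleotide sets. The sketch below reproduces the example output above under an assumed 4-bit encoding (A=1, C=2, G=4, T=8) with standard IUPAC ambiguity codes; NtSeq's internal bit layout may differ, and the helper names here are hypothetical.

```javascript
// Assumed 4-bit encoding: A=1, C=2, G=4, T=8 (NtSeq's internal layout may differ).
// Index = bit pattern, value = IUPAC symbol; 0 (no shared base) is shown as '-'.
const IUPAC = '-ACMGRSVTWYHKDBN';

const toBits = (nt) => {
  const bits = IUPAC.indexOf(nt.toUpperCase());
  if (bits < 0) throw new Error(`Unknown nucleotide: ${nt}`);
  return bits;
};

// Combine two equal-length sequences position by position with a bitwise op
function combine(a, b, op) {
  if (a.length !== b.length) throw new Error('Sequences must be the same length');
  let out = '';
  for (let i = 0; i < a.length; i++) {
    out += IUPAC[op(toBits(a[i]), toBits(b[i]))];
  }
  return out;
}

const mask = (a, b) => combine(a, b, (x, y) => x & y);  // intersection ("pessimistic")
const cover = (a, b) => combine(a, b, (x, y) => x | y); // union ("optimistic")

console.log(mask('GATTACACAT', 'GATTACGCAT'));  // GATTAC-CAT
console.log(cover('GATTACACAT', 'GATTACGCAT')); // GATTACRCAT
```

At the mismatched position, A and G share no base, so the mask shows "-", while the cover shows R, the IUPAC code for "A or G".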
To see an example of StdLib MapReduce in action, simply specify inputs that cause parallelization to occur (more than 1,000,000,000 nt²).
$ f keith/sequence --q A --repeat 10000 --count 0 --stats
https://keith.stdlib.com/sequence?q=A&repeat=10000&count=0&stats
{
"results": [],
"stats": {
"length": {
"q": 10000,
"seq": 1000000
},
"workers": 10,
"time": {
"total": 6497,
"prepare": 47,
"map": 5504,
"reduce": 46,
"sort": 900
}
}
}
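The two stats outputs shown in this document are consistent with a simple rule, sketched below: treat the search space as query length times sequence length (in nt²), and dispatch one worker per 1,000,000,000 nt² once that threshold is exceeded. This is a guess that fits the numbers shown here, not a description of the service's actual scheduling; planWorkers and chunkRanges are hypothetical names, and the overlapping split is just one way a map step could divide the sequence.

```javascript
const PARALLEL_THRESHOLD = 1e9; // nt², the threshold stated above

// Assumption: worker count is the search space divided by the threshold, rounded up.
function planWorkers(qLength, seqLength) {
  const space = qLength * seqLength;
  return space > PARALLEL_THRESHOLD ? Math.ceil(space / PARALLEL_THRESHOLD) : 0;
}

// Hypothetical split: each chunk extends qLength - 1 nucleotides into the next,
// so an alignment that spans a chunk boundary is still seen by one worker.
function chunkRanges(seqLength, qLength, workers) {
  if (workers < 1) return [{ start: 0, end: seqLength }];
  const size = Math.ceil(seqLength / workers);
  const ranges = [];
  for (let start = 0; start < seqLength; start += size) {
    ranges.push({ start, end: Math.min(seqLength, start + size + qLength - 1) });
  }
  return ranges;
}

console.log(planWorkers(10, 1000000));               // 0
console.log(planWorkers(10000, 1000000));            // 10
console.log(chunkRanges(1000000, 10000, 10).length); // 10
```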
If you'd like to deploy a copy of this service on StdLib, use the StdLib CLI tools:
$ npm install lib -g
$ lib init
$ lib up
Make sure to change {"stdlib": {"name": "keith/sequence"}} in package.json to match your username and desired service name.
Thanks for checking this service out. I look forward to seeing other people build out MapReduce examples on StdLib. :)
You can sign up for StdLib here.
Check out StdLib on GitHub.
Follow us on Twitter, @polybit.
Or follow me specifically, @keithwhor.
Happy Building!