A proper mixed codemirror mode
BoPeng opened this issue · 21 comments
The SoS CodeMirror mode, as defined in kernel.js, is a Python mode + magic and SoS statements. It does not highlight code in other languages correctly. In theory, we should
- use the `R` CodeMirror mode (plus magics) for `R` cells, and so on.
- recognize SoS actions and highlight lines after `R:` as R code etc. It should in theory also recognize `expand=True` etc to correctly identify and highlight `SoS` expressions.
Item 1 could be done when the language of a cell is switched. We should be able to create a new CodeMirror mode and switch to it. There is, however, a slight problem: the name of the language (e.g. R) might not match the name of the CodeMirror mode.
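One way to handle the name mismatch is a small lookup table from the cell language to a mode name. The sketch below shows the idea in Python rather than in the JavaScript of kernel.js; the table `CM_MODE_BY_LANGUAGE` and the function `codemirror_mode_for` are hypothetical names, and the entries are only illustrative.

```python
# Hypothetical lookup from a cell language name to a CodeMirror mode name.
# The entries are illustrative; extend or correct them for the kernels in use.
CM_MODE_BY_LANGUAGE = {
    "r": "r",
    "python": "python",
    "python3": "python",
    "markdown": "markdown",
    "bash": "shell",
    "javascript": "javascript",
}


def codemirror_mode_for(language, default=None):
    """Return the mode name for a cell language, or `default` if unknown."""
    key = language.strip().lower()
    return CM_MODE_BY_LANGUAGE.get(key, default)


print(codemirror_mode_for("R"))
print(codemirror_mode_for("Bash"))
print(codemirror_mode_for("Stata"))
```

Returning `None` for an unknown language leaves it to the caller to decide whether to fall back to the Python-based SoS mode or to plain text.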
Item 2 could be done with a nested CodeMirror mode, and there are examples such as the htmlmixed mode.
This is something that I have always wanted to do, but could not do it with my limited experience in JS, until now.
The new SoS mode has the following features:
- It has a blue header, but with options parsed in Python.
- It has bold `input`, `output` etc, but with options parsed in Python.
- It recognizes actions and uses their respective modes for syntax highlighting. That is to say, things after `R:` are now treated as R code. Options of `R:` are still parsed in Python. It uses `em` for the embedded text, but perhaps some gray background would look better. It currently only supports R, Python, and markdown actions such as `RMarkdown`, but this is super easy to expand.
- It recognizes options `expand=True` and `expand='${ }'` etc, and highlights the expressions by default.
- Unrecognized actions would not have syntax highlighting.
- The codemirror mode now works in JupyterLab so you can use JupyterLab as a nice SoS script editor.
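The action handling in the list above amounts to deciding, line by line, which mode is in charge. The following is a minimal sketch of that dispatch in Python, not the actual kernel.js code: `ACTION_MODES`, `SOS_DIRECTIVES` and `assign_modes` are hypothetical names, and it assumes an action body is made up of the indented (or blank) lines that follow the action header.

```python
import keyword
import re

# Illustrative table of action names to embedded mode names (an assumption,
# limited to the languages named in the list above).
ACTION_MODES = {"R": "r", "python": "python", "markdown": "markdown"}
SOS_DIRECTIVES = {"input", "output", "depends", "parameter", "task"}
HEADER = re.compile(r"^([A-Za-z_]\w*):")


def assign_modes(lines):
    """Return (mode, line) pairs; an action's indented body uses its own mode."""
    result = []
    body_mode = None   # mode of the currently open action body
    in_action = False
    for line in lines:
        match = HEADER.match(line)
        name = match.group(1) if match else None
        if name and name not in SOS_DIRECTIVES and not keyword.iskeyword(name):
            # Action header: the options on this line stay in the SoS/Python mode.
            body_mode = ACTION_MODES.get(name)  # None: unrecognized, no highlighting
            in_action = True
            result.append(("sos", line))
        elif in_action and (line[:1] in (" ", "\t") or not line.strip()):
            result.append((body_mode, line))
        else:
            in_action = False
            result.append(("sos", line))
    return result
```

A body mode of `None` stands for an unrecognized action, whose script is left without syntax highlighting.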
The goal here is to help users edit and debug SoS scripts. For example, without `expand=True` the `{ }` would not be highlighted.
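Whether `{ }` is highlighted therefore depends on the `expand` option of the action. Below is a hedged Python sketch of how that option could be read from an action's option string; `parse_sigil` is a hypothetical helper, and it assumes that `expand=True` means the `{ }` sigil and that a string value holds the left and right halves separated by a space.

```python
import ast


def parse_sigil(options):
    """Return the (left, right) sigil for an action's option string, or None.

    Assumption: expand=True means the "{ }" sigil, and a string value such
    as '${ }' gives the two halves of a custom sigil separated by a space.
    """
    try:
        # Parse the options as the arguments of a dummy call.
        call = ast.parse(f"f({options})", mode="eval").body
    except SyntaxError:
        return None
    for kw in call.keywords:
        if kw.arg == "expand" and isinstance(kw.value, ast.Constant):
            value = kw.value.value
            if value is True:
                return ("{", "}")
            if isinstance(value, str) and len(value.split()) == 2:
                left, right = value.split()
                return (left, right)
    return None


print(parse_sigil("workdir = f'{cwd:a}', expand = True"))
print(parse_sigil("expand='${ }'"))
print(parse_sigil("dest_dir = resource_dir"))
```

Parsing the options with `ast` instead of a regular expression keeps commas and braces inside f-strings such as `f'{cwd:a}'` from being mistaken for option boundaries.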
Many things can be fine-tuned because we can now define our own styles. Suggestions on further improvements are welcome. The next step is to use the `R` CodeMirror mode for `R` cells, etc, with the `%expand` magic recognized.
I see ... I was about to comment that I do not see R syntax properly highlighted, judging by `%*%`. Test here: (input something like `x %*% y`). But maybe it is what it is?
> The codemirror mode now works in JupyterLab so you can use JupyterLab as a nice SoS script editor.

Will happily give that a go!
It should work with Jupyter. Did you run `python -m sos_notebook.install`?
The `R:` `script` style should be the same as `script` in an R notebook (except for the italic font).
I did not try it myself. I'm only commenting on what you've posted: in that screenshot, `%*%` did not get highlighted (there is `%*%` in the code you posted above). So maybe you've fixed it now?
I do not see that operator highlighted in Jupyter. I suppose that page used some special css.
I suppose a good indicator should be the $ symbol, which was marked as red (error) by python.
Could you send me the pure text version?
Sure, see below:
[hg19_reference_1 (download)]
# Download `hg19.2bit` and `twoBitToFa` from {ucsc_url}
ucsc_url = "http://hgdownload.cse.ucsc.edu"
output: f"{resource_dir}/hg19.2bit", f"{resource_dir}/twoBitToFa"
download: dest_dir = resource_dir, expand = True
    {ucsc_url}/goldenPath/hg19/bigZips/hg19.2bit
    {ucsc_url}/admin/exe/linux.x86_64/twoBitToFa

[hg19_reference_2 (decompress hg19.fa)]
# Use `twoBitToFa` to extract `hg19.fa` from `hg19.2bit`
output: f"{resource_dir}/{ref_fa}"
bash: expand = True
    chmod +x {_input[1]}
    {_input[1]} {_input[0]} {_output}

[hg19_reference_3 (gene annotations)]
# Download `Homo_sapiens.GRCh38.91.gtf.gz` from Ensembl
# https://useast.ensembl.org/info/data/ftp/index.html
ensembl_ftp = 'ftp://ftp.ensembl.org/pub/release-91/gtf/homo_sapiens/'
output: f"{resource_dir}/{ref_gtf}"
download: dest_dir = resource_dir, expand = True
    {ensembl_ftp}/{ref_gtf}
and
[align_2 (WASP intersecting SNP): shared = {'wasp_split': 'step_output'}]
# WASP finding unbiased reads intersecting with SNP
depends: sos_step('wasp')
input: group_by = 1, concurrent = True
output: f"{_input:n}.remap.fq1.gz", f"{_input:n}.remap.fq2.gz", f"{_input:n}.to.remap.bam", f"{_input:n}.keep.bam"
bash: workdir = f'{cwd:a}', expand = True
    source activate py27
    python {wasp_dir}/find_intersecting_snps.py {_input} \
        --snp_dir {resource_dir}/wasp_snp_list \
        --is_sorted {'--is_paired_end' if paired_end else ''}

[align_3 (STAR post alignment)]
# Align WASP remap with STAR
# Followed by samtools remove reads with quality less than {qual_cutoff}
parameter: qual_cutoff = 10
input: group_by = 4, pattern = '{name}.{qual}.remap.{ext}', concurrent = True
output: expand_pattern(f'{_name[0]}.remapped.qual{qual_cutoff}.bam')
run: workdir = f'{cwd:a}', docker_image = 'gaow/debian-ngs', expand=True
    STAR --genomeDir {resource_dir} \
        --readFilesIn {_input[0]} {_input[1]} \
        --readFilesCommand zcat \
        --runThreadN {ncpu} --outStd BAM_SortedByCoordinate \
        --outSAMtype BAM SortedByCoordinate \
        --sjdbGTFtagExonParentTranscript {resource_dir}/{ref_gtf} |
        samtools view -bq {qual_cutoff} > {_output}

[align_4 (WASP remove ambiguously mapped reads)]
to_remap = paths([wasp_split[i:i+4][2] for i in range(0, len(wasp_split), 4)])
input: group_by = 1, paired_with = 'to_remap', pattern = '{name}.remapped.{ext}', concurrent = True
output: expand_pattern(f'{_name[0]}.keep_remapped.{_ext[0]}')
bash: workdir = f'{cwd:a}', expand=True, stderr = f'{_output:n}.log'
    source activate py27
    python {wasp_dir}/filter_remapped_reads.py {_to_remap} {_input} {_output}

[align_5 (Merge WASP adjusted and originally kept BAM)]
kept = paths([wasp_split[i:i+4][3] for i in range(0, len(wasp_split), 4)])
input: group_by = 1, paired_with = 'kept', pattern = '{name}.keep_remapped.{ext}', concurrent = True
output: expand_pattern(f'{_name[0]}.wasp_remapped.{_ext[0]}')
run: workdir = f'{cwd:a}', docker_image = 'gaow/debian-ngs', expand=True
    samtools merge - {_input} {_kept} | samtools sort -o {_output}
    samtools index {_output}
Try again now.
Perfect! Although I'm not sure I like scripts being italic. I personally think it would read better if only action options were written in italic, with step options and scripts in regular font. It is just a bit weird to read italic for long scripts.
I do not know either. These things are easy to adjust, I am just not sure which way reads better. I also want to change the yellow highlighting, which is usually used for search matches and will actually interfere with searching.
> I am just not sure which way reads better

I would vote for only what follows `depends:`, `input:`, `output:`, `task:` and `(action):` being italic; others upright. This would naturally resolve the yellow highlighting issue because you can use italic there instead.
My eyes cannot really tell the difference between normal and italic well.
This is interesting ... italic and upright fonts are sharply different, and the version above reads well to my eyes.
So maybe for that reason this is not very obvious to you, but in your first step of output, `output` with `hg19.2bit` is italic yet with `twoBitToFa` is upright -- need a fix?
I like the background of interpolated text better
Also, maybe using some other font color, or even an underline, is better than background color?
> italic and upright fonts are sharply different,

They are different if I look for it... background color lets me know instantly which code block has the `expand` option.
> in your first step of output, output with hg19.2bit is italic yet with twoBitToFa is upright -- need a fix?

It is a bug in the detection of end of options. Will fix.
> other font color, or even an underline is better than background color?

Maybe, but underline would not work well with `_input`, I suppose.
OK, I checked other codemirror "nested" styles and they do not highlight inside style with italic etc, so I have removed the `em` style (but allow styling with the `.cm-sos-script` class). I have also separated `cm-sos-sigil` and `cm-sos-interpolated` so that we can highlight the sigil and the interpolated text separately. The css in kernel.js leads to the following, which I will have to use more to like/dislike it.
It is disappointing that codemirror does not highlight python expressions inside f-string though. Perhaps I will figure out a patch and submit to CodeMirror later.
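As an illustration of the `cm-sos-sigil` / `cm-sos-interpolated` split described in the comment above, here is a minimal Python sketch of a line tokenizer that emits the two styles separately. It is not the kernel.js implementation. `tokenize` is a hypothetical name, the style names are written without the `cm-` prefix on the assumption that CodeMirror adds it when rendering, and the sketch deliberately ignores nested braces.

```python
def tokenize(line, left="{", right="}"):
    """Split one script line into (style, text) pairs.

    Styles are "sos-sigil", "sos-interpolated", or None for plain script
    text. Nested sigils are not handled in this sketch.
    """
    tokens = []
    pos = 0
    while pos < len(line):
        start = line.find(left, pos)
        end = line.find(right, start + len(left)) if start != -1 else -1
        if start == -1 or end == -1:
            tokens.append((None, line[pos:]))  # no complete expression left
            break
        if start > pos:
            tokens.append((None, line[pos:start]))
        tokens.append(("sos-sigil", left))
        tokens.append(("sos-interpolated", line[start + len(left):end]))
        tokens.append(("sos-sigil", right))
        pos = end + len(right)
    return tokens


for style, text in tokenize("chmod +x {_input[1]}"):
    print(style, repr(text))
```

Keeping the sigil and the expression as separate tokens is what lets the stylesheet give them different colors or backgrounds.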
Looks good to me! Will keep posting like/dislikes after using it more.
Syntax highlighter does not handle
{{{4*10}}}
or
{
multiline
}
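Both cases could in principle be covered by scanning the whole script text with a brace counter instead of matching within a single line. The sketch below is written in Python and assumes Python f-string rules, where `{{` and `}}` are escaped literal braces; `find_expressions` is a hypothetical helper, not part of SoS or CodeMirror.

```python
def find_expressions(text):
    """Return the expressions between single braces, f-string style.

    "{{" and "}}" are treated as escaped literal braces, nested braces inside
    an expression are balanced, and newlines are ordinary characters.
    """
    found = []
    i = 0
    while i < len(text):
        if text.startswith("{{", i) or text.startswith("}}", i):
            i += 2  # escaped brace, literal text
        elif text[i] == "{":
            depth, j = 1, i + 1
            while j < len(text) and depth:
                depth += {"{": 1, "}": -1}.get(text[j], 0)
                j += 1
            if depth:  # unterminated expression: stop scanning
                break
            found.append(text[i + 1:j - 1])
            i = j
        else:
            i += 1
    return found


print(find_expressions("{{{4*10}}}"))
print(find_expressions("{\nmultiline\n}"))
```

A CodeMirror mode tokenizes one line at a time, so a real fix would presumably need to carry the brace depth in the mode state from one line to the next; the sketch sidesteps that by working on the full text.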