vatlab/sos-notebook

A proper mixed codemirror mode

BoPeng opened this issue · 21 comments

The SoS CodeMirror mode, as defined in kernel.js, is the Python mode plus magics and SoS statements. It does not highlight code in other languages correctly. In theory, we should:

  1. Use the R CodeMirror mode (plus magics) for R cells, and so on.
  2. Recognize SoS actions and highlight lines after R: as R code, etc. In theory it should also recognize expand=True etc. to correctly identify and highlight SoS expressions.

Item 1 could be done when the language of a cell is switched. We should be able to create a new CodeMirror mode and switch to it. There is, however, a slight problem: the name of the language (e.g. R) might not match the name of the corresponding CodeMirror mode.
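One way around the name mismatch would be a small lookup table. The sketch below is only an illustration: `MODE_ALIASES` and `modeNameFor` are hypothetical names, and the entries are assumptions, not the list SoS actually uses.

```javascript
// Hypothetical lookup table: language name as shown in the notebook
// (lower-cased) -> CodeMirror mode name. Entries are illustrative only.
const MODE_ALIASES = {
  r: "r",
  python2: "python",
  python3: "python",
  bash: "shell",
  sh: "shell",
  javascript: "javascript",
  julia: "julia",
  markdown: "markdown",
};

// Return the CodeMirror mode name for a language, or null when none is
// known, so the caller can fall back to plain text instead of guessing.
function modeNameFor(language) {
  const key = String(language).trim().toLowerCase();
  return Object.prototype.hasOwnProperty.call(MODE_ALIASES, key)
    ? MODE_ALIASES[key]
    : null;
}

console.log(modeNameFor("R"));       // r
console.log(modeNameFor("Python3")); // python
console.log(modeNameFor("Stata"));   // null
```

The returned name could then be passed to CodeMirror's `setOption("mode", ...)` on the cell's editor. Returning `null` for unknown languages leaves room for a plain-text fallback.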

Item 2 could be done with a nested CodeMirror mode, and there are examples such as the htmlmixed mode.
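To make the nesting idea concrete without pulling in CodeMirror, here is a rough line-level sketch of the decision an outer mode would have to make. A real nested mode would do this inside its `token()` function and delegate to the inner mode, as htmlmixed does. `ACTION_MODES` and `assignModes` are hypothetical names, and the sketch assumes that the script of an action is indented under its header.

```javascript
// Hypothetical table: SoS action name -> inner mode used for its script.
const ACTION_MODES = { R: "r", python: "python", markdown: "markdown" };

// Matches a header such as "R:" or "R: expand=True" at the start of a line.
const ACTION_HEADER = /^([A-Za-z_]\w*):/;

// Decide, line by line, which mode should highlight each line.
function assignModes(lines) {
  let inner = null; // inner mode name while inside a script body, else null
  return lines.map((line) => {
    const indented = /^\s+\S/.test(line);
    const blank = line.trim() === "";
    if (inner !== null && (indented || blank)) {
      return inner; // still inside the indented script body
    }
    inner = null; // an unindented line ends the script
    const m = ACTION_HEADER.exec(line);
    if (m && Object.prototype.hasOwnProperty.call(ACTION_MODES, m[1])) {
      inner = ACTION_MODES[m[1]]; // following indented lines use this mode
    }
    return "python"; // headers, options and ordinary statements stay Python
  });
}

console.log(assignModes(["x = 1", "R: expand=True", "    y <- x %*% x", "z = 2"]));
```

Unrecognized actions and directives such as input: simply stay in the outer Python mode here, which is cruder than what a real mode would do.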

This is something that I have always wanted to do, but could not do it with my limited experience in JS, until now.

The new SoS mode has the following features

  1. It has a blue header, but with options parsed in Python.
  2. It has bold input, output, etc., but with options parsed in Python.
  3. It recognizes actions and uses their respective modes for syntax highlighting. That is to say, things after R: are now treated as R code. Options of R: are still parsed in Python. It uses em for the embedded text, but perhaps some gray background would look better. It currently only supports R, Python, and markdown actions such as RMarkdown, but this is super easy to expand.
  4. It recognizes the options expand=True, expand='${ }', etc., and highlights the expressions by default.
  5. Unrecognized actions would not have syntax highlighting.
  6. The codemirror mode now works in JupyterLab so you can use JupyterLab as a nice SoS script editor.
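Feature 4 boils down to reading the expand option from the option text of an action. Below is a minimal sketch of that step, assuming the options arrive as one string and a custom sigil is written as two parts separated by a space. `sigilFromOptions` is a hypothetical helper, and it uses regular expressions for brevity even though the mode described above parses options as Python.

```javascript
// Return [left, right] sigils when expansion is requested, otherwise null.
function sigilFromOptions(options) {
  // expand = 'left right' or expand = "left right": a custom sigil pair.
  const custom = /\bexpand\s*=\s*(["'])(\S+)\s+(\S+)\1/.exec(options);
  if (custom) return [custom[2], custom[3]];
  // expand = True: the default braces.
  if (/\bexpand\s*=\s*True\b/.test(options)) return ["{", "}"];
  // No expansion requested, so nothing in the script should be highlighted.
  return null;
}

console.log(sigilFromOptions("dest_dir = resource_dir, expand = True"));
console.log(sigilFromOptions("expand='${ }'"));
console.log(sigilFromOptions("workdir = '.'"));
```

Returning `null` when expand is absent matches the intent stated in this thread: without expand=True the { } should not be highlighted.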

The goal here is to help users edit and debug SoS scripts. For example, without expand=True the { } would not be highlighted.

Many things can be fine tuned because we now can define our own styles. Suggestions on further improvements are welcome. The next step is to use R codemirror mode for R cells, etc, with %expand magic recognized.

gaow commented

I see ... I was about to comment that I do not see R syntax properly highlighted, judged by %*%. Test here:

http://codemirror.net/mode/r/

(input something like x %*% y)

But maybe it is what it is?

The codemirror mode now works in JupyterLab so you can use JupyterLab as a nice SoS script editor.

Will happily give that a go!

It should work with Jupyter. Did you run python -m sos_notebook.install?

The

R:
script

style should be the same as a script in an R notebook (except for the italic font).

gaow commented

I did not try myself, I'm only commenting on what you've posted, that screen shot, that %*% did not get highlighted (there is %*% in the code you posted above). So maybe you've fixed it now?

I do not see that operator highlighted in Jupyter. I suppose that page used some special css.

I suppose a good indicator should be the $ symbol, which was marked as red (error) by python.

gaow commented

I'm re-opening this ticket for the two observations below, seen on the master branch of both sos and sos-notebook when multiple steps are written in the same cell:

  1. Markdown problem:

2018-03-29-19-42-34_scrot

  2. Italic fonts after a certain point:

2018-03-29-20-18-09_scrot

Also, some keywords such as expand have not been highlighted.

Could you send me the pure text version?

gaow commented

Sure, see below:

[hg19_reference_1 (download)]
# Download `hg19.2bit` and `twoBitToFa` from {ucsc_url}
ucsc_url = "http://hgdownload.cse.ucsc.edu"
output: f"{resource_dir}/hg19.2bit", f"{resource_dir}/twoBitToFa"
download: dest_dir = resource_dir, expand = True
    {ucsc_url}/goldenPath/hg19/bigZips/hg19.2bit
    {ucsc_url}/admin/exe/linux.x86_64/twoBitToFa

[hg19_reference_2 (decompress hg19.fa)]
# Use `twoBitToFa` to extract `hg19.fa` from `hg19.2bit`
output: f"{resource_dir}/{ref_fa}"
bash: expand = True
    chmod +x {_input[1]}
    {_input[1]} {_input[0]} {_output}

[hg19_reference_3 (gene annotations)]
# Download `Homo_sapiens.GRCh38.91.gtf.gz` from Ensembl
# https://useast.ensembl.org/info/data/ftp/index.html
ensembl_ftp = 'ftp://ftp.ensembl.org/pub/release-91/gtf/homo_sapiens/'
output: f"{resource_dir}/{ref_gtf}"
download: dest_dir = resource_dir, expand = True
    {ensembl_ftp}/{ref_gtf}

and

[align_2 (WASP intersecting SNP): shared = {'wasp_split': 'step_output'}]
# WASP finding unbiased reads intersecting with SNP
depends: sos_step('wasp')
input: group_by = 1, concurrent = True
output: f"{_input:n}.remap.fq1.gz", f"{_input:n}.remap.fq2.gz", f"{_input:n}.to.remap.bam", f"{_input:n}.keep.bam"
bash: workdir = f'{cwd:a}', expand = True
    source activate py27
    python {wasp_dir}/find_intersecting_snps.py {_input} \
        --snp_dir {resource_dir}/wasp_snp_list \
        --is_sorted {'--is_paired_end' if paired_end else ''}

[align_3 (STAR post alignment)]
# Align WASP remap with STAR
# Followd by samtools remove reads with quality less than {qual_cutoff}
parameter: qual_cutoff = 10
input: group_by = 4, pattern = '{name}.{qual}.remap.{ext}', concurrent = True
output: expand_pattern(f'{_name[0]}.remapped.qual{qual_cutoff}.bam')
run: workdir = f'{cwd:a}', docker_image = 'gaow/debian-ngs', expand=True
    STAR --genomeDir {resource_dir} \
        --readFilesIn {_input[0]} {_input[1]} \
        --readFilesCommand zcat \
        --runThreadN {ncpu} --outStd BAM_SortedByCoordinate \
        --outSAMtype BAM SortedByCoordinate \
        --sjdbGTFtagExonParentTranscript {resource_dir}/{ref_gtf} |
    samtools view -bq {qual_cutoff} > {_output}

[align_4 (WASP remove ambiguously mapped reads)]
to_remap = paths([wasp_split[i:i+4][2] for i in range(0, len(wasp_split), 4)])
input: group_by = 1, paired_with = 'to_remap', pattern = '{name}.remapped.{ext}', concurrent = True
output: expand_pattern(f'{_name[0]}.keep_remapped.{_ext[0]}')
bash: workdir = f'{cwd:a}', expand=True, stderr = f'{_output:n}.log'
    source activate py27
    python {wasp_dir}/filter_remapped_reads.py {_to_remap} {_input} {_output}

[align_5 (Merge WASP adjusted and originally kept BAM)]
kept = paths([wasp_split[i:i+4][3] for i in range(0, len(wasp_split), 4)])
input: group_by = 1, paired_with = 'kept', pattern = '{name}.keep_remapped.{ext}', concurrent = True
output: expand_pattern(f'{_name[0]}.wasp_remapped.{_ext[0]}')
run: workdir = f'{cwd:a}', docker_image = 'gaow/debian-ngs', expand=True
    samtools merge - {_input} {_kept} | samtools sort -o {_output}
    samtools index {_output}

Try again now.

gaow commented

Perfect! Although I'm not sure I like scripts being italic. I personally think it would read better if only action options were written in italic, while step options and scripts use a regular font. It is just a bit weird to read italic for long scripts.

I do not know either. These things are easy to adjust, I am just not sure which way reads better. I also want to change the yellow highlighting, which is usually used for search results and will actually interfere with searching.

gaow commented

I am just not sure which way reads better

I would vote for only what follows depends:, input:, output:, task: and (action): to be italic; others upright. This would naturally resolve the yellow highlighting issue because you can use italic there instead.

That would be something like this:

image

I like the background of interpolated text better because my eye cannot really tell the difference between normal and italic well.

gaow commented

my eye cannot really tell the difference between normal and italic well.

This is interesting ... italic and upright fonts are sharply different, and the version above reads good in my eyes.

So maybe for that reason this is not very obvious to you, but in your first step of output, output with hg19.2bit is italic yet with twoBitToFa is upright -- need a fix?

I like the background of interpolated text better

Also, maybe using some other font color, or even an underline, is better than background color?

italic and upright fonts are sharply different,

They are different if I look for it... background color lets me know instantly which code block has the expand option.

in your first step of output, output with hg19.2bit is italic yet with twoBitToFa is upright -- need a fix?

It is a bug in the detection of end of options. Will fix.

other font color, or even an underline, is better than background color?

Maybe, but underline would not work well with _input, I suppose.

OK, I checked other CodeMirror nested modes and they do not style the inner code with italics etc., so I have removed the em style (but allow styling with the .cm-sos-script class). I have also separated cm-sos-sigil and cm-sos-interpolated so that we can highlight the sigil and the interpolated text separately. The CSS in kernel.js leads to the following:

image

which I will have to use more before deciding whether I like it.

It is disappointing that codemirror does not highlight python expressions inside f-string though. Perhaps I will figure out a patch and submit to CodeMirror later.

gaow commented

Looks good to me! Will keep posting like/dislikes after using it more.

The syntax highlighter does not handle

{{{4*10}}}

or

{
multiline
}
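Both cases come down to tracking brace depth and carrying it from one line to the next. The sketch below is one possible approach, not the code in kernel.js: `scanLine` is a hypothetical helper, it assumes the default { } sigil with f-string rules (a doubled brace outside an expression is a literal brace), and it ignores braces inside string literals. The style names follow the classes mentioned earlier, since CodeMirror prefixes token styles with cm-.

```javascript
// Split one line into tokens of the form { text, style }.
// state.depth is the current brace nesting and persists across lines,
// which is what lets an expression span several lines.
function scanLine(line, state) {
  const tokens = [];
  const push = (text, style) => {
    const last = tokens[tokens.length - 1];
    if (last && last.style === style) last.text += text; // merge neighbours
    else tokens.push({ text, style });
  };
  let i = 0;
  while (i < line.length) {
    const ch = line[i];
    if (state.depth === 0) {
      const pair = line.slice(i, i + 2);
      if (pair === "{{" || pair === "}}") {
        push(pair, null); // doubled brace: literal text, not a sigil
        i += 2;
        continue;
      }
      if (ch === "{") {
        state.depth = 1; // opening sigil starts an expression
        push(ch, "sos-sigil");
      } else {
        push(ch, null); // ordinary script text
      }
    } else if (ch === "{") {
      state.depth += 1; // nested brace, e.g. a dict literal
      push(ch, "sos-interpolated");
    } else if (ch === "}") {
      state.depth -= 1;
      // Only the brace that brings depth back to zero is the closing sigil.
      push(ch, state.depth === 0 ? "sos-sigil" : "sos-interpolated");
    } else {
      push(ch, "sos-interpolated");
    }
    i += 1;
  }
  return tokens;
}

const demoState = { depth: 0 };
console.log(scanLine("{{{4*10}}}", demoState));
```

For the multi-line case, the same state object would be passed to `scanLine` for each successive line, so the line between the braces comes back entirely as sos-interpolated.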

OK, I ended up still using background color to highlight interpolated text.

image

The reason is that I would like to help users avoid mistakes in the following cases

R: expand=True
   if (true) {
      do something
   }

Highlighting only the braces does not work well because the opening and closing braces are separated by other lines.