A proper mixed codemirror mode

Question

BoPeng opened this issue 7 years ago · 21 comments

The SoS CodeMirror mode, as defined in kernel.js, is a Python mode plus magics and SoS statements. It does not highlight code in other languages correctly. In theory, we should

1. Use the R CodeMirror mode (plus magics) for R cells, and so on.
2. Recognize SoS actions and highlight the lines after R: as R code, etc. It should in theory also recognize expand=True etc. to correctly identify and highlight SoS expressions.

Item 1 could be done when the language of a cell is switched: we should be able to create a new CodeMirror mode and switch to it. There is, however, a slight problem: the name of the language (e.g. R) might not match the name of the CodeMirror mode.
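One way to handle the name mismatch is a small alias table consulted when the cell language changes. The sketch below is only an illustration: `MODE_ALIASES` and `modeForLanguage` are hypothetical names, and the entries are assumptions rather than the table SoS actually uses. CodeMirror 5 also ships a `mode/meta.js` add-on with `CodeMirror.findModeByName`, which may be a better fit when it is loaded.

```javascript
// Hypothetical lookup from kernel/language display names to CodeMirror mode
// names. The entries are illustrative assumptions, not the SoS table.
const MODE_ALIASES = {
  r: "r",
  python: "python",
  python2: "python",
  python3: "python",
  bash: "shell",
  sh: "shell",
  javascript: "javascript",
  markdown: "markdown",
};

// Return a CodeMirror mode name for a language, or null when none is known,
// so the caller can fall back to the default (Python-based) SoS mode.
function modeForLanguage(language) {
  const key = String(language).trim().toLowerCase();
  return Object.prototype.hasOwnProperty.call(MODE_ALIASES, key)
    ? MODE_ALIASES[key]
    : null;
}
```

Returning null instead of guessing keeps unknown languages on the existing highlighting rather than switching to a mode that does not exist.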

Item 2 could be done with a nested CodeMirror mode, and there are examples such as the htmlmixed mode.

Answer 1 · 2018-03-25T23:21:58.000Z

BoPeng commented 7 years ago

Answer 2 · 2018-03-25T23:31:55.000Z

This is something that I have always wanted to do, but could not do it with my limited experience in JS, until now.

The new SoS mode has the following features

It has a blue header, with options parsed in Python.
It has bold input, output, etc., with options parsed in Python.
It recognizes actions and uses their respective modes for syntax highlighting. That is to say, things after R: are now treated as R code. Options of R: are still parsed in Python. It uses em for the embedded text, but perhaps some gray background would look better. It currently only supports R, Python, and markdown actions such as RMarkdown, but this is super easy to expand.
It recognizes the options expand=True, expand='${ }', etc., and highlights the expressions by default.
Unrecognized actions do not have syntax highlighting.
The codemirror mode now works in JupyterLab so you can use JupyterLab as a nice SoS script editor.
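
The action detection in the list above can be sketched as a function that inspects a line at column 0, decides whether it is an action header, and extracts the expand option. This is a minimal illustration under stated assumptions, not the code in kernel.js: `ACTION_MODES`, `DIRECTIVES`, and `parseActionLine` are hypothetical names, and the regular expressions only cover the option forms mentioned in this thread.

```javascript
// Hypothetical table of actions with an embedded language; the thread names
// R, Python, and markdown-like actions as the initial set.
const ACTION_MODES = { r: "r", python: "python", markdown: "markdown", rmarkdown: "markdown" };

// SoS directives look like "name: options" too, but are not actions.
const DIRECTIVES = new Set(["input", "output", "depends", "parameter", "task"]);

// Action headers start at column 0; indented lines belong to the script body.
const ACTION_LINE = /^([A-Za-z_]\w*)\s*:(.*)$/;

function parseActionLine(line) {
  const match = ACTION_LINE.exec(line);
  if (!match || DIRECTIVES.has(match[1])) return null;
  const [, action, options] = match;

  // expand=True selects the default "{ }" sigil; expand='${ }' names a custom one.
  const expand = /\bexpand\s*=\s*(True|(['"])(\S+) (\S+)\2)/.exec(options);
  let sigil = null;
  if (expand) {
    sigil = expand[1] === "True"
      ? { left: "{", right: "}" }
      : { left: expand[3], right: expand[4] };
  }

  // Unrecognized actions get mode null, that is, no syntax highlighting.
  const key = action.toLowerCase();
  const mode = Object.prototype.hasOwnProperty.call(ACTION_MODES, key)
    ? ACTION_MODES[key]
    : null;
  return { action, mode, sigil };
}
```

For example, `parseActionLine("R: expand='${ }'")` reports mode `r` with the sigil pair `${` and `}`, while a line such as `input: group_by = 1` is rejected as a directive.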

The goal here is to help users edit and debug SoS script. For example, without expand=True the { } would not be highlighted.

Many things can be fine tuned because we now can define our own styles. Suggestions on further improvements are welcome. The next step is to use R codemirror mode for R cells, etc, with %expand magic recognized.

Answer 3 · 2018-03-25T23:34:58.000Z

I see ... I was about to comment that I do not see R syntax properly highlighted, judging by %*%. Test here:

http://codemirror.net/mode/r/

(input something like x %*% y)

But maybe it is what it is?

The codemirror mode now works in JupyterLab so you can use JupyterLab as a nice SoS script editor.

Will happily give that a go!

Answer 4 · 2018-03-25T23:41:54.000Z

It should work with Jupyter. Did you run python -m sos_notebook.install?

The

R:
script

style should be the same as script in an R notebook (except for the italic font).

Answer 5 · 2018-03-25T23:50:10.000Z

I did not try myself, I'm only commenting on what you've posted, that screen shot, that %*% did not get highlighted (there is %*% in the code you posted above). So maybe you've fixed it now?

Answer 6 · 2018-03-26T00:05:26.000Z

I do not see that operator highlighted in Jupyter. I suppose that page used some special css.

Answer 7 · 2018-03-26T00:09:17.000Z

I suppose a good indicator should be the $ symbol, which was marked as red (error) by python.

Answer 8 · 2018-03-30T02:31:27.000Z

I'm re-opening this ticket for the two observations below, made on the master branches of both sos and sos-notebook when multiple steps are written in the same cell:

Markdown problem:

Italic fonts after certain point:

Also, some keywords such as expand have not been highlighted.

Answer 9 · 2018-03-30T02:58:52.000Z

Could you send me the pure text version?

Answer 10 · 2018-03-30T03:01:38.000Z

Sure, see below:

[hg19_reference_1 (download)]
# Download `hg19.2bit` and `twoBitToFa` from {ucsc_url}
ucsc_url = "http://hgdownload.cse.ucsc.edu"
output: f"{resource_dir}/hg19.2bit", f"{resource_dir}/twoBitToFa"
download: dest_dir = resource_dir, expand = True
    {ucsc_url}/goldenPath/hg19/bigZips/hg19.2bit
    {ucsc_url}/admin/exe/linux.x86_64/twoBitToFa

[hg19_reference_2 (decompress hg19.fa)]
# Use `twoBitToFa` to extract `hg19.fa` from `hg19.2bit`
output: f"{resource_dir}/{ref_fa}"
bash: expand = True
    chmod +x {_input[1]}
    {_input[1]} {_input[0]} {_output}

[hg19_reference_3 (gene annotations)]
# Download `Homo_sapiens.GRCh38.91.gtf.gz` from Ensembl
# https://useast.ensembl.org/info/data/ftp/index.html
ensembl_ftp = 'ftp://ftp.ensembl.org/pub/release-91/gtf/homo_sapiens/'
output: f"{resource_dir}/{ref_gtf}"
download: dest_dir = resource_dir, expand = True
    {ensembl_ftp}/{ref_gtf}

and

[align_2 (WASP intersecting SNP): shared = {'wasp_split': 'step_output'}]
# WASP finding unbiased reads intersecting with SNP
depends: sos_step('wasp')
input: group_by = 1, concurrent = True
output: f"{_input:n}.remap.fq1.gz", f"{_input:n}.remap.fq2.gz", f"{_input:n}.to.remap.bam", f"{_input:n}.keep.bam"
bash: workdir = f'{cwd:a}', expand = True
    source activate py27
    python {wasp_dir}/find_intersecting_snps.py {_input} \
        --snp_dir {resource_dir}/wasp_snp_list \
        --is_sorted {'--is_paired_end' if paired_end else ''}

[align_3 (STAR post alignment)]
# Align WASP remap with STAR
# Followd by samtools remove reads with quality less than {qual_cutoff}
parameter: qual_cutoff = 10
input: group_by = 4, pattern = '{name}.{qual}.remap.{ext}', concurrent = True
output: expand_pattern(f'{_name[0]}.remapped.qual{qual_cutoff}.bam')
run: workdir = f'{cwd:a}', docker_image = 'gaow/debian-ngs', expand=True
    STAR --genomeDir {resource_dir} \
        --readFilesIn {_input[0]} {_input[1]} \
        --readFilesCommand zcat \
        --runThreadN {ncpu} --outStd BAM_SortedByCoordinate \
        --outSAMtype BAM SortedByCoordinate \
        --sjdbGTFtagExonParentTranscript {resource_dir}/{ref_gtf} |
    samtools view -bq {qual_cutoff} > {_output}

[align_4 (WASP remove ambiguously mapped reads)]
to_remap = paths([wasp_split[i:i+4][2] for i in range(0, len(wasp_split), 4)])
input: group_by = 1, paired_with = 'to_remap', pattern = '{name}.remapped.{ext}', concurrent = True
output: expand_pattern(f'{_name[0]}.keep_remapped.{_ext[0]}')
bash: workdir = f'{cwd:a}', expand=True, stderr = f'{_output:n}.log'
    source activate py27
    python {wasp_dir}/filter_remapped_reads.py {_to_remap} {_input} {_output}

[align_5 (Merge WASP adjusted and originally kept BAM)]
kept = paths([wasp_split[i:i+4][3] for i in range(0, len(wasp_split), 4)])
input: group_by = 1, paired_with = 'kept', pattern = '{name}.keep_remapped.{ext}', concurrent = True
output: expand_pattern(f'{_name[0]}.wasp_remapped.{_ext[0]}')
run: workdir = f'{cwd:a}', docker_image = 'gaow/debian-ngs', expand=True
    samtools merge - {_input} {_kept} | samtools sort -o {_output}
    samtools index {_output}

Answer 11 · 2018-03-30T03:39:07.000Z

Try again now.

Answer 12 · 2018-03-30T03:45:27.000Z

Perfect! Although I'm not sure if I like scripts being italic. I personally think it would read better if only action options were in italic, with step options and scripts in regular font. Italic is just a bit weird to read for long scripts.

Answer 13 · 2018-03-30T03:47:06.000Z

I do not know either. These things are easy to adjust; I am just not sure which way reads better. I also want to change the yellow highlighting, which is usually used for search results and will actually interfere with searching.

Answer 14 · 2018-03-30T03:52:38.000Z

I am just not sure which way reads better

I would vote for only what follows depends:, input:, output:, task: and (action): to be italic, and everything else upright. This would naturally resolve the yellow highlighting issue because you can use italic there instead.

Answer 15 · 2018-03-30T04:31:55.000Z

That would be something like this:

I like the background of interpolated text better because my eye cannot really tell the difference between normal and italic well.

Answer 16 · 2018-03-30T04:49:09.000Z

my eye cannot really tell the difference between normal and italic well.

This is interesting ... italic and upright fonts are sharply different, and the version above reads well to my eyes.

So maybe for that reason this is not very obvious to you, but in your first step of output, output with hg19.2bit is italic yet with twoBitToFa is upright -- need a fix?

I like the background of interpolated text better

Also, maybe using some other font color, or even an underline, is better than background color?

Answer 17 · 2018-03-30T04:55:57.000Z

italic and upright fonts are sharply different,

They are different if I look for it... but the background color lets me know instantly which code block has the expand option.

in your first step of output, output with hg19.2bit is italic yet with twoBitToFa is upright -- need a fix?

It is a bug in the detection of end of options. Will fix.

other font color, or even an underline, is better than background color?

Maybe, but underline would not work well with _input, I suppose.

Answer 18 · 2018-03-30T05:44:00.000Z

OK, I checked other CodeMirror "nested" modes and they do not style the embedded code with italics etc., so I have removed the em style (but still allow styling through the .cm-sos-script class). I have also separated cm-sos-sigil and cm-sos-interpolated so that we can highlight the sigil and the interpolated text separately. The css in kernel.js leads to the following
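
To illustrate the sigil/interpolated split: CodeMirror 5 turns a token style such as `sos-sigil` into the CSS class `cm-sos-sigil`, so a tokenizer only has to label each piece of a line. The function below is a simplified, standalone sketch, not the tokenizer in kernel.js: `tokenizeInterpolation` is a hypothetical name, and it assumes single-line expressions without nested braces.

```javascript
// Split one script line into tokens styled as sigil, interpolated text, or
// plain text (style null). Simplified: no nesting, no escapes, one line only.
function tokenizeInterpolation(line, left = "{", right = "}") {
  const tokens = [];
  let pos = 0;
  while (pos < line.length) {
    const start = line.indexOf(left, pos);
    const end = start === -1 ? -1 : line.indexOf(right, start + left.length);
    if (start === -1 || end === -1) {
      // No complete interpolation left: the rest of the line is plain text.
      tokens.push({ text: line.slice(pos), style: null });
      break;
    }
    if (start > pos) tokens.push({ text: line.slice(pos, start), style: null });
    tokens.push({ text: left, style: "sos-sigil" });
    const expr = line.slice(start + left.length, end);
    if (expr.length > 0) tokens.push({ text: expr, style: "sos-interpolated" });
    tokens.push({ text: right, style: "sos-sigil" });
    pos = end + right.length;
  }
  return tokens;
}
```

Because the two styles are separate, the stylesheet can, for instance, color the sigil and give only the interpolated text a background.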

which I will have to use more before deciding whether I like it.

It is disappointing that codemirror does not highlight python expressions inside f-string though. Perhaps I will figure out a patch and submit to CodeMirror later.

Answer 19 · 2018-03-30T06:26:10.000Z

Looks good to me! Will keep posting like/dislikes after using it more.

Answer 20 · 2018-03-30T17:42:20.000Z

The syntax highlighter does not handle

{{{4*10}}}

or

{
multiline
}
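
Both cases need a scanner that follows f-string rules across line boundaries: `{{` and `}}` are literal braces, and an expression ends at its matching brace rather than at the end of the line. The following is a minimal sketch of such a scanner, assuming the default `{ }` sigil; `findInterpolations` is a hypothetical name, and braces inside string literals within an expression are not special-cased.

```javascript
// Find f-string style interpolations in a whole script body.
// "{{" and "}}" are literal braces; an expression may span several lines
// and may contain nested braces.
function findInterpolations(text) {
  const spans = [];
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if ((ch === "{" || ch === "}") && text[i + 1] === ch) {
      i += 2; // doubled brace: escaped literal, not a sigil
      continue;
    }
    if (ch !== "{") {
      i += 1;
      continue;
    }
    // Scan forward to the brace that closes this expression.
    let depth = 1;
    let j = i + 1;
    while (j < text.length && depth > 0) {
      if (text[j] === "{") depth += 1;
      else if (text[j] === "}") depth -= 1;
      j += 1;
    }
    if (depth !== 0) break; // unterminated expression: stop scanning
    spans.push({ start: i, end: j, expr: text.slice(i + 1, j - 1) });
    i = j;
  }
  return spans;
}
```

With this scanner, `{{{4*10}}}` yields a single expression `4*10` surrounded by literal braces, and a brace pair split over three lines is reported as one span.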

Answer 21 · 2018-03-30T22:00:36.000Z

OK, I ended up still using background color to highlight interpolated text.

The reason is that I would like to help users avoid mistakes in the following cases

R: expand=True
   if (true) {
      do something
   }

Highlighting only the braces does not work well here because the two braces are on separate lines.