swcarpentry/modern-scientific-authoring

Figure to emphasize value of text base file in 01-mess.md

Opened this issue · 9 comments

It might help to have a figure. Something showing the MD file as a source, that you can then generate into multiple output formats (PDF, HTML, DOCX). This depends on the lesson structure. But I think it would fit philosophically with what you are communicating here.

Next to the MD document you could list bullet points like (strong version control, raw code, etc.), and then emphasize that the other formats are for presentation/submission, but that the core document is the MD (or Latex, or whatever).

wking commented

On Wed, Feb 10, 2016 at 09:02:11AM -0800, Mark Mandel wrote:

It might help to have a figure. Something showing the MD file as a
source, that you can then generate into multiple output formats
(PDF, HTML, DOCX).

Like this figure 1? ;) Although that glosses over the point that
transitions between these formats often have reasonable impedance
mismatches, which makes it hard to get quite the PDF (or whatever)
you want from your source without some per-target hoop jumping.

Given that source & output are effectively one file in Word, but are separate if you are using any text-based source file, I thought a more basic introduction might be useful. So, similar to the figure you referenced--but simplified, annotated, and focused on the steps that will be covered in the lesson. If additional complexity is built onto the figure I think it would be useful to add in the guts of the conversion and how images, styles, references, etc. are incorporated (as was done on this blog [1]). Here is a basic concept.

[Figure: swc-authorship-fig (attached as swc-authorship-fig.pdf)]

[1]: http://kieranhealy.org/files/misc/workflow-rmd-md.png from http://kieranhealy.org/blog/archives/2014/01/23/plain-text/

wking commented

On Wed, Feb 10, 2016 at 12:30:50PM -0800, Mark Mandel wrote:

… simplified, annotated, and focused on the steps that will be
covered in the lesson…

Yeah, +1 to a workflow diagram (or diagrams) for any workflows covered
by this lesson. That gives much tighter scoping than trying to lay
out all possible workflows.

Here is a basic concept…

I'm not sure what “Elements are fixed (except when edited)” is trying
to convey. Probably that inclusions like graphics are separate files
(in which case I'd add separate entries for them in the workflow
diagram), like Healy does in the blog you cite (although in his case
they're generated from .Rmd).

Good points. Was providing this as feedback for @gvwilson based on his post to the SWC Discuss list. Didn't focus on details since, if it becomes useful, it would be for a revised document.

wking commented

On Wed, Feb 10, 2016 at 12:54:42PM -0800, Mark Mandel wrote:

Was providing this as feedback for @gvwilson based on his post to
the SWC Discuss list…

Fair enough. So I think our recommendation for Greg is “add some
figures outlining the workflows you'll cover” :). Which plays nicely
into the figures vs. text dichotomy he discusses in 01-mess.md and
this recent blog post 1.

...and/or show some snippets of source files for the same "Hello World" document in Word (binary gobbledygook), LaTeX and Markdown to cement the differences.

wking commented

On Wed, Feb 10, 2016 at 02:50:46PM -0800, Stuart Rossiter wrote:

… and/or show some snippets of source files for the same "Hello
World" document in Word (binary gobbledygook)…

I think .docx are just zipped up XML and stuff? I think a fairer
comparison would be between the Markdown (or whatever) and whatever
file in the zipped bundle contained “Hello World”.
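
To make the "zipped up XML" point concrete, here is a minimal sketch using only Python's standard `zipfile` module. The archive it builds is a hypothetical, stripped-down stand-in and not a complete Word document (a real .docx carries additional parts such as content types and relationships); it only illustrates that the text lives in an XML member, conventionally `word/document.xml`, inside a zip container.

```python
import io
import zipfile

# Hypothetical, stripped-down stand-in for the main part of a .docx file.
DOCUMENT_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Hello World</w:t></w:r></w:p></w:body>"
    "</w:document>"
)


def build_toy_docx() -> bytes:
    """Zip the XML under the path Word uses for the main document part."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", DOCUMENT_XML)
    return buffer.getvalue()


def read_document_xml(docx_bytes: bytes) -> str:
    """Return the text of word/document.xml from a zipped bundle."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.read("word/document.xml").decode("utf-8")


toy = build_toy_docx()
print(toy[:2])                 # the zip magic bytes, b'PK'
print(read_document_xml(toy))  # the XML member that holds "Hello World"
```

The same `read_document_xml` call should work on a real .docx read from disk, which would give the "fairer comparison" against a Markdown source.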

@wking Yeah, I agree for a technical comparison, but I was thinking more just to emphasise why version control systems have a hard time with Word, and to help with the simple user perspective: for Word / LibreOffice it is "I write it WYSIWYG and there's a 'binary' file produced"; for the others it's "I write it using markup in a text file and that is the (source) file" (or alternatively you might be using a WYSIWYG front-end like Overleaf to produce a text-based source file).

wking commented

On Fri, Feb 12, 2016 at 03:10:55AM -0800, Stuart Rossiter wrote:

… for Word / LibreOffice it is "I write it WYSIWYG and there's a
'binary' file produced"; for the others it's "I write it using
markup in a text file and that is the (source) file", or
alternatively you might be using a WYSIWYG front-end like Overleaf
to produce a text-based source file).

There isn't a rigid divide there, it's a continuum. I'd guess it
comes down to “how much processing does my editor do to convert
between the UI and the disk format?” and “how wide-spread are
libraries for doing that work?” For example, if my editor wrote a
UTF-16 “text” file 1, today's Git would consider it “binary”.
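
As a rough illustration of why an encoding choice alone can flip a file from "text" to "binary": as far as I understand it, Git's check amounts to looking for a NUL byte in the first few thousand bytes of the content. The sketch below imitates that idea; the function name `looks_binary` and the 8000-byte window are assumptions made for the example, not Git's actual code.

```python
def looks_binary(data: bytes, first_few_bytes: int = 8000) -> bool:
    """Rough stand-in for a NUL-byte heuristic: a NUL near the start means binary."""
    return b"\x00" in data[:first_few_bytes]


text = "Hello World\n"
utf8 = text.encode("utf-8")    # one byte per ASCII character, no NULs
utf16 = text.encode("utf-16")  # two bytes per ASCII character, one of them NUL

print(looks_binary(utf8))   # False
print(looks_binary(utf16))  # True
```

The characters on screen are identical in both cases; only the on-disk format, and the tooling's willingness to decode it, differs.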

Along those lines, I wonder how much zipping/unzipping in
smudge/clean filters 2 would help with Word versioning. It looks
like it's hard to get working 3, and when it does LibreOffice (and
likely the others) may churn identifiers for no particular reason 4.
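
One way the "clean" half of such a filter might look, as a sketch only: rewrite the zip container so every member is stored uncompressed, which leaves the XML visible as plain bytes to anything inspecting the file. The function name `rezip_stored` is hypothetical, and this does nothing about the identifier churn mentioned above.

```python
import io
import zipfile


def rezip_stored(archive_bytes: bytes) -> bytes:
    """Rewrite a zip archive with every member stored uncompressed."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as source, \
         zipfile.ZipFile(output, "w", zipfile.ZIP_STORED) as target:
        for info in source.infolist():
            # Reuse the original timestamp so repeated runs give identical bytes.
            member = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            target.writestr(member, source.read(info.filename),
                            compress_type=zipfile.ZIP_STORED)
    return output.getvalue()


# Toy input: a deflated archive with one XML member (not a complete .docx).
payload = "<w:t>Hello World</w:t>" * 50
compressed = io.BytesIO()
with zipfile.ZipFile(compressed, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("word/document.xml", payload)

stored = rezip_stored(compressed.getvalue())
print(b"Hello World" in stored)  # True: the XML is now readable in the raw bytes
```

A real filter would read the archive from `sys.stdin.buffer` and write the result to `sys.stdout.buffer`, and would be wired up through a `filter.<name>.clean` setting plus a matching `.gitattributes` entry; whether Word or LibreOffice output then diffs usefully is a separate question.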