Self-documenting scripts

Question

Self-documenting scripts

Ambrevar opened this issue 7 years ago · 37 comments

Add a -h commandline flag which displays the documentation of a script, then exits.

The flag could be called several times.
The documentation could be stored in a global variable.

Answer 1 · 2018-01-19T21:07:28.000Z

Think of a mecanism to make documentation convenient to write contextually. For instance: allow for documenting before each function and each variable, then concatenate all the docstrings together.

Answer 2 · 2018-02-13T22:01:04.000Z

ce0b2c1 fixes this.

Example:

$ demlo -h 60-path

Set the output path according to tags.

Note that 'track' refers to the track number, not the title.

GLOBAL OPTIONS

- ossep: string (default: '/')
  OS path separator.

- lib: string (default: '/home/ambrevar/music')
  Path to the music library.

- fsf: regular expression (default: '\s*/\s*')
  Some filesystems don't accept all characters and those need be replaced.
  Every element of the output path which matches this regular expression will be
  replaced by " - ".

RULES

We make sure no unnecessary subfolders are created.
Extension is set from format.
We pad zeros (2 digits) in track number for file browsers without numeric sorting capabilities.

We try to guess if the genre belongs to classical music.
Since classical pieces usually get recorded several times, the date is not
very relevant. Thus it is preferable to sort classical albums by name on the filesystem.

@fictionic: What do you think?

Answer 3 · 2018-02-13T22:07:25.000Z

@fictionic: I've just commited a substential amount of changes:

Clarified messages fixed make warnings / errors more explicit.
Renamed rmsrc -> removesource.
Fixed lost covers when removesource is used.
Only .lua extensions are accepted for scripts. (This fixes a clash when editors would leave backups behind such as 10-tag.lua~).
DEMLORC env variable was renamed to DEMLO_CONFIG.
Renamed config file.
System config is used as a fallback.

See the "Breaking changes" section in the readme.

Another significant improvement in my opinion: the 15-discfrompath script which guesses the right disc number from the parent folder. I think this should do the right thing most of the time. Let me know what you think.

Answer 4 · 2018-02-13T22:10:34.000Z

I'm utlimately planning to embed the "godoc" document into the program, accessible via a -intro or -doc commandline option.

I'm thinking of dividing it into sections that can be specified as an option:

$ demlo -intro covers

Answer 5 · 2018-02-23T05:17:35.000Z

Very good idea overall. With how my scripts are headed, it'll be important to have documentation apart from just reading the comments.
How does this integrate with #3? I think doing demlo -h path is much better than having to know the number prefix just to get help
3, As for 15-discfrompath, it's basically covered within the functionality of demloconf's tag extraction (which I still haven't modularized; sorry! I've been real busy lately). So I don't see the purpose unless you don't want to incorporate that entire feature, which does quite a bit more than that, and is very extensible.
Can you explain more what you mean by the -intro/-doc stuff? Each script will have multiple sections of documentation?

Answer 6 · 2018-02-23T07:54:35.000Z

Yes, I'll make the `-h` queries work on regexp. As for 15-discfrompath, it's as modular as it gets. We can incorporate further elements of tag inference later, no problem with that. For now I felt that it was very needed, so I just went ahead. `intro`/`doc`: This would basically most of this page: https://godoc.org/github.com/Ambrevar/demlo So everything that relates the general concepts of Demlo but nothing specific to some scripts in particular.

Answer 7 · 2018-02-23T16:03:10.000Z

Yes, I'll make the -h queries work on regexp.

Wait, should they work on regexp, or just a literal match of the script name minus the number prefix? What help info should be displayed if a regex matches multiple scripts?

Answer 8 · 2018-02-23T16:06:51.000Z

As for 15-discfrompath, it's as modular as it gets. We can incorporate
further elements of tag inference later, no problem with that.

Ah, I see. May I suggest establishing a naming scheme for the tag extraction scripts, so they can all be called at once with a regexp? Like 15-extract- or something

Answer 9 · 2018-02-23T16:08:02.000Z

Maybe all of them separated by sections holding their names. What do you think?

Answer 10 · 2018-02-23T16:10:22.000Z

I was going to rename all files. I think `extract` is not obvious enough. Why not `tag`?

Answer 11 · 2018-02-23T16:12:57.000Z

Maybe all of them separated by sections holding their names.

That could work. The other option would be to tell the user which scripts match, and let them run the command again with an exact name. This would let -h double as a way of searching the available scripts.

Answer 12 · 2018-02-23T16:14:38.000Z

I think extract is not obvious enough. Why not tag?

Well because 'tag' also refers to the process of cleaning up the existing tags. Is anything else besides tags extracted? If, not I think 'extract' is a good bet. You could always do tag-extract-<tagname>, so they'd be run when the user gives a regexp tag.

Answer 13 · 2018-02-23T16:16:21.000Z

Here is a counterargument to complete modularization of the tag extraction scripts. If the user wants to extract a very particular set of tags, it's much more annoying to list each script on its own, rather than supply something like -pre 'textr = {"title", "track", "album_artist"}', as they would do using my demloconf.

Answer 14 · 2018-02-23T16:17:34.000Z

The available scripts are always printed at the beginning of a run. I think Demlo should just go ahead instead of being pedantic, it makes for a smoother experience.

Answer 15 · 2018-02-23T16:20:15.000Z

The available scripts are always printed at the beginning of a run.

This is true. But if everything is modularized, looking through them could be a hassle. Do you not think it would be helpful to provide a sort of search functionality? Use case: "I wonder what sort of scripts there are relating to tags... demlo -h tag, oh I see there are several different kinds. Let me look into this particular one.
Otherwise you could end up printing a LOT of documentation. I know that the help text for my tagging scripts will be fairly long.

Answer 16 · 2018-02-23T16:20:36.000Z

Precisely, so I also want to rename `10-tag.lua`. "Extract", without more information, does not give away much about what it does. In the end, all we are doing is "setting the tags". I propose the following: - 10-tag-normalize - 15-tag-discfrompath - 18-tag-case

Answer 17 · 2018-02-23T16:22:58.000Z

What does "extract a particular set of tags" mean?

`-pre 'textr = {"title", "track", "album_artist"}'`

This is not much shorter than, say, `-s tag-title -s tag-track -s tag-albumartist`. Plus you get completion with the `-s` argument. But again, I don't really know your use-case.

Answer 18 · 2018-02-23T16:25:01.000Z

I may be right. But if we have completion, then requiring the exact name is fine too. Let me sleep over this.

Answer 19 · 2018-02-23T16:40:40.000Z

"Extract", without more information, does not give away much about
what it does. In the end, all we are doing is "setting the tags".

I propose the following:

- 10-tag-normalize
- 15-tag-discfrompath
- 18-tag-case

Hm, I don't know how normalize and case would work or interact. My tagging scripts have a fairly different philosophy and approach than yours, but let me just give it very quickly:
Step 1: Gather all the tag fields that should be present in the output file, by first extracting them from the file path (if desired) then rearranging them to fit a standard set of names, with logic governing which tags should exist based on the others.
Step 2: Analyze all the gathered tags, breaking them down into lists of components that should be capitalized (or left alone) independently. The components are extracted as captured matches from template regexes meant to ...match... many different features that can exist in tags (e.g., a parenthetical featured performer indication matches things like " (feat. Eminem)").
Step 3: The components are capitalized (or left alone), then assembled into a finalized set of tags.

So in my system, the role of the tag extraction scripts is more than "setting" the tags—it is either to fill in missing tags for later processing (the most common case), or to override the existing tags (which requires a flag to be passed, in demloconf).

This is not much shorter than, say, -s tag-title -s tag-track -s tag-albumartist.
Plus you get completion with the -s argument.

Good point. I concede this may be superior, so long as the extraction scripts can be easily refered to by regexps. The only problem is how to specify which extracted tags should override the existing ones (as described above). What do you think about this?

Answer 20 · 2018-02-23T17:12:47.000Z

Your approach certainly has its benefits and seems very powerful, but it requires quite some effort to grasp. Demlo is not that approachable already, I don't want to make it more obscur than it already is. Let's not forget that Demlo can fetch tags from MusicBrainz with the `-t` option, it should provide for good results at a decent speed. At this point, I'm wondering if all the extra complexity isn't unnecessary.

Hm, I don't know how `normalize` and `case` would work or interact.

`case` is run after `normalize`, so it works on its result. It can also run without `normalize`. The three scripts work on tags.

The only problem is how to specify which extracted tags should override the existing ones (as described above). What do you think about this?

Why wouldn't it work with the command I suggested above, assuming those scripts do just that, setting one specific override. If I misunderstood what you meant, please provide a detailed use-case.

Answer 21 · 2018-02-23T21:41:24.000Z

Let's not forget that Demlo can fetch tags from MusicBrainz with the
-t option, it should provide for good results at a decent speed.
At this point, I'm wondering if all the extra complexity isn't unnecessary.

I find that tag databases are too inconsistent to be used in my library. And as for the complexity, I believe that a correct titlecase algorithm is simply going to be complex.

case is run after normalize, so it works on its result. It can also
run without normalize. The three scripts work on tags.

But what exactly does normalize do?

Why wouldn't it work with the command I suggested above, assuming those
scripts do just that, setting one specific override.

How would you indicate that you want to extract a particular tag if and only if it doesn't already have a value in the file?

Answer 22 · 2018-02-24T00:03:39.000Z

`normalize` is a little monolithic right now, it removes all unwanted tags and normalizes the genre. Maybe we could split it into additional scripts.

How would you indicate that you want to extract a particular tag if and only if it doesn't already have a value in the file?

Something along the lines of ``` if empty(o.title) then o.title = extract_title(i.path) end ``` ?

Answer 23 · 2018-02-24T01:40:16.000Z

Maybe we could split it into additional scripts.

Yeah I think that's best. Those bits of functionality seem pretty different.

Seomthing along the lines of
...

How would that work from the command line, though? It seems like a strategy where all tags that can be extracted are extracted, but saved somewhere, and then the cmdline arguments tell Demlo which tags to actually put into the file. Using a function extract_title() wouldn't be very good for this, because each separate function would have to read the same string; extracting them all at the start can be done in one blow with multiple capturing regex groups.

Answer 24 · 2018-02-27T15:54:38.000Z

OK, now I understand what you mean. I'll think about something. I you can make your current implementation as minimal and modular as possible, a PR would be very welcome!

Answer 25 · 2018-02-27T15:55:19.000Z

-h now accept regular expressions too. If several scripts are matched, it warns the user with the list of matches.

Answer 26 · 2018-02-27T16:39:40.000Z

Problem: the -h utility tries to parse the entire script, which will fail if it refers to any entities defined in previous scripts (which is largely how demloconf works). It should stop parsing after finding a call to help(), no?

Answer 27 · 2018-02-27T20:07:56.000Z

It's not possible, Lua and Go are not expressive enough to do that. At least not easily with the current code. I'm realizing now that Go+Lua was probably a mistake, but that was a few years ago and I didn't have enough experience to see that coming. That said, can you tell precisely what fails in your case? As far as Demlo is concerned, scripts are meant to be able to run independently from each other.

Answer 28 · 2018-02-28T16:16:38.000Z

That said, can you tell precisely what fails in your case?

I define a table settings in 001-globals.lua that stores things used by every subsequent script. Running demlo -h path gives

error parsing script: /home/dylan/.config/demlo/scripts/32-path.lua:12: attempt to index global 'settings' (a nil value)

Answer 29 · 2018-02-28T16:58:21.000Z

This is to be expected. Look how the other global variables are declared. You should do something like this: local localsettings = settings or {} -- '{}' is the default value, can be anything. -- ... foo = localsettings["option"] If your settings have sub-tables or function calls, your code should either ensure the structure integrity is correct and provide appropriate defaults, or always handle nil cases safely. If you really want to stick to a monolithic approach, you can simply exit your script when `settings` is nil: if settings == nil then return end

Answer 30 · 2018-02-28T17:54:01.000Z

Ok yeah. I'll do that.

Somewhat related: Do you think there would be good reason to expose a non-debug print function to the scripts, so they could tell the user things like "settings is not defined; aborting"?

Answer 31 · 2018-02-28T21:40:25.000Z

Well, this goes together with the `input` functions we mentioned in another issue. `input` needs some form of printing, so it has to be there at some point. I don't see much use for printing beside that.

Answer 32 · 2018-03-01T16:05:38.000Z

Here's an argument in favor.

Often when I run my scripts, I don't immediately understand what it's doing in a certain case; I need to pass -debug. But then a ton of information is printed, as should be expected in any debug mode. What's missing is debug levels. silent, error, warning, info, and debug are pretty standard. What about including two more print functions: error and info, which do the same thing as debug but print in different colors?

As I see it, the scripts can get complex enough to the point that the user won't be able to figure out exactly what will happen in every case, so it would be very helpful if they could give the user some hints about their behavior.

Answer 33 · 2018-03-01T17:30:26.000Z

You can already implement that yourself. For instance, with a `debug_level` global variable: ``` function info(s) if debug_level >= 1 then debug(info_prefix .. s) end end

…

-- ... info("My info string") ``` Then call ``` $ demlo -debug -pre 'debug_level=1' ... ``` Would that be enough?

Answer 34 · 2018-03-02T16:18:38.000Z

Yes but this doesn't let you print in different colors! Maybe allow an optional argument to debug() to specify the color? I guess it's not that important.

Answer 35 · 2018-03-03T15:39:05.000Z

Well, coloring is done via ANSI escapes, which is ugly and brittle. I don't think this should be a key requirement for Demlo... Focusing on an Emacs interface seems much more sensible in my opinion. That said, what do you want to print in color? The prefix? I could also export a more low-level `print` function I guess.

Answer 36 · 2018-03-03T17:43:29.000Z

I don't think this should be a key requirement for Demlo...

Oh certainly not. Just an enhancement.

Focusing on an Emacs interface seems much more sensible in my opinion.

You're on your own on that front, heheh. I have zero use for an emacs interface.

That said, what do you want to print in color? The prefix?

Yes. Currently debug messages are printed with @@ in cyan, and info messages (like "Preview mode, no file was processed") are printed with :: in magenta. Perhaps !! in red for errors and ## in yellow for warning?

Answer 37 · 2018-03-17T07:29:52.000Z

Closing this. Regarding the other points, we can further discuss them in their dedicated issues.

Regarding the Emacs interface: it does not mean that you have to use Emacs as your primary editor. Think of Emacs as a toolkit. Wait for it, you might be surprised... :)