Ambrevar/demlo

Self-documenting scripts

Ambrevar opened this issue · 37 comments

Add a -h commandline flag which displays the documentation of a script, then exits.

The flag could be called several times.
The documentation could be stored in a global variable.

Think of a mechanism to make documentation convenient to write contextually. For instance: allow for documenting before each function and each variable, then concatenate all the docstrings together.
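
A minimal sketch of the "document in context, then concatenate" idea. It assumes a hypothetical convention where documentation lines start with `--- ` (the thread does not specify any syntax), and the name `collect_docs` is made up for illustration. Python is used only for brevity; this is not Demlo's implementation.

```python
def collect_docs(source: str, marker: str = "--- ") -> str:
    """Concatenate every documentation line found in a script.

    Assumption: a doc line is any line that, once stripped, starts with
    `marker`, or equals the bare marker (an empty doc line).
    """
    docs = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            docs.append(stripped[len(marker):])
        elif stripped == marker.strip():
            docs.append("")
    return "\n".join(docs)


# Hypothetical script content: docs sit right above what they describe.
SCRIPT = """\
--- Set the output path according to tags.
---
--- lib: path to the music library.
lib = lib or '/home/ambrevar/music'

--- ossep: OS path separator.
ossep = ossep or '/'
"""

print(collect_docs(SCRIPT))
```

The docstrings stay next to the variables they describe, and the help output is just their concatenation in file order.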

ce0b2c1 fixes this.

Example:

$ demlo -h 60-path

Set the output path according to tags.

Note that 'track' refers to the track number, not the title.

GLOBAL OPTIONS

- ossep: string (default: '/')
  OS path separator.

- lib: string (default: '/home/ambrevar/music')
  Path to the music library.

- fsf: regular expression (default: '\s*/\s*')
  Some filesystems don't accept all characters and those need be replaced.
  Every element of the output path which matches this regular expression will be
  replaced by " - ".

RULES

We make sure no unnecessary subfolders are created.
Extension is set from format.
We pad zeros (2 digits) in track number for file browsers without numeric sorting capabilities.

We try to guess if the genre belongs to classical music.
Since classical pieces usually get recorded several times, the date is not
very relevant. Thus it is preferable to sort classical albums by name on the filesystem.
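
To make two of the rules in the help text above concrete, here is a small sketch of the `fsf` replacement and the track-number padding. The regular expression and the " - " replacement come from the help text; the function names `sanitize` and `pad_track` are hypothetical, and the real script may do this differently.

```python
import re

# Default value of the 'fsf' option as shown in the help text.
FSF = re.compile(r"\s*/\s*")


def sanitize(element: str) -> str:
    """Replace every match of 'fsf' in one path element by ' - '."""
    return FSF.sub(" - ", element)


def pad_track(track: str) -> str:
    """Pad the track number with zeros to 2 digits."""
    return track.zfill(2)


print(sanitize("AC/DC"))        # AC - DC
print(sanitize("Either / Or"))  # Either - Or
print(pad_track("7"))           # 07
```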

@fictionic: What do you think?

@fictionic: I've just committed a substantial amount of changes:

  • Clarified messages and made warnings / errors more explicit.
  • Renamed rmsrc -> removesource.
  • Fixed lost covers when removesource is used.
  • Only .lua extensions are accepted for scripts. (This fixes a clash when editors would leave backups behind such as 10-tag.lua~).
  • DEMLORC env variable was renamed to DEMLO_CONFIG.
  • Renamed config file.
  • System config is used as a fallback.
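
As an illustration of the extension rule in the list above, a sketch of filtering a script folder so that editor backups are ignored. The function name `script_files` is hypothetical; only the ".lua only" rule comes from the changelog.

```python
def script_files(names):
    """Keep only names ending in '.lua', in sorted order.

    A backup such as '10-tag.lua~' ends in '~', not '.lua', so it is dropped.
    """
    return sorted(n for n in names if n.endswith(".lua"))


print(script_files(["60-path.lua", "10-tag.lua~", "10-tag.lua", "README"]))
```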

See the "Breaking changes" section in the readme.

Another significant improvement in my opinion: the 15-discfrompath script which guesses the right disc number from the parent folder. I think this should do the right thing most of the time. Let me know what you think.

I'm ultimately planning to embed the "godoc" document into the program, accessible via a -intro or -doc commandline option.

I'm thinking of dividing it into sections that can be specified as an option:

$ demlo -intro covers

  1. Very good idea overall. With how my scripts are headed, it'll be important to have documentation apart from just reading the comments.
  2. How does this integrate with #3? I think doing demlo -h path is much better than having to know the number prefix just to get help.
  3. As for 15-discfrompath, it's basically covered within the functionality of demloconf's tag extraction (which I still haven't modularized; sorry! I've been real busy lately). So I don't see the purpose unless you don't want to incorporate that entire feature, which does quite a bit more than that, and is very extensible.
  4. Can you explain more what you mean by the -intro/-doc stuff? Each script will have multiple sections of documentation?

Yes, I'll make the -h queries work on regexp.

Wait, should they work on regexp, or just a literal match of the script name minus the number prefix? What help info should be displayed if a regex matches multiple scripts?

As for 15-discfrompath, it's as modular as it gets. We can incorporate
further elements of tag inference later, no problem with that.

Ah, I see. May I suggest establishing a naming scheme for the tag extraction scripts, so they can all be called at once with a regexp? Like 15-extract- or something

Maybe all of them separated by sections holding their names.

That could work. The other option would be to tell the user which scripts match, and let them run the command again with an exact name. This would let -h double as a way of searching the available scripts.

I think extract is not obvious enough. Why not tag?

Well because 'tag' also refers to the process of cleaning up the existing tags. Is anything else besides tags extracted? If not, I think 'extract' is a good bet. You could always do tag-extract-<tagname>, so they'd be run when the user gives a regexp tag.

Here is a counterargument to complete modularization of the tag extraction scripts. If the user wants to extract a very particular set of tags, it's much more annoying to list each script on its own, rather than supply something like -pre 'textr = {"title", "track", "album_artist"}', as they would do using my demloconf.

The available scripts are always printed at the beginning of a run.

This is true. But if everything is modularized, looking through them could be a hassle. Do you not think it would be helpful to provide a sort of search functionality? Use case: "I wonder what sort of scripts there are relating to tags... demlo -h tag, oh I see there are several different kinds. Let me look into this particular one."
Otherwise you could end up printing a LOT of documentation. I know that the help text for my tagging scripts will be fairly long.

"Extract", without more information, does not give away much about
what it does. In the end, all we are doing is "setting the tags".

I propose the following:

- 10-tag-normalize
- 15-tag-discfrompath
- 18-tag-case

Hm, I don't know how normalize and case would work or interact. My tagging scripts have a fairly different philosophy and approach than yours, but let me just give it very quickly:
Step 1: Gather all the tag fields that should be present in the output file, by first extracting them from the file path (if desired) then rearranging them to fit a standard set of names, with logic governing which tags should exist based on the others.
Step 2: Analyze all the gathered tags, breaking them down into lists of components that should be capitalized (or left alone) independently. The components are extracted as captured matches from template regexes meant to ...match... many different features that can exist in tags (e.g., a parenthetical featured performer indication matches things like " (feat. Eminem)").
Step 3: The components are capitalized (or left alone), then assembled into a finalized set of tags.

So in my system, the role of the tag extraction scripts is more than "setting" the tags—it is either to fill in missing tags for later processing (the most common case), or to override the existing tags (which requires a flag to be passed, in demloconf).
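
Steps 2 and 3 above can be sketched roughly as follows. This is not demloconf's code: the single "feat." template, the helper names and the naive word-by-word capitalization are all assumptions made for illustration, and a correct titlecase algorithm would need many more rules.

```python
import re

# One hypothetical template: a title followed by a parenthetical
# featured-performer indication such as " (feat. Eminem)".
FEAT = re.compile(r"^(?P<title>.*?)\s*\(feat\.\s*(?P<artist>[^)]*)\)\s*$", re.IGNORECASE)


def split_components(tag):
    """Step 2: break a tag into (text, should_capitalize) components."""
    m = FEAT.match(tag)
    if m is None:
        return [(tag, True)]
    return [
        (m.group("title"), True),    # capitalized independently
        (" (feat. ", False),         # fixed spelling, left alone
        (m.group("artist"), False),  # artist name left alone
        (")", False),
    ]


def titlecase(text):
    """Naive stand-in: capitalize the first letter of every word."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in text.split(" "))


def assemble(components):
    """Step 3: capitalize what should be, then join everything back."""
    return "".join(titlecase(t) if cap else t for t, cap in components)


print(assemble(split_components("forgot about dre (feat. Eminem)")))
```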

This is not much shorter than, say, -s tag-title -s tag-track -s tag-albumartist.
Plus you get completion with the -s argument.

Good point. I concede this may be superior, so long as the extraction scripts can be easily referred to by regexps. The only problem is how to specify which extracted tags should override the existing ones (as described above). What do you think about this?

Let's not forget that Demlo can fetch tags from MusicBrainz with the
-t option, it should provide for good results at a decent speed.
At this point, I'm wondering if all the extra complexity isn't unnecessary.

I find that tag databases are too inconsistent to be used in my library. And as for the complexity, I believe that a correct titlecase algorithm is simply going to be complex.

case is run after normalize, so it works on its result. It can also
run without normalize. The three scripts work on tags.

But what exactly does normalize do?

Why wouldn't it work with the command I suggested above, assuming those
scripts do just that, setting one specific override?

How would you indicate that you want to extract a particular tag if and only if it doesn't already have a value in the file?

Maybe we could split it into additional scripts.

Yeah I think that's best. Those bits of functionality seem pretty different.

Something along the lines of
...

How would that work from the command line, though? It seems like a strategy where all tags that can be extracted are extracted, but saved somewhere, and then the cmdline arguments tell Demlo which tags to actually put into the file. Using a function extract_title() wouldn't be very good for this, because each separate function would have to read the same string; extracting them all at the start can be done in one blow with multiple capturing regex groups.
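
A sketch of the strategy described above: extract everything in one pass with several capturing groups, keep the result aside, then let the caller choose which tags to write and which may override existing values. The path layout, the regular expression and the names `extract_all` and `merge` are assumptions for illustration, not part of Demlo or demloconf.

```python
import re

# Hypothetical layout: <album_artist>/<album>/<track> - <title>.<ext>
PATH_RX = re.compile(
    r"(?P<album_artist>[^/]+)/(?P<album>[^/]+)/"
    r"(?P<track>\d+)[ .-]+(?P<title>[^/]+)\.[^./]+$"
)


def extract_all(path):
    """Read the path once and return every tag the regex can capture."""
    m = PATH_RX.search(path)
    return m.groupdict() if m else {}


def merge(existing, extracted, wanted, override=()):
    """Write only the wanted tags.

    A wanted tag fills a missing or empty value; it replaces an existing
    value only when its name is also listed in `override`.
    """
    result = dict(existing)
    for name in wanted:
        if name not in extracted:
            continue
        if name in override or not result.get(name):
            result[name] = extracted[name]
    return result


extracted = extract_all("/home/user/music/Daft Punk/Discovery/03 - Digital Love.flac")
existing = {"title": "digital love", "album": ""}
print(merge(existing, extracted, ["title", "track", "album"]))
print(merge(existing, extracted, ["title", "track", "album"], override=("title",)))
```

The first call only fills in what is missing; the second also overrides the existing title.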

OK, now I understand what you mean. I'll think about something. If you can make your current implementation as minimal and modular as possible, a PR would be very welcome!

-h now accepts regular expressions too. If several scripts are matched, it warns the user with the list of matches.
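
The behaviour described here could look roughly like the sketch below. The function names and the wording of the warnings are hypothetical; only the rule "one match shows help, several matches list the candidates" is taken from the thread.

```python
import re


def match_scripts(pattern, scripts):
    """Return every script name the regular expression matches."""
    rx = re.compile(pattern)
    return [s for s in scripts if rx.search(s)]


def help_target(pattern, scripts):
    """Return (script, None) on a unique match, else (None, warning)."""
    matches = match_scripts(pattern, scripts)
    if len(matches) == 1:
        return matches[0], None
    if not matches:
        return None, f"no script matches {pattern!r}"
    return None, "several scripts match: " + ", ".join(matches)


SCRIPTS = ["10-tag-normalize", "15-tag-discfrompath", "18-tag-case", "60-path"]

print(help_target("60-path", SCRIPTS))
print(help_target("path", SCRIPTS))
```

Note that a loose pattern such as "path" also matches 15-tag-discfrompath, which is why the list of matches doubles as a search tool.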

Problem: the -h utility tries to parse the entire script, which will fail if it refers to any entities defined in previous scripts (which is largely how demloconf works). It should stop parsing after finding a call to help(), no?

That said, can you tell precisely what fails in your case?

I define a table settings in 001-globals.lua that stores things used by every subsequent script. Running demlo -h path gives

error parsing script: /home/dylan/.config/demlo/scripts/32-path.lua:12: attempt to index global 'settings' (a nil value)

Ok yeah. I'll do that.
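
One way to stop at the first help() call, sketched with Python standing in for the Lua scripts. The idea is to inject a help() that records the text and aborts execution by raising, so later lines that depend on earlier scripts never run. This illustrates the technique only; how Demlo actually implements it is not shown in this thread, and `extract_help` is a hypothetical name.

```python
class _HelpFound(Exception):
    """Raised by the injected help() to abort the script early."""

    def __init__(self, text):
        super().__init__(text)
        self.text = text


def extract_help(source):
    """Run `source` until its first help() call and return the help text."""

    def help(text):
        raise _HelpFound(text)

    try:
        exec(source, {"help": help})
    except _HelpFound as found:
        return found.text
    return None  # the script never called help()


# Stand-in for a script that relies on a 'settings' table defined elsewhere.
SCRIPT = '''
help("Set the output path according to tags.")
output = settings["lib"]  # would fail: 'settings' is not defined here
'''

print(extract_help(SCRIPT))
```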

Somewhat related: Do you think there would be good reason to expose a non-debug print function to the scripts, so they could tell the user things like "settings is not defined; aborting"?

Here's an argument in favor.

Often when I run my scripts, I don't immediately understand what they're doing in a certain case; I need to pass -debug. But then a ton of information is printed, as should be expected in any debug mode. What's missing is debug levels. silent, error, warning, info, and debug are pretty standard. What about including two more print functions: error and info, which do the same thing as debug but print in different colors?

As I see it, the scripts can get complex enough to the point that the user won't be able to figure out exactly what will happen in every case, so it would be very helpful if they could give the user some hints about their behavior.

Yes but this doesn't let you print in different colors! Maybe allow an optional argument to debug() to specify the color? I guess it's not that important.

I don't think this should be a key requirement for Demlo...

Oh certainly not. Just an enhancement.

Focusing on an Emacs interface seems much more sensible in my opinion.

You're on your own on that front, heheh. I have zero use for an emacs interface.

That said, what do you want to print in color? The prefix?

Yes. Currently debug messages are printed with @@ in cyan, and info messages (like "Preview mode, no file was processed") are printed with :: in magenta. Perhaps !! in red for errors and ## in yellow for warning?
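
The prefix scheme proposed above could be laid out as a small table. The prefixes and colors are the ones named in the comment; the function name `format_message` and the use of standard ANSI SGR color codes are assumptions for the sake of the sketch.

```python
# level -> (prefix, ANSI SGR foreground color code)
PREFIXES = {
    "debug": ("@@", "36"),    # cyan
    "info": ("::", "35"),     # magenta
    "warning": ("##", "33"),  # yellow
    "error": ("!!", "31"),    # red
}


def format_message(level, text, color=True):
    """Prepend the level's prefix, colored when `color` is true."""
    prefix, code = PREFIXES[level]
    if color:
        prefix = f"\x1b[{code}m{prefix}\x1b[0m"
    return f"{prefix} {text}"


print(format_message("error", "settings is not defined; aborting"))
print(format_message("info", "Preview mode, no file was processed", color=False))
```

Only the prefix is colored, so the message itself stays readable when output is redirected or colors are turned off.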

Closing this. Regarding the other points, we can further discuss them in their dedicated issues.

Regarding the Emacs interface: it does not mean that you have to use Emacs as your primary editor. Think of Emacs as a toolkit. Wait for it, you might be surprised... :)