lsegal/yard

Question: checking common documentation errors for CI

zverok opened this issue ยท 33 comments

Hey.

What I'm trying to achieve currently (on large codebase constantly changed by hundreds of people) is adding continuous integration check for YARD docs. What should be checked, ideally:

  • level 0: No dumb errors (@param vs @params, documentation for parameter that is not there and so on);
  • level 1: "You have changed file X, but it has undocumented methods/classes/modules now";
  • ....more to come...

Let's skip levels 1+ for now completely (just a mention for future discussions) and focus on level 0. yard command currently has two levels of error reporting (amirite?): [error] (can't continue parsing this file) and [warn] (file is parsed but not everything is allright). Most of warn are indeed useful, so, it almost looks like "if there are any warnings, YARD CI step is failed", but there is one case, looking like this:

[warn]: in YARD::Handlers::Ruby::MixinHandler: Undocumentable mixin: YARD::Parser::UndocumentableError for class CodilityService
[warn]:     in file '<some_file_name>.rb':3:

    3: include Rails.application.routes.url_helpers

As far as I understand, "official" solution for this is "just ignore all warnings", which also gets rid of things like "Unknown tag @params", "Cannot resolve link to", "@param tag has unknown parameter name" which are pretty useful, to say the less.

So, to be honest, my "ideal" feature request would be:

  • distinguishable warnings (and ability to turn them off by special comment tags?..);
  • machine-ready output format (JSON?..), turned on by yardoc option;
  • special option for "just check, no docs generation" (or is it yardoc --no-save?).

...but I'm open to any advice and suggestion :)

(Also, I'm not afraid and even willing to provide PRs and other contributions, if it could be appropriate.)

You should check out the "yardstick" gem which is built for doc linting. It might be a bit out of date but should be workable for you. You can also build your own linting using the API: http://yardoc.org/docs -- check the YARD::Docstring.after_parse callback for instance.

As for generating without html, you want the -n switch.

@lsegal Thanks, I assume yardstick is the right tool (though it is underdeveloped for our needs... but that's how opensource works, right?.. I'll just make some PRs there)

OK, I've investigated the matters for some time and have more focused question, about this (and similar) code line: https://github.com/lsegal/yard/blob/master/lib/yard/docstring_parser.rb#L210

if library.has_tag?(tag_name)
  @tags += [library.tag_create(tag_name, tag_buf)].flatten
else
  log.warn "Unknown tag @#{tag_name}" +                                 # <== HERE!
    (object ? " in file `#{object.file}` near line #{object.line}" : "")
end

What is my concern about it? It is that log.warn is a) unstructured and b) unconditional (except for global log level). So, nobody can build a tool that uses YARD's parser which then gathers all the problems and return them in orderly manner; or, for example, nobody can do things like yardoc --ignore-warning UndocumentableMixin.

So, to my proposal. What do you think about introducing structured warnings mechanizm in YARD, so that aforementioned line will look like:

Warnings.add 'UnknownTag', tag_name, object

Then, default behavior would be the same: just print warnings the moment they appear; but that approach allows several good things to emerge:

  • option like --warning-format json which reports them in machine-readable format;
  • option like --ignore-warning UndocumentableMixin;
  • using YARD's native parser for building documentation quality analysis tools (better than yardstick, which only can recognize whether known tag like @return is absent, but just silently ignores unknown or wrongly formatted ones).

Would you accept PR like that, being it properly done and tested?..

I'm not opposed to this, but consider that a tool like yardstick can already implement this fairly easily by overriding the default docstring parser:

module Warnings
  @@warnings = {}

  module_function

  def add(type, text)
    (@@warnings[type] ||= []) << text
  end

  def print
    if @@warnings.size > 0
      puts "Warnings:"
      @@warnings.each do |k, v|
        v.each do |line|
          puts "  #{k}: #{line}"
        end
      end
    end
  end
end

class LintingDocstringParser < YARD::DocstringParser
  def create_tag(tag_name, tag_buf = '')
    if !library.has_tag?(tag_name)
      # handle warnings for non-existent tag
      Warnings.add 'UnknownTag', tag_name
      return
    end

    begin
      library.tag_create(tag_name, tag_buf)
    rescue YARD::Tags::TagFormatError
      # handle tag format error
      Warnings.add 'InvalidTagFormat', tag_name
    end

    super
  end
end

YARD::Docstring.default_parser = LintingDocstringParser
YARD::Parser::SourceParser.after_parse_list { Warnings.print }

##
## Parsing this code:
##

class Foo
  # @parm hi
  # @return b<String> desc
  # @parmy hi
  def foo; end
end

If you run with -q (you probably want to disable normal output anyway), you can make this work.

That said, I wouldn't be opposed to the structured logging idea, but note that it's probably a large undertaking. If you do open a PR, I would recommend starting with a proof of concept on a few small modules to get the general API down before you attempt to convert the entire codebase.

Also, consider that simply adding an extra opts={} parameter to log.warn could be a sufficient API for this:

def warn(msg, opts={})
  if opts[:name]
    (@warnings[opts[:name]] ||= []) << opts.merge(message: msg)
  end
end

Then you could use it as:

log.warn "message here", name: "UnknownTag", tag_name: tag_name, object: object

The latter keys being optional and specific to the warning metadata. That would retain the same basic API but augment logging to support structured data without breaking the API, or even introducing a new API.

@lsegal Yes, I understand an option of "just rewriting parser" (and it is easier for me than properly contribute to YARD :)) but there are pretty important considerations:

  1. I assume that proposed "structured warnings" feature could be useful for any YARD user. For example, I totally dislike using -q option to suppress all the warnings while generating docs (most of them are pretty important), and 'd rather prefer disabling in .yardopts one or two of them I know are harmless, and consider others as a sign of a problem.
  2. To be honest, things like "automatically check how much docs and how good they are" (what yardstick targets) could be targeted by an independent tool, but common "validate documentation syntax is correct" task seems for me to be in the YARD's responsibility. For me, it looks like YARD is almost ready to do it by itself.
  3. Solution with copy-and-rewrite some methods is easy, but pretty fragile: it needs either dependency on fixed YARD version, or recheck (& rewrite) on each next version.

If you do open a PR, I would recommend starting with a proof of concept on a few small modules to get the general API down before you attempt to convert the entire codebase.

OK, thanks! Will do it this way.

  1. ... but common "validate documentation syntax is correct" task seems for me to be in the YARD's responsibility.

You are correct, but my rebuttal to this would be that YARD already handles this responsibility sufficiently. In practice, all warnings that YARD generates are important; they tell you that "X was not parsed correctly". I don't know your exact use-case, but I don't see any reason to hide some warnings but not others, since they are all weighted equally in the context of parsing correctness. At a linting level, I could see certain things weighted higher, but to take the above example, for instance, an typo'd tag name is never something you'd want hidden (IMO).

On that note,

and 'd rather prefer disabling in .yardopts one or two of them I know are harmless, and consider others as a sign of a problem.

Can you point to which warnings you believe are harmless? YARD shouldn't generate warnings for harmless things.

  1. Solution with copy-and-rewrite some methods is easy, but pretty fragile: it needs either dependency on fixed YARD version, or recheck (& rewrite) on each next version.

This shouldn't be the case. The implementation above uses the public API and isn't fixed to any YARD version so long as the API sticks around-- which is really the same issue as for any Warnings.add API, or even the specific warning labels (which is actually harder to audit / ensure stability for, since they are string values).

Can you point to which warnings you believe are harmless? YARD shouldn't generate warnings for harmless things.

Basically, there are two absolutely different kinds of situation YARD is unhappy about:

  1. My code could not be analysed/documented properly (being proper working code);
  2. My docs I've wrote especially for YARD is not properly written.

Examples of first:

# common Rails metaphor:
include Rails.application.routes.url_helpers #  Undocumentable mixin
# common metaprogramming:
attr_reader *OPTIONS # Undocumentable OPTIONS
class SomeData < Struct.new(:kind, :subject, :reason) # Undocumentable superclass

Examples of second:

  • @params instead of @param;
  • wrong parameter name in @param;
  • unresolvable reference to {SomeClassName}.

In the former case, I've wrote code to work, I will not change it just for the sake of documentation.
In the latter, I've wrote comments for YARD to parse them, and I want to be notified on any case when comments are erroneous.

Now, when I've formulated it, maybe entire "structured warnings" solution seems like overkill to solve the thing! What I'd be happy to see, in fact:

  1. Ability to separate warnings of class (1) from class (2) and suppress class (1) through options;
  2. Ability to receive formal output "no errors or warnings class 2" and pass it to CI tool by common Linux conventions (output 0 is ok, output > 0 is not ok).

Examples of first: ...(include, attr_reader, class < Struct)... In the former case, I've wrote code to work, I will not change it just for the sake of documentation.

I think this is something where I would disagree about severity, though I understand where you are coming from. These warnings shouldn't be as innocuous as you think. Let me unpack:

attr_reader *OPTIONS # Undocumentable OPTIONS

This is usually a really bad thing to do. Your documentation will completely omit attributes on the class. If they are public methods, it's even worse. I understand the code is technically correct here, but there's a difference between technically correct and properly documented. Put another way, the intent of your code is not being correctly expressed in your generated HTML documentation (or yri, etc). From an unobjective standpoint, that is a problem that needs correcting.

class SomeData < Struct.new(:kind, :subject, :reason) # Undocumentable superclass

YARD actually does recognize the above Struct subclass, but I understand your point. Let's consider instead:

class SomeData < DatabaseTable.new(:id, :name, :email)
end

This is an equally big documentation problem. SomeData will be documented as inheriting Object. Your API consumers will have no idea how to use the class. For public classes, this is almost equivalent to having no docs at all (especially if the class body is empty as above). You should really be seeing this warning and trying to deal with it so your API consumers can understand your codebase properly.

The idea of technically correct DRY code and documentable code are often at odds in Ruby. I'm not sure it's worth being so tied to certain syntaxes when they affect documentability. This is obviously a personal opinion, but if you can get proper docs by just writing out your attribute names, the minimal amount of duplication has big rewards. DRY should not apply to documentation-- repeating yourself is part of good documentation. YARD helps with keeping docs more DRY than say RDoc, but it still can't solve all edge cases. That's where plugins come in.

Anyway, let me step back from the opinionated behavior for a second and point out that it actually sounds like you might want something completely different here:

It sounds to me like what you really want, pragmatically speaking, is a way to error out a CI build if an error pops up that is important to you. We can actually separate this feature request away from logger output: if the warnings displayed but only certain warnings created a non-zero exit code, presumably the log output would be okay to have around? The way I see it, I still believe YARD should be warning you about the class (1) cases, but I agree that they can still avoid changing the exit code.

If you agree with this, I would be open to (a PR) adding an exit code attribute somewhere in YARD that would be toggled on serious errors (like tag parsing and log.error calls) but would not affect UndocumentableErrors. I guess optionally users could opt-in to a strict mode where all warnings toggled the non-zero exit, but I'm okay with your class 1/2 distinction. In this scenario, logging would remain unchanged-- the exit code would not be tied to actual log output-- so you could still get warning text with a 0 exit code. And it sounds like that's really what you want, here.

Finally, a @!ignorewarning directive could be added for users who want to explicitly opt-out of doc issues. It wouldn't affect warning log output, but it would ignore the associated exit code toggle (if there was one).

It sounds to me like both of these together would give you enough control without needing to touch logging behavior. It's actually fairly easy to do, too, because there are only a few places you'd need to toggle the YARD.exit_code attr (or whatever it gets called). In fact, to implement this, you'd really only need to touch the DocstringParser class and log.error.

That said, a log.warn msg, name: 'Type' API would be equally useful if you wanted to go in that direction, it's just a little more complicated to implement.

Hope that helps!

About the first part of your reply: I understand your concerns, but we are speaking about really large (and pretty pragmatic) codebase, written by myriads of people. Although we are really aiming high standards, there are certain edge cases and complicated domains when many kinds of hard metaprogramming are absolutely necessary. Then, about documenting things, let's imagine two different situations:

  1. John Doe somehow wrote # @returns [String] in his code; he just needs to be clearly notified of the mistyping (maybe even on pre-commit hook).

  2. John Doe had to write attr_reader *OPTIONS in his code; and CI is unhappy about this -- what could be done about it? It is not faulty code by itself (and, in any case, we have several code quality check tools). Though, as you are saying, it should be documented somehow. I assume two possible solutions:

  • John sees the undocumentability notice and adds "synthetic documentation" via YARD tag: in this case, he needs to turn off undocumentability notice (like # yard:disable UndocumentableMixin before line and # yard:enable UndocumentableMixin after it, but I think # !@ignorewarning will do, if there would be opposite directive to put after problematic line);
  • John doesn't see the notice, but separate Alice of "documentation quality team" from time to time checks docs with "harder" level of warnings, and adds "synthetic documentation" herself: in this case we need either levels of warnings, or named warnings to disable them one-by-one;
  • OK, I see your point that third option is "John rewrites the code", but, you know ....

So, my "class 1" errors are not unimportant (and undocumentability of some things should be fixed some way or another), my idea is that they should be skippable somehow (either with "it's bad, but we cannot do anything about it" or with "we document it another way" mindset).

So, what it turns out of our discussion:

  • separate warnings into 2 severity classes somehow (to be honest, I'd even prefer my "class 2" to be counted as errors... but I suppose you know better); it looks like introducing entire "warning type system", all in all, would be overkill;
    • I can think of two styles now: log.warn 'msg', severity: 1 or log.warn 'msg', domain: :code (vs domain: :docs) or something like that;
  • introduce yardoc output code, depending on whether there are warnings/errors of desired severity;
  • introduce # @!ignorewarnings directive (and its disabling counterpart), which, like, ignores warnings completely (see below);
  • about log output: I'm pretty sure that option like --ignore-severity code (and # @!ignorewarnings directive) should hide those warnings completely (think about my example above: John Doe wants not only see green/red on CI, but also in his pre-commit messages, "what's wrong here", and don't want to lose "more severe" errors between "less severe")

WDYT?.. Does it look more reasonable now?

So here's what I think could be done fairly easily:

  1. Update all log methods to take an optional hash of metadata that would get passed off to registerable callbacks.
  2. Add callbacks (above) that could be registered to accept and track warnings
  3. Add codes to various warnings for tracking.

This would not immediately provide the behavior you want in YARD core, but it would give you the tools necessary to implement what you want quickly (including directive support for toggling warnings on/off in source). Sample implementation:

require 'yard'

class DocLinter
  class << self
    # TODO: use this toggle in a Tag::Directive to toggle behavior on and off.
    attr_accessor :enabled
  end

  self.enabled = true

  def initialize
    @exit_code = 1
    @messages = Hash.new { |h, k| h[k] = [] }

    # Add hooks
    YARD::Logger.instance.on_warn(&method(:on_warn))
    YARD::Logger.instance.on_error(&method(:on_error))
  end

  EXIT_CODES = {
    'InvalidTagFormat' => 1,
    'UnknownTag' => 1,
    'NamespaceMissing' => 1
  }.freeze

  def on_warn(message:nil, code:nil, **extra)
    return unless self.class.enabled

    val = EXIT_CODES[code]
    if val
      @exit_code = val

      # TODO: print custom log messages
      @messages[code] << message

      # stop the warning message from being printed.
      # alternate API ideas:
      #   return false
      #   raise YARD::Logger::SuppressMessage
      return :suppress
    end
  end

  def on_error(*)
    return unless self.class.enabled
    @exit_code = 127
  end

  def print_messages
    return if @messages.empty?

    puts "Warnings (#{@messages.values.flatten.size}):"
    @messages.each do |code, msgs|
      msgs.each do |msg|
        puts "  [#{code}]:\t#{msg}"
      end
    end
  end

  def finish
    print_messages
    exit @exit_code
  end
end

module DocLintInjector
  def run(*)
    lint = DocLinter.new
    super

    # YARD will do this:
    lint.on_warn message: 'invalid tag', code: 'InvalidTagFormat'
    lint.on_warn message: 'undocumentable superclass', code: 'Undocumentable'

    lint.finish
  end
end

YARD::CLI::Yardoc.send(:prepend, DocLintInjector)

Unfortunately there's no good way to hook Yardoc execution (mostly only because of the fact that your -e argument is parsed after Yardoc is already running), so you'd need to execute with the following command instead:

$ ruby -r./path/to/doc_linter.rb -S yard

I can add a separate TODO item for an after_run hook on YARD::CLI::Commands which would allow regular yard -e doc_linter.rb and replaces the inject module with just:

class DocLinter
  def initialize
    # ... other stuff ...
    YARD::CLI::Yardoc.after_run(&method(:finish))
  end
end

What do you think about this?

Everything looks reasonable, but...
I just want to do a step back and discuss the "big picture".
I like to think about matters this way:

  1. Of course, first and primary YARD's target is to generate documentation.
  2. Of course, "linting" of docs (like, "whether there are enough of them, whether all types are specified, whether @return specified for all methods except initialize and predicates...") should be a task of a separated tool (probably, using YARD's own parser/DocStrings to do the job).
  3. But about, like, "physical correctness" of docs -- I'm not sure that it should be called "linting" & extracted to separate tools. I'd be rather happy to see (& do) it by YARD itself.

So, what am I imagining now (I know that during the ticket I am constantly changing those ideas, but that's how discussions working, right?.. I'm trying to find the compormise that all of us would be happy with). Things I am aware of, currently:

  1. CI-able yardoc (at the very simplest level, done with just adding exit code, right? With dumb simple heuristic "No errors/warnings = exit code 0, otherwise 1"). Maybe it could go as a separate small PR? I don't think something should be broken by it. But, to be absolutely backwards-compatible, output codes could be turned on only with command-line option. It is like patching YARD::Logger with a small error/warn counters, and using their values in the end.
  2. (1), being implemented, creates the problem with meta-programmed code, which is good code, but undocumentable. The simplest solution I see now is just this # !@ignorewarnings/# !@unignorewarnings directives pair: when we have our attr_reader *OPTIONS, we just document it with proper tags/directives, and ignorewarnings around. It is also easily implementable, useful for multiple situations (like "ah, it is bad ol' code, lets ignore it for now and fix docs later") and could stand for separate PR.
  3. Now, to the hard part of CI-ability: fancy output of CI fails (HTML one, presumably). This one, I assume, could be done by third-party tools, just substituting YARD's logger with something fancy-shmancy. (Or by one more separate PR with independent HTMLLogger class added by command-line option.)

Surprisingly for me, even requirement for different warning levels is not that necessary in this scheme. Like, "don't ignorewarnings too often and you are safe to know YARD'll catch any error".

Hm?

  1. But about, like, "physical correctness" of docs -- I'm not sure that it should be called "linting" & extracted to separate tools. I'd be rather happy to see (& do) it by YARD itself.

I'm not sure we're really agreeing on premise here, so I really can't see myself following through to your conclusion points. Or rather, this point is technically correct, but I don't interpret YARD's current behavior as failing to properly ensure documentation correctness. I would argue the other way around: YARD is correctly telling you that your code is undocumentable, and should continue to tell you this for as long as the undocumentable code exists. YARD does not know that you've implemented a workaround; from its perspective, your documentation has correctness problems. In that interpretation, YARD is doing its job.

I think where we disagree is the idea that better control of logger output is related to correctness. I don't think it is. Logger output in YARD is simply there to tell you what it is seeing for debugging and analysis purposes. To be clear, there are no plans to support a @!ignorewarnings directive in YARD's core codebase as you describe it, because this completely messes with consistency of the tool and is rife for abuse. It's the same reason YARD does not support disabling parsing, comments, or arbitrarily hiding objects in an easy fashion. The idea that "my doc tool doesn't understand my code, I'll just tell it to keep quiet instead of resolving the problem" happens fairly often. See :nodoc: for a clear example of how documentation is left by the wayside when tooling provides a workaround to avoid solving the root problem (documenting code). YARD's fundamental goal is to improve documentation in the Ruby community. Adding a switch to hide warnings works against that goal more than it does anything to further it.

On the flipside, having redundant warnings when you have a legitimate documentation workaround in place does not in itself work against the goal of improving documentation. Frankly, a constant reminder that you're doing "weird things" to resolve your undocumentable code (i.e., everything except just rewriting the code in a documentable form) is probably a good thing. I would actually argue that by ignoring a warning, you're not just ignoring the log output, but you're ignoring the fundamental problem, which means you're not furthering the goal of improving documentation either (even if you might think you are). As you pointed out, the "third option is 'John rewrites the code'"-- yet it seems like you do not consider this an equally valid solution. Code should be self-documentable-- I understand Ruby's edge cases, but if you're writing meta-code to document attr *OPTIONS instead of just writing the attributes in a documentable way, your code is not self-documentable. That's an anti-pattern, IMO.

I do understand that you're in a different boat, because it sounds like your team is pushing for documentation excellence. For that reason, I see why you are making the arguments you are. Realize, however, that you're very likely in a small minority of people who push that hard for documentation excellence. For YARD to raise the bar, it needs to focus on all users. I personally wish that the concept of a "documentation quality team" was a real and popular thing, but it's not. Perhaps if it becomes a thing, we can re-visit this issue and provide better support for those teams. Until then, YARD provides enough tooling and extensibility to provide more customized reporting for your specific project with very little effort. I think the best path forward here is so provide the minor updates to the logger so that a plugin could be written to do what you need. If the plugin proves that it is worthwhile to users, we can have a discussion about merging it into YARD core. I think this is the right path forward.

Hope that makes sense! Thanks for bringing up this topic, although we don't agree 100% I know where you are coming from and I appreciate your focus on trying to improve things!

Frankly, a constant reminder that you're doing "weird things" to resolve your undocumentable code (i.e., everything except just rewriting the code in a documentable form) is probably a good thing.

OK, I think we've successfully reached "let's agree to disagree" point here :) For me, Ruby's metaprogramming is one of its precious parts, and discouraging metaprogramming for the sake of documentability is somewhat questionable. But yes, I understand your reasons and see your points, so, let's return to a pragmatic compromise discussion.

So, as far as I can understand, "pragmatic compromise" would be just adding callbacks to the logger. In this case, I propose to stick "the simplest thing that should possibly work" and implement, for "first try", only Logger#on_warn and Logger#on_error, without adding any additional structure to warning messages, for now. Possible tool I'll develop on base of it will, for ver.0.0.1, just use regexps to filter out warnings it is interested in, or group them. This will allow to validate the entire approach with minimal invasion to YARD's logic.

But one additional change I'd be willing to do in this case: make warnings atomic. I mean:

# Change thins like this: lib/yard/code_objects/proxy.rb:196
log.warn "Load Order / Name Resolution Problem on #{path}:"
log.warn "-"
log.warn "Something is trying to call #{meth} on object #{path} before it has been recognized."
log.warn "This error usually means that you need to modify the order in which you parse files"
log.warn "so that #{path} is parsed before methods or other objects attempt to access it."
log.warn "-"
log.warn "YARD will recover from this error and continue to parse but you *may* have problems"
log.warn "with your generated documentation. You should probably fix this."
log.warn "-"

# Into this:
log.warn "Load Order / Name Resolution Problem on #{path}:\n"\ 
          "-\n"\ 
          "Something is trying to call #{meth} on object #{path} before it has been recognized.\n"\ 
          "This error usually means that you need to modify the order in which you parse files\n"\ 
          "so that #{path} is parsed before methods or other objects attempt to access it.\n"\ 
          "-\n"\ 
          "YARD will recover from this error and continue to parse but you *may* have problems\n"\ 
          "with your generated documentation. You should probably fix this.\n"\ 
          "-"\ 
# ...or something like that (with warn, for exmple, properly formatting it with indents to 
# read like one message)

This way it would be one call to callback instead of 9 "independent" ones.

Is this acceptable?

This was actually just resolved the other day:

de13266#diff-79d0d073dc872cfaaa6870d2fa4f29ebR168

There are a bunch of others to fix up too, though.

This was actually just resolved the other day

๐Ÿ‘

Then, I just do Logger#on_<warn/error> PR and we'll see how where it will go further?..

I'm actually working on an API for the logging stuff. I'll have something pretty soon to show! I think it will be very easy to build plugins off of.

Oh, OK! JFYI, I'm really willing to help wherever I could be useful.

@zverok Well, I'm working on integrating the following gist into the Logger class. Feel free to play around with it and provide comments or even start filling in tests for that API!

https://gist.github.com/lsegal/82c9ddf036ea2cf36111efc2f2f9f412

Basically the idea is that you register a code to a severity so YARD can track usage (and provide debug info if you typo'd a code name), where the core logging severity types (warn/debug/...) are also mapped to codes. You can do:

log.add :my_custom_code, 'msg'

or just a simple:

log.add :warn, 'msg'

And callbacks can be registered for a specialized code name or to all messages. The way to think about it is that you can register custom code names that simply map to regular severity levels, where the regular severity levels are default code names.

I'm going to be working on wrapping the core #warn #error #fatal #info and #debug calls to also call the callbacks so you could still use the standard log.warn and get callbacks when no special code is needed (generic log messages)-- just for consistency. YARD will convert all of its own .warn calls to .add over time, though (this one requires a bit of work modifying tests).

Gist was just updated to near final API, I'll try to integrate into YARD::Logger and push a branch.

@lsegal Now, it looks really useful and well-thought! So, in a short time we can expect all YARD's log messages to became structured, and callbacks implemented?..

One small notice: maybe here is a good place to use Ruby's infamous throw/catch instead of raise SuppressMessage?..

One small notice: maybe here is a good place to use Ruby's infamous throw/catch instead of raise SuppressMessage?..

throw/catch is far too obscure to rely on for a general purpose public API, and fairly difficult to document. I'd rather use a :supress return, but an exception class is much more "checkable". Given the expectation that message suppression will be a relatively rare case (and warnings themselves are already fairly uncommon), I'm okay with the perf overhead.

You can check the structured-logger branch for all WIP: 3b3d47d

Wow. That's quite an impressive work. I am starting to integrate it with our code practices immediately, maybe to find some "thin places". Thank you so much!

@zverok just added the ability to use callbacks to modify logger output, so YARD can move away from ever even passing message parameters to log.add and simply do:

def parse(*args)
  ...
rescue LoadOrderError => err
  log.add :load_order, :error => err
end

# in a central logging message file
log.on_message :load_order do |data|
  data[:message] = "the big long load order error warning"
end

This gets us almost all the way to your original proposed Warnings.add :foo API over time.

@lsegal Looks good! My attempts to use the new functionality also seem successful. The class looks like this. Sample output:

Running YARD documentation validator....

Files:        2327
Modules:       525 (  467 undocumented)
Classes:      2192 ( 1857 undocumented)
Constants:     968 (  758 undocumented)
Attributes:    259 (   12 undocumented)
Methods:      9536 ( 4349 undocumented)
 44.78% documented


Warnings
--------
(mistyped tags or other typos in documentation)

app/models/<some_nda_here>.rb:15: @param tag has unknown parameter name: foo
...
app/controllers/<some_nda_here>.rb:216: Unknown tag @internal
...
apq/<some_nda_here>.rb:240: Unknown tag @retun

Undocumentability notices
-------------------------
(related to metaprogramming: not to be fixed, but should be documented with directives)

app/services/<some_nda_here>.rb:3: Undocumentable mixin
...
lib/<some_nda_here>.rb:13: Undocumentable ATTRIBUTES

0 errors, 8 warnings, 8 notices (123.96 seconds to validate)

(And it exits with 2 on errors, 1 on warnings and 0 on what I'm calling "notices" here)

The only thing I'm disliking currently is a need for pretty cumbersome code from here and below, to re-format YARDS "free form" messages into more regular ones.

OK, everything works extremely well with this branch! I have solid validator constantly working in production, and it is really useful for us, so, we are grateful to the sky!

When this branch will be released, I'll opensource the tool we build and hope it will be useful for community.

@lsegal Any plan to merge the structured-logging branch in the mainline? I am developing a small doc validation tool for our CI at work (in fact, borrowing a good deal of @zverok's code โ€“ thanks!) and we'd love to gemify and open source the whole thing

...And when/if it will be merged, we could opensource our document checker, it is already gem-quality...

...So we'll have two competing YARD checker gems :)))

@zverok I would be happy to join forces and contribute back the code I'm writing.. it's not a wheel worth reinventing over and over :D

Yeah, I am thinking about the same of course. Just let's hope the structured-logger will be merged...

Well... As @lsegal seem not being a big fan of this branch of events... I've finally managed to do the independent patch/plugin gem for structured logging and documentation validations.
@lordofthelake We can discuss our joined forces on CI-ing docs in that gem's issues, if you want.

Great read. Thanks for the efforts @zverok