PHPCSStandards/PHPCSDevTools

[FR] A script which can be used in CI to check the formatting of XML docs files

Opened this issue ยท 25 comments

Is your feature request related to a problem?

XML docs for sniffs should have consistent formatting, but this can't currently be checked automatically as the xmllint + diff commands need the file name of each individual file to be checked, which (without a script to provide those names) would mean the check in GH actions would need to be manually updated each time a docs file is added, which makes it fiddly and something which will easily go out of date/miss files which should be checked.

This feature request was originally in the WPCS repo, but as it looks like it would be possible to handle this via a portable bash script, it would make sense to add that script to this package and make it available for (re)use by any PHPCS related repo which provides XML docs for sniffs.

Describe the solution you'd like

See the conversation here: WordPress/WordPress-Coding-Standards#2334 (comment)

Basically we'd need a little (bash or otherwise) script to gather the files in the WordPress/Docs directory and then run the following command against each file with XMLLINT_INDENT set to (four) spaces.

diff -B --tabsize=4 ./WordPress/Docs/Cat/SniffNameStandard.xml <(xmllint --format "./WordPress/Docs/Cat/SniffNameStandard.xml")

Additional context (optional)

In the original ticket, @viditagrawal56 has already posted an outline of a script which could be used for this and has indicated they would like to continue working on this feature.

This ticket is intended to talk through further implementation details for the feature.

Here's a summary of some of the questions discussed in the previous ticket:

@viditagrawal56: This script just shows the diff and does not update the xml file. Let me know if that is needed to be implemented!

This question was previously unanswered.

In my opinion, I don't think the script should update the XML file. Showing the diff should be fine.

Having said that, having a command parameter to optionally turn on fixing would be an interesting feature to add.

IMO, this is not a requirement for the initial version of the script though, so it would be perfectly fine to open a ticket for this as "future scope" after the initial script has been merged.


@jrfnl: Would the above script work cross-platform ? I.e. would it be able to run on both *nix systems as well as Windows ? that would be helpful to allow contributors to run the script locally.

Response by @viditagrawal56:

The script that I wrote works only on Unix-like systems because of the following reasons:-

  1. The script is a Bash script and Windows does not natively support Bash scripts.
  2. xmllint is part of libxml2 and is not included by default on Windows and needs to be installed separately.

Although WSL can be used for these problems.

The solution for the cross platform problem of this script is to use wsl, even then libxml2 would be needed to be installed manually if its not already installed. Although I think that we can include a step to install libxml2 automatically in the script.


@jrfnl: Another question I'm wondering about: How difficult would it be to create this as a bash script with a parameter for the path(s) to check ?
If that would be an option, it would be great if this could be turned into a script which would be re-usable, not just by WPCS, but also by something like PHPCSExtra, PHPCompatibility and even PHP_CodeSniffer itself.

Response by @viditagrawal56:

It's actually pretty straightforward, Here's the code that for bash script that takes path(s) as parameter.
<snip>
Here's pastebin link for easy viewing https://pastebin.com/EmpErJX5

It's all same at the start with just an addition for checking if args are provided or not and then calling the function for each path at the bottom.

Here are some follow up questions I've been thinking off. These are mostly related to the re-usability of the script.

1. Path parameter

As it appears to be possible to pass the path as a parameter, this now brings up the question of whether the script should be able to handle different types of "path" inputs ?, like:

  • file names (check just one file);
  • multiple files names (either comma separated or space separated);
  • directory path (that is currently handled in the current version of the script);
  • multiple directory paths;
  • glob patterns (check the files which comply with the glob pattern).

The reason for thinking of this, is that a PHPCS package may contain multiple standards, like the PHPCS package itself has PEAR, PSR1, PSR2 etc.
Being able to check the docs of all those in one go with a pattern along the lines of src/*/Docs/*/*Standard.xml would be a nice option.
Though being able to pass multiple paths (without glob support), would also solve this without having to call the command multiple times.

Having said that, again, this would be perfectly fine as something for "Future scope".

2. Check all XML files or only docs files ?

As things are, the script looks to check all files with an .xml extension.
This is fine if the script is passed just and only the "Docs" directory (which will contain only Docs which all must be space indented with four spaces).

However, this could run into trouble if someone would pass a "Standard" directory as that directory will also contain a ruleset.xml file which does not have to comply with the four space indentation rule for docs and - depending on the package - may be using tab indentation.

Similarly, if someone would pass the project root, this would mean that phpunit.xml and phpcs.xml files (and the likes) would also be checked by the script.

So, this brings us to a choice:

  • Either the script should only check PHPCS Docs XML files (which will always have a name along the lines of /Docs/*/*Standard.xml
  • Or the script should be more generic and allow for checking any XML file, but in that case, the XMLLINT_INDENT setting would need to be made configurable from the command line.

3. Graceful error handling

As the script will be distributed as re-usable, I think it may be good to throw an error if xmllint is missing or even if diff is missing ? Could this possibly be done by checking the exit code of xmllint --version ?

I seem to remember that the GH Actions images don't always come with xmllint pre-installed and that we have a sudo apt-get install --no-install-recommends -y libxml2-utils command in various GHA scripts which use xmllint.

4. Windows

I can imagine that PHPCS package maintainers (like me) may have xmllint and diff installed on Windows anyhow. I wonder if, given that knowledge, in a future iteration, the script could be made compatible with Windows too.
Alternatively, could Windows be detected early and an error be thrown ?

5. GitHub Actions runner ?

Last thing I'm thinking off and this ties a lot of the above together: would it be an idea to publish the script as a GitHub action runner ?
In which case, it might be better to create a separate repo for it ?

Bear with me on this while I outline the idea:
If a PHPCS package has XML docs, typically, the following checks should be executed for the XML formatting:

  • Does it comply with the docs XSD file ?
  • Is the format consistent ?

To do so, the typical steps which need to be executed in GHA are like so:

  • Check out the repo
  • Update the available package list
  • Install xmllint
  • Optionally allow for showing XML violations inline
  • Validate the Docs against the XSD file (either in the vendor dir if this repo is a dependency or by downloading the publicly available file)
  • Check the code style of the Docs XML files (which is what the script we're currently talking about does)

Also see: https://github.com/PHPCSStandards/PHPCSDevTools/blob/develop/.github/workflows/cs.yml

This repo already contains an XSD file for PHPCS docs files, this is typically used in GHA like so: xmllint --noout --schema vendor/phpcsstandards/phpcsdevtools/DocsXsd/phpcsdocs.xsd ./*/Docs/*/*Standard.xml.

An action runner could execute those steps automatically with the following inputs:

  • path (string)
  • xsd-check (bool)
  • code-style-check (bool)
  • show-errors-inline (bool)

(input names open for discussion)

That is providing the script is limiting itself to docs files and not other XML files.

The action runner could potentially, at a later point in time, then also include the code sample check as proposed in #142 /cc @rodrigoprimo

While this, again, could be something for "Future Scope", I think it might be good to have a think about this now anyway as, even if moved to "Future Scope", it would inform whether the script should be in a separate repo or in this repo.

@viditagrawal56 What do you think ?


Implementation details wise:

  • We'll need to have a think about how we can add tests for the script. The IOTestCase could possibly be used for this ? Or alternatively a tool like expect, or possibly bashunit ?
  • Documentation for the tool will need to be added to the readme.

Hey @jrfnl ๐Ÿ‘‹
Thank you for all the detailed feedback and suggestions! I'm excited about the possibility of expanding this script's functionality and making it more versatile and reusable. Here are my thoughts on the points raised:

In my opinion, I don't think the script should update the XML file. Showing the diff should be fine.

Having said that, having a command parameter to optionally turn on fixing would be an interesting feature to add.

Yeah, I agree having an option like -f to turn on formatting would be nice and can be implemented pretty easily using the getops command.

while getopts ":f" opt; do
    case $opt in
        f)
            fix_files=true
            ;;
        \?)
            echo "Invalid option: -$OPTARG" >&2
            usage
            ;;
        :)
            echo "Option -$OPTARG requires an argument." >&2
            usage
            ;;
    esac
done
shift $((OPTIND -1))

1. Path parameter

As it appears to be possible to pass the path as a parameter, this now brings up the question of whether the script should be able to handle different types of "path" inputs ?

I agree that supporting different types of path inputs would be beneficial, especially glob patterns. I think it is pretty doable
and could be added as a future scope enhancement.

2. Check All XML Files or Only Docs Files:

I think introducing a configurable parameter for the XMLLINT_INDENT setting to make it more generic would be great and pretty straight forward to implement. We just need to add a check for another option, lets say -i and then set the XMLLINT_INDENT to the provided argument.

The command while execution will look like ./script.sh -f -i "\t" path the -t will set the indentation to a tab character.

3. Graceful Error Handling:

As the script will be distributed as re-usable, I think it may be good to throw an error if xmllint is missing or even if diff is missing ? Could this possibly be done by checking the exit code of xmllint --version ?

command -v xmllint command checks if xmllint is available on the system. This can be implemented by calling this check function first.

check_dependencies() {
    if ! command -v xmllint &> /dev/null; then
        echo "Error: xmllint is not installed."
        exit 1
    fi

    if ! command -v diff &> /dev/null; then
        echo "Error: diff is not installed."
        exit 1
    fi
}

4. Windows Compatibility:

Alternatively, could Windows be detected early and an error be thrown ?

Yeah Windows can be detected. bash has an environment variable $OSTYPE which is only available on Unix systems and I think would be undefined on Windows systems. So Windows can be detected if $OSTYPE is undefined.


check_os() {
    if [[ -z "$OSTYPE" ]]; then
        echo "Error: OSTYPE is not set. This script cannot be run on Windows."
        exit 1
}

5. GitHub Actions Runner:

The idea of creating a GitHub Actions runner is fantastic. Though I don't have experience with GitHub Actions runners, I'm interested to learn and contribute to this project.

I have a question about where the script should reside in this repo. Should it be in the root directory or in the scripts directory?

If we decide to proceed with creating a GitHub Actions runner, we can then discuss creating a separate repository for it.

Looking forward to your feedback and next steps!

@viditagrawal56 Thank you for your response. Sounds like everything I came up with is doable, so that leaves us with some decisions to take and the fine print/implementation details to sort out.

I have a question about where the script should reside in this repo. Should it be in the root directory or in the scripts directory?

If we'd go the GitHub actions runner route, we can lay out the (new) repo in whichever way works for its purpose.

If we're talking this repo, the "entry point script" should be in the bin repo, but the script could be divided into multiple subscripts (to make testing more straight forward) and those could live in a dedicated subdirectory of the Scripts directory.

The idea of creating a GitHub Actions runner is fantastic. Though I don't have experience with GitHub Actions runners, I'm interested to learn and contribute to this project.

I've never created a GH action runner from scratch, but I have contributed to a number of them, most notably to the Composer install action runner, which we could possibly use as an example.
So I suspect between the two of us we can figure this out ;-)

The Composer install action runner is a composite action, which means it's not written in Node, but runs other actions + runs some native bash scripts. It also has a test setup for bash scripts which we could probably re-use, which is why I think that project might be a good starting point.

I suggest you may want to take a little time to study that repo (and possibly others) to gather your thoughts on this.

There are only a few other action runners involving XMLint. None of which do what we need the action to do, though we could, again, have a look at them to see what we could learn from them.


So I think to proceed, we'll first need to decide the following:

  • Are we going to go for a stand-alone script ? Or are we going for a GH actions runner ?
  • Are we going for a re-usable "xmllint diff with fixer" script/action runner which does just and only that and can be used with any kind of XML file ? Or are we going for a PHPCS specific "Docs QA" action runner which can run several types of QA checks on sniff documentation XML files ?

FYI: my availability/response time over the next two weeks will be minimal/slow as I'll be at the WCEU conference and doing some traveling around that.
Once I'm back from the conference, I should have more time again and if you like, we could do a videocall to talk things through and work on this together.

@jrfnl Really sorry for late reply I was busy with college work.

I've never created a GH action runner from scratch, but I have contributed to a number of them, most notably to the Composer install action runner, which we could possibly use as an example.
So I suspect between the two of us we can figure this out ;-)
I suggest you may want to take a little time to study that repo (and possibly others) to gather your thoughts on this.

That sounds like a solid plan! I'll take some time to study Composer install action runner repo.

So I think to proceed, we'll first need to decide the following:

  • Are we going to go for a stand-alone script ? Or are we going for a GH actions runner ?
  • Are we going for a re-usable "xmllint diff with fixer" script/action runner which does just and only that and can be used with any kind of XML file ? Or are we going for a PHPCS specific "Docs QA" action runner which can run several types of QA checks on sniff documentation XML files ?

FYI: my availability/response time over the next two weeks will be minimal/slow as I'll be at the WCEU conference and doing some traveling around that.
Once I'm back from the conference, I should have more time again and if you like, we could do a videocall to talk things through and work on this together.

Thanks for letting me know. I understand.
I wanted to mention that I'll also be quite busy during that time with my college work. However, I'll try my best to squeeze in enough time to study the Composer install repo.

I'm definitely open to scheduling a video call to discuss the details further. Just let me know when would be convenient for you.

Safe travels and enjoy the conference!

@viditagrawal56 Good luck with your studies and take your time with having a look at the repos. Let's touch base again towards the end of June to schedule a videocall. Okay ?

@jrfnl Thank you! That works perfectly.

@viditagrawal56 I'm back and up and (nearly) caught up with my backlog by now. Are you still up for planning a call ?

@jrfnl Sorry for missing our call and taking so long to respond. I'm still up for a call. I just need a bit of time to go over our previous conversations and the issue details again.

How about we schedule the call for sometime next week? That way, I can be well-prepared and we can have a productive chat.

@viditagrawal56 Next week sounds good. What timezone are you in ? I'm in CET. Would Tuesday July 30 work somewhere between 08:00 - 13:00 UTC ?

@jrfnl I'm in IST (Indian Standard Time), so Tuesday, July 30th at 09:30 UTC would be 15:00 IST. That works for me! Let me know if that time fits your schedule.

@viditagrawal56 Works for me. We can use whereby.com/jrfnl (no software install or login needed) for the call.

@jrfnl Sure ๐Ÿ‘

@jrfnl Hi, I am so sorry but I just found out that I need to go to college tomorrow for my final year project, so unfortunately, I wonโ€™t be able to make our meeting.
Would it be possible to reschedule for another day? Let me know what times work for you, and Iโ€™ll do my best to adjust.

@viditagrawal56 It happens, let's reschedule. How about Friday ? I can do any time between 07:30 - 14:00 UTC.

@jrfnl Thank you for understanding! Friday should work, but I'll confirm as soon as possible.

@jrfnl Iโ€™m really sorry for the short notice, but Iโ€™ll be busy with college on Friday. Would it be possible to reschedule our meeting to today at around 10:30 UTC? Let me know if that works for you.

@viditagrawal56 Sorry, didn't see your message in time.

@viditagrawal56 I'd still like to try and get this set up. Any chance we can try again planning a call ?

Hi @jrfnl , I'd love to plan a call too, but due to college and the time zone differences, I have to attend college during regular working hours. Would it be possible to arrange a call on Sunday? I know it's a lot to ask to schedule a call during the weekend, and I'm really sorry for that. If that doesn't work, we could also schedule a call today anytime after 13:30 UTC. Let me know what works best for you!

Hey @jrfnl, are you free for a call tomorrow, September 5th, anytime after 11:30 UTC? If so, we can setup a call

@viditagrawal56 Okay, let's do this. Let's make it 12:00 UTC ? We can use [redacted].

@jrfnl Sure, that works!

One of the things we discussed was maybe also adding different verbosity levels.

I.e. the default being showing progress dots for each file scanned.
-v would show verbose output, showing the path traversal details.
-q would mean "quiet" and not show progress dots nor verbose output output.

Example (from PHPCS itself):

image