melink14/rikaikun

Create visual regression tests

Closed this issue · 6 comments

Percy

Percy uses a unique flow in order to reduce flakiness. It can't capture selections, but it does have an approval flow integrated with GitHub.

Visual Regression Test Plugin

  • Requires a proper X server setup to run in WSL.
  • BrowserStack supports this, but I'm running into flakiness where the rikaikun element is not present or not styled.
    • Reduce concurrency, because capture doesn't work if another iframe is on top of the element to screenshot.
    • Need to set up a flag-based approach for optionally running with BrowserStack, since groups don't allow overriding concurrency or test framework parameters.
    • Not sure if the flash of unstyled text is a bug in my code or a strange problem with BrowserStack.
  • Can use the test name directly, but the spaces aren't removed; might want to use a custom namer for that.
  • Before investing more time into cloud solutions, let's try just running them locally and seeing if they're flaky.
  • The current proof of concept is in the rikaicontent test, but I'll probably make it an integration test since that covers a larger subset.
  • It takes a bit of time to load the full dictionaries, so I'm thinking it makes sense to create a subset or even just stub the HTML directly. Stubbing directly would make the test smaller, and these aren't so slow as to make me want to reduce the scope just yet.
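The flag-based toggle could look something like this in the Web Test Runner config. This is only a sketch: the `BROWSERSTACK` environment flag, the concurrency values, and the capability fields are assumptions, not rikaikun's actual config.

```javascript
// web-test-runner.config.mjs (sketch, not the real rikaikun config)
import { browserstackLauncher } from '@web/test-runner-browserstack';

// Hypothetical opt-in flag for cloud runs.
const useBrowserstack = process.env.BROWSERSTACK === 'true';

export default {
  // Lower concurrency on BrowserStack so another test's iframe
  // isn't on top of the element being screenshotted.
  concurrency: useBrowserstack ? 1 : 4,
  browsers: useBrowserstack
    ? [
        browserstackLauncher({
          capabilities: {
            browserName: 'Chrome',
            'browserstack.user': process.env.BROWSER_STACK_USERNAME,
            'browserstack.key': process.env.BROWSER_STACK_ACCESS_KEY,
          },
        }),
      ]
    : undefined, // fall back to the default local Chrome launcher
};
```

This sidesteps the group limitation by branching the whole config on an environment variable instead of trying to override parameters per group.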

Which tests to create:

  • A set in each theme
  • Word dictionary
  • Names
  • Kanji
  • Title lookup (image, etc.)
  • Text highlight test (not sure if this needs a screenshot)
  • More and less
  • Mini help

I should use a lookup that hits as many spots as possible: deinflection and more/less. Later, when I test style conflicts, I should just include them at the top level as a worst-case scenario. Since this is OSS and screenshots/time are money, there's no reason to make many small scoped images if one image can cover both.
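One broad test per theme could be sketched with `visualDiff` from `@web/test-runner-visual-regression`. This is only a sketch: `triggerLookup`, the theme names, and the `data-theme` hook are hypothetical stand-ins for rikaikun's real test setup.

```typescript
import { visualDiff } from '@web/test-runner-visual-regression';

// Hypothetical helper; rikaikun's real test harness differs.
declare function triggerLookup(text: string): Promise<HTMLElement>;

const themes = ['blue', 'black', 'lightblue', 'yellow']; // assumed names

for (const theme of themes) {
  it(`renders the word popup in the ${theme} theme`, async () => {
    document.documentElement.dataset.theme = theme; // assumed theme hook
    // One lookup that exercises deinflection plus more/less in one shot.
    const popup = await triggerLookup('食べました');
    await visualDiff(popup, `word-popup-${theme}`);
  });
}
```

Looping the same broad lookup over every theme keeps the image count at one per theme while still covering deinflection and the more/less entries.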

Also, note that using network logs breaks BrowserStack due to the use of WebSockets by Web Test Runner.

Couple more lessons:

  • Flakiness in BrowserStack was due to it taking 1-2 seconds for the rikaikun CSS to load. That makes sense and probably was never an issue with local testing. For shadow DOM I'll probably want to inline the styles anyway; I should look into a style plugin that injects the CSS into the content script for direct embedding.
  • One of the slowest network calls is the names dictionary and its index, which take 15-20 seconds. I definitely think a smaller test-only dict makes sense. I wonder if it makes sense to make a function which runs the dictionary update normally but filters for words used by tests. I could even use the tests to generate the list of words to filter by...
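The filtering step could be as simple as keeping dictionary lines whose headword appears in a word list harvested from the tests. A minimal sketch, assuming EDICT-style lines where each entry starts with its headword followed by a space (rikaikun's real dictionary build differs):

```typescript
// Sketch of a test-only dictionary filter. Assumes each entry begins
// with its headword followed by a space, as in EDICT-style dictionaries.
function filterDict(dictLines: string[], testWords: Set<string>): string[] {
  return dictLines.filter((line) => {
    const headword = line.split(' ')[0];
    return testWords.has(headword);
  });
}

const lines = [
  '食べる [たべる] /(v1) to eat/',
  '猫 [ねこ] /(n) cat/',
  '犬 [いぬ] /(n) dog/',
];

// Keep only the entries the tests actually look up.
const filtered = filterDict(lines, new Set(['猫', '食べる']));
console.log(filtered.length); // 2
```

Generating the `testWords` set from the tests themselves would keep the subset in sync automatically.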

Example config using BrowserStack and/or Percy. I'll revert this for now until I use them (and will probably create a separate workflow):

jobs:
  presubmit:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2.3.4
      - name: Use Node.js 14
        uses: actions/setup-node@v2.4.0
        with:
          node-version: '14'
      - run: npm ci
      - name: 'BrowserStack Env Setup'
        uses: 'browserstack/github-actions/setup-env@master'
        with:
          username: ${{ secrets.BROWSER_STACK_USERNAME }}
          access-key: ${{ secrets.BROWSER_STACK_ACCESS_KEY }}
      - name: 'Start BrowserStackLocal Tunnel'
        uses: 'browserstack/github-actions/setup-local@master'
        with:
          local-testing: start
          local-logging-level: all-logs
          local-identifier: random
      - run: npm run presubmit:coverage
        env:
          PERCY_TOKEN: ${{ secrets.PERCY_TOKEN }}
          BROWSER_STACK_USERNAME: ${{ secrets.BROWSER_STACK_USERNAME }}
          BROWSER_STACK_ACCESS_KEY: ${{ secrets.BROWSER_STACK_ACCESS_KEY }}
      - name: 'Stop BrowserStackLocal'
        uses: 'browserstack/github-actions/setup-local@master'
        with:
          local-testing: stop
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v2
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          files: ./coverage/lcov.info
          fail_ci_if_error: true
          verbose: true

Lessons from initial upload to github:

  • GitHub's Ubuntu image has no Japanese fonts by default, but I can install them with sudo apt-get install fonts-noto-cjk.
  • The font on my Pengwin installation seems to be a Windows font, so that led to a mismatch anyway.
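In workflow terms that's just one extra step before the tests run (a sketch, assuming Ubuntu's fonts-noto-cjk package provides the Noto CJK fonts):

```yaml
- name: Install Japanese fonts
  run: |
    sudo apt-get update
    sudo apt-get install -y fonts-noto-cjk
```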

After talking to @Stephie, it seems like relying on the local and GitHub environments matching is not a good long-term strategy. Alternatives:

  • Use Percy, with its good GitHub integration.
    • Downside is that no highlight is visible and there's no way to take screenshots of just the popup. These aren't major, though.
  • Use BrowserStack, which allows a consistent non-Linux environment.
    • Has some network flakiness which needs to be fixed.
    • Doesn't have any useful way to check and approve diffs.
  • Use GitHub CI to produce the baselines.

None of these work locally, so Stephie's idea was to generate local baselines upon npm install so that you could get quick feedback on visual changes. I would use a custom signal to write to a special screenshots directory that is ignored via .gitignore.
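A minimal sketch of the install-time hook (the script name and directory are hypothetical, not rikaikun's actual setup):

```json
{
  "scripts": {
    "postinstall": "npm run test:generate-local-baselines"
  }
}
```

with something like `screenshots/local/` listed in .gitignore so the locally generated baselines never get committed.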

Percy has the least amount of control and the least amount of work. The other solutions could be feasible if I use a GitHub Action to automatically update baselines and commit them upon failure. Then you could use GitHub to easily check diffs and approve. (Of course, the test will stop failing as soon as the commit is added, so it relies on careful approval of the changes.)
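The auto-update idea could be a step appended to the CI job. A sketch only; the baseline path and commit message are assumptions:

```yaml
- name: Commit updated baselines on visual diff failure
  if: failure()
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add screenshots/baseline
    git commit -m "chore: update visual regression baselines" && git push || echo "No baseline changes"
```

The diff between the old and new baselines in the resulting commit is then what gets reviewed and approved.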

(These font differences were probably why various checks for characters under the mouse midway through a sentence weren't consistent when I found them via trial and error.)

🎉 This issue has been resolved in version 2.2.5 🎉

The release is available on:

Your semantic-release bot 📦🚀