journalovi/2024-Cashman-PAC-learning-game

[REVIEW] PAC Learning Or: Why We Should (and Shouldn't) Trust Machine Learning


This paper presents an interactive game representing the types of tasks solved by machine learning algorithms. It supports the definition of Probably Approximately Correct (PAC) learning, illustrating a proof of PAC learnability for Empirical Risk Minimization (ERM). It then shows three vulnerabilities of ERM.

The paper briefly introduces the problems of machine learning and the Four Germans game, a game in which the player must estimate the rectangle that correctly separates ALL the points of a binary classification task, guided only by a subset of the points (possibly even none, in which case it is a pure guess).

Through a set of incrementally complex versions of the game (from zero training-data support to full support), the paper reviews some of the basic concepts of machine learning.

The first version of the game presented has no points at all: just a guess, to discuss the extreme case.

Next, a version with ten fixed points is given.

This is helpful for introducing different heuristics for estimating the containing box (in its general version, not the one specific to the visible points): Tightest Fit (the tightest possible rectangle is chosen), Loosest Fit (the algorithm chooses the loosest possible rectangle), and Maximum Margin (a midpoint between the two, which tries to maximize the distance from both).
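
For concreteness, the reviewer sketches these heuristics below (a rough Python sketch of the reviewer's understanding, not the authors' implementation; the unit-square domain bounds and the greedy x-then-y expansion order in the loosest fit are the reviewer's assumptions):

```python
import numpy as np

def tightest_fit(pos):
    """Smallest axis-aligned box containing every positive point.
    pos: array of shape (n, 2), n >= 1."""
    pos = np.asarray(pos, dtype=float)
    return pos.min(axis=0), pos.max(axis=0)  # (x_min, y_min), (x_max, y_max)

def loosest_fit(pos, neg, lo=(0.0, 0.0), hi=(1.0, 1.0)):
    """Greedy expansion of the tightest box: push each pair of sides out to
    the nearest blocking negative point, or to the domain boundary.
    neg: array of shape (m, 2). The x-then-y order is arbitrary, so this
    yields *a* large consistent box, not necessarily the maximal one;
    sides that touch a negative point should be treated as open."""
    (x0, y0), (x1, y1) = tightest_fit(pos)
    neg = np.asarray(neg, dtype=float)
    # Widen horizontally past negatives whose y falls inside the current span.
    in_y = (neg[:, 1] >= y0) & (neg[:, 1] <= y1)
    left = neg[in_y & (neg[:, 0] < x0), 0]
    right = neg[in_y & (neg[:, 0] > x1), 0]
    x0 = left.max() if left.size else lo[0]
    x1 = right.min() if right.size else hi[0]
    # Then widen vertically, now constrained by negatives in the wider x-span.
    in_x = (neg[:, 0] >= x0) & (neg[:, 0] <= x1)
    below = neg[in_x & (neg[:, 1] < y0), 1]
    above = neg[in_x & (neg[:, 1] > y1), 1]
    y0 = below.max() if below.size else lo[1]
    y1 = above.min() if above.size else hi[1]
    return np.array([x0, y0]), np.array([x1, y1])

def maximum_margin(pos, neg):
    """Midpoint between the tightest and (greedy) loosest boxes, matching
    the paper's description of "a midpoint between the two"."""
    t_lo, t_hi = tightest_fit(pos)
    l_lo, l_hi = loosest_fit(pos, neg)
    return (t_lo + l_lo) / 2, (t_hi + l_hi) / 2
```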

These incremental steps help the reader become familiar with the problem and, with the partial interactivity supported, build their own mental model through active testing.

This part is interleaved with theoretical concepts and definitions, in particular for the PAC learning algorithm and PAC-learnable spaces, and their implications for the correct choice of an ML algorithm for a specific problem, based on an error bound ϵ and a probability 1−δ that the error is bounded.
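
For reference, the reviewer restates the standard definition here (not a quotation from the paper): a concept class is PAC-learnable if for every $\epsilon, \delta \in (0, 1)$ there is a finite sample size $M$ such that

$$\Pr_{S \sim \mathcal{D}^M}\big[\operatorname{err}_{\mathcal{D}}(h_S) \le \epsilon\big] \ge 1 - \delta,$$

i.e., with probability at least $1-\delta$ over the draw of the $M$ training samples, the learned hypothesis $h_S$ has true error at most $\epsilon$.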

Overall, the authors argue that by proving that an algorithm is a PAC learner, it is possible to avoid applying the wrong ML algorithm to a specific task and gaining a false sense of accuracy from the “by-default” Empirical Risk Minimization applied to the training data. Conversely, when this is not the case, the analyst can receive a false sense of confidence in the learning results while not actually being in a correct generalization regime.
To explain this concept, the authors first observe that the tightest fit is always contained in the target rectangle, and then cover the potential error region (the part of the target rectangle outside the tightest fit) with four strips, one along each side of the target.

Then, focusing first on the top strip, it is shown through geometric properties that the probability that all M training samples miss a strip of probability mass ϵ/4 is (1 − ϵ/4)^M. Multiplying by four (a union bound over the four strips) and solving for ϵ demonstrates the original implication (the error on unseen examples is bounded with high probability).
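
In symbols (the reviewer's reconstruction of the standard argument):

$$\Pr\big[\operatorname{err}(h_{\text{tight}}) > \epsilon\big] \;\le\; 4\Big(1 - \frac{\epsilon}{4}\Big)^{M} \;\le\; 4e^{-M\epsilon/4},$$

using the union bound over the four strips and the inequality $1 - x \le e^{-x}$.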

The process then focuses on how many samples are needed to find the correct estimated box. This amounts to solving the previous equation for M, with the result that a finite M suffices, so the problem is PAC-learnable as the definition dictates.
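
Explicitly (again the reviewer's reconstruction): requiring the bound above to be at most $\delta$ gives

$$4e^{-M\epsilon/4} \le \delta \quad\Longleftrightarrow\quad M \ge \frac{4}{\epsilon}\ln\frac{4}{\delta},$$

which is finite for every $\epsilon, \delta > 0$, as the definition of PAC-learnability requires.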

The environment then moves on to describe several assumptions that were made, which do not always hold in real applications and without which the presented proof and properties no longer hold. The first is the assumption of independent and identically distributed (i.i.d.) sampled data: the prototype shows that when it is violated, things no longer work correctly.

Similar considerations apply to the very well-known case in which the distribution of the training data differs from (or, in general, is not the same as) that of the test dataset. The interactive example, which takes the extreme case, shows that no guarantee can be given for the estimation of the box.

A third assumption is relaxed in the form of the shape of the “containing box” to guess (a generic polygon, or something even more general). The test shows the problem in practice with an elliptic target.

This part ends by providing more advanced references on PAC learnability and some limitations to its applicability.

Overall, in all these cases, the interactive examples help the reader grasp the real problem, even if sometimes only at a surface level. Regardless, this can be good for teaching or for getting introduced to the matter one step at a time, with more engagement.

The last part, which opens up more broadly to visual explanations for AI, is the weakest, as it just presents static text. It would have been interesting, in line with what has been presented up to that moment, to provide an interactive glimpse of what this means (e.g., an example taken from the referenced Ph.D. thesis).

The text flows well, with some minor corrections reported at the end of this review.


Revisions/Questions (Q):

Section 1: improve the points explorer/testing environment
Q1: When I played the game, the first thing I did was wait a little for some green points to appear, identify a potential box, and then draw it up to the border of the limiting red points. Then I pushed the TEST button: what I saw was that many of the green points I had used for reference were no longer displayed, while new green points appeared (with the green box not perfectly overlapping, which is correct, as intended).
Why are the already-visible points removed? This creates confusion for the user. Those points should be kept in the view while the new ones are added, simulating the remaining part of the dataset passing through the model.

Section 2: clarify the assumptions for the four sides of the box
Q2: In estimating the tightest fit error, why is it implied that each strip size must be lower than epsilon/4? Does it not imply that the contribution to the error is equal for each strip? Would it be more correct to say that the sum of the errors must be lower than or equal to epsilon, allowing different contributions to the error by different strips?

Or, more generally speaking: does this assumption affect the final equation for estimating M, being a parameter of it, or is the estimation of M agnostic to it apart from a constant term?

Section 3: clarify the hosting of the final prototype (probably on JoVI as is)
Q3: The system right now is a packaged web application to launch locally in order to make it work. Do the authors plan to host it somewhere? The reviewer asks because, for example, launching it in a completely local environment without an internet connection breaks the formula formatting and some parts of the layout (tested with both the Chrome and Firefox browsers), while the application itself and the interactive examples continue working flawlessly. Would it be good to provide a link to the hosted version (if any) or details about its hosting?

Section 4: improve explainability by coordinating the point classes with the point selections
Q4: This is more a suggestion (not mandatory to comply with): would it be good to make the four classes of outcomes (TP, FP, TN, FN) interactive as well, selecting at any stage the subset of points corresponding to each? It could help the reader quantify the introduced errors in the “geometric space” as well.


MINOR revisions:

A) The text of the paper is quite small even on big screens and is difficult to read. Both the manuscript text and the interactive environment on the right could better fit the screen size vertically to make the elements more visible. (The reviewer checked that the environment is responsive, so the problem seems to lie in correctly exploiting the dynamic screen-size readings.) This is true even for the PAC-Learning section.

B) The “Loosest Fit” example seems too conservative on the left and right sides (there is still some area that could be covered up to the leftmost and rightmost red points).

C) Sometimes the “interactive parts” are covered by textual parts, in particular by definitions. It seems to be a problem of overflow management and visibility.

D) The transitions of the interactive part through the different stages when scrolling down and up work in a very hectic way, sometimes freezing in a wrong state with respect to the part of the text in view. While the blue buttons restore the correct situation, the risk is that a non-savvy reader could misinterpret that part.

E) The “big arrows” used to guide the reader in scrolling down are poorly presented (they leave a lot of vertical space and do not automatically scroll to the next anchor, at least in the reviewer's experience). The reviewer suggests making them better spaced, more visually appealing, and interactive, so that they automatically reach the next anchor.


Decision:

Overall, the reviewer believes that the work is valuable and well-constructed. On the other hand, some revisions are needed to the written part, to its interactive aspects, and to their fluid combination, to bring it to the quality of the JoVI Experimental track.
For these reasons, the reviewer argues for a major revision score and asks the authors to consider the requested revisions, try to implement them, and/or contact the reviewer for any clarification or discussion of their meaning.


MINOR comments/typos:

1. the underlying phenomonen → the underlying phenomenon

2. (here error defined as the total area that is only inside one box, i.e. false positives (in our proposed concept, not in the ground truth concept), or false negatives (in ground truth concept, not in our proposed concept)) → please avoid nested parentheses

3. probably approximately correct (PAC) learning → please cite the primary source instead of the Wikipedia page (the original 1984 paper is already listed on Wikipedia)

4. precictions → predictions

5. demonstrate why fully-automated approaches will never be able to be error-free if these assumptions are validated. → Is this sentence a little too strong?

Why this is important

6. effected by them → affected by them

7. called called Gender… → called Gender…

8. Risk Minimization → This link is broken (page not found)

PAC learning

9. Blumer et. al. → Please add a hyperlink to the reference

10. labeled trainign → labeled training

11. Sometimes formulas are not correctly rendered, e.g., “… than 𝜖ϵ, with probability at least 1−𝛿1−𝛿…”

12. with much more complex data and intricate, high-dimensional processes → this phrase seems a little too vague and “flashy”. What are “high-dimensional processes”?

13. topiccan → topic can

14. “this article on its Github repository.” → the link does not take the user to the Github repo but to this paper: https://www.liebertpub.com/doi/full/10.1089/big.2016.0051

Clarified which points are drawn in #12

Thank you for your review! Adding some clarifications about the revisions that were made.

Section 1: improve the points explorer/testing environment

I added a counter for the number of training points and the number of testing points shown. The training points disappear because they are not part of how the model is evaluated - it's to illustrate what happens when we put a model out into the wilderness.

Section 2: clarify the assumptions for the four side of the box

The proof that is illustrated is taken from the cited literature, so I am somewhat constrained in what I can change, but I did change the description of the proof somewhat to point out that the errors are based on uniform sampling of the space, so the areas of the strips correspond to the probabilities of errors.
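
Concretely (a sketch of the connection, assuming the unit square with a uniform density as in the game): for any region $R$, $\Pr[x \in R] = \operatorname{area}(R)$, so a strip constructed to have area $\epsilon/4$ is missed by all $M$ i.i.d. samples with probability $(1 - \epsilon/4)^M$.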

Section 3: clarify the hosting of the final prototype (probably JoVI as it is)

For the internet connection issue, I am happy to provide a static build that is hosted on JoVI. I'm not sure how that happens, though. The project is fully buildable and does not rely on any backend server.

Section 4: improve the explainability coordinating the points classes with the points selections

While I like the suggestion, I ultimately didn't change the interactivity of the game because it was not mentioned as a necessary revision in the meta review.

Beyond the above revisions, I made some grammatical and typo-related fixes. I also spaced out the sections a little bit to make the transitions less jarring in the proof stage.