pgf-tikz/pgfplots

boxplot `data filter` drops samples

The boxplot `data filter` drops outliers, as the following example (adapted from the manual) shows:

\documentclass[tikz]{standalone}

% filecontents* with [overwrite] is provided by the LaTeX kernel (2019-10-01 or later); no package needed.
\usepackage{pgfplots}

\pgfplotsset{
  compat=1.18,
  only if/.style args={entry of #1 is #2}{
    /pgfplots/boxplot/data filter/.code={
      \edef\tempa{\thisrow{#1}}
      \edef\tempb{#2}
      \ifx\tempa\tempb
      \else
        \def\pgfmathresult{}
      \fi
    },
  },
}

\usepgfplotslibrary{statistics}

\begin{filecontents*}[overwrite]{combined.csv}
v,set
0.1,a
0.2,a
0.3,a
1.0,a
0.4,a
0.2,a
0.8,b
0.9,b
1.0,b
\end{filecontents*}
\begin{filecontents*}[overwrite]{one.csv}
v,set
0.1,a
0.2,a
0.3,a
1.0,a
0.4,a
0.2,a
\end{filecontents*}
\begin{filecontents*}[overwrite]{other.csv}
v,set
0.8,b
0.9,b
1.0,b
\end{filecontents*}

\begin{document}
\begin{tikzpicture}
  % FIXME: Missing outlier for boxplot of a!
  \begin{axis}[
      boxplot,
      boxplot/draw direction=y,
      table/col sep=comma,
  ]
    \addplot table[only if={entry of set is a},y=v] {combined.csv};
    \addplot table[only if={entry of set is b},y=v] {combined.csv};
  \end{axis}
\end{tikzpicture}
\begin{tikzpicture}
  % Plotting the data from two different data files works!
  \begin{axis}[
      boxplot,
      boxplot/draw direction=y,
      table/col sep=comma,
  ]
    \addplot table[y=v] {one.csv};
    \addplot table[y=v] {other.csv};
  \end{axis}
\end{tikzpicture}
\begin{tikzpicture}
  % Unclear if the issue is the filter or reading from the file...
  \begin{axis}[
      boxplot,
      boxplot/draw direction=y,
      table/col sep=comma,
  ]
    \addplot table[only if={entry of set is a},y=v] {combined.csv};
    \addplot table[only if={entry of set is a},y=v] {one.csv};
  \end{axis}
\end{tikzpicture}
\end{document}

Compare the first two plots produced by the code above:

[Figure, left: example where an outlier has been dropped. Right: expected output, not using `data filter`.]

The left plot is missing the outlier of set a (presumably the sample v = 1.0), whereas the right plot shows it as expected.
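
To help tell whether the filter or the file reading is at fault, one option might be to log every filter call. The following is only a sketch of a preamble fragment, not a fix: the style name `only if verbose` is made up for illustration, and it assumes (as the `only if` style in the example does) that the incoming sample is available in `\pgfmathresult` when the filter runs.

```latex
% Hypothetical variant of the `only if` style from the example above
% (the name `only if verbose` is an assumption, not a pgfplots key).
\pgfplotsset{
  only if verbose/.style args={entry of #1 is #2}{
    /pgfplots/boxplot/data filter/.code={
      \edef\tempa{\thisrow{#1}}% value of the filter column in this row
      \edef\tempb{#2}%           value the row is compared against
      % Log the column value and the incoming sample before filtering.
      \typeout{data filter: #1=[\tempa] sample=[\pgfmathresult]}%
      \ifx\tempa\tempb
      \else
        \def\pgfmathresult{}% discard rows that belong to other sets
      \fi
    },
  },
}
% Usage, replacing the first \addplot of the first picture:
% \addplot table[only if verbose={entry of set is a},y=v] {combined.csv};
```

If the log shows a call where the `set` value is empty or unexpected, or no call at all for the sample 1.0, that would suggest the problem lies in how the filter is invoked rather than in the file contents.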