boxplot `data filter` drops samples
joel-coffman commented
The boxplot `data filter` drops outliers, as shown in the following example (adapted from the manual):
```latex
\documentclass[tikz]{standalone}
\usepackage{filecontents}
\usepackage{pgfplots}
\pgfplotsset{
  compat=1.18,
  only if/.style args={entry of #1 is #2}{
    /pgfplots/boxplot/data filter/.code={
      \edef\tempa{\thisrow{#1}}
      \edef\tempb{#2}
      \ifx\tempa\tempb
      \else
        \def\pgfmathresult{}
      \fi
    },
  },
}
\usepgfplotslibrary{statistics}
\begin{filecontents*}[overwrite]{combined.csv}
v,set
0.1,a
0.2,a
0.3,a
1.0,a
0.4,a
0.2,a
0.8,b
0.9,b
1.0,b
\end{filecontents*}
\begin{filecontents*}[overwrite]{one.csv}
v,set
0.1,a
0.2,a
0.3,a
1.0,a
0.4,a
0.2,a
\end{filecontents*}
\begin{filecontents*}[overwrite]{other.csv}
v,set
0.8,b
0.9,b
1.0,b
\end{filecontents*}
\begin{document}
\begin{tikzpicture}
  % FIXME: Missing outlier for boxplot of a!
  \begin{axis}[
    boxplot,
    boxplot/draw direction=y,
    table/col sep=comma,
  ]
    \addplot table[only if={entry of set is a},y=v] {combined.csv};
    \addplot table[only if={entry of set is b},y=v] {combined.csv};
  \end{axis}
\end{tikzpicture}
\begin{tikzpicture}
  % Plotting the data from two different data files works!
  \begin{axis}[
    boxplot,
    boxplot/draw direction=y,
    table/col sep=comma,
  ]
    \addplot table[y=v] {one.csv};
    \addplot table[y=v] {other.csv};
  \end{axis}
\end{tikzpicture}
\begin{tikzpicture}
  % Unclear if the issue is the filter or reading from the file...
  \begin{axis}[
    boxplot,
    boxplot/draw direction=y,
    table/col sep=comma,
  ]
    \addplot table[only if={entry of set is a},y=v] {combined.csv};
    \addplot table[only if={entry of set is a},y=v] {one.csv};
  \end{axis}
\end{tikzpicture}
\end{document}
```
Contrast the first two graphs produced by the prior code:
![Example where an outlier has been dropped](https://private-user-images.githubusercontent.com/8376231/278721877-ae271b9c-fbd5-4492-82da-805daed7b746.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk4NzQwNjUsIm5iZiI6MTcxOTg3Mzc2NSwicGF0aCI6Ii84Mzc2MjMxLzI3ODcyMTg3Ny1hZTI3MWI5Yy1mYmQ1LTQ0OTItODJkYS04MDVkYWVkN2I3NDYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDcwMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA3MDFUMjI0MjQ1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9OTRkNDM5ZTljYzU4MWE1ZjEyNzllYjY0N2EzZTdiMDU4OWY5ZDdmY2MwNjY0MjkzZTlkMzdlYjYyNDY3NjM2YiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.VqtgQQ123CZWo7bKWtR52_qq7xFgGCddqifMXgYe5xc)
![Expected output -- not using `data filter`](https://private-user-images.githubusercontent.com/8376231/278721897-25133349-7768-4cd3-93c3-78eddde6e999.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk4NzQwNjUsIm5iZiI6MTcxOTg3Mzc2NSwicGF0aCI6Ii84Mzc2MjMxLzI3ODcyMTg5Ny0yNTEzMzM0OS03NzY4LTRjZDMtOTNjMy03OGVkZGRlNmU5OTkucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDcwMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA3MDFUMjI0MjQ1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YzNmOTZlYzU4YmVkZjFhZmZlNjI0MjM2ZWQ2ZDFmZTY0YzE2MThjODBmYjU0MTcxNDgwN2NkZjdhMjNjZjM2ZCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.2BxItokoWBhEN9_jVmdS2MK6ShFloxApwViWSrLQmf8)
The first graph (both sets filtered from `combined.csv`) is missing the outlier at 1.0 for set `a`, whereas the second graph (one file per set, no `data filter`) shows the outlier as expected.
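
For reference, here is a rough check of why the value 1.0 in set `a` should be drawn as an outlier. This is only a sketch: it assumes quartiles are estimated by linear interpolation between order statistics (R type 7, the Excel method) and that the whiskers follow the 1.5 × IQR rule. The estimator and whisker range actually in effect depend on the pgfplots boxplot settings, so the exact numbers may differ.

```latex
% Assumption: type-7 quantiles, position h = (n-1)p + 1, whisker range 1.5.
% Sorted samples of set a (n = 6): 0.1, 0.2, 0.2, 0.3, 0.4, 1.0
\begin{align*}
  Q_1 &= x_{(2)} + 0.25\,\bigl(x_{(3)} - x_{(2)}\bigr) = 0.2 + 0.25 \cdot 0 = 0.2 \\
  Q_3 &= x_{(4)} + 0.75\,\bigl(x_{(5)} - x_{(4)}\bigr) = 0.3 + 0.75 \cdot 0.1 = 0.375 \\
  \mathrm{IQR} &= Q_3 - Q_1 = 0.175 \\
  \text{upper fence} &= Q_3 + 1.5\,\mathrm{IQR} = 0.375 + 0.2625 = 0.6375 \\
  \text{lower fence} &= Q_1 - 1.5\,\mathrm{IQR} = 0.2 - 0.2625 = -0.0625
\end{align*}
```

Under these assumptions 1.0 lies above the upper fence of 0.6375, so the upper whisker should end at 0.4 and 1.0 should appear as a separate outlier mark, which matches the second graph.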