Minimum number of observations
kimiakarimi opened this issue · 4 comments
kimiakarimi commented
Is there a recommendation for choosing the minimum number of observations (minNumObs) in modelEstimation?
I know the default is 100, but if I have a larger set of observations, say 1000, what should minNumObs be? I am working on summer (June, July, and August) loads, and the choice of minNumObs significantly affects the WRTDS load estimates, especially in large basins and high-flow years.
rmhirsch49 commented
I would say that sticking with 100 is a good choice. The option of going to a lower value exists for working with smaller data sets, so that the windows don’t spread out too much. I’m a bit surprised to hear that modifying minNumObs makes a substantial difference. I’d be interested in seeing a data set where that is an issue. Please feel free to send me an eList from such an example at rhirsch@usgs.gov.
Bob Hirsch
kimiakarimi commented
Thank you. My question was more related to choosing larger values of minNumObs for regression. Will the estimates be more realistic if I choose 100 or use approximately all the observations (around 1000 in this case)?
rmhirsch49 commented
I have looked at your data set and that helps me formulate an answer. You indicated that you set minNumObs to a value very close to the total number of samples you have (over 1000). Doing this largely defeats the purpose of the weighted regression. It means that the data used in each regression is very spread out, both in terms of time and in terms of discharge. Thus, the regression is no longer a “local” regression. In the limit this becomes like using ordinary least squares. It will not adjust for changes like changing seasonality or changing concentration versus discharge relationships. So, you are fortunate to have a very large and very long record. As such, you should take advantage of it by keeping minNumObs a low number (such as 100). I wouldn’t go below 100 because that will start to introduce some discontinuities.
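To illustrate why a very large minNumObs makes the regression stop being "local", here is a minimal Python sketch of the window-widening idea. It is not EGRET's code. It assumes the rule as commonly described for WRTDS: each observation gets the product of three tricube weights (time, log discharge, season), and all three half-windows are enlarged by 10% per step (the seasonal one capped at half a year) until at least the minimum number of observations has a nonzero weight. The function names, the synthetic samples, and the `max_iter` guard are hypothetical, and the separate minNumUncen requirement is left out.

```python
import math

def tricube(distance, half_width):
    """Tricube weight: 1 at distance 0, falling to 0 at half_width and beyond."""
    u = abs(distance) / half_width
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def season_distance(t1, t2):
    """Distance in time of year (0 to 0.5) between two decimal years."""
    frac = abs(t1 - t2) % 1.0
    return min(frac, 1.0 - frac)

def local_weights(samples, est_year, est_logq, min_num_obs=100,
                  window_y=7.0, window_q=2.0, window_s=0.5, max_iter=1000):
    """Widen the half-windows until min_num_obs samples have positive weight.

    samples is a list of (decimal_year, log_discharge) pairs.
    Returns (weights, time half-window that was finally used).
    """
    for _ in range(max_iter):
        weights = [
            tricube(t - est_year, window_y)
            * tricube(lq - est_logq, window_q)
            * tricube(season_distance(t, est_year), window_s)
            for t, lq in samples
        ]
        if sum(1 for w in weights if w > 0.0) >= min_num_obs:
            return weights, window_y
        # Not enough observations inside the windows: enlarge all of them by 10%.
        window_y *= 1.1
        window_q *= 1.1
        window_s = min(window_s * 1.1, 0.5)
    raise ValueError("could not reach min_num_obs observations")

# Synthetic record: 1000 samples spread evenly over 40 years (made-up values).
samples = [(1980.0 + 0.04 * i, 1.5 * math.sin(0.7 * i)) for i in range(1000)]
for n in (100, 950):
    weights, half_width = local_weights(samples, 2000.5, 0.0, min_num_obs=n)
    print(n, round(half_width, 1))
```

With the small minimum the default 7-year half-window is already sufficient, while the large minimum forces the time window to grow until it covers nearly the whole record, which is the situation described above.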
I would note from looking at some of the periods of high flow that the sampling is rather sparse and the residuals appear to have a lot of serial correlation (meaning that the model may predict high for a period of a few months and then predict low for a few months). You might want to consider using WRTDS-K if your main goal is to get the best estimates for each month or each season (as opposed to doing a trend study). You can read about it in two of the references on our EGRET page (see below). There is also a vignette there that shows how it is done and the additional code that is needed. You can read about it at http://usgs-r.github.io/EGRET/articles/Making%20WRTDS_K%20flux%20estimates.html. The two papers on this are mentioned in the first paragraph of the write up. They are: https://pubs.er.usgs.gov/publication/sir20195084 and https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019WR025338
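As a rough way to see what "serial correlation" of residuals means here, the following Python sketch computes a lag-1 autocorrelation for a sequence of residuals taken in sample order. It is only an illustration with made-up numbers, not part of EGRET or of the WRTDS-K procedure, and it ignores the irregular spacing of real samples.

```python
def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of a sequence of residuals in time order."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    if denom == 0.0:
        return 0.0  # constant series: no variation to correlate
    return sum(dev[i] * dev[i + 1] for i in range(n - 1)) / denom

# Hypothetical residuals: the model predicts low for a run of samples, then high.
persistent = [0.5] * 6 + [-0.5] * 6
# Hypothetical residuals that flip sign at every sample.
alternating = [0.5, -0.5] * 6

print(lag1_autocorrelation(persistent))
print(lag1_autocorrelation(alternating))
```

A value well above zero, as for the first series, indicates the persistent runs of over- and under-prediction mentioned above, which is the kind of structure WRTDS-K is meant to exploit.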
Let me know if you need more help.
Bob Hirsch
kimiakarimi commented
Dear Dr. Hirsch,
Thank you for looking into this. I was thinking that using almost all observations would make WRTDS act like LOADEST. As I mentioned before, it seems that the load estimate for summer 1995 is largely influenced by the estimate for June 29, 1995, which is a high-flow day. There is not a sufficient number of similar observations (in terms of discharge), and I believe that even with minNumObs at 100, the window widths need to be increased. We also don't have an observation on that day and, as you mentioned, observations on extreme days are sparse.
My main goal is to estimate summer loads for my study area and use them as "true" loads in my model. I used WRTDS-K, but that would lead to a higher concentration for that day and a higher load for summer 1995.
Thank you,
Kimia