powerResource is a package designed to improve solar resource data analysis. Most engineers use a compilation of spreadsheet templates to organize all of their onsite solar resource measurements, compare that data to satellite models, and predict future solar resource statistics. To handle the volume of data involved in these analyses and to perform all of the complex calculations, these spreadsheets end up clunky and slow, and they leave ample room for human error. Utilizing a scripting language (R) drastically speeds up the analysis, ensures reproducibility, and reduces errors. This package unlocks a workflow that is accessible to engineers with little programming experience.
The easiest way to install this package from GitHub is with the 'devtools' package. To make sure you have it, run:
install.packages('devtools')
Once devtools is installed, you can install powerResource with the following code:
devtools::install_github('btaute/powerResource')
This package takes advantage of the excellent group of packages made available by RStudio known as the tidyverse. I strongly recommend that you install RStudio to serve as your development environment and that you install the tidyverse packages with:
install.packages('tidyverse')
Once you have the package installed, the following examples show how to use it to evaluate onsite data and perform an "MCP". The term MCP describes the method of Measuring onsite data, Correlating it to long-term satellite models, and then Predicting the future solar resource for that location.
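To make the idea concrete before diving into the package, here is a toy sketch of an MCP in base R. This is not how powerResource implements it internally; the data is made up and the column names are arbitrary. The idea: fit a regression of onsite measurements against the satellite model over their concurrent period, then apply that fit to the full satellite record to predict the long-term resource.
## Toy MCP sketch (synthetic data; not the powerResource internals)
set.seed(42)
## 20 years of daily satellite-modeled GHI (synthetic)
satellite <- data.frame(day = 1:7300, ghi_sat = pmax(rnorm(7300, 4.5, 1.5), 0))
## 1 year of concurrent onsite measurements, biased slightly low
onsite <- data.frame(day = 6936:7300,
                     ghi_onsite = 0.97 * satellite$ghi_sat[6936:7300] + rnorm(365, 0, 0.2))
## Correlate: regress onsite against satellite over the overlap period
concurrent <- merge(onsite, satellite, by = 'day')
fit <- lm(ghi_onsite ~ ghi_sat, data = concurrent)
## Predict: adjust the full long-term satellite record with the fit
longterm_adjusted <- predict(fit, newdata = satellite)
mean(longterm_adjusted)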
To start your data analysis, you will want to have all of your datasets accessible in one location. You'll also want to save the script you use to run your analysis in this same location so that future you or your colleagues can go back and reproduce the work. This collection of files and folders in one directory is called a repository. This example uses a repository organized like the following:
example_repository/
    example_script.R
    data/
        groundwork/
        cpr/
        vaisala/
        tmy/
The "groundwork", "cpr", "vaisala", and "tmy" folders within the "data" folder each contain the CSVs that have the data we plan to use in the analysis. Groundwork is the company that supplies the onsite data for this analysis; Clean Power Research and Vaisala are two companies that provide the satellite model datasets used as our long-term data sources.
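If you are setting up this structure from scratch, base R can create the folders for you (a one-time convenience, using the same names as the tree above):
## Create the repository folder structure (one-time setup)
for (d in c('groundwork', 'cpr', 'vaisala', 'tmy')) {
  dir.create(file.path('example_repository', 'data', d), recursive = TRUE)
}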
You will want to make this repository your "working directory" so that you can easily access all the data with your code. To do this, put the following code at the top of your example_script.R file, being sure to use the correct path to the repository on your computer. I'm guessing your username isn't "btaute"… Note that you need to use "/" instead of "\" to navigate through folders.
setwd('C:/Users/btaute/Documents/example_repository')
Now you want to load all of the data in your repository. The script below shows a couple of different ways to accomplish that. Note that it starts by loading the powerResource package so we have access to its functions. These examples also make use of the tidyverse packages.
## Load the powerResource package
library(powerResource)
library(tidyverse)
## Save the paths of all CSV files in the groundwork folder
groundwork_files <- dir(path = 'data/groundwork', pattern = '\\.csv$', full.names = TRUE)
## Read in the groundwork files, clean up all of the variables, and save them as a dataframe named sms_flagged
sms_flagged <- read_groundwork(groundwork_files)
## Save a character vector of the file paths in the cpr folder
cpr_files <- c('data/cpr/cpr_timeseries_v33.csv', 'data/cpr/cpr_timeseries_v34.csv')
## Read in the cpr files, clean up all of the variable names, and save them in a named list
cpr_list <- read_cpr(cpr_files)
## Read in a single vaisala time series file, clean up its variable names, and save it in a named list
vaisala_list <- read_vaisala('data/vaisala/vaisala_timeseries.csv')
Let’s check that all of that data imported correctly.
head(sms_flagged)
#> # A tibble: 6 x 13
#> datetime month year ghi1 ghi2 ghi3 ws temp ghi1_flag
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2018-11-08 20:48:00 11 2018 -0.452 -0.534 -0.383 2.96 0.5 1
#> 2 2018-11-08 20:49:00 11 2018 -0.472 -0.556 -0.387 3.00 0.5 1
#> 3 2018-11-08 20:50:00 11 2018 -0.480 -0.570 -0.396 3.64 0.59 1
#> 4 2018-11-08 20:51:00 11 2018 -0.480 -0.570 -0.335 2.34 0.6 1
#> 5 2018-11-08 20:52:00 11 2018 -0.483 -0.578 -0.255 2.13 0.6 1
#> 6 2018-11-08 20:53:00 11 2018 -0.495 -0.580 -0.274 2.72 0.6 1
#> # ... with 4 more variables: ghi2_flag <dbl>, ghi3_flag <dbl>, ws_flag <dbl>,
#> # temp_flag <dbl>
summary(sms_flagged)
#> datetime month year
#> Min. :2018-11-08 20:48:00 Min. : 1.000 Min. :2018
#> 1st Qu.:2019-03-01 03:35:45 1st Qu.: 3.000 1st Qu.:2019
#> Median :2019-06-21 10:23:30 Median : 7.000 Median :2019
#> Mean :2019-06-21 10:23:30 Mean : 6.743 Mean :2019
#> 3rd Qu.:2019-10-11 17:11:15 3rd Qu.:11.000 3rd Qu.:2019
#> Max. :2020-01-31 23:59:00 Max. :12.000 Max. :2020
#>
#> ghi1 ghi2 ghi3 ws
#> Min. : -4.33 Min. : -4.51 Min. : -35.94 Min. : 0.00
#> 1st Qu.: -1.25 1st Qu.: -1.25 1st Qu.: -0.74 1st Qu.: 1.00
#> Median : 1.18 Median : 1.30 Median : 0.73 Median : 2.00
#> Mean : 159.42 Mean : 159.50 Mean : 158.18 Mean : 11.54
#> 3rd Qu.: 233.90 3rd Qu.: 234.40 3rd Qu.: 230.57 3rd Qu.: 3.42
#> Max. :1277.28 Max. :1276.14 Max. :1334.39 Max. :3277.00
#> NA's :68934 NA's :68934 NA's :68934 NA's :68934
#> temp ghi1_flag ghi2_flag ghi3_flag
#> Min. :-15.30 Min. : 1.0 Min. : 1.0 Min. : 1.0
#> 1st Qu.: 2.70 1st Qu.: 1.0 1st Qu.: 1.0 1st Qu.: 1.0
#> Median : 11.27 Median : 1.0 Median : 1.0 Median : 1.0
#> Mean : 12.03 Mean : 107.5 Mean : 108.1 Mean : 108.6
#> 3rd Qu.: 21.68 3rd Qu.: 1.0 3rd Qu.: 1.0 3rd Qu.: 1.0
#> Max. : 34.90 Max. :1000.0 Max. :1000.0 Max. :1000.0
#> NA's :68934
#> ws_flag temp_flag
#> Min. : 1.0 Min. : 1.0
#> 1st Qu.: 1.0 1st Qu.: 1.0
#> Median : 1.0 Median : 1.0
#> Mean : 107.5 Mean : 107.5
#> 3rd Qu.: 1.0 3rd Qu.: 1.0
#> Max. :1000.0 Max. :1000.0
#>
head(cpr_list)
#> $cpr3.3
#> # A tibble: 195,048 x 9
#> datetime year month ghi dni dhi temp ws model
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 1998-01-01 01:00:00 1998 1 0 0 0 0 7 cpr3.3
#> 2 1998-01-01 02:00:00 1998 1 0 0 0 0 7 cpr3.3
#> 3 1998-01-01 03:00:00 1998 1 0 0 0 0 7 cpr3.3
#> 4 1998-01-01 04:00:00 1998 1 0 0 0 0 7 cpr3.3
#> 5 1998-01-01 05:00:00 1998 1 0 0 0 0 7 cpr3.3
#> 6 1998-01-01 06:00:00 1998 1 0 0 0 -1 7 cpr3.3
#> 7 1998-01-01 07:00:00 1998 1 0 0 0 -1 7 cpr3.3
#> 8 1998-01-01 08:00:00 1998 1 5 14 5 0 8 cpr3.3
#> 9 1998-01-01 09:00:00 1998 1 117 336 66 1 8 cpr3.3
#> 10 1998-01-01 10:00:00 1998 1 148 160 100 2 9 cpr3.3
#> # ... with 195,038 more rows
#>
#> $cpr3.4
#> # A tibble: 194,280 x 9
#> datetime year month ghi dni dhi temp ws model
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 1998-01-01 01:00:00 1998 1 0 0 0 3 6 cpr3.4
#> 2 1998-01-01 02:00:00 1998 1 0 0 0 3 6 cpr3.4
#> 3 1998-01-01 03:00:00 1998 1 0 0 0 2 6 cpr3.4
#> 4 1998-01-01 04:00:00 1998 1 0 0 0 2 6 cpr3.4
#> 5 1998-01-01 05:00:00 1998 1 0 0 0 2 7 cpr3.4
#> 6 1998-01-01 06:00:00 1998 1 0 0 0 2 7 cpr3.4
#> 7 1998-01-01 07:00:00 1998 1 0 0 0 2 6 cpr3.4
#> 8 1998-01-01 08:00:00 1998 1 5 16 5 2 6 cpr3.4
#> 9 1998-01-01 09:00:00 1998 1 115 290 71 3 6 cpr3.4
#> 10 1998-01-01 10:00:00 1998 1 154 159 106 6 7 cpr3.4
#> # ... with 194,270 more rows
head(vaisala_list)
#> $vaisala2.1.3
#> # A tibble: 203,039 x 9
#> datetime year month ghi dni dhi temp ws model
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 1996-12-31 19:00:00 1996 12 0 0 0 7.22 1.42 vaisala2.1.3
#> 2 1996-12-31 20:00:00 1996 12 0 0 0 6.73 1.55 vaisala2.1.3
#> 3 1996-12-31 21:00:00 1996 12 0 0 0 6.49 1.75 vaisala2.1.3
#> 4 1996-12-31 22:00:00 1996 12 0 0 0 6.46 1.98 vaisala2.1.3
#> 5 1996-12-31 23:00:00 1996 12 0 0 0 6.04 2.33 vaisala2.1.3
#> 6 1997-01-01 00:00:00 1997 1 0 0 0 5.21 2.81 vaisala2.1.3
#> 7 1997-01-01 01:00:00 1997 1 0 0 0 4.64 3.15 vaisala2.1.3
#> 8 1997-01-01 02:00:00 1997 1 0 0 0 4.58 3.37 vaisala2.1.3
#> 9 1997-01-01 03:00:00 1997 1 0 0 0 4.77 3.45 vaisala2.1.3
#> 10 1997-01-01 04:00:00 1997 1 0 0 0 4.84 3.41 vaisala2.1.3
#> # ... with 203,029 more rows
You can also break up the process of importing the onsite data by using the following workflow:
## Import the files
groundwork_df <- import_groundwork_files(groundwork_files)
## Tidy the Dataframe
tidy_sms <- tidy_groundwork(groundwork_df)
## Flag the Dataframe
sms_flagged <- flag_sms(tidy_sms)
It's important to note the flags that Groundwork attaches to each measurement of the onsite data. A key metric of data quality is the percentage of data that is flagged, and with our data in a tidy format this is easy to visualize. For a complete description of the flag definitions, please reference Groundwork's monthly reports.
## Get the flag count dataframe
flag_count <- count_flags(sms_flagged)
## Plot the monthly data recovery results
plot_flag_count_bar_chart(flag_count)
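If you want the flagged percentages as numbers rather than a plot, a minimal dplyr sketch like the one below works on the tidy sms_flagged dataframe. It assumes a flag value of 1 marks a valid (unflagged) measurement, with other codes indicating flags; confirm that against Groundwork's flag definitions before relying on it.
## Sketch: percent of ghi1 readings flagged each month
## (assumes ghi1_flag == 1 means the measurement passed)
sms_flagged %>%
  filter(!is.na(ghi1_flag)) %>%
  group_by(year, month) %>%
  summarise(pct_flagged = 100 * mean(ghi1_flag != 1), .groups = 'drop')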
Once you have evaluated the onsite data, it's time to correlate it with the long-term satellite model datasets. To do this, we need to compile all of the datasets into one large dataframe, which I call a Megaframe! This can be done with the following code:
## Prepare for Megaframe
sms <- prepare_sms_flagged_for_megaframe(sms_flagged, flagged_data_is_filtered = TRUE)
## Create the Megaframe!
megaframe <- compile_megaframe(sms, cpr_list, vaisala_list)
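It's worth a quick sanity check that every dataset made it into the megaframe. Judging by the model column visible in the outputs above (and the model values in the MCP results below), each source should appear with its own label:
## Count the rows contributed by each data source
count(megaframe, model)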
The code below breaks down the steps of prepare_sms_flagged_for_megaframe().
## Filter the flagged measurements out of the onsite dataset
sms_filtered <- filter_sms(sms_flagged)
## Average the redundant GHI sensors into a single GHI value
sms_averaged <- average_ghi_sensors(sms_filtered)
## Average the data up to an hourly timestep
sms_hourly <- average_data_to_hourly(sms_averaged)
## Format the variables to match the megaframe conventions
sms <- format_sms_vars_for_megaframe(sms_hourly)
There are two ways to calculate the results of the MCP. The first, and simplest, is to create an MCP object, which stores all of the inputs and results of the MCP in one place.
## Create the MCP object by giving it the megaframe parameter
my_mcp <- MCP(megaframe)
## Run the MCP
my_mcp$run()
## Get the results of the MCP
my_mcp$get_results()
#> # A tibble: 21 x 12
#> model longterm_resour~ sensor longterm_adjust~ timestep por iav sms_por
#> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 cpr3~ 1578. ghi 1644. daily 22.3 0.0341 1.10
#> 2 cpr3~ 1578. ghi 1642. hourly 22.3 0.0341 1.10
#> 3 cpr3~ 1578. ghi 1607. daily 22.2 0.0308 1.10
#> 4 cpr3~ 1578. ghi 1606. hourly 22.2 0.0308 1.10
#> 5 sms 1532. ghi NA <NA> 1.10 NA 1.10
#> 6 vais~ 1559. ghi 1597. daily 23.2 0.0329 1.10
#> 7 vais~ 1559. ghi 1592. hourly 23.2 0.0329 1.10
#> 8 cpr3~ 14.4 temp 14.4 daily 22.3 0.0539 1.10
#> 9 cpr3~ 14.4 temp 14.4 hourly 22.3 0.0539 1.10
#> 10 cpr3~ 14.8 temp 14.0 daily 22.2 0.0514 1.10
#> # ... with 11 more rows, and 4 more variables: adj.r.squared <dbl>,
#> # r.squared <dbl>, sigma <dbl>, rom <dbl>
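Because get_results() returns an ordinary tibble, you can carve it up with the usual dplyr verbs, for example:
## Pull just the daily GHI correlations out of the results
my_mcp$get_results() %>%
  filter(sensor == 'ghi', timestep == 'daily')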
You can also view the MCP results in a more convenient format, broken out by the timestep used in the MCP (hourly or daily).
my_mcp$get_daily_results_table()
| sensor | model        | timestep | longterm_resource | longterm_adjustment | r.squared | iav       | rom       |
|--------|--------------|----------|-------------------|---------------------|-----------|-----------|-----------|
| ghi    | vaisala2.1.3 | daily    | 1558.720223       | 1596.867428         | 0.9847949 | 0.0329220 | 0.0078100 |
| ghi    | cpr3.4       | daily    | 1578.156018       | 1606.854728         | 0.9844874 | 0.0307964 | 0.0074459 |
| ghi    | cpr3.3       | daily    | 1578.208504       | 1644.474186         | 0.9826992 | 0.0340787 | 0.0083353 |
| temp   | cpr3.4       | daily    | 14.820918         | 14.019656           | 0.9961618 | 0.0513910 | 0.0113056 |
| temp   | cpr3.3       | daily    | 14.406030         | 14.365829           | 0.9955701 | 0.0538586 | 0.0118892 |
| temp   | vaisala2.1.3 | daily    | 13.825857         | 14.386784           | 0.9880819 | 0.0592220 | 0.0136900 |
| ws     | vaisala2.1.3 | daily    | 4.512977          | 2.483125            | 0.6382267 | 0.0345728 | 0.0206383 |
| ws     | cpr3.3       | daily    | 4.144328          | 2.591740            | 0.5281097 | 0.0571327 | 0.0384372 |
| ws     | cpr3.4       | daily    | 3.291134          | 2.437311            | 0.5150573 | 0.0333453 | 0.0227138 |
my_mcp$get_hourly_results_table()
| sensor | model        | timestep | longterm_resource | longterm_adjustment | r.squared | iav       | rom       |
|--------|--------------|----------|-------------------|---------------------|-----------|-----------|-----------|
| ghi    | vaisala2.1.3 | hourly   | 1558.720223       | 1591.510175         | 0.9718675 | 0.0329220 | 0.0085497 |
| ghi    | cpr3.4       | hourly   | 1578.156018       | 1605.510111         | 0.9688469 | 0.0307964 | 0.0082598 |
| ghi    | cpr3.3       | hourly   | 1578.208504       | 1642.204193         | 0.9663798 | 0.0340787 | 0.0092635 |
| temp   | cpr3.4       | hourly   | 14.820918         | 14.003561           | 0.9843749 | 0.0513910 | 0.0124354 |
| temp   | cpr3.3       | hourly   | 14.406030         | 14.357571           | 0.9828037 | 0.0538586 | 0.0131631 |
| temp   | vaisala2.1.3 | hourly   | 13.825857         | 14.375962           | 0.9721408 | 0.0592220 | 0.0153525 |
| ws     | vaisala2.1.3 | hourly   | 4.512977          | 2.483620            | 0.5757527 | 0.0345728 | 0.0221493 |
| ws     | cpr3.3       | hourly   | 4.144328          | 2.591382            | 0.4761033 | 0.0571327 | 0.0403000 |
| ws     | cpr3.4       | hourly   | 3.291134          | 2.435817            | 0.4074752 | 0.0333453 | 0.0248847 |
The second method of running an MCP allows you to specify a particular timestep that you want to run (instead of running both daily and hourly automatically).
## Filter Megaframe so we only have ghi data
megaframe_ghi_only <- select(megaframe, -temp, -ws)
## Run an MCP for daily timestep
mcp_example_2 <- run_mcp_for_single_timestep(megaframe_ghi_only, daily = TRUE)
## Get MCP Results
get_mcp_results(mcp_example_2, megaframe_ghi_only)
#> # A tibble: 4 x 12
#> model longterm_resour~ sensor longterm_adjust~ timestep por iav sms_por
#> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 cpr3~ 1578. ghi 1644. daily 22.3 0.0341 1.10
#> 2 cpr3~ 1578. ghi 1607. daily 22.2 0.0308 1.10
#> 3 sms 1532. ghi NA <NA> 1.10 NA 1.10
#> 4 vais~ 1559. ghi 1597. daily 23.2 0.0329 1.10
#> # ... with 4 more variables: adj.r.squared <dbl>, r.squared <dbl>, sigma <dbl>,
#> # rom <dbl>
The last function we'll demonstrate helps you evaluate the benefits of additional onsite data. It runs an MCP after each month of onsite data so that you can benchmark how the long-term predictions change with each additional month. Typically, after about 16 months of data, collecting more onsite data has a very small impact on the long-term prediction, and at that point it may make sense to decommission your onsite measurements. The plots below also illustrate the danger of running an MCP too soon, as the first couple of months of onsite data can give drastically different results.
## Run Campaign
campaign_results <- run_sms_campaign(megaframe)
## Plot Longterm GHI Adjustment after each month of the campaign
plot_campaign_ghi_longterm_adjustments(campaign_results)
## Plot Campaign uncertainty after each month of the campaign
plot_campaign_rom_uncertainty(campaign_results)
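Given the package's tidyverse underpinnings, these plotting functions presumably return ggplot objects; if that holds, you can save the most recent plot to disk with ggsave. Treat this as an assumption rather than documented behavior:
## Assuming the plot functions return ggplot objects,
## save the last plot rendered to a PNG
ggsave('campaign_rom_uncertainty.png', width = 8, height = 5)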