(This work comes from unpublished research that is available for review, with references, starting on page 105 of the thesis: http://e-collection.library.ethz.ch/eserv/eth:49522/eth-49522-02.pdf). This page is a short primer that gives the scientific context of the work. Figures and tables from the LaTeX source may be available soon.
Environmental applications of engineered nanomaterials (ENMs) have tremendous potential to enhance remediation of pollutants in the subsurface, to make nano-formulated plant protection products more sustainable than traditional pesticides, and to improve nano-based plant nutrient delivery. However, the increased production of ENMs, coupled with their apparent aquatic toxicity,123–125 raises serious concerns about the fate and transport of these materials in the environment.
Understanding of ENM transport in the subsurface is far from complete. As with larger colloidal particles, the transport behavior of ENMs in the subsurface environment is a function of numerous deposition processes, which are controlled by a complex system of physical and chemical factors.126 Despite numerous qualitative investigations of the influence of physical and chemical factors on ENM transport, relatively few quantitative insights have been gained. One reason for the slow progress is that it is not yet possible to identify and separate natural nanomaterials from engineered nanomaterials in complex environmental matrices.47 As a result, investigation of ENM transport, particularly for high production volume nanomaterials (e.g., TiO2 and ZnO), is limited to simple, artificial soil systems. Another critical factor is that ENM transport is primarily investigated through soil column experiments. Soil columns have been in use for more than 130 years and are the foundation for most fluid dynamics, but they are too coarse to resolve nano-scale processes. Newer techniques, such as high-resolution X-ray computed tomography of soil columns, enable non-destructive transport characterization but are still too coarse to resolve ENMs (scanning resolution is typically between 0.84 μm and 4.4 μm).
Quartz crystal microbalance with dissipation (QCM-D) monitoring enables real-time measurement of nanoscale mass deposition. QCM-D monitors the deposition rate of particles onto an oscillating sensor surface, isolating deposition kinetics and allowing a direct measure of attachment efficiency (α) without contributions from convection or filtration, unlike traditional soil column experiments. α is an important kinetic transport parameter and is widely used in colloid filtration theory (CFT) to quantify the likelihood of a particle attaching to a surface after a collision. Several closed-form correlations have been developed to predict α, but none explicitly considers the physicochemical conditions that have been reported to affect its value: ionic strength, pH, the charge of the particle and collector surface, the presence of organic matter (whether dissolved in solution or coating the collector surface), temperature, and particle shape. Furthermore, while the qualitative influence of these conditions on α has been well studied and, in some cases, reasonable mechanistic explanations are provided (e.g., agreement or disagreement with Derjaguin-Landau-Verwey-Overbeek theory), a predictive model of α that explicitly considers these conditions has not yet been developed.
Machine learning allows us to develop empirical models of complex systems where the underlying relationships between the data are too complex to derive by hand.150 Machine learning has been successfully applied to a wide range of complex problems.151–154 In two recent studies,90,91 machine learning was applied to predict the toxicity and biological impacts of ENMs, based explicitly on the molecular properties of the nanomaterial. Despite these successes, machine learning had not been applied to the complex task of modeling environmental transport until very recently. A recent study by Goldberg et al.107 employed ensemble machine learning (random forest) regression and classification to predict the retained fraction (RF; the fraction of material retained during a soil column experiment relative to the total mass of material injected into the column) and the shape of the retention profile (RP), using a database of more than 200 nanomaterial column transport experiments amassed from published literature. Goldberg et al.107 reported that their model was able to predict the RF with a mean squared error between 0.025 and 0.033, and the RP with an expected F1-score (the weighted harmonic mean of precision and recall) between 60% and 70%. Further, by recursively removing physical and chemical features to optimize model predictive performance, the authors were able to rank the importance of the physicochemical state features (e.g., pH, ionic strength, nanomaterial type, etc.) to ENM transport.
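To make the two metrics quoted above concrete, the following is a minimal Python sketch of the mean squared error and of the F-beta score, which reduces to the F1-score when beta = 1 (precision and recall weighted equally). The function names and all numbers are hypothetical and chosen only for illustration; they are not taken from Goldberg et al. or from this work.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta=1 gives the F1-score."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention: avoid division by zero when both are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical retained fractions (observed vs. predicted), for illustration only.
observed = [0.10, 0.50, 0.90]
predicted = [0.20, 0.40, 0.70]
mse = mean_squared_error(observed, predicted)

# Hypothetical precision and recall for a single retention-profile class.
f1 = f_beta(precision=0.75, recall=0.60)
```

Because the retained fraction lies between 0 and 1, a mean squared error of a few hundredths corresponds to a typical absolute error on the order of 0.15 to 0.2 in retained fraction.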
The high variability in reported α values under seemingly similar experimental conditions is one reason why the mechanistic understanding remains poor. For example, the presence of an attached layer of natural organic matter (NOM) has been reported to hinder, enhance, or have no effect on α under similar solution conditions. The purpose of this work is to combine all of the available α data in order to identify which of the complex set of variables are most important and, ultimately, to produce a predictive model for α based on the identified variables. Here, 299 experiments with 13 physicochemical features each were chosen from 12 publications from 2008–2015 to form the largest QCM-D derived α database to date. Ensemble machine learning (gradient boosted decision trees) was employed to empirically relate the physicochemical state (i.e., physicochemical training features) to the α values (i.e., target feature) measured by QCM-D. Grid search hyperparameter optimization with cross-validation (GSHPOCV) and recursive feature elimination with cross-validation (RFECV) were employed to optimize model performance. RFECV results from 100 model runs were then aggregated to identify the physical and chemical features most critical to predicting α. The predictive, empirical model presented here will aid in identifying which physicochemical characteristics are most influential to α. An improved understanding of transport parameters is key to accurately predicting the impact of new particle types and making risk-informed regulatory decisions, or even designing new ENMs to be safe from the ground up.
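The workflow described above (gradient boosted trees, a cross-validated grid search, then recursive feature elimination with cross-validation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text does not name a library, so the use of scikit-learn is an assumption, and the data, the hyperparameter grid, and the fold count are synthetic placeholders. It also performs a single run rather than aggregating 100 runs.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)

# Synthetic stand-in for the database: 120 "experiments" x 6 "features".
# Only the first two columns influence the target; the rest are pure noise.
X = rng.uniform(0.0, 1.0, size=(120, 6))
y = np.clip(0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0.0, 0.02, 120), 0.0, 1.0)

cv = KFold(n_splits=3, shuffle=True, random_state=0)

# Step 1: grid search over a deliberately tiny, hypothetical hyperparameter grid.
grid = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    scoring="neg_mean_squared_error",
    cv=cv,
)
grid.fit(X, y)

# Step 2: recursive feature elimination with cross-validation on the tuned model.
# One feature is dropped per step, ranked by the model's feature importances.
selector = RFECV(
    GradientBoostingRegressor(random_state=0, **grid.best_params_),
    step=1,
    min_features_to_select=1,
    scoring="neg_mean_squared_error",
    cv=cv,
)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)  # column indices of retained features
ranking = selector.ranking_                   # 1 = retained; larger = dropped earlier
```

Repeating the second step with different random seeds and counting how often each feature is retained is one plausible way to aggregate results across runs; the exact aggregation used in this work is not specified here.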
The database developed for this work includes 299 separate experiments extracted from 12 peer-reviewed QCM-D nanomaterial transport studies from 2008–2015.127–138 From each experiment, 13 physicochemical training features and 1 target feature, the experimentally measured α, were recorded. Studies that reported particle deposition rate but not particle attachment efficiency were not included in the database. Calculating α from a particle deposition rate requires the favorable deposition rate at otherwise identical experimental conditions; without it, α could not be derived from the reported deposition rates. Furthermore, studies that explored particle deposition onto oppositely charged surfaces were not considered, as α = 1 in these cases, by definition.
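The inclusion rule above can be illustrated with a short Python sketch. It assumes the convention commonly used in QCM-D studies, in which α is the ratio of the deposition rate under the studied conditions to the favorable deposition rate at otherwise identical conditions. The record layout, field names, and values are hypothetical and do not come from the database.

```python
def attachment_efficiency(deposition_rate, favorable_deposition_rate):
    """alpha = observed deposition rate / favorable deposition rate.

    Both rates must be in the same units and measured at otherwise
    identical experimental conditions.
    """
    if favorable_deposition_rate <= 0:
        raise ValueError("favorable deposition rate must be positive")
    if deposition_rate < 0:
        raise ValueError("deposition rate must be non-negative")
    return deposition_rate / favorable_deposition_rate


# Hypothetical experiment records (arbitrary but consistent rate units).
records = [
    {"id": "A", "rate": 0.8, "favorable_rate": 2.0},
    {"id": "B", "rate": 1.5, "favorable_rate": None},  # favorable rate not reported
    {"id": "C", "rate": 2.0, "favorable_rate": 2.0},
]

# Keep only records for which alpha can actually be computed.
usable = {
    r["id"]: attachment_efficiency(r["rate"], r["favorable_rate"])
    for r in records
    if r["favorable_rate"] is not None
}
```

Record B is dropped because its favorable deposition rate is unknown, which mirrors why studies reporting only deposition rates were excluded.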