Abstract
Fitting a specified model to data is critical in many science and engineering fields. A major task in fitting a specified model to data is to estimate the value of each parameter in the model. Iterative local methods, such as the Gauss–Newton method and the Levenberg–Marquardt method, are often employed for parameter estimation in nonlinear models. However, practitioners must guess the initial value for each parameter to initialize these iterative local methods. A poor initial guess can contribute to non-convergence of these methods or lead these methods to converge to a wrong or inferior solution. In this paper, a solution interval method is introduced to find the optimal estimator for each parameter in a nonlinear model that minimizes the squared error of the fit. To initialize this method, it is not necessary for practitioners to guess the initial value of each parameter in a nonlinear model. The method includes three algorithms that require different levels of computational power to find the optimal parameter estimators. The method constructs a solution interval for each parameter in the model. These solution intervals significantly reduce the search space for optimal parameter estimators. The method also provides an empirical probability distribution for each parameter, which is valuable for parameter uncertainty assessment. The solution interval method is validated through two case studies in which the Michaelis–Menten model and Fick’s second law are fit to experimental data sets, respectively. These case studies show that the solution interval method can find optimal parameter estimators efficiently. A four-step procedure for implementing the solution interval method in practice is also outlined.
1 Introduction
Fitting a specified model to data, also called curve fitting, is an inverse problem that has wide application in many science and engineering fields. For example, researchers often fit a fatigue model, such as the stress-life (S-N) model and the strain-life (ɛ-N) model, to the cycles to failure experimental data of different metals [1]. Fatigue curves, which play a significant role in machine component design [2], are then constructed accordingly [3,4]. Similarly, Fick’s laws govern the moisture diffusion in composite materials [5]. Practitioners usually fit Fick’s second law to experimental data in order to estimate mass diffusivities of different composite materials [6]. These estimated mass diffusivities are then used to design composite structures (e.g., wings of passenger aircraft and the bodies of racing cars) under different service environments [7].
A major task in fitting a specified model to data is to estimate the value of each parameter in the model. Several objective functions, such as least-squares error and maximum likelihood, are available for parameter estimation [8]. Practitioners need to find the optimal parameter values that minimize or maximize a chosen objective function. Linear regression has been applied successfully to estimate parameter values in linear models [9]. However, many scientific and engineering models are nonlinear [10]. Iterative local methods, such as the Gauss–Newton method [9] and the Levenberg–Marquardt method [11], and Bayesian approaches, are usually employed for parameter estimation in nonlinear models [12–14]. Importantly, practitioners must guess the initial value or the prior distribution of each parameter to initialize these methods. A poor initial guess can contribute to non-convergence of these methods or lead these methods to converge to a wrong solution or a sub-optimal solution (e.g., local optimum) [15]. Moreover, practitioners may not know whether the result is the optimal solution or not even when a good initial guess is employed to initialize these methods.
In this paper, a broadly applicable method is introduced to estimate the parameter values in a nonlinear model that minimize the squared error of the fit when the number of unknown parameters in the model is less than the number of data points. To initialize this method, it is not necessary for practitioners to guess the initial value of each parameter in a nonlinear model. To our knowledge, an initial-guess-free method for least-squares parameter estimation that can be applied to fit any specified nonlinear model to data is a new contribution to curve fitting. This method establishes an empirical probability distribution for each parameter in a nonlinear model. This method also constructs a solution interval for each parameter in a nonlinear model. It is proved that the optimal estimator for each parameter in a nonlinear model that minimizes the squared error of the fit belongs to its corresponding solution interval under certain conditions. The optimal least-squares estimator for each parameter can be found through exhaustive search with specified significant digits (e.g., four significant digits) in the solution interval. A procedure to compute an initial value for each parameter in a nonlinear model is provided when an iterative local method is preferred for parameter estimation. A stochastic algorithm is also developed to find the optimal parameter estimators efficiently.
This paper begins with a short review of currently available methods for parameter estimation in nonlinear models in Sec. 2. The new method to construct a solution interval and search for the optimal least-squares estimator for each parameter in a nonlinear model is presented in Sec. 3. To ease the implementation of this method in practice, four steps to implement this method are summarized in Sec. 4. In Sec. 5, the application of this method is illustrated by fitting the Michaelis–Menten model and Fick’s second law to experimental data sets, respectively. The paper concludes with a brief discussion of the contribution of the work presented here and future research directions.
2 Background and Related Work
The goal of parameter estimation in nonlinear models is to find the optimal estimator for each parameter in a specified nonlinear model that minimizes or maximizes a chosen objective function. Available objective functions include, but are not limited to, least-squares error, weighted least-squares error, maximum likelihood, and robust regression functions for M-estimators [8]. Due to space limitations, this section only summarizes several popular methods to find the optimal parameter estimators in nonlinear models that minimize the squared error of the fit. A brief review of Bayesian approaches for parameter estimation is also provided. A thorough review of available methods for parameter estimation in nonlinear models using different objective functions could be found in books, review papers, and a technical report written by Bates and Watts [16], Björck [13], Jennrich and Ralston [8], Motulsky and Ransnas [15] and Fraley [17], among others.
An iterative least-squares local method is commonly employed to estimate parameter values in nonlinear models since linear transformation approaches usually cannot produce the optimal least-squares parameter estimators, and many nonlinear models cannot even be transformed into a linear model. An iterative local approach begins with an initial guess for the value of each parameter in a nonlinear model and then alters the parameter values iteratively to reduce the squared error of the fit. Among the local methods that have been developed to direct these iterations, the Gauss–Newton method and the Levenberg–Marquardt method are two popular choices in science and engineering fields [9,13].
The Gauss–Newton method has the advantage that only the linear equations represented by Eq. (8) need to be solved in each iteration step. However, the Gauss–Newton method may converge slowly for large-residual problems [13,20]. Moreover, the method fails to converge when the matrix JᵀJ is singular [21]. To address the convergence issues of the Gauss–Newton method, several modified Gauss–Newton methods, also known as Newton-type methods [13], have been developed. These modified Gauss–Newton methods [22] take second-order derivatives (i.e., Hessian matrices) into account when the residuals of a nonlinear fit are approximated, in contrast to the first-order approximation in Eq. (3). It is usually expensive to compute the Hessian matrices of the residuals in each iteration step, and researchers have therefore suggested different methods for approximating Hessian matrices [23,24].
Of note, least-squares parameter estimation in nonlinear models could be regarded as a special case of engineering optimization, where least-squares error is the optimization objective and parameters in a specified nonlinear model are the design variables. Parameter estimation methods, especially trust-region methods, have been generalized and commonly employed to solve unconstrained and constrained optimization problems [30,31]. Similarly, several guided random search techniques invented for engineering optimization [32], such as genetic algorithms, simulated annealing, and particle swarm optimization, also have been applied for parameter estimation in nonlinear models [33–36]. These guided random search techniques are iterative methods that usually begin with multiple initial guesses (i.e., initial population) for the value of each parameter in a nonlinear model. A process involving random variation in parameter values is then performed to improve an objective function in each iteration step [32].
In most applications involving nonlinear models, it is difficult to evaluate the integrations in Eq. (11) analytically. Laplace approximation [37] and Markov chain Monte Carlo methods [38] have been applied to obtain the posterior expectation for each parameter in a nonlinear model.
Iterative least-squares local methods (e.g., the Gauss–Newton method, the Levenberg–Marquardt method, and guided random search techniques) and Bayesian approaches have been applied to estimate parameter values in nonlinear models in many science and engineering fields [13,14,32,39]. A successful application of any method mentioned previously relies on a smart initial guess. Specifically, an initial value for each parameter in a nonlinear model must be specified when an iterative least-squares local method is employed. Bayesian approaches require a user-specified prior distribution for each parameter. A poor initial guess can contribute to non-convergence of these methods or lead these methods to converge to a wrong solution or a sub-optimal solution (e.g., local optimum) [15,40]. However, there is no standard procedure available to specify initial values for least-squares iterative local methods or prior distributions for Bayesian approaches [15,41,42]. In practice, an initial guess for parameter estimation in nonlinear models is made on a case-by-case basis and is often based on previous experience, expert advice, linear least-squares estimators (if linear transformation is feasible for a nonlinear model), graphical exploration, or a grid search (i.e., “brute force” approach) [15,40–42].
3 A Solution Interval Method to Estimate Parameter Values in Nonlinear Models
Based on the research gap discussed in Sec. 2, a solution interval method for least-squares parameter estimation in nonlinear models is introduced in this section. This method includes three algorithms that require different levels of computational power to find the optimal parameter estimators. Importantly, it is not necessary to guess the initial value for each parameter in a nonlinear model using the method.
The n equations represented by Eq. (14) are called parameter solution equations. The unknown variables in these n equations are θ = [θ1, θ2, …, θn]. If the n equations represented by Eq. (14) have only one solution for each combination, denoted θ(s) = [θ1(s), θ2(s), …, θn(s)], where s ∈ {1, 2, …, h}, there are h solutions in total for each parameter θj, where j ∈ {1, 2, …, n}. Of note, although y = f (x, θ) is a nonlinear model, the n equations represented by Eq. (14) may be linear equations or nonlinear equations that have analytical solutions. Numerical methods, such as the bisection method, Brent’s method, and the Newton–Raphson method [19], can be employed to solve these n equations when analytical solutions are not available. If any bounds apply to parameter θj (e.g., θlb < θj < θub, where θlb and θub are the lower bound and upper bound of parameter θj), it is only necessary to find the solution of Eq. (14) for θj that satisfies these bounds.
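For illustration, a minimal numerical sketch is given below. Since Eq. (14) is not reproduced in this section, the sketch assumes these equations require the model to pass exactly through the n chosen data points (i.e., f(xi, θ) = yi for each chosen index i), and the function names and arguments are placeholders rather than the authors' implementation.

```python
# Sketch (not the authors' code): solve the parameter solution equations for one
# combination of n data points. Assumption: Eq. (14) amounts to requiring the
# model to pass exactly through the chosen points, i.e., f(x_i, theta) = y_i.
import numpy as np
from scipy.optimize import fsolve

def solve_one_combination(model, x, y, idx, theta_start):
    """Return the parameter solution theta^(s) for the data points in `idx`,
    or None if the root finder does not converge. `theta_start` is a nominal
    starting point for the n-by-n root solve (e.g., np.ones(n)), not an initial
    guess for the least-squares fit itself."""
    def residuals(theta):
        return np.array([model(x[i], theta) - y[i] for i in idx])
    theta_s, _, ok, _ = fsolve(residuals, theta_start, full_output=True)
    return theta_s if ok == 1 else None
```

When each equation involves a single parameter, a bracketing method such as the bisection method or Brent's method could replace the general-purpose root finder above.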
It is proven in Appendix A that the optimal estimator for each parameter θj that minimizes the squared error S belongs to the interval [θjmid − Lj, θjmid + Lj] when the monotonic conditions and the symmetric condition defined by Eqs. (A8)–(A10) are satisfied. [θjmid − Lj, θjmid + Lj] is called the solution interval of parameter θj. Future research may relax these conditions given by Eqs. (A8)–(A10) in Appendix A and define a solution interval for the more general case or derive the solution interval on a case-by-case basis.
Here, three algorithms that require different levels of computational power are provided to find the optimal estimator for each parameter in a nonlinear model using the solutions of the parameter solution equations, θj(s), and the solution interval [θjmid − Lj, θjmid + Lj] for each parameter θj. In the first algorithm, the optimal parameter estimators that minimize the squared error S can be found through an exhaustive search, also called grid search, with specified significant digits (e.g., four significant digits) in the solution interval of each parameter θj if abundant computational power is available. The second algorithm can be employed if exhaustive search is not feasible. For example, the solution intervals derived by Eqs. (15) and (16) may be extremely large because of outliers in the data set or numerical ill-conditioning in solving the parameter solution equations. In that case, the median of the h solutions of each parameter θj, θjmed, can be computed. The medians of the parameter solutions, θmed = [θ1med, θ2med, …, θnmed], are supplied as the initial values to a currently available iterative least-squares local method (e.g., the Gauss–Newton method or the Levenberg–Marquardt method) to search for the optimal estimator for each parameter.
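A minimal sketch of these first two algorithms is shown below. It assumes the interval bounds for each parameter have already been computed from Eqs. (15) and (16), and the grid resolution stands in for the specified significant digits; the helper names are placeholders, not the authors' implementation.

```python
# Sketch of Algorithms 1 and 2, assuming the interval bounds [a_j, b_j] for each
# parameter have already been obtained from Eqs. (15) and (16).
import numpy as np
from itertools import product
from scipy.optimize import least_squares

def grid_search(model, x, y, bounds, points_per_axis=200):
    """Algorithm 1: exhaustive (grid) search over the solution intervals."""
    grids = [np.linspace(a, b, points_per_axis) for a, b in bounds]
    best_theta, best_sse = None, np.inf
    for theta in product(*grids):
        theta = np.array(theta)
        sse = np.sum((y - model(x, theta)) ** 2)    # squared error S of the fit
        if sse < best_sse:
            best_theta, best_sse = theta, sse
    return best_theta, best_sse

def median_start_fit(model, x, y, solutions):
    """Algorithm 2: median of the h parameter solutions as the initial value
    for a Levenberg-Marquardt fit."""
    theta0 = np.median(np.asarray(solutions), axis=0)        # theta_med
    result = least_squares(lambda th: y - model(x, th), theta0, method="lm")
    return result.x
```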
It is necessary to solve the n equations represented by Eq. (14) h times using either of the algorithms mentioned above (i.e., exhaustive search in the solution interval or using the median of the h parameter solutions as the initial values of an iterative least-squares local method). In some cases, it may be expensive to solve these n equations h times. For example, suppose a nonlinear model with 10 parameters is fit to 1000 data points (i.e., m = 1000 and n = 10). There are 2.63 × 10²³ combinations to choose 10 data points from the 1000 data points (i.e., h = 2.63 × 10²³). Enormous computational power is required to solve the 10 equations represented by Eq. (14) 2.63 × 10²³ times if analytical solutions of these 10 equations are not available and a numerical method has to be employed. The following stochastic algorithm, as the third algorithm, is an alternative to search for the optimal value of each parameter in a nonlinear model when there is not enough computational power available to solve these n equations h times.
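The combination count quoted above can be checked directly; the snippet below is only a verification of the arithmetic, not part of the method.

```python
# Check of the combination count h = C(1000, 10) quoted above.
import math

h = math.comb(1000, 10)   # ways to choose 10 of 1000 data points
print(f"h = {h:.3e}")     # h = 2.634e+23
```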
A flowchart of the stochastic algorithm appears in Fig. 1. As defined earlier, there are h combinations to choose n data points from the m data points (n < m). Three combinations are chosen from the h combinations randomly without replacement. Each combination corresponds to one solution for each parameter θj through solving the n equations represented by Eq. (14), where j ∈ {1, 2, …, n}. A solution interval [aj(1), bj(1)] = [θjmid − Lj, θjmid + Lj] can be built for each parameter θj using these three solutions. The median of these three solutions is used as the initial value for each parameter θj of a specified iterative least-squares local method (e.g., the Gauss–Newton method or the Levenberg–Marquardt method) to search for the optimal parameter estimators. The search result for parameter θj is denoted by θj(1). If the search result θj(1) for each parameter θj belongs to the corresponding solution interval [aj(1), bj(1)], θ(1) = [θ1(1), θ2(1), …, θn(1)] gives the final parameter estimators. If any search result θj(1) does not belong to the corresponding solution interval [aj(1), bj(1)], another two combinations are then chosen from the remaining (h − 3) combinations randomly without replacement. A solution interval [aj(2), bj(2)] = [θjmid − Lj, θjmid + Lj] is built using all five available solutions for each parameter θj, and the median of the five solutions is used as the initial value for each parameter θj of the specified iterative least-squares local method. If the search result θj(2) for each parameter θj belongs to the corresponding solution interval [aj(2), bj(2)], θ(2) = [θ1(2), θ2(2), …, θn(2)] gives the final parameter estimators. If any search result θj(2) does not belong to the corresponding solution interval [aj(2), bj(2)], another two combinations are chosen from the remaining combinations randomly without replacement, and this process repeats until the new search result θj(t) for each parameter θj belongs to the corresponding solution interval [aj(t), bj(t)], where t < h/2. The convergence rate of the stochastic algorithm depends on the specified nonlinear model and the iterative least-squares local method employed in each iteration step. A case study using the stochastic algorithm can be found in Sec. 5.2, where Fick’s second law is fit to experimental data and the Levenberg–Marquardt method is employed in each iteration step. The stochastic algorithm reaches 100% convergence within five iteration steps, and the parameter solution equation represented by Eq. (14) is solved fewer than 0.2h times in that case study. The convergence rate of the stochastic algorithm could be investigated through more case studies in future research.
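A hedged sketch of this stochastic algorithm is given below; `solve_combination` and `build_interval` are assumed helper interfaces standing in for solving Eq. (14) and evaluating Eqs. (15) and (16), not the authors' code.

```python
# Sketch of the stochastic algorithm in Fig. 1 (not the authors' code).
# `solve_combination(idx)` is assumed to return the parameter solution for one
# combination of n data points, or None when Eq. (14) has no solution for it;
# `build_interval(solutions)` is assumed to return the bounds of
# [theta_mid - L, theta_mid + L] computed from Eqs. (15) and (16).
import random
import numpy as np
from itertools import combinations
from scipy.optimize import least_squares

def stochastic_search(model, x, y, n, solve_combination, build_interval):
    combos = list(combinations(range(len(x)), n))
    random.shuffle(combos)                       # draw combinations without replacement
    solutions, batch, theta = [], 3, None        # the first draw uses three combinations
    while combos:
        take, combos = combos[:batch], combos[batch:]
        solutions += [s for s in map(solve_combination, take) if s is not None]
        batch = 2                                # later draws add two combinations each
        if not solutions:
            continue
        lo, hi = build_interval(np.asarray(solutions))
        theta0 = np.median(np.asarray(solutions), axis=0)
        theta = least_squares(lambda th: y - model(x, th), theta0, method="lm").x
        if np.all((lo <= theta) & (theta <= hi)):
            return theta                         # search result lies in the interval
    return theta                                 # all combinations used: same as Algorithm 2
```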
As stated earlier, the solution interval method provides practitioners with three algorithms to choose from based on available computational power. The first algorithm is more computationally expensive than the other two algorithms. However, the first algorithm is complete since an exhaustive search in the solution interval of each parameter is performed, and it is proven in Appendix A that the optimal estimator for each parameter belongs to its corresponding solution interval when the monotonic conditions and the symmetric condition defined by Eqs. (A8)–(A10) are satisfied. The soundness of the second algorithm (i.e., using the median of solutions as the initial value) and the third algorithm (i.e., the stochastic algorithm shown in Fig. 1) depends on whether a sound iterative local method is employed in these algorithms. Of note, the third algorithm is equivalent to the second algorithm when the stochastic algorithm does not converge before all h combinations are chosen. In that case, all h combinations have been chosen when the third algorithm terminates, and h solutions are derived for each parameter; the median of the h parameter solutions is then taken as the initial value for the specified iterative local method, which is identical to the second algorithm.
4 Four Steps to Estimate Parameter Values in a Nonlinear Model
In Sec. 3, a generic least-squares method, called the solution interval method, is introduced to estimate parameter values in a nonlinear model. The method includes three algorithms that require different levels of computational power to find the optimal parameter estimators, and practitioners do not need to guess the initial value for each parameter in a nonlinear model using the method. In this section, the procedure to implement the solution interval method is summarized in four steps. These four steps are illustrated in Fig. 2, and the three algorithms appear as the two blocks on the right side of step 2 and step 3 and the block of step 4 in Fig. 2. Practitioners can follow these guidelines to choose an appropriate algorithm and find the optimal estimator for each parameter in a nonlinear model that minimizes the squared error of the fit.
Step 1—Data collection and model selection. Practitioners first collect data (i.e., m data points) from a scientific experiment, observational study, simulation, or survey. An appropriate mathematical model with n unknown parameters (n < m) is selected from several candidate models to fit the data. If the mathematical model is linear, linear regression is applied to estimate the parameter values in the model.
Step 2—Parameter solution equations derivation. If the chosen mathematical model is nonlinear, the expressions of n equations represented by Eq. (14) are derived from the specified nonlinear model. These n equations are then solved analytically. If analytic solutions are not available, a numerical method (e.g., the bisection method, Brent’s method, or the Newton–Raphson method) should be specified to solve these n equations. Practitioners evaluate whether there is abundant computational power to solve these n equations h times using the numerical method, where h is defined in Eq. (13). If limited computational power is available, practitioners should use the stochastic algorithm shown in Fig. 1 to estimate the value for each parameter in the nonlinear model.
Step 3—Solution interval construction. If there is abundant computational power to solve the n equations represented by Eq. (14) h times, n data points are chosen from the m data points exhaustively and plugged into these n equations, respectively. h solutions are derived for each parameter in the nonlinear model. These h solutions are sorted from the smallest solution to the largest solution, and a solution interval [θjmid − Lj, θjmid + Lj] is constructed for each parameter θj using Eqs. (15)–(16). Practitioners evaluate whether there is abundant computational power to perform an exhaustive search within the solution interval of each parameter to find the optimal estimator. If limited computational power is available, practitioners should use the median of these h solutions of each parameter as the initial value of an iterative least-squares local method (e.g., the Gauss–Newton method or the Levenberg–Marquardt method) to search for the optimal estimator for each parameter.
Step 4—Optimal parameter estimator search. If abundant computational power is available for exhaustive search in the solution interval of each parameter, practitioners specify the significant digits (e.g., four significant digits) required for final parameter estimators and perform an exhaustive search, also called grid search, in solution intervals to find the optimal estimator for each parameter in the nonlinear model that minimizes the squared error of the fit.
5 Case Studies Using Michaelis–Menten Model and Fick’s Second Law
To illustrate the four-step procedure outlined in Sec. 4, the Michaelis–Menten model and Fick’s second law are fit to experimental data sets, respectively, as two case studies in this section. These two case studies show that the optimal least-squares parameter estimators belong to the solution intervals defined in Sec. 3 (i.e., [θjmid − Lj, θjmid + Lj] for parameter θj), although it is difficult to prove analytically whether the Michaelis–Menten model and Fick’s second law satisfy the conditions defined by Eqs. (A8)–(A10). These optimal least-squares parameter estimators have also been found more efficiently using the solution interval method compared with using currently available methods (e.g., the linear transformation method and iterative least-squares local methods) in these case studies. These results validate the solution interval method and associated assumptions.
5.1 Fitting the Michaelis–Menten Model to Experimental Data.
No. | Substrate concentration, x (parts per million) | Initial velocity, y (counts/min2) |
---|---|---|
1 | 0.02 | 47 |
2 | 0.02 | 76 |
3 | 0.06 | 97 |
4 | 0.06 | 107 |
5 | 0.11 | 123 |
6 | 0.11 | 139 |
7 | 0.22 | 152 |
8 | 0.22 | 159 |
9 | 0.56 | 191 |
10 | 0.56 | 201 |
11 | 1.10 | 200 |
12 | 1.10 | 207 |
Although exhaustive search in solution intervals is feasible in this case study, it is more efficient to find the optimal least-squares parameter estimators by using the median of the 60 solutions of each parameter. It is not necessary to employ the stochastic algorithm shown in Fig. 1 in this case study because Eqs. (19)–(20) have analytic solutions, and it is not expensive to solve Eqs. (19)–(20) 60 times. The median of the 60 solutions for parameter θ1, θ1med, is 213.7, and the median of the 60 solutions for parameter θ2, θ2med, is 0.06693. Of note, these medians are close to the optimal least-squares estimators for parameters θ1 and θ2 derived by exhaustive search. These two medians are supplied as the initial values of the Levenberg–Marquardt method, and the same optimal parameter estimators (i.e., θ1 = 212.7 and θ2 = 0.06412) are derived by the Levenberg–Marquardt method.
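For concreteness, the sketch below reproduces this fit with a standard Levenberg–Marquardt routine, starting from the medians reported above; the Michaelis–Menten form y = θ1x/(θ2 + x) is the standard one, and the data are those of Table 1.

```python
# Sketch: Levenberg-Marquardt fit of the Michaelis-Menten model to the Table 1
# data, starting from the medians of the parameter solutions (213.7, 0.06693).
import numpy as np
from scipy.optimize import curve_fit

x = np.array([0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, 1.10, 1.10])
y = np.array([47, 76, 97, 107, 123, 139, 152, 159, 191, 201, 200, 207], dtype=float)

def michaelis_menten(x, theta1, theta2):
    return theta1 * x / (theta2 + x)

theta, _ = curve_fit(michaelis_menten, x, y, p0=[213.7, 0.06693], method="lm")
rss = np.sum((y - michaelis_menten(x, *theta)) ** 2)
print(theta, rss)   # approximately [212.7, 0.06412] with RSS about 1195
```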
The effectiveness of the solution interval method can also be compared with that of extant iterative least-squares local methods. Here, the Levenberg–Marquardt method is employed to fit the Michaelis–Menten model to the experimental data set, and 1000 random values are chosen as initial guesses for parameters θ1 and θ2, respectively. The results show that only three out of the 1000 random initial guesses lead the Levenberg–Marquardt iterative algorithm to converge to the optimal parameter estimators (i.e., θ1 = 212.7 and θ2 = 0.06412). In other words, practitioners need to run the Levenberg–Marquardt iterative algorithm with a random initial guess 333 times on average to find the optimal parameter estimators. In practice, practitioners may need to run the algorithm with random initial guesses far more than 333 times to check whether the best available parameter estimators are indeed the optimal estimators. The curve-fitting results using the Levenberg–Marquardt method with six different initial guesses are shown in Table 2. The results in Table 2 indicate that the estimators of parameters θ1 and θ2 derived through the Levenberg–Marquardt method are sensitive to the initial guess. The Levenberg–Marquardt iterative algorithm converges to the optimal parameter estimators (i.e., θ1 = 212.7 and θ2 = 0.06412) only when the initial guesses of the parameters are close to their optimal estimators. In contrast, the solution interval method is less computationally expensive than the Levenberg–Marquardt method with random initial guesses in this case study. Practitioners only need to solve the linear Eqs. (19)–(20) 60 times and take the median of the 60 solutions using the solution interval method. As stated earlier, the optimal parameter estimators can be found when the median of the parameter solutions provided by the solution interval method is used as the initial value for each parameter.
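This sensitivity can be probed with a short numerical experiment such as the sketch below; the uniform sampling ranges for the random initial guesses are assumptions (the paper does not state them), so the exact success count will differ from the three out of 1000 reported here.

```python
# Sketch of the random-initial-guess experiment (self-contained); the uniform
# sampling ranges are assumptions, so the success count will not match the
# paper's "3 out of 1000" exactly.
import numpy as np
from scipy.optimize import curve_fit

x = np.array([0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, 1.10, 1.10])
y = np.array([47, 76, 97, 107, 123, 139, 152, 159, 191, 201, 200, 207], dtype=float)

def mm(x, t1, t2):
    return t1 * x / (t2 + x)                                  # Michaelis-Menten model

rng = np.random.default_rng(0)
hits = 0
for _ in range(1000):
    p0 = [rng.uniform(0.0, 500.0), rng.uniform(0.0, 1.0)]     # assumed ranges
    try:
        th, _ = curve_fit(mm, x, y, p0=p0, method="lm", maxfev=5000)
    except RuntimeError:
        continue                                              # non-convergence
    if abs(th[0] - 212.7) < 0.5 and abs(th[1] - 0.06412) < 5e-4:
        hits += 1
print(hits, "of 1000 random starts reached the optimal estimators")
```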
No. | Initial guess θ1 | Initial guess θ2 | Solution θ1 | Solution θ2 | RSS | R2
---|---|---|---|---|---|---
1 | 1.0 | 1.0 | −3.884 × 10⁸ | −1.583 × 10⁶ | 80272 | −17.75
2 | 1.0 | 0.1 | 25.98 | −0.4894 | 181002 | −41.30 |
3 | 10 | 0.1 | 68.16 | −0.03782 | 130086 | −29.40 |
4 | 10 | 1.0 | 68.15 | −0.03782 | 130086 | −29.40 |
5 | 100 | 0.1 | 212.7 | 0.06412 | 1195 | 0.7206 |
6 | 213.7 | 0.06693 | 212.7 | 0.06412 | 1195 | 0.7206 |
5.2 Fitting Fick’s Second Law to Experimental Data.
The left bound of the solution interval could be truncated at D = 0 because mass diffusivity is always non-negative. Exhaustive search with four significant digits in this truncated solution interval (i.e., 0 ≤ D ≤ 5.947 × 10−7) yields the optimal least-squares estimator for the mass diffusivity as D = 6.414 × 10−8 mm2/s, which is the same result as exhaustive search in the wide range [0, 1] (i.e., 0 ≤ D ≤ 1), as shown in Fig. 8. The coefficient of determination, R2, for this result is 0.9555. The residual sum of squares, RSS, is 1.511 × 10−4.
It is expensive to perform an exhaustive search in the solution interval in this case study because Eq. (37) involves an infinite series. The median of the 43 solutions, Dmed = 5.662 × 10−8 mm2/s, could be supplied as the initial value of the Levenberg–Marquardt method, and the same optimal mass diffusivity estimator (i.e., D = 6.414 × 10−8 mm2/s) is derived by the Levenberg–Marquardt method. As in the case study using the Michaelis–Menten model in Sec. 5.1, the median is close to the optimal estimator for the mass diffusivity, D, derived by exhaustive search in this case study.
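Because Eq. (37) is not reproduced in this section, the sketch below uses the commonly cited one-dimensional series solution of Fick's second law for moisture uptake, with the series index starting at j = 0 as noted in the footnote; the specimen thickness, the time array, and the series truncation length are assumed inputs, and Dmed from the text serves as the starting value in the commented usage lines.

```python
# Sketch of a truncated-series moisture-uptake model (an assumed stand-in for Eq. (37)):
# M(t) = Mm * [1 - (8/pi^2) * sum_{j=0}^inf exp(-(2j+1)^2 pi^2 D t / s^2) / (2j+1)^2],
# with s the specimen thickness. Truncating the series makes repeated evaluation cheap.
import numpy as np
from scipy.optimize import least_squares

def fickian_uptake(t, D, Mm, thickness, n_terms=50):
    j = np.arange(n_terms)[:, None]              # series index j = 0, 1, ..., n_terms-1
    series = np.sum(np.exp(-(2 * j + 1) ** 2 * np.pi ** 2 * D * t / thickness ** 2)
                    / (2 * j + 1) ** 2, axis=0)
    return Mm * (1.0 - 8.0 / np.pi ** 2 * series)

# Assumed usage (t, M, thickness, Mm are the experimental inputs; 5.662e-8 is
# the median D_med reported in the text):
# fit = least_squares(lambda D: M - fickian_uptake(t, D[0], Mm, thickness),
#                     x0=[5.662e-8], method="lm")
```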
The stochastic algorithm shown in Fig. 1 could be employed to find the optimal estimator for the mass diffusivity in Eq. (37) more efficiently because the stochastic algorithm usually does not require solving Eq. (37) numerically 57 times. The Levenberg–Marquardt method is used as the iterative least-squares local method in the stochastic algorithm. In each iteration step, a new combination is chosen without replacement and plugged into Eq. (37) to solve D if Eq. (37) has no solution for the previous combination. To test the convergence of the stochastic algorithm, the stochastic algorithm shown in Fig. 1 has been performed 100 times, and the results show that the stochastic algorithm always converges to the optimal mass diffusivity estimator (i.e., D = 6.414 × 10−8 mm2/s). Specifically, the algorithm converges within one iteration step (i.e., using three solutions) 86 times, converges within two iteration steps (i.e., using five solutions) 98 times, and converges within five iteration steps (i.e., using 11 solutions) 100 times. The solution interval derived by five iteration steps using 11 solutions covers 51.36% of the full solution interval using all 43 solutions (i.e., −1.661 × 10−7 ≤ D ≤ 5.947 × 10−7) on average in this case study.
No. | Linear portion of the curve | Slope of the linear portion | Mass diffusivity (mm2/s) | RSS | R2 |
---|---|---|---|---|---|
1 | First 6 data points | 0.1298 | 1.655 × 10−7 | 6.596 × 10−4 | 0.8056 |
2 | First 7 data points | 0.1237 | 1.503 × 10−7 | 5.764 × 10−4 | 0.8301 |
3 | First 8 data points | 0.1168 | 1.340 × 10−7 | 4.837 × 10−4 | 0.8575 |
4 | First 9 data points | 0.1116 | 1.224 × 10−7 | 4.158 × 10−4 | 0.8775 |
5 | First 10 data points | 0.1081 | 1.148 × 10−7 | 3.717 × 10−4 | 0.8905 |
6 | First 11 data points | 0.1044 | 1.071 × 10−7 | 3.270 × 10−4 | 0.9036 |
7 | First 12 data points | 0.1000 | 9.824 × 10−8 | 2.773 × 10−4 | 0.9183 |
The effectiveness of the solution interval method can also be compared with that of extant iterative least-squares local methods. As an example, the Levenberg–Marquardt method is employed to fit Eq. (37) to the experimental data set, and 1000 random values are chosen as initial guesses for the mass diffusivity, D. The last data point is employed as the effective moisture equilibrium content, Mm [6]. The results show that none of the 1000 random initial guesses leads the Levenberg–Marquardt iterative algorithm to converge to the optimal parameter estimator (i.e., D = 6.414 × 10−8 mm2/s). The curve-fitting results using the Levenberg–Marquardt method with nine different initial guesses are shown in Table 4. The Levenberg–Marquardt iterative algorithm converges to the optimal mass diffusivity estimator (i.e., D = 6.414 × 10−8 mm2/s) only when the initial guess is close to the optimal estimator. As shown in Fig. 8, the RSS curve is flat when the value of the mass diffusivity is larger than 10−4 mm2/s, and the Levenberg–Marquardt iterative algorithm therefore cannot converge to the optimal mass diffusivity estimator if the initial guess is chosen in that flat region. In contrast, the solution interval method establishes a solution interval for the mass diffusivity, shown as the gray area in Fig. 8. The solution interval significantly reduces the search space for the optimal estimator of the mass diffusivity in this case study. As stated earlier, any of the three algorithms associated with the solution interval method (i.e., exhaustive search in the solution interval, using the median of solutions as the initial value, and the stochastic algorithm shown in Fig. 1) can find the optimal estimator of the mass diffusivity more efficiently in this case study.
No. | Initial guess D0 | Solution D | RSS | R2 |
---|---|---|---|---|
1 | 1.0 | 1.0 | 4.789 × 10−3 | −0.4114 |
2 | 0.1 | 0.1 | 4.789 × 10−3 | −0.4114 |
3 | 0.01 | 0.01 | 4.789 × 10−3 | −0.4114 |
4 | 1 × 10−3 | 1 × 10−3 | 4.789 × 10−3 | −0.4114 |
5 | 1 × 10−4 | Non-convergence | N/A | N/A |
6 | 1 × 10−5 | Non-convergence | N/A | N/A |
7 | 1 × 10−6 | Non-convergence | N/A | N/A |
8 | 1 × 10−7 | 6.414 × 10−8 | 1.511 × 10−4 | 0.9555 |
9 | 5.662 × 10−8 | 6.414 × 10−8 | 1.511 × 10−4 | 0.9555 |
6 Conclusions and Future Work
A solution interval method is introduced for least-squares parameter estimation in nonlinear models. The novelty of this method is that practitioners do not need to guess the value for each parameter in a nonlinear model in order to initialize the method. The solution interval method includes three algorithms (i.e., the blue blocks in Fig. 2) that require different levels of computational power to find the optimal parameter estimators. The method provides the empirical probability distribution of each parameter in a nonlinear model. A solution interval for each parameter in a nonlinear model is also constructed by the method. It is proved that the optimal estimator for each parameter in a nonlinear model that minimizes the squared error of the fit belongs to its corresponding solution interval under certain conditions. Four steps to implement the solution interval method in practice are outlined. The Michaelis–Menten model and Fick’s second law are fit to experimental data sets, respectively, as two case studies to illustrate these steps and validate the method.
Based on available computational power, practitioners can find the optimal least-squares parameter estimators in a specified nonlinear model efficiently using one of the three algorithms associated with the solution interval method. The solution intervals constructed by the method significantly reduce the search space for the optimal parameter estimators. The empirical probability distribution of each parameter is valuable for researchers to assess parameter uncertainty that pertains to curve fitting.
This work has limitations that provide opportunities for future research. As stated in Sec. 3, the parameter solution equations represented by Eq. (14) are derived when the chosen data points are plugged into the nonlinear model. In this paper, it is assumed that these parameter solution equations have only one solution for each parameter. Future research may develop methodologies to find the optimal parameter estimators when these equations have more than one solution for each parameter. Moreover, the solution interval method presented in this paper provides an empirical probability distribution for each parameter in the nonlinear model. Future research could explore how to utilize these probability distributions. To assess parameter uncertainty, these probability distributions may be employed as prior distributions in a Bayesian approach. These probability distributions may also be used to identify potential outliers in a data set. For example, if a data point leads to parameter solution equations that have no solution, or its solutions lie at the margins of the probability distributions of the parameters, practitioners should check whether that data point is an outlier. Practitioners may also allocate computational resources in an exhaustive search (i.e., the first algorithm in Sec. 3) based on the empirical probability distribution of each parameter in the nonlinear model when the solution interval is extremely large. For example, a finer grid may be applied in the high-density regions of the probability distribution of each parameter. In addition, the solution interval of each parameter is defined as [θjmid − Lj, θjmid + Lj] rather than [θjmin, θjmax] in this paper because it is proven in Appendix A that the optimal estimator for each parameter θj that minimizes the squared error of the fit belongs to the interval [θjmid − Lj, θjmid + Lj] when the conditions given by Eqs. (A8)–(A10) are satisfied. However, the two case studies in Sec. 5 show that the optimal estimator for each parameter belongs to [θjmin, θjmax], suggesting that it may not be necessary to extend the solution interval from [θjmin, θjmax] to [θjmid − Lj, θjmid + Lj]. Future research may relax the conditions given by Eqs. (A8)–(A10), which are stronger than convexity for the least-squares error S, and define a solution interval for the more general case or develop a procedure to derive the solution interval on a case-by-case basis. Lastly, both the number of unknown parameters in the nonlinear models and the number of data points are small (i.e., n < 3 and m < 60) in the Michaelis–Menten model and Fick’s second law case studies in Sec. 5. The solution interval method presented in this paper may be applied to high-dimensional (i.e., large n) and big-data (i.e., large m) engineering problems in future research.
Footnote
There is a typo in Ref. [47] for this equation. The summation should begin with j = 0 rather than j = 1.
Acknowledgment
This material is based upon work supported by the Defense Advanced Research Projects Agency through cooperative agreement No. N66001-17-1-4064. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsor.