Data Collection

The datasets used in this research are collected from two previous studies: the Leonard model (1988) [6] and Moselhi et al. (2005) [8].
Combining these two datasets yields 123 data points, which can be considered sufficient in the domain of quantifying loss of productivity. Table 1 shows the distribution of the combined dataset.

Table 1: Distribution of Combined Dataset

Type of Projects  Number of COs  Value of Original Contract  Value of Change Orders  Original Estimated Hrs.  Actual Hrs.  COs Hrs.
Electrical        37             $ 91,984,837                $ 42,530,607            1395330                  2324107      447425
Mechanical        54             $ 168,183,744               $ 15,518,911            1815085                  2878130      427145
Architectural     5              $ 6,410,000                 $ 914,273               95280                    128787       17116
Mech./Elec.       5              $ 30,552,000                $ 6,452,000             883430                   1190742      143650
Civil             22             $ 42,538,755                $ 9,323,214             691136                   1161878      190958
Grand Total       123            $ 339,669,337               $ 74,739,006            4880263                  7683645      1226294

Research Methodology

The developed nonlinear regression model proceeds in several steps. The first step is data preprocessing and enhancement; the refined data are then fed into the developed nonlinear regression model. The last step is to compare the results generated by the developed model with those of other existing models on a case study. Figure 4 shows the general overview of the developed model.
Figure 5: General Overview of Developed Model

Data Preprocessing and Enhancement

The combined dataset has 14 unique parameters of diverse types and scales, namely type of impact, type of work, original duration, actual duration, extended duration, original estimated hours, earned hours, actual hours, number of change orders, frequency, change hours, schedule performance index, average size, and % of change orders. The values associated with these parameters are not comparable because they are not aligned. The process of aligning the dataset therefore starts by reordering the large values in the dataset, such as actual and original estimated hours. The pseudocode for the aligning process is as follows.

Table 2: Pseudocode for Aligning the Given Dataset

    input = dataset;
    int ratio = 100;
    float aspect_ratio = 1.25;
    m, n = input.size();
    for (int i=0; i …

… 25. This value is obtained by grid search and depends on the given input dataset; if the input changes, this value should be updated as well.

As a second step in enhancing the dataset, an augmented apriori-like algorithm is used to maximize the margin around the features, especially those whose values are very close to each other.
This algorithm first finds the local and global extremum values used for scaling the records up. It then assumes that arrows are drawn from the origin to the records with respect to these extrema. For the arrows, the extrema act as knots that bias them nonlinearly. In other words, the values are mapped to another space in which all the records are represented by arrows and knots.
Finding the maximum margins between these arrows then becomes an easier task, carried out by solving the Jacobian matrix. After all these computations, some hanger values are generated whose tensor product with the original records maximizes their Cartesian distance, which in turn helps the regression algorithm tune its parameters. Specifically, for records with percentage values, such as the extended duration feature, the corresponding centroid is computed over all the values along that feature. To this end, a Gaussian distribution approximation is used to find the best statistical expectation, ideally set to zero, together with a suitable standard deviation (SD). Finally, a mean value of 0.25 and an SD value of 1.24 are reached. If each row of the dataset is assumed to be a 14-D vector in a non-Cartesian space, then its basis vectors can be found using an algebraic factorization such as Cholesky. The rank of this factorization gives the degree of the nonlinear six-degree-of-freedom (6-DOF) problem to be solved by the Jacobian.
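The Gaussian approximation of a percentage-valued feature described above can be illustrated with a minimal sketch. It assumes that the feature is available as a plain list of numbers and that "set to zero" means centering the feature on its estimated mean; the function names `fit_gaussian` and `center_and_scale` and the sample values are hypothetical and are not taken from the combined dataset.

```python
import math

def fit_gaussian(values):
    """Return the maximum-likelihood mean and standard deviation of `values`."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance
    return mean, math.sqrt(variance)

def center_and_scale(values):
    """Shift the feature to zero mean and scale it to unit standard deviation."""
    mean, sd = fit_gaussian(values)
    return [(v - mean) / sd for v in values]

# Hypothetical percentage values for one feature (illustration only).
extended_duration = [0.10, 0.20, 0.40, 0.15, 0.35]
mean, sd = fit_gaussian(extended_duration)
scaled = center_and_scale(extended_duration)
```

After this transformation the feature has zero mean and unit SD, so features measured on different scales become directly comparable.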
Finally, each row that is not consistent can be replaced by the mean of the rows, using the approximate 6-DOF polynomial. For the current dataset, this technique is applied to tuples 64, 57, 87, and 110.

Nonlinear Regression

There are several ways of finding a polynomial curve that represents the data as smoothly as possible.
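The simplest such curve is a degree-1 polynomial, that is, a straight line fitted by ordinary least squares. The sketch below is an illustration under stated assumptions rather than the implementation used in this study: it handles a single input feature, the names `fit_line` and `rmse` are hypothetical, and the data are made up. It also shows how a root-mean-square error (RMSE) can be computed to judge the quality of a fit.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(ys, predictions):
    """Root-mean-square error between observed and predicted values."""
    n = len(ys)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / n)

# Hypothetical curved data (illustration only, not from the combined dataset).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]
slope, intercept = fit_line(xs, ys)
line_error = rmse(ys, [slope * x + intercept for x in xs])
```

Because the sample data are curved, the straight line leaves a visible residual error, which is the kind of shortfall that motivates a nonlinear model.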
Although linear regression is a fast and accurate method for balanced and normalized datasets such as the one created in the previous section, its performance varies from dataset to dataset. The following simple rule (Equation 1) is applied to the processed dataset; it expresses the hypothesized line that we would like to obtain as a function of the given input. Based on the results achieved, the RMSE associated with this algorithm was about 21.32%, which is quite high. The next step after linear regression was its nonlinear counterpart. A common approach to nonlinear regression is to approximate it by a piecewise linear function. In other words, since the fitted function in nonlinear regression is no longer a line, the nonlinearity is implemented by several linear functions. In our implementation, this approach results in an RMSE value of 17.34%.

The nonlinear regression can instead be articulated by composing the overall nonlinearity from a set of simpler nonlinear functions. First, the dataset is split into seven 2×2 patches (this covers the total number of given features in the dataset), and a nonlinear sigmoid-like function is assigned to each of them. Formula (2) depicts this nonlinear function.
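The piecewise-linear approximation discussed in this section can be sketched as follows. This is a minimal illustration, not the implementation used in this study: it assumes a single input feature and one breakpoint chosen by hand, the segments are fitted independently (so continuity at the breakpoint is not enforced), and the names `fit_line`, `fit_piecewise`, `predict_piecewise`, and `rmse` as well as the data are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_piecewise(xs, ys, breakpoint):
    """Fit one least-squares line on each side of `breakpoint`."""
    left = [(x, y) for x, y in zip(xs, ys) if x <= breakpoint]
    right = [(x, y) for x, y in zip(xs, ys) if x > breakpoint]
    return fit_line(*zip(*left)), fit_line(*zip(*right))

def predict_piecewise(x, breakpoint, models):
    """Evaluate the segment that covers `x`."""
    slope, intercept = models[0] if x <= breakpoint else models[1]
    return slope * x + intercept

def rmse(ys, predictions):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))

# Hypothetical curved data (illustration only, not from the combined dataset).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
line_error = rmse(ys, [slope * x + intercept for x in xs])

models = fit_piecewise(xs, ys, breakpoint=2.0)
piecewise_error = rmse(ys, [predict_piecewise(x, 2.0, models) for x in xs])
```

On this toy data the two-segment fit has a lower RMSE than the single line, which mirrors the direction of the improvement reported above, although the numbers themselves are unrelated to the study's results.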