Data Fitting Using Excel.

The Sound Velocity Experiment as an example.

When students use Excel to draw a trendline to their data, they often are confused by how one can evaluate the quality of that fit and how one can introduce a different function for the fit. This page is to help you overcome the shortcomings. I will describe the procedure using data from one of the experiments: the sound velocity measurements.

Part 1: Fitting

Step 1: Enter the data. Spread sheet is shown below.

It is always helpful to label the data, so that you can keep track of your manipulations with minimal confusion. In the case of our example, there is a set of data for the time delay in the sound front arrival as a function of the distance between the microphone and the speaker:

Step 2Defining the model/formula for fitting. Since we are fitting to the linear equation:

distance = a + velocity*time or y = a + b*x                  (1)

our formula would have only two parameters, a and b, that will be used to "fit" the data using the Solver in Excel. We create a line where these parameters are listed. I used cells "D2" and "E2" to put the values for a and b, respectively.

Step 3Use your model with the first guess for parameter values (1 and 1 on the spreadsheet below) to compute a distance value at the time values given in the initial data. Create a chart that displays both the experimental and the model values of distance. Then you can start developing a good initial guess for parameter values used in the Solver.

Look at the screen snapshot below for an example. Notice that no number values were used in developing this formula.  Instead, the cell address was used.  Cell “A4” holds the time value used in this model to compute the model value for distance at time x = 0.00334 (s).  The other cell addresses contain the parameter values that help define the model.  The “$” sign in front of the column and row identifiers “fix” the cell reference to that specific location during copying and pasting the formula in a different location on the spread sheet.  (For example, if it were copied into the cell directly below, where the formula is typed in, the fixed cell values (with "$") will copy without change.  The “A4” cell, not fixed, will copy and change to reference cell “A5”, one row below.  Spatial relationships are maintained from one row to the next.

Once the model is defined and computes the distance value for initial time value, copy and paste the formula into the remaining cells of the column. It should calculate appropriate 'model' values correctly.

Step 4Compute the error between the model values and the data (true) values

The formula in the cell "D4" for error is entered as: " = C4 – B4". Copy and paste it to remaining cells

Step 5Compute the square of the error value

The formula for error squared in cell "E4" was typed in as:  "= D4^2"   This formula was copied into the remaining cells of that column.

Step 6Create a graph showing both data and model values.

You can use this to adjust model parameters to develop a good initial guess for the solver.

Step 7: Compute the sum of the squared error terms.

In the example, I placed in the "E21" cell as "=SUM(E4:E17)" and titled "sum Err sq" in "E20".

Step 8Use the Solver to minimize the sum of the squared error terms by varying the values of the model parameters. The Solver command is listed under Tools on the menu bar.  If the Solver is not already included in your Tools list, then you would have to Add it in.

When you open the Solver Tool, you get a box titled “Solver Parameters”:
- The first field must reference the value that you are trying to optimize – in this case the sum of the squared error terms, cell "$E$21" (if you set up your sheet like mine).
- The second field is a bubble entry for what you want to do – in this case "Min" - for MINIMIZE
- The third field identifies the cells that you want to adjust to minimize your target cell – these are the model parameters or cell $D$2 and $E$2.

Then, hit the Solve button - optimized fitting will appear.

You can use similar approach to introduce any other fitting function with more than two parameters that were used here.

Part 2: Evaluating Quality of Your Fit

As we saw above, upon minimization of the sum of errors squared, its value did not get to be zero. If you use the built in trend line option in Excel graphics, it characterizes the quality of the trendline by R2 value. That value does not tell you how accurate the slope or the intercept are. It is more appropriate to introduce the standard deviation for each of them, similar to the analysis of standard deviation in measurements for a single value. For details of general treatment I refer you to GNS and will give here only the treatment for the example above of a linear function.

In this case, the standard deviation of the slope, sb, and the standard deviation of the intercept, sa, can be calculated using standard deviation of the fit, s, and D, as shown in the equations below:

To implement these features, following additional steps are introduced.

Step 9Compute the square of the x value, Sum_x2, and the sum of x, Sumx:

The formula in cell F4 was typed in as:  "= A4^2" and copied into the remaining cells. Then in F21 cell Sum_x2 was calculated as "=SUM(F4:F17)". Similarly, in A21 cell Sumx was calculated as "=SUM(A4:A17)"

Step 10Compute D, and the standard deviations s, sa, and sb:

In cell C23 delta (D) was calculated as:  = $B$21*$F$21-$A$21^2. Standard deviations were calculated using formulas shown in red next to them.

The error for each parameter is obtained by multiplying its standard deviation by the student coefficient for this number of degrees of freedom (14-2=12), t ~ 2.2 in $C$27.

Thus, for the example given, the sound velocity is measured : v = 320.1 ± 1.0 m/s

On the screenshot I also included the trendline with its equation that Excel generates.