While Excel is a popular tool for basic data analysis thanks to its ease of use and formulas, it is also a surprisingly powerful tool for statistical analysis – including multiple regression.
In this guide, we are going to run through an overview and steps taken to perform multiple regression in Excel.
We will cover:
- The basics of multiple regression
- How to perform multiple regression in Excel step by step
The basics of Multiple Regression analysis
For the purposes of this tutorial, a brief summary of regression analysis, or specifically multiple regression analysis is using statistics to find the relationship between two independent variables, and a dependent variable.
The actual math and statistics that goes into this is a whole article in itself, so rather than give a long overview, this article goes into depth, along with many many links to outside resources.
For the purposes of this example, we are going to use a relatively small dataset, looking at hotels. In this dataset we have three columns – hotel price, distance to the city centre, and the overall average review scores.
In this case, the review score is our dependent variable, and we are looking to see if the price of a hotel, along with its distance to the city centre has any impact on the overall review scores given (obviously not factoring in other variables around the quality of hotel).
Multiple Regression in Excel – Steps taken
Once we have our dataset, the first thing we need to do is open up the Data Analysis function. This isnt always enabled by default, so if you do not see this under the data tab, then perform the following steps:
Under file, open up options:
From here go to Add-Ins:
At the bottom select Manage Excel Add-Ins and press Go.
Next, tick the Analysis ToolPak option and press OK.
Now that we have Data Analysis enabled, select it on the far right of the Data tab of the ribbon, and then select Regression:
Now we need to select the data to use in our regression analysis. There are multiple fields on the below box that we will need to fill out:
Input Y Range is our dependant variable, so in the hotels example we will select all of the cells under the review score column.
Input X Range is our independent variables, so we need to select both of the columns we are working with here for price and distance. In this case we drag from cell B3 down to C12.
Finally, in terms of the outputs, we have a few options to work with here, most of which are optional, however for the output itself the easiest approach is normally just ticking the New Workbook option so it appears nice and clean on another page.
Once you have made your selections, click OK, and you will be taken to a new tab that looks something like this:
The most important initial value to look at here is the R Square value – in this case 0.335, meaning that only 34% of the review scores can be explained by the independent variables – so not the highest score in the world.
There are quite a few different outputs, and if you ticked some of the plot buttons in the inputs there are some nice visuals that can be generated as well. As mentioned earlier the Wiki link will provide a lot more information on how to interpret each and every statistic that is generated, but for the purposes of this guide the R Square is the initial figure to look at.
This sums up our guide on how to easily perform multiple regression analysis in Excel. When working with file sizes that aren’t too large for the software to handle (which does have its limits!) then Excel can be a far more powerful tool than you might think, eliminating the need to jump into specialised statistical software, or writing R or Python code.
For more handy guides on working with Excel, be sure to check out our Excel Tips page.