I’ve worked on this project quite a bit but just don’t understand it. I’m willing to negotiate compensation for doing this assignment and I would really appreciate it. It is a regression analysis using several variables and the relevant files are attached below.

OPTION A: Gender and Racial differences in payments in federal jobs
Mr. Cartwright is the director of the Office of Personnel Management (OPM).
One employee recently complained to him that even in Federal jobs there exist
unexplained salary differences between male and female employees and employees
of different racial backgrounds. Mr. Cartwright is of the opinion that educational and
other "quality" measures are the reasons for such differences and there is no gender or
race bias in the federal salaries. He has asked you, the brilliant analysts from MGS
3100, to create a regression model to prove or disprove his claim. He has supplied
you with the dataset salary_federal.xls which contains the salary and other
information about some randomly sampled federal employees whose names are
withheld for reasons of anonymity and to protect their privacy.
Your task is to create a regression model that contains at least FIVE (more would
be better, but five is the minimum necessary) explanatory variables that
explain/predict the salary of these employees. From this regression model you have to
conclude whether you have found any evidence to support the claim of gender and
racial discrimination in federal payment structure or not. Description of the dataset
is provided at the end of this document.
What you need to do in your project as part of the data analysis (whether you
choose Option A or Option B)
1. Show descriptive statistics of all the variables. [to get some feel for the
data]. Descriptive statistics are average or mean, standard deviation,
maximum, minimum, etc.
2. Show relationships of each independent variable individually with the
dependent variable using scatter plots. Remember to correctly label
the horizontal and vertical axes in each diagram. [this is to get some
initial feeling about which variables are more related to the dependent
variable, and look for possible outliers or influence points] (In Excel:
Insert → Charts → Scatter)
3. Perform regression analysis to show the overall model for predicting the
value of the dependent variable. (You might need to activate the
Excel data analysis add-in first; once that is done, regression can be run by
Data → Analysis → Regression.) But
as we discussed in class, the first model that you estimate will most
likely not be your final model. You will need to drop (sometimes add as
well) bad variables from the model and reestimate the model.
This is a crucial step. Do not ask me how many times you should
reestimate or how many models there should be; there is no answer to
that. What I want to see is an indication that you have
understood how regression analysis is done through trial
and error, including and dropping variables. So start with a
handful (many more than 5) of variables that can be expected to affect
your dependent variable. Then drop the ones that do not seem good
after you do your initial regression or regressions. This is an iterative
process and will take both time and patience. We can talk more about
this step during office hours.
4. Interpret the results and write a report using those interpretations
that conforms to the sample memo format.
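The drop-and-reestimate loop described in step 3 can be sketched in plain Python. This is only an illustration under stated assumptions, not the course's required method (the assignment expects Excel output). The column names, the synthetic data, and the gap built into the `female` dummy are all hypothetical and say nothing about salary_federal.xls. The cutoff |t| >= 2 is a rough stand-in for 5% significance; Excel's regression output reports p-values directly.

```python
import random

def fit_ols(X, y):
    """Ordinary least squares via the normal equations.
    Returns (coefficients, t_statistics, r_squared)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [float(i == j) for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    coefs = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    y_bar = sum(y) / n
    sse = sum((yi - sum(b * x for b, x in zip(coefs, r))) ** 2
              for r, yi in zip(X, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = sse / (n - k)  # residual variance
    t_stats = [coefs[i] / (sigma2 * inv[i][i]) ** 0.5 for i in range(k)]
    return coefs, t_stats, 1 - sse / sst

def backward_eliminate(data, target, candidates, t_cut=2.0):
    """Refit repeatedly, dropping the variable with the smallest |t| each
    round, until every remaining variable has |t| >= t_cut."""
    y = [row[target] for row in data]
    kept, dropped = list(candidates), []
    while True:
        X = [[1.0] + [row[v] for v in kept] for row in data]
        coefs, t_stats, r2 = fit_ols(X, y)
        if not kept:
            break
        # Index 0 is the intercept, so variable i sits at position i + 1.
        worst = min(range(len(kept)), key=lambda i: abs(t_stats[i + 1]))
        if abs(t_stats[worst + 1]) >= t_cut:
            break
        dropped.append(kept.pop(worst))
    return kept, dropped, dict(zip(["intercept"] + kept, coefs)), r2

# Synthetic, hypothetical data: NOT the real salary_federal.xls.
random.seed(0)
data = []
for _ in range(200):
    row = {
        "education_years": random.randint(12, 20),
        "experience_years": random.randint(0, 30),
        "female": random.randint(0, 1),   # dummy variable (1 = female)
        "noise": random.gauss(0, 1),      # unrelated to salary by design
    }
    # A gap of -3000 is built in only so the example has something to detect.
    row["salary"] = (30000 + 2500 * row["education_years"]
                     + 800 * row["experience_years"]
                     - 3000 * row["female"] + random.gauss(0, 1000))
    data.append(row)

candidates = ["education_years", "experience_years", "female", "noise"]
kept, dropped, coef, r2 = backward_eliminate(data, "salary", candidates)
print("kept:", kept)
print("dropped (in order):", dropped)
print("coefficients:", {k: round(v, 1) for k, v in coef.items()})
print("R-squared:", round(r2, 3))
```

Each pass through the loop corresponds to one regression worksheet in Excel ("regression 1 (Full Model)", "regression 2", and so on), and the order in `dropped` is the order the report asks you to justify.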
Submittals:
Report – Single-spaced, one- or two-page Word document (do not embed in Excel)
that conforms to the sample memo format and contains:
Introduction should contain
1. Why you are interested in the topic: the background of your project
2. What you are trying to predict: your dependent variable.
3. Why you choose certain independent variables for your project. Why
do you think those independent or explanatory variables are going to
affect the dependent or explained variable? [Logical explanation is
required.]
4. How you collected the data (e.g., survey or from Web or from some
book) [You should clearly write the source of the data.]
Analysis [do not insert any graph or diagram]
5. Findings from descriptive statistics and scatter plot [It does not have
to be thorough; write what variables are expected to have a relationship
with the dependent variable.]
6. List of insignificant independent variables in a full model (i.e., a
model with all the independent variables)
7. Order of the dropped independent variables in the subsequent
regression analyses, with reasoning for such order
8. Equation of the final model (i.e., a model with only significant
independent variable(s))
9. Performance of the final model, i.e., how good your model is. Look at
[Low R-squared does not mean low grade.]
10. Findings from the final model, i.e., interpretation of the coefficients of
the independent variables [Make sure that all the coefficients make
sense. If one does not, explain further how such an odd coefficient can be
justified]
Excel file
Your Excel file should have raw data and outputs of all the regression analyses you
did. Each worksheet should be clearly named "raw data," "scatter plot," "regression
1 (Full Model)," "regression 2," and so on, finishing with "Final Regression". But
note that I am not saying you will run three regressions only (Full Model,
Regression 2, and final model). You may need more depending on the iterative
process that I discussed in point 3 above.
[Example] Your report should state clearly what you want to do in the project,
what you have found, and how you found it. Your report should be easy to read
and understand without referring to your Excel file.
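For items 8 to 10 of the report, the final-model equation and the reading of a dummy-variable coefficient might take a form like the one below. This is a sketch only: the variable names Education, Experience, and Female are hypothetical placeholders, since the actual columns are described in the dataset documentation, and the coefficients b_0 to b_3 would come from your own final regression output.

```latex
% Final model with only the significant variables (item 8)
\[
\widehat{\text{Salary}} = b_0 + b_1\,\text{Education} + b_2\,\text{Experience} + b_3\,\text{Female}
\]
% Interpretation of a 0/1 dummy coefficient, other variables held fixed (item 10)
\[
\widehat{\text{Salary}}\,\big|_{\text{Female}=1} - \widehat{\text{Salary}}\,\big|_{\text{Female}=0} = b_3
\]
% Model performance (item 9)
\[
R^2 = 1 - \frac{SSE}{SST}
\]
```

Under this form, a gender dummy that stays significant after controlling for the "quality" measures would count as evidence against Mr. Cartwright's claim, while a dummy that drops out as insignificant would be consistent with it.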