Application
Comparing Software Development Workloads
Estimating the cost of developing software in terms of workload is difficult because the size and complexity of a software system are hard to quantify. The article Analysis of Size Metrics and Effort Performance Criterion in Software Cost Estimation provides an overview of different metrics used to assess size and complexity (Malathi & Sridhar, 2012). The metrics include counts of lines of code, function point counts, and operation counts. Function point counts are often used because they can be estimated from project design specifications, before any code is written.
The dataset pointworkload.csv contains data collected from 104 programming projects at AT&T between 1986 and 1991 (Matson & Huguenard, 2005). This dataset includes the number of work hours for each project, the function point count for each project, and identifiers for the operating system, data management system, and programming language used. In this application, you will investigate whether operating system, data management system, and programming language impact the number of work hours per function point for a project.
Open the dataset pointworkload.csv in Excel. Create a new column that calculates the number of work hours per function point for each project. Save the file with this new data column.
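For readers who want to check the Excel column independently, the same calculation can be sketched in Python. This is a minimal sketch, not part of the assignment: the column names Hours and FunctionPoints and the two data rows are assumptions made up for illustration, so substitute the actual headers from pointworkload.csv.

```python
import csv
import io


def add_hours_per_fp(rows, hours_col="Hours", fp_col="FunctionPoints",
                     new_col="HoursPerFP"):
    """Return copies of the rows with a work-hours-per-function-point column.

    The column names are hypothetical; use the headers in your own file.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        new_row[new_col] = float(row[hours_col]) / float(row[fp_col])
        out.append(new_row)
    return out


# Toy stand-in for the CSV file. These values are invented for illustration
# and are NOT taken from the AT&T dataset.
sample_csv = "Hours,FunctionPoints\n1200,100\n450,30\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
result = add_hours_per_fp(rows)
print([r["HoursPerFP"] for r in result])
```

To run this against the real file, replace io.StringIO(sample_csv) with an opened file handle for pointworkload.csv.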
Next, you would want to look at the distribution of work hours per function point in a frequency diagram. Doing so in Excel requires either binning and counting the data yourself or installing the Analysis ToolPak add-in. However, even with the add-in, simply getting a histogram requires multiple steps. Excel is designed for data presentation, not for significant statistical analysis. It is capable of statistical analysis, but only with add-ins, macros, or programming. Instead of taking these steps, you will switch now to a software tool designed for statistical analysis, SPSS.
Go to the Resources section for Unit 4, and download the document IBM_SPSS_Installation_and_Registration_Instructions. This will guide you through the process of installing the statistical analysis platform SPSS, which you will use for the remainder of this assignment.
- Import the file you revised in Excel, which now includes work hours per function point, into SPSS (be sure to indicate that variable names are included at the top of your file) and take a screenshot showing your successful installation and import. Paste this screenshot into your overall document.
- In the top toolbar, select Analyze, Descriptive Statistics, Frequencies. Put the work hours per function point variable you created in the Variable(s) column. Click Charts and select Histogram. Then click Continue and OK. SPSS will now run the requested analysis. In the Output, scroll down to the histogram and copy-paste it into your overall document. Describe the distribution of the data. Does it appear to be normally distributed? What are the average and standard deviation? Are there any outliers?
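The quantities behind the Frequencies output (mean, sample standard deviation, and histogram bin counts) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name describe, the equal-width binning rule, and the sample values are assumptions, and SPSS chooses its own bin boundaries, so the counts will not necessarily match its histogram.

```python
import statistics


def describe(values, n_bins=5):
    """Return (mean, sample standard deviation, equal-width bin counts)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything falls in the first bin
        else:
            # The maximum value is placed in the last bin rather than overflowing.
            idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return mean, sd, counts


# Invented values for illustration; NOT taken from the AT&T dataset.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, sd, counts = describe(values, n_bins=4)
print(mean, round(sd, 3), counts)
```

A long tail of sparsely filled bins on one side of the counts is one informal sign of skew or outliers, which is what the histogram questions above ask you to look for.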
Now, you are ready to determine whether operating system, data management system, or language impact the work hours per function point. To do this, you will use two different statistical tools: the t-test for a difference in means between two independent samples, and the analysis of variance.
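As a point of reference for the t-test named above, the t-statistic for two independent samples can be computed directly from each group's mean, standard deviation, and size. The sketch below shows both the pooled (equal variances assumed) form and the Welch (equal variances not assumed) form, which correspond to the two rows SPSS reports. The function names and the example numbers are assumptions for illustration, not values from the dataset. The Python standard library has no t-distribution, so the p-value is left to the SPSS output or a t table.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t-statistic and degrees of freedom, assuming equal variances."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t-statistic and approximate degrees of freedom, unequal variances."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Invented group statistics for illustration; NOT taken from the dataset.
t_pooled, df_pooled = pooled_t(10.0, 2.0, 5, 8.0, 2.0, 5)
t_welch, df_welch = welch_t(10.0, 2.0, 5, 8.0, 2.0, 5)
print(round(t_pooled, 4), df_pooled, round(t_welch, 4), round(df_welch, 2))
```

When the two groups have equal sizes and equal standard deviations, as in this toy example, the two forms agree; they diverge as the group variances or sizes become more unequal.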
- There are two different operating systems used. A 0 indicates UNIX, and a 1 indicates MVS. The t-test will allow you to assess the null hypothesis that the two operating systems give the same average workload per function point. Select Analyze, Compare Means, Independent-Samples T-Test. Your test variable is work hours per function point. Your grouping variable is OS. You will need to click Define Groups and make Group 1 = 0 (UNIX) and Group 2 = 1 (MVS). With these defined, click Continue and OK to get both the group statistics and the t-test results. Use the group statistics to calculate the t-value. Show all of your work for the calculation. What is the p-value for the test? Comparing this p-value to α=0.05, draw a conclusion as to whether or not the different operating systems result in a significant difference in workload per function point.
- By examining the t-test results from the previous question, you can see that both the t-statistic and the p-value are calculated there. You will be running several tests to determine if programming language impacts workload per function point, and you should draw your data from these tables rather than calculating by hand. Go back to your Independent-Samples T-Test and change the Grouping Variable to Language. Define the groups as 1 (Cobol) and 2 (PLI). Copy the t-test results to your overall document. Repeat this process for groups 1 (Cobol) and 3 (C), groups 1 (Cobol) and 4 (Other), groups 2 (PLI) and 3 (C), groups 2 (PLI) and 4 (Other), and groups 3 (C) and 4 (Other). Copy all six t-test results to your overall document. Based on these results, draw a conclusion as to whether or not the different programming languages result in a significant difference in workload per function point. Be sure to state each null hypothesis considered and whether it is rejected or not rejected at α=0.05.
- Running six different t-tests certainly answers the question of whether or not programming language affects workload per function point, but it is relatively time-consuming to run and assess each of these results separately. Analysis of variance (ANOVA) allows this multiple-group comparison in a single test. Go to Analyze, Compare Means, One-Way ANOVA. Select work hours per function point as your dependent variable and Language as the factor, then click OK. Copy the ANOVA table to your overall document. Explain what the ANOVA table tells you and what conclusions can be drawn.
- ANOVA has the downside that it only tells you whether some group is significantly different from some other group; it does not identify those groups. You can obtain that information by adding a post hoc test to compare means. Go back to the One-Way ANOVA and click on Post Hoc. You will see numerous options. These are all different methods for comparing the groups, and each approaches the comparison differently. You will use the Tukey comparison here. Select Tukey, then click Continue and OK. You will see both a multiple comparisons table and a table of homogeneous subsets. From this data you should be able to conclude that there is a significant difference between 1 (Cobol) and 2 (PLI). Copy these tables to your overall document and explain how that conclusion may be drawn. How does this compare to your t-test conclusions?
- Use t-tests and/or ANOVA to determine the impact of database management system on workload per function point. The values are 1 (IDMS), 2 (IMS), 3 (INFORMIX), 4 (INGRESS), and 5 (Other). You should present your data, draw conclusions, and explain those conclusions.
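To help in reading the One-Way ANOVA table requested in the steps above, the sketch below computes its main entries (sums of squares, degrees of freedom, mean squares, and the F ratio) from raw group data. It is a minimal illustration under stated assumptions: the function name one_way_anova and the three small groups are invented, and the significance value is left to SPSS because the Python standard library has no F distribution.

```python
def one_way_anova(groups):
    """Return the main entries of a one-way ANOVA table as a dict."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups: spread of each value around its own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between,
        "ss_within": ss_within, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Invented groups for illustration; NOT taken from the AT&T dataset.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table)
```

A large F ratio means the variation between group means is large relative to the variation within groups, which is the evidence ANOVA uses against the null hypothesis that all group means are equal.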
Malathi, S. & Sridhar, S. (2012). Analysis of size metrics and effort performance criterion in software cost estimation. Indian Journal of Computer Science and Engineering, 3(1), pp. 24-31. Retrieved from https://www.ijcse.com/docs/INDJCSE12-03-01-101.pdf
Matson, J. E. & Huguenard, B. R. (2005). Evaluating aptness of a regression model. Journal of Statistics Education Data Archive. Retrieved from https://www.amstat.org/publications/jse/jse_data_archive.htm