SIT718 Real World Analytics - analyse datasets by


The assignment assesses the ability to:

1: Apply the concepts of multivariate functions to summarise datasets.
2: Analyse datasets by interpreting model and function parameters of important families of multivariate functions.
3: Transform a real-life problem into a mathematical model.
4: Apply linear programming concepts to make optimal decisions.
5: Obtain optimal solutions for quantities that are either continuous or discrete.

Part A: Analysis of Journal Ranking Dataset

Description:

The dataset for this assignment is a modified version of the Academic Journal Ranking Dataset. It comprises 13 features (variables) and a variable of interest.

The numeric variables are:
Journal ID (numbers 1-161 assigned to journal names)
Number of citations
Impact Factor (IF)
5-year IF
ii
Articles
Half-life
Eigenfactor
Art-i
Number
Y: Rank ID (ranks from 1 to 4, where 4 is the highest rank and 1 the lowest rank)

The categoric variables are:
ISSN
Rank (the ranks are C, B, A, A*, where A* is the highest rank (corresponds to 4) and C the lowest rank (corresponds to 1))

Tasks:

1. Understand the data
(i) Download the txt file from CloudDeakin and save it to your R working directory. You can also use the Excel file provided for further information about the data.

(ii) Assign the data to a matrix, e.g. using

(iii) Your variable of interest is Y (Rank ID). Generate a subset of 100 data points, e.g. using:

(iv) Using scatterplots and histograms, report on the general relationship between each of the variables and your variable of interest Y. Include a plot and 1 or 2 sentences for each of the variables, including the variable Y.

Note: If you need to transform your data into numeric values, use the following command in R:

After this, you'll need to reconstruct your matrix with the same dimensions as the original matrix, using for example the command in R,

2. Transform the data
(i) Choose any four of the variables, for example Journal ID, Number of citations, Impact Factor, and Half-life, or another combination that you like.

Make appropriate transformations to the variables (including Y) so that the values can be aggregated in order to predict the variable of interest. Assign your transformed data along with your transformed variable of interest to an array (it should be 100 rows and 5 columns). Save it to a txt file titled "name-transformed.txt" using

(ii) Briefly explain the general relationship between each of your transformed variables and the variable of interest (1- 2 sentences each).
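One common way to make variables comparable before aggregation is linear (min-max) scaling to the unit interval, flipping the direction for variables that move against Y. The sketch below is only an illustration in Python rather than the R workflow the assignment uses; the function name `min_max_scale` and the sample values are hypothetical, and other transformations (logs, powers, ranks) may suit particular variables better.

```python
def min_max_scale(values, invert=False):
    """Scale values linearly onto [0, 1].

    With invert=True the direction is flipped, so that large raw values
    map to small scaled values (useful when a variable decreases as Y increases).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


# Made-up illustrative values, not taken from the assignment dataset.
citations = [116, 2500, 40, 980]
rank_id = [1, 4, 1, 3]

x = min_max_scale(citations)
y = min_max_scale(rank_id)
print(x)
print(y)
```

After scaling, every column lies in [0, 1], which is the range that averaging-type aggregation functions are usually fitted on.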

3. Build models and investigate the importance of each variable.

(i) Download the AggWaFit R file (from CloudDeakin) to your working directory and load into the R workspace using,

(ii) Use the fitting functions to learn the parameters for

- Weighted arithmetic mean (WAM),
- Weighted power means (PM) with p = 0.5, and p = 2,
- Ordered weighted averaging function (OWA), and
- Choquet integral.
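For orientation, the four families listed above can be written down in a few lines each. This is a minimal Python sketch of the textbook definitions, not the AggWaFit implementation; the function names are hypothetical, inputs are assumed to lie in [0, 1], and weights are assumed non-negative and summing to 1. The OWA here sorts inputs in descending order (Yager's convention); some libraries sort ascending, which reverses the meaning of the weight vector.

```python
def wam(x, w):
    """Weighted arithmetic mean: sum of w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))


def power_mean(x, w, p):
    """Weighted power mean for p != 0: (sum of w_i * x_i**p) ** (1/p)."""
    return sum(wi * xi ** p for wi, xi in zip(w, x)) ** (1.0 / p)


def owa(x, w):
    """Ordered weighted average: weights apply to inputs sorted descending."""
    return sum(wi * xi for wi, xi in zip(w, sorted(x, reverse=True)))


def choquet(x, mu):
    """Discrete Choquet integral.

    mu maps each frozenset of input indices to its fuzzy-measure value,
    with the measure of the full index set equal to 1.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])  # ascending inputs
    total, prev = 0.0, 0.0
    for k, i in enumerate(order):
        coalition = frozenset(order[k:])  # indices whose input is >= x[i]
        total += (x[i] - prev) * mu[coalition]
        prev = x[i]
    return total


x = [0.2, 0.8]
w = [0.5, 0.5]
mu = {frozenset({0}): 0.3, frozenset({1}): 0.5, frozenset({0, 1}): 1.0}
print(wam(x, w), power_mean(x, w, 0.5), power_mean(x, w, 2), owa(x, [1, 0]), choquet(x, mu))
```

With equal weights the power mean with p = 2 lies above the arithmetic mean and the one with p = 0.5 lies below it, which is one way to read whether a fitted model favours higher or lower inputs.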

(iii) Include two tables in your report - one with the error measures, and one summarising the weights/parameters that were learned for your data.
(iv) Compare and interpret the data in your tables (1-3 paragraphs). Be sure to comment on:
(a) How good the model is,
(b) The importance of each of the variables,
(c) Any interaction between any of the variables (are they complementary or redundant?), and
(d) Whether better models favour higher or lower inputs.
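When judging how good a model is, two error measures that are commonly compared are the root mean squared error and the average absolute error between observed and predicted Y. The sketch below shows how they are computed; the function names and sample numbers are hypothetical, and the fitting software's output files may report additional measures such as correlations.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_abs_error(observed, predicted):
    """Average absolute error between two equal-length sequences."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


# Made-up values on a [0, 1] scale, for illustration only.
observed = [0.0, 0.5, 1.0]
predicted = [0.0, 0.5, 0.5]
print(rmse(observed, predicted), mean_abs_error(observed, predicted))
```

Lower values of either measure indicate a closer fit; RMSE penalises large individual errors more heavily than the average absolute error does.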

4. Use your model for prediction.
(i) Using your best fitting model, predict the Rank with the following values:

Journal ID=110, Number of citations=116, IF=0.694, 5 yrs IF=0.762, ii=0.132, Articles=38, Half-life=5, Art-i=0.00175, Eigenfactor=0.894, Number=0.7.

Give your result and comment on whether you think it is reasonable. (1-2 sentences)
(ii) Comment generally on the ideal conditions (in terms of your 4 variables) under which a high Rank will occur. (1-2 sentences)

For this part, your submission, which can be submitted to the SIT718 CloudDeakin Dropbox, should include two files.

1. A report (created in any word processor), covering all of the items above. Including plots and tables, it should be only 2-5 pages.
2. A data file named "name-transformed.txt" (where "name" is replaced with your name; you can use your surname or first name, just to help me distinguish them!).

Part B: Optimisation

A garment factory produces T-shirts and shorts for Coles supermarket stores in Victoria. Coles will accept all the production supplied by the garment factory. The production process includes cutting, sewing and packaging. The factory employs 22 workers in the cutting department, 50 workers in the sewing department and 12 workers in the packaging department. The factory works 8 hours a day (these are productive hours). There is a daily demand for at least 100 T-shirts. The table below gives the time requirements (in minutes) and profit per unit for the two garments.

 

            Minutes per unit
            Cutting   Sewing   Packaging   Unit profit ($)
T-shirts    20        20       10          6
Shorts      10        50       10          10

a) Formulate a Linear Programming model to help the CEO of the factory determine the optimal daily production schedule.

b) Use the graphical method to find the optimum solution. Show the feasible region and the optimal solution on the graph. Annotate your graph. What is the optimum profit?

c) Find the range over which the unit profit ($) of a T-shirt can change without affecting the optimum solution obtained above.
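The graphical method in part (b) amounts to evaluating the objective at the corner points of the feasible region. A minimal Python sketch of that idea follows. It assumes each department's daily capacity is workers x 8 hours x 60 minutes, treats the quantities as continuous, and uses hypothetical names (`constraints`, `solve`); it is a cross-check for a hand-drawn graph, not a substitute for it.

```python
from itertools import combinations

MINUTES_PER_DAY = 8 * 60  # productive minutes per worker per day

# Each constraint is (a, b, c), meaning a*x + b*y <= c,
# where x = T-shirts per day and y = shorts per day.
constraints = [
    (20, 10, 22 * MINUTES_PER_DAY),  # cutting capacity
    (20, 50, 50 * MINUTES_PER_DAY),  # sewing capacity
    (10, 10, 12 * MINUTES_PER_DAY),  # packaging capacity
    (-1, 0, -100),                   # at least 100 T-shirts (x >= 100)
    (0, -1, 0),                      # y >= 0
]


def intersect(c1, c2):
    """Intersection of the two boundary lines, or None if they are parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


def feasible(point, tol=1e-9):
    """True if the point satisfies every constraint (within a tolerance)."""
    return all(a * point[0] + b * point[1] <= c + tol for a, b, c in constraints)


def solve(profit):
    """Return (best_vertex, best_profit) for unit profits (T-shirt, shorts)."""
    vertices = [
        p
        for c1, c2 in combinations(constraints, 2)
        if (p := intersect(c1, c2)) is not None and feasible(p)
    ]
    best = max(vertices, key=lambda p: profit[0] * p[0] + profit[1] * p[1])
    return best, profit[0] * best[0] + profit[1] * best[1]


print(solve((6, 10)))
```

For part (c), calling `solve` again with a different T-shirt profit, for example `solve((5, 10))`, shows whether the optimal corner point stays the same or moves to a neighbouring vertex.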

A food factory makes three types of cereal, A, B, and C, from a mix of ingredients: oats, raisins, coconut and almonds. The cereals are produced in 2 kg boxes. The following table gives the sales price per box of each cereal and the production cost per ton (1000 kg) of cereal.

 

            Sales price per box   Production cost per ton
Cereal A    $2.50                 $4.00
Cereal B    $2.00                 $2.80
Cereal C    $3.50                 $3.00

The following table provides the purchase price per ton of ingredients and the maximum availability of the ingredients in tons respectively.

Ingredients   Purchase price per ton   Maximum availability in tons
Oats          $100                     10
Raisins       $80                      5
Coconut       $120                     2
Almonds       $200                     2

The minimum daily demand (in boxes) for each cereal and the proportion of oats, raisins, coconut and almonds in each cereal are detailed in the following table.

 

 

            Minimum demand (boxes)   Proportion of
                                     Oats   Raisins   Coconut   Almonds
Cereal A    1000                     0.8    0.1       0.05      0.05
Cereal B    700                      0.65   0.2       0.05      0.1
Cereal C    750                      0.5    0.1       0.1       0.3

a) Let xij ≥ 0 be a decision variable that denotes the kg of ingredient i ∈ {Oats, Raisins, Coconut, Almonds} used to produce Cereal j ∈ {A, B, C} (in boxes). Formulate a linear programming (LP) model to determine the optimal production mix of cereals and the associated amounts of ingredients that maximises the profit, while satisfying the constraints.

b) Find the optimal solution using the IBM CPLEX software.
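One possible shape for the model in part (a) is sketched below in LaTeX. It rests on unit assumptions that the brief does not spell out: all quantities are converted to kg, a box holds 2 kg, and costs quoted per ton are divided by 1000. The symbols s_j (sales price per box), c_j (production cost per ton), p_i (purchase price per ton), A_i (availability in tons), d_j (minimum demand in boxes) and r_ij (proportion of ingredient i in cereal j) are introduced here for illustration only.

```latex
\begin{align*}
\max \quad & \sum_{j} \left( \frac{s_j}{2} - \frac{c_j}{1000} \right) \sum_{i} x_{ij}
             \;-\; \sum_{i} \frac{p_i}{1000} \sum_{j} x_{ij} \\
\text{s.t.} \quad
  & \sum_{j} x_{ij} \le 1000\, A_i          && \forall i \quad \text{(ingredient availability, kg)} \\
  & \sum_{i} x_{ij} \ge 2\, d_j             && \forall j \quad \text{(minimum demand, kg)} \\
  & x_{ij} = r_{ij} \sum_{k} x_{kj}         && \forall i, j \quad \text{(recipe proportions)} \\
  & x_{ij} \ge 0                            && \forall i, j
\end{align*}
```

The first term is revenue net of production cost per kg of each cereal, and the second is the purchase cost of the ingredients; the number of boxes of cereal j is then the total kg of cereal j divided by 2.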

Attachment:- Data.zip
