Reliability analysis
Call: psych::alpha(x = podaci, check.keys = TRUE)
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.73 0.79 0.75 0.32 3.7 0.0091 17 2.1 0.25
95% confidence boundaries
lower alpha upper
Feldt 0.67 0.73 0.78
Duhachek 0.71 0.73 0.75
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
KOL1 0.71 0.79 0.86 0.35 3.8 0.0104 0.038 0.27
KOL2 0.70 0.78 0.87 0.33 3.4 0.0104 0.043 0.27
TEST 0.72 0.76 0.94 0.31 3.2 0.0099 0.046 0.23
AKTIVNOST 0.72 0.80 0.89 0.36 3.9 0.0096 0.037 0.27
PROJEKT1 0.68 0.76 0.88 0.31 3.2 0.0112 0.035 0.26
PROJEKT2 0.67 0.75 0.87 0.30 2.9 0.0116 0.035 0.23
Procjena 0.73 0.78 0.96 0.34 3.6 0.0096 0.045 0.27
UKUPNO 0.65 0.68 0.68 0.23 2.1 0.0371 0.016 0.22
Item statistics
n raw.r std.r r.cor r.drop mean sd
KOL1 191 0.54 0.50 0.35 0.42 9.5 2.6
KOL2 191 0.59 0.58 0.47 0.48 10.2 2.4
TEST 191 0.59 0.65 0.58 0.54 4.8 1.2
AKTIVNOST 191 0.43 0.46 0.33 0.32 7.8 2.0
PROJEKT1 191 0.69 0.65 0.62 0.60 15.6 2.4
PROJEKT2 191 0.75 0.72 0.70 0.67 16.2 2.6
Procjena 191 0.44 0.53 0.41 0.39 2.4 1.0
UKUPNO 191 1.00 0.99 0.92 1.00 66.6 8.4
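The fitted object can also be queried directly instead of reading the printout; a minimal sketch (assuming `podaci` is the score data frame loaded in the setup chunk further below):

```r
library(psych)

# Keep the alpha object instead of only printing it
rel <- psych::alpha(podaci, check.keys = TRUE)

rel$total       # raw_alpha, std.alpha, G6(smc), average_r, S/N, ase, mean, sd
rel$alpha.drop  # the "Reliability if an item is dropped" table
rel$item.stats  # raw.r, std.r, r.cor, r.drop, mean, sd per item
```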
Predictors
Response - classification of students based on the number of points earned on Project 1
Hyperparameters - the optimal combination was chosen by 10-fold cross-validation in which 500 randomly drawn combinations of the hyperparameters mtry, min_n and trees were tried; the best model was selected by the roc_auc metric.
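A minimal tidymodels sketch of this tuning protocol (a sketch only: the data frame `trening_podaci`, the response/predictor column names and the `ranger` engine are assumptions; the actual tuned objects are loaded from `modeli/*.rds` in the setup chunk):

```r
library(tidymodels)

# mtry is held at 2 (see the model description in the source below); trees and min_n are tuned
rf_spec <- rand_forest(mtry = 2, trees = tune(), min_n = tune()) %>%
  set_engine("ranger") %>%           # assumed engine
  set_mode("classification")

rf_work <- workflow() %>%
  add_model(rf_spec) %>%
  add_formula(PROJEKT1 ~ KOL1 + KOL2 + TEST + AKTIVNOST)   # assumed column names

set.seed(2023)
rf_tuning <- tune_grid(
  rf_work,
  resamples = vfold_cv(trening_podaci, v = 10),            # 10-fold cross-validation
  grid = grid_random(trees(range = c(400, 1500)),          # 500 random combinations
                     min_n(range = c(2, 40)), size = 500),
  metrics = metric_set(roc_auc)
)

# Best combination by cross-validated ROC AUC, then the final fit
best_rf  <- select_best(rf_tuning, metric = "roc_auc")
rf_final <- finalize_workflow(rf_work, best_rf) %>% fit(trening_podaci)
```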
Predictors
Response - classification of students based on the number of points earned on Project 1
Hyperparameters - the optimal combination was chosen by 10-fold cross-validation in which 5000 combinations of the hyperparameters tree_depth, min_n and cost_complexity were tried via Latin hypercube sampling; the best model was selected by the roc_auc metric.
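A corresponding sketch for the decision tree, with the 5000 candidates drawn by Latin hypercube sampling (same assumptions about data and column names; the `rpart` engine matches the fitted tree printed further below):

```r
library(tidymodels)

tree_spec <- decision_tree(cost_complexity = tune(),
                           tree_depth = tune(),
                           min_n = tune()) %>%
  set_engine("rpart") %>%
  set_mode("classification")

tree_work <- workflow() %>%
  add_model(tree_spec) %>%
  add_formula(PROJEKT1 ~ KOL1 + KOL2 + TEST + AKTIVNOST)   # assumed column names

set.seed(2023)
tree_tuning <- tune_grid(
  tree_work,
  resamples = vfold_cv(trening_podaci, v = 10),
  grid = grid_latin_hypercube(cost_complexity(), tree_depth(), min_n(), size = 5000),
  metrics = metric_set(roc_auc)
)

select_best(tree_tuning, metric = "roc_auc")
```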
prob | KOL1 | KOL2 | TEST | AKTIVNOST | PROJEKT1 | PROJEKT2 | Procjena | UKUPNO |
---|---|---|---|---|---|---|---|---|
0% | 3.0 | 3.5 | 0.750 | 0 | 10.0 | 6.00 | 0.0 | 36.850 |
33% | 8.0 | 9.0 | 4.407 | 7 | 15.0 | 15.85 | 2.5 | 63.656 |
67% | 10.5 | 11.5 | 5.506 | 9 | 16.5 | 17.50 | 3.0 | 70.107 |
100% | 17.5 | 16.0 | 7.000 | 10 | 20.0 | 20.00 | 3.0 | 87.600 |
KOL1 | KOL2 | TEST | AKTIVNOST | PROJEKT1 | PROJEKT2 | Procjena | UKUPNO | |
---|---|---|---|---|---|---|---|---|
Min. : 3.000 | Min. : 3.50 | Min. :0.750 | Min. : 0.00 | Min. :10.00 | Min. : 6.00 | Min. :0.000 | Min. :36.85 | |
1st Qu.: 7.500 | 1st Qu.: 8.50 | 1st Qu.:4.105 | 1st Qu.: 7.00 | 1st Qu.:14.00 | 1st Qu.:15.00 | 1st Qu.:2.500 | 1st Qu.:61.65 | |
Median : 9.500 | Median :10.00 | Median :4.980 | Median : 8.00 | Median :15.50 | Median :16.50 | Median :3.000 | Median :66.55 | |
Mean : 9.516 | Mean :10.22 | Mean :4.763 | Mean : 7.77 | Mean :15.64 | Mean :16.22 | Mean :2.429 | Mean :66.57 | |
3rd Qu.:11.500 | 3rd Qu.:12.00 | 3rd Qu.:5.680 | 3rd Qu.:10.00 | 3rd Qu.:17.00 | 3rd Qu.:18.00 | 3rd Qu.:3.000 | 3rd Qu.:71.64 | |
Max. :17.500 | Max. :16.00 | Max. :7.000 | Max. :10.00 | Max. :20.00 | Max. :20.00 | Max. :3.000 | Max. :87.60 |
# A tibble: 20 × 8
trees min_n .metric .estimator mean n std_err .config
<int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
1 601 5 roc_auc hand_till 0.483 10 0.0261 Preprocessor1_Model152
2 1081 25 roc_auc hand_till 0.479 10 0.0373 Preprocessor1_Model270
3 917 31 roc_auc hand_till 0.478 10 0.0390 Preprocessor1_Model001
4 540 37 roc_auc hand_till 0.478 10 0.0403 Preprocessor1_Model433
5 506 23 roc_auc hand_till 0.478 10 0.0355 Preprocessor1_Model400
6 424 2 roc_auc hand_till 0.476 10 0.0284 Preprocessor1_Model349
7 1082 25 roc_auc hand_till 0.476 10 0.0318 Preprocessor1_Model478
8 1046 24 roc_auc hand_till 0.475 10 0.0334 Preprocessor1_Model344
9 1243 30 roc_auc hand_till 0.475 10 0.0381 Preprocessor1_Model002
10 627 4 roc_auc hand_till 0.474 10 0.0223 Preprocessor1_Model169
11 816 17 roc_auc hand_till 0.474 10 0.0283 Preprocessor1_Model303
12 842 3 roc_auc hand_till 0.474 10 0.0254 Preprocessor1_Model062
13 609 4 roc_auc hand_till 0.474 10 0.0267 Preprocessor1_Model133
14 868 7 roc_auc hand_till 0.474 10 0.0248 Preprocessor1_Model126
15 435 4 roc_auc hand_till 0.473 10 0.0250 Preprocessor1_Model165
16 802 26 roc_auc hand_till 0.473 10 0.0352 Preprocessor1_Model462
17 657 32 roc_auc hand_till 0.473 10 0.0421 Preprocessor1_Model281
18 1081 37 roc_auc hand_till 0.473 10 0.0417 Preprocessor1_Model189
19 982 4 roc_auc hand_till 0.473 10 0.0257 Preprocessor1_Model283
20 1345 18 roc_auc hand_till 0.473 10 0.0320 Preprocessor1_Model257
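A ranked table of this shape comes straight from the saved tuning result; a sketch (assuming `Zadavanje_tuning`, loaded in the setup chunk, is the `tune_grid` result behind this table):

```r
library(tidymodels)

# Top 20 candidates ranked by cross-validated ROC AUC
Zadavanje_tuning %>% show_best(metric = "roc_auc", n = 20)

# The full, unranked set of cross-validation estimates
Zadavanje_tuning %>% collect_metrics() %>% arrange(desc(mean))
```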
# A tibble: 9 × 4
.metric .estimator trening test
<chr> <chr> <dbl> <dbl>
1 sens macro 1 0.383
2 precision macro 1 0.402
3 spec macro 1 0.691
4 accuracy multiclass 1 0.449
5 f_meas macro 1 0.369
6 mcc multiclass 1 0.0871
7 kap multiclass 1 0.0806
8 roc_auc hand_till 1.00 0.511
9 mn_log_loss multiclass 0.440 1.15
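The training/test table above can be assembled by scoring both partitions with the metric set `rf_metrike` defined in the setup chunk; a hedged sketch (the data frames `trening_podaci` and `test_podaci`, the response column `PROJEKT1` and the probability column names `.pred_1`/`.pred_2`/`.pred_3` are assumptions):

```r
library(tidymodels)

model <- Zadavanje_fit %>% extract_workflow()   # fitted workflow loaded in the setup chunk

# Class and probability predictions plus the truth column, scored with rf_metrike
ocijeni <- function(podskup) {
  predict(model, podskup) %>%
    bind_cols(predict(model, podskup, type = "prob")) %>%
    bind_cols(podskup %>% select(PROJEKT1)) %>%            # assumed response column
    rf_metrike(truth = PROJEKT1, estimate = .pred_class,
               .pred_1, .pred_2, .pred_3)                  # assumed class levels 1/2/3
}

ocijeni(trening_podaci) %>% rename(trening = .estimate) %>%
  left_join(ocijeni(test_podaci) %>% rename(test = .estimate),
            by = c(".metric", ".estimator"))
```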
# A tibble: 20 × 9
cost_complexity tree_depth min_n .metric .estimator mean n std_err
<dbl> <int> <int> <chr> <chr> <dbl> <int> <dbl>
1 0.00984 13 27 roc_auc hand_till 0.579 10 0.0451
2 0.00801 11 27 roc_auc hand_till 0.579 10 0.0451
3 0.0105 8 27 roc_auc hand_till 0.579 10 0.0451
4 0.00924 7 27 roc_auc hand_till 0.579 10 0.0451
5 0.0108 9 31 roc_auc hand_till 0.576 10 0.0358
6 0.00927 9 31 roc_auc hand_till 0.576 10 0.0358
7 0.00997 12 31 roc_auc hand_till 0.576 10 0.0358
8 0.00815 6 31 roc_auc hand_till 0.576 10 0.0358
9 0.0144 13 31 roc_auc hand_till 0.572 10 0.0354
10 0.00124 4 20 roc_auc hand_till 0.571 10 0.0466
11 0.0000175 4 20 roc_auc hand_till 0.571 10 0.0466
12 0.000280 4 20 roc_auc hand_till 0.571 10 0.0466
13 0.000150 4 20 roc_auc hand_till 0.571 10 0.0466
14 0.000784 4 20 roc_auc hand_till 0.571 10 0.0466
15 0.000407 4 20 roc_auc hand_till 0.571 10 0.0466
16 0.0000147 4 20 roc_auc hand_till 0.571 10 0.0466
17 0.00159 4 20 roc_auc hand_till 0.571 10 0.0466
18 0.0103 7 37 roc_auc hand_till 0.571 10 0.0368
19 0.0122 8 37 roc_auc hand_till 0.571 10 0.0368
20 0.0129 15 37 roc_auc hand_till 0.571 10 0.0368
# ℹ 1 more variable: .config <chr>
# A tibble: 9 × 4
.metric .estimator trening test
<chr> <chr> <dbl> <dbl>
1 sens macro 0.505 0.377
2 precision macro 0.569 0.383
3 spec macro 0.754 0.674
4 accuracy multiclass 0.556 0.388
5 f_meas macro 0.511 0.379
6 mcc multiclass 0.280 0.0349
7 kap multiclass 0.273 0.0348
8 roc_auc hand_till 0.707 0.552
9 mn_log_loss multiclass 0.920 1.84
n= 142
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 142 77 1 (0.45774648 0.21830986 0.32394366)
2) TEST< 3.81 31 9 1 (0.70967742 0.12903226 0.16129032) *
3) TEST>=3.81 111 68 1 (0.38738739 0.24324324 0.36936937)
6) KOL2< 10.25 52 30 1 (0.42307692 0.32692308 0.25000000)
12) TEST>=4.79 29 16 2 (0.41379310 0.44827586 0.13793103)
24) AKTIVNOST< 8.835 16 8 1 (0.50000000 0.31250000 0.18750000) *
25) AKTIVNOST>=8.835 13 5 2 (0.30769231 0.61538462 0.07692308) *
13) TEST< 4.79 23 13 1 (0.43478261 0.17391304 0.39130435) *
7) KOL2>=10.25 59 31 3 (0.35593220 0.16949153 0.47457627)
14) TEST>=5.615 27 14 1 (0.48148148 0.11111111 0.40740741)
28) KOL1>=10.75 10 5 1 (0.50000000 0.30000000 0.20000000) *
29) KOL1< 10.75 17 8 3 (0.47058824 0.00000000 0.52941176) *
15) TEST< 5.615 32 15 3 (0.25000000 0.21875000 0.53125000) *
Call:
rpart::rpart(formula = ..y ~ ., data = data, cp = ~0.0098370397926362,
maxdepth = ~13, minsplit = min_rows(27, data))
n= 142
CP nsplit rel error xerror xstd
1 0.04545455 0 1.0000000 1.000000 0.07710227
2 0.02597403 2 0.9090909 1.155844 0.07485107
3 0.01298701 5 0.8311688 1.142857 0.07512826
4 0.00983704 6 0.8181818 1.116883 0.07563248
Variable importance
TEST KOL2 KOL1 AKTIVNOST
56 18 14 13
Node number 1: 142 observations, complexity param=0.04545455
predicted class=1 expected loss=0.5422535 P(node) =1
class counts: 65 31 46
probabilities: 0.458 0.218 0.324
left son=2 (31 obs) right son=3 (111 obs)
Primary splits:
TEST < 3.81 to the left, improve=3.8823180, (0 missing)
KOL1 < 9.25 to the left, improve=3.3350410, (0 missing)
KOL2 < 10.75 to the left, improve=3.2386980, (0 missing)
AKTIVNOST < 9.5 to the left, improve=0.8216996, (0 missing)
Surrogate splits:
AKTIVNOST < 2.5 to the left, agree=0.796, adj=0.065, (0 split)
Node number 2: 31 observations
predicted class=1 expected loss=0.2903226 P(node) =0.2183099
class counts: 22 4 5
probabilities: 0.710 0.129 0.161
Node number 3: 111 observations, complexity param=0.04545455
predicted class=1 expected loss=0.6126126 P(node) =0.7816901
class counts: 43 27 41
probabilities: 0.387 0.243 0.369
left son=6 (52 obs) right son=7 (59 obs)
Primary splits:
KOL2 < 10.25 to the left, improve=2.203642, (0 missing)
KOL1 < 9.25 to the left, improve=1.790227, (0 missing)
AKTIVNOST < 9.5 to the left, improve=1.225692, (0 missing)
TEST < 4.435 to the right, improve=1.087152, (0 missing)
Surrogate splits:
TEST < 5.055 to the left, agree=0.658, adj=0.269, (0 split)
KOL1 < 7.25 to the left, agree=0.568, adj=0.077, (0 split)
AKTIVNOST < 6.5 to the left, agree=0.568, adj=0.077, (0 split)
Node number 6: 52 observations, complexity param=0.02597403
predicted class=1 expected loss=0.5769231 P(node) =0.3661972
class counts: 22 17 13
probabilities: 0.423 0.327 0.250
left son=12 (29 obs) right son=13 (23 obs)
Primary splits:
TEST < 4.79 to the right, improve=1.7946600, (0 missing)
KOL1 < 10.75 to the left, improve=1.5643540, (0 missing)
AKTIVNOST < 9.5 to the left, improve=0.5512821, (0 missing)
KOL2 < 9.25 to the right, improve=0.5193841, (0 missing)
Surrogate splits:
KOL1 < 5.75 to the right, agree=0.577, adj=0.043, (0 split)
AKTIVNOST < 5.5 to the right, agree=0.577, adj=0.043, (0 split)
Node number 7: 59 observations, complexity param=0.02597403
predicted class=3 expected loss=0.5254237 P(node) =0.415493
class counts: 21 10 28
probabilities: 0.356 0.169 0.475
left son=14 (27 obs) right son=15 (32 obs)
Primary splits:
TEST < 5.615 to the right, improve=1.1789470, (0 missing)
KOL2 < 12.75 to the right, improve=0.8257062, (0 missing)
AKTIVNOST < 9.5 to the left, improve=0.7188435, (0 missing)
KOL1 < 9.25 to the left, improve=0.6399919, (0 missing)
Surrogate splits:
KOL1 < 12.75 to the right, agree=0.593, adj=0.111, (0 split)
KOL2 < 13.25 to the right, agree=0.593, adj=0.111, (0 split)
AKTIVNOST < 7.335 to the right, agree=0.559, adj=0.037, (0 split)
Node number 12: 29 observations, complexity param=0.02597403
predicted class=2 expected loss=0.5517241 P(node) =0.2042254
class counts: 12 13 4
probabilities: 0.414 0.448 0.138
left son=24 (16 obs) right son=25 (13 obs)
Primary splits:
AKTIVNOST < 8.835 to the left, improve=1.0109420, (0 missing)
TEST < 5.115 to the left, improve=0.9551724, (0 missing)
KOL1 < 10.75 to the left, improve=0.7393829, (0 missing)
KOL2 < 9.25 to the left, improve=0.5669371, (0 missing)
Surrogate splits:
KOL1 < 9.75 to the left, agree=0.690, adj=0.308, (0 split)
TEST < 5.535 to the left, agree=0.655, adj=0.231, (0 split)
KOL2 < 9.25 to the left, agree=0.621, adj=0.154, (0 split)
Node number 13: 23 observations
predicted class=1 expected loss=0.5652174 P(node) =0.1619718
class counts: 10 4 9
probabilities: 0.435 0.174 0.391
Node number 14: 27 observations, complexity param=0.01298701
predicted class=1 expected loss=0.5185185 P(node) =0.1901408
class counts: 13 3 11
probabilities: 0.481 0.111 0.407
left son=28 (10 obs) right son=29 (17 obs)
Primary splits:
KOL1 < 10.75 to the right, improve=1.2553380, (0 missing)
TEST < 6.005 to the left, improve=1.1141610, (0 missing)
AKTIVNOST < 8 to the left, improve=0.3463805, (0 missing)
KOL2 < 11.75 to the left, improve=0.1481481, (0 missing)
Surrogate splits:
TEST < 5.735 to the left, agree=0.704, adj=0.2, (0 split)
AKTIVNOST < 6.5 to the left, agree=0.704, adj=0.2, (0 split)
Node number 15: 32 observations
predicted class=3 expected loss=0.46875 P(node) =0.2253521
class counts: 8 7 17
probabilities: 0.250 0.219 0.531
Node number 24: 16 observations
predicted class=1 expected loss=0.5 P(node) =0.1126761
class counts: 8 5 3
probabilities: 0.500 0.312 0.188
Node number 25: 13 observations
predicted class=2 expected loss=0.3846154 P(node) =0.0915493
class counts: 4 8 1
probabilities: 0.308 0.615 0.077
Node number 28: 10 observations
predicted class=1 expected loss=0.5 P(node) =0.07042254
class counts: 5 3 2
probabilities: 0.500 0.300 0.200
Node number 29: 17 observations
predicted class=3 expected loss=0.4705882 P(node) =0.1197183
class counts: 8 0 9
probabilities: 0.471 0.000 0.529
count ncat improve index adj
TEST 142 -1 3.8823180 3.810 0.00000000
KOL1 142 -1 3.3350405 9.250 0.00000000
KOL2 142 -1 3.2386978 10.750 0.00000000
AKTIVNOST 142 -1 0.8216996 9.500 0.00000000
AKTIVNOST 0 -1 0.7957746 2.500 0.06451613
KOL2 111 -1 2.2036424 10.250 0.00000000
KOL1 111 -1 1.7902266 9.250 0.00000000
AKTIVNOST 111 -1 1.2256924 9.500 0.00000000
TEST 111 1 1.0871524 4.435 0.00000000
TEST 0 -1 0.6576577 5.055 0.26923077
KOL1 0 -1 0.5675676 7.250 0.07692308
AKTIVNOST 0 -1 0.5675676 6.500 0.07692308
TEST 52 1 1.7946604 4.790 0.00000000
KOL1 52 -1 1.5643539 10.750 0.00000000
AKTIVNOST 52 -1 0.5512821 9.500 0.00000000
KOL2 52 1 0.5193841 9.250 0.00000000
KOL1 0 1 0.5769231 5.750 0.04347826
AKTIVNOST 0 1 0.5769231 5.500 0.04347826
AKTIVNOST 29 -1 1.0109416 8.835 0.00000000
TEST 29 -1 0.9551724 5.115 0.00000000
KOL1 29 -1 0.7393829 10.750 0.00000000
KOL2 29 -1 0.5669371 9.250 0.00000000
KOL1 0 -1 0.6896552 9.750 0.30769231
TEST 0 -1 0.6551724 5.535 0.23076923
KOL2 0 -1 0.6206897 9.250 0.15384615
TEST 59 1 1.1789470 5.615 0.00000000
KOL2 59 1 0.8257062 12.750 0.00000000
AKTIVNOST 59 -1 0.7188435 9.500 0.00000000
KOL1 59 -1 0.6399919 9.250 0.00000000
KOL1 0 1 0.5932203 12.750 0.11111111
KOL2 0 1 0.5932203 13.250 0.11111111
AKTIVNOST 0 1 0.5593220 7.335 0.03703704
KOL1 27 1 1.2553377 10.750 0.00000000
TEST 27 -1 1.1141612 6.005 0.00000000
AKTIVNOST 27 -1 0.3463805 8.000 0.00000000
KOL2 27 -1 0.1481481 11.750 0.00000000
TEST 0 -1 0.7037037 5.735 0.20000000
AKTIVNOST 0 -1 0.7037037 6.500 0.20000000
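The node-by-node output above is the `summary()` printout of the rpart model inside the tuned workflow. A sketch of how that object can be pulled out and drawn with rpart.plot (assuming `Zadavanje_fit_tree`, loaded in the setup chunk, behaves like the other fitted workflow objects):

```r
library(tidymodels)
library(rpart.plot)

# Underlying rpart model from the fitted tidymodels workflow
stablo <- Zadavanje_fit_tree %>%
  extract_workflow() %>%
  extract_fit_engine()

summary(stablo)   # the split / surrogate / CP printout shown above
rpart.plot(stablo, type = 4, extra = 104, roundint = FALSE)
# rpart::prune(stablo, cp = ...) could cut the tree back using the CP table above
```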
Predictors
Response - classification of students based on the number of points earned on Project 2
Hyperparameters - the optimal combination was chosen by 10-fold cross-validation in which 500 randomly drawn combinations of the hyperparameters mtry, min_n and trees were tried; the best model was selected by the roc_auc metric.
Predictors
Response - classification of students based on the number of points earned on Project 2
Hyperparameters - the optimal combination was chosen by 10-fold cross-validation in which 5000 combinations of the hyperparameters tree_depth, min_n and cost_complexity were tried via Latin hypercube sampling; the best model was selected by the roc_auc metric.
prob | KOL1 | KOL2 | TEST | AKTIVNOST | PROJEKT1 | PROJEKT2 | Procjena | UKUPNO |
---|---|---|---|---|---|---|---|---|
0% | 3.0 | 3.5 | 0.750 | 0 | 10.0 | 6.00 | 0.0 | 36.850 |
33% | 8.0 | 9.0 | 4.407 | 7 | 15.0 | 15.85 | 2.5 | 63.656 |
67% | 10.5 | 11.5 | 5.506 | 9 | 16.5 | 17.50 | 3.0 | 70.107 |
100% | 17.5 | 16.0 | 7.000 | 10 | 20.0 | 20.00 | 3.0 | 87.600 |
KOL1 | KOL2 | TEST | AKTIVNOST | PROJEKT1 | PROJEKT2 | Procjena | UKUPNO | |
---|---|---|---|---|---|---|---|---|
Min. : 3.000 | Min. : 3.50 | Min. :0.750 | Min. : 0.00 | Min. :10.00 | Min. : 6.00 | Min. :0.000 | Min. :36.85 | |
1st Qu.: 7.500 | 1st Qu.: 8.50 | 1st Qu.:4.105 | 1st Qu.: 7.00 | 1st Qu.:14.00 | 1st Qu.:15.00 | 1st Qu.:2.500 | 1st Qu.:61.65 | |
Median : 9.500 | Median :10.00 | Median :4.980 | Median : 8.00 | Median :15.50 | Median :16.50 | Median :3.000 | Median :66.55 | |
Mean : 9.516 | Mean :10.22 | Mean :4.763 | Mean : 7.77 | Mean :15.64 | Mean :16.22 | Mean :2.429 | Mean :66.57 | |
3rd Qu.:11.500 | 3rd Qu.:12.00 | 3rd Qu.:5.680 | 3rd Qu.:10.00 | 3rd Qu.:17.00 | 3rd Qu.:18.00 | 3rd Qu.:3.000 | 3rd Qu.:71.64 | |
Max. :17.500 | Max. :16.00 | Max. :7.000 | Max. :10.00 | Max. :20.00 | Max. :20.00 | Max. :3.000 | Max. :87.60 |
# A tibble: 20 × 9
mtry trees min_n .metric .estimator mean n std_err .config
<int> <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
1 3 1182 23 roc_auc hand_till 0.805 10 0.0380 Preprocessor1_Model…
2 3 630 21 roc_auc hand_till 0.804 10 0.0370 Preprocessor1_Model…
3 3 1013 24 roc_auc hand_till 0.803 10 0.0372 Preprocessor1_Model…
4 3 1200 21 roc_auc hand_till 0.803 10 0.0379 Preprocessor1_Model…
5 3 819 23 roc_auc hand_till 0.802 10 0.0372 Preprocessor1_Model…
6 3 641 23 roc_auc hand_till 0.802 10 0.0373 Preprocessor1_Model…
7 3 1169 22 roc_auc hand_till 0.802 10 0.0375 Preprocessor1_Model…
8 3 1472 21 roc_auc hand_till 0.802 10 0.0377 Preprocessor1_Model…
9 3 1282 20 roc_auc hand_till 0.802 10 0.0388 Preprocessor1_Model…
10 3 1027 27 roc_auc hand_till 0.802 10 0.0378 Preprocessor1_Model…
11 3 413 29 roc_auc hand_till 0.802 10 0.0373 Preprocessor1_Model…
12 3 1348 20 roc_auc hand_till 0.802 10 0.0379 Preprocessor1_Model…
13 3 808 23 roc_auc hand_till 0.802 10 0.0384 Preprocessor1_Model…
14 3 542 39 roc_auc hand_till 0.802 10 0.0373 Preprocessor1_Model…
15 3 1069 40 roc_auc hand_till 0.801 10 0.0378 Preprocessor1_Model…
16 3 405 29 roc_auc hand_till 0.801 10 0.0394 Preprocessor1_Model…
17 3 1417 29 roc_auc hand_till 0.801 10 0.0393 Preprocessor1_Model…
18 3 409 20 roc_auc hand_till 0.801 10 0.0391 Preprocessor1_Model…
19 3 697 30 roc_auc hand_till 0.801 10 0.0373 Preprocessor1_Model…
20 3 632 23 roc_auc hand_till 0.801 10 0.0381 Preprocessor1_Model…
# A tibble: 9 × 4
.metric .estimator trening test
<chr> <chr> <dbl> <dbl>
1 sens macro 0.803 0.632
2 precision macro 0.829 0.625
3 spec macro 0.901 0.811
4 accuracy multiclass 0.804 0.625
5 f_meas macro 0.808 0.628
6 mcc multiclass 0.710 0.438
7 kap multiclass 0.705 0.438
8 roc_auc hand_till 0.955 0.823
9 mn_log_loss multiclass 0.570 0.761
# A tibble: 20 × 9
cost_complexity tree_depth min_n .metric .estimator mean n std_err
<dbl> <int> <int> <chr> <chr> <dbl> <int> <dbl>
1 2.01e- 5 6 8 roc_auc hand_till 0.826 10 0.0271
2 2.42e- 8 6 8 roc_auc hand_till 0.826 10 0.0271
3 1.04e- 5 6 8 roc_auc hand_till 0.826 10 0.0271
4 2.77e-10 6 8 roc_auc hand_till 0.826 10 0.0271
5 5.65e- 4 6 8 roc_auc hand_till 0.826 10 0.0271
6 1.64e- 7 6 8 roc_auc hand_till 0.826 10 0.0271
7 2.78e- 3 6 7 roc_auc hand_till 0.820 10 0.0356
8 2.05e- 8 6 7 roc_auc hand_till 0.820 10 0.0356
9 4.84e- 3 6 6 roc_auc hand_till 0.820 10 0.0356
10 2.37e- 4 6 7 roc_auc hand_till 0.820 10 0.0356
11 7.42e- 8 6 7 roc_auc hand_till 0.820 10 0.0356
12 3.04e-10 6 10 roc_auc hand_till 0.819 10 0.0277
13 1.10e-10 6 10 roc_auc hand_till 0.819 10 0.0277
14 4.11e-10 6 10 roc_auc hand_till 0.819 10 0.0277
15 3.78e- 5 6 10 roc_auc hand_till 0.819 10 0.0277
16 5.63e- 8 6 10 roc_auc hand_till 0.819 10 0.0277
17 2.15e- 9 6 10 roc_auc hand_till 0.819 10 0.0277
18 4.60e- 8 6 9 roc_auc hand_till 0.818 10 0.0285
19 2.72e- 4 6 9 roc_auc hand_till 0.818 10 0.0285
20 8.94e- 9 6 9 roc_auc hand_till 0.818 10 0.0285
# ℹ 1 more variable: .config <chr>
# A tibble: 9 × 4
.metric .estimator trening test
<chr> <chr> <dbl> <dbl>
1 sens macro 0.825 0.727
2 precision macro 0.837 0.781
3 spec macro 0.912 0.863
4 accuracy multiclass 0.825 0.729
5 f_meas macro 0.828 0.731
6 mcc multiclass 0.740 0.606
7 kap multiclass 0.737 0.592
8 roc_auc hand_till 0.934 0.804
9 mn_log_loss multiclass 0.448 2.80
n= 143
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 143 92 2 (0.32867133 0.35664336 0.31468531)
2) PROJEKT1< 16.75 96 51 1 (0.46875000 0.41666667 0.11458333)
4) PROJEKT1< 12.5 15 2 1 (0.86666667 0.13333333 0.00000000) *
5) PROJEKT1>=12.5 81 43 2 (0.39506173 0.46913580 0.13580247)
10) TEST< 5.69 62 33 1 (0.46774194 0.35483871 0.17741935)
20) KOL2< 8.25 14 4 1 (0.71428571 0.28571429 0.00000000)
40) KOL1< 11.25 11 2 1 (0.81818182 0.18181818 0.00000000) *
41) KOL1>=11.25 3 1 2 (0.33333333 0.66666667 0.00000000) *
21) KOL2>=8.25 48 29 1 (0.39583333 0.37500000 0.22916667)
42) PROJEKT1< 15.75 35 17 1 (0.51428571 0.28571429 0.20000000)
84) PROJEKT1>=13.75 24 9 1 (0.62500000 0.16666667 0.20833333) *
85) PROJEKT1< 13.75 11 5 2 (0.27272727 0.54545455 0.18181818) *
43) PROJEKT1>=15.75 13 5 2 (0.07692308 0.61538462 0.30769231)
86) KOL2< 12.2 10 2 2 (0.10000000 0.80000000 0.10000000) *
87) KOL2>=12.2 3 0 3 (0.00000000 0.00000000 1.00000000) *
11) TEST>=5.69 19 3 2 (0.15789474 0.84210526 0.00000000)
22) KOL1>=10.75 10 3 2 (0.30000000 0.70000000 0.00000000)
44) KOL1< 12.25 4 1 1 (0.75000000 0.25000000 0.00000000) *
45) KOL1>=12.25 6 0 2 (0.00000000 1.00000000 0.00000000) *
23) KOL1< 10.75 9 0 2 (0.00000000 1.00000000 0.00000000) *
3) PROJEKT1>=16.75 47 13 3 (0.04255319 0.23404255 0.72340426)
6) PROJEKT1< 18.75 28 13 3 (0.07142857 0.39285714 0.53571429)
12) PROJEKT1>=17.25 12 1 2 (0.00000000 0.91666667 0.08333333) *
13) PROJEKT1< 17.25 16 2 3 (0.12500000 0.00000000 0.87500000) *
7) PROJEKT1>=18.75 19 0 3 (0.00000000 0.00000000 1.00000000) *
Call:
rpart::rpart(formula = ..y ~ ., data = data, cp = ~2.0128382857543e-05,
maxdepth = ~6, minsplit = min_rows(8, data))
n= 143
CP nsplit rel error xerror xstd
1 3.043478e-01 0 1.0000000 1.0000000 0.06226201
2 7.065217e-02 1 0.6956522 0.7608696 0.06497624
3 5.434783e-02 3 0.5543478 0.7934783 0.06497624
4 3.804348e-02 5 0.4456522 0.6521739 0.06414441
5 3.260870e-02 7 0.3695652 0.5978261 0.06323636
6 1.086957e-02 9 0.3043478 0.5543478 0.06226201
7 2.012838e-05 12 0.2717391 0.6086957 0.06344506
Variable importance
PROJEKT1 KOL2 TEST KOL1 AKTIVNOST
61 13 13 11 2
Node number 1: 143 observations, complexity param=0.3043478
predicted class=2 expected loss=0.6433566 P(node) =1
class counts: 47 51 45
probabilities: 0.329 0.357 0.315
left son=2 (96 obs) right son=3 (47 obs)
Primary splits:
PROJEKT1 < 16.75 to the left, improve=18.4789500, (0 missing)
KOL2 < 6.75 to the left, improve= 4.1720280, (0 missing)
TEST < 2.55 to the left, improve= 3.8231680, (0 missing)
KOL1 < 14.75 to the left, improve= 2.9006390, (0 missing)
AKTIVNOST < 5.5 to the left, improve= 0.7320712, (0 missing)
Surrogate splits:
KOL1 < 14.75 to the left, agree=0.699, adj=0.085, (0 split)
TEST < 6.355 to the left, agree=0.678, adj=0.021, (0 split)
Node number 2: 96 observations, complexity param=0.07065217
predicted class=1 expected loss=0.53125 P(node) =0.6713287
class counts: 45 40 11
probabilities: 0.469 0.417 0.115
left son=4 (15 obs) right son=5 (81 obs)
Primary splits:
PROJEKT1 < 12.5 to the left, improve=4.4754630, (0 missing)
TEST < 4.145 to the left, improve=3.8039810, (0 missing)
KOL2 < 8.25 to the left, improve=2.8735160, (0 missing)
KOL1 < 12.25 to the left, improve=2.0153110, (0 missing)
AKTIVNOST < 8.165 to the left, improve=0.3175259, (0 missing)
Surrogate splits:
TEST < 2.34 to the left, agree=0.875, adj=0.2, (0 split)
Node number 3: 47 observations, complexity param=0.05434783
predicted class=3 expected loss=0.2765957 P(node) =0.3286713
class counts: 2 11 34
probabilities: 0.043 0.234 0.723
left son=6 (28 obs) right son=7 (19 obs)
Primary splits:
PROJEKT1 < 18.75 to the left, improve=4.2446810, (0 missing)
KOL1 < 11.25 to the left, improve=2.9321810, (0 missing)
TEST < 5.11 to the left, improve=2.0487100, (0 missing)
KOL2 < 13.25 to the left, improve=0.9154126, (0 missing)
AKTIVNOST < 5.5 to the left, improve=0.8422418, (0 missing)
Surrogate splits:
KOL1 < 11.25 to the left, agree=0.745, adj=0.368, (0 split)
TEST < 5.44 to the left, agree=0.660, adj=0.158, (0 split)
KOL2 < 13.25 to the left, agree=0.638, adj=0.105, (0 split)
AKTIVNOST < 8 to the left, agree=0.638, adj=0.105, (0 split)
Node number 4: 15 observations
predicted class=1 expected loss=0.1333333 P(node) =0.1048951
class counts: 13 2 0
probabilities: 0.867 0.133 0.000
Node number 5: 81 observations, complexity param=0.07065217
predicted class=2 expected loss=0.5308642 P(node) =0.5664336
class counts: 32 38 11
probabilities: 0.395 0.469 0.136
left son=10 (62 obs) right son=11 (19 obs)
Primary splits:
TEST < 5.69 to the left, improve=5.306986, (0 missing)
KOL2 < 8.25 to the left, improve=2.562037, (0 missing)
KOL1 < 12.25 to the left, improve=1.884863, (0 missing)
PROJEKT1 < 15.25 to the left, improve=1.601743, (0 missing)
AKTIVNOST < 8.165 to the left, improve=1.030659, (0 missing)
Surrogate splits:
KOL1 < 12.75 to the left, agree=0.778, adj=0.053, (0 split)
Node number 6: 28 observations, complexity param=0.05434783
predicted class=3 expected loss=0.4642857 P(node) =0.1958042
class counts: 2 11 15
probabilities: 0.071 0.393 0.536
left son=12 (12 obs) right son=13 (16 obs)
Primary splits:
PROJEKT1 < 17.25 to the right, improve=10.1666700, (0 missing)
KOL1 < 11.25 to the left, improve= 1.7500000, (0 missing)
TEST < 6.005 to the left, improve= 1.7500000, (0 missing)
KOL2 < 10.25 to the left, improve= 0.5416667, (0 missing)
AKTIVNOST < 6.5 to the left, improve= 0.4523810, (0 missing)
Surrogate splits:
KOL2 < 8.75 to the left, agree=0.643, adj=0.167, (0 split)
TEST < 3.265 to the left, agree=0.607, adj=0.083, (0 split)
AKTIVNOST < 6.5 to the left, agree=0.607, adj=0.083, (0 split)
Node number 7: 19 observations
predicted class=3 expected loss=0 P(node) =0.1328671
class counts: 0 0 19
probabilities: 0.000 0.000 1.000
Node number 10: 62 observations, complexity param=0.03804348
predicted class=1 expected loss=0.5322581 P(node) =0.4335664
class counts: 29 22 11
probabilities: 0.468 0.355 0.177
left son=20 (14 obs) right son=21 (48 obs)
Primary splits:
KOL2 < 8.25 to the left, improve=1.7548000, (0 missing)
TEST < 5.07 to the right, improve=1.7045930, (0 missing)
PROJEKT1 < 15.25 to the left, improve=1.4819650, (0 missing)
AKTIVNOST < 8.5 to the left, improve=0.8536098, (0 missing)
KOL1 < 8.75 to the left, improve=0.5369511, (0 missing)
Node number 11: 19 observations, complexity param=0.01086957
predicted class=2 expected loss=0.1578947 P(node) =0.1328671
class counts: 3 16 0
probabilities: 0.158 0.842 0.000
left son=22 (10 obs) right son=23 (9 obs)
Primary splits:
KOL1 < 10.75 to the right, improve=0.85263160, (0 missing)
KOL2 < 9.75 to the left, improve=0.79548870, (0 missing)
TEST < 5.765 to the right, improve=0.25263160, (0 missing)
PROJEKT1 < 14.25 to the right, improve=0.25263160, (0 missing)
AKTIVNOST < 8.665 to the right, improve=0.02990431, (0 missing)
Surrogate splits:
TEST < 5.9 to the left, agree=0.684, adj=0.333, (0 split)
AKTIVNOST < 9.5 to the left, agree=0.684, adj=0.333, (0 split)
PROJEKT1 < 13.5 to the right, agree=0.632, adj=0.222, (0 split)
KOL2 < 8.25 to the left, agree=0.579, adj=0.111, (0 split)
Node number 12: 12 observations
predicted class=2 expected loss=0.08333333 P(node) =0.08391608
class counts: 0 11 1
probabilities: 0.000 0.917 0.083
Node number 13: 16 observations
predicted class=3 expected loss=0.125 P(node) =0.1118881
class counts: 2 0 14
probabilities: 0.125 0.000 0.875
Node number 20: 14 observations, complexity param=0.01086957
predicted class=1 expected loss=0.2857143 P(node) =0.0979021
class counts: 10 4 0
probabilities: 0.714 0.286 0.000
left son=40 (11 obs) right son=41 (3 obs)
Primary splits:
KOL1 < 11.25 to the left, improve=1.10822500, (0 missing)
TEST < 3.265 to the right, improve=1.10822500, (0 missing)
PROJEKT1 < 15.75 to the right, improve=0.91428570, (0 missing)
KOL2 < 6.75 to the left, improve=0.29761900, (0 missing)
AKTIVNOST < 8.5 to the left, improve=0.01731602, (0 missing)
Surrogate splits:
TEST < 3.265 to the right, agree=0.857, adj=0.333, (0 split)
Node number 21: 48 observations, complexity param=0.03804348
predicted class=1 expected loss=0.6041667 P(node) =0.3356643
class counts: 19 18 11
probabilities: 0.396 0.375 0.229
left son=42 (35 obs) right son=43 (13 obs)
Primary splits:
PROJEKT1 < 15.75 to the left, improve=2.9533880, (0 missing)
TEST < 5.04 to the right, improve=1.8958330, (0 missing)
KOL2 < 8.75 to the left, improve=0.7553717, (0 missing)
AKTIVNOST < 8.5 to the left, improve=0.6884092, (0 missing)
KOL1 < 8.5 to the left, improve=0.6083333, (0 missing)
Node number 22: 10 observations, complexity param=0.01086957
predicted class=2 expected loss=0.3 P(node) =0.06993007
class counts: 3 7 0
probabilities: 0.300 0.700 0.000
left son=44 (4 obs) right son=45 (6 obs)
Primary splits:
KOL1 < 12.25 to the left, improve=2.70000000, (0 missing)
KOL2 < 9.75 to the left, improve=1.15238100, (0 missing)
AKTIVNOST < 8.665 to the right, improve=0.20000000, (0 missing)
TEST < 5.84 to the right, improve=0.03333333, (0 missing)
PROJEKT1 < 15.25 to the left, improve=0.03333333, (0 missing)
Surrogate splits:
KOL2 < 9.75 to the left, agree=0.9, adj=0.75, (0 split)
Node number 23: 9 observations
predicted class=2 expected loss=0 P(node) =0.06293706
class counts: 0 9 0
probabilities: 0.000 1.000 0.000
Node number 40: 11 observations
predicted class=1 expected loss=0.1818182 P(node) =0.07692308
class counts: 9 2 0
probabilities: 0.818 0.182 0.000
Node number 41: 3 observations
predicted class=2 expected loss=0.3333333 P(node) =0.02097902
class counts: 1 2 0
probabilities: 0.333 0.667 0.000
Node number 42: 35 observations, complexity param=0.0326087
predicted class=1 expected loss=0.4857143 P(node) =0.2447552
class counts: 18 10 7
probabilities: 0.514 0.286 0.200
left son=84 (24 obs) right son=85 (11 obs)
Primary splits:
PROJEKT1 < 13.75 to the right, improve=2.0235930, (0 missing)
TEST < 5.2 to the right, improve=1.8057140, (0 missing)
AKTIVNOST < 8.5 to the left, improve=1.3248750, (0 missing)
KOL2 < 9.75 to the right, improve=0.9659774, (0 missing)
KOL1 < 10.75 to the right, improve=0.4258852, (0 missing)
Surrogate splits:
TEST < 3.32 to the right, agree=0.771, adj=0.273, (0 split)
Node number 43: 13 observations, complexity param=0.0326087
predicted class=2 expected loss=0.3846154 P(node) =0.09090909
class counts: 1 8 4
probabilities: 0.077 0.615 0.308
left son=86 (10 obs) right son=87 (3 obs)
Primary splits:
KOL2 < 12.2 to the left, improve=3.3692310, (0 missing)
KOL1 < 7.75 to the left, improve=1.4358970, (0 missing)
AKTIVNOST < 8 to the left, improve=1.1581200, (0 missing)
TEST < 3.62 to the right, improve=0.8358974, (0 missing)
PROJEKT1 < 16.25 to the left, improve=0.8358974, (0 missing)
Surrogate splits:
PROJEKT1 < 16.25 to the left, agree=0.846, adj=0.333, (0 split)
Node number 44: 4 observations
predicted class=1 expected loss=0.25 P(node) =0.02797203
class counts: 3 1 0
probabilities: 0.750 0.250 0.000
Node number 45: 6 observations
predicted class=2 expected loss=0 P(node) =0.04195804
class counts: 0 6 0
probabilities: 0.000 1.000 0.000
Node number 84: 24 observations
predicted class=1 expected loss=0.375 P(node) =0.1678322
class counts: 15 4 5
probabilities: 0.625 0.167 0.208
Node number 85: 11 observations
predicted class=2 expected loss=0.4545455 P(node) =0.07692308
class counts: 3 6 2
probabilities: 0.273 0.545 0.182
Node number 86: 10 observations
predicted class=2 expected loss=0.2 P(node) =0.06993007
class counts: 1 8 1
probabilities: 0.100 0.800 0.100
Node number 87: 3 observations
predicted class=3 expected loss=0 P(node) =0.02097902
class counts: 0 0 3
probabilities: 0.000 0.000 1.000
count ncat improve index adj
PROJEKT1 143 -1 18.47894969 16.750 0.00000000
KOL2 143 -1 4.17202797 6.750 0.00000000
TEST 143 -1 3.82316757 2.550 0.00000000
KOL1 143 -1 2.90063893 14.750 0.00000000
AKTIVNOST 143 -1 0.73207121 5.500 0.00000000
KOL1 0 -1 0.69930070 14.750 0.08510638
TEST 0 -1 0.67832168 6.355 0.02127660
PROJEKT1 96 -1 4.47546296 12.500 0.00000000
TEST 96 -1 3.80398056 4.145 0.00000000
KOL2 96 -1 2.87351556 8.250 0.00000000
KOL1 96 -1 2.01531124 12.250 0.00000000
AKTIVNOST 96 -1 0.31752587 8.165 0.00000000
TEST 0 -1 0.87500000 2.340 0.20000000
TEST 81 -1 5.30698610 5.690 0.00000000
KOL2 81 -1 2.56203704 8.250 0.00000000
KOL1 81 -1 1.88486312 12.250 0.00000000
PROJEKT1 81 -1 1.60174292 15.250 0.00000000
AKTIVNOST 81 -1 1.03065949 8.165 0.00000000
KOL1 0 -1 0.77777778 12.750 0.05263158
KOL2 62 -1 1.75480031 8.250 0.00000000
TEST 62 1 1.70459327 5.070 0.00000000
PROJEKT1 62 -1 1.48196481 15.250 0.00000000
AKTIVNOST 62 -1 0.85360983 8.500 0.00000000
KOL1 62 -1 0.53695113 8.750 0.00000000
KOL1 14 -1 1.10822511 11.250 0.00000000
TEST 14 1 1.10822511 3.265 0.00000000
PROJEKT1 14 1 0.91428571 15.750 0.00000000
KOL2 14 -1 0.29761905 6.750 0.00000000
AKTIVNOST 14 -1 0.01731602 8.500 0.00000000
TEST 0 1 0.85714286 3.265 0.33333333
PROJEKT1 48 -1 2.95338828 15.750 0.00000000
TEST 48 1 1.89583333 5.040 0.00000000
KOL2 48 -1 0.75537166 8.750 0.00000000
AKTIVNOST 48 -1 0.68840923 8.500 0.00000000
KOL1 48 -1 0.60833333 8.500 0.00000000
PROJEKT1 35 1 2.02359307 13.750 0.00000000
TEST 35 1 1.80571429 5.200 0.00000000
AKTIVNOST 35 -1 1.32487512 8.500 0.00000000
KOL2 35 1 0.96597744 9.750 0.00000000
KOL1 35 1 0.42588523 10.750 0.00000000
TEST 0 1 0.77142857 3.320 0.27272727
KOL2 13 -1 3.36923077 12.200 0.00000000
KOL1 13 -1 1.43589744 7.750 0.00000000
AKTIVNOST 13 -1 1.15811966 8.000 0.00000000
TEST 13 1 0.83589744 3.620 0.00000000
PROJEKT1 13 -1 0.83589744 16.250 0.00000000
PROJEKT1 0 -1 0.84615385 16.250 0.33333333
KOL1 19 1 0.85263158 10.750 0.00000000
KOL2 19 -1 0.79548872 9.750 0.00000000
TEST 19 1 0.25263158 5.765 0.00000000
PROJEKT1 19 1 0.25263158 14.250 0.00000000
AKTIVNOST 19 1 0.02990431 8.665 0.00000000
TEST 0 -1 0.68421053 5.900 0.33333333
AKTIVNOST 0 -1 0.68421053 9.500 0.33333333
PROJEKT1 0 1 0.63157895 13.500 0.22222222
KOL2 0 -1 0.57894737 8.250 0.11111111
KOL1 10 -1 2.70000000 12.250 0.00000000
KOL2 10 -1 1.15238095 9.750 0.00000000
AKTIVNOST 10 1 0.20000000 8.665 0.00000000
TEST 10 1 0.03333333 5.840 0.00000000
PROJEKT1 10 -1 0.03333333 15.250 0.00000000
KOL2 0 -1 0.90000000 9.750 0.75000000
PROJEKT1 47 -1 4.24468085 18.750 0.00000000
KOL1 47 -1 2.93218085 11.250 0.00000000
TEST 47 -1 2.04871016 5.110 0.00000000
KOL2 47 -1 0.91541256 13.250 0.00000000
AKTIVNOST 47 -1 0.84224183 5.500 0.00000000
KOL1 0 -1 0.74468085 11.250 0.36842105
TEST 0 -1 0.65957447 5.440 0.15789474
KOL2 0 -1 0.63829787 13.250 0.10526316
AKTIVNOST 0 -1 0.63829787 8.000 0.10526316
PROJEKT1 28 1 10.16666667 17.250 0.00000000
KOL1 28 -1 1.75000000 11.250 0.00000000
TEST 28 -1 1.75000000 6.005 0.00000000
KOL2 28 -1 0.54166667 10.250 0.00000000
AKTIVNOST 28 -1 0.45238095 6.500 0.00000000
KOL2 0 -1 0.64285714 8.750 0.16666667
TEST 0 -1 0.60714286 3.265 0.08333333
AKTIVNOST 0 -1 0.60714286 6.500 0.08333333
Predictors
Response - classification of students based on the number of points earned on the assessment of the problem solution
Hyperparameters - the optimal combination was chosen by 10-fold cross-validation in which 500 randomly drawn combinations of the hyperparameters mtry, min_n and trees were tried; the best model was selected by the roc_auc metric.
Predictors
Response - classification of students based on the number of points earned on the assessment of the problem solution
Hyperparameters - the optimal combination was chosen by 10-fold cross-validation in which 5000 combinations of the hyperparameters tree_depth, min_n and cost_complexity were tried via Latin hypercube sampling; the best model was selected by the roc_auc metric.
prob | KOL1 | KOL2 | TEST | AKTIVNOST | PROJEKT1 | PROJEKT2 | Procjena | UKUPNO |
---|---|---|---|---|---|---|---|---|
0% | 3.0 | 3.5 | 0.750 | 0 | 10.0 | 6.00 | 0.0 | 36.850 |
33% | 8.0 | 9.0 | 4.407 | 7 | 15.0 | 15.85 | 2.5 | 63.656 |
67% | 10.5 | 11.5 | 5.506 | 9 | 16.5 | 17.50 | 3.0 | 70.107 |
100% | 17.5 | 16.0 | 7.000 | 10 | 20.0 | 20.00 | 3.0 | 87.600 |
KOL1 | KOL2 | TEST | AKTIVNOST | PROJEKT1 | PROJEKT2 | Procjena | UKUPNO | |
---|---|---|---|---|---|---|---|---|
Min. : 3.000 | Min. : 3.50 | Min. :0.750 | Min. : 0.00 | Min. :10.00 | Min. : 6.00 | Min. :0.000 | Min. :36.85 | |
1st Qu.: 7.500 | 1st Qu.: 8.50 | 1st Qu.:4.105 | 1st Qu.: 7.00 | 1st Qu.:14.00 | 1st Qu.:15.00 | 1st Qu.:2.500 | 1st Qu.:61.65 | |
Median : 9.500 | Median :10.00 | Median :4.980 | Median : 8.00 | Median :15.50 | Median :16.50 | Median :3.000 | Median :66.55 | |
Mean : 9.516 | Mean :10.22 | Mean :4.763 | Mean : 7.77 | Mean :15.64 | Mean :16.22 | Mean :2.429 | Mean :66.57 | |
3rd Qu.:11.500 | 3rd Qu.:12.00 | 3rd Qu.:5.680 | 3rd Qu.:10.00 | 3rd Qu.:17.00 | 3rd Qu.:18.00 | 3rd Qu.:3.000 | 3rd Qu.:71.64 | |
Max. :17.500 | Max. :16.00 | Max. :7.000 | Max. :10.00 | Max. :20.00 | Max. :20.00 | Max. :3.000 | Max. :87.60 |
# A tibble: 20 × 9
mtry trees min_n .metric .estimator mean n std_err .config
<int> <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
1 2 526 6 roc_auc hand_till 0.590 10 0.0345 Preprocessor1_Model…
2 2 1019 7 roc_auc hand_till 0.589 10 0.0307 Preprocessor1_Model…
3 2 599 11 roc_auc hand_till 0.586 10 0.0324 Preprocessor1_Model…
4 3 530 2 roc_auc hand_till 0.584 10 0.0338 Preprocessor1_Model…
5 3 1113 2 roc_auc hand_till 0.584 10 0.0318 Preprocessor1_Model…
6 2 1459 3 roc_auc hand_till 0.584 10 0.0353 Preprocessor1_Model…
7 3 428 10 roc_auc hand_till 0.583 10 0.0281 Preprocessor1_Model…
8 2 988 9 roc_auc hand_till 0.581 10 0.0290 Preprocessor1_Model…
9 2 554 3 roc_auc hand_till 0.581 10 0.0338 Preprocessor1_Model…
10 3 446 14 roc_auc hand_till 0.581 10 0.0295 Preprocessor1_Model…
11 2 1441 9 roc_auc hand_till 0.581 10 0.0321 Preprocessor1_Model…
12 3 1040 7 roc_auc hand_till 0.581 10 0.0294 Preprocessor1_Model…
13 2 649 14 roc_auc hand_till 0.581 10 0.0309 Preprocessor1_Model…
14 3 854 14 roc_auc hand_till 0.580 10 0.0308 Preprocessor1_Model…
15 2 1075 5 roc_auc hand_till 0.580 10 0.0349 Preprocessor1_Model…
16 2 1063 7 roc_auc hand_till 0.580 10 0.0294 Preprocessor1_Model…
17 3 1300 4 roc_auc hand_till 0.580 10 0.0337 Preprocessor1_Model…
18 3 573 12 roc_auc hand_till 0.579 10 0.0253 Preprocessor1_Model…
19 3 745 27 roc_auc hand_till 0.579 10 0.0326 Preprocessor1_Model…
20 3 1323 8 roc_auc hand_till 0.579 10 0.0326 Preprocessor1_Model…
# A tibble: 9 × 4
.metric .estimator trening test
<chr> <chr> <dbl> <dbl>
1 sens macro 0.965 0.389
2 precision macro 0.989 0.384
3 spec macro 0.982 0.711
4 accuracy multiclass 0.979 0.625
5 f_meas macro 0.977 0.354
6 mcc multiclass 0.963 0.192
7 kap multiclass 0.962 0.143
8 roc_auc hand_till 1.00 0.651
9 mn_log_loss multiclass 0.384 0.871
# A tibble: 20 × 9
cost_complexity tree_depth min_n .metric .estimator mean n std_err
<dbl> <int> <int> <chr> <chr> <dbl> <int> <dbl>
1 2.76e- 3 7 16 roc_auc hand_till 0.607 10 0.0313
2 1.62e- 9 7 16 roc_auc hand_till 0.607 10 0.0313
3 1.67e- 6 7 16 roc_auc hand_till 0.607 10 0.0313
4 4.49e-10 7 16 roc_auc hand_till 0.607 10 0.0313
5 1.90e- 8 7 16 roc_auc hand_till 0.607 10 0.0313
6 7.98e-10 7 16 roc_auc hand_till 0.607 10 0.0313
7 1.17e- 9 7 16 roc_auc hand_till 0.607 10 0.0313
8 3.87e- 3 7 16 roc_auc hand_till 0.607 10 0.0313
9 1.09e- 8 7 14 roc_auc hand_till 0.605 10 0.0313
10 2.37e- 8 7 15 roc_auc hand_till 0.605 10 0.0313
11 1.48e- 3 7 15 roc_auc hand_till 0.605 10 0.0313
12 3.05e- 7 7 14 roc_auc hand_till 0.605 10 0.0313
13 1.46e- 4 7 14 roc_auc hand_till 0.605 10 0.0313
14 8.42e- 9 7 14 roc_auc hand_till 0.605 10 0.0313
15 1.67e-10 7 15 roc_auc hand_till 0.605 10 0.0313
16 3.75e- 9 7 15 roc_auc hand_till 0.605 10 0.0313
17 4.00e- 5 7 14 roc_auc hand_till 0.605 10 0.0313
18 3.92e-10 7 14 roc_auc hand_till 0.605 10 0.0313
19 2.75e- 9 7 14 roc_auc hand_till 0.605 10 0.0313
20 3.69e- 4 7 14 roc_auc hand_till 0.605 10 0.0313
# ℹ 1 more variable: .config <chr>
# A tibble: 9 × 4
.metric .estimator trening test
<chr> <chr> <dbl> <dbl>
1 sens macro 0.585 0.454
2 precision macro 0.633 0.553
3 spec macro 0.804 0.737
4 accuracy multiclass 0.699 0.604
5 f_meas macro 0.600 0.474
6 mcc multiclass 0.432 0.224
7 kap multiclass 0.425 0.216
8 roc_auc hand_till 0.785 0.595
9 mn_log_loss multiclass 0.714 3.80
n= 143
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 143 57 3 (0.20979021 0.18881119 0.60139860)
2) PROJEKT1< 12.5 7 1 1 (0.85714286 0.00000000 0.14285714) *
3) PROJEKT1>=12.5 136 51 3 (0.17647059 0.19852941 0.62500000)
6) KOL1< 5.75 7 2 2 (0.00000000 0.71428571 0.28571429) *
7) KOL1>=5.75 129 46 3 (0.18604651 0.17054264 0.64341085)
14) PROJEKT2< 13.5 12 7 2 (0.33333333 0.41666667 0.25000000) *
15) PROJEKT2>=13.5 117 37 3 (0.17094017 0.14529915 0.68376068)
30) AKTIVNOST< 6.5 25 13 3 (0.28000000 0.24000000 0.48000000)
60) KOL1>=10.75 13 7 1 (0.46153846 0.23076923 0.30769231) *
61) KOL1< 10.75 12 4 3 (0.08333333 0.25000000 0.66666667) *
31) AKTIVNOST>=6.5 92 24 3 (0.14130435 0.11956522 0.73913043)
62) PROJEKT2< 17.75 58 20 3 (0.22413793 0.12068966 0.65517241)
124) PROJEKT2>=17.25 5 2 2 (0.20000000 0.60000000 0.20000000) *
125) PROJEKT2< 17.25 53 16 3 (0.22641509 0.07547170 0.69811321) *
63) PROJEKT2>=17.75 34 4 3 (0.00000000 0.11764706 0.88235294) *
Call:
rpart::rpart(formula = ..y ~ ., data = data, cp = ~0.00276415414014326,
maxdepth = ~7, minsplit = min_rows(16, data))
n= 143
CP nsplit rel error xerror xstd
1 0.087719298 0 1.0000000 1.000000 0.1027173
2 0.052631579 1 0.9122807 1.000000 0.1027173
3 0.035087719 2 0.8596491 1.035088 0.1032816
4 0.017543860 3 0.8245614 1.157895 0.1045862
5 0.002764154 7 0.7543860 1.175439 0.1046891
Variable importance
PROJEKT2 PROJEKT1 KOL1 AKTIVNOST TEST KOL2
35 27 20 10 6 2
Node number 1: 143 observations, complexity param=0.0877193
predicted class=3 expected loss=0.3986014 P(node) =1
class counts: 30 27 86
probabilities: 0.210 0.189 0.601
left son=2 (7 obs) right son=3 (136 obs)
Primary splits:
PROJEKT1 < 12.5 to the left, improve=4.894414, (0 missing)
PROJEKT2 < 13.5 to the left, improve=4.112202, (0 missing)
KOL1 < 5.75 to the left, improve=3.089793, (0 missing)
KOL2 < 10.75 to the left, improve=2.556245, (0 missing)
TEST < 2.7 to the left, improve=2.541816, (0 missing)
Node number 2: 7 observations
predicted class=1 expected loss=0.1428571 P(node) =0.04895105
class counts: 6 0 1
probabilities: 0.857 0.000 0.143
Node number 3: 136 observations, complexity param=0.05263158
predicted class=3 expected loss=0.375 P(node) =0.951049
class counts: 24 27 85
probabilities: 0.176 0.199 0.625
left son=6 (7 obs) right son=7 (129 obs)
Primary splits:
KOL1 < 5.75 to the left, improve=3.042424, (0 missing)
PROJEKT2 < 13.5 to the left, improve=2.800917, (0 missing)
KOL2 < 10.75 to the left, improve=2.484342, (0 missing)
AKTIVNOST < 6.5 to the left, improve=2.049253, (0 missing)
PROJEKT1 < 13.25 to the left, improve=1.766807, (0 missing)
Node number 6: 7 observations
predicted class=2 expected loss=0.2857143 P(node) =0.04895105
class counts: 0 5 2
probabilities: 0.000 0.714 0.286
Node number 7: 129 observations, complexity param=0.03508772
predicted class=3 expected loss=0.3565891 P(node) =0.9020979
class counts: 24 22 83
probabilities: 0.186 0.171 0.643
left son=14 (12 obs) right son=15 (117 obs)
Primary splits:
PROJEKT2 < 13.5 to the left, improve=3.136255, (0 missing)
KOL2 < 10.75 to the left, improve=2.474065, (0 missing)
AKTIVNOST < 6.5 to the left, improve=2.035618, (0 missing)
KOL1 < 12.75 to the left, improve=1.974876, (0 missing)
TEST < 2.7 to the left, improve=1.918555, (0 missing)
Surrogate splits:
TEST < 2.47 to the left, agree=0.922, adj=0.167, (0 split)
AKTIVNOST < 2.5 to the left, agree=0.915, adj=0.083, (0 split)
Node number 14: 12 observations
predicted class=2 expected loss=0.5833333 P(node) =0.08391608
class counts: 4 5 3
probabilities: 0.333 0.417 0.250
Node number 15: 117 observations, complexity param=0.01754386
predicted class=3 expected loss=0.3162393 P(node) =0.8181818
class counts: 20 17 80
probabilities: 0.171 0.145 0.684
left son=30 (25 obs) right son=31 (92 obs)
Primary splits:
AKTIVNOST < 6.5 to the left, improve=1.983300, (0 missing)
KOL2 < 10.75 to the left, improve=1.876884, (0 missing)
KOL1 < 12.75 to the left, improve=1.504570, (0 missing)
PROJEKT2 < 17.75 to the left, improve=1.475624, (0 missing)
TEST < 3.975 to the left, improve=1.161803, (0 missing)
Node number 30: 25 observations, complexity param=0.01754386
predicted class=3 expected loss=0.52 P(node) =0.1748252
class counts: 7 6 12
probabilities: 0.280 0.240 0.480
left son=60 (13 obs) right son=61 (12 obs)
Primary splits:
KOL1 < 10.75 to the right, improve=1.698974, (0 missing)
KOL2 < 9.25 to the left, improve=1.411429, (0 missing)
TEST < 5.315 to the right, improve=1.225965, (0 missing)
PROJEKT2 < 18.25 to the right, improve=1.040000, (0 missing)
PROJEKT1 < 15.25 to the right, improve=0.840000, (0 missing)
Surrogate splits:
TEST < 4.745 to the right, agree=0.72, adj=0.417, (0 split)
PROJEKT2 < 18.25 to the right, agree=0.68, adj=0.333, (0 split)
KOL2 < 12.25 to the left, agree=0.64, adj=0.250, (0 split)
PROJEKT1 < 16.5 to the right, agree=0.64, adj=0.250, (0 split)
Node number 31: 92 observations, complexity param=0.01754386
predicted class=3 expected loss=0.2608696 P(node) =0.6433566
class counts: 13 11 68
probabilities: 0.141 0.120 0.739
left son=62 (58 obs) right son=63 (34 obs)
Primary splits:
PROJEKT2 < 17.75 to the left, improve=2.1833050, (0 missing)
KOL2 < 6.75 to the left, improve=1.8369570, (0 missing)
KOL1 < 12.75 to the left, improve=1.2783150, (0 missing)
PROJEKT1 < 16.25 to the left, improve=1.2344940, (0 missing)
TEST < 3.265 to the left, improve=0.9466204, (0 missing)
Surrogate splits:
PROJEKT1 < 16.75 to the left, agree=0.793, adj=0.441, (0 split)
TEST < 6.005 to the left, agree=0.663, adj=0.088, (0 split)
KOL2 < 13.25 to the left, agree=0.652, adj=0.059, (0 split)
KOL1 < 14.25 to the left, agree=0.641, adj=0.029, (0 split)
AKTIVNOST < 9.5 to the left, agree=0.641, adj=0.029, (0 split)
Node number 60: 13 observations
predicted class=1 expected loss=0.5384615 P(node) =0.09090909
class counts: 6 3 4
probabilities: 0.462 0.231 0.308
Node number 61: 12 observations
predicted class=3 expected loss=0.3333333 P(node) =0.08391608
class counts: 1 3 8
probabilities: 0.083 0.250 0.667
Node number 62: 58 observations, complexity param=0.01754386
predicted class=3 expected loss=0.3448276 P(node) =0.4055944
class counts: 13 7 38
probabilities: 0.224 0.121 0.655
left son=124 (5 obs) right son=125 (53 obs)
Primary splits:
PROJEKT2 < 17.25 to the right, improve=2.393884, (0 missing)
KOL2 < 10.25 to the left, improve=1.932641, (0 missing)
KOL1 < 12.75 to the left, improve=1.462475, (0 missing)
TEST < 3.265 to the left, improve=1.208979, (0 missing)
PROJEKT1 < 13.75 to the left, improve=0.709907, (0 missing)
Node number 63: 34 observations
predicted class=3 expected loss=0.1176471 P(node) =0.2377622
class counts: 0 4 30
probabilities: 0.000 0.118 0.882
Node number 124: 5 observations
predicted class=2 expected loss=0.4 P(node) =0.03496503
class counts: 1 3 1
probabilities: 0.200 0.600 0.200
Node number 125: 53 observations
predicted class=3 expected loss=0.3018868 P(node) =0.3706294
class counts: 12 4 37
probabilities: 0.226 0.075 0.698
count ncat improve index adj
PROJEKT1 143 -1 4.8944144 12.500 0.00000000
PROJEKT2 143 -1 4.1122015 13.500 0.00000000
KOL1 143 -1 3.0897926 5.750 0.00000000
KOL2 143 -1 2.5562446 10.750 0.00000000
TEST 143 -1 2.5418156 2.700 0.00000000
KOL1 136 -1 3.0424239 5.750 0.00000000
PROJEKT2 136 -1 2.8009171 13.500 0.00000000
KOL2 136 -1 2.4843424 10.750 0.00000000
AKTIVNOST 136 -1 2.0492530 6.500 0.00000000
PROJEKT1 136 -1 1.7668067 13.250 0.00000000
PROJEKT2 129 -1 3.1362552 13.500 0.00000000
KOL2 129 -1 2.4740655 10.750 0.00000000
AKTIVNOST 129 -1 2.0356184 6.500 0.00000000
KOL1 129 -1 1.9748760 12.750 0.00000000
TEST 129 -1 1.9185546 2.700 0.00000000
TEST 0 -1 0.9224806 2.470 0.16666667
AKTIVNOST 0 -1 0.9147287 2.500 0.08333333
AKTIVNOST 117 -1 1.9832999 6.500 0.00000000
KOL2 117 -1 1.8768840 10.750 0.00000000
KOL1 117 -1 1.5045699 12.750 0.00000000
PROJEKT2 117 -1 1.4756241 17.750 0.00000000
TEST 117 -1 1.1618028 3.975 0.00000000
KOL1 25 1 1.6989744 10.750 0.00000000
KOL2 25 -1 1.4114286 9.250 0.00000000
TEST 25 1 1.2259649 5.315 0.00000000
PROJEKT2 25 1 1.0400000 18.250 0.00000000
PROJEKT1 25 1 0.8400000 15.250 0.00000000
TEST 0 1 0.7200000 4.745 0.41666667
PROJEKT2 0 1 0.6800000 18.250 0.33333333
KOL2 0 -1 0.6400000 12.250 0.25000000
PROJEKT1 0 1 0.6400000 16.500 0.25000000
PROJEKT2 92 -1 2.1833054 17.750 0.00000000
KOL2 92 -1 1.8369565 6.750 0.00000000
KOL1 92 -1 1.2783145 12.750 0.00000000
PROJEKT1 92 -1 1.2344936 16.250 0.00000000
TEST 92 -1 0.9466204 3.265 0.00000000
PROJEKT1 0 -1 0.7934783 16.750 0.44117647
TEST 0 -1 0.6630435 6.005 0.08823529
KOL2 0 -1 0.6521739 13.250 0.05882353
KOL1 0 -1 0.6413043 14.250 0.02941176
AKTIVNOST 0 -1 0.6413043 9.500 0.02941176
PROJEKT2 58 1 2.3938842 17.250 0.00000000
KOL2 58 -1 1.9326412 10.250 0.00000000
KOL1 58 -1 1.4624746 12.750 0.00000000
TEST 58 -1 1.2089785 3.265 0.00000000
PROJEKT1 58 -1 0.7099070 13.750 0.00000000
---
title: "UIU - projekt2023"
output:
flexdashboard::flex_dashboard:
social: menu
orientation: columns
vertical_layout: fill
source_code: embed
---
```{css, echo=FALSE}
.sidebar { overflow: auto; }
.dataTables_scrollBody {
height:95% !important;
max-height:95% !important;
}
.chart-stage-flex {
overflow:auto !important;
}
```
```{r setup, include=FALSE}
library(psych)
library(tidyverse)
library(readxl)
library(tidymodels)
library(vip)
library(knitr)
library(kableExtra)
library(Boruta)
library(ggbump)
library(ggsankey)
library(ggridges)
library(corrplot)
library(rpart.plot)
library(DALEXtra)
rf_metrike <- metric_set(roc_auc, sens, precision, spec, accuracy, f_meas, mcc, kap, mn_log_loss)
update_geom_defaults(geom = "tile", new = list(color = "black"))
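# Saved tuning results, fitted workflows, Boruta runs and decision trees for the three
# response variables (PROJEKT1 = "Zadavanje", PROJEKT2 = "Rjesavanje", Procjena)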
Zadavanje_fit <- readRDS("modeli/RF_fit_PROJEKT1.rds")
Zadavanje_tuning <- readRDS("modeli/RF_tuning_PROJEKT1.rds")
Zadavanje_work <- readRDS("modeli/RF_work_PROJEKT1.rds")
Zadavanje_permfit <- readRDS("modeli/RFperm_fit_PROJEKT1.rds")
Zadavanje_Boruta <- readRDS("modeli/Boruta_PROJEKT1.rds")
Zadavanje_fit_tree <- readRDS("modeli/stablo_fit_PROJEKT1.rds")
Zadavanje_tuning_tree <- readRDS("modeli/stablo_tuning_PROJEKT1.rds")
Zadavanje_work_tree <- readRDS("modeli/stablo_work_PROJEKT1.rds")
Rjesavanje_fit <- readRDS("modeli/RF_fit_PROJEKT2.rds")
Rjesavanje_tuning <- readRDS("modeli/RF_tuning_PROJEKT2.rds")
Rjesavanje_work <- readRDS("modeli/RF_work_PROJEKT2.rds")
Rjesavanje_permfit <- readRDS("modeli/RFperm_fit_PROJEKT2.rds")
Rjesavanje_Boruta <- readRDS("modeli/Boruta_PROJEKT2.rds")
Rjesavanje_fit_tree <- readRDS("modeli/stablo_fit_PROJEKT2.rds")
Rjesavanje_tuning_tree <- readRDS("modeli/stablo_tuning_PROJEKT2.rds")
Rjesavanje_work_tree <- readRDS("modeli/stablo_work_PROJEKT2.rds")
Procjena_fit <- readRDS("modeli/RF_fit_Procjena.rds")
Procjena_tuning <- readRDS("modeli/RF_tuning_Procjena.rds")
Procjena_work <- readRDS("modeli/RF_work_Procjena.rds")
Procjena_permfit <- readRDS("modeli/RFperm_fit_Procjena.rds")
Procjena_Boruta <- readRDS("modeli/Boruta_procjena.rds")
Procjena_fit_tree <- readRDS("modeli/stablo_fit_Procjena.rds")
Procjena_tuning_tree <- readRDS("modeli/stablo_tuning_Procjena.rds")
Procjena_work_tree <- readRDS("modeli/stablo_work_Procjena.rds")
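# Raw scores and derived data sets: subset for the correlation plots, 0/33/67/100 % quantiles,
# three-level coding for the Sankey diagram and long format for the facetted histograms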
podaci <- read_excel("podaci/UIU_projekt2023.xlsx")
podaci_kor <- podaci%>% select(KOL1:UKUPNO)
testRes <- cor.mtest(podaci_kor, conf.level = 0.975)
qvant <- podaci %>% reframe(across(KOL1:UKUPNO, ~quantile(., probs = c(0,0.33,0.67,1)))) %>%
mutate(prob = c('0%', '33%', '67%', '100%')) %>% relocate(prob)
podaci_sankey <- podaci %>% replace(is.na(.), 0) %>% select(KOL1,KOL2, PROJEKT1:Procjena) %>%
mutate(KOL1 = case_when(KOL1 <= pull(qvant[2, "KOL1"]) ~ "1",
KOL1 <= pull(qvant[3, "KOL1"]) ~ "2",
.default = "3"),
KOL2 = case_when(KOL2 <= pull(qvant[2, "KOL2"]) ~ "1",
KOL2 <= pull(qvant[3, "KOL2"]) ~ "2",
.default = "3"),
PROJEKT1 = case_when(PROJEKT1 <= pull(qvant[2, "PROJEKT1"]) ~ "1",
PROJEKT1 <= pull(qvant[3, "PROJEKT1"]) ~ "2",
.default = "3"),
PROJEKT2 = case_when(PROJEKT2 <= pull(qvant[2, "PROJEKT2"]) ~ "1",
PROJEKT2 <= pull(qvant[3, "PROJEKT2"]) ~ "2",
.default = "3"),
Procjena = case_when(Procjena < pull(qvant[2, "Procjena"]) ~ "1",
Procjena < pull(qvant[3, "Procjena"]) ~ "2",
.default = "3")) %>%
mutate_at(vars(KOL1:Procjena), ~fct_relevel(., c("1","2","3"))) %>%
make_long(KOL1, PROJEKT1, KOL2, PROJEKT2, Procjena)
postoci <- podaci_sankey %>% group_by(x, node) %>% summarize(n = n()) %>%
ungroup(node) %>% mutate(pct2 = n / sum(n) * 100, pct = round(pct2))
podaci_sankey <- podaci_sankey %>% left_join(postoci, by = c("x","node"))
podaci_long <- podaci_kor %>%
pivot_longer(everything(), names_to = "varijabla", values_to = "vrijednost")
UIU_blank <- data.frame(
varijabla = factor(rep(c("AKTIVNOST", "KOL1", "KOL2", "Procjena",
"PROJEKT1", "PROJEKT2", "TEST", "UKUPNO"), each = 2)),
x = c(0,10,0,20,0,20,0,3,0,20,0,20,0,10,0,100),
y = 0
)
trening <- readRDS("modeli/RF_work_Procjena.rds")
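# DALEX explainers, model profiles (PDP / conditional / ALE) and local explanations
# (break-down and SHAP) for each of the three models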
zadavanje_explainer <- explain_tidymodels(
Zadavanje_fit %>% extract_workflow(),
data = Zadavanje_work %>% select(KOL1:AKTIVNOST),
y = Zadavanje_work %>% select(PROJEKT1),
verbose = FALSE
)
pdp_zadavanje_all <- model_profile(
zadavanje_explainer,
variables = NULL,
N = NULL
)
cond_zadavanje_all <- model_profile(
zadavanje_explainer,
variables = NULL,
N = NULL,
type = "conditional"
)
acc_zadavanje_all <- model_profile(
zadavanje_explainer,
variables = NULL,
N = NULL,
type = "accumulated"
)
ob2_zadavanje <- predict_parts(
explainer = zadavanje_explainer,
new_observation = Zadavanje_work %>% select(KOL1:AKTIVNOST) %>% slice(2)
)
ob99_zadavanje <- predict_parts(
explainer = zadavanje_explainer,
new_observation = Zadavanje_work %>% select(KOL1:AKTIVNOST) %>% slice(99)
)
ob88_zadavanje <- predict_parts(
explainer = zadavanje_explainer,
new_observation = Zadavanje_work %>% select(KOL1:AKTIVNOST) %>% slice(88)
)
ob2_zadavanje_shap <- predict_parts(
explainer = zadavanje_explainer,
new_observation = Zadavanje_work %>% select(KOL1:AKTIVNOST) %>% slice(2),
type = "shap",
B = 20
)
ob99_zadavanje_shap <- predict_parts(
explainer = zadavanje_explainer,
new_observation = Zadavanje_work %>% select(KOL1:AKTIVNOST) %>% slice(99),
type = "shap",
B = 20
)
ob88_zadavanje_shap <- predict_parts(
explainer = zadavanje_explainer,
new_observation = Zadavanje_work %>% select(KOL1:AKTIVNOST) %>% slice(88),
type = "shap",
B = 20
)
krivulje_zadavanje <- as_tibble(pdp_zadavanje_all$agr_profiles) %>% mutate(vrsta = "PDP") %>%
bind_rows(as_tibble(cond_zadavanje_all$agr_profiles) %>% mutate(vrsta = "M")) %>%
bind_rows(as_tibble(acc_zadavanje_all$agr_profiles) %>% mutate(vrsta = "ALE"))
rjesavanje_explainer <- explain_tidymodels(
Rjesavanje_fit %>% extract_workflow(),
data = Rjesavanje_work %>% select(KOL1:PROJEKT1),
y = Rjesavanje_work %>% select(PROJEKT2),
verbose = FALSE
)
pdp_rjesavanje_all <- model_profile(
rjesavanje_explainer,
variables = NULL,
N = NULL
)
cond_rjesavanje_all <- model_profile(
rjesavanje_explainer,
variables = NULL,
N = NULL,
type = "conditional"
)
acc_rjesavanje_all <- model_profile(
rjesavanje_explainer,
variables = NULL,
N = NULL,
type = "accumulated"
)
ob2_rjesavanje <- predict_parts(
explainer = rjesavanje_explainer,
new_observation = Rjesavanje_work %>% select(KOL1:PROJEKT1) %>% slice(2)
)
ob99_rjesavanje <- predict_parts(
explainer = rjesavanje_explainer,
new_observation = Rjesavanje_work %>% select(KOL1:PROJEKT1) %>% slice(99)
)
ob88_rjesavanje <- predict_parts(
explainer = rjesavanje_explainer,
new_observation = Rjesavanje_work %>% select(KOL1:PROJEKT1) %>% slice(88)
)
ob2_rjesavanje_shap <- predict_parts(
explainer = rjesavanje_explainer,
new_observation = Rjesavanje_work %>% select(KOL1:PROJEKT1) %>% slice(2),
type = "shap",
B = 20
)
ob99_rjesavanje_shap <- predict_parts(
explainer = rjesavanje_explainer,
new_observation = Rjesavanje_work %>% select(KOL1:PROJEKT1) %>% slice(99),
type = "shap",
B = 20
)
ob88_rjesavanje_shap <- predict_parts(
explainer = rjesavanje_explainer,
new_observation = Rjesavanje_work %>% select(KOL1:PROJEKT1) %>% slice(88),
type = "shap",
B = 20
)
krivulje_rjesavanje <- as_tibble(pdp_rjesavanje_all$agr_profiles) %>% mutate(vrsta = "PDP") %>%
bind_rows(as_tibble(cond_rjesavanje_all$agr_profiles) %>% mutate(vrsta = "M")) %>%
bind_rows(as_tibble(acc_rjesavanje_all$agr_profiles) %>% mutate(vrsta = "ALE"))
procjena_explainer <- explain_tidymodels(
Procjena_fit %>% extract_workflow(),
data = Procjena_work %>% select(KOL1:PROJEKT2),
y = Procjena_work %>% select(Procjena),
verbose = FALSE
)
pdp_procjena_all <- model_profile(
procjena_explainer,
variables = NULL,
N = NULL
)
cond_procjena_all <- model_profile(
procjena_explainer,
variables = NULL,
N = NULL,
type = "conditional"
)
acc_procjena_all <- model_profile(
procjena_explainer,
variables = NULL,
N = NULL,
type = "accumulated"
)
ob2_procjena <- predict_parts(
explainer = procjena_explainer,
new_observation = Procjena_work %>% select(KOL1:PROJEKT2) %>% slice(2)
)
ob45_procjena <- predict_parts(
explainer = procjena_explainer,
new_observation = Procjena_work %>% select(KOL1:PROJEKT2) %>% slice(45)
)
ob88_procjena <- predict_parts(
explainer = procjena_explainer,
new_observation = Procjena_work %>% select(KOL1:PROJEKT2) %>% slice(88)
)
ob2_procjena_shap <- predict_parts(
explainer = procjena_explainer,
new_observation = Procjena_work %>% select(KOL1:PROJEKT2) %>% slice(2),
type = "shap",
B = 20
)
ob45_procjena_shap <- predict_parts(
explainer = procjena_explainer,
new_observation = Procjena_work %>% select(KOL1:PROJEKT2) %>% slice(45),
type = "shap",
B = 20
)
ob88_procjena_shap <- predict_parts(
explainer = procjena_explainer,
new_observation = Procjena_work %>% select(KOL1:PROJEKT2) %>% slice(88),
type = "shap",
B = 20
)
krivulje_procjena <- as_tibble(pdp_procjena_all$agr_profiles) %>% mutate(vrsta = "PDP") %>%
bind_rows(as_tibble(cond_procjena_all$agr_profiles) %>% mutate(vrsta = "M")) %>%
bind_rows(as_tibble(acc_procjena_all$agr_profiles) %>% mutate(vrsta = "ALE"))
```
# Variable distributions {data-navmenu="DESKRIPTIVA"}
## Column 1
### Variable distributions
```{r warning=FALSE, fig.width=18, fig.height=9}
ggplot(podaci_long, aes(x = vrijednost)) +
geom_histogram(aes(y=after_stat(density)),
color="#ec8ae5", fill="#48d09b", alpha=0.7) +
geom_density(alpha=.2, fill="yellow") + geom_blank(data = UIU_blank, aes(x=x,y=y)) +
scale_x_continuous(name = "bodovi") +
facet_wrap(vars(varijabla), scales = "free", ncol = 4) +
geom_rug(alpha = 0.3)
```
# Correlations {data-navmenu="DESKRIPTIVA"}
## Column 1
### Correlations
```{r warning=FALSE, fig.width=10, fig.height=7}
corrplot(round(cor(podaci), 2), type = "lower",
diag=FALSE, addCoef.col = 'black', tl.srt = 45)
```
## Column 2
### Correlation p-values
```{r warning=FALSE, fig.width=10, fig.height=7}
corrplot(round(cor(podaci), 2), type = "lower",
diag=FALSE, tl.srt = 45, p.mat = testRes$p, insig = 'p-value', sig.level = -1)
```
# Sankey diagram {data-navmenu="DESKRIPTIVA"}
## Column 1
### Sankey diagram
```{r warning=FALSE, fig.width=10, fig.height=7}
ggplot(podaci_sankey, aes(x = x, next_x = next_x, node = node, next_node = next_node,
fill = factor(node),
label = paste0(node, ' (', pct, '%)'))) +
geom_sankey(flow.alpha = 0.5, node.color = "black", show.legend = FALSE) +
geom_sankey_label(size = 3, color = "black", fill= "white", hjust = -0.35) +
theme_bw() + theme_sankey(base_size = 16) +
theme(axis.title = element_blank(), axis.text.y = element_blank(),
axis.ticks = element_blank(), panel.grid = element_blank())
```
# Cronbach alpha {data-navmenu="DESKRIPTIVA"}
## Column 1
### Cronbach alpha
```{r}
psych::alpha(podaci, check.keys = TRUE)
```
# Model description {data-navmenu="PROJEKT1"}
## Column 1
### PROJEKT1 (random forest)
**Predictors**

- TEST - sum of all test points
- AKTIVNOST - activity points
- KOL1 - total points on the first midterm exam
- KOL2 - total points on the second midterm exam

**Response** - classification of students based on the number of points earned on project 1 (the class construction is sketched below)

- **1** - if the total problem-posing score is within the interval [0%, 33%]
- **2** - if the total problem-posing score is within the interval (33%, 67%]
- **3** - if the total problem-posing score is within the interval (67%, 100%]
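
The class boundaries correspond to the 33% and 67% quantiles of PROJEKT1 shown in the tertile table in the next column. A minimal sketch of how such a class variable could be derived; the helper name is illustrative and not the document's original code:

```{r, eval=FALSE, echo=TRUE}
# Illustrative helper (assumed, not part of the original pipeline): bin a score
# vector into classes 1/2/3 at its 33% and 67% quantiles, with intervals closed
# on the right, i.e. [0%, 33%], (33%, 67%], (67%, 100%].
u_klase <- function(x) {
  q <- quantile(x, probs = c(0.33, 0.67), na.rm = TRUE)
  cut(x, breaks = c(-Inf, q, Inf), labels = c("1", "2", "3"), right = TRUE)
}

# e.g. podaci$PROJEKT1_klasa <- u_klase(podaci$PROJEKT1)
```
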
**Hyperparameters** - to select the optimal combination, 10-fold cross-validation was
performed, randomly trying 500 combinations of the hyperparameters *mtry*, *min_n* and
*trees*. The best model was selected according to the *roc_auc* metric (a tidymodels
sketch of this setup follows the list below).

- *mtry* - always equal to 2
- *trees* - natural numbers between 400 and 1500.
- *min_n* - natural numbers between 2 and 40.
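
A minimal tidymodels sketch of the tuning described above. Object names such as `zadavanje_wf`, `Zadavanje_train` and `Zadavanje_folds` are illustrative stand-ins for objects created elsewhere in the document, and the `mtry` range is an assumption (the text only notes that *mtry* always ended up at 2):

```{r, eval=FALSE, echo=TRUE}
library(tidymodels)

# Random-forest specification with the three tuned hyperparameters
rf_spec <- rand_forest(mtry = tune(), trees = tune(), min_n = tune()) %>%
  set_engine("ranger") %>%
  set_mode("classification")

zadavanje_wf <- workflow() %>%
  add_model(rf_spec) %>%
  add_formula(PROJEKT1 ~ KOL1 + KOL2 + TEST + AKTIVNOST)

# 500 randomly drawn combinations over the ranges listed above
rf_grid <- grid_random(
  mtry(range = c(2L, 4L)),          # assumed range; the text reports mtry = 2 throughout
  trees(range = c(400L, 1500L)),
  min_n(range = c(2L, 40L)),
  size = 500
)

set.seed(1)
Zadavanje_folds <- vfold_cv(Zadavanje_train, v = 10, strata = PROJEKT1)

rf_tuning <- tune_grid(
  zadavanje_wf,
  resamples = Zadavanje_folds,
  grid = rf_grid,
  metrics = metric_set(roc_auc)
)

show_best(rf_tuning, metric = "roc_auc")
```
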
### PROJEKT1 (decision tree)
**Predictors**

- TEST - sum of all test points
- AKTIVNOST - activity points
- KOL1 - total points on the first midterm exam
- KOL2 - total points on the second midterm exam

**Response** - classification of students based on the number of points earned on project 1

- **1** - if the total problem-posing score is within the interval [0%, 33%]
- **2** - if the total problem-posing score is within the interval (33%, 67%]
- **3** - if the total problem-posing score is within the interval (67%, 100%]

**Hyperparameters** - to select the optimal combination, 10-fold cross-validation was
performed, trying 5000 combinations of the hyperparameters *tree_depth*, *min_n* and
*cost_complexity* drawn by Latin hypercube sampling. The best model was selected
according to the *roc_auc* metric (sketched after the list below).

- *tree_depth* - natural numbers between 1 and 15.
- *cost_complexity* - real numbers between 1e-10 and 0.1.
- *min_n* - natural numbers between 2 and 40.
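
A matching sketch for the Latin-hypercube search described above, reusing the illustrative fold object from the random-forest sketch:

```{r, eval=FALSE, echo=TRUE}
library(tidymodels)

tree_spec <- decision_tree(
  tree_depth = tune(), min_n = tune(), cost_complexity = tune()
) %>%
  set_engine("rpart") %>%
  set_mode("classification")

zadavanje_tree_wf <- workflow() %>%
  add_model(tree_spec) %>%
  add_formula(PROJEKT1 ~ KOL1 + KOL2 + TEST + AKTIVNOST)

# 5000 candidates drawn by Latin hypercube sampling over the listed ranges;
# cost_complexity() is parameterised on the log10 scale, so -10..-1 covers 1e-10..0.1
tree_grid <- grid_latin_hypercube(
  tree_depth(range = c(1L, 15L)),
  min_n(range = c(2L, 40L)),
  cost_complexity(range = c(-10, -1)),
  size = 5000
)

tree_tuning <- tune_grid(
  zadavanje_tree_wf,
  resamples = Zadavanje_folds,   # the same 10 folds as in the random-forest sketch
  grid = tree_grid,
  metrics = metric_set(roc_auc)
)
```
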
## Column 2
### Tertiles (3-quantiles) used to create the classes
```{r}
qvant %>% kbl() %>%
kable_paper("hover", full_width = F)
```
### summary
```{r}
summary(podaci) %>% kbl() %>%
kable_paper("hover", full_width = F)
```
# Hyperparameters - random forest {data-navmenu="PROJEKT1"}
## Column 1
### Tested hyperparameters
```{r}
Zadavanje_tuning %>%
collect_metrics() %>%
filter(.metric == "roc_auc", trees > 0) %>%
pivot_longer(cols = trees:min_n) %>%
mutate(best_mod = mean == max(mean)) %>%
ggplot(aes(x = value, y = mean)) +
#geom_line(alpha = 0.5, size = 1.5) +
geom_point(aes(color = best_mod), size = 0.3) +
facet_wrap(~name, scales = "free_x") +
scale_x_continuous() +
labs(y = "roc auc", x = "", color = "Best Model")
```
## Column 2
### Top 20 models by roc_auc
```{r}
print(Zadavanje_tuning %>% show_best(metric = 'roc_auc', n = 20), n = 20)
```
# Performance - random forest {data-navmenu="PROJEKT1"}
## Column 1
### Metrics on the test and training sets {data-height=200}
```{r}
tab1 <- Zadavanje_fit %>% collect_predictions() %>%
rf_metrike(truth = PROJEKT1, estimate = .pred_class, .pred_1:.pred_3)
tab2 <- Zadavanje_work %>%
rf_metrike(truth = PROJEKT1, estimate = .pred_class, .pred_1:.pred_3)
tab1 %>% inner_join(tab2, by = ".metric") %>%
select(.metric, .estimator = .estimator.x, trening = .estimate.y, test = .estimate.x)
```
### Confusion matrix
```{r}
Zadavanje_fit %>% collect_predictions() %>%
conf_mat(truth = PROJEKT1, estimate = .pred_class) %>% autoplot("heatmap") +
scale_fill_gradient(low = "#87DEE7",
high = "#FFFFCC")
```
## Column 2
### ROC curve
```{r}
Zadavanje_fit %>% collect_predictions() %>% roc_curve(PROJEKT1, .pred_1:.pred_3) %>% autoplot()
```
### Gain curve
```{r}
Zadavanje_fit %>% collect_predictions() %>% gain_curve(PROJEKT1, .pred_1:.pred_3) %>% autoplot()
```
# Predictor importance - random forest {data-navmenu="PROJEKT1"}
## Column 1
### Gini
```{r}
Zadavanje_fit %>% extract_fit_parsnip() %>% vip()
```
### Permutation
```{r}
Zadavanje_permfit %>% extract_fit_parsnip() %>% vip()
```
## Column 2
### Boruta
```{r}
plot(Zadavanje_Boruta, cex.axis=.7, las=2, xlab="",
colCode = c("green", "orange", "#f6546a", "#2acaea"))
```
### Boruta (history)
```{r}
plotImpHistory(Zadavanje_Boruta, colCode = c("green", "orange", "#f6546a", "#2acaea"))
```
# PDP-M-ALE plots - random forest {data-navmenu="PROJEKT1"}
## Column {.tabset .tabset-fade}
### PDP plot
```{r fig.width=12}
plot(pdp_zadavanje_all, geom = "profiles")
```
### M plot
```{r fig.width=12}
plot(cond_zadavanje_all, geom = "profiles")
```
### ALE plot
```{r fig.width=12}
plot(acc_zadavanje_all, geom = "profiles")
```
### Combined
```{r fig.width=12}
oznake <- c("klasa 1", "klasa 2", "klasa 3")
names(oznake) <- c("workflow.1", "workflow.2", "workflow.3")
ggplot(krivulje_zadavanje, aes(x = `_x_`, y = `_yhat_`, color = vrsta)) +
geom_line() +
facet_grid(cols = vars(`_vname_`), rows = vars(`_label_`), scales = "free_x",
labeller = labeller(`_label_` = oznake)) +
xlab("bodovi") + ylab("predikcija vjerojatnosti")
```
# Break-down and SHAP - random forest {data-navmenu="PROJEKT1"}
## Column 1 {.tabset .tabset-fade}
### BD 2
```{r}
plot(ob2_zadavanje)
```
### BD 88
```{r}
plot(ob88_zadavanje)
```
### BD 99
```{r}
plot(ob99_zadavanje)
```
## Column 2 {.tabset .tabset-fade}
### SHAP 2
```{r}
plot(ob2_zadavanje_shap)
```
### SHAP 88
```{r}
plot(ob88_zadavanje_shap)
```
### SHAP 99
```{r}
plot(ob99_zadavanje_shap)
```
# Hyperparameters - decision tree {data-navmenu="PROJEKT1"}
## Column 1
### Tested hyperparameters
```{r}
Zadavanje_tuning_tree %>%
collect_metrics() %>%
filter(.metric == "roc_auc") %>%
pivot_longer(cols = cost_complexity:min_n) %>%
mutate(best_mod = mean == max(mean)) %>%
ggplot(aes(x = value, y = mean)) +
#geom_line(alpha = 0.5, size = 1.5) +
geom_point(aes(color = best_mod), size = 0.3) +
facet_wrap(~name, scales = "free_x") +
scale_x_continuous() +
labs(y = "roc auc", x = "", color = "Best Model")
```
## Column 2
### Top 20 models by roc_auc
```{r}
print(Zadavanje_tuning_tree %>% show_best(metric = 'roc_auc', n = 20), n = 20)
```
# Performance - decision tree {data-navmenu="PROJEKT1"}
## Column 1
### Metrics on the test and training sets {data-height=200}
```{r}
tab1 <- Zadavanje_fit_tree %>% collect_predictions() %>%
rf_metrike(truth = PROJEKT1, estimate = .pred_class, .pred_1:.pred_3)
tab2 <- Zadavanje_work_tree %>%
rf_metrike(truth = PROJEKT1, estimate = .pred_class, .pred_1:.pred_3)
tab1 %>% inner_join(tab2, by = ".metric") %>%
select(.metric, .estimator = .estimator.x, trening = .estimate.y, test = .estimate.x)
```
### Confusion matrix
```{r}
Zadavanje_fit_tree %>% collect_predictions() %>%
conf_mat(truth = PROJEKT1, estimate = .pred_class) %>% autoplot("heatmap") +
scale_fill_gradient(low = "#87DEE7",
high = "#FFFFCC")
```
## Column 2
### ROC curve
```{r}
Zadavanje_fit_tree %>% collect_predictions() %>% roc_curve(PROJEKT1, .pred_1:.pred_3) %>% autoplot()
```
### Gain curve
```{r}
Zadavanje_fit_tree %>% collect_predictions() %>% gain_curve(PROJEKT1, .pred_1:.pred_3) %>% autoplot()
```
# Predictor importance - decision tree {data-navmenu="PROJEKT1"}
## Column 1 {data-width=300}
### Predictor importance
```{r}
Zadavanje_fit_tree %>% extract_fit_parsnip() %>% vip()
```
## Column 2 {data-width=500 .tabset .tabset-fade}
### Tree plot
```{r fig.width=10}
Zadavanje_fit_tree %>%
extract_fit_engine() %>%
rpart.plot(roundint = FALSE, digits = 3)
```
### Tree (text)
```{r}
Zadavanje_fit_tree %>%
extract_fit_engine()
```
### Tree details
```{r}
info1 <- summary(Zadavanje_fit_tree %>%
extract_fit_engine())$splits
info1
```
# Model description {data-navmenu="PROJEKT2"}
## Column 1
### PROJEKT2 (random forest)
**Predictors**

- TEST - sum of all test points
- AKTIVNOST - activity points
- KOL1 - total points on the first midterm exam
- KOL2 - total points on the second midterm exam
- PROJEKT1 - total points on project 1

**Response** - classification of students based on the number of points earned on project 2

- **1** - if the total problem-solving score is within the interval [0%, 33%]
- **2** - if the total problem-solving score is within the interval (33%, 67%]
- **3** - if the total problem-solving score is within the interval (67%, 100%]

**Hyperparameters** - to select the optimal combination, 10-fold cross-validation was
performed, randomly trying 500 combinations of the hyperparameters *mtry*, *min_n* and
*trees*. The best model was selected according to the *roc_auc* metric (see the sketch
after the list below).

- *mtry* - values taken from the set {2, 3}
- *trees* - natural numbers between 400 and 1500.
- *min_n* - natural numbers between 2 and 40.
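
A minimal sketch of the final selection step referenced above. `rjesavanje_wf` and `Rjesavanje_split` are assumed names for the tuned workflow and the initial train/test split created elsewhere, while `Rjesavanje_tuning` is the tuning result shown on the hyperparameter page:

```{r, eval=FALSE, echo=TRUE}
library(tidymodels)

# Hyperparameter combination with the best cross-validated roc_auc
best_rf <- select_best(Rjesavanje_tuning, metric = "roc_auc")

# Fix those values in the workflow, refit on the full training set and
# evaluate once on the held-out test set
final_rf <- finalize_workflow(rjesavanje_wf, best_rf) %>%
  last_fit(Rjesavanje_split)

collect_metrics(final_rf)
collect_predictions(final_rf)
```
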
### PROJEKT2 (decision tree)
**Predictors**

- TEST - sum of all test points
- AKTIVNOST - activity points
- KOL1 - total points on the first midterm exam
- KOL2 - total points on the second midterm exam
- PROJEKT1 - total points on project 1

**Response** - classification of students based on the number of points earned on project 2

- **1** - if the total problem-solving score is within the interval [0%, 33%]
- **2** - if the total problem-solving score is within the interval (33%, 67%]
- **3** - if the total problem-solving score is within the interval (67%, 100%]

**Hyperparameters** - to select the optimal combination, 10-fold cross-validation was
performed, trying 5000 combinations of the hyperparameters *tree_depth*, *min_n* and
*cost_complexity* drawn by Latin hypercube sampling. The best model was selected
according to the *roc_auc* metric.

- *tree_depth* - natural numbers between 1 and 15.
- *cost_complexity* - real numbers between 1e-10 and 0.1.
- *min_n* - natural numbers between 2 and 40.
## Column 2
### Tertiles (3-quantiles) used to create the classes
```{r}
qvant %>% kbl() %>%
kable_paper("hover", full_width = F)
```
### summary
```{r}
summary(podaci) %>% kbl() %>%
kable_paper("hover", full_width = F)
```
# Hyperparameters - random forest {data-navmenu="PROJEKT2"}
## Column 1
### Tested hyperparameters
```{r}
Rjesavanje_tuning %>%
collect_metrics() %>%
filter(.metric == "roc_auc", trees > 0) %>%
pivot_longer(cols = mtry:min_n) %>%
mutate(best_mod = mean == max(mean)) %>%
ggplot(aes(x = value, y = mean)) +
#geom_line(alpha = 0.5, size = 1.5) +
geom_point(aes(color = best_mod), size = 0.3) +
facet_wrap(~name, scales = "free_x") +
scale_x_continuous() +
labs(y = "roc auc", x = "", color = "Best Model")
```
## Column 2
### Top 20 models by roc_auc
```{r}
print(Rjesavanje_tuning %>% show_best(metric = 'roc_auc', n = 20), n = 20)
```
# Performance - random forest {data-navmenu="PROJEKT2"}
## Column 1
### Metrics on the test and training sets {data-height=200}
```{r}
tab1 <- Rjesavanje_fit %>% collect_predictions() %>%
rf_metrike(truth = PROJEKT2, estimate = .pred_class, .pred_1:.pred_3)
tab2 <- Rjesavanje_work %>%
rf_metrike(truth = PROJEKT2, estimate = .pred_class, .pred_1:.pred_3)
tab1 %>% inner_join(tab2, by = ".metric") %>%
select(.metric, .estimator = .estimator.x, trening = .estimate.y, test = .estimate.x)
```
### Confusion matrix
```{r}
Rjesavanje_fit %>% collect_predictions() %>%
conf_mat(truth = PROJEKT2, estimate = .pred_class) %>% autoplot("heatmap") +
scale_fill_gradient(low = "#87DEE7",
high = "#FFFFCC")
```
## Column 2
### ROC curve
```{r}
Rjesavanje_fit %>% collect_predictions() %>% roc_curve(PROJEKT2, .pred_1:.pred_3) %>% autoplot()
```
### Gain curve
```{r}
Rjesavanje_fit %>% collect_predictions() %>% gain_curve(PROJEKT2, .pred_1:.pred_3) %>% autoplot()
```
# Predictor importance - random forest {data-navmenu="PROJEKT2"}
## Column 1
### Gini
```{r}
Rjesavanje_fit %>% extract_fit_parsnip() %>% vip()
```
### Permutation
```{r}
Rjesavanje_permfit %>% extract_fit_parsnip() %>% vip()
```
## Column 2
### Boruta
```{r}
plot(Rjesavanje_Boruta, cex.axis=.7, las=2, xlab="",
colCode = c("green", "orange", "#f6546a", "#2acaea"))
```
### Boruta (history)
```{r}
plotImpHistory(Rjesavanje_Boruta, colCode = c("green", "orange", "#f6546a", "#2acaea"))
```
# PDP-M-ALE plots - random forest {data-navmenu="PROJEKT2"}
## Column {.tabset .tabset-fade}
### PDP plot
```{r fig.width=12}
plot(pdp_rjesavanje_all, geom = "profiles")
```
### M plot
```{r fig.width=12}
plot(cond_rjesavanje_all, geom = "profiles")
```
### ALE plot
```{r fig.width=12}
plot(acc_rjesavanje_all, geom = "profiles")
```
### Combined
```{r fig.width=12}
oznake <- c("klasa 1", "klasa 2", "klasa 3")
names(oznake) <- c("workflow.1", "workflow.2", "workflow.3")
ggplot(krivulje_rjesavanje, aes(x = `_x_`, y = `_yhat_`, color = vrsta)) +
geom_line() +
facet_grid(cols = vars(`_vname_`), rows = vars(`_label_`), scales = "free_x",
labeller = labeller(`_label_` = oznake)) +
xlab("bodovi") + ylab("predikcija vjerojatnosti")
```
# Break-down and SHAP - random forest {data-navmenu="PROJEKT2"}
## Column 1 {.tabset .tabset-fade}
### BD 2
```{r}
plot(ob2_rjesavanje)
```
### BD 88
```{r}
plot(ob88_rjesavanje)
```
### BD 99
```{r}
plot(ob99_rjesavanje)
```
## Column 2 {.tabset .tabset-fade}
### SHAP 2
```{r}
plot(ob2_rjesavanje_shap)
```
### SHAP 88
```{r}
plot(ob88_rjesavanje_shap)
```
### SHAP 99
```{r}
plot(ob99_rjesavanje_shap)
```
# Hyperparameters - decision tree {data-navmenu="PROJEKT2"}
## Column 1
### Tested hyperparameters
```{r}
Rjesavanje_tuning_tree %>%
collect_metrics() %>%
filter(.metric == "roc_auc") %>%
pivot_longer(cols = cost_complexity:min_n) %>%
mutate(best_mod = mean == max(mean)) %>%
ggplot(aes(x = value, y = mean)) +
#geom_line(alpha = 0.5, size = 1.5) +
geom_point(aes(color = best_mod), size = 0.3) +
facet_wrap(~name, scales = "free_x") +
scale_x_continuous() +
labs(y = "roc auc", x = "", color = "Best Model")
```
## Column 2
### Top 20 models by roc_auc
```{r}
print(Rjesavanje_tuning_tree %>% show_best(metric = 'roc_auc', n = 20), n = 20)
```
# Performance - decision tree {data-navmenu="PROJEKT2"}
## Column 1
### Metrics on the test and training sets {data-height=200}
```{r}
tab1 <- Rjesavanje_fit_tree %>% collect_predictions() %>%
rf_metrike(truth = PROJEKT2, estimate = .pred_class, .pred_1:.pred_3)
tab2 <- Rjesavanje_work_tree %>%
rf_metrike(truth = PROJEKT2, estimate = .pred_class, .pred_1:.pred_3)
tab1 %>% inner_join(tab2, by = ".metric") %>%
select(.metric, .estimator = .estimator.x, trening = .estimate.y, test = .estimate.x)
```
### Confusion matrix
```{r}
Rjesavanje_fit_tree %>% collect_predictions() %>%
conf_mat(truth = PROJEKT2, estimate = .pred_class) %>% autoplot("heatmap") +
scale_fill_gradient(low = "#87DEE7",
high = "#FFFFCC")
```
## Column 2
### ROC curve
```{r}
Rjesavanje_fit_tree %>% collect_predictions() %>% roc_curve(PROJEKT2, .pred_1:.pred_3) %>% autoplot()
```
### Gain curve
```{r}
Rjesavanje_fit_tree %>% collect_predictions() %>% gain_curve(PROJEKT2, .pred_1:.pred_3) %>% autoplot()
```
# Predictor importance - decision tree {data-navmenu="PROJEKT2"}
## Column 1 {data-width=300}
### Predictor importance
```{r}
Rjesavanje_fit_tree %>% extract_fit_parsnip() %>% vip()
```
## Column 2 {data-width=700 .tabset .tabset-fade}
### Tree plot
```{r fig.width = 16, fig.height = 8}
Rjesavanje_fit_tree %>%
extract_fit_engine() %>%
rpart.plot(roundint = FALSE, digits = 3)
```
### Tree (text)
```{r}
Rjesavanje_fit_tree %>%
extract_fit_engine()
```
### Tree details
```{r}
info2 <- summary(Rjesavanje_fit_tree %>%
extract_fit_engine())$splits
info2
```
# Model description {data-navmenu="PROCJENA"}
## Column 1
### Procjena - assessment of the problem solution (random forest)
**Predictors**

- TEST - sum of all test points
- AKTIVNOST - activity points
- KOL1 - total points on the first midterm exam
- KOL2 - total points on the second midterm exam
- PROJEKT1 - total points on project 1
- PROJEKT2 - total points on project 2

**Response** - classification of students based on the number of points earned on the assessment of the problem solution

- **1** - if the total score on the assessment of the problem solution is within the interval [0%, 33%)
- **2** - if the total score on the assessment of the problem solution is within the interval [33%, 67%)
- **3** - if the total score on the assessment of the problem solution is within the interval [67%, 100%]

**Hyperparameters** - to select the optimal combination, 10-fold cross-validation was
performed, randomly trying 500 combinations of the hyperparameters *mtry*, *min_n* and
*trees*. The best model was selected according to the *roc_auc* metric.

- *mtry* - values taken from the set {2, 3}
- *trees* - natural numbers between 400 and 1500.
- *min_n* - natural numbers between 2 and 40.
### Procjena - assessment of the problem solution (decision tree)
**Predictors**

- TEST - sum of all test points
- AKTIVNOST - activity points
- KOL1 - total points on the first midterm exam
- KOL2 - total points on the second midterm exam
- PROJEKT1 - total points on project 1
- PROJEKT2 - total points on project 2

**Response** - classification of students based on the number of points earned on the assessment of the problem solution

- **1** - if the total score on the assessment of the problem solution is within the interval [0%, 33%)
- **2** - if the total score on the assessment of the problem solution is within the interval [33%, 67%)
- **3** - if the total score on the assessment of the problem solution is within the interval [67%, 100%]

**Hyperparameters** - to select the optimal combination, 10-fold cross-validation was
performed, trying 5000 combinations of the hyperparameters *tree_depth*, *min_n* and
*cost_complexity* drawn by Latin hypercube sampling. The best model was selected
according to the *roc_auc* metric (see the sketch after the list below).

- *tree_depth* - natural numbers between 1 and 15.
- *cost_complexity* - real numbers between 1e-10 and 0.1.
- *min_n* - natural numbers between 2 and 40.
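
Because the response has three classes, *roc_auc* here is the multiclass Hand & Till estimator, which is yardstick's default for more than two classes. A minimal sketch, assuming the predictions collected from `Procjena_fit`:

```{r, eval=FALSE, echo=TRUE}
library(tidymodels)

preds <- Procjena_fit %>% collect_predictions()

# Multiclass ROC AUC over the three class-probability columns;
# estimator = "hand_till" is the default here and is spelled out only for emphasis
preds %>% roc_auc(truth = Procjena, .pred_1, .pred_2, .pred_3, estimator = "hand_till")
```
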
## Column 2
### Tertiles (3-quantiles) used to create the classes
```{r}
qvant %>% kbl() %>%
kable_paper("hover", full_width = F)
```
### summary
```{r}
summary(podaci) %>% kbl() %>%
kable_paper("hover", full_width = F)
```
# Hyperparameters - random forest {data-navmenu="PROCJENA"}
## Column 1
### Tested hyperparameters
```{r}
Procjena_tuning %>%
collect_metrics() %>%
filter(.metric == "roc_auc", trees > 0) %>%
pivot_longer(cols = mtry:min_n) %>%
mutate(best_mod = mean == max(mean)) %>%
ggplot(aes(x = value, y = mean)) +
#geom_line(alpha = 0.5, size = 1.5) +
geom_point(aes(color = best_mod), size = 0.3) +
facet_wrap(~name, scales = "free_x") +
scale_x_continuous() +
labs(y = "roc auc", x = "", color = "Best Model")
```
## Column 2
### Top 20 models by roc_auc
```{r}
print(Procjena_tuning %>% show_best(metric = 'roc_auc', n = 20), n = 20)
```
# Performance - random forest {data-navmenu="PROCJENA"}
## Column 1
### Metrics on the test and training sets {data-height=200}
```{r}
tab1 <- Procjena_fit %>% collect_predictions() %>%
rf_metrike(truth = Procjena, estimate = .pred_class, .pred_1:.pred_3)
tab2 <- Procjena_work %>%
rf_metrike(truth = Procjena, estimate = .pred_class, .pred_1:.pred_3)
tab1 %>% inner_join(tab2, by = ".metric") %>%
select(.metric, .estimator = .estimator.x, trening = .estimate.y, test = .estimate.x)
```
### Confusion matrix
```{r}
Procjena_fit %>% collect_predictions() %>%
conf_mat(truth = Procjena, estimate = .pred_class) %>% autoplot("heatmap") +
scale_fill_gradient(low = "#87DEE7",
high = "#FFFFCC")
```
## Column 2
### ROC curve
```{r}
Procjena_fit %>% collect_predictions() %>% roc_curve(Procjena, .pred_1:.pred_3) %>% autoplot()
```
### Gain curve
```{r}
Procjena_fit %>% collect_predictions() %>% gain_curve(Procjena, .pred_1:.pred_3) %>% autoplot()
```
# Predictor importance - random forest {data-navmenu="PROCJENA"}
## Column 1
### Gini
```{r}
Procjena_fit %>% extract_fit_parsnip() %>% vip()
```
### Permutation
```{r}
Procjena_permfit %>% extract_fit_parsnip() %>% vip()
```
## Column 2
### Boruta
```{r}
plot(Procjena_Boruta, cex.axis=.7, las=2, xlab="",
colCode = c("green", "orange", "#f6546a", "#2acaea"))
```
### Boruta (history)
```{r}
plotImpHistory(Procjena_Boruta, colCode = c("green", "orange", "#f6546a", "#2acaea"))
```
# PDP-M-ALE plots - random forest {data-navmenu="PROCJENA"}
## Column {.tabset .tabset-fade}
### PDP plot
```{r fig.width=12}
plot(pdp_procjena_all, geom = "profiles")
```
### M plot
```{r fig.width=12}
plot(cond_procjena_all, geom = "profiles")
```
### ALE plot
```{r fig.width=12}
plot(acc_procjena_all, geom = "profiles")
```
### Combined
```{r fig.width=12}
oznake <- c("klasa 1", "klasa 2", "klasa 3")
names(oznake) <- c("workflow.1", "workflow.2", "workflow.3")
ggplot(krivulje_procjena, aes(x = `_x_`, y = `_yhat_`, color = vrsta)) +
geom_line() +
facet_grid(cols = vars(`_vname_`), rows = vars(`_label_`), scales = "free_x",
labeller = labeller(`_label_` = oznake)) +
xlab("bodovi") + ylab("predikcija vjerojatnosti")
```
# Break-down and SHAP - random forest {data-navmenu="PROCJENA"}
## Column 1 {.tabset .tabset-fade}
### BD 2
```{r}
plot(ob2_procjena)
```
### BD 45
```{r}
plot(ob45_procjena)
```
### BD 88
```{r}
plot(ob88_procjena)
```
## Column 2 {.tabset .tabset-fade}
### SHAP 2
```{r}
plot(ob2_procjena_shap)
```
### SHAP 45
```{r}
plot(ob45_procjena_shap)
```
### SHAP 88
```{r}
plot(ob88_procjena_shap)
```
# Hyperparameters - decision tree {data-navmenu="PROCJENA"}
## Column 1
### Tested hyperparameters
```{r}
Procjena_tuning_tree %>%
collect_metrics() %>%
filter(.metric == "roc_auc") %>%
pivot_longer(cols = cost_complexity:min_n) %>%
mutate(best_mod = mean == max(mean)) %>%
ggplot(aes(x = value, y = mean)) +
#geom_line(alpha = 0.5, size = 1.5) +
geom_point(aes(color = best_mod), size = 0.3) +
facet_wrap(~name, scales = "free_x") +
scale_x_continuous() +
labs(y = "roc auc", x = "", color = "Best Model")
```
## Column 2
### Top 20 models by roc_auc
```{r}
print(Procjena_tuning_tree %>% show_best(metric = 'roc_auc', n = 20), n = 20)
```
# Performance - decision tree {data-navmenu="PROCJENA"}
## Column 1
### Metrics on the test and training sets {data-height=200}
```{r}
tab1 <- Procjena_fit_tree %>% collect_predictions() %>%
rf_metrike(truth = Procjena, estimate = .pred_class, .pred_1:.pred_3)
tab2 <- Procjena_work_tree %>%
rf_metrike(truth = Procjena, estimate = .pred_class, .pred_1:.pred_3)
tab1 %>% inner_join(tab2, by = ".metric") %>%
select(.metric, .estimator = .estimator.x, trening = .estimate.y, test = .estimate.x)
```
### Confusion matrix
```{r}
Procjena_fit_tree %>% collect_predictions() %>%
conf_mat(truth = Procjena, estimate = .pred_class) %>% autoplot("heatmap") +
scale_fill_gradient(low = "#87DEE7",
high = "#FFFFCC")
```
## Column 2
### ROC curve
```{r}
Procjena_fit_tree %>% collect_predictions() %>% roc_curve(Procjena, .pred_1:.pred_3) %>% autoplot()
```
### Gain curve
```{r}
Procjena_fit_tree %>% collect_predictions() %>% gain_curve(Procjena, .pred_1:.pred_3) %>% autoplot()
```
# Predictor importance - decision tree {data-navmenu="PROCJENA"}
## Column 1 {data-width=300}
### Predictor importance
```{r}
Procjena_fit_tree %>% extract_fit_parsnip() %>% vip()
```
## Column 2 {data-width=500 .tabset .tabset-fade}
### Tree plot
```{r fig.width=16, fig.height=8}
Procjena_fit_tree %>%
extract_fit_engine() %>%
rpart.plot(roundint = FALSE, digits = 3, tweak = 1.1)
```
### Tree (text)
```{r}
Procjena_fit_tree %>%
extract_fit_engine()
```
### Tree details
```{r}
info3 <- summary(Procjena_fit_tree %>%
extract_fit_engine())$splits
info3
```