Question

Posted on Sep 21, 2024

There is a transaction record as follows: the item was a computer, and the currency was Euro. The seller rating was 1000, and the duration was one day. Also, the end day was Monday and the closing price was 30. Lastly, the opening price was 4. Predict whether this transaction was competitive or not.

The 'ebay' data table is as below:

'data.frame': 1972 obs. of 8 variables:
 $ Category    : Factor w/ 18 levels "Antique/Art/Craft",..: 14 14 14 14 14 14 14 14 14 14 ...
 $ currency    : Factor w/ 3 levels "EUR","GBP","US": 3 3 3 3 3 3 3 3 3 3 ...
 $ sellerRating: int 3249 3249 3249 3249 3249 3249 3249 3249 3249 3249 ...
 $ Duration    : Factor w/ 5 levels "1","3","5","7",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ endDay      : Factor w/ 7 levels "Fri","Mon","Sat",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ ClosePrice  : int 0 0 0 0 0 0 0 0 0 0 ...
 $ OpenPrice   : int 0 0 0 0 0 0 0 0 0 0 ...
 $ Competitive : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...

I fitted the decision tree; the printed tree is as below:

n= 1382

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root 1382 635 1 (0.4594790 0.5405210)
   2) OpenPrice>=1.5 953 401 0 (0.5792235 0.4207765)
     4) ClosePrice< 9.5 447 126 0 (0.7181208 0.2818792)
       8) OpenPrice>=3.5 249 40 0 (0.8393574 0.1606426) *
       9) OpenPrice< 3.5 198 86 0 (0.5656566 0.4343434)
        18) ClosePrice< 3.5 138 26 0 (0.8115942 0.1884058) *
        19) ClosePrice>=3.5 60 0 1 (0.0000000 1.0000000) *
     5) ClosePrice>=9.5 506 231 1 (0.4565217 0.5434783)
      10) OpenPrice>=9.5 314 106 0 (0.6624204 0.3375796)
        20) sellerRating>=562 241 60 0 (0.7510373 0.2489627) *
        21) sellerRating< 562 73 27 1 (0.3698630 0.6301370) *
      11) OpenPrice< 9.5 192 23 1 (0.1197917 0.8802083) *
   3) OpenPrice< 1.5 429 83 1 (0.1934732 0.8065268)
     6) ClosePrice< 1.5 119 36 0 (0.6974790 0.3025210) *
     7) ClosePrice>=1.5 310 0 1 (0.0000000 1.0000000) *
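Reading the printed tree as nested if/else rules gives the prediction directly. Here is a minimal sketch in base R. `predict_by_hand` is a hypothetical helper that only transcribes the splits and terminal-node probabilities printed above; it is not part of rpart.

```r
# Hypothetical helper: the printed rpart splits written out by hand.
# prob1 is the estimated probability of class "1" (competitive) in each leaf.
predict_by_hand <- function(OpenPrice, ClosePrice, sellerRating) {
  if (OpenPrice >= 1.5) {                      # node 2
    if (ClosePrice < 9.5) {                    # node 4
      if (OpenPrice >= 3.5) {
        list(node = 8, class = "0", prob1 = 0.1606426)
      } else if (ClosePrice < 3.5) {           # node 9, left branch
        list(node = 18, class = "0", prob1 = 0.1884058)
      } else {
        list(node = 19, class = "1", prob1 = 1.0000000)
      }
    } else {                                   # node 5
      if (OpenPrice >= 9.5) {                  # node 10
        if (sellerRating >= 562) {
          list(node = 20, class = "0", prob1 = 0.2489627)
        } else {
          list(node = 21, class = "1", prob1 = 0.6301370)
        }
      } else {
        list(node = 11, class = "1", prob1 = 0.8802083)
      }
    }
  } else {                                     # node 3
    if (ClosePrice < 1.5) {
      list(node = 6, class = "0", prob1 = 0.3025210)
    } else {
      list(node = 7, class = "1", prob1 = 1.0000000)
    }
  }
}

# The record from the question: opening price 4, closing price 30, seller rating 1000.
res <- predict_by_hand(OpenPrice = 4, ClosePrice = 30, sellerRating = 1000)
print(res)
```

For this record the path is root, node 2 (OpenPrice >= 1.5), node 5 (ClosePrice >= 9.5), node 11 (OpenPrice < 9.5), so the tree predicts class 1 (competitive) with an estimated probability of about 0.88. Category, currency, Duration and endDay do not appear in any primary split of this tree, so they do not affect the result here.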

and the summary of the tree is as below:

rpart(formula = Competitive ~ ., data = traineB.df, method = "class")
  n= 1382

          CP nsplit rel error    xerror       xstd
1 0.23779528      0 1.0000000 1.0000000 0.02917557
2 0.11496063      1 0.7622047 0.7748031 0.02803172
3 0.07401575      3 0.5322835 0.5874016 0.02598796
4 0.04724409      4 0.4582677 0.5165354 0.02490746
5 0.02992126      6 0.3637795 0.4047244 0.02277797
6 0.01000000      7 0.3338583 0.3527559 0.02157499

Variable importance
  ClosePrice    OpenPrice     Category sellerRating     Duration       endDay     currency
          42           38            8            6            3            3            1
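The CP table can be used to check whether the tree should be pruned before predicting. The sketch below, in base R, re-enters the table by hand (the name `cp_table` and its column names are my own) and applies two common rules: minimum cross-validated error, and the 1-SE rule. It also converts the relative errors to absolute misclassification rates using the root node error 635/1382.

```r
# CP table transcribed from the summary output (column names are assumptions).
cp_table <- data.frame(
  CP        = c(0.23779528, 0.11496063, 0.07401575, 0.04724409, 0.02992126, 0.01000000),
  nsplit    = c(0, 1, 3, 4, 6, 7),
  rel_error = c(1.0000000, 0.7622047, 0.5322835, 0.4582677, 0.3637795, 0.3338583),
  xerror    = c(1.0000000, 0.7748031, 0.5874016, 0.5165354, 0.4047244, 0.3527559),
  xstd      = c(0.02917557, 0.02803172, 0.02598796, 0.02490746, 0.02277797, 0.02157499)
)

# Root node error: 635 of the 1382 training rows are in the minority class.
root_error <- 635 / 1382

# Rule 1: row with the smallest cross-validated error.
best <- which.min(cp_table$xerror)

# Rule 2 (1-SE): smallest tree whose xerror is within one xstd of the minimum.
threshold <- cp_table$xerror[best] + cp_table$xstd[best]
one_se <- which(cp_table$xerror <= threshold)[1]

# Absolute error rates for the selected tree.
train_error <- cp_table$rel_error[best] * root_error
cv_error    <- cp_table$xerror[best] * root_error

print(cp_table[c(best, one_se), ])
print(c(train_error = train_error, cv_error = cv_error))
```

With these numbers both rules select the last row (7 splits, CP = 0.01), so the unpruned tree shown in the post is a reasonable one to predict from.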

Node number 1: 1382 observations,    complexity param=0.2377953
  predicted class=1  expected loss=0.459479  P(node) =1
    class counts:   635   747
   probabilities: 0.459 0.541
  left son=2 (953 obs) right son=3 (429 obs)
  Primary splits:
      OpenPrice    < 1.5     to the right, improve=88.04095, (0 missing)
      Category     splits as RLLRRLRRRLLRLRRLRR, improve=37.80220, (0 missing)
      ClosePrice   < 9.5     to the left,  improve=36.28260, (0 missing)
      sellerRating < 3350    to the right, improve=28.94549, (0 missing)
      endDay       splits as LRLLRRL, improve=21.63334, (0 missing)
  Surrogate splits:
      ClosePrice   < 1.5     to the right, agree=0.776, adj=0.277, (0 split)
      sellerRating < 26282.5 to the left,  agree=0.716, adj=0.086, (0 split)
      Duration     splits as LLRLL, agree=0.716, adj=0.084, (0 split)
      Category     splits as LLLLLLLRRLLLLRLLLL, agree=0.708, adj=0.058, (0 split)
      currency     splits as LRL, agree=0.699, adj=0.030, (0 split)

Node number 2: 953 observations,    complexity param=0.1149606
  predicted class=0  expected loss=0.4207765  P(node) =0.6895803
    class counts:   552   401
   probabilities: 0.579 0.421
  left son=4 (447 obs) right son=5 (506 obs)
  Primary splits:
      ClosePrice   < 9.5     to the left,  improve=32.48385, (0 missing)
      Category     splits as RLRRRLRRRLLRLLRLRR, improve=23.43729, (0 missing)
      sellerRating < 3350    to the right, improve=18.58827, (0 missing)
      OpenPrice    < 3.5     to the right, improve=13.05877, (0 missing)
      endDay       splits as LRLRLRL, improve= 4.67842, (0 missing)
  Surrogate splits:
      OpenPrice    < 8.5     to the left,  agree=0.829, adj=0.635, (0 split)
      Category     splits as LRLRRLLRRLLRRLRRRR, agree=0.687, adj=0.333, (0 split)
      Duration     splits as RRRLR, agree=0.590, adj=0.125, (0 split)
      sellerRating < 2168.5  to the right, agree=0.558, adj=0.058, (0 split)
      endDay       splits as RLLRLRR, agree=0.554, adj=0.049, (0 split)

Node number 3: 429 observations,    complexity param=0.07401575
  predicted class=1  expected loss=0.1934732  P(node) =0.3104197
    class counts:    83   346
   probabilities: 0.193 0.807
  left son=6 (119 obs) right son=7 (310 obs)
  Primary splits:
      ClosePrice   < 1.5     to the left,  improve=83.664960, (0 missing)
      Category     splits as LRLRRRRRRL-RRRRRRR, improve=13.260000, (0 missing)
      Duration     splits as RRRLL, improve= 7.161476, (0 missing)
      OpenPrice    < 0.5     to the right, improve= 5.339721, (0 missing)
      currency     splits as LRR, improve= 3.532801, (0 missing)
  Surrogate splits:
      Category     splits as RRLRRRRRRL-RRRRRRR, agree=0.746, adj=0.084, (0 split)
      sellerRating < 30737.5 to the right, agree=0.725, adj=0.008, (0 split)

Node number 4: 447 observations,    complexity param=0.04724409
  predicted class=0  expected loss=0.2818792  P(node) =0.3234443
    class counts:   321   126
   probabilities: 0.718 0.282
  left son=8 (249 obs) right son=9 (198 obs)
  Primary splits:
      OpenPrice    < 3.5     to the right, improve=16.524920, (0 missing)
      Category     splits as RLRLRLRLRLLLLL-LRR, improve=10.109070, (0 missing)
      endDay       splits as LLLLLRR, improve= 7.032405, (0 missing)
      ClosePrice   < 2.5     to the left,  improve= 4.662388, (0 missing)
      Duration     splits as LLLLR, improve= 4.105146, (0 missing)
  Surrogate splits:
      ClosePrice   < 3.5     to the right, agree=0.866, adj=0.697, (0 split)
      Category     splits as RLRRLRRRRLLRLL-LLR, agree=0.673, adj=0.263, (0 split)
      sellerRating < 2317.5  to the left,  agree=0.640, adj=0.187, (0 split)
      currency     splits as RLL, agree=0.638, adj=0.182, (0 split)
      Duration     splits as LLLLR, agree=0.613, adj=0.126, (0 split)

Node number 5: 506 observations,    complexity param=0.1149606
  predicted class=1  expected loss=0.4565217  P(node) =0.366136
    class counts:   231   275
   probabilities: 0.457 0.543
  left son=10 (314 obs) right son=11 (192 obs)
  Primary splits:
      OpenPrice    < 9.5     to the right, improve=70.164250, (0 missing)
      Category     splits as RLRRLLRRLLLRLRRLLR, improve=28.842930, (0 missing)
      sellerRating < 2968    to the right, improve=16.366210, (0 missing)
      Duration     splits as RLRRR, improve= 6.691498, (0 missing)
      endDay       splits as LRLRLLL, improve= 6.502354, (0 missing)
  Surrogate splits:
      Category     splits as RLLLLLLRLLLRLLRLLL, agree=0.690, adj=0.182, (0 split)
      ClosePrice   < 10.5    to the right, agree=0.644, adj=0.063, (0 split)
      currency     splits as LRL, agree=0.623, adj=0.005, (0 split)
      sellerRating < 107     to the right, agree=0.623, adj=0.005, (0 split)

Node number 6: 119 observations
  predicted class=0  expected loss=0.302521  P(node) =0.08610709
    class counts:    83    36
   probabilities: 0.697 0.303

Node number 7: 310 observations
  predicted class=1  expected loss=0  P(node) =0.2243126
    class counts:     0   310
   probabilities: 0.000 1.000

Node number 8: 249 observations
  predicted class=0  expected loss=0.1606426  P(node) =0.1801737
    class counts:   209    40
   probabilities: 0.839 0.161

Node number 9: 198 observations,    complexity param=0.04724409
  predicted class=0  expected loss=0.4343434  P(node) =0.1432706
    class counts:   112    86
   probabilities: 0.566 0.434
  left son=18 (138 obs) right son=19 (60 obs)
  Primary splits:
      ClosePrice   < 3.5     to the left,  improve=55.090030, (0 missing)
      endDay       splits as LLLLLRL, improve= 9.554223, (0 missing)
      Category     splits as RLLLRLRLRLLLLL--LL, improve= 8.232531, (0 missing)
      sellerRating < 3328.5  to the right, improve= 5.546898, (0 missing)
      Duration     splits as LRLLR, improve= 2.181818, (0 missing)
  Surrogate splits:
      endDay       splits as LLLLLRL, agree=0.763, adj=0.217, (0 split)

Node number 10: 314 observations,    complexity param=0.02992126
  predicted class=0  expected loss=0.3375796  P(node) =0.2272069
    class counts:   208   106
   probabilities: 0.662 0.338
  left son=20 (241 obs) right son=21 (73 obs)
  Primary splits:
      sellerRating < 562     to the right, improve=16.281240, (0 missing)
      Category     splits as LLRRRLRRLLRRRR-LRR, improve= 8.215652, (0 missing)
      ClosePrice   < 39.5    to the left,  improve= 8.203453, (0 missing)
      Duration     splits as RLRRR, improve= 3.410924, (0 missing)
      currency     splits as RRL, improve= 3.262934, (0 missing)
  Surrogate splits:
      ClosePrice   < 220.5   to the left,  agree=0.799, adj=0.137, (0 split)
      Category     splits as LLLRRLLLLLLLLL-LLL, agree=0.777, adj=0.041, (0 split)
      currency     splits as LRL, agree=0.774, adj=0.027, (0 split)
      OpenPrice    < 199.5   to the left,  agree=0.774, adj=0.027, (0 split)

Node number 11: 192 observations
  predicted class=1  expected loss=0.1197917  P(node) =0.1389291
    class counts:    23   169
   probabilities: 0.120 0.880

Node number 18: 138 observations
  predicted class=0  expected loss=0.1884058  P(node) =0.09985528
    class counts:   112    26
   probabilities: 0.812 0.188

Node number 19: 60 observations
  predicted class=1  expected loss=0  P(node) =0.04341534
    class counts:     0    60
   probabilities: 0.000 1.000

Node number 20: 241 observations
  predicted class=0  expected loss=0.2489627  P(node) =0.1743849
    class counts:   181    60
   probabilities: 0.751 0.249

Node number 21: 73 observations
  predicted class=1  expected loss=0.369863  P(node) =0.052822
    class counts:    27    46
   probabilities: 0.370 0.630
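The `improve` values in the summary can be reproduced by hand. The sketch below assumes the tree was grown with the Gini criterion (the rpart call shown does not set `parms`, and Gini is the default for method = "class"). It recomputes the improvement of the root split from the class counts of nodes 1, 2 and 3. `gini` is a hypothetical helper name.

```r
# Gini impurity of a node from its class counts: 1 - sum(p_k^2).
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Class counts (class 0, class 1) taken from the summary output.
parent <- c(635, 747)   # node 1
left   <- c(552, 401)   # node 2 (OpenPrice >= 1.5)
right  <- c(83, 346)    # node 3 (OpenPrice < 1.5)

# Improvement = n * impurity(parent) minus the same quantity summed over the children.
improve <- sum(parent) * gini(parent) -
  sum(left) * gini(left) -
  sum(right) * gini(right)

print(improve)
```

The result agrees with the improve=88.04095 reported for the OpenPrice < 1.5 split at node 1, which is why that split was chosen over the Category and ClosePrice candidates.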

The entropy of each variable is as below:

> entropy(eBay$Category)
[1] 3.630209
> entropy(eBay$currency)
[1] 1.189062
> entropy(eBay$Duration)
[1] 1.832854
> entropy(eBay$endDay)
[1] 2.619409
> entropy(eBay$Competitive)
[1] 0.9952461
> entropy(eBay$sellerRating)
[1] 7.428584
> entropy(eBay$ClosePrice)
[1] 5.858434
> entropy(eBay$OpenPrice)
[1] 4.371138
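The `entropy()` function used above is not shown, so its exact definition is an assumption on my part. The value for Competitive (just under 1 for a two-level factor) is consistent with base-2 Shannon entropy of the observed frequencies. Here is a sketch with a hypothetical helper `shannon_entropy`, applied to the training class counts from the root node (635 and 747) rather than to the full eBay table:

```r
# Hypothetical helper: Shannon entropy in bits of the empirical distribution of x.
# This may differ from the entropy() function used in the post.
shannon_entropy <- function(x) {
  p <- as.numeric(table(x)) / length(x)
  p <- p[p > 0]            # drop empty levels so 0 * log2(0) never occurs
  -sum(p * log2(p))
}

# Rebuild the training response from the root-node class counts.
y_train <- factor(rep(c("0", "1"), times = c(635, 747)))
h_root <- shannon_entropy(y_train)
print(h_root)
```

These entropies describe each variable on its own. They are not needed to get a prediction from the fitted tree, which depends only on the splits.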

How can I use the model to predict whether a transaction was competitive when the item was a computer, the currency was Euro, the seller rating was 1000, the duration was one day, the end day was Monday, the closing price was 30, and the opening price was 4?
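One way to get this from R is to build a one-row data frame with the same columns and factor levels as the training data and pass it to `predict()`. This is a sketch under assumptions: `predict_record` is a hypothetical helper, `fit` stands for whatever object holds the fitted tree, and the Category level is assumed to be spelled "Computer" (check `levels(traineB.df$Category)`). The other levels ("EUR", "1", "Mon") appear in the `str()` output.

```r
library(rpart)

# Hypothetical helper: predicts the record from the question with a fitted rpart tree.
# `train` supplies the factor levels so the new row matches the training data.
predict_record <- function(fit, train) {
  new_obs <- data.frame(
    Category     = factor("Computer", levels = levels(train$Category)),  # assumed level name
    currency     = factor("EUR", levels = levels(train$currency)),
    sellerRating = 1000L,
    Duration     = factor("1", levels = levels(train$Duration)),
    endDay       = factor("Mon", levels = levels(train$endDay)),
    ClosePrice   = 30L,
    OpenPrice    = 4L
  )
  list(
    class = predict(fit, newdata = new_obs, type = "class"),  # predicted label
    prob  = predict(fit, newdata = new_obs, type = "prob")    # class probabilities
  )
}

# Usage, assuming these objects exist in the session:
# fit <- rpart(Competitive ~ ., data = traineB.df, method = "class")
# predict_record(fit, traineB.df)
```

With the tree shown in the post, this record should land in terminal node 11 (OpenPrice >= 1.5, ClosePrice >= 9.5, OpenPrice < 9.5), whose predicted class is 1 with probabilities 0.120 and 0.880. The model therefore predicts that the transaction was competitive.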