Perhaps one of the business areas that faces the greatest risk each day is the lending industry. Banks, mortgage companies, and other types of lenders face one specific risk many times every day: Are they going to be paid back when they make a loan? Organizations that make their money by lending money must be able to anticipate risk and predict the likelihood that they will be paid back, with interest, or else their business model will fail and they will have to close their doors. In this Assignment, you will use R with two data sets to predict the risk of loan default for a lender, and then report and explain your results.
Assignment Instructions
Complete the following steps:
Using the university’s online Library and Internet resources, research the lending industry. In a Word document, prepare a risk management plan outline for loan default risk faced by lenders. Include all five parts of risk management planning: Identification, Understanding, Data Preparation, Modeling and Application. Cite all sources used to prepare your risk management plan.
Download the Loans.csv and Applicants.csv files. Import both of these as data frames into RStudio. Give each a descriptive name. Show this in your Word document.
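As a rough sketch of the import step, the R code below reads both tables into data frames. So that it runs without the actual files, it parses a few rows copied from the sample data in this assignment through the text argument of read.csv; with the real files you would pass the file path instead. The data frame names LoanHistory and NewApplicants are only illustrative choices.

```r
# Stand-in text: the header and first rows of each sample table in this
# assignment. With the real files you would call read.csv("Loans.csv") and
# read.csv("Applicants.csv") instead (the exact paths are assumptions).
loans_csv <- "Applicant ID,Number of Missed/Late Payments,Lines of Credit,Credit Score,Monthly Income,Age at First Credit,Age in Years,Marital Status,Good Risk
701445,18,4,543,2562,20,32,1,0
838181,0,8,707,4731,16,40,1,1
611138,11,4,538,2410,20,36,1,0"

appl_csv <- "Applicant ID,Number of Missed/Late Payments,Lines of Credit,Credit Score,Monthly Income,Age at First Credit,Age in Years,Marital Status
250162,13,5,511,3014,27,31,2
337157,22,4,495,2012,25,34,1"

# Descriptive names (illustrative, not required by the assignment)
LoanHistory   <- read.csv(text = loans_csv)
NewApplicants <- read.csv(text = appl_csv)

# read.csv() rewrites headers into syntactic R names,
# e.g. "Good Risk" becomes Good.Risk
names(LoanHistory)
str(NewApplicants)
```

Note the renamed columns: later steps refer to Good.Risk and Applicant.ID rather than the headers as they appear in the files.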
Using the Loans.csv file, build a logistic regression model to predict the “Good Risk” dependent variable (use family=binomial() in the glm function in R). In this column, ‘1’ indicates that making the loan is a good risk for the lender; ‘0’ indicates that making the loan is a bad risk. Make sure that you do not use the Applicant ID as an independent variable! The glm function belongs to R’s built-in stats package, so it is available in every R session; if your course materials call for the MASS package, load it by issuing library(MASS) before you build your model. Show the creation of the model in your Word document.
In your Word document, document your logistic model’s output, and specifically explain which independent variables have the most predictive power and which have the least. Make sure you identify how you know, and explain why it matters.
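The following is a minimal sketch of fitting the model and reading its output. It uses simulated stand-in data, not the values in Loans.csv, so the numbers it prints say nothing about the real assignment. The column names assume what read.csv produces from the file headers. Sorting the coefficient table by p-value is one common way to judge which variables carry the most evidence of predictive power; it is offered here as an assumption about how you might support your explanation, not as the required method.

```r
set.seed(42)  # reproducible simulated stand-in data
n <- 200

# Simulated stand-in for Loans.csv; these are NOT the assignment's values.
Loans <- data.frame(
  Applicant.ID = sample(100000:999999, n),
  Number.of.Missed.Late.Payments = rpois(n, lambda = 10),
  Credit.Score = round(rnorm(n, mean = 650, sd = 100)),
  Age.in.Years = sample(26:44, n, replace = TRUE)
)

# The simulated outcome depends (noisily) on credit score and missed
# payments only; age plays no part in generating it.
log_odds <- (Loans$Credit.Score - 650) / 50 -
  0.1 * (Loans$Number.of.Missed.Late.Payments - 10)
Loans$Good.Risk <- rbinom(n, size = 1, prob = plogis(log_odds))

# "." means every other column; "- Applicant.ID" drops the identifier.
LoanModel <- glm(Good.Risk ~ . - Applicant.ID,
                 data = Loans, family = binomial())

# Coefficient table with columns Estimate, Std. Error, z value, Pr(>|z|)
coef_table <- coef(summary(LoanModel))

# Sort so the smallest p-values come first
coef_table[order(coef_table[, "Pr(>|z|)"]), ]
```

Calling summary(LoanModel) prints the same table along with the deviance figures, which is the output you would typically paste into your Word document.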
Apply your logistic regression model to the data in Applicants.csv to generate predictions of “Good Risk” for each loan applicant. If your glm model is stored in an R object called ‘LoanModel’, for example, and your Applicants.csv data is in a data frame called ‘Appl’, then you would issue a command that looks like this: LoanPredictions <- predict(LoanModel, Appl, type="response"). Note that R requires straight quotation marks around "response". Document the application of your model to the Applicants data in your Word document.
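A small sketch of the prediction step follows, again on simulated stand-in data rather than the real files. The helper simulate_applicants and the column Predicted.Probability are hypothetical names introduced only for this illustration; LoanModel, Appl, and LoanPredictions follow the names used in the step above.

```r
set.seed(7)  # reproducible simulated stand-in data

# Hypothetical helper that fabricates applicant-like rows for illustration.
simulate_applicants <- function(n) {
  data.frame(
    Applicant.ID = sample(100000:999999, n),
    Credit.Score = round(rnorm(n, mean = 650, sd = 100)),
    Lines.of.Credit = sample(3:8, n, replace = TRUE)
  )
}

# Training data with a known outcome (stand-in for Loans.csv)
Loans <- simulate_applicants(200)
Loans$Good.Risk <- rbinom(200, size = 1,
                          prob = plogis((Loans$Credit.Score - 650) / 50))

# New applicants with no Good.Risk column (stand-in for Applicants.csv)
Appl <- simulate_applicants(25)

LoanModel <- glm(Good.Risk ~ . - Applicant.ID,
                 data = Loans, family = binomial())

# type = "response" returns probabilities between 0 and 1
# instead of log-odds
LoanPredictions <- predict(LoanModel, Appl, type = "response")

# Attach the probabilities to the applicants and show the strongest ones
Appl$Predicted.Probability <- LoanPredictions
head(Appl[order(-Appl$Predicted.Probability), ])
```

Multiplying LoanPredictions by 100 gives the post-probability percentages referred to in the questions that follow.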
In your Word document, interpret your predictions for the Applicants.csv data. Specifically address the following:
How many loans do you predict to be a good risk for the lender?
How many are predicted to be a bad risk?
What are your highest and lowest post-probability percentages for predictions?
How many loans have at least a 75% post-probability percentage and what does that mean for the lender?
How many loans have less than a 25% post-probability percentage and what does that mean for the lender?
Suppose that the lender is willing to accept slightly higher risk and has decided to make loans to applicants whose post-probability percentages fall between 40% and 65%. List two things the lender could do to mitigate risk when lending to this group, and explain how these will help.
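The counts asked for above can be sketched in R as follows. The probabilities here are a made-up vector used purely to show the arithmetic; in your own work LoanPredictions would come from predict. Treating 50% as the dividing line between good and bad risk is an assumption, since the assignment does not name a cutoff.

```r
# Hypothetical predicted probabilities (not results from the real data)
LoanPredictions <- c(0.02, 0.10, 0.45, 0.60, 0.80, 0.97)

# Assumed cutoff: 0.5 or higher counts as a predicted good risk
good_risk <- sum(LoanPredictions >= 0.5)
bad_risk  <- sum(LoanPredictions < 0.5)

# Lowest and highest post-probability, expressed as percentages
range(LoanPredictions) * 100

# Applicants the model is most and least confident about
at_least_75 <- sum(LoanPredictions >= 0.75)
below_25    <- sum(LoanPredictions < 0.25)

# The borderline band the lender is considering (40% to 65%)
middle_band <- LoanPredictions >= 0.40 & LoanPredictions <= 0.65

c(good = good_risk, bad = bad_risk,
  at_least_75 = at_least_75, below_25 = below_25,
  middle = sum(middle_band))
```

The logical vector middle_band can also be used to subset the applicant data frame, which makes it easy to list the specific applicants who would need extra risk mitigation.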
Make sure that you cite at least five supporting sources beyond the textbook in support of your writing and explanations. Cite correctly in APA format.
Assignment Requirements
Prepare your Assignment submission in Microsoft Word following standard APA formatting guidelines: double-spaced, Times New Roman 12-point font, one-inch margins on all sides. Include a title page, table of contents, and references page. You do not need to write an abstract. Label all tables and figures. Cite sources appropriately both in the text of your writing (parenthetical citations) and on your references page (full APA citation format).
For more information on APA style formatting, refer to the resources in the Academic Tools section of this course.
Applicants.csv
Applicant ID,Number of Missed/Late Payments,Lines of Credit,Credit Score,Monthly Income,Age at First Credit,Age in Years,Marital Status
250162,13,5,511,3014,27,31,2
337157,22,4,495,2012,25,34,1
696961,7,5,641,3382,27,38,1
102576,6,6,748,3865,22,33,1
399338,6,7,799,3774,21,44,2
916894,28,3,519,3004,25,27,3
332229,9,7,693,3966,23,38,1
591594,22,3,515,2158,24,39,1
988822,5,6,811,4562,26,38,1
990531,0,6,709,4780,19,38,2
302120,18,3,491,1797,24,38,2
851836,3,8,789,4758,20,39,1
465514,0,6,772,4894,23,34,1
203291,8,7,810,4329,21,39,1
183488,25,4,491,1913,21,39,3
528534,28,5,499,2075,25,28,3
260650,0,8,709,4744,21,37,1
963949,8,5,641,3250,24,39,1
455615,10,3,541,2491,17,35,2
432768,14,7,560,2744,20,30,1
673501,25,5,497,2159,19,34,1
334354,1,8,711,4699,19,43,1
450082,22,3,515,2243,24,30,1
799506,16,4,548,2363,22,44,1
839577,4,8,795,4357,25,36,1
630035,5,6,619,3402,26,43,2
174765,5,6,629,3985,22,40,1
480448,3,7,808,4657,24,38,1
605712,4,6,813,4343,24,43,2
510435,27,3,549,2322,20,28,1
587635,4,7,836,4940,21,39,2
616259,7,7,709,3807,23,34,2
471782,12,5,622,3434,27,33,2
793010,12,3,532,3208,27,34,2
597727,10,6,804,4734,23,40,1
373615,1,6,833,4687,23,33,2
906102,4,7,796,5132,22,44,3
800324,2,7,735,3925,21,36,1
164313,12,4,646,3087,18,35,1
533675,1,7,803,4961,22,44,1
958620,11,6,608,3058,18,33,1
807605,28,4,507,2128,19,30,1
775431,4,7,711,3705,23,37,3
547205,0,8,790,5013,22,35,2
888942,10,6,701,3817,27,27,2
394878,12,4,532,2662,16,36,3
207967,19,5,516,2036,24,40,2
492165,28,3,487,2133,23,28,3
492918,6,5,626,3828,23,35,1
882604,12,6,619,3491,21,29,2
468202,3,8,785,3798,25,42,1
240088,5,6,660,4439,17,43,2
890107,29,5,474,1670,24,44,3
668821,3,6,826,4318,18,42,1
566858,4,5,731,4792,21,40,2
851205,14,4,523,2989,27,31,3
452357,6,7,806,4382,20,31,2
568492,6,6,565,3178,19,41,1
752701,8,8,694,3910,22,34,3
870175,20,4,476,2036,24,37,3
889868,7,7,787,4177,24,33,2
280600,3,7,832,4345,20,42,2
383243,17,5,543,2673,20,40,2
247490,12,4,600,2990,19,35,1
189360,6,8,789,3564,19,43,3
347183,11,3,528,2995,19,35,3
337498,12,5,530,2943,21,44,3
895487,13,4,620,3393,16,36,3
245112,20,5,517,2189,25,36,1
834030,3,8,803,4451,21,43,2
231762,23,3,482,1849,23,44,2
505225,1,6,751,4775,26,37,1
525546,4,8,770,4587,27,35,2
561397,15,6,582,2866,26,26,3
378241,3,8,704,4619,24,40,2
333250,4,7,765,4535,27,41,2
109378,19,4,509,1982,23,35,2
566994,10,3,473,2354,26,42,2
849787,26,4,505,2218,21,26,1
688110,6,6,615,3350,23,41,2
900707,12,7,621,3604,26,32,1
521465,16,3,522,2343,25,36,2
Loans.csv
Applicant ID,Number of Missed/Late Payments,Lines of Credit,Credit Score,Monthly Income,Age at First Credit,Age in Years,Marital Status,Good Risk
701445,18,4,543,2562,20,32,1,0
838181,0,8,707,4731,16,40,1,1
611138,11,4,538,2410,20,36,1,0
467118,13,6,543,2816,24,35,3,0
870643,12,4,537,2517,23,36,3,0
456293,4,8,800,5142,20,41,1,1
331236,5,8,720,4098,18,43,1,1
164077,21,4,498,2155,14,32,1,0
162443,6,6,658,3730,19,33,1,1
525891,6,8,715,4138,23,42,1,1
561710,24,4,475,1983,16,35,3,0
824683,23,4,499,2044,17,37,3,0
723682,32,4,484,1834,20,33,2,0
325387,18,4,538,3188,22,29,2,0
278317,15,6,570,2724,19,33,1,0
546865,6,8,751,4082,22,39,1,1
612359,23,4,488,1992,19,35,3,0
687886,1,8,761,4616,25,40,1,1
163628,21,4,513,2155,21,37,1,0
542030,17,4,506,2391,25,35,3,0
968465,17,3,498,2263,21,36,3,0
185087,25,4,492,1988,21,31,3,0
846310,19,4,488,2126,25,35,1,0
796712,11,6,599,2989,18,35,1,0
387895,35,4,492,2088,23,28,2,0
717829,3,8,747,4497,24,40,1,1
902524,22,4,491,1969,19,29,3,0
618661,9,6,648,3457,25,34,1,1
321583,29,6,660,3013,15,31,1,1
934822,25,4,511,2245,18,32,3,0
410612,9,8,718,3831,22,32,1,1
775575,3,8,783,4957,19,37