Applied Machine Learning Project – part 2
- Apply machine learning methods using a software package, including understanding and revising existing machine learning programs from a third party.
- Experimentally evaluate and compare different machine learning methods for a given problem, and select the appropriate method(s) considering various assessment criteria.
- Display the results of machine learning methods and propose appropriate improvements to the methods.
The submission should be 1500 words in length (+/- 10%).
Assessment task details
This assignment follows on from assignment 1, in which you selected a suitable open-source dataset and identified which ML techniques to apply to it.
Task
In this applied project, you will carry out the actual analysis of that dataset. Your tasks are as follows:
- Using three machine learning techniques of your choice, build and train classification and/or regression models on the dataset in any suitable programming environment (e.g., Python).
- Compare and contrast the performance of the three machine learning techniques in terms of predictive performance and other criteria, such as validation accuracy, training time, prediction speed, R-squared (R²), mean squared error (MSE) and transparency (where applicable).
- Analyse the confusion (error) matrices and the ROC curves (and AUCs) for all three methods (where applicable).
- Comment on how the hyperparameters, if any, were tuned or optimised to improve the built and trained models.
- Submit a report showing the work carried out.
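The comparison steps above can be sketched in Python with scikit-learn. This is a minimal sketch, not a required approach: it assumes scikit-learn is installed, uses the bundled breast cancer dataset (`load_breast_cancer`) as a stand-in for your own data, and picks logistic regression, a decision tree and k-nearest neighbours purely as three example techniques. For each model it records held-out accuracy, training and prediction time, the confusion matrix, the ROC curve points and the AUC.

```python
import time

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, random_state=42):
    """Train three example classifiers and collect comparison metrics for each."""
    # Hold out 25% of the data, preserving the class balance in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    models = {
        # Logistic regression and k-NN are sensitive to feature scale, so standardise first.
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=random_state),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    }
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        train_time = time.perf_counter() - start

        start = time.perf_counter()
        y_pred = model.predict(X_test)
        predict_time = time.perf_counter() - start

        # Scores for the positive class (label 1) drive the ROC curve and AUC.
        y_score = model.predict_proba(X_test)[:, 1]
        fpr, tpr, _ = roc_curve(y_test, y_score)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "train_time_s": train_time,
            "predict_time_s": predict_time,
            "confusion_matrix": confusion_matrix(y_test, y_pred),
            "roc_curve": (fpr, tpr),
            "auc": roc_auc_score(y_test, y_score),
        }
    return results


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    for name, m in compare_classifiers(X, y).items():
        print(
            f"{name}: accuracy={m['accuracy']:.3f}, AUC={m['auc']:.3f}, "
            f"train={m['train_time_s'] * 1000:.1f} ms, predict={m['predict_time_s'] * 1000:.1f} ms"
        )
        print(m["confusion_matrix"])  # rows: true class, columns: predicted class
```

Timings from a single run on a small dataset are noisy, so repeating each fit a few times gives a steadier estimate. Validation accuracy can also be estimated with k-fold cross-validation (`cross_val_score`) instead of a single train/test split.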
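For the hyperparameter item, one common approach is an exhaustive grid search with cross-validation. The sketch below assumes scikit-learn, tunes a decision tree classifier on the bundled breast cancer dataset as a stand-in, and uses an arbitrary example grid (`PARAM_GRID`); randomised search (`RandomizedSearchCV`) is an alternative when the search space is large.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example search space; the values are illustrative assumptions, not recommendations.
PARAM_GRID = {
    "max_depth": [2, 3, 5, 8, None],
    "min_samples_leaf": [1, 5, 10],
    "criterion": ["gini", "entropy"],
}


def tune_decision_tree(X, y, random_state=42):
    """Grid-search decision tree hyperparameters with 5-fold cross-validation."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=random_state),
        PARAM_GRID,
        cv=5,               # an integer cv gives stratified 5-fold CV for classifiers
        scoring="roc_auc",  # pick the combination with the best mean validation AUC
    )
    # Tuning uses only the training split; the test split is kept for a final check.
    search.fit(X_train, y_train)
    # score() evaluates the refitted best model on the test split with the same scorer.
    test_auc = search.score(X_test, y_test)
    return search.best_params_, search.best_score_, test_auc


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    params, cv_auc, test_auc = tune_decision_tree(X, y)
    print(f"best params: {params}")
    print(f"mean CV AUC: {cv_auc:.3f}, held-out AUC: {test_auc:.3f}")
```

Reporting both the cross-validated score and the held-out score helps show whether tuning genuinely improved the model or merely fitted the validation folds.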
For datasets without labels, classes or categories, you can generate suitable labels using conventional methods if this would be useful. For example, in a credit-scoring dataset, a 24-year-old male who rents, has a large unpaid credit amount on his car and has little money in his checking and savings accounts may be considered to be at “high risk” of defaulting on any additional credit.
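A rule like the credit-risk example above can be written as a small labelling function. This is a hypothetical sketch: the `Applicant` fields, the thresholds (`YOUNG_AGE`, `LARGE_CREDIT`, `LOW_BALANCE`) and the three-of-four rule are illustrative assumptions that would need to be set from your own dataset, and the rule uses only the age, housing and financial attributes from the example.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    age: int
    housing: str            # e.g. "rent", "own", "free"
    credit_amount: float    # outstanding (unpaid) credit
    checking_balance: float
    savings_balance: float


# Hypothetical thresholds for illustration only; derive real ones from your data.
YOUNG_AGE = 25
LARGE_CREDIT = 5000.0
LOW_BALANCE = 200.0


def risk_label(a: Applicant) -> str:
    """Return 'high risk' or 'low risk' using a simple hand-written rule."""
    risk_factors = [
        a.age < YOUNG_AGE,
        a.housing == "rent",
        a.credit_amount >= LARGE_CREDIT,
        a.checking_balance < LOW_BALANCE and a.savings_balance < LOW_BALANCE,
    ]
    # Flag as high risk when at least three of the four factors apply.
    return "high risk" if sum(risk_factors) >= 3 else "low risk"


if __name__ == "__main__":
    applicant = Applicant(age=24, housing="rent", credit_amount=8000.0,
                          checking_balance=50.0, savings_balance=100.0)
    print(risk_label(applicant))
```

Applying `risk_label` to every row of an unlabelled dataset produces a target column that the classification models can then be trained on.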
For this assignment, Python is the preferred programming language. You can build on the sample code provided in the recommended book, Wei-Meng Lee’s Python Machine Learning.
The code you use must be submitted as part of the assignment.
Please refer to the published paper below for an analytical and methodical understanding of how machine learning techniques or algorithms (classification methods ONLY) can be evaluated and compared for a given machine learning problem or task:
A. O. Sangodoyin, M. O. Akinsolu, P. Pillai and V. Grout, "Detection and Classification of DDoS Flooding Attacks on Software-Defined Networks: A Case Study for the Application of Machine Learning," in IEEE Access, vol. 9, pp. 122495-122508, 2021, doi: 10.1109/ACCESS.2021.3109490.
Please refer to the published paper below for an analytical and methodical understanding of how linear regression can be applied to a given machine learning problem or task:
M. O. Akinsolu, A. O. Sangodoyin and U. E. Uyoata, "Behavioral Study of Software-Defined Network Parameters Using Exploratory Data Analysis and Regression-Based Sensitivity Analysis," Mathematics, vol. 10, no. 14, p. 2536, 2022. [Online]. Available: https://www.mdpi.com/2227-7390/10/14/2536
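The R-squared and MSE metrics mentioned in the task can be computed for a linear regression model as sketched below. This sketch assumes scikit-learn and NumPy, and uses synthetic data from a hypothetical helper (`make_synthetic_data`) in place of a real dataset. Comparing coefficients fitted on standardised features is one simple way to gauge how sensitive the target is to each input; it is not necessarily the sensitivity-analysis procedure used in the paper above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def make_synthetic_data(n_samples=500, seed=0):
    """Hypothetical stand-in data: y depends on x0 and x1 but not on x2."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, 3))
    y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n_samples)
    return X, y


def fit_and_evaluate(X, y, random_state=0):
    """Fit linear regression on standardised features; report R², MSE and coefficients."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state
    )
    # Fit the scaler on the training split only to avoid leaking test information.
    scaler = StandardScaler().fit(X_train)
    model = LinearRegression().fit(scaler.transform(X_train), y_train)
    y_pred = model.predict(scaler.transform(X_test))
    return {
        "r2": r2_score(y_test, y_pred),
        "mse": mean_squared_error(y_test, y_pred),
        # Change in y per one-standard-deviation change in each feature,
        # which makes coefficient magnitudes comparable across features.
        "std_coefficients": model.coef_,
    }


if __name__ == "__main__":
    X, y = make_synthetic_data()
    result = fit_and_evaluate(X, y)
    print(f"R^2 = {result['r2']:.3f}, MSE = {result['mse']:.3f}")
    print("standardised coefficients:", np.round(result["std_coefficients"], 2))
```

On this synthetic data the coefficient for the unused feature x2 should come out close to zero, which is the kind of pattern a sensitivity analysis looks for in real data.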
Submission instructions
Submit a single Word or PDF document containing your assignment and a reference list. Any additional material should be placed in an appendix at the end of the report. All references must be presented using the IEEE referencing format.
All submitted work is expected to observe academic standards in terms of referencing, academic writing, use of language, etc. Failure to adhere to these instructions may result in your work being awarded a lower grade than it would otherwise deserve.