The purpose of this project is to familiarize you with basic concepts of statistics and data
analysis. The exercise will help you develop an understanding of data analysis through
data visualization and data summaries (descriptive statistics).
You will conduct a basic analysis of a provided dataset to identify and understand
the distributions of its variables and their relationship to death events. The dataset
includes a collection of variables that may contribute to the death of a patient.
Using Microsoft Excel, you will analyze and visualize the data and present your
results in a report, providing explanations where applicable.
Follow the steps below to conduct your analysis and compose your report. Be sure to
include the charts you produce in Excel in your report.
Directions
1. Classify each variable by its data type (numerical or categorical). (4
points)
2. Calculate the minimum, maximum, mean, mode, standard deviation and variance
for all numerical variables. (4 points)
a. Plot a frequency histogram for each numerical variable to show its
distribution. (4 points)
b. Describe the conclusions you can draw from the histograms and
statistics. (4 points)
3. Identify all categories of all categorical variables. (4 points)
a. Use a bar graph to plot the distribution of categories (the count of each
category) for every categorical variable. Compare the counts and comment on
what you observe. (4 points)
b. Identify the leading cause of death. (Hint: Plot and compare the number of
deaths caused by diabetes, high blood pressure, and smoking.) (4
points)
MTH 315 – Project 1 Instructions
4. Do age and sex have any influence on the cause of death in this dataset?
Explain. (4 points)
5. Compare the distribution of each numerical variable between patients who died
and patients who survived. Use appropriate graphs. (Hint: For example, compare
the distribution of platelet counts for patients who died with that for patients
who stayed alive.) (4 points)
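
The project itself is to be completed in Excel, but the statistics in step 2 and the grouping in step 5 can be cross-checked with a short script. The following is a minimal Python sketch, not a required part of the assignment. The column names `age` and `death_event` and the five sample rows are made up for illustration and are not taken from the provided dataset. The sketch uses the sample standard deviation and sample variance (the counterparts of Excel's STDEV.S and VAR.S); the directions do not say whether the sample or population versions are expected, so that choice is an assumption.

```python
import statistics
from collections import defaultdict


def describe(values):
    """Return the six summary statistics requested in step 2 for one numerical column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "mode": statistics.mode(values),
        # Sample versions (divide by n - 1); use pstdev/pvariance for population versions.
        "stdev": statistics.stdev(values),
        "variance": statistics.variance(values),
    }


def split_by_outcome(rows, value_key, outcome_key):
    """Group one numerical column by the value of the outcome column (step 5)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[outcome_key]].append(row[value_key])
    return dict(groups)


# Hypothetical rows: made-up values and column names, NOT the project dataset.
rows = [
    {"age": 50, "death_event": 0},
    {"age": 60, "death_event": 0},
    {"age": 60, "death_event": 1},
    {"age": 70, "death_event": 1},
    {"age": 80, "death_event": 1},
]

ages = [row["age"] for row in rows]
summary = describe(ages)
by_outcome = split_by_outcome(rows, "age", "death_event")

print(summary)
for outcome, values in by_outcome.items():
    print(outcome, describe(values) if len(values) > 1 else values)
```

In Excel, the same six statistics are available through the MIN, MAX, AVERAGE, MODE.SNGL, STDEV.S, and VAR.S functions. Comparing their results with the script output on a few columns is a quick way to catch a mis-selected cell range.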