practical_exam_work_18IT037

4 min read · Nov 18, 2021

Task:

Dataset used: https://archive.ics.uci.edu/ml/machine-learning-databases/00383/

Task 1:

Dataset description using the Orange tool.
What needs to be done to improve the classification accuracy on the given dataset? Get the maximum possible classification accuracy by applying the following methods.
→Pre-processing
o Encoding
o Normalization
o Missing value handling
o Feature Selection

Compare your accuracy with and without applying pre-processing steps. Perform the Classification and visualize accuracy before and after preprocessing in Orange/Python.

Task 2:
Generate the dashboard of the preprocessed dataset from Task 1.
Find the maximum data insights by plotting a Bar chart, Box plot, Pie plot and Stack plot using Power BI dashboard visualization.

Following answers need to be submitted in a single PDF file:
1. Provide a screen shot of data description and explain in brief.
2. Provide screen shot(s) of data pre-processing steps showing its significance.
3. Provide a screen shot showing accuracy before and after pre-processing.
4. Provide a screen shot of PowerBI dashboard with description.

SOLUTION

1. The dataset provided to us covers risk factors of cervical cancer and is originally hosted at https://archive.ics.uci.edu/ml/machine-learning-databases/00383/. It deals with different diseases and their possible causes. The causes include smoking and its related values, hormonal contraceptives, first sexual intercourse and number of sexual partners; other fields are Age, IUD and STD. Under STD there are many diseases, such as HIV, condylomatosis, cervical condylomatosis, syphilis, genital herpes, HPV, hepatitis B, AIDS and cancer; some of these are recorded under the Dx fields.
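As a rough Python counterpart to the data description step, the sketch below counts missing values per column. It assumes that missing entries are marked with `?` in the CSV file. The three-row sample and the helper name `missing_counts` are made up for illustration and are not taken from the real file.

```python
import csv
import io

# Tiny made-up sample; the values are illustrative, not real records.
SAMPLE_CSV = """Age,Smokes,IUD,STD:HIV
18,0,?,0
34,?,0,0
52,1,?,1
"""

def missing_counts(csv_text, missing_marker="?"):
    """Return {column name: number of cells equal to the missing marker}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value.strip() == missing_marker:
                counts[name] += 1
    return counts

counts = missing_counts(SAMPLE_CSV)
for name, n in counts.items():
    print(f"{name}: {n} missing")
```

For the real file, the same function could be fed the text of the downloaded CSV instead of `SAMPLE_CSV`.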

Now we will preprocess our data using the Orange3 tool. I have created the following data flow, which classifies the target variable ‘STD:HIV’ in the dataset using 4 parallel methods.

The dataflow

Here is the data table of the given dataset, without clearing missing values.

Data Table

Here is the precision of the simple method.

Accuracy score of the simple method
Select columns
Impute the data
Accuracy using Encoding
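The post does not say which encoding the workflow applied. As a minimal sketch, assuming one-hot encoding of a categorical column, the idea looks like this in Python. The function name `one_hot` and the sample values are hypothetical.

```python
def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector.

    Returns the sorted list of categories and one vector per input value.
    """
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, vectors

# Made-up categorical column for illustration.
cats, rows = one_hot(["no", "yes", "no", "unknown"])
print(cats)
for row in rows:
    print(row)
```

Each category becomes its own 0/1 column, so a classifier does not read an artificial ordering into the category labels.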

2. Now I will use the other path, where I have applied some preprocessing. This is the unenclosed area of the data flow in the image below.

Here, we first eliminate the missing values using the Preprocess widget, then send the result to a data table. Since there are now no missing values, we can calculate our precision.
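The missing value step can also be sketched in plain Python. This assumes that missing entries are marked with `?` and that they are replaced with the column average. Which imputation option was chosen in the Preprocess widget is not stated in the post, so the mean is only an assumption, and `impute_mean` is a hypothetical helper name.

```python
from statistics import mean

def impute_mean(values, missing_marker="?"):
    """Replace missing markers in a numeric column with the mean of the known values."""
    known = [float(v) for v in values if v != missing_marker]
    if not known:
        raise ValueError("column has no known values to average")
    fill = mean(known)
    return [fill if v == missing_marker else float(v) for v in values]

# Illustrative column with two missing entries (made-up values).
column = ["2", "?", "4", "?", "6"]
imputed = impute_mean(column)
print(imputed)
```

After this step the column contains only numbers, which is the precondition for scoring the classifier on complete data.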

Preprocess fields
Selected columns for the preprocess method
Accuracy of the preprocessing method
Using continuous fields
Selected columns
There are no missing values in this method
Accuracy using the Normalization method
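For the normalization path, a minimal Python sketch follows. It assumes min-max scaling to the interval [0, 1]; the post does not state which normalization option was selected in Orange. The name `min_max_normalize` and the sample ages are illustrative.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up ages for illustration.
ages = [18, 26, 34, 50]
scaled = min_max_normalize(ages)
print(scaled)
```

Scaling puts features such as age and number of partners on a comparable range, which matters most for distance-based classifiers.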

3. The snapshot below shows the difference between my simple method and my pre-processing method. The output shows that the CA, F1-score, Precision and Recall are better than pre-processed data.
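To make the compared scores concrete, here is a hedged sketch of how CA (classification accuracy), Precision, Recall and F1 can be computed for one positive class. The labels are made up, `binary_scores` is a hypothetical helper, and Orange may report these scores as averages over classes depending on its settings, so the numbers need not match the screenshot.

```python
def binary_scores(y_true, y_pred, positive=1):
    """Compute CA, Precision, Recall and F1 for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    ca = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"CA": ca, "Precision": precision, "Recall": recall, "F1": f1}

# Made-up true and predicted labels (1 = positive for 'STD:HIV').
y_true = [0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [0, 0, 0, 1, 1, 0, 0, 0]
scores = binary_scores(y_true, y_pred)
print(scores)
```

Running the same function on the predictions of the simple path and of the pre-processed path gives two dictionaries that can be compared score by score.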

Now I have generated a dashboard for the dataset using Power BI. To generate the dashboard, we first build a report and then publish it to a workspace.

The report contains graphs such as a Pie Chart, Scatter Chart, Box Plot, Area Chart and Stacked Bar Chart.

Report page
Scatter chart
Bar chart
Pie Chart
Area chart
Box plot

4. After publishing the report to Power BI, we log in to app.powerbi.com and navigate to the dashboard, where we pin the charts to the dashboard as below:

Publish Report

That is all of the task I have done.

Thank you

Gohil Rushabh Navinchandra

Written by Gohil Rushabh Navinchandra

Hello everyone, I am an IT student from Charusat University.