Dataset used:

Task 1:

Dataset Description using Orange tool.
What needs to be done to improve the classification accuracy on the given dataset? Get the maximum classification accuracy possible by applying the following methods.
o Encoding
o Normalization
o Missing value handling
o Feature Selection

Compare your accuracy with and without applying pre-processing steps. Perform the Classification and visualize accuracy before and after preprocessing in Orange/Python.

Generate the dashboard of the preprocessed dataset from Task 1.
Find the maximum data insights by plotting a bar chart, box plot, pie chart, and stacked plot using Power BI dashboard visualization.

Following answers need to be submitted in a single PDF file:
1. Provide a screen shot of data description and explain in brief.
2. Provide screen shot(s) of data pre-processing steps showing its significance.
3. Provide a screen shot showing accuracy before and after pre-processing.
4. Provide a screen shot of PowerBI dashboard with description.


1. The dataset provided to us is on risk factors for cervical cancer, originally hosted on, and it relates different diseases to their possible causes. The causes include smoking (recorded with several different values), hormonal contraceptives, age at first sexual intercourse, and number of sexual partners; other fields are Age, IUD, and STD. Under STD there are many diseases, such as HIV, condylomatosis, cervical condylomatosis, syphilis, genital herpes, HPV, hepatitis B, AIDS, and Cancer. Some of these are under the DX field.

Now we will preprocess our data using the Orange3 tool. I have created the following data flow, which classifies the target variable ‘STD:HIV’ in the dataset using 4 parallel methods.

The dataflow

Here is the data table of the given dataset, before clearing missing values.

Data Table

Here is the precision of the simple method.

Accuracy score of the simple method
Select columns
Impute the data
Accuracy using Encoding
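
To make the encoding step more concrete, here is a minimal pure-Python sketch of one-hot encoding, one common way to turn a categorical column into numeric indicator columns. In Orange this kind of conversion is usually done with the Continuize widget; whether that is the exact widget and setting used in this workflow is an assumption. The function name `one_hot_encode` and the sample values are made up for illustration.

```python
def one_hot_encode(column):
    """Return (category order, one 0/1 indicator row per value)."""
    # Sort the distinct values so the column order is reproducible.
    categories = sorted(set(column))
    rows = [[1 if value == category else 0 for category in categories]
            for value in column]
    return categories, rows


# Hypothetical categorical column, not taken from the dataset.
categories, rows = one_hot_encode(["no", "yes", "no"])
print(categories)  # ['no', 'yes']
print(rows)        # [[1, 0], [0, 1], [1, 0]]
```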

2. Now I will use the other path, where I applied some preprocessing. It is the unenclosed area of the data flow in the image below.

Here, we first remove the missing values using the Preprocess widget, and then send the result to a data table. Since there are now no missing values, we can calculate our precision.
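
For readers who prefer Python to the Orange widget, the missing-value step can be sketched roughly as below. This is only an illustration under assumptions: it uses mean imputation (the write-up does not say which strategy was selected in the widget), it assumes missing entries are marked with "?", and the function name `impute_mean` and the sample column are hypothetical.

```python
from statistics import mean


def impute_mean(column, missing="?"):
    """Replace missing markers in a numeric column with the mean of the known values."""
    known = [float(value) for value in column if value != missing]
    fill = mean(known)
    return [fill if value == missing else float(value) for value in column]


# Hypothetical column with one missing entry marked as "?".
print(impute_mean(["15", "?", "17", "16"]))  # [15.0, 16.0, 17.0, 16.0]
```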

Preprocess fields
Selected column for preprocess method
Accuracy of preprocessing method
Using continuous fields
Selected columns
No missing values in this method
Accuracy using Normalization method
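
As a rough idea of what the normalization step does, the sketch below rescales a numeric column to the range 0 to 1 (min-max normalization). This is one common option and is assumed here for illustration; the exact normalization setting used in the workflow is not stated, and the function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly so the smallest maps to 0 and the largest to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column carries no spread, so map every value to 0.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


# Hypothetical ages, not taken from the dataset.
print(min_max_normalize([18, 34, 50]))  # [0.0, 0.5, 1.0]
```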

3. The snapshot below shows the difference between my simple method and my pre-processing method. The output shows that the CA, F1-score, Precision and Recall are better with the pre-processed data.
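
The four scores compared here can be computed by hand for a binary target, which helps when reading the screenshot. The sketch below is a simplified illustration: it scores a single positive class, whereas Orange may report values averaged over classes depending on the target-class setting, so its numbers can differ. The function name `binary_scores` and the labels are made up.

```python
def binary_scores(y_true, y_pred, positive=1):
    """Return CA (accuracy), precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    ca = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"CA": ca, "Precision": precision, "Recall": recall, "F1": f1}


# Hypothetical labels: 1 = positive case, 0 = negative case.
y_true = [1, 0, 0, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 0]
print(binary_scores(y_true, y_pred))
# {'CA': 0.75, 'Precision': 0.5, 'Recall': 0.5, 'F1': 0.5}
```

Running the same function on the predictions from the simple path and from the pre-processed path gives a like-for-like comparison of the two.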

Now I have generated a dashboard for the dataset using Power BI. To generate the dashboard, we first make a report and then publish it to the workspace.

The report contains graphs such as a pie chart, scatter chart, box plot, area chart, and stacked bar chart.

Report page
Scatter chart
Bar chart
Pie Chart
Area chart
Box plot

4. After publishing the report to Power BI, we log in and navigate to the dashboard, where we pin the charts to the dashboard as shown below:

Publish Report

That is all of the tasks I have done.

Thank you


