Dataset used: https://archive.ics.uci.edu/ml/machine-learning-databases/00383/
Dataset Description using Orange tool.
What needs to be done to improve the classification accuracy on the given dataset? Achieve the maximum possible classification accuracy by applying the following methods.
o Missing value handling
o Feature Selection
Compare your accuracy with and without applying pre-processing steps. Perform the Classification and visualize accuracy before and after preprocessing in Orange/Python.
Generate a dashboard of the preprocessed dataset from task 1.
Extract the maximum data insights by plotting a bar chart, box plot, pie chart, and stacked plot using Power BI dashboard visualization.
The following answers need to be submitted in a single PDF file:
1. Provide a screenshot of the data description and explain it in brief.
2. Provide screenshot(s) of the data pre-processing steps, showing their significance.
3. Provide a screenshot showing the accuracy before and after pre-processing.
4. Provide a screenshot of the Power BI dashboard with a description.
- The dataset provided to us concerns risk factors for cervical cancer and is originally hosted at https://archive.ics.uci.edu/ml/machine-learning-databases/00383/. It records possible causes of the disease, such as smoking (with several related values), hormonal contraceptives, age at first sexual intercourse, and number of sexual partners. Other attributes include age, IUD, and STDs. Under STDs there are many individual diseases, such as HIV, condylomatosis, cervical condylomatosis, syphilis, genital herpes, HPV, hepatitis B, and AIDS. Some attributes, such as cancer, fall under the Dx fields.
Next, we preprocess the data using the Orange3 tool. I created the following data flow, which classifies the target variable ‘STD:HIV’ in the dataset using 4 parallel methods.
Here is the data table of the given dataset, before the missing values are handled.
Here is the precision of the simple method (without preprocessing):
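To see how many values are missing before any cleaning, the columns can also be checked in Python. The following is a minimal sketch, assuming the CSV marks missing entries with `?`; the sample rows, the column names, and the `count_missing` helper are illustrative only and are not taken from the actual file.

```python
import csv
import io

# Hypothetical sample rows; the real file has many more columns and rows.
SAMPLE_CSV = """Age,Smokes,IUD,STD:HIV
18,0,?,0
15,?,0,0
34,1,0,1
52,?,?,0
"""

def count_missing(csv_text, missing_token="?"):
    """Return {column name: number of cells equal to missing_token}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value.strip() == missing_token:
                counts[name] += 1
    return counts

missing = count_missing(SAMPLE_CSV)
for column, n in missing.items():
    print(f"{column}: {n} missing")
```

Columns with a large share of missing cells are the ones where the choice of missing-value handling matters most.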
2. Next, I use the other path, where some preprocessing is applied. This is the unenclosed area of the data flow in the image below.
First, we remove the missing values using the Preprocess widget, then we send the result to a Data Table. Now that there are no missing values, we calculate the precision.
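The report does not say which missing-value option was chosen in the Preprocess widget, so the sketch below shows two common strategies side by side: removing incomplete rows and filling gaps with the column mean. It assumes missing cells have already been parsed to `None`; the rows and the helper names `drop_incomplete` and `impute_mean` are hypothetical.

```python
from statistics import mean

MISSING = None  # assumption: missing cells were already parsed to None

def drop_incomplete(rows):
    """Keep only the rows that contain no missing cells."""
    return [row for row in rows if MISSING not in row]

def impute_mean(rows):
    """Replace each missing cell with the mean of its column's known values."""
    fills = []
    for col in zip(*rows):
        known = [v for v in col if v is not MISSING]
        # Fall back to 0.0 if a column has no known values at all.
        fills.append(mean(known) if known else 0.0)
    return [
        [fills[i] if v is MISSING else v for i, v in enumerate(row)]
        for row in rows
    ]

# Hypothetical rows: age, smokes, number of sexual partners.
rows = [
    [18, 0.0, None],
    [15, None, 1.0],
    [34, 1.0, 3.0],
    [52, None, 2.0],
]

dropped = drop_incomplete(rows)
imputed = impute_mean(rows)
print(len(rows), "rows before,", len(dropped), "after dropping incomplete rows")
print(imputed)
```

Dropping rows is simple but can discard most of the data when many rows have at least one gap, while imputation keeps every row at the cost of introducing estimated values.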
3. The snapshot below shows the difference between my simple method and the pre-processing method. The output shows that CA, F1-score, Precision, and Recall are better for the pre-processed data.
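For reference, the four scores compared here can be computed from true and predicted labels as in the sketch below. It is a simplified binary version for a single positive class; Orange may report values averaged over classes, so its numbers need not match this calculation exactly. The labels are made up for illustration and are not results from this dataset.

```python
def binary_scores(y_true, y_pred, positive=1):
    """Return CA, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    ca = correct / len(pairs)  # classification accuracy
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"CA": ca, "Precision": precision, "Recall": recall, "F1": f1}

# Hypothetical labels for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [0, 0, 0, 1, 1, 0, 0, 0]

scores = binary_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

When the positive class is rare, CA alone can look high even if precision and recall for that class are poor, which is why all four scores are compared.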
Next, I generated a dashboard for the dataset using Power BI. To generate the dashboard, we first create a report and then publish it to the workspace.
The report contains charts such as a pie chart, scatter chart, box plot, area chart, and stacked bar chart.
4. After publishing the report to Power BI, we log in to app.powerbi.com, navigate to the dashboard, and pin the charts to it, as shown below:
Those are all the tasks I have completed.