Exploratory data analysis can be described as our effort to understand and become familiar with our data before engaging with it further in our analysis. It usually involves checking the data, counting and calculating frequencies and ranges, tabulating, and visualizing. Visualization is particularly important, as it helps us discover more features and see what is available and what is missing.
In our analysis we will use the Iris dataset, which is very well known among learners of data analysis.
This dataset contains 150 instances, and it is a good example of how data analysis can be used for classification tasks. Using the information about petal and sepal length and width, we can classify the Iris flowers into varieties.
The first step in our analysis is to examine the data by tabulating it. This allows us to observe its basic characteristics. We can learn a lot about our data just by looking briefly at a few lines.
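As a rough illustration of this first look, the sketch below previews the first few rows of a small table. It is not the notebook's own code: the `irisSample` array is only a handful of illustrative rows in the shape of the Iris data, and the field names and the `head` helper are assumptions made for this example.

```javascript
// Illustrative sample only: a few rows in the shape of the Iris dataset.
// Field names are assumptions, not taken from the original notebook.
const irisSample = [
  { sepalLength: 5.1, sepalWidth: 3.5, petalLength: 1.4, petalWidth: 0.2, variety: "setosa" },
  { sepalLength: 4.9, sepalWidth: 3.0, petalLength: 1.4, petalWidth: 0.2, variety: "setosa" },
  { sepalLength: 7.0, sepalWidth: 3.2, petalLength: 4.7, petalWidth: 1.4, variety: "versicolor" },
  { sepalLength: 6.4, sepalWidth: 3.2, petalLength: 4.5, petalWidth: 1.5, variety: "versicolor" },
  { sepalLength: 6.3, sepalWidth: 3.3, petalLength: 6.0, petalWidth: 2.5, variety: "virginica" },
];

// Hypothetical helper: return the first n rows, like a "head" preview.
function head(rows, n = 5) {
  return rows.slice(0, n);
}

// The column names come from the keys of the first row.
const columns = Object.keys(irisSample[0]);
const preview = head(irisSample, 3);

console.log("Columns:", columns.join(", "));
console.table(preview); // prints the preview rows as a table
```

Even a preview this small shows which columns are numeric measurements and which one holds the variety label.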
Counting and visualizing the data tells us how many observations we have from each group. This interactive bar graph also lets us share more information about the data. Our data has 50 observations from each variety.
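The counting step behind such a bar graph can be sketched as follows. This is a minimal example, assuming rows with a `variety` field; the `sampleRows` array and the `countByVariety` function are hypothetical names introduced here, and the sample holds only a few rows rather than the full 150.

```javascript
// Illustrative sample only: variety labels for a few rows.
// The full dataset described in the text has 50 rows per variety.
const sampleRows = [
  { variety: "setosa" },
  { variety: "setosa" },
  { variety: "setosa" },
  { variety: "versicolor" },
  { variety: "versicolor" },
  { variety: "versicolor" },
  { variety: "virginica" },
  { variety: "virginica" },
  { variety: "virginica" },
];

// Hypothetical helper: count how many rows belong to each variety.
function countByVariety(rows) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row.variety, (counts.get(row.variety) ?? 0) + 1);
  }
  return counts;
}

const varietyCounts = countByVariety(sampleRows);
for (const [variety, count] of varietyCounts) {
  console.log(`${variety}: ${count}`);
}
```

A `Map` keeps the varieties in the order they first appear, which is convenient when the counts are passed on to a bar chart.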
Visualizing sepal length against sepal width with a scatter plot tells us we are on track. We observe some separation of the varieties according to sepal length and width. The "setosa" variety in particular can be classified easily. However, the other two varieties are somewhat mixed. To learn more about the data, hover over the data points.
Voilà! The varieties can be easily identified by their petal lengths. Our data exploration helped us identify the important feature.
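One simple way to check this kind of separation numerically is to compare the range of petal lengths per variety. The sketch below assumes rows with `variety` and `petalLength` fields; the `petalSample` array and the `petalLengthRanges` function are hypothetical, and the sample is far too small to stand in for the full dataset.

```javascript
// Illustrative sample only: petal lengths (cm) for a few rows per variety.
const petalSample = [
  { variety: "setosa", petalLength: 1.4 },
  { variety: "setosa", petalLength: 1.4 },
  { variety: "setosa", petalLength: 1.3 },
  { variety: "versicolor", petalLength: 4.7 },
  { variety: "versicolor", petalLength: 4.5 },
  { variety: "versicolor", petalLength: 4.9 },
  { variety: "virginica", petalLength: 6.0 },
  { variety: "virginica", petalLength: 5.1 },
  { variety: "virginica", petalLength: 5.9 },
];

// Hypothetical helper: minimum and maximum petal length for each variety.
function petalLengthRanges(rows) {
  const ranges = {};
  for (const { variety, petalLength } of rows) {
    const range = ranges[variety] ?? (ranges[variety] = { min: Infinity, max: -Infinity });
    range.min = Math.min(range.min, petalLength);
    range.max = Math.max(range.max, petalLength);
  }
  return ranges;
}

const ranges = petalLengthRanges(petalSample);
for (const [variety, { min, max }] of Object.entries(ranges)) {
  console.log(`${variety}: ${min} to ${max} cm`);
}
```

In this tiny sample the three ranges do not overlap. Running the same check on all 150 rows would show how cleanly the varieties separate in the real data.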
Haluk Bingol: navigation was a life saver.
Fellows and friends who generously answered my questions even at very late hours.
Indian YouTubers
Some sources I benefited from
For the idea of creating a hexadecimal colour based on a string with JavaScript