Exploratory Data Analysis with the Iris Dataset

Exploratory data analysis is our effort to understand and become familiar with our data before engaging with it further in our analysis. It usually involves checking the data, counting and calculating frequencies and ranges, tabulating, and visualizing. Visualization is particularly important because it helps us discover more features of the data and see what is available and what is missing.

In our analysis we will use the Iris dataset, which is very well known among learners of data analysis.

Iris Dataset

This dataset contains 150 instances and is a good example of how data analysis can be used for classification tasks. Using the information about petal and sepal length and width, we can classify the Iris flowers into varieties.

Overview of the Data

The first step in our analysis is to examine the data by tabulating it. This lets us observe its basic characteristics. We can learn a lot about our data just by looking briefly at a few lines.
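As a minimal sketch of this tabulating step, the snippet below parses a small excerpt of the data and prints its first few lines as an aligned table. It uses only the Python standard library. The column names, the helper names `load_rows` and `head`, and the choice of a six-row excerpt (two rows per variety) are assumptions made for illustration; the original analysis may well have used a different tool and the full 150 rows.

```python
import csv
import io

# Six-row excerpt of the Iris data (two rows per variety), embedded so the
# sketch runs without downloading anything. Column names are assumed.
IRIS_EXCERPT = """sepal_length,sepal_width,petal_length,petal_width,variety
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per observation."""
    return list(csv.DictReader(io.StringIO(text)))


def head(rows, n=5):
    """Return the header and the first n rows as an aligned text table."""
    columns = list(rows[0].keys())
    lines = ["  ".join(f"{c:>12}" for c in columns)]
    for row in rows[:n]:
        lines.append("  ".join(f"{row[c]:>12}" for c in columns))
    return "\n".join(lines)


rows = load_rows(IRIS_EXCERPT)
print(head(rows, n=3))
```

Even a few lines like these show the basic shape of the data: four numeric measurements and one label per flower.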

Overview of the Data - Counting and Visualizing

Counting and visualizing the data tells us how many observations we have from each group. This interactive bar graph also lets us share more information about the data. Our data shows 50 observations from each variety.
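The counting step can be sketched with `collections.Counter` from the standard library. The list `varieties` below is a stand-in for the variety column, built to match the 50 observations per variety reported above. The helper `text_bar_chart` is a hypothetical plain-text substitute for the interactive bar graph, not the chart used in the original analysis.

```python
from collections import Counter

# Stand-in for the variety column: 50 observations per variety, as reported
# in the text. With real data this list would come from the loaded rows.
varieties = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50

# Count how many observations belong to each variety.
counts = Counter(varieties)


def text_bar_chart(counts, scale=5):
    """Render one line per group; each '#' stands for `scale` observations."""
    width = max(len(name) for name in counts)
    return "\n".join(
        f"{name:<{width}}  {'#' * (n // scale)} {n}"
        for name, n in sorted(counts.items())
    )


print(text_bar_chart(counts))
```

Equal bar lengths make it easy to see that the groups are balanced.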

Overview of the Data - Visualizing According to the Sepal Length and Width

Visualizing sepal length against sepal width with a scatter plot tells us we are on track. We observe some separation of the varieties according to sepal length and width. The "setosa" variety in particular can be classified easily. However, the other two varieties are somewhat mixed. To learn more about the data, hover over the data points.
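To illustrate how such a plot is put together, here is a rough sketch that groups the points by variety and draws sepal length against sepal width on a character grid. It is a plain-text stand-in for the interactive scatter plot, not a reproduction of it. The sample holds only the first five observations of each variety, and the names `SAMPLE`, `MARKERS`, `group_by_variety`, and `text_scatter` are assumptions made for this example. An excerpt this small is far too limited to support conclusions about the full dataset.

```python
from collections import defaultdict

# First five observations of each variety: (sepal length, sepal width, variety).
# A small excerpt only; the full dataset has 150 rows.
SAMPLE = [
    (5.1, 3.5, "setosa"), (4.9, 3.0, "setosa"), (4.7, 3.2, "setosa"),
    (4.6, 3.1, "setosa"), (5.0, 3.6, "setosa"),
    (7.0, 3.2, "versicolor"), (6.4, 3.2, "versicolor"),
    (6.9, 3.1, "versicolor"), (5.5, 2.3, "versicolor"),
    (6.5, 2.8, "versicolor"),
    (6.3, 3.3, "virginica"), (5.8, 2.7, "virginica"),
    (7.1, 3.0, "virginica"), (6.3, 2.9, "virginica"),
    (6.5, 3.0, "virginica"),
]

# Hypothetical one-letter markers used on the text grid.
MARKERS = {"setosa": "s", "versicolor": "v", "virginica": "g"}


def group_by_variety(sample):
    """Split observations into one list of (length, width) points per variety."""
    groups = defaultdict(list)
    for length, width, variety in sample:
        groups[variety].append((length, width))
    return dict(groups)


def text_scatter(sample, cols=30, rows=10):
    """Draw sepal length (x axis) against sepal width (y axis) as text.

    Points that land in the same cell overwrite each other.
    """
    xs = [point[0] for point in sample]
    ys = [point[1] for point in sample]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * cols for _ in range(rows)]
    for length, width, variety in sample:
        col = round((length - x_min) / (x_max - x_min) * (cols - 1))
        # Row 0 is the top of the plot, so larger widths get smaller rows.
        row = round((y_max - width) / (y_max - y_min) * (rows - 1))
        grid[row][col] = MARKERS[variety]
    return "\n".join("".join(line).rstrip() for line in grid)


groups = group_by_variety(SAMPLE)
print(text_scatter(SAMPLE))
```

In the printed grid the `s` markers sit apart in the upper left (short, wide sepals), while the `v` and `g` markers share the right-hand side, which mirrors the pattern described above.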