Step-by-step description

Exploratory Data Analysis with Iris Dataset

Exploratory data analysis is the effort to understand and become familiar with our data before engaging with it further in our analysis. It usually involves checking the data, counting and calculating frequencies and ranges, tabulating, and visualizing. Visualization is particularly important because it helps us discover more features and see what is available and what is missing.

In our analysis we will use the Iris dataset, which is very well known among learners of data analysis.

Iris Dataset

This dataset contains 150 instances and is a good example of how data analysis can be used for classification tasks. Using the information about petal and sepal length and width, we can classify the Iris flowers into varieties.

Overview of the Data

The first step in our analysis is to examine the data by tabulating it. This allows us to observe its basic characteristics. We can learn a lot about our data just by looking briefly at a few lines.
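As a minimal sketch of this tabulating step, the Python snippet below parses a small CSV excerpt and prints its first few rows. The six rows are an excerpt of the Iris dataset included only for illustration, and the column names and the `load_rows` and `preview` helpers are assumptions, not the code behind this page.

```python
import csv
import io

# Six rows excerpted from the Iris dataset (two per variety), for illustration only.
# The column names are an assumption; the original file may label them differently.
IRIS_SAMPLE = """sepal_length,sepal_width,petal_length,petal_width,variety
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per flower."""
    return list(csv.DictReader(io.StringIO(text)))


def preview(rows, n=5):
    """Return the header plus the first n rows as right-aligned text lines."""
    columns = list(rows[0].keys())
    lines = ["  ".join(f"{c:>12}" for c in columns)]
    for row in rows[:n]:
        lines.append("  ".join(f"{row[c]:>12}" for c in columns))
    return lines


rows = load_rows(IRIS_SAMPLE)
print("\n".join(preview(rows, n=3)))
```

Even a three-line preview like this shows the four numeric measurements and the variety label that the rest of the analysis relies on.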

Overview of the Data - Counting and Visualizing

Figure: A bar chart showing data points per Iris type.

Counting and visualizing the data tells us how many observations we have from each group. This interactive bar graph also lets us share more information about the data. Our data has 50 observations from each variety.
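The counting step can be sketched in a few lines of Python. The label list below is a hypothetical stand-in for the variety column, built from the counts stated above (150 rows, 50 per variety), and `text_bar_chart` is an illustrative helper rather than the chart code used on this page.

```python
from collections import Counter

# Hypothetical stand-in for the variety column: 150 labels, 50 per variety,
# matching the counts reported for the dataset.
varieties = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50


def count_by_variety(labels):
    """Count how many observations belong to each variety."""
    return Counter(labels)


def text_bar_chart(counts, scale=5):
    """Render one line per variety, with one '#' per `scale` observations."""
    return [
        f"{name:<12}{'#' * (n // scale)} {n}"
        for name, n in sorted(counts.items())
    ]


counts = count_by_variety(varieties)
for line in text_bar_chart(counts):
    print(line)
```

Equal bar lengths confirm that the classes are balanced, so later comparisons between varieties are not skewed by group size.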

Overview of the Data - Visualizing According to the Sepal Length and Width

Figure: A scatter plot showing the relationship between sepal length (x-axis) and sepal width (y-axis).

Visualizing sepal length and width with a scatter plot tells us we are on track. We observe some separation of the varieties according to sepal length and width. The "setosa" variety in particular can be classified easily. However, the other two varieties are somewhat mixed. To learn more about the data, hover over the data points.
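A rough numeric check of this impression is to compare the per-variety ranges of a measurement. The sketch below does that for sepal length on fifteen rows excerpted from the dataset (five per variety). The helper names are assumptions, and with so few rows the ranges are narrower than on the full 150 rows, so treat the result as an illustration only.

```python
# Fifteen rows excerpted from the Iris dataset (five per variety), for
# illustration only: (sepal_length, sepal_width, variety).
SEPAL_SAMPLE = [
    (5.1, 3.5, "setosa"), (4.9, 3.0, "setosa"), (4.7, 3.2, "setosa"),
    (4.6, 3.1, "setosa"), (5.0, 3.6, "setosa"),
    (7.0, 3.2, "versicolor"), (6.4, 3.2, "versicolor"), (6.9, 3.1, "versicolor"),
    (5.5, 2.3, "versicolor"), (6.5, 2.8, "versicolor"),
    (6.3, 3.3, "virginica"), (5.8, 2.7, "virginica"), (7.1, 3.0, "virginica"),
    (6.3, 2.9, "virginica"), (6.5, 3.0, "virginica"),
]


def ranges_by_variety(samples, column):
    """Map each variety to the (min, max) of the chosen column (0 or 1)."""
    ranges = {}
    for sample in samples:
        value, variety = sample[column], sample[-1]
        low, high = ranges.get(variety, (value, value))
        ranges[variety] = (min(low, value), max(high, value))
    return ranges


def overlaps(a, b):
    """True when two (min, max) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


length_ranges = ranges_by_variety(SEPAL_SAMPLE, column=0)
print(length_ranges)
print("setosa vs versicolor overlap:",
      overlaps(length_ranges["setosa"], length_ranges["versicolor"]))
print("versicolor vs virginica overlap:",
      overlaps(length_ranges["versicolor"], length_ranges["virginica"]))
```

In this excerpt the setosa range stands apart while the versicolor and virginica ranges overlap, which mirrors the mixing seen in the scatter plot.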

Overview of the Data - Visualizing According to the Petal Length and Width

Figure: A scatter plot showing the relationship between petal length (x-axis) and petal width (y-axis).

Voilà! The varieties can be easily identified by their petal lengths. Our data exploration helped us identify the important feature.
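To illustrate how far petal length alone can go, here is a minimal threshold rule in Python. The two cut-offs (2.5 and 5.0) are assumptions picked by eye rather than values from this page, and the fifteen rows are an excerpt of the dataset. On the full 150 rows the versicolor and virginica petal lengths overlap slightly, so a single cut would not be perfect there.

```python
# Fifteen petal lengths excerpted from the Iris dataset (five per variety),
# for illustration only: (petal_length, variety).
PETAL_SAMPLE = [
    (1.4, "setosa"), (1.4, "setosa"), (1.3, "setosa"),
    (1.5, "setosa"), (1.4, "setosa"),
    (4.7, "versicolor"), (4.5, "versicolor"), (4.9, "versicolor"),
    (4.0, "versicolor"), (4.6, "versicolor"),
    (6.0, "virginica"), (5.1, "virginica"), (5.9, "virginica"),
    (5.6, "virginica"), (5.8, "virginica"),
]


def classify_by_petal_length(petal_length, low_cut=2.5, high_cut=5.0):
    """Guess the variety from petal length using two illustrative cut-offs."""
    if petal_length < low_cut:
        return "setosa"
    if petal_length < high_cut:
        return "versicolor"
    return "virginica"


def accuracy(samples):
    """Fraction of samples whose guessed variety matches the recorded one."""
    correct = sum(
        1 for length, variety in samples
        if classify_by_petal_length(length) == variety
    )
    return correct / len(samples)


print(f"accuracy on the excerpt: {accuracy(PETAL_SAMPLE):.2f}")
```

A rule this simple would be a natural baseline to beat before trying a real classifier.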

Special Thanks to

Haluk Bingol - navigation was a lifesaver.

Fellows and friends who generously answered my questions even at very late hours.

Indian YouTubers

Some sources I benefited from

Iris dataset

For the idea of creating a hexadecimal colour based on a string with JavaScript

Converting CSV to JSON in JavaScript

How to Make Charts with SVG

Photo - Machine Learning in R for beginners