Module 8: Correlation and ggplot2

Hi everyone!

This week we learned about correlation analysis and how to use the ggplot2 package to visualize correlation. A great way to visualize the relationship between two variables is a scatterplot with a line of best fit.

Within the mtcars dataset, I wanted to see how every variable correlates with every other variable so I could pick the relationship I found most interesting to visualize. I used the corrplot package to generate this initial visualization.
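For anyone following along, here is a minimal sketch of how a plot like this could be produced. It assumes the corrplot package is installed, and the styling arguments (`method`, `type`, `tl.col`) are illustrative choices, not necessarily the ones used for the figure in this post.

```r
# Assumes corrplot is installed: install.packages("corrplot")
library(corrplot)

# Pairwise Pearson correlations for every column in mtcars
cor_matrix <- cor(mtcars)

# Draw the matrix; with corrplot's default palette,
# red marks negative and blue marks positive correlations
corrplot(cor_matrix, method = "circle", type = "upper", tl.col = "black")

# Look up a single pair of variables, rounded to two decimals
hp_qsec <- round(cor_matrix["hp", "qsec"], 2)
hp_qsec
```

Pulling a single value out of the matrix is a quick way to confirm what the colors in the plot suggest.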


From this plot, we can see a strong negative correlation (dark red) of around -0.7 between horsepower (hp) and quarter-mile time (qsec). At first this looked unusual to me, since I expected a car with more horsepower to accelerate faster. But qsec is the time in seconds a car takes to cover a quarter mile, so a lower value means a quicker car, and the negative correlation fits that expectation. To explore this further, I created a scatterplot with a line of best fit.
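A scatterplot like this can be sketched in ggplot2 as follows. The axis assignment (hp on the x-axis, qsec on the y-axis), the labels, and the object names `p` and `fit` are assumptions for illustration and may differ from the figure in this post.

```r
# Assumes ggplot2 is installed: install.packages("ggplot2")
library(ggplot2)

# Scatterplot of horsepower against quarter-mile time
# with a red least-squares line of best fit
p <- ggplot(mtcars, aes(x = hp, y = qsec)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  labs(
    x = "Horsepower (hp)",
    y = "Quarter-mile time in seconds (qsec)",
    title = "Horsepower vs. quarter-mile time in mtcars"
  )
p

# The same line fitted directly, to read off the slope
fit <- lm(qsec ~ hp, data = mtcars)
coef(fit)
```

A negative coefficient on hp means the fitted quarter-mile time drops as horsepower goes up.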

Based on this scatterplot, it is clear that higher horsepower is associated with a lower quarter-mile time (qsec), meaning a quicker car. However, the graphic also shows that the relationship is most apparent within the limited range of 100 to 200 hp and 15 to 20 seconds. This suggests other variables should be looked into, as they may be contributing to the relationship within this interval.

Few (2009) makes several recommendations for correlation visualizations, including adding a regression line, noting how outliers can weaken a correlation, and avoiding cluttered points. I took his suggestion and added a bright red regression line so the negative relationship between horsepower and quarter-mile time (qsec) can be understood at a glance. I can see one major outlier towards the top right where horsepower is low but the quarter-mile time is high. However, this point does not seem to skew the regression line heavily, since most of the data points are close to the line. I agree with minimizing clutter among points for readability. There is no evident overlapping of points in this scatterplot, so I did not have to make any additional changes to follow Few's recommendation.
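One way to check how much a single point pulls on the correlation is to compute it with and without that point. The sketch below assumes the outlier is the car with the largest qsec value. That is a guess, since the point cannot be identified from the description alone, and the variable names are made up for illustration.

```r
# Row with the largest quarter-mile time (the slowest car in the data);
# treating this row as the outlier is an assumption
slowest <- which.max(mtcars$qsec)
slowest_car <- rownames(mtcars)[slowest]
slowest_car

# Correlation between hp and qsec with every car included
r_all <- cor(mtcars$hp, mtcars$qsec)

# The same correlation after dropping the suspected outlier
r_without <- cor(mtcars$hp[-slowest], mtcars$qsec[-slowest])

c(all_cars = r_all, without_slowest = r_without)
```

If the two values are close, the point has little influence on the overall relationship.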

Check out the code on GitHub!

-Ramya's POV
