Module 8: Correlation and ggplot2
Hi everyone!
This week we learned about correlation analysis and how to use the ggplot2 package to visualize it. A great way to visualize the correlation, or relationship, between two variables is a scatterplot with a line of best fit.
Within the mtcars dataset, I wanted to understand how each variable correlates with the others so I could pick the relationship I found most interesting to visualize. I used the corrplot package to generate this initial visualization.
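Here is a minimal sketch of how a correlation overview like this can be produced. The post does not show the exact call, so the `method`, `type`, and `tl.col` settings below are my assumptions for illustration, not necessarily the ones behind the original figure.

```r
library(corrplot)

# Pairwise Pearson correlations for every column in mtcars
# (all 11 columns are numeric, so no filtering is needed)
cor_matrix <- cor(mtcars)

# Draw the matrix; the styling arguments are assumed, not taken from the post
corrplot(cor_matrix, method = "circle", type = "upper", tl.col = "black")
```

Showing only the upper triangle avoids repeating each pair twice, which makes it easier to scan for the strongest relationships.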
Few (2009) makes several recommendations for correlation visualizations, including adding a regression line, watching for outliers that weaken the correlation, and avoiding cluttered points. I took his suggestion and added a bright red regression line so the negative relationship between horsepower and acceleration can be understood quickly. I can see one major outlier towards the top right where horsepower is low but the acceleration is high. However, this point does not seem to skew the regression line heavily, since most of the data points are close to the line. I agree with minimizing clutter in the points for readability. There is no evident overlapping of points in this scatterplot, so I did not have to make any additional changes to follow Few's recommendation.
Check the code out on GitHub!
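A scatterplot with a red line of best fit can be sketched roughly like this. One assumption to flag: mtcars has no column literally named "acceleration", so I use `qsec` (quarter-mile time in seconds) as a stand-in here, with `hp` on the x-axis. The axis assignment and point styling are also assumptions, so this may not match the original plot exactly.

```r
library(ggplot2)

# Assumed variable pair: hp (gross horsepower) vs. qsec (quarter-mile time)
p <- ggplot(mtcars, aes(x = hp, y = qsec)) +
  # Slight transparency keeps any close-together points readable
  geom_point(alpha = 0.7) +
  # Linear regression line in bright red, without the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  labs(
    x = "Gross horsepower (hp)",
    y = "Quarter-mile time in seconds (qsec)",
    title = "Horsepower vs. quarter-mile time in mtcars"
  )

print(p)

# Pearson correlation coefficient for the same pair of variables
r <- cor(mtcars$hp, mtcars$qsec)
r
```

Printing the coefficient next to the plot is a quick check that the direction of the line matches the sign of the correlation.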
-Ramya's POV

