Posts

Final Project: Biodiversity in U.S. National Parks

Hi everyone! For my final project, I plan to explore the 2016 National Park Dataset: https://www.kaggle.com/datasets/nationalparkservice/park-biodiversity/data

Problem Description: Nonnative species have been identified as a major contributor to biodiversity decline because invasive species are often highly adaptable and can outcompete natives. This loss of native species is a particular issue for U.S. National Parks, which were created to be untouched and protected. According to the National Park Service, invasive species disturb ecological processes, harm ecosystem integrity, degrade natural resources, interfere with visitor experiences in parks, and exacerbate climate change and fragmentation from land use change (NPS, 2021). To understand these nonnative species, it is important to understand their distribution, where they have been reducing biodiversity, and potential reasons why.

Problem Objectives: Visualize the largest National Parks in size and their locations in...

Module 13: Animations

Hi everyone! This week we learned how to use animation in our R visualizations. I chose to visualize the annual average temperatures of both the contiguous U.S. and Florida to see how Florida compares to the mean temperature pattern. The data source: https://github.com/washingtonpost/data-2C-beyond-the-limit-usa/tree/main

I find this animation interesting because it takes a normal line graph and visually traces each fluctuation in temperature, allowing the viewer to see the quick increases in the late 2010s. It took a lot of trial and error to figure out how to sync the U.S. and Florida animations in one graphic so that we can see the differences in the temperature trends. This syncing lets the viewer easily see that Florida's temperatures were more stable than the average U.S. temperatures until 1980. After 1980, Florida experienced major fluctuations and violent increases in temperature that are not as pronounced in the mean U.S. t...

Module 12: Social Network Analysis

Hi everyone! This week we learned about social network analysis and how we can use ggnet2 in R to accomplish this. I absolutely love the new Netflix show "Wednesday" and the original Addams Family movies, so I decided to analyze which characters in the original movie have the most interactions with the other characters.

My hypothesis: Wednesday will have the most interactions since she is seen as the main character.

The dataset site: https://moviegalaxies.com/movies/view/26/the-addams-family/#

I imported the JSON file with the nodes and edges. I set the ggnet2 parameters to vary the size of each node based on its "indegree", or the number of interactions that character has overall.

Result: From the visualization, we can see Granny has the most connections (9) and Wednesday has very few connections.

I enjoyed using ggnet2 to create this social network. The biggest challenge of creating this social network was finding a dataset for the visualization and customizing...
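To make the "number of interactions" idea concrete, here is a minimal base R sketch of counting connections per character from an edge list. The edge list is a made-up toy example with placeholder names (A to D), not the Addams Family data, and it treats interactions as undirected, which is an assumption.

```r
# Toy, hypothetical edge list: each row is one interaction between two characters.
# These are placeholder names, NOT the moviegalaxies data.
edges <- data.frame(
  from = c("A", "A", "B", "C"),
  to   = c("B", "C", "C", "D"),
  stringsAsFactors = FALSE
)

# Treating edges as undirected, a node's degree is the number of
# times it appears at either end of an edge.
degree <- table(c(edges$from, edges$to))
print(degree)

# The character with the most connections
most_connected <- names(which.max(degree))
print(most_connected)
```

A count like this is the kind of value that can be mapped to node size in a network plot.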

Module 11: Tufte Visualizations

Hi everyone! This week we learned about Dr. Tufte and his visualization methodologies. Tufte describes the fundamental principles of analytical design as showing comparisons, causality, multiple variables at once, visual evidence through modes like color, documentation of the data, and the actual content. I found his discussion of not having pre-specified ideas for visualizing data incredibly insightful: we should choose our visualizations based on what the analysis of the data shows us instead of forcing a visualization type onto a dataset.

I chose to recreate Dr. Piwek's marginal histogram scatter plot based on the faithful eruptions dataset. I found ggMarginal to be a super cool function that allowed me to bring two types of graphs into one. The histograms along the margins of the scatterplot make it easier to see that shorter and longer waiting times have larger eruptions rather than the medium waiting times. I enhanced Dr. Piwek's graph by adding a fill color of red that com...

Module 10: Improving Visualizations

Hi everyone! This week we learned about time series analysis and visualizations. Time series visualizations help us see trends over time, such as Minard's visualization of Napoleon's march, which shows the size of the army, its location, and its route.

Nathan's Hot Dog Contest Visualization

I decided to improve the visualization above of Nathan's Hot Dog Eating Contest results to include more information about who won each contest and to clarify when new records were set. The number of hot dogs eaten in each record win is shown in the bar, and the red diamond on top indicates a record winner. I used the 5 principles of visualization from last week to enhance alignment and repetition through new records shown as red diamonds, contrast through the different colors for each country, and proximity and balance through the easily comparable bar heights.

Findings: The colors for each country add information on which country won, which allows us to see Japan had the most record-breaking wins...

Module 9: Multivariates

Hi everyone! This week we learned about multivariate visualizations and the five principles of design. We explored corrgrams to visually see correlations, heat maps to capture trends, and multivariate scatterplots to understand relationships.

The five principles:
(1) Alignment - create a sharper and more ordered design
(2) Repetition - create association and consistency
(3) Contrast - emphasize or highlight key elements
(4) Proximity - group or visually connect elements
(5) Balance - symmetrical (equal weight) or asymmetrical (contrast)

I used the built-in R dataset airquality, which records ozone, temperature, solar radiation, and wind measurements for multiple dates in 1973 New York.

Purpose: What is the relationship between ozone, temperature, solar radiation, and wind measurements in 1973 New York?

I began with a corrgram to gain a general understanding of the strength of the relationships between these variables. This design is strong in contrast through the red and blue s...
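As a rough sketch of the numbers a corrgram encodes, the correlation matrix for these four airquality variables can be computed in base R. Dropping rows with missing values via `use = "complete.obs"` is my assumption here; the original post may have handled the missing values differently.

```r
# Columns of the built-in airquality dataset used for the correlation matrix
vars <- c("Ozone", "Solar.R", "Wind", "Temp")

# Ozone and Solar.R contain missing values, so only rows that are
# complete across all four variables are used (an assumption).
cor_matrix <- cor(airquality[, vars], use = "complete.obs")

# Round for readability; each cell is a Pearson correlation in [-1, 1]
print(round(cor_matrix, 2))
```

In this matrix, ozone is positively correlated with temperature and negatively correlated with wind, which is the kind of pattern a corrgram shows through its color scale.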

Module 8: Correlation and ggplot2

Hi everyone! This week we learned about correlation analysis and how to use the ggplot2 package to visualize correlation. A great way to visualize correlation or relationships between variables is a scatterplot with a line of best fit.

Within the mtcars dataset, I wanted to gain a better understanding of how each variable correlates with the others, so I could find the relationship I would find most interesting to visualize. I used the corrplot package to generate this initial visualization. From this plot, we can see there is a strong negative correlation (dark red) of around -0.7 between horsepower (hp) and acceleration (qsec). I thought this was unusual because I would expect a car with higher horsepower to accelerate faster since it generates more power.

To explore this further, I created a scatterplot with a line of best fit. Based on this scatterplot, it is clear that increasing horsepower relates to a lower qsec value for the car. However, this graphic explains th...
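The correlation and the direction of the line of best fit can be checked with a few lines of base R. This is only a sketch: it uses `cor()` and `lm()` rather than the corrplot and ggplot2 calls from the post. Note that the mtcars documentation describes qsec as the 1/4 mile time in seconds, so a lower qsec means a quicker car.

```r
# Pearson correlation between horsepower and quarter-mile time
r <- cor(mtcars$hp, mtcars$qsec)
print(round(r, 2))

# Line of best fit: qsec modeled as a linear function of hp
fit <- lm(qsec ~ hp, data = mtcars)
slope <- unname(coef(fit)["hp"])

# A negative slope means higher-horsepower cars tend to have
# shorter quarter-mile times.
print(slope < 0)
```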