How To Add Ggplot To R
Learning Objectives
- Produce scatter plots, boxplots, and time series plots using ggplot.
- Set universal plot settings.
- Describe what faceting is and employ faceting in ggplot.
- Alter the aesthetics of an existing ggplot plot (including axis labels and color).
- Build complex and customized plots from data in a data frame.
We start by loading the required packages. ggplot2 is included in the tidyverse package.
If not still in the workspace, load the data we saved in the previous lesson.
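A minimal sketch of this setup step. The file path is an assumption (the text only says the data were saved in the previous lesson), so adjust it to wherever your copy lives:

```r
library(tidyverse)  # attaches ggplot2 along with dplyr, readr, and others

# Assumed location of the file saved in the previous lesson; change as needed.
data_path <- "data/surveys_complete.csv"

# Only read the file if the data are not already in the workspace.
if (!exists("surveys_complete") && file.exists(data_path)) {
  surveys_complete <- read_csv(data_path)
}
```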
Plotting with ggplot2
ggplot2 is a plotting package that provides helpful commands to create complex plots from data in a data frame. It provides a more programmatic interface for specifying what variables to plot, how they are displayed, and general visual properties. Therefore, we only need minimal changes if the underlying data change or if we decide to change from a bar plot to a scatterplot. This helps in creating publication-quality plots with minimal amounts of adjustments and tweaking.
ggplot2 refers to the name of the package itself. When using the package we use the function ggplot() to generate the plots, so references to using the function will be referred to as ggplot() and the package as a whole as ggplot2.
ggplot2 plots work best with data in the 'long' format, i.e., a column for every variable, and a row for every observation. Well-structured data will save you lots of time when making figures with ggplot2.
ggplot graphics are built layer by layer by adding new elements. Adding layers in this fashion allows for extensive flexibility and customization of plots.
To build a ggplot, we will use the following basic template that can be used for different types of plots:
ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>()
- use the ggplot() function and bind the plot to a specific data frame using the data argument
- define an aesthetic mapping (using the aesthetic (aes) function), by selecting the variables to be plotted and specifying how to present them in the graph, e.g., as x/y positions or characteristics such as size, shape, color, etc.
- add 'geoms' – graphical representations of the data in the plot (points, lines, bars). ggplot2 offers many different geoms; we will use some common ones today, including:
  - geom_point() for scatter plots, dot plots, etc.
  - geom_boxplot() for, well, boxplots!
  - geom_line() for trend lines, time series, etc.
To add a geom to the plot use the + operator. Because we have two continuous variables, let's use geom_point() first:
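A minimal sketch of this first scatter plot. The small data frame is a hypothetical stand-in for surveys_complete with made-up values, included only so the example runs on its own; the column names weight and hindfoot_length are assumed from the rest of the lesson:

```r
library(ggplot2)

# Hypothetical stand-in for surveys_complete (toy values, not the lesson data)
surveys_complete <- data.frame(
  species_id      = rep(c("DM", "DO", "PB"), each = 3),
  weight          = c(40, 43, 45, 48, 50, 52, 30, 32, 28),
  hindfoot_length = c(35, 36, 37, 35, 36, 37, 26, 27, 25)
)

# Bind the data, map weight to x and hindfoot_length to y, then add points
scatter <- ggplot(data = surveys_complete,
                  mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point()

print(scatter)
```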
The + in the ggplot2 package is particularly useful because it allows you to modify existing ggplot objects. This means you can easily set up plot "templates" and conveniently explore different types of plots, so the above plot can also be generated with code like this:
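One way the template idea might look in practice. The object name surveys_plot and the toy data frame are illustrative assumptions, not taken from the text:

```r
library(ggplot2)

# Hypothetical stand-in for surveys_complete (toy values, not the lesson data)
surveys_complete <- data.frame(
  weight          = c(40, 43, 45, 48, 50, 52, 30, 32, 28),
  hindfoot_length = c(35, 36, 37, 35, 36, 37, 26, 27, 25)
)

# Store the data binding and aesthetic mapping as a reusable "template"
surveys_plot <- ggplot(data = surveys_complete,
                       mapping = aes(x = weight, y = hindfoot_length))

# Add a geom to the stored object to draw the plot
scatter <- surveys_plot + geom_point()

print(scatter)
```

Because surveys_plot holds no geom of its own, the same object can be reused with a different geom without repeating the data and mapping.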
Notes
- Anything you put in the ggplot() function can be seen by any geom layers that you add (i.e., these are universal plot settings). This includes the x- and y-axis you set up in aes().
- You can also specify aesthetics for a given geom independently of the aesthetics defined globally in the ggplot() function.
- The + sign used to add layers must be placed at the end of each line containing a layer. If, instead, the + sign is placed at the start of the line containing the next layer, ggplot2 will not add the new layer and will return an error message.
- You may notice that we sometimes reference 'ggplot2' and sometimes 'ggplot'. To clarify, 'ggplot2' is the name of the most recent version of the package. However, any time we call the function itself, it's just called 'ggplot'.
- The previous version of the ggplot2 package, called ggplot, which also contained the ggplot() function is now unsupported and has been removed from CRAN in order to reduce accidental installations and further confusion.
Challenge (optional)
Scatter plots can be useful exploratory tools for small datasets. For data sets with large numbers of observations, such as the surveys_complete data set, overplotting of points can be a limitation of scatter plots. One strategy for handling such settings is to use hexagonal binning of observations. The plot space is tessellated into hexagons. Each hexagon is assigned a color based on the number of observations that fall within its boundaries. To use hexagonal binning with ggplot2, first install the R package hexbin from CRAN:
Then use the geom_hex() function:
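A sketch of these two steps, using a hypothetical toy data frame in place of surveys_complete. The install line is commented out so the example does not modify your library, and the plot is only drawn if hexbin is available:

```r
library(ggplot2)

# One-time install from CRAN (uncomment if hexbin is missing)
# install.packages("hexbin")

# Hypothetical stand-in for surveys_complete (toy values, not the lesson data)
surveys_complete <- data.frame(
  weight          = c(40, 43, 45, 48, 50, 52, 30, 32, 28),
  hindfoot_length = c(35, 36, 37, 35, 36, 37, 26, 27, 25)
)

# Same mapping as the scatter plot, but binned into hexagons
hex_plot <- ggplot(data = surveys_complete,
                   mapping = aes(x = weight, y = hindfoot_length)) +
  geom_hex()

# Drawing the plot requires the hexbin package
if (requireNamespace("hexbin", quietly = TRUE)) {
  print(hex_plot)
}
```

With only a handful of toy points the hexagons are not very informative; the technique is meant for data with many overlapping observations.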
- What are the relative strengths and weaknesses of a hexagonal bin plot compared to a scatter plot? Examine the above scatter plot and compare it with the hexagonal bin plot that you created.
Building your plots iteratively
Building plots with ggplot2 is typically an iterative process. We start by defining the dataset we'll use, lay out the axes, and choose a geom:
Then, we start modifying this plot to extract more information from it. For instance, we can add transparency (alpha) to avoid overplotting:
We can also add colors for all the points:
Or to color each species in the plot differently, you could use a vector as an input to the argument color. ggplot2 will provide a different color corresponding to different values in the vector. Here is an example where we color with species_id:
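The iterative steps above can be sketched as one sequence. The data frame is a hypothetical stand-in for surveys_complete, and the alpha value and the color "blue" are illustrative choices rather than values given in the text:

```r
library(ggplot2)

# Hypothetical stand-in for surveys_complete (toy values, not the lesson data)
surveys_complete <- data.frame(
  species_id      = rep(c("DM", "DO", "PB"), each = 3),
  weight          = c(40, 43, 45, 48, 50, 52, 30, 32, 28),
  hindfoot_length = c(35, 36, 37, 35, 36, 37, 26, 27, 25)
)

# Step 1: dataset, axes, and a geom
base_plot  <- ggplot(data = surveys_complete,
                     mapping = aes(x = weight, y = hindfoot_length))
step_point <- base_plot + geom_point()

# Step 2: transparency to reduce overplotting
step_alpha <- base_plot + geom_point(alpha = 0.5)

# Step 3: one fixed color for all points (set outside aes())
step_blue <- base_plot + geom_point(alpha = 0.5, color = "blue")

# Step 4: color mapped to a variable (set inside aes())
step_species <- base_plot + geom_point(alpha = 0.5, aes(color = species_id))

print(step_species)
```

The design point is where color is written: outside aes() it is a constant, inside aes() it is mapped to the values of a column and a legend is added.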
Challenge
Use what you just learned to create a scatter plot of weight over species_id with the plot types showing in different colors. Is this a good way to show this type of data?
Answer
Boxplot
We can use boxplots to visualize the distribution of weight within each species:
By adding points to the boxplot, we can have a better idea of the number of measurements and of their distribution:
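A sketch of both plots on a hypothetical toy data frame. The jitter settings (alpha and the color "tomato") are illustrative assumptions:

```r
library(ggplot2)

# Hypothetical stand-in for surveys_complete (toy values, not the lesson data)
surveys_complete <- data.frame(
  species_id = rep(c("DM", "DO", "PB"), each = 3),
  weight     = c(40, 43, 45, 48, 50, 52, 30, 32, 28)
)

# Distribution of weight within each species
box_plot <- ggplot(data = surveys_complete,
                   mapping = aes(x = species_id, y = weight)) +
  geom_boxplot()

# Layers are drawn in the order they are added:
# the boxplot first, then the jittered points on top of it
box_jitter <- ggplot(data = surveys_complete,
                     mapping = aes(x = species_id, y = weight)) +
  geom_boxplot(alpha = 0) +
  geom_jitter(alpha = 0.3, color = "tomato")

print(box_jitter)
```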
Notice how the boxplot layer is behind the jitter layer? What do you need to change in the code to put the boxplot in front of the points such that it's not hidden?
Challenges
Boxplots are useful summaries, but hide the shape of the distribution. For example, if there is a bimodal distribution, it would not be observed with a boxplot. An alternative to the boxplot is the violin plot (sometimes known as a beanplot), where the shape (of the density of points) is drawn.
- Replace the box plot with a violin plot; see geom_violin().
In many types of data, it is important to consider the scale of the observations. For example, it may be worth changing the scale of the axis to better distribute the observations in the space of the plot. Changing the scale of the axes is done similarly to adding/modifying other components (i.e., by incrementally adding commands). Try making these modifications:
- Represent weight on the log10 scale; see scale_y_log10().
So far, we've looked at the distribution of weight within species. Try making a new plot to explore the distribution of another variable within each species.
- Create a boxplot for hindfoot_length. Overlay the boxplot layer on a jitter layer to show actual measurements.
- Add color to the data points on your boxplot according to the plot from which the sample was taken (plot_id).
Hint: Check the class for plot_id. Consider changing the class of plot_id from integer to factor. Why does this change how R makes the graph?
Plotting time series data
Let's calculate the number of counts per year for each genus. First we need to group the data and count records within each group:
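A minimal sketch of the counting step with dplyr. The raw records below are a hypothetical stand-in for surveys_complete, and the result name yearly_counts is an assumption chosen to match the later plots:

```r
library(dplyr)

# Hypothetical stand-in for surveys_complete: one row per observed animal
surveys_complete <- expand.grid(
  year  = 1990:1992,
  genus = c("Dipodomys", "Chaetodipus"),
  stringsAsFactors = FALSE
)
# Repeat rows so that the counts differ between year/genus combinations
surveys_complete <- surveys_complete[rep(1:6, times = c(1, 2, 3, 3, 2, 1)), ]

# count() groups by the listed columns and stores the group size in column n
yearly_counts <- surveys_complete %>%
  count(year, genus)

print(yearly_counts)
```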
Timelapse data can be visualized as a line plot with years on the x-axis and counts on the y-axis:
Unfortunately, this does not work because we plotted data for all the genera together. We need to tell ggplot to draw a line for each genus by modifying the aesthetic function to include group = genus:
We will be able to distinguish genera in the plot if we add colors (using color also automatically groups the data):
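The two fixes can be sketched side by side. The yearly_counts data frame here is a hypothetical table of toy counts, defined directly so the example runs on its own:

```r
library(ggplot2)

# Hypothetical yearly counts per genus (toy values, not the lesson data)
yearly_counts <- data.frame(
  year  = rep(1990:1992, times = 2),
  genus = rep(c("Dipodomys", "Chaetodipus"), each = 3),
  n     = c(12, 15, 9, 4, 7, 10)
)

# One line per genus, all drawn in the same color
lines_grouped <- ggplot(data = yearly_counts,
                        mapping = aes(x = year, y = n, group = genus)) +
  geom_line()

# Mapping color to genus groups the data AND distinguishes the lines
lines_colored <- ggplot(data = yearly_counts,
                        mapping = aes(x = year, y = n, color = genus)) +
  geom_line()

print(lines_colored)
```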
Integrating the pipe operator with ggplot2
In the previous lesson, we saw how to use the pipe operator %>% to use different functions in a sequence and create a coherent workflow. We can also use the pipe operator to pass the data argument to the ggplot() function. The hard part is to remember that to build your ggplot, you need to use + and not %>%.
The pipe operator can also be used to link data manipulation with consequent data visualization.
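A sketch of both uses of the pipe. The raw records are a hypothetical stand-in for surveys_complete, and the object names are illustrative assumptions:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for surveys_complete: one row per observed animal
surveys_complete <- expand.grid(
  year  = 1990:1992,
  genus = c("Dipodomys", "Chaetodipus"),
  stringsAsFactors = FALSE
)
surveys_complete <- surveys_complete[rep(1:6, times = c(1, 2, 3, 3, 2, 1)), ]

yearly_counts <- surveys_complete %>% count(year, genus)

# Pipe the data into ggplot(), then switch to + for the layers
piped_plot <- yearly_counts %>%
  ggplot(mapping = aes(x = year, y = n, color = genus)) +
  geom_line()

# Data manipulation and plotting linked in a single chain
yearly_counts_graph <- surveys_complete %>%
  count(year, genus) %>%
  ggplot(mapping = aes(x = year, y = n, color = genus)) +
  geom_line()

print(yearly_counts_graph)
```

The chain works because %>% binds more tightly than +, so the piped ggplot() call is evaluated first and the layers are then added to its result.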
Faceting
ggplot has a special technique called faceting that allows the user to split one plot into multiple plots based on a factor included in the dataset. We will use it to make a time series plot for each genus:
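A minimal faceting sketch using facet_wrap(), with a hypothetical table of toy counts standing in for the lesson's yearly counts:

```r
library(ggplot2)

# Hypothetical yearly counts per genus (toy values, not the lesson data)
yearly_counts <- data.frame(
  year  = rep(1990:1992, times = 2),
  genus = rep(c("Dipodomys", "Chaetodipus"), each = 3),
  n     = c(12, 15, 9, 4, 7, 10)
)

# One panel per genus; vars() names the column to split on
faceted <- ggplot(data = yearly_counts, mapping = aes(x = year, y = n)) +
  geom_line() +
  facet_wrap(facets = vars(genus))

print(faceted)
```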
Now we would like to split the line in each plot by the sex of each individual measured. To do that we need to make counts in the data frame grouped by year, genus, and sex:
We can now make the faceted plot by splitting further by sex using color (within a single plot):
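These two steps might look like the sketch below. The raw records are a hypothetical stand-in for surveys_complete; the name yearly_sex_counts matches the object used later in the text:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for surveys_complete: one row per observed animal
surveys_complete <- expand.grid(
  year  = 1990:1992,
  genus = c("Dipodomys", "Chaetodipus"),
  sex   = c("F", "M"),
  stringsAsFactors = FALSE
)
# Repeat rows so that the counts differ between groups
surveys_complete <- surveys_complete[rep(1:12, times = rep(c(1, 2, 3), 4)), ]

# Counts per year, genus, and sex
yearly_sex_counts <- surveys_complete %>%
  count(year, genus, sex)

# One panel per genus, with one colored line per sex inside each panel
sex_plot <- ggplot(data = yearly_sex_counts,
                   mapping = aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_wrap(facets = vars(genus))

print(sex_plot)
```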
We can also facet both by sex and genus:
You can also organise the panels only by rows (or only by columns):
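A sketch of these layouts with facet_grid(). The counts table is hypothetical toy data defined directly so the example is self-contained:

```r
library(ggplot2)

# Hypothetical counts per year, genus, and sex (toy values)
yearly_sex_counts <- expand.grid(
  year  = 1990:1992,
  genus = c("Dipodomys", "Chaetodipus"),
  sex   = c("F", "M")
)
yearly_sex_counts$n <- c(6, 8, 4, 2, 3, 5, 6, 7, 5, 2, 4, 5)

base_plot <- ggplot(data = yearly_sex_counts,
                    mapping = aes(x = year, y = n, color = sex)) +
  geom_line()

# Rows split by sex, columns split by genus
grid_both <- base_plot + facet_grid(rows = vars(sex), cols = vars(genus))

# Panels stacked in rows only
grid_rows <- base_plot + facet_grid(rows = vars(genus))

# Panels placed side by side in columns only
grid_cols <- base_plot + facet_grid(cols = vars(genus))

print(grid_both)
```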
Note: ggplot2 before version 3.0.0 used formulas to specify how plots are faceted. If you encounter facet_grid/wrap(...) code containing ~, please read https://ggplot2.tidyverse.org/news/#tidy-evaluation.
ggplot2 themes
Usually plots with white background look more readable when printed. Every single component of a ggplot graph can be customized using the generic theme() function, as we will see below. However, there are pre-loaded themes available that change the overall appearance of the graph without much effort.
For example, we can change our previous graph to have a simpler white background using the theme_bw() function:
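A minimal sketch of applying theme_bw() to the faceted plot, assuming a hypothetical toy version of the yearly_sex_counts table:

```r
library(ggplot2)

# Hypothetical counts per year, genus, and sex (toy values)
yearly_sex_counts <- expand.grid(
  year  = 1990:1992,
  genus = c("Dipodomys", "Chaetodipus"),
  sex   = c("F", "M")
)
yearly_sex_counts$n <- c(6, 8, 4, 2, 3, 5, 6, 7, 5, 2, 4, 5)

# A complete theme is added like any other layer
bw_plot <- ggplot(data = yearly_sex_counts,
                  mapping = aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_wrap(vars(genus)) +
  theme_bw()

print(bw_plot)
```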
In addition to theme_bw(), which changes the plot background to white, ggplot2 comes with several other themes which can be useful to quickly change the look of your visualization. The complete list of themes is available at https://ggplot2.tidyverse.org/reference/ggtheme.html. theme_minimal() and theme_light() are popular, and theme_void() can be useful as a starting point to create a new hand-crafted theme.
The ggthemes package provides a wide variety of options.
Challenge
Use what you just learned to create a plot that depicts how the average weight of each species changes through the years.
Answer
#> `summarise()` has grouped output by 'year'. You can override using the
#> `.groups` argument.
Customization
Take a look at the ggplot2 cheat sheet, and think of ways you could improve the plot.
Now, let's change names of axes to something more informative than 'year' and 'n' and add a title to the figure:
The axes have more informative names, but their readability can be improved by increasing the font size. This can be done with the generic theme() function:
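Both customizations can be sketched together. The counts table is hypothetical toy data; the label text and the font size of 16 follow the full example given later in this section:

```r
library(ggplot2)

# Hypothetical counts per year, genus, and sex (toy values)
yearly_sex_counts <- expand.grid(
  year  = 1990:1992,
  genus = c("Dipodomys", "Chaetodipus"),
  sex   = c("F", "M")
)
yearly_sex_counts$n <- c(6, 8, 4, 2, 3, 5, 6, 7, 5, 2, 4, 5)

labelled_plot <- ggplot(data = yearly_sex_counts,
                        mapping = aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_wrap(vars(genus)) +
  # labs() sets the title and the axis names
  labs(title = "Observed genera through time",
       x = "Year of observation",
       y = "Number of individuals") +
  theme_bw() +
  # theme() adjusts individual components; here the base text size
  theme(text = element_text(size = 16))

print(labelled_plot)
```

Note that theme() comes after theme_bw(): a complete theme replaces earlier theme settings, so individual tweaks need to be added last.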
Note that it is also possible to change the fonts of your plots. If you are on Windows, you may have to install the extrafont package, and follow the instructions included in the README for this package.
After our manipulations, you may notice that the values on the x-axis are still not properly readable. Let's change the orientation of the labels and adjust them vertically and horizontally so they don't overlap. You can use a 90 degree angle, or experiment to find the appropriate angle for diagonally oriented labels. We can also modify the facet label text (strip.text) to italicize the genus names:
ggplot(data = yearly_sex_counts, mapping = aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_wrap(vars(genus)) +
  labs(title = "Observed genera through time",
       x = "Year of observation",
       y = "Number of individuals") +
  theme_bw() +
  theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90, hjust = 0.5, vjust = 0.5),
        axis.text.y = element_text(colour = "grey20", size = 12),
        strip.text = element_text(face = "italic"),
        text = element_text(size = 16))
If you like the changes you created better than the default theme, you can save them as an object to be able to easily apply them to other plots you may create:
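A sketch of storing theme settings for reuse. The object name grey_theme and the toy data frame are illustrative assumptions; the settings themselves mirror the theme() call used in this section:

```r
library(ggplot2)

# Theme settings saved once as an object
grey_theme <- theme(
  axis.text.x = element_text(colour = "grey20", size = 12, angle = 90,
                             hjust = 0.5, vjust = 0.5),
  axis.text.y = element_text(colour = "grey20", size = 12),
  text        = element_text(size = 16)
)

# Hypothetical stand-in for surveys_complete (toy values, not the lesson data)
surveys_complete <- data.frame(
  species_id      = rep(c("DM", "DO", "PB"), each = 3),
  hindfoot_length = c(35, 36, 37, 35, 36, 37, 26, 27, 25)
)

# The saved theme is added to any plot like a regular layer
themed_boxplot <- ggplot(data = surveys_complete,
                         mapping = aes(x = species_id, y = hindfoot_length)) +
  geom_boxplot() +
  grey_theme

print(themed_boxplot)
```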
Challenge
With all of this information in hand, please take another five minutes to either improve one of the plots generated in this exercise or create a beautiful graph of your own. Use the RStudio ggplot2 cheat sheet for inspiration.
Here are some ideas:
- See if you can change the thickness of the lines.
- Can you find a way to change the name of the legend? What about its labels?
- Try using a different color palette (see https://www.cookbook-r.com/Graphs/Colors_(ggplot2)/).
Arranging plots
Faceting is a great tool for splitting one plot into multiple plots, but sometimes you may want to produce a single figure that contains multiple plots using different variables or even different data frames. The patchwork package allows us to combine separate ggplots into a single figure while keeping everything aligned properly. Like most R packages, we can install patchwork from CRAN, the R package repository:
After you have loaded the patchwork package you can use + to place plots next to each other, / to arrange them vertically, and plot_layout() to determine how much space each plot uses:
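A sketch of installing, loading, and using patchwork. Both data frames are hypothetical toy data; the names plot_weight and plot_count are chosen to match the objects used in the export example later in the text, and the axis labels are illustrative:

```r
# One-time install from CRAN (uncomment if patchwork is missing)
# install.packages("patchwork")

library(ggplot2)
library(patchwork)

# Hypothetical stand-ins for the lesson data (toy values)
surveys_complete <- data.frame(
  species_id = rep(c("DM", "DO", "PB"), each = 3),
  weight     = c(40, 43, 45, 48, 50, 52, 30, 32, 28)
)
yearly_counts <- data.frame(
  year  = rep(1990:1992, times = 2),
  genus = rep(c("Dipodomys", "Chaetodipus"), each = 3),
  n     = c(12, 15, 9, 4, 7, 10)
)

plot_weight <- ggplot(data = surveys_complete,
                      mapping = aes(x = species_id, y = weight)) +
  geom_boxplot() +
  labs(x = "Species", y = "Weight")

plot_count <- ggplot(data = yearly_counts,
                     mapping = aes(x = year, y = n, color = genus)) +
  geom_line() +
  labs(x = "Year", y = "Abundance")

# + places plots next to each other
side_by_side <- plot_weight + plot_count

# / stacks plots; plot_layout() gives the top plot 3 parts of height, the bottom 2
stacked <- plot_weight / plot_count + plot_layout(heights = c(3, 2))

print(stacked)
```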
You can also use parentheses () to create more complex layouts. There are many useful examples on the patchwork website.
Exporting plots
After creating your plot, you can save it to a file in your favorite format. The Export tab in the Plot pane in RStudio will save your plots at low resolution, which will not be accepted by many journals and will not scale well for posters. The ggplot2 extensions website provides a list of packages that extend the capabilities of ggplot2, including additional themes.
Rather than the Export tab, use the ggsave() function, which allows you to easily change the dimension and resolution of your plot by adjusting the appropriate arguments (width, height and dpi):
my_plot <- ggplot(data = yearly_sex_counts, aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_wrap(vars(genus)) +
  labs(title = "Observed genera through time",
       x = "Year of observation",
       y = "Number of individuals") +
  theme_bw() +
  theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90, hjust = 0.5, vjust = 0.5),
        axis.text.y = element_text(colour = "grey20", size = 12),
        text = element_text(size = 16))

ggsave("name_of_file.png", my_plot, width = 15, height = 10)

## This also works for plots combined with patchwork
plot_combined <- plot_weight / plot_count + plot_layout(heights = c(3, 2))
ggsave("plot_combined.png", plot_combined, width = 10, dpi = 300)
Note: The parameters width and height also determine the font size in the saved plot.
Page built on: 📆 2022-07-08 ‒ 🕢 04:13:42
Source: https://datacarpentry.org/R-ecology-lesson/04-visualization-ggplot2.html
Posted by: rosadotorty1998.blogspot.com