Data Science and the R Language

Intro & qplot

2016-12-17  pleple

Reference information

Book Name: ggplot2 - Elegant Graphics for Data Analysis
Author: Hadley Wickham
Publisher: Springer
ISBN: 978-0-387-98140-6
e-ISBN: 978-0-387-98141-3

Intro

Resources:

Grammar of graphics

A statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars).
The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system.
Faceting can be used to generate the same plot for different subsets of the dataset.
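As a rough illustration of these components, the sketch below spells each one out with ggplot2's layered syntax rather than qplot(). The choice of variables, the 1,000-row sample and the seed are assumptions made for the example, not something the book prescribes.

```r
library(ggplot2)

# Assumption: a 1,000-row random sample keeps the example quick to draw.
set.seed(1)
d <- diamonds[sample(nrow(diamonds), 1000), ]

p <- ggplot(d, aes(x = carat, y = price)) +    # data mapped to x and y
  geom_point(aes(colour = color)) +            # geometric object: points, coloured by diamond colour
  stat_smooth(method = "lm", formula = y ~ x,  # statistical transformation:
              se = FALSE) +                    #   a linear fit in each panel
  coord_cartesian() +                          # coordinate system
  facet_wrap(~ cut)                            # the same plot for each subset of cut

print(p)
```

Each `+` adds one component, so any of them can be swapped out without touching the others.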

Relevant resources

qplot

qplot() is short for "quick plot".

Basic use

The first two arguments to qplot() are x and y.

library(ggplot2)
qplot(carat, price, data = diamonds)
qplot(log(carat), log(price), data = diamonds)
qplot(carat, x * y * z, data = diamonds)

Colour, size, shape and other aesthetic attributes

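set.seed(1410)  # make the sample reproducible
dsmall <- diamonds[sample(nrow(diamonds), 100), ]  # random sample of 100 diamonds
qplot(carat, price, data = dsmall, colour = color)
qplot(carat, price, data = dsmall, shape = cut)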

You can also set an aesthetic manually with I(), e.g., colour = I("red") or size = I(2).
For large datasets, semitransparent points are often useful to alleviate some of the overplotting.
It's often useful to specify the transparency as a fraction, e.g., 1/10 or 2/10, as the denominator specifies the number of points that must overplot to get a completely opaque colour.

qplot(carat, price, data = diamonds, alpha = I(1/10))
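To make the difference between mapping and setting an aesthetic concrete, here is a minimal sketch. It rebuilds a 100-row sample under the name dsmall so that it runs on its own; the object names p_mapped and p_set are made up for the example.

```r
library(ggplot2)

set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

# Mapped: the string "blue" is treated as data with a single level,
# so the colour scale picks a colour for it and a legend is drawn.
p_mapped <- qplot(carat, price, data = dsmall, colour = "blue")

# Set: I() marks the value as a constant, so the points really are blue.
p_set <- qplot(carat, price, data = dsmall, colour = I("blue"))

# Inspect the colours that would actually be drawn.
unique(layer_data(p_mapped)$colour)
unique(layer_data(p_set)$colour)
```

Recent ggplot2 releases may print a deprecation warning for qplot(); the calls still run.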

Plot geoms

Adding a smoother to a plot

qplot(carat, price, data = diamonds, geom = c('point', 'smooth'))

To turn the confidence interval off, use se = FALSE.
The method argument lets you choose between many different smoothers.

qplot(carat, price, data=dsmall, geom=c('point','smooth'), span=0.2)

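Loess does not work well for large datasets (it is O(n^2) in memory), so for the full diamonds data a different smoother, such as a generalised additive model from the mgcv package, is the better choice.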

library(mgcv)
qplot(carat, price, data = dsmall, geom = c('point', 'smooth'), method = 'gam', formula = y ~ s(x))
qplot(carat, price, data = diamonds, geom = c('point', 'smooth'),
      method = 'gam', formula = y ~ s(x, bs = 'cs'))
library(splines)
qplot(carat, price, data = dsmall, geom = c('point', 'smooth'), method = 'lm')
qplot(carat, price, data = dsmall, geom = c('point', 'smooth'), method = 'lm', formula = y ~ ns(x, 5))

Boxplots and jittered points

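When a dataset includes a categorical variable and one or more continuous variables, you will often want to know how the values of the continuous variables vary with the levels of the categorical variable. Boxplots and jittered points are two ways to show this.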

qplot(color, price/carat, data = diamonds, geom = 'jitter', alpha = I(1/50))

**Aesthetics:** size, colour, shape, and fill (boxplots only)

Histogram and density plots

qplot(carat, data = diamonds, geom = 'histogram')
qplot(carat, data = diamonds, geom = 'density')

For the density plot, the adjust argument controls the degree of smoothness (high values of adjust produce smoother plots).
For the histogram, the binwidth argument controls the amount of smoothing by setting the bin size. (Break points can also be specified explicitly, using the breaks argument.)
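The effect of both arguments can be checked with a small sketch. The specific values (binwidth of 1 and 0.01, adjust of 0.5 and 4) and the object names are assumptions picked to exaggerate the contrast.

```r
library(ggplot2)

# Histogram: a smaller binwidth gives more, narrower bins (less smoothing).
h_coarse <- qplot(carat, data = diamonds, geom = "histogram", binwidth = 1)
h_fine   <- qplot(carat, data = diamonds, geom = "histogram", binwidth = 0.01)

# layer_data() returns one row per bin, so the row count is the number of bins.
n_coarse <- nrow(layer_data(h_coarse))
n_fine   <- nrow(layer_data(h_fine))
cat("bins at binwidth 1:", n_coarse, " bins at binwidth 0.01:", n_fine, "\n")

# Density: a larger adjust widens the kernel bandwidth (more smoothing).
d_rough  <- qplot(carat, data = diamonds, geom = "density", adjust = 0.5)
d_smooth <- qplot(carat, data = diamonds, geom = "density", adjust = 4)
```

Printing d_rough and d_smooth side by side shows how the spikes at round carat values are smoothed away as adjust grows.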

qplot(carat, data=diamonds, geom='density', colour = color)
qplot(carat, data=diamonds, geom='histogram', fill = color)

The density plot is more appealing at first because it seems easy to read and compare the various curves. However, it is more difficult to understand exactly what a density plot is showing.
In addition, the density plot makes some assumptions that may not be true for our data, i.e. that it is unbounded, continuous and smooth.

Bar charts

The discrete analogue of the histogram is the bar chart, drawn with geom = 'bar'.
The bar geom counts the number of instances of each class so that you don't need to tabulate your values beforehand.
If you'd like to tabulate class members in some other way, such as by summing up a continuous variable, you can use the weight aesthetic.

qplot(color, data = diamonds, geom = 'bar', weight = carat) + scale_y_continuous('carat')
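One way to convince yourself of what the weight aesthetic does is to compare the bar heights with a manual tabulation. This is a sketch; bar_heights and manual_totals are names chosen for the example.

```r
library(ggplot2)

# Bar chart of diamond colour, weighted by carat.
p <- qplot(color, data = diamonds, geom = "bar", weight = carat)

# Heights computed by the bar geom's stat, one per colour level.
bar_heights <- layer_data(p)$count

# The same totals computed by hand: sum of carat within each colour.
manual_totals <- tapply(diamonds$carat, diamonds$color, sum)

all.equal(bar_heights, as.numeric(manual_totals))
```

Each bar is therefore the total carat weight for that colour rather than the number of diamonds.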

Time series with line and path plots

Line and path plots are typically used for time series data.

qplot(date, unemploy/pop, data = economics, geom = 'line')

We could draw a scatterplot of unemployment rate vs. length of unemployment, but then we could no longer see the evolution over time. The solution is to join points adjacent in time with line segments, forming a path plot.
Apply the colour aesthetic to the line to make it easier to see the direction of time.

year <- function(x) as.POSIXlt(x)$year + 1900  # helper: extract the calendar year from a date
qplot(unemploy/pop, uempmed, data = economics, geom = 'path', colour = year(date))

Faceting

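We have already discussed using aesthetics (colour and shape) to compare subgroups, drawing all groups on the same plot. Faceting takes an alternative approach: it splits the data into subsets and draws the same plot for each subset, arranged as a table of panels. In qplot() the layout is given by a faceting formula of the form row_var ~ col_var, with . as a placeholder when you facet on rows or columns only.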

qplot(carat, data=diamonds, facets=color~., geom='histogram',binwidth=0.1, xlim=c(0,3))
qplot(carat, ..density.., data=diamonds, facets=color~., geom='histogram', binwidth=0.1, xlim=c(0,3))
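The second call plots density rather than count, which makes panels with very different numbers of diamonds comparable. The sketch below checks this using the layered syntax and after_stat(density), which recent ggplot2 versions offer as the replacement for the ..density.. notation. The xlim restriction is left out here to keep the check simple.

```r
library(ggplot2)

binwidth <- 0.1

# Histogram of carat on the density scale, one panel per diamond colour.
p <- ggplot(diamonds, aes(carat, after_stat(density))) +
  geom_histogram(binwidth = binwidth) +
  facet_grid(color ~ .)

# One row per bin; PANEL identifies the facet the bin belongs to.
bins <- layer_data(p)

# On the density scale the bar areas in each panel sum to 1.
area_per_panel <- tapply(bins$density * binwidth, bins$PANEL, sum)
print(round(area_per_panel, 6))
```

Because every panel integrates to the same area, the shapes of the distributions can be compared directly.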

Other options

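xlim, ylim: limits for the x- and y-axes, each a numeric vector of length two, e.g. xlim = c(0, 20).
log: which axes to log-transform, e.g. log = 'x' logs the x-axis and log = 'xy' logs both.
main: main title of the plot; can be a string or an expression.
xlab, ylab: labels for the x- and y-axes; like main, these can be strings or expressions.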
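A short sketch combining several of these options in one call. The label and title text are placeholders invented for the example, and dsmall is rebuilt so the snippet runs on its own.

```r
library(ggplot2)

set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

p <- qplot(
  carat, price, data = dsmall,
  log  = "xy",                          # log-transform both axes
  xlab = "Weight (carats)",             # x-axis label
  ylab = "Price ($)",                   # y-axis label
  main = "Price-weight relationship"    # plot title
)

print(p)
```

The same result could be reached by adding scales and labels to the plot afterwards; the arguments are simply a shortcut.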
