Beginner Plotting Exercises in R
2019-04-23
山竹山竹px
> options(stringsAsFactors = F) # do not convert character strings to factors
> a=read.table('SraRunTable.txt',sep = '\t',header = T)
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
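cannot open file 'SraRunTable.txt': No such file or directory
# The file is not in the working directory. It is available at a URL, but opening the URL shows the contents directly, with no download link. How to get it? read.table accepts a URL as its first argument ("file"), so I pass the URL directly: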
> a=read.table("http://www.bio-info-trainee.com/tmp/5years/SraRunTable.txt",sep = '\t',header = T)
> View(a) # problem solved: the table is loaded
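If a local copy is wanted as well, one option is to download the file once and read the saved copy afterwards. The following is a minimal sketch, not what the transcript did: `fetch_once()` is a hypothetical helper, and a small made-up temp file served through a `file://` URL stands in for the real address so the example runs offline.

```r
# Sketch: cache a remote table locally, then read the local copy.
# fetch_once() is a hypothetical helper, not part of base R.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, quiet = TRUE)
  }
  dest
}

# Stand-in for the remote file so the example runs offline (made-up values).
src <- tempfile(fileext = ".txt")
writeLines(c("Run\tMBases", "SRR0000001\t12", "SRR0000002\t0"), src)
src_url <- paste0("file://", normalizePath(src, winslash = "/"))

# Download to a new location, then read the saved copy as a tab-separated table.
local_copy <- fetch_once(src_url, tempfile(fileext = ".txt"))
a_demo <- read.table(local_copy, sep = "\t", header = TRUE,
                     stringsAsFactors = FALSE)
a_demo$MBases
```

With the real data, `src_url` would be the address used above and `dest` a path such as `SraRunTable.txt` in the working directory.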
> sort(a$MBases)[1] # sort in ascending order and take the first value (the minimum)
[1] 0
> sort(a$MBases,decreasing = T)[1] # sort in descending order and take the first value (the maximum)
[1] 74
> max(a$MBases)
[1] 74
> min(a$MBases)
[1] 0
> fivenum(a$MBases) # Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
[1] 0 8 12 16 74
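The hinges reported by fivenum() are close to the quartiles but are not always identical to what quantile() returns. A small sketch on a made-up vector shows the difference:

```r
x <- 1:8      # made-up data, not the MBases column
fivenum(x)    # minimum, lower hinge, median, upper hinge, maximum
quantile(x)   # default (type 7) quantiles: 0%, 25%, 50%, 75%, 100%
```

Here the hinges are 2.5 and 6.5, while the default quartiles are 2.75 and 6.25; the minimum, median, and maximum agree.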
> ?boxplot # help page for box-and-whisker plots (box plots)
> boxplot(a$MBases)
> hist(a$MBases) # histogram of frequencies
> density(a$MBases)
Call:
density.default(x = a$MBases)
Data: a$MBases (768 obs.); Bandwidth 'bw' = 1.423
       x                y
 Min.   :-4.269   Min.   :0.0000000
 1st Qu.:16.366   1st Qu.:0.0000353
 Median :37.000   Median :0.0003001
 Mean   :37.000   Mean   :0.0121039
 3rd Qu.:57.634   3rd Qu.:0.0142453
 Max.   :78.269   Max.   :0.0665647
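Box plot. A box plot is a statistical chart that shows how a set of values is spread out. It displays the median, the lower and upper quartiles, and whiskers that reach to the smallest and largest values not flagged as outliers; circles mark the outliers.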
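The numbers behind a box plot can be inspected with boxplot.stats(), which by default flags points lying more than 1.5 times the box height beyond the hinges. A short sketch on made-up values:

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)   # made-up data with one extreme value
s <- boxplot.stats(x)                   # the statistics that boxplot() draws
s$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out     # values beyond the whiskers, drawn as circles
```

In this vector the upper whisker stops at 9 and the value 50 is reported as an outlier, so the whisker ends are not necessarily the overall minimum and maximum.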
Histogram
Box plots for two groups
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'sample.csv': No such file or directory
> b=read.csv("homework/sample.csv") # give the path and read the file again
> d=merge(a,b,by.x = 'Sample_Name',by.y = 'Accession') # merge on the columns that hold the same IDs
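To see what merge() does with by.x and by.y, here is a minimal sketch on two made-up data frames. The column names match the transcript, but the IDs and values are invented for illustration.

```r
# Made-up tables; only the column names follow the transcript.
runs <- data.frame(Sample_Name = c("GSM1", "GSM2", "GSM3"),
                   MBases = c(12, 0, 16),
                   stringsAsFactors = FALSE)
meta <- data.frame(Accession = c("GSM2", "GSM1", "GSM4"),
                   Title = c("SS2_15_0048_A2", "SS2_15_0048_A1", "SS2_15_0049_A1"),
                   stringsAsFactors = FALSE)

# Inner join: keep rows whose Sample_Name matches an Accession.
d_demo <- merge(runs, meta, by.x = "Sample_Name", by.y = "Accession")
d_demo
```

Only GSM1 and GSM2 appear in both tables, so the result has two rows, and the key column keeps the name from the first table. Passing `all.x = TRUE` would keep GSM3 as well, with NA in Title.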
> e=d[,c("MBases","Title")] # keep the MBases and Title columns
> x=e[1,2]
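> ?strsplit # splits the elements of a character vector
> strsplit(x,'_') # split the string at underscores
[[1]]
[1] "SS2" "15" "0048" "A1" # four pieces
> strsplit(x,'_')[[1]][3] # take the third piece. [[ ]] extracts a single element from a list; [ ] works on all objects, but on a list it returns a sub-list rather than the element itself
[1] "0048"
> strsplit(x,'_')[3] # the result is a list of length 1, so asking for a third list element returns a list holding NULL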
[[1]]
NULL
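The difference between `[` and `[[` on the list returned by strsplit() can be checked step by step. A small sketch using the same title string:

```r
x <- "SS2_15_0048_A1"
parts <- strsplit(x, "_")   # a list with one element per input string

length(parts)     # 1: the list holds a single character vector
parts[[1]]        # that character vector, with four pieces
parts[[1]][3]     # the third piece of the vector
parts[1]          # single brackets: still a list of length 1
parts[3]          # no third list element, so a list holding NULL
```

Single brackets always return a list here, which is why `parts[3]` prints `[[1]] NULL` instead of a string.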
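> plate=unlist(lapply(e[,2],function(x){ # lapply runs strsplit on every element of column 2 and returns a list (the last few entries are shown below as an example); unlist then flattens the list, and the result is named plate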
# [[765]]
# [1] "0049"
#[[766]]
#[1] "0049"
#[[767]]
#[1] "0049"
#[[768]]
#[1] "0049"
+ x
+ strsplit(x,'_')[[1]][3] # as explained above: the third piece of the title
+
+ }))
> table(plate) # count the extracted plate IDs
plate # 384 samples on plate 0048 and 384 on plate 0049
0048 0049
384 384
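Because strsplit() is already vectorized over its input, the same extraction can be written without unlist(). This is an alternative sketch on made-up titles, not the code used in the transcript:

```r
# Made-up titles in the same "SS2_15_<plate>_<well>" shape as the Title column.
titles <- c("SS2_15_0048_A1", "SS2_15_0048_A2", "SS2_15_0049_A1")

# Split every title once, then take the third piece of each split.
# vapply() checks that each result is a single character string.
plate_demo <- vapply(strsplit(titles, "_"), function(p) p[3], character(1))
table(plate_demo)
```

vapply() fails with an error if any title does not yield exactly one string, which can catch malformed names early.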
> boxplot(e[,1]~plate) # box plots of MBases grouped by plate
> ?t.test
> t.test(e[,1]~plate) # t.test performs one- and two-sample t-tests on vectors of data; here the two samples are the two plate groups
Welch Two Sample t-test
data: e[, 1] by plate
t = 2.3019, df = 728.18, p-value = 0.02162 # the p-value shows whether the difference is significant (here p < 0.05)
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.1574805 1.9831445
sample estimates:
mean in group 0048 mean in group 0049
13.08854 12.01823
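The object returned by t.test() can also be stored, so that the p-value and group means are available as numbers instead of being read off the printout. A minimal sketch on a made-up data frame:

```r
# Made-up values; the real data has 384 samples per plate.
df <- data.frame(MBases = c(10, 12, 14, 16, 9, 11, 13, 15),
                 plate = rep(c("0048", "0049"), each = 4))

res <- t.test(MBases ~ plate, data = df)  # Welch test by default (unequal variances)
res$p.value     # the p-value as a plain number
res$estimate    # the two group means
res$conf.int    # confidence interval for the difference in means
```

Setting `var.equal = TRUE` would run the classical Student's t-test instead of the Welch version.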
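> library(ggplot2) # the package is already installed, but it must be loaded with library() in every session
Warning message:
package 'ggplot2' was built under R version 3.5.3
> ggplot(e,aes(x=plate,y=MBases)) # set up the empty canvas first (figure homework6)
> ggplot(e,aes(x=plate,y=MBases))+geom_boxplot() # add a box plot layer with "+" (figure homework5); typing "geom_" brings up a completion list of the available geoms
> ggplot(e,aes(x=plate,y=MBases))+geom_point() # point plot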
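The layering idea can be sketched on a small made-up data frame: the base object holds the data and axis mapping, and each "+" adds a layer on top. The example assumes ggplot2 is installed and skips the plotting step otherwise.

```r
# Made-up data with the same column names as the transcript.
df <- data.frame(MBases = c(10, 12, 14, 16, 9, 11, 13, 15),
                 plate = rep(c("0048", "0049"), each = 4))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  base_plot <- ggplot(df, aes(x = plate, y = MBases))  # axes only, no layers yet
  # Box plot layer plus the raw points, jittered sideways to reduce overlap.
  p_box <- base_plot + geom_boxplot() + geom_jitter(width = 0.2)
  print(p_box)
}
```

Storing the base plot in a variable makes it easy to try different geoms without repeating the mapping.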
> library(ggpubr)
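Loading required package: magrittr
Warning messages:
1: package 'ggpubr' was built under R version 3.5.3
2: package 'magrittr' was built under R version 3.5.3
> ggboxplot(e,x ="plate",y="MBases") # set up the axes and draw the box plot
> ggboxplot(e,x ="plate",y="MBases",color = "plate") # color by plate
> ggboxplot(e,x ="plate",y="MBases",color = "plate",palette = "jco") # switch to the "jco" color palette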
> p <- ggboxplot(e, x = "plate", y = "MBases",
+ color = "plate", palette = "jco",
+ add = "jitter")
> # Add p-value
> p + stat_compare_means(method = 't.test') # add the p-value of the mean comparison to the plot
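ggpubr functions take column names as strings, so plate needs to be a column of the data frame. The transcript does not show that step; presumably something like `e$plate <- plate` was run beforehand. A minimal sketch under that assumption, on made-up data, with the plotting part skipped if ggpubr is not installed:

```r
# Made-up data; e_demo stands in for the data frame e from the transcript.
e_demo <- data.frame(MBases = c(10, 12, 14, 16, 9, 11, 13, 15),
                     stringsAsFactors = FALSE)
plate <- rep(c("0048", "0049"), each = 4)

# Assumed missing step: attach the grouping vector as a column.
e_demo$plate <- plate

if (requireNamespace("ggpubr", quietly = TRUE)) {
  p <- ggpubr::ggboxplot(e_demo, x = "plate", y = "MBases",
                         color = "plate", palette = "jco",
                         add = "jitter")
  # Annotate the plot with the t-test p-value for the two groups.
  print(p + ggpubr::stat_compare_means(method = "t.test"))
}
```

Keeping the grouping variable inside the data frame also keeps the rows and their groups aligned if the data are later filtered or reordered.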
Axes and canvas set up
Point plot
Box plot with color
Final result