2019-11-20 Bioinformatics R Exercises

2019-11-20 __一蓑烟雨__

Exercise 1: Data types
What are the data types of "a", TRUE, and 3? Hint: use typeof() and put the value to check inside the parentheses

> typeof("a")
[1] "character"
> typeof(TRUE)
[1] "logical"
> typeof(3)
[1] "double"
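The result for 3 can be surprising: a plain number in R is a double unless it carries the L suffix. A minimal sketch of that distinction (the extra value 3L and the names values and types are illustrative additions, not part of the original exercise):

```r
# Assumption: 3L is added here only to contrast integer and double literals.
values <- list("a", TRUE, 3, 3L)

# typeof() reports the internal storage type of each value
types <- vapply(values, typeof, character(1))
types
# "character" "logical" "double" "integer"

# Integers and doubles are both "numeric" as far as is.numeric() is concerned
is.numeric(3)
is.numeric(3L)
```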

Exercise 2: Generating vectors
2.1 Create an arbitrary vector

> c(1,3,6,10)
[1]  1  3  6 10

2.2 Generate all multiples of 4 between 1 and 30. Expected answer: 4, 8, 12, 16, 20, 24, 28

> seq(from=4, to=28, by=4)
[1]  4  8 12 16 20 24 28

2.3 Generate sample4, sample8, sample12, ..., sample28. Hint: use paste0 and adapt the code above

> paste0(rep("sample", times=7), seq(from=4, to=28, by=4))
[1] "sample4"  "sample8"  "sample12" "sample16" "sample20" "sample24"
[7] "sample28"
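The rep() call above is optional, because paste0() recycles a length-one string across the longer vector. A short sketch of that, plus an alternative way to find the multiples by filtering 1:30 (the variable names are illustrative):

```r
# seq() stops at the last value that does not exceed `to`, so to = 30 still ends at 28
multiples <- seq(from = 4, to = 30, by = 4)

# Alternative: keep the numbers in 1:30 whose remainder modulo 4 is zero
by_filter <- (1:30)[(1:30) %% 4 == 0]

# "sample" is recycled to match the length of `multiples`
sample_names <- paste0("sample", multiples)
sample_names
```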

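How would you select the values smaller than 7 from 50 numbers?
Put the 50 numbers in a vector and assign it to x
Evaluate x<7, which returns 50 logical values
Keep the elements where the result is TRUE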


> x <- runif(50, min = 1, max = 60)
> x
 [1] 13.245162 14.490830 36.147008 34.917460  5.546798  3.096894 38.924934
 [8] 55.788297 36.287453 34.093144 32.035636 59.120618 30.950868 41.284497
[15] 36.490932 15.093252 16.231790 44.029268 27.701679 11.332479 45.055198
[22]  7.194271 52.008152 37.264053 33.872413 20.397862 27.734755 30.526017
[29] 11.671115 32.248206  5.441269 17.387600 13.549272 17.802638 53.810552
[36] 27.327884 47.019108 52.956523 25.374328  4.764700 20.793762 43.699831
[43] 20.919305 38.194433 50.596259 51.511768 24.090198 23.449139 53.831280
[50] 39.014630
> x[x<7]
[1] 5.546798 3.096894 5.441269 4.764700
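Because runif() draws new numbers on every call, the values above will differ from run to run. A minimal sketch of the same three steps with a fixed seed so the result is repeatable (the seed value and the names keep, small, n_small, and positions are arbitrary choices for illustration):

```r
set.seed(1)                       # assumption: any fixed seed works; 1 is arbitrary
x <- runif(50, min = 1, max = 60)

keep <- x < 7                     # step 2: one logical value per element
small <- x[keep]                  # step 3: keep the elements where keep is TRUE

n_small <- sum(keep)              # TRUE counts as 1, so this is the number of hits
positions <- which(keep)          # indices of the hits within x
```

which() is handy when the positions matter as well as the values.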

Exercise 3: Subsetting vectors
3.1 Combine the gene names "ACTR3B", "ANLN", "BAG1", "BCL2", "BIRC5", "RAB", "ABCT", "ANF", "BAD", "BCF", "BARC7", "BALV" into a vector and assign it to x

> x <- c("ACTR3B","ANLN","BAG1","BCL2","BIRC5","RAB","ABCT",
+        "ANF","BAD","BCF","BARC7","BALV")
> x
 [1] "ACTR3B" "ANLN"   "BAG1"   "BCL2"   "BIRC5"  "RAB"    "ABCT"  
 [8] "ANF"    "BAD"    "BCF"    "BARC7"  "BALV"

3.2 Compute the length of the vector with a function

> length(x)
[1] 12

3.3 Using vector subsetting, select the 1st, 3rd, 5th, 7th, 9th, and 11th gene names

> x[seq(from=1, to=11, by=2)]
[1] "ACTR3B" "BAG1"   "BIRC5"  "ABCT"   "BAD"    "BARC7"

3.4 Using vector subsetting, select gene names 1 to 7 and 10 to 15 (x has only 12 elements, so positions 13 to 15 come back as NA)

> x[c(1:7,10:15)]
 [1] "ACTR3B" "ANLN"   "BAG1"   "BCL2"   "BIRC5"  "RAB"    "ABCT"  
 [8] "BCF"    "BARC7"  "BALV"   NA       NA       NA
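Indexing past the end of a vector does not raise an error in R; it quietly returns NA. If those NA entries are unwanted, one option is to drop the out-of-range positions first. A sketch under that assumption (wanted, valid, and subset_x are illustrative names):

```r
x <- c("ACTR3B", "ANLN", "BAG1", "BCL2", "BIRC5", "RAB", "ABCT",
       "ANF", "BAD", "BCF", "BARC7", "BALV")

wanted <- c(1:7, 10:15)

# Keep only the positions that actually exist in x (drops 13, 14, 15)
valid <- wanted[wanted <= length(x)]
subset_x <- x[valid]
subset_x
```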

3.5 Using vector subsetting, select the gene names that also appear in c("ANLN", "BCL2", "TP53"). Hint: %in%

> x[x %in% c("ANLN", "BCL2","TP53")]
[1] "ANLN" "BCL2"
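"TP53" is silently absent from the result because it is not in x. When it matters which query names were not found, setdiff() and match() can report that. A small sketch (query, found, not_found, and pos are illustrative names):

```r
x <- c("ACTR3B", "ANLN", "BAG1", "BCL2", "BIRC5", "RAB", "ABCT",
       "ANF", "BAD", "BCF", "BARC7", "BALV")
query <- c("ANLN", "BCL2", "TP53")

found <- x[x %in% query]          # elements of x that are in the query
not_found <- setdiff(query, x)    # query names missing from x
pos <- match(query, x)            # position of each query name in x, NA if absent
```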

3.6 Change the 6th gene name to "a"

> x[6] <- "a"
> x
 [1] "ACTR3B" "ANLN"   "BAG1"   "BCL2"   "BIRC5"  "a"      "ABCT"  
 [8] "ANF"    "BAD"    "BCF"    "BARC7"  "BALV"

3.7 Generate 100 random numbers with rnorm(n=100,mean=0,sd=18), then set every value below -2 to -2 and every value above 2 to 2

> x <- rnorm(n=100,mean=0,sd=18)
> x
  [1]  19.8391787  -0.3014686   2.9121954  36.4457050 -12.6664966
  [6]  17.2942629  32.2287310 -19.1549729   0.3174578  -7.0183553
 [11]  -8.8349895 -18.8229177 -16.1318028  22.8489689  10.6891371
 [16]  13.9614177  28.0326668  -6.5772323  14.6980161  -1.0914260
 [21]  -9.0248097  16.6691291   0.6648784 -19.1916031  -4.2922144
 [26]  26.9140220  21.0988538 -26.2387298   1.7110121  15.2579693
 [31] -29.2385615  25.3541404  -9.7516865   5.0159650  -3.4915094
 [36]  28.3708473 -26.5598574  -2.6029477 -17.1576566   7.3177692
 [41]  40.1267196 -27.2609461  -1.1107336  -2.6508742  27.7486752
 [46] -17.6734020   8.9384071  30.5450619  -4.6932536 -12.7067145
 [51]  -2.9012131   9.0237929 -18.2437141  29.0655402   0.1015557
 [56] -52.2881831 -19.9289667  27.8562048 -17.5829463  -1.8270621
 [61]   0.7677045 -28.7409243   8.8374127   7.5888606  33.7302702
 [66]  18.6212578   1.4725856  -1.4854277  10.9093218 -15.9735626
 [71]   1.8975850   6.3517405   9.9070805 -20.4179574  26.3223277
 [76]  12.6381008  45.1280007 -34.0204886 -10.6166302 -30.8610413
 [81]  -7.5779622   5.5825448  30.6462705  -7.9809265 -21.5747475
 [86]  -5.5328565  11.1789756   3.2742394  23.7312168  -5.3803676
 [91] -29.6679914  17.1269725 -20.0362132  11.1053966   9.2428869
 [96]   6.6502639  31.0300944  -3.7106022 -23.6555125   1.1425337
> x[x< -2] <- -2
> x[x> 2] <- 2
> x
  [1]  2.0000000 -0.3014686  2.0000000  2.0000000 -2.0000000  2.0000000
  [7]  2.0000000 -2.0000000  0.3174578 -2.0000000 -2.0000000 -2.0000000
 [13] -2.0000000  2.0000000  2.0000000  2.0000000  2.0000000 -2.0000000
 [19]  2.0000000 -1.0914260 -2.0000000  2.0000000  0.6648784 -2.0000000
 [25] -2.0000000  2.0000000  2.0000000 -2.0000000  1.7110121  2.0000000
 [31] -2.0000000  2.0000000 -2.0000000  2.0000000 -2.0000000  2.0000000
 [37] -2.0000000 -2.0000000 -2.0000000  2.0000000  2.0000000 -2.0000000
 [43] -1.1107336 -2.0000000  2.0000000 -2.0000000  2.0000000  2.0000000
 [49] -2.0000000 -2.0000000 -2.0000000  2.0000000 -2.0000000  2.0000000
 [55]  0.1015557 -2.0000000 -2.0000000  2.0000000 -2.0000000 -1.8270621
 [61]  0.7677045 -2.0000000  2.0000000  2.0000000  2.0000000  2.0000000
 [67]  1.4725856 -1.4854277  2.0000000 -2.0000000  1.8975850  2.0000000
 [73]  2.0000000 -2.0000000  2.0000000  2.0000000  2.0000000 -2.0000000
 [79] -2.0000000 -2.0000000 -2.0000000  2.0000000  2.0000000 -2.0000000
 [85] -2.0000000 -2.0000000  2.0000000  2.0000000  2.0000000 -2.0000000
 [91] -2.0000000  2.0000000 -2.0000000  2.0000000  2.0000000  2.0000000
 [97]  2.0000000 -2.0000000 -2.0000000  1.1425337
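The two assignments above clamp the values to the range from -2 to 2. The same result can be written in one expression with pmax() and pmin(), which compare element by element. A sketch with a fixed seed (the seed and the names clamped and y are arbitrary):

```r
set.seed(42)                              # assumption: arbitrary seed for repeatability
x <- rnorm(n = 100, mean = 0, sd = 18)

# pmax() raises everything below -2 up to -2, then pmin() caps everything above 2
clamped <- pmin(pmax(x, -2), 2)

# Two-step version from the exercise, applied to a copy for comparison
y <- x
y[y < -2] <- -2
y[y > 2] <- 2

range(clamped)
```

The one-liner leaves x untouched, which helps when the raw values are still needed later.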

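Exercise 4: Creating and subsetting data frames

4.1 Create this data frame (hint: the last three columns come from rnorm())

[Figure: image.png, the target data frame]

> gene <- paste0("gene",1:15) # vector recycling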
> gene
 [1] "gene1"  "gene2"  "gene3"  "gene4"  "gene5"  "gene6"  "gene7" 
 [8] "gene8"  "gene9"  "gene10" "gene11" "gene12" "gene13" "gene14"
[15] "gene15"
> gene <- paste0(rep("gene",times=15),1:15)
> gene
 [1] "gene1"  "gene2"  "gene3"  "gene4"  "gene5"  "gene6"  "gene7" 
 [8] "gene8"  "gene9"  "gene10" "gene11" "gene12" "gene13" "gene14"
[15] "gene15"
> s1 <- rnorm(15,mean = 0, sd=1)
> s1
 [1]  0.68501477  3.26641452  0.56060046 -0.06901730 -0.97244294
 [6] -0.54658659 -1.68869233 -1.57237270 -0.40498716  0.31928642
[11]  0.04042768 -0.39000956 -1.81922223  0.65918071  0.45962167
> s2 <- rnorm(15, mean = 0, sd=1)
> s2
 [1]  1.6166263 -1.8561905 -0.2868239  1.7503219  0.1164136  1.3842532
 [7]  0.5742209  0.1364908  0.9142160 -1.8008263 -0.3398806  0.6062646
[13]  1.3411303  0.7672873  0.1937257
> s3 <- rnorm(15, mean = 0, sd=1)
> s3
 [1]  1.14056669  0.01386480 -1.10530591 -0.02516264 -0.16367334
 [6]  0.37005975 -0.38082454  0.65295237  2.06134181 -1.79664494
[11]  0.58407712 -0.72275312 -0.62916466 -1.81620605 -0.25928910
> my_dataframe <- data.frame(gene, s1, s2, s3)
> my_dataframe
     gene          s1         s2          s3
1   gene1  0.68501477  1.6166263  1.14056669
2   gene2  3.26641452 -1.8561905  0.01386480
3   gene3  0.56060046 -0.2868239 -1.10530591
4   gene4 -0.06901730  1.7503219 -0.02516264
5   gene5 -0.97244294  0.1164136 -0.16367334
6   gene6 -0.54658659  1.3842532  0.37005975
7   gene7 -1.68869233  0.5742209 -0.38082454
8   gene8 -1.57237270  0.1364908  0.65295237
9   gene9 -0.40498716  0.9142160  2.06134181
10 gene10  0.31928642 -1.8008263 -1.79664494
11 gene11  0.04042768 -0.3398806  0.58407712
12 gene12 -0.39000956  0.6062646 -0.72275312
13 gene13 -1.81922223  1.3411303 -0.62916466
14 gene14  0.65918071  0.7672873 -1.81620605
15 gene15  0.45962167  0.1937257 -0.25928910

4.2 Extract the first column (two ways)

> my_dataframe$gene
 [1] gene1  gene2  gene3  gene4  gene5  gene6  gene7  gene8  gene9  gene10
[11] gene11 gene12 gene13 gene14 gene15
15 Levels: gene1 gene10 gene11 gene12 gene13 gene14 gene15 gene2 ... gene9
> my_dataframe[,1]
 [1] gene1  gene2  gene3  gene4  gene5  gene6  gene7  gene8  gene9  gene10
[11] gene11 gene12 gene13 gene14 gene15
15 Levels: gene1 gene10 gene11 gene12 gene13 gene14 gene15 gene2 ... gene9
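The "15 Levels" line shows that the gene column was stored as a factor. That matches R 3.6, where data.frame() converts strings to factors by default; the default changed to FALSE in R 4.0.0. A minimal sketch of controlling this explicitly (df_factor and df_char are illustrative names):

```r
gene <- paste0("gene", 1:15)

# Explicit settings behave the same on every R version
df_factor <- data.frame(gene, stringsAsFactors = TRUE)
df_char   <- data.frame(gene, stringsAsFactors = FALSE)

class(df_factor$gene)   # "factor"
class(df_char$gene)     # "character"

# A factor column can be turned back into plain strings when needed
as.character(df_factor$gene)
```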

4.3 Extract the second row

> my_dataframe[2,]
   gene       s1       s2        s3
2 gene2 3.266415 -1.85619 0.0138648

4.4 Extract the value at row 3, column 4

> my_dataframe[3,4]
[1] -1.105306

4.5 Get the row names and column names

> colnames(my_dataframe)
[1] "gene" "s1"   "s2"   "s3"  
> row.names(my_dataframe)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14"
[15] "15"

4.6 Compute the mean of column 2

> mean(my_dataframe[,2])
[1] -0.09818564
> mean(my_dataframe$s1)
[1] -0.09818564

4.7 Extract the s1 and s3 columns by column name

> my_dataframe$s1
 [1]  0.68501477  3.26641452  0.56060046 -0.06901730 -0.97244294
 [6] -0.54658659 -1.68869233 -1.57237270 -0.40498716  0.31928642
[11]  0.04042768 -0.39000956 -1.81922223  0.65918071  0.45962167
> my_dataframe$s3
 [1]  1.14056669  0.01386480 -1.10530591 -0.02516264 -0.16367334
 [6]  0.37005975 -0.38082454  0.65295237  2.06134181 -1.79664494
[11]  0.58407712 -0.72275312 -0.62916466 -1.81620605 -0.25928910
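The two $ calls above return s1 and s3 as separate vectors. To get both columns together as a data frame, a character vector of column names can go in the column position of [ , ]. A sketch on a small made-up data frame (df and its values are hypothetical, not the exercise data):

```r
# Hypothetical stand-in for my_dataframe with fixed values
df <- data.frame(
  gene = paste0("gene", 1:3),
  s1 = c(0.5, -1.2, 0.3),
  s2 = c(1.1, 0.4, -0.7),
  s3 = c(-0.2, 0.9, 1.5)
)

# Select several columns by name in one step; the result is still a data frame
sub <- df[, c("s1", "s3")]
sub
```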

4.8 Filter the rows where the s3 column is greater than 0

> my_dataframe$s3[my_dataframe$s3>0]
[1] 1.1405667 0.0138648 0.3700597 0.6529524 2.0613418 0.5840771
> my_dataframe[,4][my_dataframe[,4]>0]
[1] 1.1405667 0.0138648 0.3700597 0.6529524 2.0613418 0.5840771
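Both commands above return only the s3 values that are greater than 0, not the rows they belong to. If whole rows are wanted, the logical test goes in the row position of [ , ]. A sketch on a small made-up data frame (df and its values are hypothetical):

```r
# Hypothetical stand-in for my_dataframe with fixed values
df <- data.frame(
  gene = paste0("gene", 1:3),
  s1 = c(0.5, -1.2, 0.3),
  s2 = c(1.1, 0.4, -0.7),
  s3 = c(-0.2, 0.9, 1.5)
)

# Logical vector in the row position, nothing in the column position: keep all columns
pos_rows <- df[df$s3 > 0, ]

# subset() expresses the same filter without repeating the data frame name
pos_rows2 <- subset(df, s3 > 0)
pos_rows
```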

Exercise 5: Install any two R packages

> BiocManager::install()
Bioconductor version 3.9 (BiocManager 1.30.4), R 3.6.0 (2019-04-26)
installation path not writeable, unable to update packages: boot, cluster,
  foreign, KernSmooth, mgcv, nlme
Update old packages: 'AnnotationDbi', 'backports', 'BiocManager',
  'BiocParallel', 'biomaRt', 'blob', 'callr', 'car', 'carData',
  'checkmate', 'clipr', 'cowplot', 'curl', 'data.table', 'devtools',
  'digest', 'doParallel', 'dplyr', 'edgeR', 'effects', 'ellipsis',
  'FactoMineR', 'farver', 'fgsea', 'foreach', 'GenomicRanges', 'ggforce',
  'ggplot2', 'ggplotify', 'ggpubr', 'ggraph', 'ggsignif', 'git2r',
  'haven', 'hexbin', 'Hmisc', 'hms', 'htmlTable', 'htmltools',
  'htmlwidgets', 'httpuv', 'httr', 'IRanges', 'iterators', 'knitr',
  'lambda.r', 'later', 'lava', 'limma', 'maptools', 'markdown',
  'matrixStats', 'mclust', 'openssl', 'openxlsx', 'pillar', 'pkgbuild',
  'pkgconfig', 'processx', 'prodlim', 'promises', 'purrr', 'quantreg',
  'R6', 'Rcpp', 'RcppArmadillo', 'RcppEigen', 'rlang', 'robust',
  'RSQLite', 'rvcheck', 'S4Vectors', 'shiny', 'sp',
  'SummarizedExperiment', 'survival', 'sys', 'tidyr', 'usethis', 'vctrs',
  'whisker', 'xfun', 'xml2', 'zip'
Update all/some/none? [a/s/n]: 
n
> library(BiocManager)
Bioconductor version 3.9 (BiocManager 1.30.4), ?BiocManager::install for
  help
Bioconductor version '3.9' is out-of-date; the current release version
  '3.10' is available with R version '3.6'; see
  https://bioconductor.org/install
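Calling BiocManager::install() with no arguments only offers to update what is already installed; it does not add new packages. A hedged sketch of installing named packages: the helper missing_packages() is hypothetical, and the two package names in the commented lines are just examples taken from the update list above. The install calls are left commented out so that running the block does not touch the library.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Base packages ship with R, so nothing is reported as missing here
missing_packages(c("stats", "utils"))

# Example install calls (uncomment to run; both need network access):
# install.packages("ggplot2")        # a CRAN package
# BiocManager::install("limma")      # a Bioconductor package
```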

Exercise 6: Reading and exporting files
6.1 Read complete_set.txt (already saved in the working directory)

> read.table("complete_set.txt")
                     V1                   V2                 V3
1                 geneA                geneB              geneC
2    -0.635020187971398    -0.49728008811353  0.514896730700242
3      0.91605661780324   -0.545381308500589   1.20238322656491
4     0.805995294157758   -0.315914513323816   0.27825197143441

6.2 Check the number of rows and columns

> nrow(read.table("complete_set.txt"))
[1] 51
> ncol(read.table("complete_set.txt"))
[1] 20

6.3 Get the row names and column names

> row.names(read.table("complete_set.txt"))
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14"
[15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
[29] "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42"
[43] "43" "44" "45" "46" "47" "48" "49" "50" "51"
> colnames(read.table("complete_set.txt"))
 [1] "V1"  "V2"  "V3"  "V4"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10" "V11"
[12] "V12" "V13" "V14" "V15" "V16" "V17" "V18" "V19" "V20"

6.4 Export to CSV format, then read it back

> write.table(read.table("complete_set.txt"), file="complete_set1.csv")
> read.table("complete_set1.csv")
                     V1                   V2                 V3
1                 geneA                geneB              geneC
2    -0.635020187971398    -0.49728008811353  0.514896730700242
3      0.91605661780324   -0.545381308500589   1.20238322656491
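With its default arguments, write.table() separates fields with spaces, so the file written above carries a .csv extension without being comma separated. A sketch of a comma-separated round trip with write.csv() and read.csv(), using a temporary file and a small made-up data frame (df, csv_path, and back are illustrative names):

```r
# Hypothetical data standing in for the contents of complete_set.txt
df <- data.frame(geneA = c(-0.64, 0.92), geneB = c(-0.50, -0.55))

csv_path <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing an extra unnamed first column
write.csv(df, file = csv_path, row.names = FALSE)

first_line <- readLines(csv_path, n = 1)   # header line, fields separated by commas
back <- read.csv(csv_path)                 # header = TRUE is the default for read.csv()
```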

6.5 Save as Rdata, then load it

> x <- read.table("complete_set1.csv")
> save(x,file = "complete_set1.Rdata")
> load("complete_set1.Rdata")
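load() restores objects under the names they were saved with, which is easy to miss when x still exists in the session. A small sketch that removes x first to make the effect visible, and also shows saveRDS()/readRDS() as an alternative that lets the caller pick the name (temporary paths and the toy data frame are assumptions for illustration):

```r
x <- data.frame(id = 1:3)                  # hypothetical stand-in for the table

rdata_path <- tempfile(fileext = ".Rdata")
save(x, file = rdata_path)

rm(x)                                      # x is gone from the session
loaded_names <- load(rdata_path)           # x is back; load() returns the restored names

# Alternative: store a single object and choose its name when reading it back
rds_path <- tempfile(fileext = ".rds")
saveRDS(x, rds_path)
y <- readRDS(rds_path)
```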

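Exercise 7: tidyr and dplyr

7.1 Gather the first 4 columns of the iris data frame, then restore the original layout

library(tidyr)   # gather(), spread()
library(dplyr)   # %>%, group_by(), mutate()

test <- iris
head(iris)
# Wide to long: every column except Species becomes a key/value pair
iris_g <- gather(test, s_p, exp, -Species)
head(iris_g)
# Long to wide: add a row id within each key so spread() can line the values up
iris_g %>% 
  group_by(s_p) %>% 
  mutate(id=1:n()) %>% 
  spread(s_p, exp)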

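7.2 Split the third column into two (using the decimal point as the separator), then merge them back

head(test)
iris_s <- separate(test,Petal.Length,c("Petal", "Length"),sep = "[.]")
head(iris_s)
# Merge the pieces back; na.rm = TRUE drops the NA left by whole numbers such as 4
iris_u <- unite(iris_s, "Petal.Length", Petal, Length, sep = ".", na.rm = TRUE)
head(iris_u)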

7.3 Load test.Rdata and sort the deg data frame by p-value in ascending order

load("test.Rdata")
head(deg)
head(arrange(deg,P.Value))

7.4 Join the two data frames on the probe_id column

merge(deg,ids,by="probe_id")
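By default merge() keeps only the rows whose key appears in both data frames. Since test.Rdata is not available here, the sketch below uses two made-up data frames (deg_toy and ids_toy, with hypothetical values) to show that behaviour and the all.x option that keeps unmatched rows from the first table:

```r
# Hypothetical stand-ins for deg and ids
deg_toy <- data.frame(probe_id = c("p1", "p2", "p3"),
                      P.Value = c(0.03, 0.001, 0.2))
ids_toy <- data.frame(probe_id = c("p3", "p1", "p4"),
                      symbol = c("BCL2", "ANLN", "TP53"))

# Inner join: only p1 and p3 occur in both tables
joined <- merge(deg_toy, ids_toy, by = "probe_id")

# Left join: keep every row of deg_toy; p2 has no match, so its symbol is NA
left <- merge(deg_toy, ids_toy, by = "probe_id", all.x = TRUE)

joined
left
```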
