[R] Extracting Specific Characters from Strings
2024-07-02
花生学生信
Example
Suppose column_names is a character vector:
column_names <- c("chr08_2600000_2610000", "chr12_4500000_4510000", "chr17_7800000_7810000")
column_names
![](https://img.haomeiwen.com/i25274977/8c8ba5e03ab4f259.png)
Extract the chromosome name (everything before the first underscore)
Chr <- sub("_.*", "", column_names)
Chr
![](https://img.haomeiwen.com/i25274977/d4ee3a4a5469b18a.png)
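A minimal sketch of why this pattern works: sub() replaces only the first match, and "_.*" matches the first underscore together with everything after it, so only the leading field is left. The lowercase name chr and the extra input "chrX" are illustrative assumptions, not part of the original data.

```r
column_names <- c("chr08_2600000_2610000", "chr12_4500000_4510000", "chr17_7800000_7810000")

# "_.*" matches the first underscore and everything that follows it;
# replacing that match with "" leaves only the chromosome name
chr <- sub("_.*", "", column_names)
print(chr)  # "chr08" "chr12" "chr17"

# A string without an underscore has nothing to match and comes back unchanged
no_underscore <- sub("_.*", "", "chrX")
print(no_underscore)  # "chrX"
```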
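Extract the characters after the second underscore (despite the variable name Between_1_2, the greedy \w+ also matches underscores, so the second capture group ends up holding the last number)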
Between_1_2 <- sub("^(\\w+)_(\\d+).*", "\\2", column_names)
Between_1_2
![](https://img.haomeiwen.com/i25274977/fe26770efb15caf6.png)
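When fields need to be picked by position, splitting on the underscore can be easier to read than a capture-group pattern. The following is a sketch under the assumption that every name has exactly three underscore-separated fields; the names parts, fields, between_1_2, after_2, and middle are illustrative only.

```r
column_names <- c("chr08_2600000_2610000", "chr12_4500000_4510000", "chr17_7800000_7810000")

# Split every name on the literal underscore; the result is a list of character vectors
parts <- strsplit(column_names, "_", fixed = TRUE)

# Stack the pieces into a 3-column character matrix (one row per name)
fields <- do.call(rbind, parts)

between_1_2 <- fields[, 2]  # text between the first and second underscore
after_2 <- fields[, 3]      # text after the second underscore

# The same middle field with a regex whose groups cannot cross an underscore
middle <- sub("^[^_]+_([^_]+)_.*$", "\\1", column_names)

print(between_1_2)  # "2600000" "4500000" "7800000"
print(after_2)      # "2610000" "4510000" "7810000"
```

Using [^_]+ instead of \w+ keeps each capture group inside a single field, because \w also matches the underscore itself.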
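Extract the first two fields (everything before the last underscore; note that this overwrites the earlier Chr)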
Chr <- sub("^(\\w+)_.*", "\\1", column_names)
Chr
![](https://img.haomeiwen.com/i25274977/5f1942bf61a1bd5b.png)
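Because \w matches letters, digits, and the underscore, the greedy \w+ in the pattern above runs up to the last underscore. The sketch below contrasts it with two alternatives: a pattern restricted to the first field, and substr() for literally taking the first two characters. The variable names are illustrative assumptions.

```r
x <- "chr08_2600000_2610000"

# Greedy \w+ swallows underscores too, so it stops at the LAST underscore
greedy <- sub("^(\\w+)_.*", "\\1", x)
print(greedy)  # "chr08_2600000"

# [^_]+ cannot cross an underscore, so it stops at the FIRST one
first_field <- sub("^([^_]+)_.*", "\\1", x)
print(first_field)  # "chr08"

# For the first two characters in the literal sense, no regex is needed
first_two_chars <- substr(x, 1, 2)
print(first_two_chars)  # "ch"
```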
Practical application
# Read the gene variant data
mygene <- read.csv("5_scaffold_1091_1091_57926_63436.csv", header = TRUE, row.names = 1)
# Extract the column names
column_names <- names(mygene)
# Extract the chromosome name
Chr <- sub("_.*", "", column_names)
# Build a data frame from the column names (the first row after transposing); the first column is named "sample"
new_df <- data.frame(sample = column_names, Pos = column_names, Chr = Chr)
new_df
![](https://img.haomeiwen.com/i25274977/9e35328d0cde5646.png)
![](https://img.haomeiwen.com/i25274977/97ce9ae284f981d4.png)
# Save the file; this is the third input file for the onci plot
write.csv(new_df, file = "3.mytype.csv", row.names = FALSE)
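The workflow above can be tried end to end without the real CSV. The sketch below uses a small made-up data frame as a stand-in for mygene, splits each column name into separate fields, and reads the saved file back to check it. The toy values, the Start and End columns, and the temporary file are all assumptions for illustration; this is a variation on the table built above, not the format the downstream plot requires.

```r
# Hypothetical stand-in for the real variant table (columns are regions, rows are samples)
mygene <- data.frame(
  chr08_2600000_2610000 = c(0, 1),
  chr12_4500000_4510000 = c(1, 1),
  row.names = c("sampleA", "sampleB")
)

column_names <- names(mygene)

# Split each column name into its three underscore-separated fields
fields <- do.call(rbind, strsplit(column_names, "_", fixed = TRUE))

new_df <- data.frame(
  sample = column_names,
  Chr    = fields[, 1],
  Start  = as.integer(fields[, 2]),
  End    = as.integer(fields[, 3])
)

# Write to a temporary file and read it back to confirm the round trip
out_file <- tempfile(fileext = ".csv")
write.csv(new_df, file = out_file, row.names = FALSE)
check <- read.csv(out_file)
print(check)
```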