
Reading data in R fails with “line 1 appears to contai…

2019-04-20  浩瀚之宇

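If a data file was edited on Windows, it may have been saved as UTF-16, and reading it in R can then fail with the error and warnings shown here. The '<ff><fe>' in the error message is the UTF-16 little-endian byte order mark at the start of the file. There are three ways to fix this.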

> sampInfo=read.table("/media/xxx/sampInfo_origin.txt", na.strings=c("", "NA"), sep="\t", header=T)
Error in make.names(col.names, unique = TRUE) : 
  invalid multibyte string at '<ff><fe>R'
In addition: Warning messages:
1: In read.table("/media/xxx/sampInfo_origin.txt",  :
  line 1 appears to contain embedded nulls
2: In read.table("/media/xxx/sampInfo_origin.txt",  :
  line 2 appears to contain embedded nulls
3: In read.table("/media/xxx/sampInfo_origin.txt",  :
  line 3 appears to contain embedded nulls
4: In read.table("/media/xxx/sampInfo_origin.txt",  :
  line 4 appears to contain embedded nulls
5: In read.table("/media/albert/xxx/sampInfo_origin.txt",  :
  line 5 appears to contain embedded nulls
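
Before choosing a fix, it can help to confirm the encoding by looking at the first bytes of the file. The following is a minimal sketch, not part of the original workflow: `detect_bom` is a hypothetical helper name, and the temporary file only stands in for sampInfo_origin.txt. It checks for the common UTF-8 and UTF-16 byte order marks only, so a UTF-32LE file (which also starts with ff fe) would be misreported.

```r
# Hypothetical helper: guess the encoding from a file's byte order mark (BOM).
# Only UTF-8 and UTF-16 BOMs are checked; NA means no known BOM was found.
detect_bom <- function(path) {
  b <- readBin(path, what = "raw", n = 3L)
  if (length(b) >= 3L && identical(b[1:3], as.raw(c(0xef, 0xbb, 0xbf)))) {
    return("UTF-8-BOM")
  }
  if (length(b) >= 2L && identical(b[1:2], as.raw(c(0xff, 0xfe)))) {
    return("UTF-16LE")
  }
  if (length(b) >= 2L && identical(b[1:2], as.raw(c(0xfe, 0xff)))) {
    return("UTF-16BE")
  }
  NA_character_
}

# Illustrative stand-in for the real file: a tiny UTF-16LE file with a BOM.
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xff, 0xfe)), con)
writeBin(iconv("Run\tage\nS1\t66\n", from = "UTF-8", to = "UTF-16LE",
               toRaw = TRUE)[[1]], con)
close(con)

detect_bom(tmp)
```

If the helper reports a UTF-16 encoding for the real file, any of the three solutions should apply.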

Solution 1: pass fileEncoding="UTF16LE" or fileEncoding="UTF16" to read.table. Either of the two calls is enough on its own.

> sampInfo=read.table("/media/xxx/sampInfo_origin.txt", fileEncoding="UTF16LE", sep="\t", header=T)
> sampInfo=read.table("/media/xxx/sampInfo_origin.txt", fileEncoding="UTF16", sep="\t", header=T)
> head(sampInfo)
         Run Sample_Name age ancestry arthropathymeds biologics das_score
1 SRRxxx72  GSMxxx25  66     <NA>            <NA>      <NA>        NA
2 SRRxxx73  GSMxxx26  72     <NA>            <NA>      <NA>        NA
3 SRRxxx75  GSMxxx28  61     <NA>            <NA>      <NA>        NA
4 SRRxxx74  GSMxxx27  72     <NA>            <NA>      <NA>        NA
5 SRRxxx76  GSMxxx29  50     <NA>            <NA>      <NA>        NA
6 SRRxxx77  GSMxxx30  59     <NA>            <NA>      <NA>        NA
  disease_activity donor gender leflumide nsaids othermeds phenotype
1             <NA>  C137   male      <NA>   <NA>      <NA>   Healthy
2             <NA>  C141   male      <NA>   <NA>      <NA>   Healthy
3             <NA>  C383   male      <NA>   <NA>      <NA>   Healthy
4             <NA>  C148 female      <NA>   <NA>      <NA>   Healthy
5             <NA>  C391 female      <NA>   <NA>      <NA>   Healthy
6             <NA>  C392 female      <NA>   <NA>      <NA>   Healthy
  classification status plaquenil rituximab steroids sulfasalazine tissue
1              H      H      <NA>      <NA>     <NA>          <NA>  Blood
2              H      H      <NA>      <NA>     <NA>          <NA>  Blood
3              H      H      <NA>      <NA>     <NA>          <NA>  Blood
4              H      H      <NA>      <NA>     <NA>          <NA>  Blood
5              H      H      <NA>      <NA>     <NA>          <NA>  Blood
6              H      H      <NA>      <NA>     <NA>          <NA>  Blood
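
Solution 1 can be tried end to end without the original file. The sketch below writes a small UTF-16LE file with a byte order mark and reads it back with `fileEncoding`. The column names and values are made up for illustration, and the hyphenated spelling "UTF-16LE" is used as an assumption that it is accepted by the local iconv, as it is on most platforms.

```r
# Build a small UTF-16LE file with a BOM (illustrative data only).
tmp <- tempfile(fileext = ".txt")
txt <- "Run\tage\tancestry\nS1\t66\t\nS2\t72\t\n"
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xff, 0xfe)), con)
writeBin(iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]], con)
close(con)

# Declare the encoding so R re-encodes the file while reading it.
# Empty fields become NA, matching the na.strings used in the failing call.
demo <- read.table(tmp, fileEncoding = "UTF-16LE", sep = "\t",
                   header = TRUE, na.strings = c("", "NA"))
str(demo)
```

Unlike solutions 2 and 3, this approach leaves the file on disk unchanged, which is convenient when other tools still expect the UTF-16 version.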

Solution 2: open the file in Excel and save it as a CSV file.

> sampInfo=read.csv("/media/xxx/sampInfo_origin.csv", comment.char = "#", sep=",", header=T)
> head(sampInfo)
         Run Sample_Name age ancestry arthropathymeds biologics das_score
1 SRRxxx72  GSMxxx25  66     <NA>            <NA>      <NA>        NA
2 SRRxxx73  GSMxxx26  72     <NA>            <NA>      <NA>        NA
3 SRRxxx75  GSMxxx28  61     <NA>            <NA>      <NA>        NA
4 SRRxxx74  GSMxxx27  72     <NA>            <NA>      <NA>        NA
5 SRRxxx76  GSMxxx29  50     <NA>            <NA>      <NA>        NA
6 SRRxxx77  GSMxxx30  59     <NA>            <NA>      <NA>        NA
  disease_activity donor gender leflumide nsaids othermeds phenotype
1             <NA>  C137   male      <NA>   <NA>      <NA>   Healthy
2             <NA>  C141   male      <NA>   <NA>      <NA>   Healthy
3             <NA>  C383   male      <NA>   <NA>      <NA>   Healthy
4             <NA>  C148 female      <NA>   <NA>      <NA>   Healthy
5             <NA>  C391 female      <NA>   <NA>      <NA>   Healthy
6             <NA>  C392 female      <NA>   <NA>      <NA>   Healthy
  classification status plaquenil rituximab steroids sulfasalazine tissue
1              H      H      <NA>      <NA>     <NA>          <NA>  Blood
2              H      H      <NA>      <NA>     <NA>          <NA>  Blood
3              H      H      <NA>      <NA>     <NA>          <NA>  Blood
4              H      H      <NA>      <NA>     <NA>          <NA>  Blood
5              H      H      <NA>      <NA>     <NA>          <NA>  Blood
6              H      H      <NA>      <NA>     <NA>          <NA>  Blood

Solution 3: on Linux, open sampInfo_origin.txt in gedit and save it as sampInfo_origin01.txt, setting “Character Encoding” to UTF-8 and “Line ending” to “Unix/Linux”.

> sampInfo=read.table("/media/xxx/sampInfo_origin01.txt", sep="\t", header=T)
> head(sampInfo,2)
         Run Sample_Name age ancestry arthropathymeds biologics das_score
1 SRRxxx72  GSMxxx25  66     <NA>            <NA>      <NA>        NA
2 SRRxxx73  GSMxxx26  72     <NA>            <NA>      <NA>        NA
  disease_activity donor gender leflumide nsaids othermeds phenotype
1             <NA>  C137   male      <NA>   <NA>      <NA>   Healthy
2             <NA>  C141   male      <NA>   <NA>      <NA>   Healthy
  classification status plaquenil rituximab steroids sulfasalazine tissue
1              H      H      <NA>      <NA>     <NA>          <NA>  Blood
2              H      H      <NA>      <NA>     <NA>          <NA>  Blood
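
The conversion in solution 3 can also be done from R itself when no graphical editor is available. This is a rough sketch under assumptions, not the author's method: `convert_utf16_to_utf8` is a hypothetical helper, the source is assumed to be UTF-16LE, and the temporary files stand in for sampInfo_origin.txt and sampInfo_origin01.txt.

```r
# Hypothetical helper: rewrite a UTF-16 text file as UTF-8 with Unix (LF) line endings.
convert_utf16_to_utf8 <- function(src, dest, from = "UTF-16LE") {
  con_in <- file(src, open = "r", encoding = from)
  lines <- readLines(con_in, warn = FALSE)   # accepts LF, CRLF and CR line endings
  close(con_in)
  con_out <- file(dest, open = "wb")         # binary mode: "\n" is written as is
  writeLines(enc2utf8(lines), con_out, sep = "\n", useBytes = TRUE)
  close(con_out)
  invisible(dest)
}

# Illustrative stand-in: a UTF-16LE file with a BOM and Windows (CRLF) line endings.
src <- tempfile(fileext = ".txt")
con <- file(src, open = "wb")
writeBin(as.raw(c(0xff, 0xfe)), con)
writeBin(iconv("Run\tage\r\nS1\t66\r\nS2\t72\r\n", from = "UTF-8",
               to = "UTF-16LE", toRaw = TRUE)[[1]], con)
close(con)

dest <- tempfile(fileext = ".txt")
convert_utf16_to_utf8(src, dest)

# The converted file can now be read without fileEncoding.
converted <- read.table(dest, sep = "\t", header = TRUE)
```

Writing to a new file rather than overwriting the source mirrors the "save as" step in gedit and keeps the original available for comparison.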