Data Science with R in 4 Weeks -

2016-01-13 慢思考快思考

Reshaping Data

reshape & reshape2

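A common kind of analysis is reshape and cast, roughly the equivalent of a pivot table in Excel. Note that R is not the ideal tool for building pivot tables; Excel's reporting features are stronger for that.

Two functions in the reshape package are especially important: melt and cast. In reshape2, which is the package loaded in the examples here, cast is provided as dcast (returns a data frame) and acast (returns a vector, matrix, or array).

Let's look at how these two functions are used.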

Wide and Long data

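Wide data: Wide data has a column for each variable. For example, this is wide-format data:

When the variables are spread horizontally across columns, the data is called wide data. The more variables there are, the more columns there are, so the table gets wider.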

  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

Long data: Long-format data has a column for possible variable types and a column for the values of those variables.

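When all the variables are stored in a single column, the data is called long data. The more variables there are, the more rows that column has, so the table gets longer.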

#   variable  value
# 1    ozone 23.615
# 2    ozone 29.444
# 3    ozone 59.115
# 4    ozone 59.962
# 5     wind 11.623
# 6     wind 10.267

melt takes wide-format data and melts it into long-format data.

cast takes long-format data and casts it into wide-format data.
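To make the two directions concrete, here is a minimal round-trip sketch. The toy data frame `wide` and its values are made up for illustration, and `dcast` is used as the reshape2 form of cast.

```r
library(reshape2)

# Hypothetical toy data: two readings for each of three days.
wide <- data.frame(
  day  = 1:3,
  temp = c(67, 72, 74),
  wind = c(7.4, 8.0, 12.6)
)

# Wide -> long: one row per (day, variable) pair.
long <- melt(wide, id.vars = "day")

# Long -> wide: one column per level of `variable`.
wide_again <- dcast(long, day ~ variable, value.var = "value")
```

After the round trip, `wide_again` should have the same columns and values as `wide`.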

> names(airquality) <- tolower(names(airquality))
> library(reshape2)
> aql <- melt(airquality)
> head(aql)
  variable value
1    ozone    41
2    ozone    36
3    ozone    12
4    ozone    18
5    ozone    NA
6    ozone    28
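A quick check of what melt does when no id variables are given may help here. The sketch below works on a copy named `aq` (a name chosen for this example) and wraps the call in `suppressMessages`, because melt prints a note that it is treating every column as a measure variable.

```r
library(reshape2)

# Work on a copy so the built-in dataset stays untouched.
aq <- airquality
names(aq) <- tolower(names(aq))

# With no id variables, every column is stacked into `value`.
aql_all <- suppressMessages(melt(aq))

# month and day show up as ordinary variables alongside ozone, wind, etc.
levels(aql_all$variable)

# One row per original cell: rows times columns of the wide data.
nrow(aql_all) == nrow(aq) * ncol(aq)
```

Seeing `month` and `day` among the levels is the sign that the identifier columns were melted along with the measurements.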

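Here every variable in the dataset, including month and day, has been broken out into its own rows. Often we do not want the data at that level of detail; we only want to know what ozone, solar.r, and the other readings look like for each day of each month.

For that, we can use: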

> aql <- melt(airquality, id.vars = c("month", "day"))
> head(aql)
  month day variable value
1     5   1    ozone    41
2     5   2    ozone    36
3     5   3    ozone    12
4     5   4    ozone    18
5     5   5    ozone    NA
6     5   6    ozone    28
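If only some of the readings are of interest, melt also accepts `measure.vars` to pick the columns to stack and `na.rm` to drop missing values. The following is a sketch under the assumption that only ozone and solar.r are wanted; `aq` and `aql_sub` are names chosen for this example.

```r
library(reshape2)

aq <- airquality
names(aq) <- tolower(names(aq))

# Keep month and day as identifiers, stack only ozone and solar.r,
# and drop rows whose value is NA.
aql_sub <- melt(
  aq,
  id.vars      = c("month", "day"),
  measure.vars = c("ozone", "solar.r"),
  na.rm        = TRUE
)

head(aql_sub)
```

Columns not listed in either `id.vars` or `measure.vars` (here wind and temp) are left out of the result.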

To rename the variable and value columns:

> aql <- melt(airquality, id.vars = c("month", "day"), variable.name = "climate_related", value.name = "values")
> head(aql)
  month day climate_related values
1     5   1           ozone     41
2     5   2           ozone     36
3     5   3           ozone     12
4     5   4           ozone     18
5     5   5           ozone     NA
6     5   6           ozone     28

Cast, pivot table

Subset the data, or select just the columns you need, then build the table with the cast command (dcast in reshape2).
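As a sketch of what the cast step could look like with reshape2, the example below first restores the wide layout and then builds a pivot-style table of monthly means. The object names (`aq`, `aqw`, `monthly`) and the choice of `mean` as the aggregate are assumptions made for illustration.

```r
library(reshape2)

aq <- airquality
names(aq) <- tolower(names(aq))

# Long format with month and day kept as identifiers.
aql <- melt(aq, id.vars = c("month", "day"))

# Cast back to wide: one row per (month, day), one column per variable.
aqw <- dcast(aql, month + day ~ variable, value.var = "value")

# Pivot-table style summary: mean of each variable per month.
# na.rm = TRUE is passed through to mean().
monthly <- dcast(aql, month ~ variable, value.var = "value",
                 fun.aggregate = mean, na.rm = TRUE)
monthly
```

The left side of the formula names the columns that define the rows, and the right side names the column whose levels become the new columns. When several values fall into one cell, as in the monthly table, an aggregation function is needed.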

References:

http://seananderson.ca/2013/10/19/reshape.html

http://www.statmethods.net/management/reshape.html
