Author: 林群东耀禎逸群 | Source: Internet | 2023-05-18 22:13
I have data that contains NA values in some of its elements. What I want to do is perform clustering without removing the rows where NA is present.
I understand that the gower distance measure in daisy allows for this situation. But why doesn't my code below work? I welcome alternatives to 'daisy'.
# Plot heat map together with dendrogram.
library("gplots")
library("cluster")
# Arbitrarily assigning NA to some elements
mtcars[2,2] <- "NA"
mtcars[6,7] <- "NA"
mydata <- mtcars
hclustfunc <- function(x) hclust(x, method="complete")
# Initially I wanted to use this but it didn't take NA
#distfunc <- function(x) dist(x,method="euclidean")
# Try using daisy with the gower metric,
# which is supposed to work with NA values
distfunc <- function(x) daisy(x,metric="gower")
d <- distfunc(mydata)
fit <- hclustfunc(d)
# Perform clustering heatmap
heatmap.2(as.matrix(mydata),dendrogram="row",trace="none", margin=c(8,9), hclust=hclustfunc,distfun=distfunc);
The error message I got is this:
Error in which(is.na) : argument to 'which' is not logical
Calls: distfunc.g -> daisy
In addition: Warning messages:
1: In data.matrix(x) : NAs introduced by coercion
2: In data.matrix(x) : NAs introduced by coercion
3: In daisy(x, metric = "gower") :
binary variable(s) 8, 9 treated as interval scaled
Execution halted
Ultimately, I'd like to perform hierarchical clustering on data that is allowed to contain NA.
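For reference, the stated goal can be sketched as below. This is only a minimal sketch, assuming the cluster package is installed and that the missing entries are genuine NA values (the NA constant, so the columns stay numeric) rather than the string "NA". The object names are illustrative, and daisy may still warn that binary variables are treated as interval scaled.

```r
library(cluster)  # provides daisy()

# Work on a copy so the built-in mtcars stays untouched
mydata <- mtcars

# Assumption: missing entries are real NA values, not the string "NA",
# so every column remains numeric
mydata[2, 2] <- NA
mydata[6, 7] <- NA

# Gower dissimilarity; missing entries are skipped pair by pair
d <- suppressWarnings(daisy(mydata, metric = "gower"))

# Hierarchical clustering on the dissimilarities, all 32 rows included
fit <- hclust(d, method = "complete")

# Cluster membership for an arbitrary choice of 3 groups
groups <- cutree(fit, k = 3)
print(groups)
```

Rows 2 and 6 keep their place in the result; nothing is dropped before clustering.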
Update
Converting with as.numeric works with the example above. But why does this code fail when the data is read from a text file?
library("gplots")
library("cluster")
# This time read from file
mtcars <- read.table("http://dpaste.com/1496666/plain/",na.strings="NA",sep="\t")
# Following suggestion convert to numeric
mydata <- apply( mtcars, 2, as.numeric )
hclustfunc <- function(x) hclust(x, method="complete")
#distfunc <- function(x) dist(x,method="euclidean")
# Try using daisy GOWER function
distfunc <- function(x) daisy(x,metric="gower")
d <- distfunc(mydata)
fit <- hclustfunc(d)
heatmap.2(as.matrix(mydata),dendrogram="row",trace="none", margin=c(8,9), hclust=hclustfunc,distfun=distfunc);
The error I get is this:
Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
Error in hclust(x, method = "complete") :
NA/NaN/Inf in foreign function call (arg 11)
Calls: hclustfunc -> hclust
Execution halted
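The min/max warnings above are what R emits when a vector has no non-missing values. One possible cause, stated here as an assumption because the file contents are not shown, is that some column (or row) consists entirely of NA after the as.numeric conversion, for example a column of text labels. A minimal sketch of how to check for that, using a small hypothetical data frame (raw and its columns are made up for illustration) in place of the file:

```r
# Hypothetical stand-in for a table read from a file: one text column
# next to numeric columns that contain some NA
raw <- data.frame(
  name = c("a", "b", "c", "d"),
  mpg  = c(21.0, NA, 22.8, 18.7),
  cyl  = c(6, 4, NA, 8),
  stringsAsFactors = FALSE
)

# Same conversion as in the question; text that is not a number becomes NA
converted <- suppressWarnings(apply(raw, 2, as.numeric))

# Columns and rows that hold no usable value at all
all_na_cols <- colnames(converted)[colSums(!is.na(converted)) == 0]
all_na_rows <- which(rowSums(!is.na(converted)) == 0)

print(all_na_cols)
print(all_na_rows)
```

If the check reports any such columns or rows in the real data, they carry no information for a distance computation and could be dropped before calling daisy.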
2 Answers