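Is it possible to aggregate or subset in R while also using regular expressions?
The problem I am trying to solve: I have a data frame named 'wpbCellFeatures' with several columns, including a unique identifier, 'rowColFoVCell':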
      rowColFoVCell wpbCount meanFeret meanPerim  meanCirc   meanAR meanRound meanSolidity
    1  001001001001       38  1.182632  3.047368 0.7560526 1.948947 0.6036842    0.8289474
    2  001001001002        8  1.886250  4.493750 0.7537500 2.365000 0.5350000    0.8325000
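This column contains the values '001001001001', '001001001002', '001001001003', ..., '001003004002', and so on. The digits that make up this ID correspond to the row number, column number, field of view, and cell number; for example, '001003004002' is the first row, third column, fourth field of view, and second cell.
I want to select all identifiers whose row is between 1 and 3 and aggregate them into a new data frame. How can I do this in R? I think it will involve aggregate and regular expressions, but I am not very familiar with either.
Thanks.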
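I wouldn't use regular expressions here. Instead, I would split the first column into its component columns using read.fwf (or substr and related functions), bind the result back to the original dataset, and then use aggregate and so on as usual.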
    toBind <- read.fwf(file = textConnection(as.character(mydf$rowColFoVCell)),
                       widths = c(3, 3, 3, 3), colClasses = "character",
                       col.names = c("Row", "Col", "FoV", "Cell"))
    cbind(toBind, mydf)
    #   Row Col FoV Cell rowColFoVCell wpbCount meanFeret meanPerim  meanCirc   meanAR meanRound
    # 1 001 001 001  001  001001001001       38  1.182632  3.047368 0.7560526 1.948947 0.6036842
    # 2 001 001 001  002  001001001002        8  1.886250  4.493750 0.7537500 2.365000 0.5350000
    #   meanSolidity
    # 1    0.8289474
    # 2    0.8325000
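The substr route mentioned above could look roughly like the sketch below. The `ids` vector is made up for illustration (the last two values are not from the question's data), and the field positions assume every ID is exactly twelve characters, three per field.

```r
# Toy IDs for illustration only; the last two are invented.
ids <- c("001001001001", "001001001002", "001003004002", "004002001001")

# Each field is three characters wide: positions 1-3, 4-6, 7-9, 10-12.
# as.integer() drops the leading zeros so the parts can be compared numerically.
parts <- data.frame(
  Row  = as.integer(substr(ids, 1, 3)),
  Col  = as.integer(substr(ids, 4, 6)),
  FoV  = as.integer(substr(ids, 7, 9)),
  Cell = as.integer(substr(ids, 10, 12))
)
```

Converting to integer is a design choice: it makes a range test such as `Row <= 3` straightforward, whereas read.fwf with colClasses = "character" keeps the zero-padded strings.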
The read.fwf example starts from this "mydf":
    mydf <- structure(list(
      rowColFoVCell = c("001001001001", "001001001002"),
      wpbCount = c(38L, 8L),
      meanFeret = c(1.182632, 1.88625),
      meanPerim = c(3.047368, 4.49375),
      meanCirc = c(0.7560526, 0.75375),
      meanAR = c(1.948947, 2.365),
      meanRound = c(0.6036842, 0.535),
      meanSolidity = c(0.8289474, 0.8325)),
      .Names = c("rowColFoVCell", "wpbCount", "meanFeret", "meanPerim",
                 "meanCirc", "meanAR", "meanRound", "meanSolidity"),
      class = "data.frame", row.names = c(NA, -2L))
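To finish the task from the question (keep rows 1 to 3, then aggregate), something along these lines might work. This is only a sketch: the `wpb` data frame and its values are made up, and grouping by row with `mean` is an assumption, since the question does not say which summary or grouping is wanted.

```r
# Made-up sample data: three cells in plate rows 1-2 and one in row 4.
wpb <- data.frame(
  rowColFoVCell = c("001001001001", "001001001002", "002001001001", "004001001001"),
  wpbCount      = c(38L, 8L, 10L, 99L),
  meanFeret     = c(1.0, 2.0, 3.0, 9.0),
  stringsAsFactors = FALSE
)

# The plate row is encoded in the first three characters of the ID.
wpb$Row <- as.integer(substr(wpb$rowColFoVCell, 1, 3))

# Keep only plate rows 1 to 3.
rows1to3 <- subset(wpb, Row >= 1 & Row <= 3)

# Summarise each measurement per plate row; mean is just an example function.
perRow <- aggregate(cbind(wpbCount, meanFeret) ~ Row, data = rows1to3, FUN = mean)

# For comparison, the same filter written as a regular expression.
viaRegex <- wpb[grepl("^00[1-3]", wpb$rowColFoVCell), ]
```

The regex version gives the same subset here, but the numeric comparison is easier to adapt if the wanted range changes, for example to rows 8 through 12.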