Author: | Source: Internet | 2023-08-29 17:24
I am trying to implement a piece of logic in Excel, but I am failing because I am not an Excel expert.
Below is what my data looks like:
  C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15
1 12  3  3  4  5  6  7  7  7   7   7   7   7   7   7
2  1  4  5  5  5  5  5  5  6   7   8   8   8   8   8
3  3  4  4  4  4 45 32 57 23  23  23  23  23  23  23
As you can see, the first row ends with a run of repeated 7s. Similarly, the second row ends with a run of 8s, and the third row ends with a run of 23s.
I want to replace the trailing repeats of 7, 8 and 23 with #N/A, keeping only the first value of each trailing run (for example, only the first 7 of the final run in row 1). I tried a simple formula, IF(C15<>C14, C15, "N/A"), but it fails because it also replaces repeated values that occur earlier in the row.
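To make the failure concrete, here is a minimal R sketch of the same adjacent-cell comparison applied to the first row. It is an R rendering of the idea behind the IF formula, not the Excel formula itself, and the variable names are only illustrative.

```r
# First row of the data from the question.
x <- c(12, 3, 3, 4, 5, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7)

# Naive rule: blank a cell whenever it equals its left neighbour.
same_as_left <- c(FALSE, x[-1] == x[-length(x)])
naive <- x
naive[same_as_left] <- NA

# The second 3 (position 3) is blanked as well, which is not wanted:
# only the trailing run of 7s should be affected.
print(naive)
```

The comparison has no notion of "the last run", so any repeated pair anywhere in the row is hit.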
Below is the final result I am looking for:
  C1 C2 C3 C4 C5 C6 C7   C8   C9  C10  C11  C12  C13  C14  C15
1 12  3  3  4  5  6  7 #N/A #N/A #N/A #N/A #N/A #N/A #N/A #N/A
2  1  4  5  5  5  5  5    5    6    7    8 #N/A #N/A #N/A #N/A
3  3  4  4  4  4 45 32   57   23 #N/A #N/A #N/A #N/A #N/A #N/A
Can I do this in Excel, or do I need to find some code in R?
Any leads would be appreciated.
Thanks, Jay
Data:
df <- structure(list(C1 = c(12, 1, 3), C2 = c(3, 4, 4), C3 = c(3, 5, 4),
C4 = c(4, 5, 4), C5 = c(5, 5, 4), C6 = c(6, 5, 45),
C7 = c(7, 5, 32), C8 = c(7, 5, 57), C9 = c(7, 6, 23),
C10 = c(7, 7, 23), C11 = c(7, 8, 23), C12 = c(7, 8, 23),
C13 = c(7, 8, 23), C14 = c(7, 8, 23), C15 = c(7, 8, 23)),
.Names = c("C1", "C2", "C3", "C4", "C5",
"C6", "C7", "C8", "C9", "C10",
"C11", "C12", "C13", "C14", "C15"),
row.names = c(NA, -3L), class = "data.frame")
6 solutions
2
Here is an R solution using rleid from data.table:
library(data.table)
df[t(apply(df, 1, function(x) shift(rleid(x) == max(rleid(x)))))] <- NA
Result:
  C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15
1 12  3  3  4  5  6  7 NA NA  NA  NA  NA  NA  NA  NA
2  1  4  5  5  5  5  5  5  6   7   8  NA  NA  NA  NA
3  3  4  4  4  4 45 32 57 23  NA  NA  NA  NA  NA  NA
Note that this works even if the repeated value also appears earlier in the row, in a position that is not connected to the repeating sequence at the end.
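The point about earlier occurrences can be checked on a single row. The sketch below uses a base-R stand-in for rleid on one vector (run ids built with cumsum), so it runs without data.table; the helper name run_id and the example row are assumptions made for illustration, not part of the answer above.

```r
# Base-R stand-in for data.table::rleid on a single vector:
# consecutive equal values share one run id.
run_id <- function(x) cumsum(c(TRUE, x[-1] != x[-length(x)]))

# Hypothetical row: 7 appears early (positions 2-3) and again
# as the trailing run (positions 7-9).
y <- c(12, 7, 7, 4, 5, 6, 7, 7, 7)

ids <- run_id(y)
in_last_run <- ids == max(ids)               # TRUE only inside the trailing run
to_blank <- c(FALSE, head(in_last_run, -1))  # shift right: spare the run's first value

y[to_blank] <- NA
print(ids)
print(y)
```

Because the mask is built from run ids rather than from the values themselves, the early pair of 7s gets a different id from the trailing run and is left alone.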
Data:
df = structure(list(C1 = c(12L, 1L, 3L), C2 = c(3L, 4L, 4L), C3 = c(3L,
5L, 4L), C4 = c(4L, 5L, 4L), C5 = c(5L, 5L, 4L), C6 = c(6L, 5L,
45L), C7 = c(7L, 5L, 32L), C8 = c(7L, 5L, 57L), C9 = c(7L, 6L,
23L), C10 = c(7L, 7L, 23L), C11 = c(7L, 8L, 23L), C12 = c(7L,
8L, 23L), C13 = c(7L, 8L, 23L), C14 = c(7L, 8L, 23L), C15 = c(7L,
8L, 23L)), .Names = c("C1", "C2", "C3", "C4", "C5", "C6", "C7",
"C8", "C9", "C10", "C11", "C12", "C13", "C14", "C15"), class = "data.frame", row.names = c(NA,
-3L))
1
With base R you can do it as follows.
This is the third version of the function; thanks to @useR for pointing out that the earlier versions gave wrong results in some use cases.
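fun <- function(x){
r <- rle(x)
n <- length(x)
last <- r$lengths[length(r$lengths)] # length of the trailing run
# keep the first value of the trailing run and blank out the rest;
# the guard avoids the descending index (n + 1):n when last == 1
if(last > 1) x[(n - last + 2):n] <- NA
x
}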
x <- c(12,3,3,4,5,6,7,7,7,7,7,7,7,7,7)
fun(x)
#[1] 12 3 3 4 5 6 7 NA NA NA NA NA NA NA NA
y <- c(12,7,7,4,5,6,7,7,7,7,7,7,7,7,7)
fun(y)
#[1] 12 7 7 4 5 6 7 NA NA NA NA NA NA NA NA
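One case the two examples above do not exercise is a row whose last value is not repeated. In R, a:b with a > b counts down instead of giving an empty range, so an index such as (n - len + 2):n only behaves as intended when the trailing run has at least two elements. A minimal self-contained sketch with an explicit guard (the name blank_tail is hypothetical):

```r
# Hypothetical helper: keep the first value of the trailing run and
# replace the rest of that run with NA.
blank_tail <- function(x) {
  r <- rle(x)
  n <- length(x)
  last <- r$lengths[length(r$lengths)]     # length of the trailing run
  if (last > 1) x[(n - last + 2):n] <- NA  # without the guard, (n + 1):n counts down
  x
}

z <- c(1, 2, 3, 4, 5)   # no repeated value at the end
print(blank_tail(z))    # left unchanged

w <- c(5, 5, 5)         # the whole vector is one run
print(blank_tail(w))
```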
Now with a data.frame:
dat[] <- t(apply(dat, 1, fun))
#  C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15
#1 12  3  3  4  5  6  7 NA NA  NA  NA  NA  NA  NA  NA
#2  1  4  5  5  5  5  5  5  6   7   8  NA  NA  NA  NA
#3  3  4  4  4  4 45 32 57 23  NA  NA  NA  NA  NA  NA
Data.
dat <- read.csv(text = "
C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13,C14,C15
12,3,3,4,5,6,7,7,7,7,7,7,7,7,7
1,4,5,5,5,5,5,5,6,7,8,8,8,8,8
3,4,4,4,4,45,32,57,23,23,23,23,23,23,23
")
1
Data:
df1 <- read.table(text='C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13,C14,C15
12,3,3,4,5,6,7,7,7,7,7,7,7,7,7
1,4,5,5,5,5,5,5,6,7,8,8,8,8,8
3,4,4,4,4,45,32,57,23,23,23,23,23,23,23', sep = ",", header = TRUE, stringsAsFactors = FALSE)
Code:
apply(df1, 1, function(x) {
x <- rle(x)
len_x <- length(x$lengths)
if( (x$lengths)[len_x] > 1 ){ # check for end sequence
x <- list(lengths = c(x$lengths[-len_x], 1, x$lengths[len_x]- 1 ),
values = c(x$values, NA))
}
inverse.rle(x)
})
Output:
# [,1] [,2] [,3]
# [1,] 12 1 3
# [2,] 3 4 4
# [3,] 3 5 4
# [4,] 4 5 4
# [5,] 5 5 4
# [6,] 6 5 45
# [7,] 7 5 32
# [8,] NA 5 57
# [9,] NA 6 23
# [10,] NA 7 NA
# [11,] NA 8 NA
# [12,] NA NA NA
# [13,] NA NA NA
# [14,] NA NA NA
# [15,] NA NA NA
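Note that apply() returns each row's result as a column, so the output above is transposed relative to the input. A minimal sketch of restoring the original orientation, assuming the same df1 and the same logic wrapped in a named function (blank_tail_rle and res are illustrative names only):

```r
df1 <- read.table(text = 'C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13,C14,C15
12,3,3,4,5,6,7,7,7,7,7,7,7,7,7
1,4,5,5,5,5,5,5,6,7,8,8,8,8,8
3,4,4,4,4,45,32,57,23,23,23,23,23,23,23', sep = ",", header = TRUE)

# Same idea as the anonymous function above: split the trailing run into
# "first element" + "rest", and give the rest the value NA.
blank_tail_rle <- function(x) {
  r <- rle(x)
  k <- length(r$lengths)
  if (r$lengths[k] > 1) {
    r <- list(lengths = c(r$lengths[-k], 1, r$lengths[k] - 1),
              values  = c(r$values, NA))
  }
  inverse.rle(r)
}

res <- df1
res[] <- t(apply(df1, 1, blank_tail_rle))  # t() puts rows back as rows
print(res)
```

Assigning into res[] keeps the data.frame structure and the original column names.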