Finding if a string is an iterative substring?

Author: 刘德华 | Source: Internet | 2023-10-13 08:45

I have a string S. How can I find if the string follows S = nT.

Examples:
Function should return true if
1) S = "abab"
2) S = "abcdabcd"
3) S = "abcabcabc"
4) S = "zzxzzxzzx"

But it should return false if S = "abcb".

I thought maybe we could repeatedly call KMP on substrings of S and then decide.

e.g. for "abab": call KMP on "a". It returns 2 (two instances). Now 2*len("a") != len(s).
Call KMP on "ab". It returns 2. Now 2*len("ab") == len(s), so return true.

Can you suggest any better algorithms?

8 solutions

#1

One heuristic I can think of: only call KMP on a substring if len(original string) / len(substring) is a positive integer.

Also, the length of the substring can be at most N/2.

EDIT

Using these heuristics I wrote the following Python code, because my C is rusty at the moment:

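oldstr = 'ABCDABCD'

# Try every prefix up to half the length of the string.
for i in range(len(oldstr) // 2):
    newslice = oldstr[:i + 1]
    # Repeating the prefix must rebuild the whole string exactly.
    if newslice * (len(oldstr) // len(newslice)) == oldstr:
        print('pattern found', newslice)
        break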

#2

You actually only need to test substring lengths that are equal to the full string length divided by a prime number. The reason: if S contains n copies of T and n is not prime, then n = ab with a, b > 1, so S also consists of a copies of bT (where "bT" means "T repeated b times"). This is an extension of anijhaw's answer.

int primes[] = { 2, 3, 5, 7, 11, 13, 17 };  /* There are one or two more... ;) */
int nPrimes = sizeof primes / sizeof primes[0];

/* Passing in the string length instead of assuming ASCIIZ strings means we
 * don't have to modify the string in-place or allocate memory for new copies
 * to handle recursion. */
int is_iterative(char *s, int len) {
    int i, j;
    for (i = 0; i

 
Notice that when recursing to find even shorter repeated substrings, we don't need to check the entire string again, just the first larger repeat -- since we've already established that the remaining large repeats are, well, repeats of the first one. :) 
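The C listing above is cut off in the source, so what follows is only a minimal Python sketch of the idea as the prose describes it, not the author's code. The names `prime_factors`, `is_iterative` and `shortest_unit` are made up for illustration, and trial division stands in for the hard-coded prime table.

```python
def prime_factors(n):
    """Return the distinct prime factors of n by trial division."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def is_iterative(s):
    """True if s is some substring T repeated n >= 2 times."""
    length = len(s)
    for p in prime_factors(length):
        sublen = length // p
        # Is s exactly p copies of its first sublen characters?
        if s[:sublen] * p == s:
            return True
    return False


def shortest_unit(s):
    """Return the shortest T with s == T * n (s itself if nothing repeats)."""
    length = len(s)
    for p in prime_factors(length):
        sublen = length // p
        if s[:sublen] * p == s:
            # Only the first block needs re-checking; the rest are copies of it.
            return shortest_unit(s[:sublen])
    return s
```

For example, `is_iterative("abcabcabc")` only tests block length 3 (9 divided by the prime 3), and `shortest_unit("abababab")` narrows "abab" down to "ab" by recursing on the first block alone.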

                        
                           
							  
#3

I don't see that the KMP algorithm helps in this case. It is not a matter of determining where to begin the next match. It seems that one way to reduce the search time is to start with the longest possibility (half the length) and work downward. The only lengths that need to be tested are those that evenly divide the total length. Here is an example in Ruby. I should add that I realize the question was tagged as C, but this was just a simple way to show the algorithm I was thinking about (and allowed me to test that it worked).

class String
    def IsPattern
        len = self.length
        testlen = len / 2
        # the fastest is to start with two entries and work down
        while testlen > 0
            # if this is not an even divisor then it can't fit the pattern
            if len % testlen == 0
                # evenly divides, so it may match
                return true if self == self[0..testlen-1] * (len / testlen)
            end
            testlen -= 1
        end
        # must not have matched
        false
    end
end

if __FILE__ == $0

   ARGV.each do |str|
       puts "%s, %s" % [str, str.IsPattern ? "true" : "false" ]
   end

end



[C:\test]ruby patterntest.rb a aa abab abcdabcd abcabcabc zzxzzxzzx abcd
a, false
aa, true
abab, true
abcdabcd, true
abcabcabc, true
zzxzzxzzx, true
abcd, false

							     
							                          
                           
							  
#4

I suppose you could try the following algorithm:

Let L be a candidate length for the substring that generates the original string. For L from 1 to strlen(s)/2, check whether the first character occurs at every position L*i for i from 1 to strlen(s)/L - 1. If it does, L is a possible solution and you should verify it with memcmp; if not, try the next L. Of course, you can skip any L that does not divide strlen(s).
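A rough Python sketch of this two-stage check might look like the following. The function name `is_repeated` is an assumption for illustration, and slice comparison stands in for the `memcmp` call mentioned above.

```python
def is_repeated(s):
    """True if s consists of a block of length L repeated at least twice."""
    n = len(s)
    for length in range(1, n // 2 + 1):
        if n % length != 0:
            continue  # the block must fit a whole number of times
        # Cheap filter: every block must start with the same character as s.
        if any(s[i] != s[0] for i in range(length, n, length)):
            continue
        # Full comparison of each block against the first one.
        if all(s[i:i + length] == s[:length] for i in range(length, n, length)):
            return True
    return False
```

The first-character filter rejects most wrong lengths after looking at only a few characters, so the full block comparison runs only for plausible candidates.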
							     
							                          
                           
							  
#5

Try this:

char s[] = "abcabcabcabc";
int nStringLength = strlen (s);
int nMaxCheckLength = nStringLength / 2;
int nThisOffset;
int nNumberOfSubStrings;
char cMustMatch;
char cCompare;
BOOL bThisSubStringLengthRepeats;
// Check all sub string lengths up to half the total length
for (int nSubStringLength = 1;  nSubStringLength <= nMaxCheckLength;  nSubStringLength++)
{
    // How many substrings will there be?
    nNumberOfSubStrings = nStringLength / nSubStringLength;

    // Only check substrings that fit exactly
    if (nSubStringLength * nNumberOfSubStrings == nStringLength)
    {
        // Assume it's going to be ok
        bThisSubStringLengthRepeats = TRUE;

        // check each character in substring
        for (nThisOffset = 0;  nThisOffset 

							     

							  
                        
                           
							  
#6

This is Java code, but you should get the idea:

    String str = "ababcababc";
    int repPos = 0;
    int repLen = 0;
    for( int i = 0; i 
 
This returns the length of the shortest repeating chunk, or the length of the string if there's no repetition.
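Since the Java loop is cut off in the source, here is a small Python sketch that only illustrates the return contract described (shortest repeating chunk length, or the full length when nothing repeats). It uses a plain divisor test rather than the single-pass `repPos`/`repLen` loop, and the name `shortest_repeat_length` is hypothetical.

```python
def shortest_repeat_length(s):
    """Length of the shortest chunk that repeats to form s, else len(s)."""
    n = len(s)
    for length in range(1, n // 2 + 1):
        # A chunk qualifies if it divides n and rebuilds s when repeated.
        if n % length == 0 and s[:length] * (n // length) == s:
            return length
    return n
```

With the string from the Java snippet, `shortest_repeat_length("ababcababc")` gives 5, the length of "ababc".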
							     

							  
                        
                           
							  
#7

You can build the suffix array of the string and sort it. Now look for a series of ever-doubling suffixes; when you've reached one that's the size of the entire string (S), the first in the series will give you T.

For example: 
abcd <-- T
abcdabcd <-- S
bcd
bcdabcd
cd
cdabcd
d
dabcd

x
xzzx
xzzxzzx
zx
zxzzx
zxzzxzzx
zzx <-- T
zzxzzx
zzxzzxzzx <-- S

a
apa
apapa
apapapa
pa <-- T
papa
papapa <-- Another T, not detected by this algo
papapapa <-- S

							     
							                          
                           
							  
#8

The brute-force approach would be to pick every possible substring and see whether it can form the entire string.

We can do better using the observation that, for a substring to be a valid candidate, len(str) % len(substr) == 0. This can be deduced from the problem statement.

Here is the full code:

bool isRational(const string &str){
    int len = str.length();
    const auto &factors = getFactors(len); // this would include 1 but exclude len
    // sort(factors.begin(), factors.end()); To get out of the loop faster. Why? See https://stackoverflow.com/a/4698155/1043773
    for(auto iter = factors.rbegin(); iter != factors.rend(); ++iter){
        auto factor = *iter;
        bool result = true;
        for(int i = 0; i 
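
The C++ listing is cut off in the source and `getFactors` is not shown, so the following Python sketch is only one way the approach could be filled in. The helper `get_factors` and the inner comparison are assumptions, not the author's code.

```python
def get_factors(n):
    """Divisors of n in ascending order, including 1 but excluding n."""
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            if d != n // d:
                large.append(n // d)
        d += 1
    return [f for f in small + large[::-1] if f != n]


def is_rational(s):
    """True if s is a shorter block repeated a whole number of times."""
    n = len(s)
    # Largest factors first, mirroring the reverse iteration in the C++ above.
    for factor in reversed(get_factors(n)):
        # Every character must equal the one a full block earlier.
        if all(s[i] == s[i - factor] for i in range(factor, n)):
            return True
    return False
```

The `all(...)` test stops at the first mismatch, which is where the early bail-out comes from.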
 
Note that there is a variation using KMP that is faster in terms of time complexity.

The above algorithm is O(N * factorCount(N)), but the good thing about it is that it can bail out much earlier than the KMP algorithm. Also, the number of factors does not grow much.
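
The KMP code the answer links to is not included in the source. As a rough illustration of an O(N) KMP-style check, here is a sketch based on the standard prefix (failure) function; the names `prefix_function` and `is_repeated_kmp` are made up for this example.

```python
def prefix_function(s):
    """pi[i] = length of the longest proper prefix of s[:i+1] that is also its suffix."""
    pi = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        # Fall back through shorter borders until the next character matches.
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def is_repeated_kmp(s):
    """True if s is a shorter block repeated a whole number of times."""
    n = len(s)
    if n < 2:
        return False
    border = prefix_function(s)[-1]
    period = n - border
    # s repeats exactly when its smallest period is shorter than n and divides n.
    return border > 0 and n % period == 0
```

For "abab" the prefix function ends in 2, so the period is 4 - 2 = 2, which divides 4; for "abcb" it ends in 0, so there is no repetition.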
Here is the graph of [i, factorCount(i)] for i <= 10^6

Here is how the algorithm performs against the KMP algorithm. The red graph is O(N * factorCount(N)) and the blue graph is O(N) KMP.

The KMP code is picked from here