
How do I get a precise, common "max.distance" value for fuzzy string matching with agrep?


I am trying to find the best precision for fuzzy string matching between two string names using agrep.

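However, I have a very large number of strings, so I need to choose a single precision value, "max.distance", and apply it to every string I want to match. I cannot choose the optimal "max.distance" separately for each string.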

For example, suppose I try "max.distance" values of 0.2, 0.1 and 0.05 for each of "BANK OF AMERICA CORP" and "1st Capital Bank".
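
When max.distance is below 1, the agrep documentation says it is taken as a fraction of the pattern length and replaced by the smallest integer not less than that fraction. In other words, each value becomes an integer edit budget. The minimal sketch below assumes the default unit costs and shows what the three values tried here become for both patterns. The helper name `edit_budget` is hypothetical and not part of base R.

```r
# Hypothetical helper (not part of base R): the integer edit budget that the
# agrep() documentation says a fractional max.distance is replaced by, i.e.
# the smallest integer not less than fraction * pattern length (unit costs).
edit_budget <- function(pattern, fraction) {
  ceiling(fraction * nchar(pattern))
}

fractions <- c(0.2, 0.1, 0.05)

# One column per pattern, one row per fraction
budgets <- sapply(c("BANK OF AMERICA CORP", "1st Capital Bank"),
                  function(p) edit_budget(p, fractions))
rownames(budgets) <- fractions
print(budgets)
```

Under this rule both names (20 and 16 characters) get the same budgets of 4, 2 and 1 edits. The very different result sizes therefore seem to come from how many candidate strings fall within those budgets, not from the fractions themselves.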

First, here are the results for "BANK OF AMERICA CORP" with "max.distance" set to 0.2, 0.1 and 0.05:

    > agrep("BANK OF AMERICA CORP",C1999_0[,2],ignore.case = TRUE, value = TRUE,fixed = TRUE,max.distance =0.2)
     [1] "BANK OF AMERICA/PRIVATE BANK WEST"   "BANK OF AMERICA SECURITIES"         
     [3] "BANK OF AMERICA SEC LLC"             "BANK OF AMERICA SECURITIES LLC"     
     [5] "BANK OF AMERICA NT & SA"             "BANK OF AMERICA CORP"               
     [7] "ALLIANZ OF AMERICA CORP"             "Bank of America Securities/Vice Pre"
     [9] "Bank of America Securities/Investme" "Bank of America/President"          
    [11] "Bank of America Securities LLC/Prin" "Bank of America Securities LLC/Mana"
    [13] "Bank of America Securities LLC/Inve" "Bank of America Securities/Principa"
    [15] "Bank of America Securities LLC/Bank" "Bank of America Sec/Investment Bank"
    [17] "Bank Of America Securities/Managing" "Bank of America/Chairman--Midwest A"
    [19] "Bank of America Securities LLC/Vice" "Bank of America Corporation/Sales C"
    [21] "Bank of America Securities/Broker"   "Bank of America Corporation/Banker" 
    [23] "Bank of America Corporation/Senior"  "Bank of America Securities/Equity R"
    [25] "Bank of America Corporation/Vice Ch" "BANK OF AMERICA CORPORATION"        
    [27] "BANK OF AMERICA HEADQUARTERS"        "BANK OF AMERICA ADMINISTRATION"     
    [29] "BANK OF AMERICA N A"                 "Bank of America/Commercial Banking" 
    [31] "Bank of America Sec./Investment Ban"
    > 
    > agrep("BANK OF AMERICA CORP",C1999_0[,2],ignore.case = TRUE, value = TRUE,fixed = TRUE,max.distance =0.1)
    [1] "BANK OF AMERICA CORP"                "ALLIANZ OF AMERICA CORP"            
    [3] "Bank of America Corporation/Sales C" "Bank of America Corporation/Banker" 
    [5] "Bank of America Corporation/Senior"  "Bank of America Corporation/Vice Ch"
    [7] "BANK OF AMERICA CORPORATION"        
    > 
    > agrep("BANK OF AMERICA CORP",C1999_0[,2],ignore.case = TRUE, value = TRUE,fixed = TRUE,max.distance =0.05)
    [1] "BANK OF AMERICA CORP"                "Bank of America Corporation/Sales C"
    [3] "Bank of America Corporation/Banker"  "Bank of America Corporation/Senior" 
    [5] "Bank of America Corporation/Vice Ch" "BANK OF AMERICA CORPORATION"        
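
Every name that survives at 0.05 contains "BANK OF AMERICA CORP" verbatim once case is ignored. This is because agrep() looks for an approximate match of the pattern anywhere inside each string; it does not check whether the whole string is similar. The short sketch below uses three names copied from the output above. It compares agrep() with base R's adist(), whose `partial = TRUE` mode is documented as corresponding to the distance agrep() uses, and whose default mode compares complete strings. The choice of these three names is only for illustration.

```r
pattern <- "BANK OF AMERICA CORP"
# Three names taken from the agrep() output above
candidates <- c("BANK OF AMERICA CORP",
                "Bank of America Corporation/Sales C",
                "ALLIANZ OF AMERICA CORP")

# Zero edits allowed: the long job-title string still matches, because the
# pattern occurs inside it verbatim (case ignored)
hits_exact <- agrep(pattern, candidates, ignore.case = TRUE,
                    fixed = TRUE, max.distance = 0)

# Substring ("partial") distance, the kind of distance agrep() works with
partial_d <- drop(adist(pattern, candidates, ignore.case = TRUE,
                        partial = TRUE))
# Whole-string Levenshtein distance
full_d <- drop(adist(pattern, candidates, ignore.case = TRUE))

print(hits_exact)  # indices 1 and 2
print(partial_d)   # 0 for the first two names
print(full_d)      # values 0, 15, 5
```

The long title is 15 edits away as a whole string but 0 edits away as a substring. No choice of max.distance in agrep() can exclude it.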

Next, here are the results for "1st Capital Bank" with "max.distance" set to 0.2, 0.1 and 0.05:

    > agrep("1st Capital Bank",C1999_0[,2],ignore.case = TRUE, value = TRUE,fixed = TRUE,max.distance =0.2)
      [1] "HURST CAPITAL PARTNERS"             
      [2] "SOY CAPITAL BANK"                   
      [3] "FIRST CAPITOL BANK OF VICTOR"       
      [4] "OSTERWEIS CAPITAL MANAGEMENT"       
      [5] "1ST NATIONAL BANK"                  
      [6] "FIRST CAPITAL BANK"                 
      [7] "SEATTLE 1ST NAT'L BANK"             
      [8] "FIELD POINT CAPITAL MANAGEMENT"     
      [9] "SUMMERSET CAPITAL MANAGEMENT"       
     [10] "AMERIQUEST CAPITAL ASSOC"           
     [11] "BB&T CAPITAL MARKETS"               
     [12] "HUGHES CAPITAL MANAGEMENT"          
     [13] "WELLS CAPITAL MANAGEMENT"           
     [14] "SUPERIOR ST CAPITAL ADVISORS"       
     [15] "ORMES CAPITAL MARKETS INC"          
     [16] "1ST NAT'L BANK OF IL"               
     [17] "ADVENT CAPITAL MANAGEMENT"          
     [18] "1ST CAPITOL BANK"                   
     [19] "BIONDI REISS CAPITAL MANAGEMENT"    
     [20] "CCYBYS CAPITAL MARKETS"             
     [21] "SEACOAST CAPITAL PARTNERS"          
     [22] "DOUGLAS CAPITAL MANAGEMENT"         
     [23] "HIGHFIELDS CAPITAL MANAGEMENT"      
     [24] "PRECEPT CAPITAL MANAGEMENT LP"      
     [25] "AUGUST CAPITAL MANAGEMENT"          
     [26] "SAKSA CAPITAL MANAGEMENT"           
     [27] "IMS CAPITAL MANAGEMENT"             
     [28] "TRENT CAPITAL MANAGEMENT"           
     [29] "Ormes Capital Management"           
     [30] "GARNET CAPITAL MANAGEMENT LLC"      
     [31] "INTERFASE CAPITAL MANAGERS"         
     [32] "RJS CAPITAL MANAGEMENT INC"         
     [33] "1ST NATIONAL BANK OF DE KALB"       
     [34] "1ST NAT'L BANK OF PHILLIPS CO"      
     [35] "1ST NAT'L BANK OF OKLAHOMA"         
     [36] "PROGRESS CAPITAL MANAGEMENT INC"    
     [37] "CAPITAL BANK & TRUST"               
     [38] "1ST NATL BANK"                      
     [39] "ASB Capital Management/Real Estate" 
     [40] "Sears Capital Management"           
     [41] "Osterweis Capital Management/Invest"
     [42] "Cerberus Capital Management LP/Asse"
     [43] "LVS Capital Management/President"   
     [44] "1st Central Bank/Banker"            
     [45] "Summit Capital Management"          
     [46] "Orwes Capital Markets/Stockbroker"  
     [47] "Ormes Capital Management/Investment"
     [48] "Nevis Capital Management/Investment"
     [49] "Duncan Hurst Capital Management"    
     [50] "Progress Capital Management/Preside"
     [51] "Cerberus Capital Management LP"     
     [52] "Wit Capital/Banker"                 
     [53] "Ormes Capital Markets Inc."         
     [54] "Ormes Capital Markets/President & C"
     [55] "Berents & Hess Capital Management"  
     [56] "Progress Capital Management/Venture"
     [57] "First Capital Bank of KY"           
     [58] "Foothill Capital/Banker"            
     [59] "Pequot Capital Management/Equity Re"
     [60] "First Dominion Capital/Banking"     
     [61] "Greenwhich Capital/Banker"          
     [62] "Veritas Capital Management/Banker"  
     [63] "Veritas Capital Management/Investme"
     [64] "Lesese Capital Management/Investmen"
     [65] "Douglas Capital Management/Investme"
     [66] "FIRST NATINAL BANK OF AMARILLO"     
     [67] "NEVIS CAPITAL MANAGEMENT"           
     [68] "VERITAS CAPITAL MANAGEMENT"         
     [69] "SIEBERT CAPITAL MARKETS"            
     [70] "HOURGLASS CAPITAL MANAGEMENT"       
     [71] "1ST NATIONAL BANK DALHART"          
     [72] "TEXAS CAPITAL BANK"                 
     [73] "NICHOLAS CAPITAL MANAGEMENT"        
     [74] "CERBUS CAPITAL MANAGEMENT"          
     [75] "CROESUS CAPITAL MANAGEMENT"         
     [76] "EAST WEST CAPITAL ASSOCIATES INC"   
     [77] "PRENDERGAST CAPITAL MANAGEMENT"     
     [78] "NANTUCKET CAPITAL MANAGEMENT"       
     [79] "1ST NATIONAL BANK TEMPLE"           
     [80] "ENTRUST CAPITAL INC"                
     [81] "1ST NATIONAL BANK OF IL"            
     [82] "SIMMS CAPITAL MANAGEMENT"           
     [83] "FIRST CAPITAL ADVISORS"             
     [84] "FIRST CAPITAL MANAGEMENT LTD"       
     [85] "1ST NATIONAL BANK & TRUST"          
     [86] "PENTECOST CAPITAL MANAGEMENT INC"   
     [87] "EAST-WEST CAPITAL ASSOCIATES"       
     [88] "1ST NAT'L BANK OF JOLIET"           
     [89] "FIRST CAPITOL BANK OF VICTO"        
     [90] "FIRST CAPITAL FINANCIAL"            
     [91] "PACIFIC COAST CAPITAL PARTNERS"     
     [92] "FIRST CAPITOL BANK"                 
     [93] "FIRST CAPITAL ENGINEERING"          
     [94] "MIDWEST CAPITOL MANAGEMENT"         
     [95] "PEQUOT CAPITAL MANAGEMENT"          
     [96] "AGGOTT CAPITAL MANAGEMENT"          
     [97] "SIMMS CAPITAL MANAGEMENT INC"       
     [98] "PHILLIPS CAPITAL MANAGEMENT LLC"    
     [99] "1ST NATIONAL BANK OF COLD SP"       
    [100] "SOY CAPITOL BANK"                   
    > 
    > agrep("1st Capital Bank",C1999_0[,2],ignore.case = TRUE, value = TRUE,fixed = TRUE,max.distance =0.1)
    [1] "FIRST CAPITOL BANK OF VICTOR" "FIRST CAPITAL BANK"          
    [3] "1ST CAPITOL BANK"             "First Capital Bank of KY"    
    [5] "TEXAS CAPITAL BANK"           "FIRST CAPITOL BANK OF VICTO" 
    [7] "FIRST CAPITOL BANK"          
    > 
    > agrep("1st Capital Bank",C1999_0[,2],ignore.case = TRUE, value = TRUE,fixed = TRUE,max.distance =0.05)
    [1] "FIRST CAPITAL BANK"       "1ST CAPITOL BANK"        
    [3] "First Capital Bank of KY"
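
The same budget rule seems to explain when each "1st Capital Bank" candidate appears. The pattern has 16 characters, so, following the documented ceiling rule, 0.05, 0.1 and 0.2 give budgets of 1, 2 and 4 edits. The sketch below compares each candidate's substring distance with those budgets. It again uses adist(partial = TRUE) as a stand-in for agrep()'s internal distance, an assumption based on the adist documentation. The names are copied from the output above.

```r
pattern <- "1st Capital Bank"
candidates <- c("FIRST CAPITAL BANK", "1ST CAPITOL BANK",
                "TEXAS CAPITAL BANK", "SOY CAPITAL BANK")

# Substring distance: cheapest edit of the pattern into any substring of
# each candidate, case ignored
d <- drop(adist(pattern, candidates, ignore.case = TRUE, partial = TRUE))
names(d) <- candidates

# Integer budgets from the documented rule ceiling(fraction * nchar(pattern))
fractions <- c(0.05, 0.1, 0.2)
budgets <- ceiling(fractions * nchar(pattern))  # 1, 2, 4

# TRUE where a candidate's distance fits within the budget for that fraction
within <- outer(d, budgets, "<=")
colnames(within) <- fractions
print(d)
print(within)
```

The resulting pattern matches the transcripts: "TEXAS CAPITAL BANK" (distance 2) first appears at 0.1, and "SOY CAPITAL BANK" (distance 3) appears only at 0.2.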

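As you can see, it is hard to find one "max.distance" value that works for every string, for example for both "BANK OF AMERICA CORP" and "1st Capital Bank". I have many more company names besides these two, which is why I am struggling to find a common precision value and command for fuzzy string matching.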
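
One commonly used alternative, which agrep() itself does not offer, is to compare whole names and scale the edit distance by length. A single threshold then means roughly the same thing for short and long names. The minimal sketch below uses base R's adist(). The function names `normalized_distance` and `close_names`, the normalisation by the longer length, and the 0.2 cutoff are all illustrative assumptions, not tuned values.

```r
# Hypothetical helper: whole-string Levenshtein distance divided by the longer
# of the two lengths, so 0 means identical (ignoring case)
normalized_distance <- function(pattern, candidates) {
  d <- drop(adist(pattern, candidates, ignore.case = TRUE))
  d / pmax(nchar(pattern), nchar(candidates))
}

# Hypothetical helper: keep candidates at or below one shared cutoff
close_names <- function(pattern, candidates, cutoff = 0.2) {
  candidates[normalized_distance(pattern, candidates) <= cutoff]
}

# Names copied from the agrep() output in the question
capital <- c("FIRST CAPITAL BANK", "1ST CAPITOL BANK", "TEXAS CAPITAL BANK")
boa <- c("BANK OF AMERICA CORP", "Bank of America Corporation/Sales C",
         "ALLIANZ OF AMERICA CORP")

print(normalized_distance("1st Capital Bank", capital))
print(close_names("1st Capital Bank", capital))
print(close_names("BANK OF AMERICA CORP", boa))
```

A shared cutoff still has to be checked against a labelled sample. Abbreviations such as "CORP" versus "CORPORATION" would likely need to be normalised before comparing, since they add several edits to an otherwise identical name.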

The original data file for C1999_0 is too large to attach, so I think the output shown above should be enough to reproduce the problem.

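I know there are several sub-options that can be adjusted, such as cost, substitutions and insertions, but they do not make much difference compared with simply changing the "max.distance" value itself.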
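
The list form of max.distance limits which kinds of edits may be spent. The agrep documentation names components such as all, insertions, deletions and substitutions. These limits do not penalise text lying outside the matched substring. The short sketch below runs on three names from the output above. The specific limits are chosen only for illustration.

```r
pattern <- "BANK OF AMERICA CORP"
candidates <- c("BANK OF AMERICA CORP",
                "Bank of America Corporation/Sales C",
                "ALLIANZ OF AMERICA CORP")

# Up to 10% of the pattern length (2 edits here) of any kind
any_edit <- agrep(pattern, candidates, ignore.case = TRUE,
                  max.distance = 0.1)

# Same overall limit, but substitutions are not allowed
no_subst <- agrep(pattern, candidates, ignore.case = TRUE,
                  max.distance = list(all = 0.1, substitutions = 0))

print(any_edit)  # indices 1, 2, 3
print(no_subst)  # indices 1, 2: ALLIANZ drops out, the long title stays
```

Forbidding substitutions removes "ALLIANZ OF AMERICA CORP", which needs two substitutions or at least three insertions/deletions. The job-title string still matches at zero cost, so tuning these sub-options alone cannot separate it from the exact name.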

I would really appreciate any help!

