I have a list of 1000 addresses loaded into an Oracle table.
The complete address is in a single column, CompleteAddress Varchar(1000).
Sample data:
12003 Main St New York NY 00991
123 ANYWHERE BLVD ABINGDON MD 21009
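I need to split every address into Street No + Street Name, City, State and Zip (sometimes zip5 + zip4).
There are no commas or slashes in the data. How can I split the addresses? If it matters, I am working in C#. Is RegEx the appropriate approach?
So far I have tried using Substring, but I don't think that will work well.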
string zipcode = completeAddress.Substring(completeAddress.Length - 5, 5);
string mystate = completeAddress.Substring(completeAddress.Length - 8, 2);
Any ideas?
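Addresses are complicated. Very complicated. They are highly irregular and subjective things. Logistics companies have spent billions of dollars over the course of decades trying to make sense of them.
It is better to make use of what somebody else has already done than to try to re-invent it.
The data you have is actually very meaningful. It just doesn't "feel" very meaningful. Businesses love to break their address data apart into lots of little pieces, but why? What do all of those little pieces mean? Why do they need to be distinct from one another? The data you have is "an address." Keep it, but add to it. Use the information you have to infer more information.
Use a geocoding API (Google? Bing? Some other service? Prices, etc. will vary) to search for the data you have and bring back more strongly typed data. Store it alongside what you have. For example, you have this: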
12003 Main St New York NY 00991
So you make a request here:
http://maps.googleapis.com/maps/api/geocode/json?address=12003+Main+St+New+York+NY+00991&sensor=false
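Since the question is about C#, here is a minimal sketch of how a request URL like the one above could be built from a CompleteAddress value. The class name GeocodeUrl and the method Build are made up for illustration. The endpoint and the sensor parameter are copied from the example URL as-is; as far as I know, current versions of the Google service expect HTTPS and an API key, so treat the URL shape as illustrative rather than authoritative.

```csharp
using System;

// Hypothetical helper, not part of any library.
public static class GeocodeUrl
{
    // Endpoint copied from the example request; a real service may need HTTPS and a key.
    private const string BaseUrl = "http://maps.googleapis.com/maps/api/geocode/json";

    // Builds the request URL for one free-form address string.
    public static string Build(string completeAddress)
    {
        // Percent-encode the address, then write spaces as '+' to match the example URL.
        string encoded = Uri.EscapeDataString(completeAddress.Trim()).Replace("%20", "+");
        return BaseUrl + "?address=" + encoded + "&sensor=false";
    }
}
```

Calling GeocodeUrl.Build("12003 Main St New York NY 00991") should reproduce the URL shown above. Escaping first and only then swapping the encoded spaces means characters such as '&' or '#' in an address cannot break the query string.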
And you get back:
{
  "results" : [
    {
      "address_components" : [
        { "long_name" : "D R Main Street", "short_name" : "D R Main Street", "types" : [ "point_of_interest", "establishment" ] },
        { "long_name" : "5", "short_name" : "5", "types" : [ "street_number" ] },
        { "long_name" : "West 31st Street", "short_name" : "W 31st St", "types" : [ "route" ] },
        { "long_name" : "Midtown", "short_name" : "Midtown", "types" : [ "neighborhood", "political" ] },
        { "long_name" : "Manhattan", "short_name" : "Manhattan", "types" : [ "sublocality", "political" ] },
        { "long_name" : "New York", "short_name" : "New York", "types" : [ "locality", "political" ] },
        { "long_name" : "New York", "short_name" : "New York", "types" : [ "administrative_area_level_2", "political" ] },
        { "long_name" : "New York", "short_name" : "NY", "types" : [ "administrative_area_level_1", "political" ] },
        { "long_name" : "United States", "short_name" : "US", "types" : [ "country", "political" ] },
        { "long_name" : "10001", "short_name" : "10001", "types" : [ "postal_code" ] },
        { "long_name" : "4414", "short_name" : "4414", "types" : [] }
      ],
      "formatted_address" : "D R Main Street, 5 West 31st Street, New York, NY 10001, USA",
      "geometry" : {
        "location" : { "lat" : 40.7468529, "lng" : -73.9865046 },
        "location_type" : "APPROXIMATE",
        "viewport" : {
          "northeast" : { "lat" : 40.7482018802915, "lng" : -73.98515561970851 },
          "southwest" : { "lat" : 40.7455039197085, "lng" : -73.98785358029151 }
        }
      },
      "partial_match" : true,
      "types" : [ "point_of_interest", "establishment" ]
    }
  ],
  "status" : "OK"
}
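Now that's some meaningful data. Maybe not the "units" of data that somebody at your company has in mind, but meaningful and useful. And you can automate this for any given address in your data.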
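To give an idea of what the automation might look like, here is a rough sketch that pulls the typed components out of a response shaped like the one above. It assumes System.Text.Json is available (.NET Core 3.0 or later); the class GeocodeParser and the method ExtractComponents are names I made up, and the code relies only on the field names visible in the sample JSON.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of any library.
public static class GeocodeParser
{
    // Maps each component type (e.g. "postal_code") to its short_name,
    // using the first entry in "results". Returns an empty map if there is none.
    public static Dictionary<string, string> ExtractComponents(string json)
    {
        var parts = new Dictionary<string, string>();
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        if (!root.TryGetProperty("results", out JsonElement results) ||
            results.ValueKind != JsonValueKind.Array ||
            results.GetArrayLength() == 0)
        {
            return parts;
        }

        if (!results[0].TryGetProperty("address_components", out JsonElement components))
        {
            return parts;
        }

        foreach (JsonElement component in components.EnumerateArray())
        {
            string shortName = component.GetProperty("short_name").GetString() ?? "";
            foreach (JsonElement type in component.GetProperty("types").EnumerateArray())
            {
                string key = type.GetString() ?? "";
                // Several components share generic types such as "political";
                // keep only the first value seen for each type.
                if (key.Length > 0 && !parts.ContainsKey(key))
                {
                    parts[key] = shortName;
                }
            }
        }
        return parts;
    }
}
```

For the sample response, the keys "street_number", "route", "locality", "administrative_area_level_1" and "postal_code" would hold "5", "W 31st St", "New York", "NY" and "10001". Note that the sample has "partial_match" set to true and came back with West 31st Street rather than Main St, so you would probably want to store that flag too and review those rows instead of trusting them blindly.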
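Let users input their addresses the way they know them. Store that subjective address as the user-entered version. Geocode it to get more structured data to store alongside it.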