Author: 天蝎丿冷傲丨 | Source: Internet | 2023-06-13 09:43
I am loading the following JSON data. Assume it is a deserialized string that came from Kafka.
{"message":{"title": {"titleid": "111","titlename": "AAA","titledesc": null},"customer": {"customerDetail": {"customerid": 1107879,"rates": [{"type": "Commission","amount": 0.0,"currency": null},{"type": "Total CV","currency": null}]}}}}
{"message":{"title": {"titleid": "222","titlename": "BBB","customer": {"customerDetail": {"customerid": 1107875,"currency": null}]}}}}
{"message":{"title": {"titleid": "333","titlename": "CCC","customer": {"customerDetail": {"customerid": 1107882,"currency": null}]}}}}
{"message":{"title": {"titleid": "444","titlename": "DDD","customer": {"customerDetail": {"customerid": 1107880,"currency": null}]}}}}
{"message":{"title": {"titleid": "555","titlename": "EEE","customer": {"customerDetail": {"customerid": 1107884,"currency": null}]}}}}
val ds = spark.read.textFile("./src/main/resources/json/JsonWithNull.txt").as[String]
ds.printSchema()
ds.show(false)
root
|-- value: string (nullable = true)
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"message":{"title": {"titleid": "111","currency": null}]}}}}|
|{"message":{"title": {"titleid": "222","currency": null}]}}}}|
|{"message":{"title": {"titleid": "333","currency": null}]}}}}|
|{"message":{"title": {"titleid": "444","currency": null}]}}}}|
|{"message":{"title": {"titleid": "555","currency": null}]}}}}|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I then load the Dataset[String] as JSON. The inferred schema contains every column inside the rates field, including currency.
val jsonDF = spark.read.json(ds)
jsonDF.printSchema()
jsonDF.show(false)
root
|-- message: struct (nullable = true)
| |-- customer: struct (nullable = true)
| | |-- customerDetail: struct (nullable = true)
| | | |-- customerid: long (nullable = true)
| | | |-- rates: array (nullable = true)
| | | | |-- element: struct (containsNull = true)
| | | | | |-- amount: double (nullable = true)
| | | | | |-- currency: string (nullable = true)
| | | | | |-- type: string (nullable = true)
| |-- title: struct (nullable = true)
| | |-- titledesc: string (nullable = true)
| | |-- titleid: string (nullable = true)
| | |-- titlename: string (nullable = true)
+-------------------------------------------------------------------+
|message |
+-------------------------------------------------------------------+
|[[[1107879,[[0.0,Commission],[0.0,Total CV]]]],[,111,AAA]]|
|[[[1107875,222,BBB]]|
|[[[1107882,333,CCC]]|
|[[[1107880,444,DDD]]|
|[[[1107884,555,EEE]]|
+-------------------------------------------------------------------+
However, when I convert the rates array column to JSON with the to_json function, the currency field is dropped entirely, probably because its value is null.
jsonDF.select(to_json(struct($"message.customer.customerDetail.rates")).as("Rates")).show(false)
Output:
+-------------------------------------------------------------------------------+
|Rates |
+-------------------------------------------------------------------------------+
|{"rates":[{"amount":0.0,"type":"Commission"},{"amount":0.0,"type":"Total CV"}]}|
|{"rates":[{"amount":0.0,"type":"Total CV"}]}|
+-------------------------------------------------------------------------------+
How can I fix this so that the null currency field is kept in the output?
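One direction worth trying, sketched here under the assumption that the job runs on Spark 3.0 or later: the JSON writer in those versions has an `ignoreNullFields` option, and `to_json` has an overload that takes an options map. The local session, the single well-formed sample record, and the helper name `ratesAsJson` are illustrative assumptions, not part of the original job.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, to_json}

// Local session for illustration only; master and app name are assumptions.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("keep-null-fields")
  .getOrCreate()
import spark.implicits._

// One well-formed record shaped like the data in the question.
val ds = Seq(
  """{"message":{"customer":{"customerDetail":{"customerid":1107879,"rates":[{"type":"Commission","amount":0.0,"currency":null}]}}}}"""
).toDS()

val jsonDF = spark.read.json(ds)

// Hypothetical helper: serialize the rates array and ask the JSON
// generator to write null fields instead of skipping them.
def ratesAsJson(df: DataFrame): DataFrame =
  df.select(
    to_json(
      struct(col("message.customer.customerDetail.rates")),
      Map("ignoreNullFields" -> "false")
    ).as("Rates")
  )

val result = ratesAsJson(jsonDF)
result.show(false)
```

If the option behaves as described, each element of `rates` should carry a `"currency":null` entry next to `amount` and `type`. The same behavior can reportedly be switched on for a whole session through the `spark.sql.jsonGenerator.ignoreNullFields` setting, which may be more convenient when many `to_json` calls are involved. On Spark 2.x this option is not available, so a different approach would be needed there.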