Extract JSON objects using Scala (Have Duplicates Keys)
Posted: 2022-01-05 17:33:15
Question: I have a sample JSON like the one below, which contains duplicate keys for the field context:
{
  "Production": {
    "meta_id": "1239826",
    "endeca_id": "EN29826",
    "Title": "Use The Google Home ™ To Choose The Right CCCM Solution For Your Needs",
    "Subtitle": null,
    "context": {
      "researchID": "22",
      "researchtitle": " The Google Home ™: Cross-Channel , Q4 2019",
      "telconfdoclinkid": null
    },
    "context": {
      "researchID": "281",
      "researchtitle": " The Google Home ™: Cross-Channel Q3 2019",
      "telconfdoclinkid": null
    },
    "context": {
      "researchID": "154655",
      "researchtitle": " Now Tech: Cross-Channel Campaign Management, Q2 2019",
      "telconfdoclinkid": null
    },
    "uri": "/doc/uri",
    "ssd": "ihdfiuhdl",
    "id": "dsadfsd221e"
  }
}
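When I parse this JSON in Scala to read the field "context", the parser rejects it with the following error:
Exception in thread "main" org.json.JSONException: Duplicate key "context"
Could you suggest the best approach for parsing JSON in this format with Scala?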
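Comments:
Is this valid JSON? I'm not sure any library will accept it.
No, it is not valid JSON (error: duplicate key "context"). But this is the response we get after calling a web service, and we cannot change the response in this case.
You could roll your own JSON parser (describing how to do that is beyond the scope of a *** answer) that uses a model allowing multiple values per key. You won't get much help from existing libraries here, but if the web service cannot be fixed, that is basically what you have to do.
First of all, this is not valid JSON. After parsing the key-value pairs, your parser tries to do something like YourClass(field1 = value1, ..., context = valueN, context = valueM), so it gets confused, and I don't think any library supports this. You can write your own parser (as Levi Ramsey said), or try to find the context values with a regular expression, something like "\"context\":\[^^]\" (I'm not sure about the regex), put those values into an array or similar, then parse the rest of the JSON and attach the context values to it as a list.
According to the latest JSON specification, it is valid JSON. JSON parsers that use maps for the in-memory representation of JSON objects either ignore duplicate keys or return an error. Others can be tuned to accept duplicates.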
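To illustrate the point from the comments about map-based models versus models that allow several values per key, here is a minimal sketch using only the Scala standard library. The types `Json`, `JsonString`, `JsonObject` and the helper `all` are hypothetical names made up for this illustration; they do not come from any library mentioned in this thread.

```scala
// Hypothetical minimal JSON model (illustration only, not a real library):
// an object is an ordered list of fields rather than a Map, so repeated keys survive.
sealed trait Json
final case class JsonString(value: String) extends Json
final case class JsonObject(fields: List[(String, Json)]) extends Json {
  // All values stored under `key`, in document order.
  def all(key: String): List[Json] = fields.collect { case (`key`, v) => v }
}

object DuplicateKeyModel {
  val production: JsonObject = JsonObject(List(
    "meta_id" -> JsonString("1239826"),
    "context" -> JsonObject(List("researchID" -> JsonString("22"))),
    "context" -> JsonObject(List("researchID" -> JsonString("281"))),
    "uri" -> JsonString("/doc/uri")
  ))

  // Converting to a Map keeps only the last value for a repeated key.
  val asMap: Map[String, Json] = production.fields.toMap

  def main(args: Array[String]): Unit = {
    println(production.all("context").size) // both "context" entries are still there
    println(asMap("context"))               // only the last "context" entry remains
  }
}
```

A parser that builds something like `JsonObject` can hand all three "context" values to the caller, while one that builds a `Map` has to either drop values or fail.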
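Answer 1:
Some JSON parsers for Scala parse straight from JSON bytes into your data structures, and with a custom codec they can handle duplicate keys.
Here is an example of how it can be done with jsoniter-scala:
Add the dependencies to your build.sbt: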
libraryDependencies ++= Seq(
// Use the %%% operator instead of %% for Scala.js
"com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-core" % "2.12.0",
// Use the "provided" scope instead when the "compile-internal" scope is not supported
"com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-macros" % "2.12.0" % "compile-internal"
)
Use the data structures and the custom codec from the following snippet:
import com.github.plokhotnyuk.jsoniter_scala.macros._
import com.github.plokhotnyuk.jsoniter_scala.core._

object Example01 {
  case class Context(researchID: String, researchtitle: String, telconfdoclinkid: Option[String])

  sealed trait Issue

  case class Production(
    meta_id: String,
    endeca_id: String,
    Title: String,
    Subtitle: Option[String],
    contexts: List[Context],
    uri: String,
    ssd: String,
    id: String) extends Issue

  implicit val contextCodec: JsonValueCodec[Context] = JsonCodecMaker.make
  // Hand-written codec: every "context" key is appended to a list instead of being rejected
  implicit val productionCodec: JsonValueCodec[Production] =
    new JsonValueCodec[Production] {
      def nullValue: Production = null

      def decodeValue(in: JsonReader, default: Production): Production = if (in.isNextToken('{')) {
        var _meta_id: String = null
        var _endeca_id: String = null
        var _Title: String = null
        var _Subtitle: Option[String] = None
        val _contexts = List.newBuilder[Context]
        var _uri: String = null
        var _ssd: String = null
        var _id: String = null
        // One bit per field; a bit is cleared once the field has been read
        var p0 = 255
        if (!in.isNextToken('}')) {
          in.rollbackToken()
          var l = -1
          while (l < 0 || in.isNextToken(',')) {
            l = in.readKeyAsCharBuf()
            if (in.isCharBufEqualsTo(l, "meta_id")) {
              if ((p0 & 1) != 0 ) p0 ^= 1
              else in.duplicatedKeyError(l)
              _meta_id = in.readString(_meta_id)
            } else if (in.isCharBufEqualsTo(l, "endeca_id")) {
              if ((p0 & 2) != 0) p0 ^= 2
              else in.duplicatedKeyError(l)
              _endeca_id = in.readString(_endeca_id)
            } else if (in.isCharBufEqualsTo(l, "Title")) {
              if ((p0 & 4) != 0) p0 ^= 4
              else in.duplicatedKeyError(l)
              _Title = in.readString(_Title)
            } else if (in.isCharBufEqualsTo(l, "Subtitle")) {
              if ((p0 & 8) != 0) p0 ^= 8
              else in.duplicatedKeyError(l)
              _Subtitle =
                if (in.isNextToken('n')) in.readNullOrError(_Subtitle, "expected value or null")
                else {
                  in.rollbackToken()
                  new Some(in.readString(null))
                }
            } else if (in.isCharBufEqualsTo(l, "context")) {
              // Repeated "context" keys are allowed: clear the bit and collect each value
              p0 &= ~16
              _contexts += contextCodec.decodeValue(in, contextCodec.nullValue)
            } else if (in.isCharBufEqualsTo(l, "uri")) {
              if ((p0 & 32) != 0) p0 ^= 32
              else in.duplicatedKeyError(l)
              _uri = in.readString(_uri)
            } else if (in.isCharBufEqualsTo(l, "ssd")) {
              if ((p0 & 64) != 0) p0 ^= 64
              else in.duplicatedKeyError(l)
              _ssd = in.readString(_ssd)
            } else if (in.isCharBufEqualsTo(l, "id")) {
              if ((p0 & 128) != 0) p0 ^= 128
              else in.duplicatedKeyError(l)
              _id = in.readString(_id)
            } else in.skip()
          }
          if (!in.isCurrentToken('}')) in.objectEndOrCommaError()
        }
        // 247 = 255 without the Subtitle bit (8): every other field is required
        if ((p0 & 247) != 0) in.requiredFieldError(f0(java.lang.Integer.numberOfTrailingZeros(p0 & 247)))
        new Production(meta_id = _meta_id, endeca_id = _endeca_id, Title = _Title, Subtitle = _Subtitle, contexts = _contexts.result(), uri = _uri, ssd = _ssd, id = _id)
      } else in.readNullOrTokenError(default, '{')

      def encodeValue(x: Production, out: JsonWriter): Unit = {
        out.writeObjectStart()
        out.writeNonEscapedAsciiKey("meta_id")
        out.writeVal(x.meta_id)
        out.writeNonEscapedAsciiKey("endeca_id")
        out.writeVal(x.endeca_id)
        out.writeNonEscapedAsciiKey("Title")
        out.writeVal(x.Title)
        x.Subtitle match {
          case Some(s) =>
            out.writeNonEscapedAsciiKey("Subtitle")
            out.writeVal(s)
          case None => // omit the key; without this case a missing subtitle throws a MatchError
        }
        // Write one "context" key per list element, mirroring the input format
        x.contexts.foreach { c =>
          out.writeNonEscapedAsciiKey("context")
          contextCodec.encodeValue(c, out)
        }
        out.writeNonEscapedAsciiKey("uri")
        out.writeVal(x.uri)
        out.writeNonEscapedAsciiKey("ssd")
        out.writeVal(x.ssd)
        out.writeNonEscapedAsciiKey("id")
        out.writeVal(x.id)
        out.writeObjectEnd()
      }

      // Maps a bit index back to the field name for error messages
      private[this] def f0(i: Int): String = ((i: @annotation.switch): @unchecked) match {
        case 0 => "meta_id"
        case 1 => "endeca_id"
        case 2 => "Title"
        case 3 => "Subtitle"
        case 4 => "context"
        case 5 => "uri"
        case 6 => "ssd"
        case 7 => "id"
      }
    }

  implicit val issueCodec: JsonValueCodec[Issue] = JsonCodecMaker.make(CodecMakerConfig.withDiscriminatorFieldName(None))

  def main(args: Array[String]): Unit = {
    val issue = readFromArray[Issue](
      """
        |{
        |  "Production": {
        |    "meta_id": "1239826",
        |    "endeca_id": "EN29826",
        |    "Title": "Use The Google Home &trade To Choose The Right CCCM Solution For Your Needs",
        |    "Subtitle": null,
        |    "context": {
        |      "researchID": "22",
        |      "researchtitle": " The Google Home ™: Cross-Channel , Q4 2019",
        |      "telconfdoclinkid": null
        |    },
        |    "context": {
        |      "researchID": "281",
        |      "researchtitle": " The Google Home ™: Cross-Channel Q3 2019",
        |      "telconfdoclinkid": null
        |    },
        |    "context": {
        |      "researchID": "154655",
        |      "researchtitle": " Now Tech: Cross-Channel Campaign Management, Q2 2019",
        |      "telconfdoclinkid": null
        |    },
        |    "uri": "/doc/uri",
        |    "ssd": "ihdfiuhdl",
        |    "id": "dsadfsd221e"
        |  }
        |}
        |""".stripMargin.getBytes("UTF-8"))
    println(issue)
  }
}
Expected output:
Production(1239826,EN29826,Use The Google Home &trade To Choose The Right CCCM Solution For Your Needs,None,List(Context(22, The Google Home ™: Cross-Channel , Q4 2019,None), Context(281, The Google Home ™: Cross-Channel Q3 2019,None), Context(154655, Now Tech: Cross-Channel Campaign Management, Q2 2019,None)),/doc/uri,ihdfiuhdl,dsadfsd221e)
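The decoder above tracks which fields it has seen in the bitmask p0. As a rough, standalone illustration of that bookkeeping (standard library only), the sketch below applies the same rules to a plain sequence of keys. `FieldMaskSketch`, `check`, and the error messages are hypothetical names invented for this example and are not part of jsoniter-scala.

```scala
object FieldMaskSketch {
  // Field i owns bit (1 << i); the order matches f0 in the codec above.
  val names: Vector[String] =
    Vector("meta_id", "endeca_id", "Title", "Subtitle", "context", "uri", "ssd", "id")
  // Keys that may legitimately appear more than once.
  val repeatable: Set[String] = Set("context")
  // 255 without bit 3 (value 8), because Subtitle is optional.
  val requiredMask: Int = 247

  // Hypothetical helper: validates a sequence of keys the way the decoder does.
  def check(keys: Seq[String]): Either[String, Unit] = {
    var p0 = 255 // every bit set means "not seen yet"
    val it = keys.iterator
    while (it.hasNext) {
      val key = it.next()
      val i = names.indexOf(key)
      if (i >= 0) {
        val bit = 1 << i
        if (repeatable(key)) p0 &= ~bit          // clearing twice is harmless
        else if ((p0 & bit) != 0) p0 ^= bit      // first occurrence: clear the bit
        else return Left(s"duplicated key: $key") // bit already cleared
      } // unknown keys are skipped, like in.skip()
    }
    val missing = p0 & requiredMask
    if (missing != 0)
      Left(s"missing required field: ${names(Integer.numberOfTrailingZeros(missing))}")
    else Right(())
  }
}
```

With these rules, repeating "context" is accepted, repeating "meta_id" is reported as a duplicate, and leaving out "context" entirely is reported as a missing required field.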
Answer 2:
Json4s can parse duplicate keys:
scala> import org.json4s.native.JsonMethods._
import org.json4s.native.JsonMethods._
scala> parse(""" { "hello": true, "context": { "value": "A" }, "context": { "value": "B" } } """)
res2: org.json4s.JValue = JObject(List((hello,JBool(true)), (context,JObject(List((value,JString(A))))), (context,JObject(List((value,JString(B)))))))
Here's the documentation for json4s
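Since the parsed JObject keeps its fields as a list of (name, value) pairs, the repeated "context" entries can be collected with ordinary pattern matching. The following is a minimal sketch, assuming the json4s-native dependency is on the classpath; `Json4sDuplicates` and `contextValues` are hypothetical names chosen for this illustration.

```scala
import org.json4s._
import org.json4s.native.JsonMethods._

object Json4sDuplicates {
  val json: JValue =
    parse("""{ "hello": true, "context": { "value": "A" }, "context": { "value": "B" } }""")

  // JObject wraps a List[(String, JValue)], so repeated keys stay in document order.
  val contextValues: List[String] = json match {
    case JObject(fields) =>
      fields.collect { case ("context", JObject(inner)) =>
        // Take the "value" string of each "context" object, if present
        inner.collectFirst { case ("value", JString(s)) => s }
      }.flatten
    case _ => Nil
  }

  def main(args: Array[String]): Unit = println(contextValues)
}
```

Given the REPL output shown above, this should collect the values A and B in that order; the same pattern could be applied to the "context" entries inside "Production" in the original question.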