Data Model for Blog Post with comments in Elastic Search
Title: Data Model for Blog Post with comments in Elastic Search
Posted: 2021-02-04 05:47:25
Question:
What is the best way to structure a post/comment system with Elasticsearch? I am using Elasticsearch as a secondary database.
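There will be posts with a threaded comment system, probably two levels deep. Each post can have up to 500-1000 comments. Every like and every comment increments a counter on the comment and on the post, which means a lot of indexing. I also want to fetch blog posts together with their comments while applying filters.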
Right now, my structure looks like this. In this document, the blog post and user details are rarely edited, but tags and comments are added frequently.
{
  "_index": "brainstormer_ideas_with_comments",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "id": 1,
    "brainstormer_id": 1,
    "idea": "cCZhvy",
    "description": "2jJPo3hYbqeh2VBnDJeGtylVu7qfe_MRp77hTK6t7SN57GzeQG8c",
    "user": {
      "id": "user-1",
      "login": "pO2DSqIS--"
    },
    "created_at": "2020-08-13T20:35:17+00:00",
    "like_count": 41,
    "comment_count": 45,
    "tags": ["bU37X", "a_Rl5b", "vxD.ZMo", "AmvtHVuQ", "yx9oSx-_D"],
    "comments": [
      {
        "id": "comment-1",
        "comment": "7ewh-Cqf4gQqmIK53jXbR7",
        "tags": ["mJN", "jFm-", "hV0pi", "ONGNOw", "HtzmDfO", "dawVLk09"],
        "created_at": "2020-08-08T20:35:17+00:00",
        "user": {
          "id": "user-1",
          "login": "Tl6CDNawUh"
        }
      },
      {
        "id": "comment-1",
        "comment": "BKj8sAcbJJXWxAPk3HQFTZWtvQm",
        "tags": ["sYj", "XRLw", "xtAeH", "Oq6dBR", "lj4_hOI", "n3lhc2ig"],
        "created_at": "2020-09-21T20:35:17+00:00",
        "user": {
          "id": "user-2",
          "login": "AF3KT415uf"
        }
      },
      {
        "id": "comment-1",
        "comment": "vzt7XEe2WIP3OszpLmcF8J",
        "tags": ["YCH", "kodm", "RGv2B", "Qk5R1D", "ICrDjmz", "4mmfLK16"],
        "created_at": "2020-07-08T20:35:17+00:00",
        "user": {
          "id": "user-3",
          "login": "7xTLOuCeWD"
        }
      },
      {
        "id": "comment-1",
        "comment": "Jm6E3PrlOI",
        "tags": ["IrZ", "TJlf", "__HQy", "5VH2Vs", "btvxG51", "5iRoVR_k"],
        "created_at": "2020-07-19T20:35:17+00:00",
        "user": {
          "id": "user-4",
          "login": "zr32RlxNak"
        }
      },
      {
        "id": "comment-1",
        "comment": "jKGzoZhCpUv4DrvoebamXLnmvyX_CK0",
        "tags": ["Osa", "OKlQ", "cBcjt", "2BcQD7", "K7lLhS7", "ZK1t_GXl"],
        "created_at": "2020-07-14T20:35:17+00:00",
        "user": {
          "id": "user-5",
          "login": "B8LGMpPWwv"
        }
      },
      {
        "id": "comment-1",
        "comment": "L-PryTXsa1FbEnIJdH_5vlsdpfnckB1kmMJI4EVwszhc45qlW6e",
        "tags": ["kRJ", "Mkka", "ari.I", "pgWcUk", "w78vFir", "eOx.zRx9"],
        "created_at": "2020-08-07T20:35:17+00:00",
        "user": {
          "id": "user-6",
          "login": "IG1Oo_fOcr"
        }
      }
    ]
  }
}
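Would it be better to use nested objects, parent/child, or something else? Any suggestions on the structure and on how often to update Elasticsearch are much appreciated.
Thanks,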
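Comments:
Hi, it's been a while. It would be great if you could accept the answer if it helped :) TIA
Answer 1:
Both nested objects and parent/child relationships are costly: with nested objects, adding or changing a single comment reindexes the entire post document, and parent/child (join) queries add overhead at search time.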
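One approach is to index every comment/reply on a post as its own document in Elasticsearch. Instead of a strict parent-child relationship, each comment simply carries a field that says which post it belongs to, i.e. a loose coupling between documents.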
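The loosely coupled layout described above can be sketched as plain request bodies. This is only an illustration under assumptions: the field names `post_id` and `parent_comment_id`, the helper names, and the idea that `post_id` and `tags` are mapped as `keyword` fields are all hypothetical and do not come from the question. The functions only build dictionaries; sending them to a cluster is left to whatever client you use.

```python
def comment_doc(comment_id, post_id, text, user, tags, created_at,
                parent_comment_id=None):
    """Build one standalone comment document (field names are assumptions)."""
    return {
        "id": comment_id,
        "post_id": post_id,                      # loose reference to the parent post
        "parent_comment_id": parent_comment_id,  # None for a top-level comment
        "comment": text,
        "tags": tags,
        "user": user,
        "created_at": created_at,
        "like_count": 0,
    }


def comments_for_post_query(post_id, tag=None, size=50):
    """Search body: comments of one post, optionally filtered by a tag.

    Assumes post_id and tags are indexed as keyword fields so that
    term filters match exact values.
    """
    filters = [{"term": {"post_id": post_id}}]
    if tag is not None:
        filters.append({"term": {"tags": tag}})
    return {
        "size": size,
        "query": {"bool": {"filter": filters}},
        "sort": [{"created_at": "asc"}],
    }


def like_increment_body(n=1):
    """Update API body that bumps like_count with a Painless script."""
    return {
        "script": {
            "lang": "painless",
            "source": "ctx._source.like_count += params.n",
            "params": {"n": n},
        }
    }


reply = comment_doc(
    "comment-2", 1, "a reply", {"id": "user-2", "login": "AF3KT415uf"},
    ["sYj"], "2020-09-21T20:35:17+00:00", parent_comment_id="comment-1",
)
query = comments_for_post_query(1, tag="sYj")
```

With this layout a new comment or a like touches only one small comment document, and the post document is rewritten only when the post itself (or its own counters) changes. The trade-off is that fetching a post with its comments takes a second query, or a multi-search, instead of a single document read.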
The default refresh interval in Elasticsearch is 1 second, which is what provides near-real-time (NRT) search. You can keep this default or tune it according to your use case and performance requirements.
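As a rough illustration of tuning that interval, the body below is what an index settings update (PUT /<index>/_settings) expects for `index.refresh_interval`. The helper name and the 30-second value are assumptions chosen for the example, not recommendations from the answer.

```python
def refresh_interval_settings(seconds=None):
    """Build a settings body for index.refresh_interval.

    seconds=None maps to "-1", which disables periodic refresh
    (sometimes used during bulk loads); otherwise "<seconds>s".
    """
    value = "-1" if seconds is None else f"{seconds}s"
    return {"index": {"refresh_interval": value}}


# Hypothetical choice: trade search freshness for cheaper indexing.
relaxed = refresh_interval_settings(30)
```

A longer interval means newly indexed comments and counter updates take longer to show up in search results, so the value depends on how fresh the secondary store needs to be.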