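Elasticsearch 7.x adds runtime fields to the mapping. Normally, once a mapping has been created and data has been indexed, adding a field means reindexing: you create a new index that includes the extra field and copy the data into it. With runtime fields, you can instead define a runtime field in the existing mapping. Runtime fields also let you: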
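Add fields to the mapping without reindexing the existing data (as described above)
Index and use data without knowing its structure in advance
Override, at query time, the values returned from indexed fields
Define fields for a specific use without changing the underlying schema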
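Runtime fields are especially useful for log data, in particular when you are not sure of its structure.
Drawback: values are computed by a script at query time, so searches are slower.
Advantage: not every field has to be defined as a regular indexed mapping field, so the index is much smaller.
Because runtime fields do not have to be indexed first, you can ingest logs and start working with them sooner.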
Queries against runtime fields are considered expensive. If search.allow_expensive_queries is set to false, expensive queries are not allowed and Elasticsearch will reject any queries against runtime fields.
https://www.elastic.co/guide/en/elasticsearch/reference/7.x/runtime.html
In other words, setting search.allow_expensive_queries to false disables queries against runtime fields.
The five scenarios below walk through runtime fields one at a time.
Defining runtime fields in the mapping and in search requests
step one
POST my-index-000001/_bulk?refresh=true
{"index":{}}
{"@timestamp":1516729294000,"model_number":"QVKC92Q","measures":{"voltage":"5.2","start": "300","end":"8675309"}}
{"index":{}}
{"@timestamp":1516642894000,"model_number":"QVKC92Q","measures":{"voltage":"5.8","start": "300","end":"8675309"}}
{"index":{}}
{"@timestamp":1516556494000,"model_number":"QVKC92Q","measures":{"voltage":"5.1","start": "300","end":"8675309"}}
{"index":{}}
{"@timestamp":1516470094000,"model_number":"QVKC92Q","measures":{"voltage":"5.6","start": "300","end":"8675309"}}
{"index":{}}
{"@timestamp":1516383694000,"model_number":"HG537PU","measures":{"voltage":"4.2","start": "400","end":"8625309"}}
{"index":{}}
{"@timestamp":1516297294000,"model_number":"HG537PU","measures":{"voltage":"4.0","start": "400","end":"8625309"}}
step two
GET my-index-000001/_search
{
  "aggs": {
    "avg_start": {"avg": {"field": "measures.start"}},
    "avg_end": {"avg": {"field": "measures.end"}}
  }
}
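Because Elasticsearch created the mapping dynamically when the bulk request was indexed, and the values were sent as JSON strings, measures.start and measures.end are mapped as text (with a keyword sub-field). The avg aggregation therefore fails. This is where runtime fields help.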
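Step three fixes this by declaring both fields as long runtime fields. Because those runtime fields have no script, Elasticsearch reads the same-named values from _source at query time and parses them as numbers. The following plain-Python sketch mimics that behaviour. It is not an Elasticsearch API, and avg_as_long is a hypothetical helper:

```python
# measures.start / measures.end exactly as bulk-indexed in step one: JSON strings
DOCS = (
    [{"model_number": "QVKC92Q", "measures": {"start": "300", "end": "8675309"}}] * 4
    + [{"model_number": "HG537PU", "measures": {"start": "400", "end": "8625309"}}] * 2
)


def avg_as_long(docs, field):
    """Parse each measures.<field> string as an integer, then average the values.

    This approximates a script-less long runtime field (value read from _source)
    feeding an avg aggregation; it is not how Elasticsearch is implemented.
    """
    values = [int(doc["measures"][field]) for doc in docs]
    return sum(values) / len(values)


print(avg_as_long(DOCS, "start"))
print(avg_as_long(DOCS, "end"))
```

The two averages are 2000 / 6 and 51951854 / 6, which agree with the aggregation output shown after step three.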
step three
PUT my-index-000001/_mapping
{
  "runtime": {
    "measures.start": {"type": "long"},
    "measures.end": {"type": "long"}
  }
}
After running step three, run step two again. The aggregation now succeeds:
"aggregations" : {"avg_start" : {"value" : 333.3333333333333},"avg_end" : {"value" : 8658642.333333334}}
The mapping now contains:
"mappings" : {"runtime" : {"measures.end" : {"type" : "long"},"measures.start" : {"type" : "long"}},"properties" : {......}
You can also define runtime fields inside the search request itself with runtime_mappings, much as you would in the mapping.
step four
GET my-index-000001/_search
{
  "runtime_mappings": {
    "duration": {
      "type": "long",
      "script": {
        "source": """emit(doc['measures.end'].value - doc['measures.start'].value);"""
      }
    }
  },
  "aggs": {
    "duration_stats": {"stats": {"field": "duration"}}
  }
}
The request succeeds:
"aggregations" : {"duration_stats" : {"count" : 6,"min" : 8624909.0,"max" : 8675009.0,"avg" : 8658309.0,"sum" : 5.1949854E7}}
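runtime_mappings defines a duration field inside the search request and returns stats on it as a long, just as a mapping definition would. However, the field never appears in the mapping: after step four runs, the mapping still contains no duration runtime field. The script can read doc['measures.start'] and doc['measures.end'] as numbers only because step three mapped them as long runtime fields.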
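The duration field emits one value per document, and the stats aggregation then summarizes those values. Here is a plain-Python sketch of the same arithmetic over the step one documents. It is not Elasticsearch code, and duration_stats is a hypothetical helper:

```python
# The six step one documents; start/end arrive as strings and are read as longs
DOCS = (
    [{"model_number": "QVKC92Q", "measures": {"start": "300", "end": "8675309"}}] * 4
    + [{"model_number": "HG537PU", "measures": {"start": "400", "end": "8625309"}}] * 2
)


def duration_stats(docs):
    """Mimic emit(end - start) for each document, then compute count/min/max/avg/sum."""
    durations = [int(d["measures"]["end"]) - int(d["measures"]["start"]) for d in docs]
    return {
        "count": len(durations),
        "min": float(min(durations)),
        "max": float(max(durations)),
        "avg": sum(durations) / len(durations),
        "sum": float(sum(durations)),
    }


print(duration_stats(DOCS))
```

Each QVKC92Q document yields 8675009 and each HG537PU document yields 8624909, which is where the min, max and sum in the response come from.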
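Overriding field values at query time
If a runtime field has the same name as a field that already exists in the mapping, the runtime field shadows the mapped field. At query time, Elasticsearch computes the value with the script and returns that instead. This lets you override a field's value without changing the mapping.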
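The example in this section scales measures.voltage by 1.7 for model HG537PU at query time and leaves _source untouched. The following plain-Python sketch shows that override logic on the same six documents. runtime_voltage is a hypothetical name, and simple string equality stands in for the match query:

```python
DOCS = [
    {"model_number": "QVKC92Q", "measures": {"voltage": 5.2}},
    {"model_number": "QVKC92Q", "measures": {"voltage": 5.8}},
    {"model_number": "QVKC92Q", "measures": {"voltage": 5.1}},
    {"model_number": "QVKC92Q", "measures": {"voltage": 5.6}},
    {"model_number": "HG537PU", "measures": {"voltage": 4.2}},
    {"model_number": "HG537PU", "measures": {"voltage": 4.0}},
]


def runtime_voltage(doc):
    """Shadowing value for measures.voltage: scaled by 1.7 for HG537PU, unchanged otherwise."""
    voltage = doc["measures"]["voltage"]
    if doc["model_number"] == "HG537PU":
        return 1.7 * voltage
    return voltage


hits = [doc for doc in DOCS if doc["model_number"] == "HG537PU"]
for hit in hits:
    # _source keeps the stored value; the runtime value is what "fields" returns
    print(hit["measures"]["voltage"], round(runtime_voltage(hit), 2))  # rounded for display
```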
step one
POST my-index-000002/_bulk?refresh=true
{"index":{}}
{"@timestamp":1516729294000,"model_number":"QVKC92Q","measures":{"voltage":5.2}}
{"index":{}}
{"@timestamp":1516642894000,"model_number":"QVKC92Q","measures":{"voltage":5.8}}
{"index":{}}
{"@timestamp":1516556494000,"model_number":"QVKC92Q","measures":{"voltage":5.1}}
{"index":{}}
{"@timestamp":1516470094000,"model_number":"QVKC92Q","measures":{"voltage":5.6}}
{"index":{}}
{"@timestamp":1516383694000,"model_number":"HG537PU","measures":{"voltage":4.2}}
{"index":{}}
{"@timestamp":1516297294000,"model_number":"HG537PU","measures":{"voltage":4.0}}
step two
GET my-index-000002/_search
{"query": {"match": {"model_number": "HG537PU"}}}
Results:
"hits" : [{"_index" : "my-index-000002","_type" : "_doc","_id" : "aPD4wHsBln4zi13nUGF7","_score" : 1.0296195,"_source" : {"@timestamp" : 1516383694000,"model_number" : "HG537PU","measures" : {"voltage" : 4.2}}},{"_index" : "my-index-000002","_type" : "_doc","_id" : "afD4wHsBln4zi13nUGF7","_score" : 1.0296195,"_source" : {"@timestamp" : 1516297294000,"model_number" : "HG537PU","measures" : {"voltage" : 4.0}}}]
step three
POST my-index-000002/_search
{
  "runtime_mappings": {
    "measures.voltage": {
      "type": "double",
      "script": {
        "source": """if (doc['model_number.keyword'].value.equals('HG537PU')) {emit(1.7 * params._source['measures']['voltage']);} else {emit(params._source['measures']['voltage']);}"""
      }
    }
  },
  "query": {"match": {"model_number": "HG537PU"}},
  "fields": ["measures.voltage"]
}
The runtime_mappings script reads the mapped model_number field (through its keyword sub-field, model_number.keyword). If the value is HG537PU, it multiplies the voltage by 1.7; otherwise it returns the voltage unchanged. Results:
"hits" : [{"_index" : "my-index-000002","_type" : "_doc","_id" : "aPD4wHsBln4zi13nUGF7","_score" : 1.0296195,"_source" : {"@timestamp" : 1516383694000,"model_number" : "HG537PU","measures" : {"voltage" : 4.2}},"fields" : {"measures.voltage" : [7.14]}},{"_index" : "my-index-000002","_type" : "_doc","_id" : "afD4wHsBln4zi13nUGF7","_score" : 1.0296195,"_source" : {"@timestamp" : 1516297294000,"model_number" : "HG537PU","measures" : {"voltage" : 4.0}},"fields" : {"measures.voltage" : [6.8]}}]
Searching runtime fields
step one
PUT my-index-000003/
{
  "mappings": {
    "dynamic": "runtime",
    "runtime": {
      "day_of_week": {
        "type": "keyword",
        "script": {
          "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
        }
      }
    },
    "properties": {"@timestamp": {"type": "date"}}
  }
}
This mapping sets dynamic to runtime. It also defines a day_of_week runtime field that converts @timestamp into the name of the day of the week.
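The day_of_week script reads the stored date and formats its weekday as a full English name. Below is a plain-Python sketch of the same computation. It assumes, as in Elasticsearch, that date doc values are UTC instants. day_of_week here is a hypothetical helper, not the runtime field itself:

```python
from datetime import datetime, timezone

# Fixed English names so the result does not depend on the OS locale
# (the Painless script uses TextStyle.FULL with Locale.ROOT).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def day_of_week(timestamp):
    """Return the full English day name for an ISO-8601 timestamp, evaluated in UTC."""
    # Date doc values are stored as UTC instants, so convert before taking the weekday.
    utc = datetime.fromisoformat(timestamp).astimezone(timezone.utc)
    return DAY_NAMES[utc.weekday()]


print(day_of_week("2020-06-21T15:00:01-05:00"))
print(day_of_week("2020-04-30T14:30:17-05:00"))
```

The UTC conversion matters near midnight. For example, "2020-04-30T22:00:00-05:00" is a Thursday locally but already Friday (2020-05-01) in UTC.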
step two
POST my-index-000003/_bulk?refresh
{ "index": {}}
{ "@timestamp": "2020-06-21T15:00:01-05:00", "message" : "211.11.9.0 - - [2020-06-21T15:00:01-05:00] \"GET english/index.html HTTP/1.0\" 304 0"}
{ "index": {}}
{ "@timestamp": "2020-06-21T15:00:01-05:00", "message" : "211.11.9.0 - - [2020-06-21T15:00:01-05:00] \"GET english/index.html HTTP/1.0\" 304 0"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:30:17-05:00", "message" : "40.135.0.0 - - [2020-04-30T14:30:17-05:00] \"GET images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:30:53-05:00", "message" : "232.0.0.0 - - [2020-04-30T14:30:53-05:00] \"GET images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:31:12-05:00", "message" : "26.1.0.0 - - [2020-04-30T14:31:12-05:00] \"GET images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:31:19-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:19-05:00] \"GET french/splash_inet.html HTTP/1.0\" 200 3781"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:31:27-05:00", "message" : "252.0.0.0 - - [2020-04-30T14:31:27-05:00] \"GET images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:31:29-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:29-05:00] \"GET images/hm_brdl.gif HTTP/1.0\" 304 0"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:31:29-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:29-05:00] \"GET images/hm_arw.gif HTTP/1.0\" 304 0"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:31:32-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:32-05:00] \"GET images/nav_bg_top.gif HTTP/1.0\" 200 929"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:31:43-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:43-05:00] \"GET french/images/nav_venue_off.gif HTTP/1.0\" 304 0"}
After this bulk request, because dynamic is set to runtime in the mapping, the mapping looks like this:
{"my-index-000003" : {"mappings" : {"dynamic" : "runtime","runtime" : {"day_of_week" : {"type" : "keyword","script" : {"source" : "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))","lang" : "painless"}},"message" : {"type" : "keyword"}},"properties" : {"@timestamp" : {"type" : "date"}}}}}
message has been added automatically as a runtime field of type keyword.
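Under "dynamic": "runtime", each new field in an incoming document becomes a runtime field. Its type depends on the JSON value, which is why the message string became a keyword. The plain-Python sketch below approximates that rule, following Elasticsearch's dynamic field mapping table. It is a simplification. datetime.fromisoformat only roughly stands in for date detection, and numeric detection is assumed to be off, which is the default. dynamic_runtime_type is a hypothetical name:

```python
from datetime import datetime


def dynamic_runtime_type(value):
    """Approximate the runtime field type created for a new JSON value under
    "dynamic": "runtime". Returns None when no runtime field is added."""
    if value is None:
        return None  # null values add no field
    if isinstance(value, dict):
        return None  # the object itself adds no field; its leaf values are mapped one by one
    if isinstance(value, list):
        # arrays take the type of their first non-null element
        first = next((v for v in value if v is not None), None)
        return dynamic_runtime_type(first)
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # rough stand-in for date_detection
            return "date"
        except ValueError:
            return "keyword"  # numeric_detection is off by default
    raise TypeError(f"not a JSON value: {value!r}")


print(dynamic_runtime_type('211.11.9.0 - - [2020-06-21T15:00:01-05:00] "GET english/index.html HTTP/1.0" 304 0'))
```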
step three
GET my-index-000003/_search
{"fields": ["@timestamp","day_of_week"],"_source": false}
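day_of_week is defined in the runtime section of the mapping rather than as an indexed field. It was never indexed when the documents were ingested, yet you can query it without reindexing. This flexibility lets you change a field's values and type without reindexing your documents.
Results of step three: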
"hits" : [{"_index" : "my-index-000003","_type" : "_doc","_id" : "uoL_wHsBtwvU8UtZQYry","_score" : 1.0,"fields" : {"@timestamp" : ["2020-06-21T20:00:01.000Z"],"day_of_week" : ["Sunday"]}},{},......]
You can also add runtime fields to the mapping explicitly, for example:
PUT my-index-000003/_mapping
{"runtime": {"client_ip": {"type": "ip","script" : {"source" : "String m = doc[\"message\"].value; int end = m.indexOf(\" \"); emit(m.substring(0, end));"}}}}
Query:
GET my-index-000003/_search
{"size": 1,"query": {"match": {"client_ip": "211.11.9.0"}},"fields" : ["*"]}
Results:
"hits" : [{"_index" : "my-index-000003","_type" : "_doc","_id" : "uoL_wHsBtwvU8UtZQYry","_score" : 1.0,"_source" : {"@timestamp" : "2020-06-21T15:00:01-05:00","message" : """211.11.9.0 - - [2020-06-21T15:00:01-05:00] "GET english/index.html HTTP/1.0" 304 0"""},"fields" : {"@timestamp" : ["2020-06-21T20:00:01.000Z"],"client_ip" : ["211.11.9.0"],"message" : ["""211.11.9.0 - - [2020-06-21T15:00:01-05:00] "GET english/index.html HTTP/1.0" 304 0"""],"day_of_week" : ["Sunday"]}}]
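The result contains the document whose client_ip is 211.11.9.0.
Indexing runtime fields
For better query performance, a field with a script can also be defined directly in the properties section of the mapping. Its script then runs at index time, and the value is indexed like any other mapping field. After a runtime field has been indexed, its script cannot be updated. To change the script, create a new field.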
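The client_ip script takes everything before the first space in message and emits it as an ip value. Below is a plain-Python sketch of that extraction, followed by the match on 211.11.9.0. The messages are a subset of the bulk data above, and client_ip is a hypothetical helper:

```python
import ipaddress

MESSAGES = [
    '211.11.9.0 - - [2020-06-21T15:00:01-05:00] "GET english/index.html HTTP/1.0" 304 0',
    '40.135.0.0 - - [2020-04-30T14:30:17-05:00] "GET images/hm_bg.jpg HTTP/1.0" 200 24736',
    '247.37.0.0 - - [2020-04-30T14:31:19-05:00] "GET french/splash_inet.html HTTP/1.0" 200 3781',
]


def client_ip(message):
    """Take the text before the first space, like m.substring(0, m.indexOf(" "))."""
    end = message.index(" ")  # raises ValueError if there is no space; the Painless script would also throw
    # The runtime field has type "ip", so the extracted text must parse as an IP address.
    return ipaddress.ip_address(message[:end])


target = ipaddress.ip_address("211.11.9.0")
matches = [m for m in MESSAGES if client_ip(m) == target]
print(matches)
```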
After indexing a runtime field, you cannot update the included script. If you need to change the script, create a new field with the updated script.
https://www.elastic.co/guide/en/elasticsearch/reference/7.x/runtime-indexed.html
The steps are as follows:
PUT my-index-000004/
{
  "mappings": {
    "properties": {
      "timestamp": {"type": "date"},
      "temperature": {"type": "long"},
      "voltage": {"type": "double"},
      "node": {"type": "keyword"},
      "voltage_corrected": {
        "type": "double",
        "on_script_error": "fail",
        "script": {
          "source": """emit(doc['voltage'].value * params['multiplier'])""",
          "params": {"multiplier": 4}
        }
      }
    }
  }
}

POST my-index-000004/_bulk?refresh=true
{ "index": {}}
{ "timestamp": 1516729294000, "temperature": 200, "voltage": 5.2, "node": "a"}
{ "index": {}}
{ "timestamp": 1516642894000, "temperature": 201, "voltage": 5.8, "node": "b"}
{ "index": {}}
{ "timestamp": 1516556494000, "temperature": 202, "voltage": 5.1, "node": "a"}
{ "index": {}}
{ "timestamp": 1516470094000, "temperature": 198, "voltage": 5.6, "node": "b"}
{ "index": {}}
{ "timestamp": 1516383694000, "temperature": 200, "voltage": 4.2, "node": "c"}
{ "index": {}}
{ "timestamp": 1516297294000, "temperature": 202, "voltage": 4.0, "node": "c"}

POST my-index-000004/_search
{"query": {"range": {"voltage_corrected": {"gte": 16,"lte": 20,"boost": 1.0}}},"fields": ["voltage_corrected", "node"]}
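In the mapping above, "on_script_error": "fail" means that if the script throws an error while a document is being indexed, that document is rejected. If on_script_error is set to "ignore", the field is recorded in the document's _ignored metadata field and the document is still indexed.
The range query on voltage_corrected returns: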
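Unlike the query-time examples earlier, an indexed runtime field runs its script once per document at index time and stores the result, so the range query only filters precomputed values. The plain-Python sketch below mimics that flow and the two on_script_error modes. It does not model Elasticsearch internals. The extra document without a voltage value is a hypothetical error case, and index_docs and range_query are hypothetical helpers:

```python
DOCS = [
    {"timestamp": 1516729294000, "temperature": 200, "voltage": 5.2, "node": "a"},
    {"timestamp": 1516642894000, "temperature": 201, "voltage": 5.8, "node": "b"},
    {"timestamp": 1516556494000, "temperature": 202, "voltage": 5.1, "node": "a"},
    {"timestamp": 1516470094000, "temperature": 198, "voltage": 5.6, "node": "b"},
    {"timestamp": 1516383694000, "temperature": 200, "voltage": 4.2, "node": "c"},
    {"timestamp": 1516297294000, "temperature": 202, "voltage": 4.0, "node": "c"},
]
PARAMS = {"multiplier": 4}


def index_docs(docs, on_script_error="fail"):
    """Run the voltage_corrected script once per document at index time.

    Returns (indexed, rejected). With "fail", a document whose script errors is rejected;
    with "ignore", it is indexed anyway and the field name is recorded in _ignored.
    """
    indexed, rejected = [], []
    for source in docs:
        doc = dict(source)
        try:
            doc["voltage_corrected"] = doc["voltage"] * PARAMS["multiplier"]
        except KeyError:  # no voltage value, so the script cannot read doc['voltage']
            if on_script_error == "fail":
                rejected.append(source)
                continue
            doc["_ignored"] = ["voltage_corrected"]
        indexed.append(doc)
    return indexed, rejected


def range_query(docs, field, gte, lte):
    """Keep documents whose precomputed field value lies in [gte, lte]."""
    return [d for d in docs if field in d and gte <= d[field] <= lte]


indexed, _ = index_docs(DOCS)
for hit in range_query(indexed, "voltage_corrected", 16, 20):
    print(hit["node"], hit["voltage_corrected"])
```

Only the two node "c" documents (4.2 × 4 and 4.0 × 4) fall inside [16, 20]. The others are 20.4 or above.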
"hits" : [{"_index" : "my-index-000004","_type" : "_doc","_id" : "yYIcwXsBtwvU8UtZG4rS","_score" : 1.0,"_source" : {"timestamp" : 1516383694000,"temperature" : 200,"voltage" : 4.2,"node" : "c"},"fields" : {"node" : ["c"],"voltage_corrected" : [16.8]}},{"_index" : "my-index-000004","_type" : "_doc","_id" : "yoIcwXsBtwvU8UtZG4rS","_score" : 1.0,"_source" : {"timestamp" : 1516297294000,"temperature" : 202,"voltage" : 4.0,"node" : "c"},"fields" : {"node" : ["c"],"voltage_corrected" : [16.0]}}]
Exploring your data with runtime fields
With logs, we usually care most about two fields, @timestamp and message, and runtime fields let us derive new fields from them.
Let's try it:
PUT /my-index-000005/
{"mappings": {"properties": {"@timestamp": {"format": "strict_date_optional_time||epoch_second","type": "date"},"message": {"type": "wildcard"}}}}

POST /my-index-000005/_bulk?refresh
{"index":{}}
{"timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:30:53-05:00","message":"232.0.0.0 - - [30/Apr/2020:14:30:53 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:12-05:00","message":"26.1.0.0 - - [30/Apr/2020:14:31:12 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:19-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:19 -0500] \"GET /french/splash_inet.html HTTP/1.0\" 200 3781"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:22-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:27-05:00","message":"252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:28-05:00","message":"not a valid apache log"}

PUT my-index-000005/_mappings
{"runtime": {"http.clientip": {"type": "ip","script": """String clientip=grok('%{COMMONAPACHELOG}').extract(doc["message"].value)?.clientip;if (clientip != null) emit(clientip);"""}}}

GET my-index-000005/_search
{"query": {"match": {"http.clientip": "40.135.0.0"}},"fields" : ["*"]}
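This adds http.clientip to the mapping as a runtime field. Its script defines a grok pattern that extracts a structured field from a single text field in the document. A grok pattern is like a regular expression that supports aliased expressions you can reuse. Note that the documents use timestamp rather than @timestamp, so timestamp is mapped dynamically as a date, as the results below show.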
Grokking grok
https://www.elastic.co/guide/en/elasticsearch/reference/7.x/grok.html
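The script matches the %{COMMONAPACHELOG} log pattern, which understands the structure of Apache logs. If the pattern matches (clientip != null), the script emits the matching IP address. If it does not match, the script simply returns without emitting a value instead of failing.
The condition if (clientip != null) emit(clientip); matters. Without it, the query fails on any shard that contains a document not matching the grok pattern. With it, documents that do not match the grok pattern are skipped.
Results of the http.clientip search: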
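To make "a regular expression with reusable aliases" concrete, here is a toy grok expander in Python. It is not Elasticsearch's implementation. The pattern library is a small, simplified, hypothetical subset, and SIMPLEAPACHELOG is a made-up name: the real COMMONAPACHELOG definition is more permissive and accepts host names, for example. The sketch shows how %{SYNTAX:field} aliases expand into named regex groups, and why a non-matching line yields nothing to emit:

```python
import re

# Hypothetical, simplified stand-ins for grok's built-in patterns.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "USER": r"[\w.-]+",
    "HTTPDATE": r"[^\]]+",
    "WORD": r"\w+",
    "NOTSPACE": r"\S+",
    "NUMBER": r"\d+(?:\.\d+)?",
    "SIMPLEAPACHELOG": (
        r"%{IP:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "
        r'"%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}" '
        r"%{NUMBER:response} %{NUMBER:bytes}"
    ),
}

TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern):
    """Recursively replace %{NAME} or %{NAME:field} with NAME's regex.
    A field name turns the alias into a named capture group."""
    def replace(match):
        name, field = match.group(1), match.group(2)
        body = expand(PATTERNS[name])
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return TOKEN.sub(replace, pattern)


def extract(pattern, text):
    """Return the captured fields, or None when the text does not match."""
    match = re.fullmatch(expand(pattern), text)
    return match.groupdict() if match else None


def http_clientip(message):
    """Mirror `?.clientip` plus `if (clientip != null) emit(clientip)`: emit nothing on no match."""
    fields = extract("%{SIMPLEAPACHELOG}", message)
    return [fields["clientip"]] if fields else []


print(http_clientip('40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736'))
print(http_clientip("not a valid apache log"))
```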
"hits" : [{"_index" : "my-index-000005","_type" : "_doc","_id" : "dvAuwXsBln4zi13nW2E1","_score" : 1.0,"_source" : {"timestamp" : "2020-04-30T14:30:17-05:00","message" : """40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736"""},"fields" : {"message" : ["""40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736"""],"http.clientip" : ["40.135.0.0"],"timestamp" : ["2020-04-30T19:30:17.000Z"]}}]
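If you don't need regular expressions, you can use a dissect pattern instead of grok. Dissect parses the log into named variables by matching fixed delimiters, so it is usually faster than grok.
Building on my-index-000005 above, run the following: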
PUT my-index-000005/_mappings
{"runtime": {"http.response": {"type": "long","script": """String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}').extract(doc["message"].value)?.response;if (response != null) emit(Integer.parseInt(response));"""}}}
This defines the http.response runtime field. Now query it:
GET my-index-000005/_search
{"query": {"match": {"http.response": "304"}},"fields" : ["*"]}
Results:
"hits" : [{"_index" : "my-index-000005","_type" : "_doc","_id" : "evAuwXsBln4zi13nW2E1","_score" : 1.0,"_source" : {"timestamp" : "2020-04-30T14:31:22-05:00","message" : """247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0"""},"fields" : {"http.response" : [304],"message" : ["""247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0"""],"http.clientip" : ["247.37.0.0"],"timestamp" : ["2020-04-30T19:31:22.000Z"]}}]
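The query finds the document whose http.response is 304.
Finally, to remove a runtime field from the mapping, set it to null:
PUT my-index-000005/_mapping
{"runtime": {"http.response": null}}
Original content takes effort; please credit the source when reposting: WeChat official account 青阳大君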
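Dissect is cheaper than grok because it only cuts the text at literal delimiters. Below is a minimal dissect-style parser in Python, applied to the http.response pattern above. It is a simplified sketch, not Elasticsearch's implementation. It supports only plain %{key} names with no dissect modifiers, and it uses a regular expression only to tokenize the pattern, never to match the log text. dissect and http_response are hypothetical helpers:

```python
import re

PATTERN = '%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}'


def dissect(pattern, text):
    """Split the pattern into keys and the literal delimiters between them, then walk
    the text left to right, cutting at each delimiter in turn.
    Returns a dict of key -> string, or None if a delimiter is missing."""
    parts = re.split(r"%\{([^}]*)\}", pattern)
    literals, keys = parts[0::2], parts[1::2]  # literals[i] precedes keys[i]
    if not text.startswith(literals[0]):
        return None
    pos, result = len(literals[0]), {}
    for i, key in enumerate(keys):
        delimiter = literals[i + 1]
        if delimiter == "":
            if i != len(keys) - 1:
                return None  # adjacent keys are not handled in this sketch
            end = len(text)  # the last key takes the rest of the text
        else:
            end = text.find(delimiter, pos)
            if end == -1:
                return None
        result[key] = text[pos:end]
        pos = end + len(delimiter)
    return result if pos == len(text) else None


def http_response(message):
    """Mirror `?.response` plus `if (response != null) emit(Integer.parseInt(response))`."""
    fields = dissect(PATTERN, message)
    return [int(fields["response"])] if fields else []


print(http_response('247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0'))
print(http_response("not a valid apache log"))
```

Because each step is a plain substring search, a malformed line such as "not a valid apache log" fails as soon as a delimiter (here " [") is missing, and nothing is emitted.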




