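ES
Getting started
Introduction to core Elasticsearch concepts
Index
An index can be thought of as roughly analogous to a database in a relational system.
Type
A type is like a kind of table, for example a user table or an order table.
Note:
In ES 5.x an index can have multiple types.
In ES 6.x an index can have only one type.
In ES 7.x types are deprecated and the APIs become typeless (the placeholder type is _doc); ES 8.x removes types entirely.
Mapping
A mapping defines the data type and related settings of each field. It is comparable to a table schema in a relational database.
Document
A document is comparable to a row in a relational database table.
Field
Comparable to a column in a relational database table.
Cluster
A cluster consists of one or more nodes and has the default name "elasticsearch".
Node
A member of the cluster: one running Elasticsearch process, often one per machine.
Shards and replicas
Shards come in two kinds: primary shards and replica shards. The data of an index is physically distributed across several primary shards, each holding only part of the data. Each primary shard can have several copies, called replica shards, which are replicas of that primary.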
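How a document ends up on one particular primary shard can be sketched as follows. This is a minimal illustration, not Elasticsearch's implementation: the function names are hypothetical, and zlib.crc32 is only a stand-in for the Murmur3 hash that Elasticsearch applies to the routing value (by default the document _id).

```python
import zlib


def pick_primary_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value (by default the document _id) to a primary shard number."""
    # Assumption: crc32 replaces the Murmur3 hash used by the real system.
    hashed = zlib.crc32(routing.encode("utf-8"))
    return hashed % number_of_primary_shards


def spread(doc_ids, number_of_primary_shards):
    """Group document ids by the primary shard they would be routed to."""
    shards = {n: [] for n in range(number_of_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_primary_shard(doc_id, number_of_primary_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    # 20 documents spread over 8 primary shards, as in the number_of_shards example.
    for shard, ids in spread([str(i) for i in range(1, 21)], 8).items():
        print(shard, ids)
```

Because the shard number depends on the number of primary shards, that number is normally fixed when the index is created, while the number of replicas can be changed later.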
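Introduction to the RESTful style
Method | Description |
---|---|
HEAD | Fetch only the headers of a resource |
GET | Retrieve a resource |
POST | Create or update a resource |
PUT | Create or update a resource |
DELETE | Delete a resource |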
Indices: introduction and usage
Create
curl -X PUT "localhost:9200/nba"
PUT /nba1
{
"settings":{
"number_of_shards":8,
"number_of_replicas":1
}
}
Get
curl -X GET "localhost:9200/nba"
Delete
curl -X DELETE "localhost:9200/nba"
Get several indices
curl -X GET "localhost:9200/nba,cba"
Get all indices
curl -X GET "localhost:9200/_all"
curl -X GET "localhost:9200/_cat/indices?v"
Check existence (HEAD request)
curl -I "localhost:9200/nba"
Close
curl -X POST "localhost:9200/nba/_close"
Open
curl -X POST "localhost:9200/nba/_open"
Mappings: introduction and usage
Create
PUT /nba/_mapping
{
"properties":{
"name":{
"type":"text"
},
"team_name":{
"type":"text"
},
"position":{
"type":"keyword"
},
"play_year":{
"type":"keyword"
},
"jerse_no":{
"type":"keyword"
}
}
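}
Get
GET /nba/_mapping
Get several mappings
GET /nba,customer_care/_mapping
GET /_all/_mapping
Update (new fields can be added to an existing mapping; the type of an existing field cannot be changed)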
PUT /nba/_mapping
{
"properties":{
"name":{
"type":"text"
},
"team_name":{
"type":"text"
},
"position":{
"type":"keyword"
},
"play_year":{
"type":"keyword"
},
"jerse_no":{
"type":"keyword"
},
"country":{
"type":"keyword"
}
}
}
Documents: create, read, update, delete
Create a document
PUT /nba/_doc/1
{
"name":"哈登",
"team_name":"⽕箭",
"position":"得分后卫",
"play_year":"10",
"jerse_no":"13"
}
POST /nba/_doc
{
"name":"库⾥",
"team_name":"勇⼠",
"position":"组织后卫",
"play_year":"10",
"jerse_no":"30"
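}
Specify the operation type (with op_type=create the request fails with a version conflict if the document already exists)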
PUT /nba/_doc/1?op_type=create
{
"name":"哈登",
"team_name":"⽕箭",
"position":"得分后卫",
"play_year":"10",
"jerse_no":"13"
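}
Get a document
GET /nba/_doc/1
Get several documents (the _type entries and the /_doc/_mget paths below are deprecated in 7.x and removed in 8.x)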
POST /_mget
{
"docs":[
{
"_index":"nba",
"_type":"_doc",
"_id":"1"
},
{
"_index":"nba",
"_type":"_doc",
"_id":"2"
}
]
}
POST /nba/_mget
{
"docs":[
{
"_type":"_doc",
"_id":"1"
},
{
"_type":"_doc",
"_id":"2"
}
]
}
POST /nba/_doc/_mget
{
"docs":[
{
"_id":"1"
},
{
"_id":"2"
}
]
}
GET /nba/_doc/_mget
{
"ids":[
"1",
"2"
]
}
Update a document
POST /nba/_update/1
{
"doc":{
"team_name":"⽕箭",
"position":"双能卫",
"play_year":"10",
"jerse_no":"13"
}
}
Add a field to _source
POST /nba/_update/1
{
"script":"ctx._source.age = 18"
}
Remove a field from _source
POST /nba/_update/1
{
"script": "ctx._source.remove(\"age\")"
}
Update a field of the given document using a parameter value
POST /nba/_update/1
{
"script":{
"source":"ctx._source.age += params.age",
"params":{
"age":4
}
}
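}
upsert: when the specified document does not exist, the content of the upsert parameter is inserted into the index as a new document; when it does exist, Elasticsearch runs the specified update logic instead.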
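The upsert rule just described can be modelled in a few lines. This is a toy sketch over an in-memory dict, not the Elasticsearch client API; update_with_upsert and add_allstar are hypothetical names chosen for illustration.

```python
def update_with_upsert(index: dict, doc_id: str, script, upsert: dict) -> dict:
    """Toy model of an update request that carries both a script and an upsert body."""
    if doc_id not in index:
        # Document missing: the upsert body is stored as the new document
        # and the script is not executed.
        index[doc_id] = dict(upsert)
    else:
        # Document present: the script mutates the stored source in place.
        script(index[doc_id])
    return dict(index[doc_id])


def add_allstar(source: dict, amount: int = 4) -> None:
    # Plays the role of "ctx._source.allstar += params.allstar" with params.allstar = 4.
    source["allstar"] += amount


if __name__ == "__main__":
    index = {}
    print(update_with_upsert(index, "3", add_allstar, {"allstar": 1}))  # first call inserts
    print(update_with_upsert(index, "3", add_allstar, {"allstar": 1}))  # second call runs the script
```

Sending the same request twice therefore gives different outcomes: the first call creates the document from the upsert body, the second applies the script to it.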
POST /nba/_update/3
{
"script":{
"source":"ctx._source.allstar += params.allstar",
"params":{
"allstar":4
}
},
"upsert":{
"allstar":1
}
}
Delete a document
DELETE /nba/_doc/1
Basic search usage
Preparation
DELETE /nba
PUT /nba
{
"mappings":{
"properties":{
"name":{
"type":"text"
},
"team_name":{
"type":"text"
},
"position":{
"type":"text"
},
"play_year":{
"type":"long"
},
"jerse_no":{
"type":"keyword"
}
}
}
}
PUT /nba/_doc/1
{
"name": "哈登",
"team_name": "⽕箭",
"position": "得分后卫",
"play_year": 10,
"jerse_no": "13"
}
PUT /nba/_doc/2
{
"name": "库⾥",
"team_name": "勇⼠",
"position": "控球后卫",
"play_year": 10,
"jerse_no": "30"
}
PUT /nba/_doc/3
{
"name": "詹姆斯",
"team_name": "湖⼈",
"position": "⼩前锋",
"play_year": 15,
"jerse_no": "23"
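}
Term queries and full-text queries
Term query: the query value is not analyzed; a document matches only when a term in the index is exactly equal to the query string.
Full-text query: Elasticsearch first analyzes the query string and splits it into tokens. A document matches when the analyzed field contains any one of the tokens (or all of them, depending on the operator); if it contains none of them, the document does not match.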
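The difference between the two query families can be seen on a toy inverted index. The sketch below is an illustration under simplifying assumptions: analyze is a crude lowercase-and-split stand-in for a real analyzer, and all function names and sample titles are made up.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list:
    """Crude stand-in for an analyzer: lowercase, then split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def build_inverted_index(docs: dict) -> dict:
    """Map each token to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def term_query(index: dict, value: str) -> set:
    # The query value is looked up as-is, without analysis.
    return set(index.get(value, set()))


def match_query(index: dict, text: str, operator: str = "or") -> set:
    # The query text is analyzed first, then its tokens are combined.
    postings = [index.get(token, set()) for token in analyze(text)]
    if not postings:
        return set()
    if operator == "and":
        return set.intersection(*postings)
    return set.union(*postings)


# Made-up sample titles.
titles = {1: "The Best Shooter", 2: "best point guard"}
title_index = build_inverted_index(titles)
```

Here term_query(title_index, "Shooter") finds nothing because the indexed token is the lowercased "shooter", while match_query(title_index, "Best Shooter") analyzes the query and matches both documents.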
Single term query
POST /nba/_search
{
"query": {
"term": {
"jerse_no":"23"
}
}
}
Multiple terms query
POST /nba/_search
{
"query": {
"terms": {
"jerse_no": [
"23",
"13"
]
}
}
}
match_all
POST /nba/_search
{
"query": {
"match_all": {}
},
"from": 0,
"size": 10
}
match
POST /nba/_search
{
"query": {
"match": {
"position": "后卫"
}
}
}
multi_match: query across several fields
POST /nba/_update/2
{
"doc": {
"name": "库⾥",
"team_name": "勇⼠",
"position": "控球后卫",
"play_year": 10,
"jerse_no": "30",
"title": "the best shooter"
}
}
POST /nba/_search
{
"query": {
"multi_match": {
"query": "shooter",
"fields": [
"title",
"name"
]
}
}
}
match_phrase: phrase query (the terms must appear adjacent and in the same order)
POST /nba/_search
{
"query": {
"match_phrase": {
"position": "得分后卫"
}
}
}
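Common field types
Data types
Core data types
Complex data types
Specialized data types
Core data types
String
text: used for full-text indexing; fields of this type are split into tokens by an analyzer
keyword: not analyzed; only the complete value of the field can be searched
Numeric
long, integer, short, byte, double, float, half_float, scaled_float
Boolean
boolean
Binary - binary
A field of this type accepts its value as a base64-encoded string; by default it is not stored and is not searchable
Range types
A range type represents a range of values rather than a single value
integer_range, float_range, long_range, double_range, date_range
For example, if age is of type integer_range, a value can be {"gte" : 20, "lte" : 40}, and the query "term" : {"age": 21} matches it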
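Date - date
JSON has no date type, so ES decides whether a value is a date by checking it against the pattern(s) given in format
The default format is strict_date_optional_time||epoch_millis
Formats
A formatted date string such as "2022-01-01" or "2022-01-01T12:10:30" (a string like "2022/01/01 12:10:30" needs a custom format such as yyyy/MM/dd HH:mm:ss)
Milliseconds since the epoch (1970-01-01 00:00 UTC)
Seconds since the epoch (requires epoch_second in format; under the default format a bare number is read as milliseconds)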
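Whether a bare number is read as seconds or as milliseconds changes the resulting date considerably. The sketch below uses only the Python standard library to show both readings of 1610350870, a value used in one of the sample documents; the helper names are hypothetical.

```python
from datetime import datetime, timezone


def as_epoch_millis(value: int) -> datetime:
    """Interpret a number as milliseconds since 1970-01-01 00:00 UTC."""
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc)


def as_epoch_seconds(value: int) -> datetime:
    """Interpret a number as seconds since 1970-01-01 00:00 UTC."""
    return datetime.fromtimestamp(value, tz=timezone.utc)


if __name__ == "__main__":
    print(as_epoch_millis(1610350870).isoformat())     # read as milliseconds
    print(as_epoch_seconds(1610350870).isoformat())    # read as seconds
    print(as_epoch_millis(1641886870000).isoformat())  # a 13-digit millisecond value
```

Read as milliseconds, the 10-digit value falls in January 1970; read as seconds, it falls in January 2021. A date field that should accept second-precision numbers therefore needs epoch_second in its format.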
PUT /nba/_mapping
{
"properties": {
"name": {
"type": "text"
},
"team_name": {
"type": "text"
},
"position": {
"type": "text"
},
"play_year": {
"type": "long"
},
"jerse_no": {
"type": "keyword"
},
"title": {
"type": "text"
},
"date": {
"type": "date"
}
}
}
POST /nba/_doc/4
{
"name": "蔡x坤",
"team_name": "勇⼠",
"position": "得分后卫",
"play_year": 10,
"jerse_no": "31",
"title": "",
"date": "2020-01-01"
}
POST /nba/_doc/5
{
"name": "杨x越",
"team_name": "猴急",
"position": "得分后卫",
"play_year": 10,
"jerse_no": "32",
"title": "",
"date": 1610350870
}
POST /nba/_doc/6
{
"name": "吴X凡",
"team_name": "湖⼈",
"position": "得分后卫",
"play_year": 10,
"jerse_no": "33",
"title": "",
"date": 1641886870000
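}
Complex data types
Array
Object
Specialized data types
IP: a field of type ip stores an IPv4 or IPv6 address (in early versions an IPv4 address was stored internally as a long value).
Official documentation
Search
Bulk import of data in ES
Bulk
POST _bulk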
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' --data-binary @name
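The file passed with --data-binary must be newline-delimited JSON: one action line followed by one source line per document, with a newline after the last line. A minimal sketch of producing such a body is shown below; build_bulk_body is a hypothetical helper, and the sample document is made up.

```python
import json


def build_bulk_body(index_name: str, docs: dict) -> str:
    """Build a _bulk request body that indexes each document under its id."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: which operation, which index, which id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The body has to end with a newline character.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body("nba", {"1": {"displayNameEn": "Sample Player", "age": 30}})
    print(body, end="")
```

Writing the returned string to a file and posting it with the curl command above is one way to load many documents in a single request.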
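Term-level queries in ES
Introduction
Term-level queries are typically used on structured data such as number, date and keyword fields, rather than on text fields.
In other words, a full-text query analyzes the query text first, while a term-level query looks up the exact term directly in the inverted index of the field. Term-level queries are generally used on numeric, date and similar fields.
Preparation
Delete the nba index
Create the nba index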
DELETE /nba
PUT /nba
{
"mappings": {
"properties": {
"birthDay": {
"type": "date"
},
"birthDayStr": {
"type": "keyword"
},
"age": {
"type": "integer"
},
"code": {
"type": "text"
},
"country": {
"type": "text"
},
"countryEn": {
"type": "text"
},
"displayAffiliation": {
"type": "text"
},
"displayName": {
"type": "text"
},
"displayNameEn": {
"type": "text"
},
"draft": {
"type": "long"
},
"heightValue": {
"type": "float"
},
"jerseyNo": {
"type": "text"
},
"playYear": {
"type": "long"
},
"playerId": {
"type": "keyword"
},
"position": {
"type": "text"
},
"schoolType": {
"type": "text"
},
"teamCity": {
"type": "text"
},
"teamCityEn": {
"type": "text"
},
"teamConference": {
"type": "keyword"
},
"teamConferenceEn": {
"type": "keyword"
},
"teamName": {
"type": "keyword"
},
"teamNameEn": {
"type": "keyword"
},
"weight": {
"type": "text"
}
}
}
}
Term query: exact match (find players with jersey number 23)
POST nba/_search
{
"query": {
"term": {
"jerseyNo": "23"
}
}
}
Exists Query: find documents in which a given field has a non-null value (find players whose team name is not empty)
POST nba/_search
{
"query": {
"exists": {
"field": "teamNameEn"
}
}
}
Prefix Query: find documents containing a term with the given prefix (find players whose team name starts with Rock)
POST nba/_search
{
"query": {
"prefix": {
"teamNameEn": "Rock"
}
}
}
Wildcard Query: supports wildcards; * matches zero or more characters and ? matches exactly one character (find players on the Rockets)
POST nba/_search
{
"query": {
"wildcard": {
"teamNameEn": "Ro*s"
}
}
}
Regexp Query: regular expression query (find players on the Rockets)
POST nba/_search
{
"query": {
"regexp": {
"teamNameEn": "Ro.*s"
}
}
}
Ids Query (find players with ids 1 and 2)
POST nba/_search
{
"query": {
"ids": {
"values": [
1,
2
]
}
}
}
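The prefix, wildcard and regexp queries above all test whole terms of a keyword field. The sketch below imitates that matching with the Python standard library. It is an approximation: fnmatch and re use Python's own pattern syntax, which is not identical to the Lucene syntax used by Elasticsearch, and the function names and team list are illustrative only.

```python
import fnmatch
import re

# Illustrative keyword terms for a field such as teamNameEn.
team_terms = ["Rockets", "Raptors", "Lakers", "Warriors"]


def prefix_query(terms, prefix):
    """Terms that start with the given prefix."""
    return [term for term in terms if term.startswith(prefix)]


def wildcard_query(terms, pattern):
    """Terms matching a pattern where * is any run of characters and ? is one character."""
    return [term for term in terms if fnmatch.fnmatchcase(term, pattern)]


def regexp_query(terms, pattern):
    """Terms matched in full by the regular expression (the pattern is anchored at both ends)."""
    return [term for term in terms if re.fullmatch(pattern, term)]


if __name__ == "__main__":
    print(prefix_query(team_terms, "Rock"))
    print(wildcard_query(team_terms, "Ro*s"))
    print(regexp_query(team_terms, "Ro.*s"))
```

All three patterns are matched against the complete term, which is why the regular expression "Ro" alone matches nothing while "Ro.*s" matches.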
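Range queries in ES
Find documents whose field value (date, number or string) falls within the given range.
Find players who have played in the NBA for 2 to 10 years (inclusive)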
POST nba/_search
{
"query": {
"range": {
"playYear": {
"gte": 2,
"lte": 10
}
}
}
}
Find players born between 1980 and 1999
POST nba/_search
{
"query": {
"range": {
"birthDay": {
"gte": "01/01/1980",
"lte": "1999",
"format": "dd/MM/yyyy||yyyy"
}
}
}
}
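Bool queries in ES
Bool query
Type | Description |
---|---|
must | The clause must match; it contributes to the score |
filter | The clause must match, but it does not contribute to the score |
must_not | The clause must not match |
should | The clause should match; matching raises the score |
must (find players named James)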
POST /nba/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"displayNameEn": "james"
}
}
]
}
}
}
filter: same effect as must, but without scoring (find players named James)
POST /nba/_search
{
"query": {
"bool": {
"filter": [
{
"match": {
"displayNameEn": "james"
}
}
]
}
}
}
must_not (find Western Conference players named James, by excluding the Eastern Conference)
POST /nba/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"displayNameEn": "james"
}
}
],
"must_not": [
{
"term": {
"teamConferenceEn": {
"value": "Eastern"
}
}
}
]
}
}
}
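should (find Western Conference players named James who, preferably, have played 11 to 20 years). Documents that fail the should clause are still returned, only with a different score.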
POST /nba/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"displayNameEn": "james"
}
}
],
"must_not": [
{
"term": {
"teamConferenceEn": {
"value": "Eastern"
}
}
}
],
"should": [
{
"range": {
"playYear": {
"gte": 11,
"lte": 20
}
}
}
]
}
}
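}
With minimum_should_match set to 1, the should clause becomes mandatory: the query returns only Western Conference players named James who have played 11 to 20 years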
POST /nba/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"displayNameEn": "james"
}
}
],
"must_not": [
{
"term": {
"teamConferenceEn": {
"value": "Eastern"
}
}
}
],
"should": [
{
"range": {
"playYear": {
"gte": 11,
"lte": 20
}
}
}
],
"minimum_should_match": 1
}
}
}
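How the four clause types and minimum_should_match interact can be modelled with plain predicates. The sketch below is a toy: the scoring (one point per matching must or should clause) is a deliberate simplification rather than Elasticsearch's relevance scoring, and the function names and sample rows are made up.

```python
def bool_query(docs, must=(), filter_=(), must_not=(), should=(), minimum_should_match=None):
    """Toy bool query: every clause is a predicate that takes a document dict."""
    if minimum_should_match is None:
        # Default: should is optional when must or filter is present,
        # otherwise at least one should clause has to match.
        minimum_should_match = 0 if (must or filter_) else 1
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue
        if not all(clause(doc) for clause in filter_):
            continue
        if any(clause(doc) for clause in must_not):
            continue
        should_matched = sum(1 for clause in should if clause(doc))
        if should and should_matched < minimum_should_match:
            continue
        # Crude score: filter and must_not clauses add nothing.
        hits.append((len(must) + should_matched, doc))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


# Made-up sample rows, only for illustration.
players = [
    {"displayNameEn": "James A", "teamConferenceEn": "Western", "playYear": 16},
    {"displayNameEn": "B James", "teamConferenceEn": "Western", "playYear": 10},
    {"displayNameEn": "James C", "teamConferenceEn": "Eastern", "playYear": 12},
]


def named_james(doc):
    return "james" in doc["displayNameEn"].lower().split()


def eastern(doc):
    return doc["teamConferenceEn"] == "Eastern"


def played_11_to_20(doc):
    return 11 <= doc["playYear"] <= 20


optional = bool_query(players, must=[named_james], must_not=[eastern], should=[played_11_to_20])
required = bool_query(players, must=[named_james], must_not=[eastern],
                      should=[played_11_to_20], minimum_should_match=1)
```

In optional both Western players are returned and the one satisfying the should clause ranks first; in required only that player remains.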
Sorting in ES
Rockets players sorted by years played, from most to fewest
POST nba/_search
{
"query": {
"match": {
"teamNameEn": "Rockets"
}
},
"sort": [
{
"playYear": {
"order": "desc"
}
}
]
}
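Rockets players sorted by years played, from most to fewest; players with the same number of years are sorted by height, tallest first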
POST nba/_search
{
"query": {
"match": {
"teamNameEn": "Rockets"
}
},
"sort": [
{
"playYear": {
"order": "desc"
}
},
{
"heightValue": {
"order": "desc"
}
}
]
}
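A multi-key sort like the one described above (years played first, height as tie-breaker, both descending) can be reproduced with stable sorting. The sketch is illustrative: sort_hits is a hypothetical helper and the rows are made up.

```python
def sort_hits(hits, sort_spec):
    """Sort dicts by a list of (field, order) pairs, most significant pair first."""
    ordered = list(hits)
    # Apply the keys from least to most significant; sorted() is stable,
    # so earlier passes survive as tie-breakers.
    for field, order in reversed(sort_spec):
        ordered = sorted(ordered, key=lambda hit: hit[field], reverse=(order == "desc"))
    return ordered


# Made-up sample rows.
rockets = [
    {"name": "a", "playYear": 10, "heightValue": 1.96},
    {"name": "b", "playYear": 15, "heightValue": 2.03},
    {"name": "c", "playYear": 10, "heightValue": 2.06},
]

by_years_then_height = sort_hits(rockets, [("playYear", "desc"), ("heightValue", "desc")])
```

The second sort key only matters for rows a and c, which share the same playYear.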
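Aggregations in ES: metric aggregations
What is ES aggregation analysis
Aggregation is an important database feature: it computes aggregate values over the data set selected by a query, for example the maximum or minimum of a field (or of a computed expression), a sum or an average. As both a search engine and a data store, ES provides powerful aggregation capabilities.
Computing metrics such as max, min, sum and avg over a data set is called a metric aggregation in ES.
Relational databases, besides aggregate functions, can also group the selected rows with group by and then compute metrics per group. In ES this is called a bucket aggregation.
max min sum avg
Average age of the Rockets players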
POST /nba/_search
{
"query": {
"term": {
"teamNameEn": {
"value": "Rockets"
}
}
},
"aggs": {
"avgAge": {
"avg": {
"field": "age"
}
}
},
"size": 0
}
value_count: counts the documents in which the field has a value
Number of Rockets players whose years-played field is not empty
POST /nba/_search
{
"query": {
"term": {
"teamNameEn": {
"value": "Rockets"
}
}
},
"aggs": {
"countPlayerYear": {
"value_count": {
"field": "playYear"
}
}
},
"size": 0
}
Number of players on the Rockets
POST nba/_count
{
"query": {
"term": {
"teamNameEn": {
"value": "Rockets"
}
}
}
}
Cardinality: count of distinct values (approximate)
Number of distinct ages among the Rockets players
POST /nba/_search
{
"query": {
"term": {
"teamNameEn": {
"value": "Rockets"
}
}
},
"aggs": {
"countAge": {
"cardinality": {
"field": "age"
}
}
},
"size": 0
}
stats: returns five values: count, max, min, avg and sum
Age stats for the Rockets players
POST /nba/_search
{
"query": {
"term": {
"teamNameEn": {
"value": "Rockets"
}
}
},
"aggs": {
"statsAge": {
"stats": {
"field": "age"
}
}
},
"size": 0
}
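Extended stats: adds four more results to stats: sum of squares, variance, standard deviation, and the interval of the mean plus/minus two standard deviations
Age extended stats for the Rockets players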
POST /nba/_search
{
"query": {
"term": {
"teamNameEn": {
"value": "Rockets"
}
}
},
"aggs": {
"extendStatsAge": {
"extended_stats": {
"field": "age"
}
}
},
"size": 0
}
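What the stats and extended_stats aggregations report can be reproduced by hand on a small list of numbers. The sketch below assumes population variance and a two-sigma interval; the function names mirror the aggregation names but are plain Python helpers, not a client API.

```python
import math


def stats(values):
    """The five values returned by a stats aggregation."""
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


def extended_stats(values, sigma=2.0):
    """stats plus sum of squares, variance, standard deviation and its bounds."""
    result = stats(values)
    sum_of_squares = sum(v * v for v in values)
    # Population variance: mean of the squares minus the square of the mean.
    variance = sum_of_squares / result["count"] - result["avg"] ** 2
    std_deviation = math.sqrt(max(variance, 0.0))
    result.update({
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": result["avg"] + sigma * std_deviation,
            "lower": result["avg"] - sigma * std_deviation,
        },
    })
    return result


if __name__ == "__main__":
    print(extended_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```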
Aggregations in ES: bucket aggregations
Terms Aggregation: groups documents by field value
Rockets players grouped by age
POST /nba/_search
{
"query": {
"term": {
"teamNameEn": {
"value": "Rockets"
}
}
},
"aggs": {
"aggsAge": {
"terms": {
"field": "age",
"size": 10
}
}
},
"size": 0
}
order: sorting the buckets
Rockets players grouped by age, with buckets sorted by age from highest to lowest (by bucket key)
POST /nba/_search
{
"query": {
"term": {
"teamNameEn": {
"value": "Rockets"
}
}
},
"aggs": {
"aggsAge": {
"terms": {
"field": "age",
"size": 10,
"order": {
"_key": "desc"
}
}
}
},
"size": 0
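}
Rockets players grouped by age, with buckets sorted by the number of players in each bucket, from most to fewest (by document count)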
POST /nba/_search
{
"query": {
"term": {
"teamNameEn": {
"value": "Rockets"
}
}
},
"aggs": {
"aggsAge": {
"terms": {
"field": "age",
"size": 10,
"order": {
"_count": "desc"
}
}
}
},
"size": 0
}
Teams sorted by the average age of their players, from highest to lowest (by a sub-aggregation metric)
POST /nba/_search
{
"aggs": {
"aggsTeamName": {
"terms": {
"field": "teamNameEn",
"size": 30,
"order": {
"avgAge": "desc"
}
},
"aggs": {
"avgAge": {
"avg": {
"field": "age"
}
}
}
}
},
"size": 0
}
Filtering buckets
Lakers and Rockets sorted by average team age (explicit list of values)
POST /nba/_search
{
"aggs": {
"aggsTeamName": {
"terms": {
"field": "teamNameEn",
"include": [
"Lakers",
"Rockets",
"Warriors"
],
"exclude": [
"Warriors"
],
"size": 30,
"order": {
"avgAge": "desc"
}
},
"aggs": {
"avgAge": {
"avg": {
"field": "age"
}
}
}
}
},
"size": 0
}
Lakers and Rockets sorted by average team age (regular expression on values)
POST /nba/_search
{
"aggs": {
"aggsTeamName": {
"terms": {
"field": "teamNameEn",
"include": "Lakers|Ro.*|Warriors.*",
"exclude": "Warriors",
"size": 30,
"order": {
"avgAge": "desc"
}
},
"aggs": {
"avgAge": {
"avg": {
"field": "age"
}
}
}
}
},
"size": 0
}
Range Aggregation: groups documents into ranges
NBA players grouped by age: under 20, 20 to 35 (35 excluded), 35 and over
POST /nba/_search
{
"aggs": {
"ageRange": {
"range": {
"field": "age",
"ranges": [
{
"to": 20
},
{
"from": 20,
"to": 35
},
{
"from": 35
}
]
}
}
},
"size": 0
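}
NBA players grouped by age: under 20, 20 to 35, 35 and over (with named buckets). Each range in the request is given a key (A, B, C), and the response reports the buckets under those keys: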
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 566,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"ageRange" : {
"buckets" : [
{
"key" : "A",
"to" : 20.0,
"doc_count" : 15
},
{
"key" : "B",
"from" : 20.0,
"to" : 35.0,
"doc_count" : 531
},
{
"key" : "C",
"from" : 35.0,
"doc_count" : 20
}
]
}
}
}
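The bucket boundaries in a range aggregation include the from value and exclude the to value. The sketch below imitates that rule, together with the optional key per range, on a plain list of ages; range_aggregation is a hypothetical helper, the default key format is an approximation, and the ages are made up.

```python
def range_aggregation(values, ranges):
    """Count values per range; "from" is inclusive, "to" is exclusive."""
    buckets = []
    for spec in ranges:
        lower, upper = spec.get("from"), spec.get("to")
        doc_count = sum(
            1 for value in values
            if (lower is None or value >= lower) and (upper is None or value < upper)
        )
        # Without an explicit key, fall back to a "from-to" label with * for an open end.
        low_label = "*" if lower is None else str(float(lower))
        high_label = "*" if upper is None else str(float(upper))
        bucket = {"key": spec.get("key", low_label + "-" + high_label), "doc_count": doc_count}
        if lower is not None:
            bucket["from"] = float(lower)
        if upper is not None:
            bucket["to"] = float(upper)
        buckets.append(bucket)
    return buckets


# Ranges with named buckets, matching the keys A, B and C seen in the response.
named_ranges = [
    {"key": "A", "to": 20},
    {"key": "B", "from": 20, "to": 35},
    {"key": "C", "from": 35},
]

if __name__ == "__main__":
    for bucket in range_aggregation([19, 20, 34, 35, 40], named_ranges):
        print(bucket)
```

An age of exactly 35 lands in bucket C, not B, because the upper bound of B is exclusive.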
Date Range Aggregation: groups documents into date ranges
NBA players grouped by birth month and year
POST /nba/_search
{
"aggs": {
"birthDayRange": {
"date_range": {
"field": "birthDay",
"format": "MM-yyyy",
"ranges": [
{
"to": "01-1989"
},
{
"from": "01-1989",
"to": "01-1999"
},
{
"from": "01-1999",
"to": "01-2009"
},
{
"from": "01-2009"
}
]
}
}
},
"size": 0
}
Date Histogram Aggregation
Aggregates by day, month, year and so on. Supported intervals: year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), second (1s)
NBA players grouped by birth year
POST /nba/_search
{
"aggs": {
"birthday_aggs": {
"date_histogram": {
"field": "birthDay",
"format": "yyyy",
"calendar_interval": "year"
}
}
},
"size": 0
}
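A yearly date histogram amounts to counting documents per calendar year, including the empty years between the first and the last one. The sketch below shows that on ISO date strings; yearly_date_histogram is a hypothetical helper and the birth dates are made up.

```python
from collections import Counter
from datetime import date


def yearly_date_histogram(iso_dates):
    """Count dates per calendar year, keeping empty years inside the covered span."""
    counts = Counter(date.fromisoformat(value).year for value in iso_dates)
    if not counts:
        return []
    return [
        {"key_as_string": str(year), "doc_count": counts[year]}
        for year in range(min(counts), max(counts) + 1)
    ]


if __name__ == "__main__":
    # Made-up birth dates.
    for bucket in yearly_date_histogram(["1984-12-30", "1989-08-26", "1989-01-01", "1991-05-05"]):
        print(bucket)
```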
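Advanced usage of Elasticsearch
The refresh operation in ES
Ideally, new data would be searchable the instant it is added to an index. In practice it becomes searchable only after the next refresh, which by default happens about once per second.
Force a refresh
PUT /star/_doc/666?refresh
{ "displayName": "杨超越" }
Change the default refresh interval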
PUT /star/_settings
{
"index": {
"refresh_interval": "5s"
}
}