Elasticsearch index creation: an analysis of the create index logic

2025-04-02 02:59:52

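The index creation process

This article begins the walk through the core Index code, starting with how an index is created. An index in elasticsearch is a collection of shards. It is only a logical index and has no indexing capability of its own; every operation on data is ultimately carried out by the individual shards.

From the cluster's point of view, creating an index means writing index metadata, and that can only be done on the master node. The operation is blocking, and allocating and balancing the new shards across the cluster is also time-consuming. As a result, when many indices are created at once the master node becomes a single-point performance bottleneck and responses are noticeably slow.

Before going into the source, first recall the Action material (see the index action analysis). Every elasticsearch function corresponds to two classes, *Action and Transport*Action. The *Action defines the path (name) for the function, and its instance is bound to the corresponding Transport*Action. Any request may need to be forwarded within the cluster, which is probably why every function has a Transport*Action. Create is no exception: its entry point is TransportCreateIndexAction. In addition, as covered in the action support analysis, different actions pass through and operate on different nodes. Create index can only be initiated by the master node and is executed only on the master node, which keeps cluster data consistent.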
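The pairing of an action name with a transport handler, and the master-only rule, can be illustrated with a toy model. This is a minimal sketch, not elasticsearch code: `Node`, `TransportHandler`, `CreateIndexHandler` and the action name string are all hypothetical stand-ins, and "forwarding" is a plain method call rather than a network hop.

```java
import java.util.HashMap;
import java.util.Map;

final class MasterRoutingSketch {
    /** Hypothetical stand-in for a Transport*Action: the code that does the work. */
    interface TransportHandler {
        String execute(String request, Node localNode);
    }

    /** A toy node: knows whether it is the master and which handler is bound to which action name. */
    static final class Node {
        final String name;
        final boolean isMaster;
        Node master; // the elected master as seen by this node
        final Map<String, TransportHandler> handlers = new HashMap<>();

        Node(String name, boolean isMaster) {
            this.name = name;
            this.isMaster = isMaster;
            this.master = isMaster ? this : null;
        }

        /** Look up the handler bound to the action name and run it on this node. */
        String dispatch(String actionName, String request) {
            TransportHandler handler = handlers.get(actionName);
            if (handler == null) {
                throw new IllegalArgumentException("no handler bound for " + actionName);
            }
            return handler.execute(request, this);
        }
    }

    /** Master-only handler: runs locally on the master, otherwise forwards to it. */
    static final class CreateIndexHandler implements TransportHandler {
        static final String NAME = "indices:admin/create"; // illustrative action name

        @Override
        public String execute(String index, Node localNode) {
            if (localNode.isMaster) {
                return "created [" + index + "] on " + localNode.name;
            }
            // Not the master: hand the same request to the master node.
            return localNode.master.dispatch(NAME, index);
        }
    }

    /** Sends a create request to a non-master node and returns the reply. */
    static String demo() {
        Node master = new Node("node-1", true);
        Node data = new Node("node-2", false);
        data.master = master;
        master.handlers.put(CreateIndexHandler.NAME, new CreateIndexHandler());
        data.handlers.put(CreateIndexHandler.NAME, new CreateIndexHandler());
        return data.dispatch(CreateIndexHandler.NAME, "logs");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Whichever node receives the request, the work ends up on the single master, which is the property the rest of the article relies on.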

Implementation of the masterOperation method

TransportCreateIndexAction therefore extends TransportMasterNodeOperationAction and implements the masterOperation method, shown below:

protected void masterOperation(final CreateIndexRequest request, final ClusterState state, final ActionListener<CreateIndexResponse> listener) throws ElasticsearchException {
    // Default the cause to "api" when the caller did not supply one
    String cause = request.cause();
    if (cause.length() == 0) {
        cause = "api";
    }
    // Copy everything relevant from the API request into a cluster state update request
    final CreateIndexClusterStateUpdateRequest updateRequest = new CreateIndexClusterStateUpdateRequest(request, cause, request.index())
            .ackTimeout(request.timeout()).masterNodeTimeout(request.masterNodeTimeout())
            .settings(request.settings()).mappings(request.mappings())
            .aliases(request.aliases()).customs(request.customs());
    createIndexService.createIndex(updateRequest, new ActionListener<ClusterStateUpdateResponse>() {
        @Override
        public void onResponse(ClusterStateUpdateResponse response) {
            // Translate the cluster state response into the API response
            listener.onResponse(new CreateIndexResponse(response.isAcknowledged()));
        }
        @Override
        public void onFailure(Throwable t) {
            if (t instanceof IndexAlreadyExistsException) {
                logger.trace("[{}] failed to create", t, request.index());
            } else {
                logger.debug("[{}] failed to create", t, request.index());
            }
            listener.onFailure(t);
        }
    });
}

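This looks simple: it just calls a method on createIndexService (which is in fact a MetaDataCreateIndexService), and that method is what modifies the cluster metadata.

Handling by clusterService

Before making the change, the service first acquires the lock for the index name, which keeps the operation consistent, then builds an update task and hands it to clusterService. The code is as follows: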

public void createIndex(final CreateIndexClusterStateUpdateRequest request, final ActionListener<ClusterStateUpdateResponse> listener) {
    // Acquire the lock; it covers only this index, not the whole cluster
    final Semaphore mdLock = metaDataService.indexMetaDataLock(request.index());
    // If the lock is free, create the index right away; otherwise retry on another thread below
    if (mdLock.tryAcquire()) {
        createIndex(request, listener, mdLock);
        return;
    }
    threadPool.executor(ThreadPool.Names.MANAGEMENT).execute(new ActionRunnable(listener) {
        @Override
        public void doRun() throws InterruptedException {
            // Wait for the lock for at most masterNodeTimeout, then give up
            if (!mdLock.tryAcquire(request.masterNodeTimeout().nanos(), TimeUnit.NANOSECONDS)) {
                listener.onFailure(new ProcessClusterEventTimeoutException(request.masterNodeTimeout(), "acquire index lock"));
                return;
            }
            createIndex(request, listener, mdLock);
        }
    });
}
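The locking pattern above (try the per-index lock without blocking, and only fall back to a timed wait on a background thread when it is busy) can be reproduced with plain JDK classes. The sketch below is a simplified illustration under stated assumptions: `IndexLockSketch`, `lockFor` and the string results are hypothetical, and the lock is released as soon as the callback returns, whereas the real service holds it until the cluster state update has finished.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class IndexLockSketch {
    // One binary semaphore per index name, created on first use.
    private final ConcurrentHashMap<String, Semaphore> locks = new ConcurrentHashMap<>();
    // Stand-in for the MANAGEMENT thread pool.
    final ExecutorService management = Executors.newSingleThreadExecutor();

    Semaphore lockFor(String index) {
        return locks.computeIfAbsent(index, k -> new Semaphore(1));
    }

    /** The listener receives "created", "timeout" or "interrupted". */
    void create(String index, long timeoutMillis, Consumer<String> listener) {
        Semaphore lock = lockFor(index);
        if (lock.tryAcquire()) {            // fast path: the caller's thread does the work
            doCreate(lock, listener);
            return;
        }
        management.execute(() -> {          // slow path: wait for the lock off the caller's thread
            try {
                if (!lock.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    listener.accept("timeout");
                    return;
                }
                doCreate(lock, listener);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                listener.accept("interrupted");
            }
        });
    }

    private void doCreate(Semaphore lock, Consumer<String> listener) {
        try {
            listener.accept("created");
        } finally {
            lock.release();
        }
    }

    /** One uncontended creation, then one against a lock that is never released. */
    static List<String> demo() {
        IndexLockSketch sketch = new IndexLockSketch();
        List<String> results = new CopyOnWriteArrayList<>();
        sketch.create("logs", 100, results::add);
        sketch.lockFor("busy").tryAcquire(); // simulate another creation in flight
        sketch.create("busy", 50, results::add);
        sketch.management.shutdown();        // queued work still runs before termination
        try {
            sketch.management.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }
}
```

Because the semaphore is keyed by index name, two different indices never wait on each other here; only concurrent requests for the same name contend.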

The private createIndex method wraps the create request and then submits an update task to the cluster. The code is as follows:

private void createIndex(final CreateIndexClusterStateUpdateRequest request, final ActionListener<ClusterStateUpdateResponse> listener, final Semaphore mdLock) {
    // Normalize the setting keys so that each carries IndexMetaData.INDEX_SETTING_PREFIX
    ImmutableSettings.Builder updatedSettingsBuilder = ImmutableSettings.settingsBuilder();
    updatedSettingsBuilder.put(request.settings()).normalizePrefix(IndexMetaData.INDEX_SETTING_PREFIX);
    request.settings(updatedSettingsBuilder.build());
    // Submit the update with URGENT priority; the body of the anonymous task is not shown in this excerpt
    clusterService.submitStateUpdateTask("create-index [" + request.index() + "], cause [" + request.cause() + "]", Priority.URGENT, new AckedClusterStateUpdateTask<ClusterStateUpdateResponse>(request, listener)
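The prefix normalization step in this excerpt can be pictured as follows. This is a simplified sketch of the idea, not the library's implementation: the class and method below are hypothetical, the prefix value "index." is an assumption, and any special cases the real builder handles are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SettingsPrefixSketch {
    /** Returns a copy of the settings in which every key starts with the given prefix. */
    static Map<String, String> normalizePrefix(Map<String, String> settings, String prefix) {
        Map<String, String> normalized = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            String key = entry.getKey();
            // Keys that already carry the prefix are kept; the rest get it prepended.
            normalized.put(key.startsWith(prefix) ? key : prefix + key, entry.getValue());
        }
        return normalized;
    }

    /** Mixes a short key and an already-prefixed key. */
    static Map<String, String> demo() {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("number_of_shards", "3");
        raw.put("index.number_of_replicas", "1");
        return normalizePrefix(raw, "index."); // "index." is an assumed prefix value
    }
}
```

The effect is that callers may write either the short or the fully qualified key in a create request and the stored settings end up in one consistent form.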

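Creating indices and modifying configuration

Adding or modifying a mapping, like creating an index, is a change to the cluster state, and these operations all follow a very similar path: each submits an update task through clusterService, together with a priority. clusterService then handles the task according to its priority and its type, as shown below: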

public void submitStateUpdateTask(final String source, Priority priority, final ClusterStateUpdateTask updateTask) {
    if (!lifecycle.started()) {
        return;
    }
    try {
        // Wrap the update in a task that carries its priority
        final UpdateTask task = new UpdateTask(source, priority, updateTask);
        if (updateTask instanceof TimeoutClusterStateUpdateTask) {
            // Task with a timeout: if the timeout fires first, the callback below
            // fails the task with ProcessClusterEventTimeoutException
            final TimeoutClusterStateUpdateTask timeoutUpdateTask = (TimeoutClusterStateUpdateTask) updateTask;
            updateTasksExecutor.execute(task, threadPool.scheduler(), timeoutUpdateTask.timeout(), new Runnable() {
                @Override
                public void run() {
                    threadPool.generic().execute(new Runnable() {
                        @Override
                        public void run() {
                            timeoutUpdateTask.onFailure(task.source(), new ProcessClusterEventTimeoutException(timeoutUpdateTask.timeout(), task.source()));
                        }
                    });
                }
            });
        } else {
            // Any other task has no timeout and is simply handed to the executor
            updateTasksExecutor.execute(task);
        }
    } catch (EsRejectedExecutionException e) {
        // ignore cases where we are shutting down..., there is really nothing interesting
        // to be done here...
        if (!lifecycle.stoppedOrClosed()) {
            throw e;
        }
    }
}
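How priority affects the order in which queued cluster state updates run can be shown with a small JDK-only model. This is a sketch under assumptions, not the executor elasticsearch uses: the `Priority` values are an illustrative subset, the task sources are made up, timeouts are left out, and the queue is drained on the calling thread so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class PrioritizedTasksSketch {
    // Declared from most to least urgent, so ordinal order is execution order.
    enum Priority { IMMEDIATE, URGENT, HIGH, NORMAL, LOW }

    static final class UpdateTask implements Comparable<UpdateTask> {
        final String source;
        final Priority priority;
        final long sequence; // submission order, used to break ties

        UpdateTask(String source, Priority priority, long sequence) {
            this.source = source;
            this.priority = priority;
            this.sequence = sequence;
        }

        @Override
        public int compareTo(UpdateTask other) {
            int byPriority = priority.compareTo(other.priority);
            return byPriority != 0 ? byPriority : Long.compare(sequence, other.sequence);
        }
    }

    private final PriorityBlockingQueue<UpdateTask> queue = new PriorityBlockingQueue<>();
    private final AtomicLong counter = new AtomicLong();

    void submit(String source, Priority priority) {
        queue.add(new UpdateTask(source, priority, counter.getAndIncrement()));
    }

    /** Takes tasks one at a time, as a single worker thread would, and records their order. */
    List<String> drain() {
        List<String> order = new ArrayList<>();
        UpdateTask task;
        while ((task = queue.poll()) != null) {
            order.add(task.source);
        }
        return order;
    }

    /** Submits four tasks in a deliberately "wrong" order and returns the execution order. */
    static List<String> demo() {
        PrioritizedTasksSketch tasks = new PrioritizedTasksSketch();
        tasks.submit("put-mapping", Priority.NORMAL);
        tasks.submit("reroute", Priority.HIGH);
        tasks.submit("create-index [a]", Priority.URGENT);
        tasks.submit("create-index [b]", Priority.URGENT);
        return tasks.drain();
    }
}
```

In this model the two URGENT create-index tasks jump ahead of work submitted earlier, but they still run one after another, which matches the serialized behaviour described for the master.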

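With the execution path covered, we turn to the actual create index logic. It lives in the execute method of the AckedClusterStateUpdateTask submitted by MetaDataCreateIndexService. Broadly, it takes the index settings, mappings and so on from the request, adds them to the current cluster metadata, and builds a new metadata object. This process is fairly complex and, for reasons of space, is analyzed in the next article.

Summary

Creating an index is the process of the master node updating the cluster metadata. A lock must be acquired to keep the data consistent, and because all of this happens on the master alone, a single-point bottleneck exists. For external callers it works like any other function: the external interface invokes CreateIndexAction, and TransportCreateIndexAction then sends the request into the cluster, where the index is created.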
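The idea of building a new metadata object from the current one, rather than mutating it in place, can be sketched as below. Everything here is a hypothetical model for illustration: the `MetaData` class, its `version` counter and the `withNewIndex` method are assumptions, and the real execute method does considerably more work (mappings, aliases, validation, shard routing).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class MetaDataUpdateSketch {
    /** Immutable snapshot: index name -> settings, plus a version that grows with each change. */
    static final class MetaData {
        final long version;
        final Map<String, Map<String, String>> indices;

        MetaData(long version, Map<String, Map<String, String>> indices) {
            this.version = version;
            this.indices = Collections.unmodifiableMap(new LinkedHashMap<>(indices));
        }

        /** Returns a new snapshot containing the extra index; this snapshot is left untouched. */
        MetaData withNewIndex(String name, Map<String, String> settings) {
            if (indices.containsKey(name)) {
                throw new IllegalStateException("index [" + name + "] already exists");
            }
            Map<String, Map<String, String>> copy = new LinkedHashMap<>(indices);
            copy.put(name, Collections.unmodifiableMap(new LinkedHashMap<>(settings)));
            return new MetaData(version + 1, copy);
        }
    }

    static MetaData empty() {
        return new MetaData(0, new LinkedHashMap<>());
    }
}
```

A caller would write something like `MetaData next = current.withNewIndex("logs", settings)` and then publish `next`; readers holding `current` keep seeing a consistent older snapshot.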


