Last Updated: Dec 28, 2022
The configuration values and parameters referenced in this document are taken from one production and one non-production Automate instance hosted in the JIFFY Managed Cloud.
The non-production Elasticsearch cluster is shared between different Automate servers, whereas the production Elasticsearch cluster is dedicated.
This document applies to Open Distro for Elasticsearch versions 1.13.2 and 1.13.3. The following JIFFY Automate instances were considered to standardize the parameters.
Environment | Version |
---|---|
Prod | opendistro-for-elasticsearch:1.13.2 |
Non-Prod | opendistro-for-elasticsearch:1.13.3 |
Nginx Proxy Body Size:
If the request body exceeds the maximum allowed client request body size, the NGINX Ingress Controller returns an HTTP 413 error. Use the proxy-body-size setting (which maps to the NGINX client_max_body_size directive) to allow a larger size.
The default value of proxy-body-size is 1m. Make sure to change it to the size you need.
Ingress Name | Non-Prod Value | Prod Value |
---|---|---|
opendistro-es-client | 500m | 400m |
opendistro-es-kibana | 400m | 400m |
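For reference, one way to apply these values is through the per-ingress annotation of the NGINX Ingress Controller, as sketched below. The ingress names are taken from the table above; the namespace is a placeholder to be adjusted per cluster.
# Sketch: raise the NGINX Ingress Controller body-size limit via annotation.
# Ingress names are from the table above; <NAMESPACE> is a placeholder.
kubectl annotate ingress opendistro-es-client -n <NAMESPACE> \
  nginx.ingress.kubernetes.io/proxy-body-size=400m --overwrite
kubectl annotate ingress opendistro-es-kibana -n <NAMESPACE> \
  nginx.ingress.kubernetes.io/proxy-body-size=400m --overwrite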
Heap size settings:
By default, Elasticsearch automatically sets the JVM heap size based on a node's roles and total memory. JIFFY recommends the default sizing for most production environments.
To override the default heap size, set the minimum and maximum heap size settings, Xms and Xmx. The minimum and maximum values must be the same.
Set Xms and Xmx to no more than 50% of your total memory. Elasticsearch requires memory for purposes other than the JVM heap. For example, Elasticsearch uses off-heap buffers for efficient network communication and relies on the operating system’s file system cache for efficient access to files. The JVM itself also requires some memory. It’s normal for Elasticsearch to use more memory than the limit configured with the Xmx setting.
Default Heap Memory:
You should always set the min and max JVM heap size to the same value; for example, to set the heap to 4 GB, set both -Xms4g and -Xmx4g.
To find the default heap memory setting, connect to the cluster and check the JVM configuration in each pod.
Check the Open Distro pods:
kubectl get pod -A
Connect to each Open Distro pod and check the jvm.options file:
kubectl exec -it <POD NAME> -n <NAMESPACE> -- /bin/bash
cat config/jvm.options
Pods Name | NonProd | Prod |
---|---|---|
opendistro-es-master-0 | -Xms1g -Xmx1g | -Xms1g -Xmx1g |
opendistro-es-data-0 | -Xms1g -Xmx1g | -Xms1g -Xmx1g |
opendistro-es-client | -Xms1g -Xmx1g | -Xms1g -Xmx1g |
opendistro-es-kibana | Not Available | Not Available |
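As a quick alternative to opening a shell in each pod, the heap lines can be read from all Open Distro pods in one pass, as sketched below. The pod name filter and namespace are assumptions; the Kibana pod is skipped because it has no jvm.options file.
# Sketch: print the heap settings from jvm.options in each Open Distro pod.
# Namespace and name filter are assumptions; adjust to your cluster.
for POD in $(kubectl get pods -n <NAMESPACE> -o name | grep opendistro-es | grep -v kibana); do
  echo "== $POD =="
  kubectl exec -n <NAMESPACE> "$POD" -- grep -E '^-Xm[sx]' config/jvm.options
done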
Override Heap Memory: The heap memory size needs to be adjusted based on the volume of documents processed and the search queries in use. This can be done by editing the jvm.options file with the updated size or through the JIFFY manifest file provided; a sketch of a direct override follows the table below.
To check the configured values, describe the pods:
kubectl get pod
kubectl describe pod <POD NAME> -n <NAMESPACE>
In the output, check the environment variables, for example:
PROCESSORS: node allocatable (limits.cpu)
ES_JAVA_OPTS: -Xms2048m -Xmx2048m
Pods Name | NonProd | Prod |
---|---|---|
opendistro-es-master-0 | -Xms2048m -Xmx2048m | -Xms1024m -Xmx1024m |
opendistro-es-data-0 | -Xms5120m -Xmx5120m | -Xms1024m -Xmx1024m |
opendistro-es-client | -Xms3072m -Xmx3072m | -Xms1024m -Xmx1024m |
opendistro-es-kibana | Not Available | Not Available |
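The JIFFY manifest file is the recommended way to change these values. Purely as an illustration, the same ES_JAVA_OPTS override could also be applied directly to a StatefulSet, as sketched below; the StatefulSet name and namespace are placeholders and must match your cluster.
# Sketch: override the heap through the ES_JAVA_OPTS environment variable.
# StatefulSet name and namespace are placeholders; the JIFFY manifest is the preferred route.
kubectl set env statefulset/opendistro-es-data -n <NAMESPACE> \
  ES_JAVA_OPTS="-Xms2048m -Xmx2048m"
# The pods restart with the new heap; confirm with:
kubectl describe pod opendistro-es-data-0 -n <NAMESPACE> | grep ES_JAVA_OPTS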
Validating heap memory via the Kibana interface:
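One way to validate the heap is to run GET _cat/nodes?v&h=name,node.role,heap.max,heap.percent in the Kibana Dev Tools console. The equivalent curl call is sketched below; the service name, port, and credentials are assumptions and should be adjusted to the cluster.
# Sketch: validate configured vs. used heap per node through the REST API.
# Service name, port, and credentials are assumptions; the same _cat query
# can be run from the Kibana Dev Tools console.
curl -sk -u admin:<PASSWORD> \
  "https://opendistro-es-client-service:9200/_cat/nodes?v&h=name,node.role,heap.max,heap.percent"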
Shards and Replicas:
Elasticsearch uses shards to subdivide an index into multiple pieces and allows one or more copies of the index shards, called replicas.
For example, an index with three shards, each with two replicas, has a total of nine shards, of which three are primary shards. If shard allocation is not done in the right way, it can cause performance issues in the cluster.
The number of shards cannot be changed after an index is created. If you later find it necessary to change the number of shards, then you will have to reindex all the documents again.
To decide the number of shards, you will have to choose a starting point and then try to find the optimal size through testing with your data and queries.
Replicas tend to improve search performance (though not always), and it is recommended to have at least one replica so that data is preserved in case of hardware failure.
The shard count is sized based on the per-day index size, the number of indexes, and the required retention.
Shards and Replica | NonProd | Prod |
---|---|---|
Shards | max_shards_per_node: 7000 (persistent and transient) | max_shards_per_node: 7000 (persistent and transient) |
Replica | 1 | 1 |
The JIFFY cloud shard value (52) is calculated from 26 daily indexes, each with 1 primary shard and 1 replica (26 × 2 = 52). A reference sketch for applying the shard limit is given below.
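The limit shown in the table can be applied through the cluster settings API; a minimal sketch, assuming the same client service endpoint and credentials as above:
# Sketch: set the shard limit in both persistent and transient cluster settings.
# Endpoint and credentials are assumptions; 7000 matches the values in the table above.
curl -sk -u admin:<PASSWORD> -X PUT \
  "https://opendistro-es-client-service:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent":{"cluster.max_shards_per_node":7000},"transient":{"cluster.max_shards_per_node":7000}}'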
Storage Sizing:
Storage usage stays at normal levels when the log level is "INFO", which is the default. If the log level is set to "DEBUG", storage grows significantly because the index size can exceed 10 GB per day, so keep the log level at "INFO" in all environments.
Storage sizing has to be done based on the index size and the required retention policy.
The index size varies according to how the application is used; the example index sizes below are on a per-day basis.
Sample calculation:
Index Names | Retention Days | NonProd (Per Day) | Prod (Per Day) | Expected Size |
---|---|---|---|---|
jiffy.notify_task | 7 Days | 0 | 0 | 0 |
jiffy.execution | 7 Days | 116.5kb | 887.6kb | 7 MB |
jiffy.purge_task | 7 Days | 60.7kb | 73.7kb | 7 MB |
jiffy.schedule_task | 7 Days | 1.3mb | 144.8kb | 7 MB |
jiffy.scheduler | 7 Days | 531.8kb | 103.8kb | 7 MB |
jiffy.alice_design_request | 7 Days | 968kb | 4.9mb | 35 MB |
jiffy.sys | 7 Days | 3.9mb | 2.2mb | 35 MB |
security-auditlog | 7 Days | 22.3mb | 4.7mb | 100 MB |
jiffy.sentry | 7 Days | 3mb | 14.1mb | 140 MB |
jiffy.alice_design_response | 7 Days | 666.4kb | 31.7mb | 245 MB |
jiffy.oreng_design | 7 Days | 2.1mb | 32.5mb | 350 MB |
jiffy.zeus | 7 Days | 811.8kb | 36.9mb | 350 MB |
jiffy.del_agent | 7 Days | 16.4mb | 64.3mb | 490 MB |
jiffy.alice_execution_request | 7 Days | 15.2mb | 293mb | 2100 MB |
jiffy.anthill | 7 Days | 3.5mb | 356mb | 2800 MB |
jiffy.gus | 7 Days | 184mb | 458.9mb | 3500 MB |
jiffy.qreader | 7 Days | 3.5mb | 507.6mb | 4200 MB |
jiffy.utang | 7 Days | 8.3mb | 709.6mb | 7168 MB |
jiffy.alice_execution_response | 7 Days | 603.8mb | 1.8gb | 14336 MB |
jiffy.fileserver | 7 Days | 5.1mb | 1.6gb | 14336 MB |
jiffy.oreng_exec | 7 Days | 4.4mb | 1.7gb | 14336 MB |
jiffy.coral | 7 Days | 226.4kb | 2.1gb | 15036 MB |
jiffy.audit | 365 Days | 171.3kb | 32.2mb | 18250 MB |
jiffy.jsm | 7 Days | 904.1mb | 2.6gb | 21000 MB |
jiffy.jiffy | 7 Days | 69.8mb | 3.4gb | 28000 MB |
jiffy.mangrove | 7 Days | 709.4kb | 4.1gb | 28000 MB |
**Total Storage size** | | 3 GB (Per Day) | 22 GB (Per Day) | 175 GB |
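Per-index sizes like those in the table can be checked with the _cat/indices API; a sketch, assuming the same endpoint and credentials as above:
# Sketch: list the jiffy.* indexes with their store sizes, largest first.
# Endpoint and credentials are assumptions; adjust the index pattern as needed.
curl -sk -u admin:<PASSWORD> \
  "https://opendistro-es-client-service:9200/_cat/indices/jiffy.*?v&h=index,pri,rep,docs.count,store.size&s=store.size:desc"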
PV (Storage) | NonProd | Prod |
---|---|---|
opendistro-es-data-0 | 550 GB | 250 GB |
Extend the Allocated Storage: Modifying the allocated storage of a Kubernetes StatefulSet is not straightforward; it can be changed using the following method.
kubectl get sts -n default
kubectl get sts -n default -o yaml <STATEFULSET NAME> | sed 's/storage: <EXISTING SIZE>/storage: <NEW SIZE>/g' | kubectl apply -f -
Example:
kubectl get sts -o yaml opendistro-es-1-1633090360-data | sed 's/storage: 150Gi/storage: 550Gi/g' | kubectl apply -f -
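Depending on the cluster, the existing PersistentVolumeClaims may also need to be expanded for the new size to take effect. A sketch is given below, assuming the storage class supports volume expansion; the PVC name is a placeholder.
# Sketch: expand the data PVC to match the new StatefulSet size.
# Requires a storage class with allowVolumeExpansion; <PVC NAME> is a placeholder.
kubectl get pvc -n default
kubectl patch pvc <PVC NAME> -n default \
  -p '{"spec":{"resources":{"requests":{"storage":"550Gi"}}}}'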
Index State Management (ISM):
Index State Management (ISM) is a plugin that lets you automate periodic administrative operations by triggering them based on changes in index age, index size, or number of documents. Using the ISM plugin, you can define policies that automatically handle index rollovers or deletions to fit your use case.
A policy also sets the priority of an index as soon as it enters the hot, warm, or cold state; higher-priority indices are recovered before lower-priority indices after a node restart.
Policy Name | Retention | Replica Count | Transitions | Priority | Index Patterns |
---|---|---|---|---|---|
jiffyLogs | 7 Days | 1 | Delete | 1 | jiffy.*, security-auditlog* |
jiffyAudit | 365 Days | 1 | Delete | 2 | jiffy.audit* |
In the jiffyLogs policy, the log retention is 7 days, so indexes are deleted after 7 days. The per-index retention can be configured in this file.
jiffyLogs:
{
  "policy": {
    "policy_id": "jiffyLogs",
    "description": "A simple policy that changes the replica count between hot and cold states and then deletes the logs after 7 days",
    "schema_version": 1,
    "error_notification": null,
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          { "replica_count": { "number_of_replicas": 1 } }
        ],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "7d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [ { "delete": {} } ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": [ "jiffy.*", "security-auditlog*" ],
      "priority": 1
    }
  }
}
In the jiffyAudit policy, the log retention is 365 days, so indexes are deleted after 365 days. The per-index retention can be configured in this file.
jiffyAudit:
{
  "policy": {
    "policy_id": "jiffyAuditLogs",
    "description": "A simple policy that changes the replica count between hot and cold states and then deletes the audit logs after 365 days",
    "schema_version": 1,
    "error_notification": null,
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          { "replica_count": { "number_of_replicas": 1 } }
        ],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "365d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [ { "delete": {} } ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": [ "jiffy.audit*" ],
      "priority": 2
    }
  }
}
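The policies above can be created through the Open Distro ISM policies API; a minimal sketch, assuming the policy JSON is saved locally as jiffyLogs.json (a hypothetical file name) and the same endpoint and credentials as above:
# Sketch: create the jiffyLogs ISM policy from a local JSON file.
# File name, endpoint, and credentials are assumptions.
curl -sk -u admin:<PASSWORD> -X PUT \
  "https://opendistro-es-client-service:9200/_opendistro/_ism/policies/jiffyLogs" \
  -H 'Content-Type: application/json' \
  -d @jiffyLogs.json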
Monitoring and Alerting:
Monitoring and alerting are important aspects of log analytics. They help you monitor the application and proactively alert you about issues through different channels such as email, Slack, Amazon Chime, etc.