
Elasticsearch Sizing and Configuration


The configuration values and parameters referenced in this document are taken from one of the production and non-production Automate instances hosted in the JIFFY Managed Cloud.

The non-production Elasticsearch is shared between different Automate servers, whereas the production Elasticsearch is a dedicated instance.

Elasticsearch/Open Distro Version

This document applies to Open Distro for Elasticsearch versions 1.13.2 and 1.13.3. The following JIFFY Automate instances were used to standardize the parameters.

Environment | Version
Prod | opendistro-for-elasticsearch:1.13.2
Non-Prod | opendistro-for-elasticsearch:1.13.3

Ingress Controller Parameters

Nginx Proxy Body Size:

If the request body size exceeds the maximum allowed size of the client request body, the NGINX Ingress Controller returns an HTTP 413 error. This limit is controlled by the proxy-body-size setting, which maps to the NGINX client_max_body_size directive; configure a larger size if needed.

The default value of proxy-body-size is 1m. Make sure to change it to the size you need.

Ingress Name | Non-Prod Value | Prod Value
opendistro-es-client | 500m | 400m
opendistro-es-kibana | 400m | 400m
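
A minimal sketch of how the limit can be applied, assuming the ingress names from the table above and a placeholder namespace (adjust both to your deployment):

# Set the body-size limit as an annotation on the existing ingress objects
kubectl annotate ingress opendistro-es-client -n <NAMESPACE> nginx.ingress.kubernetes.io/proxy-body-size=500m --overwrite
kubectl annotate ingress opendistro-es-kibana -n <NAMESPACE> nginx.ingress.kubernetes.io/proxy-body-size=400m --overwrite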

Elasticsearch Parameters

Heap size settings:

By default, Elasticsearch automatically sets the JVM heap size based on a node's roles and total memory. JIFFY recommends the default sizing for most production environments.

To override the default heap size, set the minimum and maximum heap size settings, Xms and Xmx. The minimum and maximum values must be the same.

Set Xms and Xmx to no more than 50% of your total memory. Elasticsearch requires memory for purposes other than the JVM heap. For example, Elasticsearch uses off-heap buffers for efficient network communication and relies on the operating system’s file system cache for efficient access to files. The JVM itself also requires some memory. It’s normal for Elasticsearch to use more memory than the limit configured with the Xmx setting.

Default Heap Memory:

You should always set the minimum and maximum JVM heap size to the same value; for example, to set the heap to 4 GB:
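
A minimal sketch, assuming the standard config/jvm.options file or the ES_JAVA_OPTS environment variable:

# In config/jvm.options (or a file under config/jvm.options.d/)
-Xms4g
-Xmx4g

# Or as an environment variable passed to the Elasticsearch process
ES_JAVA_OPTS="-Xms4g -Xmx4g"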

To find the default heap memory setting, connect to the cluster and check the JVM configuration.

Check the opendistro pods:
kubectl get pod -A
Connect to each opendistro pod and check the jvm.options file:
kubectl exec -it <POD NAME> -n <NAMESPACE> -- /bin/bash
cat config/jvm.options

Pod Name | NonProd | Prod
opendistro-es-master-0 | -Xms1g -Xmx1g | -Xms1g -Xmx1g
opendistro-es-data-0 | -Xms1g -Xmx1g | -Xms1g -Xmx1g
opendistro-es-client | -Xms1g -Xmx1g | -Xms1g -Xmx1g
opendistro-es-kibana | Not Available | Not Available

Override Heap Memory: The heap memory size needs to be adjusted based on the volume of documents processed and the search queries in use. This can be done by editing the jvm.options file with the updated size or through the JIFFY manifest file provided.

To inspect the pod configuration:
kubectl get pod
kubectl describe pod <POD NAME> -n <NAMESPACE>
# Relevant environment variables shown in the describe output:
PROCESSORS: node allocatable (limits.cpu)
ES_JAVA_OPTS: -Xms2048m -Xmx2048m

Pod Name | NonProd | Prod
opendistro-es-master-0 | -Xms2048m -Xmx2048m | -Xms1024m -Xmx1024m
opendistro-es-data-0 | -Xms5120m -Xmx5120m | -Xms1024m -Xmx1024m
opendistro-es-client | -Xms3072m -Xmx3072m | -Xms1024m -Xmx1024m
opendistro-es-kibana | Not Available | Not Available
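
If the JIFFY manifest route is not used, one way to apply such an override directly is to set ES_JAVA_OPTS on the StatefulSet (a sketch; the StatefulSet name below is assumed from the pod name opendistro-es-data-0, verify it with kubectl get sts):

# Updating the environment variable triggers a rolling restart with the new heap size
kubectl set env statefulset/opendistro-es-data -n <NAMESPACE> ES_JAVA_OPTS="-Xms5120m -Xmx5120m"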

The configured heap memory can also be validated via the Kibana interface.
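
The same information is available from the cat nodes API, which can be run from Kibana Dev Tools or with curl (a sketch; substitute your Elasticsearch endpoint and credentials):

# Shows the configured maximum heap and the current heap usage per node
curl -k -u <USER>:<PASSWORD> "https://<ES-HOST>:9200/_cat/nodes?v&h=name,heap.max,heap.current,heap.percent"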

Shards and Replica

Elasticsearch uses the concept of the shard to subdivide the index into multiple pieces and allows us to make one or more copies of index shards called replicas.

If an index has three shards and each shard has two replicas, there are nine shards in total: three primaries and six replica copies. If shard allocation is not done in the right way, it can cause performance issues in the cluster.

The number of shards cannot be changed after an index is created. If you later find it necessary to change the number of shards, then you will have to reindex all the documents again.

To decide the number of shards, you will have to choose a starting point and then try to find the optimal size through testing with your data and queries.

Replicas tend to improve search performance (though not always), and it is recommended to have at least 1 replica so that data is preserved in case of hardware failure.

For modifying shard settings, see the example after the table below.

The shard size is calculated based on the per-day index size, the number of indexes, and the required retention.

Shards and Replica | NonProd | Prod
Shards (max_shards_per_node) | persistent: 7000, transient: 7000 | persistent: 7000, transient: 7000
Replica | 1 | 1

The JIFFY cloud per-day shard count (52) is calculated from 26 daily indexes, each with 1 primary shard and 1 replica.
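
The max_shards_per_node limit shown in the table above can be adjusted through the cluster settings API (a sketch; adjust the endpoint and credentials to your cluster):

# Persistently raise the per-node shard limit
curl -k -u <USER>:<PASSWORD> -X PUT "https://<ES-HOST>:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{ "persistent": { "cluster.max_shards_per_node": 7000 } }'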

Log Level

Storage usage stays at normal levels when the log level is set to "INFO", which is the default. If the log level is set to "DEBUG", storage grows significantly because the index size can exceed 10 GB per day. Keep the log level at "INFO" in all environments.

Storage

The storage sizing has to be done based on index size and the required retention policy.

The index size varies according to how an application is used; the example index sizes below are on a per-day basis.
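
The per-index sizes used in the sample calculation below can be gathered with the cat indices API (a sketch; substitute your Elasticsearch endpoint and credentials):

# Lists the jiffy.* and security-auditlog* indexes with their store size, largest first
curl -k -u <USER>:<PASSWORD> "https://<ES-HOST>:9200/_cat/indices/jiffy.*,security-auditlog*?v&h=index,pri,rep,store.size&s=store.size:desc"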

Sample calculation:

Index Names | Retention Days | NonProd (Per Day) | Prod (Per Day) | Expected Size
jiffy.notify_task | 7 Days | 0 | 0 | 0
jiffy.execution | 7 Days | 116.5kb | 887.6kb | 7 MB
jiffy.purge_task | 7 Days | 60.7kb | 73.7kb | 7 MB
jiffy.schedule_task | 7 Days | 1.3mb | 144.8kb | 7 MB
jiffy.scheduler | 7 Days | 531.8kb | 103.8kb | 7 MB
jiffy.alice_design_request | 7 Days | 968kb | 4.9mb | 35 MB
jiffy.sys | 7 Days | 3.9mb | 2.2mb | 35 MB
security-auditlog | 7 Days | 22.3mb | 4.7mb | 100 MB
jiffy.sentry | 7 Days | 3mb | 14.1mb | 140 MB
jiffy.alice_design_response | 7 Days | 666.4kb | 31.7mb | 245 MB
jiffy.oreng_design | 7 Days | 2.1mb | 32.5mb | 350 MB
jiffy.zeus | 7 Days | 811.8kb | 36.9mb | 350 MB
jiffy.del_agent | 7 Days | 16.4mb | 64.3mb | 490 MB
jiffy.alice_execution_request | 7 Days | 15.2mb | 293mb | 2100 MB
jiffy.anthill | 7 Days | 3.5mb | 356mb | 2800 MB
jiffy.gus | 7 Days | 184mb | 458.9mb | 3500 MB
jiffy.qreader | 7 Days | 3.5mb | 507.6mb | 4200 MB
jiffy.utang | 7 Days | 8.3mb | 709.6mb | 7168 MB
jiffy.alice_execution_response | 7 Days | 603.8mb | 1.8gb | 14336 MB
jiffy.fileserver | 7 Days | 5.1mb | 1.6gb | 14336 MB
jiffy.oreng_exec | 7 Days | 4.4mb | 1.7gb | 14336 MB
jiffy.coral | 7 Days | 226.4kb | 2.1gb | 15036 MB
jiffy.audit | 365 Days | 171.3kb | 32.2mb | 18250 MB
jiffy.jsm | 7 Days | 904.1mb | 2.6gb | 21000 MB
jiffy.jiffy | 7 Days | 69.8mb | 3.4gb | 28000 MB
jiffy.mangrove | 7 Days | 709.4kb | 4.1gb | 28000 MB
Total Storage Size | - | 3 GB (Per Day) | 22 GB (Per Day) | 175 GB

PV (Storage) | NonProd | Prod
opendistro-es-data-0 | 550 Gb | 250 Gb

Extend the Allocated Storage: Modifying the allocated storage of a Kubernetes StatefulSet is not straightforward; it needs to be done using the following method.

kubectl get sts -n default
kubectl get sts -n default -o yaml <opendistro-data> | sed 's/storage: <EXISTING SIZE>/storage: <NEW SIZE>/g' | kubectl apply -f -
#ex: kubectl get sts -o yaml opendistro-es-1-1633090360-data | sed 's/storage: 150Gi/storage: 550Gi/g' | kubectl apply -f -

Retention Policy:

Index State Management (ISM) is a plugin that lets you automate these periodic, administrative operations by triggering them based on changes in the index age, index size, or number of documents. Using the ISM plugin, you can define policies that automatically handle index rollovers or deletions to fit your use case.

The priority value in each policy's ism_template determines which policy is applied when a newly created index matches the index patterns of more than one template; the template with the higher priority wins. This is why jiffy.audit* carries a higher priority than the broader jiffy.* pattern.

Policy Name | Retention | Replica Count | Transitions | Priority | Index Patterns
jiffyLogs | 7 Days | 1 | Delete | 1 | jiffy.*, security-auditlog*
jiffyAudit | 365 Days | 1 | Delete | 2 | jiffy.audit*

Log Retention Policy:

In the jiffyLogs policy, the log retention is 7 days, so matching indexes are cleared after 7 days. The retention period per index pattern can be configured in the policy below.

JiffyLog:
{
  "policy": {
    "policy_id": "jiffyLogs",
    "description": "A simple policy that sets the replica count in the hot state and then deletes the logs after 7 days",
    "schema_version": 1,
    "error_notification": null,
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "replica_count": {
              "number_of_replicas": 1
            }
          }
        ],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_index_age": "7d"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "delete": {}
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": [
        "jiffy.*",
        "security-auditlog*"
      ],
      "priority": 1
    }
  }
}

Audit Log Retention Policy

In the jiffyAudit policy, the log retention is 365 days, so matching indexes are cleared after 365 days. The retention period per index pattern can be configured in the policy below.

JiffyAudit
{
  "policy": {
    "policy_id": "jiffyAuditLogs",
    "description": "A simple policy that sets the replica count in the hot state and then deletes the audit logs after 365 days",
    "schema_version": 1,
    "error_notification": null,
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "replica_count": {
              "number_of_replicas": 1
            }
          }
        ],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_index_age": "365d"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "delete": {}
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": [
        "jiffy.audit*"
      ],
      "priority": 2
    }
  }
}
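
Once the two policy documents above are saved to files (the filenames here are assumptions), they can be created through the Open Distro ISM API, and existing indexes that predate the policies can be attached explicitly (a sketch; adjust the endpoint and credentials):

# Create the policies; the IDs in the URL must match the policy_id fields above
curl -k -u <USER>:<PASSWORD> -X PUT "https://<ES-HOST>:9200/_opendistro/_ism/policies/jiffyLogs" \
  -H 'Content-Type: application/json' -d @jiffyLogs.json
curl -k -u <USER>:<PASSWORD> -X PUT "https://<ES-HOST>:9200/_opendistro/_ism/policies/jiffyAuditLogs" \
  -H 'Content-Type: application/json' -d @jiffyAudit.json

# Optionally attach the log policy to indexes created before the policy existed
curl -k -u <USER>:<PASSWORD> -X POST "https://<ES-HOST>:9200/_opendistro/_ism/add/jiffy.*" \
  -H 'Content-Type: application/json' -d '{ "policy_id": "jiffyLogs" }'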

Email Alerts:

Monitoring and alerting are important aspects of log analytics. They help you monitor the application and proactively alert you about issues through different channels such as email, Slack, Amazon Chime, and so on.

