Archive and Purge using API
You can use the APIs provided by API Gateway to archive and purge data. This method is easy and helps you automate the archive and purge operations.
Archive transaction and audit data
You can archive outdated data (for example, data that is older than a year) for forensic analysis and decision making. Archived data is not the same as backup data. It is recommended to use a proper naming convention for every archive that you create.
You can use the following REST API to archive data:
curl -X POST -u "username:password" -H "content-type:application/json" -H "Accept:application/json"
"http://host:port/rest/apigateway/apitransactions/archives?time_interval&eventType=specific_eventType"
In the REST API:
*Replace host and port with the hostname and port number of your API Gateway instance.
*Replace time_interval with a specific time interval using the from and until query parameters or specify transactions older than a certain duration using the olderThan parameter.
*Replace specific_eventType with the required event type to archive. For the list of events that you can archive, see List of Indexes that can be archived or purged. You can use eventType=ALL to archive all types of events within a time interval.
When archiving transactions in API Gateway, you can specify the transactions to archive in one of the following ways:
*Using a specific time interval
*Specifying transactions older than a certain duration
Using a specific time interval
The time interval allows you to archive transactions that occurred within a specific range of time. You define this range using the from and until query parameters. Both parameters use the format YYYY-MM-DD%20HH:MM:SS, where %20 represents a space.
The following API archives data of all event types for the period, 3 June 2021 to 4 June 2021:
curl -X POST -u "Administrator:manage" -H "content-type:application/json" -H "Accept:application/json"
"http://host:port/rest/apigateway/apitransactions/archives?
from=2021-06-03%2000:00:00&until=2021-06-04%2000:00:00&eventType=ALL"
A sample response for the archive API is as follows:
{
  "jobId": "9a0765d4-2d17-4a9d-8b30-2f91ebdc7d95"
}
Specifying transactions older than a certain duration
The olderThan parameter allows you to archive transactions that are older than a specified duration. This is useful for archiving transactions based on their age, regardless of when they occurred.
The olderThan parameter is a timeline field and can have one of the following values:
Timeline Name   Syntax                        Example    Usage
Year            <number>Y                     1Y         Archives all data up to the last 1 year
Month           <number>M                     1M         Archives all data up to the last 1 month
Days            <number>d                     1d         Archives all data up to the last day
Time            <number>h<number>m<number>s   14h30m2s   Archives all data up to the given time
The following API archives data of all event types older than 1 month:
curl -X POST -u "Administrator:manage" -H "content-type:application/json" -H "Accept:application/json"
"http://host:port/rest/apigateway/apitransactions/archives?
olderThan=1M&eventType=ALL"
A sample response for the archive API is as follows:
{
  "jobId": "9a0765d4-2d17-4a9d-8b30-2f91ebdc7d95"
}
You can schedule archiving using cron jobs or any other scheduling methods. Archiving is a resource-intensive operation. You must not schedule it during peak hours.
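For example, you can drive the archive API from a cron job. The following entry is a minimal sketch; the host, port, credentials, schedule, and log path are placeholder assumptions to adapt to your environment:
# Illustrative cron entry: archives all events older than one month,
# every Sunday at 01:00, outside peak hours.
0 1 * * 0 curl -s -X POST -u "Administrator:manage" -H "Accept:application/json" "http://host:port/rest/apigateway/apitransactions/archives?olderThan=1M&eventType=ALL" >> /var/log/apigw-archive.log 2>&1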
Monitor archive status
You can monitor the archive job status using the following API:

curl -X GET -H "content-type:application/json" -H "Accept:application/json" \
"http://host:port/rest/apigateway/apitransactions/jobs/job_id"
*Replace host and port with the hostname and port number of your API Gateway instance.
*Replace job_id with the archive job ID.
A sample response if the archive job is successful is as follows:
{
  "status": "Completed",
  "action": "archive",
  "jobId": "9a0765d4-2d17-4a9d-8b30-2f91ebdc7d95",
  "creationDate": "2024-06-03 18:55:41 GMT",
  "Filename": "default-2024-06-03-1717440941375"
}
If the archive job is successful, the status field in the output displays Completed. The archived data is stored in the default location java.io.tmpdir/tenant_id. For example, if java.io.tmpdir is /tmp and tenantId is default, the location is /tmp/default. You can specify a custom storage location using the backupsharedFilelocation extended setting. The archive file is named in the format tenant_id-yyyy-MM-dd-time_in_milliseconds.
If the archive job fails, the status field in the output displays Failed. You must configure alerts for failures to get notified about failed archive jobs. Common reasons for failure include an unhealthy Elasticsearch cluster, high load on the system, and so on. You can examine the server logs located at /opt/softwareag/IntegrationServer/logs/server.log to analyze the failure reasons.
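As an illustration, the following shell sketch polls the job status API and raises an alert when the job fails. The host, port, job ID, and polling interval are placeholder assumptions:
#!/bin/bash
# Poll the archive job until it reaches a terminal state.
JOB_ID="9a0765d4-2d17-4a9d-8b30-2f91ebdc7d95"   # placeholder job ID
while true; do
  STATUS=$(curl -s -H "Accept:application/json" \
    "http://host:port/rest/apigateway/apitransactions/jobs/${JOB_ID}" |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  case "$STATUS" in
    Completed) echo "Archive job ${JOB_ID} completed"; break ;;
    Failed)    echo "ALERT: archive job ${JOB_ID} failed" >&2; exit 1 ;;
    *)         sleep 30 ;;   # job still running; poll again
  esac
done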
Purging data
You can schedule and automate the purge process. You can also purge data manually through the API Gateway UI. To learn more about how to purge data manually, refer to the Archive and Purge using UI section. You can purge the following data using commands.
*Analytics data
*Backup snapshots
*Obsolete or empty indexes
*Expired OAuth tokens
*Archive data
Purge Analytics Data
You can purge analytics data based on timeline or size. As an example of timeline-based purging, you can purge data older than a year. As an example of size-based purging, you can purge data that exceeds 100 GB.
Timeline based purging
You can use the following API to purge the analytics data of the specified event type and period:
curl -X DELETE -u "Administrator:manage" -H "Accept:application/json" \
"http://apigw_host:apigw_port/rest/apigateway/apitransactions?action=purge&eventType=specific_eventType&olderThan=time_interval"
In the REST API:
*Replace apigw_host and apigw_port with the hostname and port number of your API Gateway instance.
*Replace specific_eventType with the required event type to purge. For the list of events that you can specify in the API, see List of Indexes that can be archived or purged.
*Replace time_interval with a specific time interval using the from and until or the olderThan query parameters.
The from and until parameters use the format YYYY-MM-DD%20HH:MM:SS, where %20 represents a space.
The olderThan parameter can have one of the following values:
Timeline Name   Syntax                        Example    Usage
Year            <number>Y                     1Y         Purges all data up to the last 1 year
Month           <number>M                     1M         Purges all data up to the last 1 month
Days            <number>d                     1d         Purges all data up to the last day
Time            <number>h<number>m<number>s   14h30m2s   Purges all data up to the given time
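For example, the following call purges all transactional events older than one year; the host, port, and credentials are placeholders:
curl -X DELETE -u "Administrator:manage" -H "Accept:application/json" \
"http://apigw_host:apigw_port/rest/apigateway/apitransactions?action=purge&eventType=ALL&olderThan=1Y"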
A sample response for the purge API is as follows:
{
  "jobId": "9a0765d4-2d17-4a9d-8b30-2f91ebdc7d95"
}
Monitor purge status
You can monitor the purge job status using the following API:

curl -X GET -H "content-type:application/json" -H "Accept:application/json" http://host:port/rest/apigateway/apitransactions/jobs/job_id
A sample response if the purge job is successful is as follows:
{
  "status": "Completed",
  "action": "purge",
  "jobId": "de332362-738d-40df-939a-ec9e60b5d64a",
  "creationDate": "2024-06-04 11:35:43 GMT",
  "totalDocuments": 4,
  "deletedDocuments": 4
}
If the purge job is successful, the status field in the output displays Completed.
If the purge job fails, the status field in the output displays Failed. You must configure alerts for failures to get notified about the failed purge jobs. You can look into server logs located at /opt/softwareag/IntegrationServer/logs/server.log and analyze the failure reasons.
Purge based on size
You can purge data based on size. When the size of an index exceeds the specified limit of 25 GB, you must roll over the index. When you roll over an index, a new index is created. When you have new indexes, you can purge the old indexes. For example, if you set the maximum size for analytics data to 300 GB and the maximum size of an index to 25 GB, and your data grows to 325 GB, you have 13 indexes of 25 GB each. Each index contains a primary and a replica shard, so when the size of the primary shard of an index reaches 12.5 GB, the size of the replica shard is also 12.5 GB, and the total size of the index is 25 GB. Hence, you must check the size of the primary shard of an index to decide whether the index needs to be rolled over.
You must regularly monitor the indexes that need to be purged. For information on calculating index size, see Calculating index size.
If you roll over indexes regularly, it becomes easier to find the oldest indexes and purge them. Purging older and obsolete indexes ensures quick recovery of disk space.
Perform the following steps to purge an index:
1. Find the oldest index using the following API:
curl -X GET "http://es-host:es-port/_cat/indices/gateway_default_analytics*?h=i&s=i:desc"
In the endpoint, replace es-host and es-port with the hostname and port number of your external Elasticsearch instance.
Note:
The above API returns the list of indexes in descending order of index name. API Gateway follows the pattern gateway_default_analytics_epoch-00000n, where the date and time are represented in epoch format and n is a number starting from 1 that increments at each rollover.
When no target index suffix parameter is provided during rollover, API Gateway returns the pattern aliasname_yyyyMMddhhmm-00000n. If a target index suffix parameter is provided during rollover, API Gateway returns aliasname_targetIndexSuffix.
2. Delete the index returned in the previous step using the following Elasticsearch API:
curl -X DELETE http://es-host:es-port/indexname
3. To ensure that an index is deleted, run the following command with the required index name:
curl -v -X GET http://es-host:es-port/indexname
If the deletion is successful, the above API returns the status code 404.
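Putting these steps together, the following shell sketch finds the oldest analytics index, deletes it, and verifies the deletion. It assumes an unauthenticated external Elasticsearch endpoint (a placeholder), that more than one index exists, and that the oldest index is not the write index:
#!/bin/bash
ES="http://es-host:es-port"   # placeholder Elasticsearch endpoint
# With s=i:desc the newest index is listed first, so the last line is the oldest.
OLDEST=$(curl -s "$ES/_cat/indices/gateway_default_analytics*?h=i&s=i:desc" | tail -n 1)
if [ -n "$OLDEST" ]; then
  curl -s -X DELETE "$ES/$OLDEST"
  # A 404 on a follow-up GET confirms that the index is gone.
  CODE=$(curl -s -o /dev/null -w "%{http_code}" "$ES/$OLDEST")
  [ "$CODE" = "404" ] && echo "Deleted $OLDEST" || echo "Deletion of $OLDEST not confirmed" >&2
fi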
Purge backup snapshots
Data backups are created to safeguard data in a repository for restoring in case of any disasters. The backup snapshots created over a period of time occupy considerable disk space. Hence, it is essential to purge backup snapshots that are older than the data retention period.
For information on purging backup snapshots, see Deleting a Backup File.
Purge obsolete or empty indexes
API Gateway may have empty indexes due to rollover and purge operations. It is essential to clean up the empty indexes. You can delete an index if there are multiple indexes and the index to be deleted is not a write index. It is recommended to perform the purge operation after the scheduled backups.
You can use the following API to check the documents stored by the indexes:
curl -X GET "http://es-host:es-port/_cat/indices/
gateway_default_analytics*?s=docs.count:asc&h=i,docs.count"
The API returns the following response:
gateway_default_analytics_202106111722 0
gateway_default_analytics_1622725174422-000001 2
If an index's document count is greater than 0, the index is not empty and must not be deleted.
You can use the following API to check whether the indexes are write indexes:
curl -X GET "http://es-host:es-port/_cat/aliases/gateway_default_analytics?h=i,is_write_index&s=is_write_index:asc"
The above API returns the following response:
gateway_default_analytics_1622725174422-000001 false
gateway_default_analytics_202106111722 true
If an index has a value true, it implies that the index is a write index and should not be deleted.
You can use the following API to delete an index:
curl -X DELETE http://es-host:es-port/indexname
To view the list of available indexes, use the following command:
curl -X GET http://es-host:es-port/_cat/indices
You can schedule index purging on a daily basis using a cron job or another scheduling method. You can monitor the index deletion by using the following API:
curl -v -X GET http://es-host:es-port/indexname
If the deletion is successful, the above API returns status code 404. You must configure alerts for failed purge jobs. When a purge job fails, you must check the Elasticsearch logs to troubleshoot.
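You can combine the two checks into a single script. The following sketch deletes only the indexes that are both empty and not write indexes; the Elasticsearch endpoint and index pattern are placeholder assumptions:
#!/bin/bash
ES="http://es-host:es-port"   # placeholder Elasticsearch endpoint
# Indexes with a document count of 0.
EMPTY=$(curl -s "$ES/_cat/indices/gateway_default_analytics*?h=i,docs.count" | awk '$2 == 0 {print $1}')
# Indexes currently flagged as write indexes; these must never be deleted.
WRITE=$(curl -s "$ES/_cat/aliases/gateway_default_analytics?h=i,is_write_index" | awk '$2 == "true" {print $1}')
for IDX in $EMPTY; do
  if ! echo "$WRITE" | grep -qx "$IDX"; then
    curl -s -X DELETE "$ES/$IDX" && echo "Deleted empty index $IDX"
  fi
done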
Purge expired OAuth tokens
You can use the following API to delete expired OAuth tokens.
curl -sik -X POST -H "Accept: application/json" -u "username:password" https://host:port/invoke/pub.oauth/removeExpiredAccessTokens
A sample response for purging OAuth tokens is as follows:
{
  "removedTokensCount": 3
}
You can schedule the OAuth token purge on a daily basis using a cron job or another scheduling method. You must configure alerts for failed purge jobs. When a purge job fails, you must check the server logs.
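For example, a daily cron entry along the following lines invokes the token cleanup service; the host, port, credentials, schedule, and log path are placeholders to adapt:
# Illustrative cron entry: removes expired OAuth tokens daily at 02:00.
0 2 * * * curl -sik -X POST -H "Accept: application/json" -u "username:password" "https://host:port/invoke/pub.oauth/removeExpiredAccessTokens" >> /var/log/apigw-oauth-purge.log 2>&1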
Purge Archive Data
You must delete archive data after it reaches the maximum retention period. Because there is no API to clear archive data, you must delete the archives manually. You can delete archives on a daily basis.
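For example, if your archives are written to the default location /tmp/default and your retention period is one year, a daily cleanup along the following lines removes the expired archive files. The path and retention period are assumptions to adapt:
# Delete archive files older than 365 days from the default archive location.
# Adjust the path if you use the backupsharedFilelocation extended setting.
find /tmp/default -type f -mtime +365 -delete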
Important:
API Gateway does not perform the purge operation immediately. Once you initiate the purge process, API Gateway starts to mark the files for deletion. Once all the files are marked, the status of the purge operation shows 100%, which means that the files are deleted internally. However, the actual disk space occupied by the purged files is not freed immediately, even after the purge status shows 100%; reclaiming the disk space can take more time.
If you want to free up the disk space immediately after initiating the purge process, use the following REST API:
http://Elasticsearch_host:Elasticsearch_port/target/_forcemerge?only_expunge_deletes=true
where,
*target. Comma-separated list of data streams, indexes, and aliases used to limit the request. This parameter supports wildcards (*). To target all data streams and indices, exclude this parameter or use * or _all.
*only_expunge_deletes. Boolean parameter. When set to true, it expunges only those segments containing document deletions.
Sample command
http://host:port/gateway_default_analytics/_forcemerge?only_expunge_deletes=true
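For example, you might invoke the sample command with curl as follows; the Force Merge API accepts POST requests, and the host and port are placeholders:
curl -X POST "http://es-host:es-port/gateway_default_analytics/_forcemerge?only_expunge_deletes=true"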
To learn more about this API, see Elasticsearch documentation on Force Merge API.
Simultaneous Archiving and Purging using REST API
You can archive outdated transactional data and purge it from the external Elasticsearch store through the REST API endpoint by leveraging the archiveAndPurge action parameter. This allows you to streamline operations by first archiving older transactional events (eventType=eventtype) and then purging them based on a specific time interval using the from and until or the olderThan query parameters. This approach ensures that you securely store historical data for compliance and auditing purposes while optimizing database performance by removing obsolete records from active systems.
A sample command to archive and purge data is as follows:
curl --location --request DELETE 'http://host:port/rest/apigateway/apitransactions?action=archiveAndPurge&time_interval&eventType=specific_eventType' \
--header 'accept: application/json' \
--header 'Authorization: Basic QWRtaW5pc3RyYXRvcjptYW5hZ2U='
In the REST API:
*Replace host and port with the hostname and port number of your API Gateway instance.
*Replace time_interval with a specific time interval using the from and until query parameters or specify transactions older than a certain duration using the olderThan parameter.
*Replace specific_eventType with the required event type to archive and purge. For the list of events that you can archive, see List of Indexes that can be archived or purged. You can use eventType=ALL to archive all types of events within a time interval.
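For example, the following call archives and then purges all events older than one month; the host, port, and credentials are placeholders:
curl --location --request DELETE 'http://host:port/rest/apigateway/apitransactions?action=archiveAndPurge&olderThan=1M&eventType=ALL' \
--header 'accept: application/json' \
--header 'Authorization: Basic QWRtaW5pc3RyYXRvcjptYW5hZ2U='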
Sample response for the archive and purge API is as follows:
{
  "jobId": "7eb1ecfd-99d1-46a4-8818-c2b3733385f1"
}
Monitor archive and purge status
You can monitor the archive and purge job status using the following API:

curl -X GET -H "content-type:application/json" -H "Accept:application/json" \
"http://host:port/rest/apigateway/apitransactions/jobs/job_id"
*Replace host and port with the hostname and port number of your API Gateway instance.
*Replace job_id with the archive and purge job ID.
A sample response if the archive and purge job is successful is as follows:
{
  "status": "Completed",
  "action": "archiveAndPurge",
  "jobId": "7eb1ecfd-99d1-46a4-8818-c2b3733385f1",
  "creationDate": "2024-06-05 06:33:30 GMT",
  "totalDocuments": 1,
  "deletedDocuments": 1,
  "Filename": "default-2024-06-05-1717569210068",
  "archiveStatus": "Completed",
  "purgeStatus": "Completed"
}
If the archive and purge job is successful, the status field in the output displays Completed. The archived data is stored in the default location java.io.tmpdir/tenant_id. For example, if java.io.tmpdir is /tmp and tenantId is default, the location is /tmp/default. You can specify a custom storage location using the backupsharedFilelocation extended setting. The archive file is named in the format tenant_id-yyyy-MM-dd-time_in_milliseconds.
If the archive and purge job fails, the status field in the output displays Failed. You must configure alerts for failures to get notified about failed archive and purge jobs. Common reasons for failure include an unhealthy Elasticsearch cluster, high load on the system, and so on. You can examine the server logs located at /opt/softwareag/IntegrationServer/logs/server.log to analyze the failure reasons.
You can schedule the archive and purge operation using cron jobs or any other scheduling method. It is a resource-intensive operation, so you must not schedule it during peak hours.