Architecture & Design
EnrichmentEngine (EE) can be treated as a proxy for third-party service APIs. A MemCache database is included in the solution to speed up responses to queries for URLs, IP addresses, domain names, and file hashes. By design, cache refresh is controlled by a TTL set in each namespace definition.
A query to EE goes to MemCache first. If there is no entry, or the entry has been expunged, EE queries an external service such as Shodan, VirusTotal, or Censys.
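The lookup order described above is the common cache-aside pattern. The following is a minimal sketch of it, not the EE implementation: the class `NamespacedTTLCache`, the function `lookup`, the namespace names, and the TTL values are all hypothetical, and a plain in-memory dictionary stands in for MemCache.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical per-namespace TTLs in seconds; real values depend on the deployment.
NAMESPACE_TTL: Dict[str, float] = {"ip": 3600, "url": 900, "domain": 1800, "hash": 86400}


class NamespacedTTLCache:
    """In-memory stand-in for MemCache where each namespace has its own TTL."""

    def __init__(self, ttl_by_namespace: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_by_namespace
        self._clock = clock
        self._store: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def get(self, namespace: str, key: str) -> Optional[Any]:
        entry = self._store.get((namespace, key))
        if entry is None:
            return None  # never cached
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[(namespace, key)]  # expunge the stale entry
            return None
        return value

    def set(self, namespace: str, key: str, value: Any) -> None:
        expires_at = self._clock() + self._ttl[namespace]
        self._store[(namespace, key)] = (expires_at, value)


def lookup(cache: NamespacedTTLCache, namespace: str, key: str,
           fetch_external: Callable[[str], Any]) -> Tuple[Any, str]:
    """Return (result, source): try the cache first, fall back to the external service."""
    cached = cache.get(namespace, key)
    if cached is not None:
        return cached, "cache"
    result = fetch_external(key)  # e.g. a wrapper around a third-party API call
    cache.set(namespace, key, result)
    return result, "external"
```

With this sketch, the first `lookup` for a given indicator reports `"external"` as its source, repeated lookups within the namespace TTL report `"cache"`, and a lookup after the TTL has elapsed goes back to the external service.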
Example transaction flow:
The document collected by Elasticsearch goes to Logstash.
As a result of document parsing, the URL, IP address, domain name, and file hash are extracted.
An object request is sent to the EE service through the API.
If a matching entry exists in MemCache, it is returned in the response.
If there was no entry in MemCache, the query is sent to the external service in question.
The result returned by EE is aggregated into summary information (Timestamp, URL name, external service name, Security_Flagged [YES/NO]) and stored in a dedicated Elasticsearch index. If requested through a parameter in the API call, the entire entry can be retrieved from the external service.
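The aggregation in the last step can be sketched as follows. This is an illustrative assumption about the record shape, not the actual EE schema: the function `summarize`, the field names, and the `include_raw` flag (standing in for the API parameter that requests the full entry) are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def summarize(url_name: str, service: str, flagged: bool, raw: Dict[str, Any],
              include_raw: bool = False,
              now: Optional[datetime] = None) -> Dict[str, Any]:
    """Build the summary document that would be stored in the dedicated index."""
    ts = now or datetime.now(timezone.utc)
    record: Dict[str, Any] = {
        "timestamp": ts.isoformat(),
        "url_name": url_name,
        "service": service,
        "security_flagged": "YES" if flagged else "NO",
    }
    if include_raw:
        # Only attach the full external-service entry when explicitly requested.
        record["raw"] = raw
    return record


if __name__ == "__main__":
    doc = summarize("example.com/login", "VirusTotal", True, {"detections": 7})
    print(doc["security_flagged"])
```

Keeping the full external entry optional keeps the stored document small in the default case while still allowing the complete response to be retrieved on demand.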