Java API Reference
Auto-generated API documentation from Javadoc comments using Doxygen and Breathe.
Note
This section is generated automatically from the source code. To rebuild,
run doxygen Doxyfile from the project root, then make html from the
docs/ directory.
Core Domain Model
Warning
Doxygen could not find the following classes in its XML output for project “sieve” (directory: /home/runner/work/sieve-aml/sieve-aml/docs/../doxygen/xml), so they are not documented here: SanctionedEntity, NameInfo, Address, Identifier, and SanctionsProgram, all in dev::sieve::core::model.
Index
-
interface EntityIndex¶
Abstraction over a store of SanctionedEntity instances.
Implementations must be thread-safe. The index supports bulk loading via addAll, individual inserts via add, and various query methods.
Subclassed by dev.sieve.core.index.InMemoryEntityIndex
Public Functions
-
void addAll(Collection<SanctionedEntity> entities)¶
Adds all entities in the given collection to the index.
- Parameters:
entities – the entities to add, must not be null
-
void add(SanctionedEntity entity)¶
Adds a single entity to the index.
- Parameters:
entity – the entity to add, must not be null
-
void clear()¶
Removes all entities from the index.
-
int size()¶
Returns the total number of entities in the index.
- Returns:
entity count, always non-negative
-
Collection<SanctionedEntity> all()¶
Returns an unmodifiable view of all entities in the index.
- Returns:
all entities, never null
-
Collection<SanctionedEntity> findBySource(ListSource source)¶
Returns all entities from a specific sanctions list source.
- Parameters:
source – the list source to filter by, must not be null
- Returns:
matching entities, never null
-
Optional<SanctionedEntity> findById(String id)¶
Looks up a single entity by its source-specific ID.
- Parameters:
id – the entity ID, must not be null
- Returns:
the entity if found, or empty
-
IndexStats stats()¶
Returns statistical information about the index contents.
- Returns:
current index statistics, never null
- dev::sieve::core::index::InMemoryEntityIndex : public dev.sieve.core.index.EntityIndex
Thread-safe, in-memory implementation of EntityIndex.
Backed by a ConcurrentHashMap keyed on entity ID, with a secondary index by ListSource for efficient filtered queries. All mutating operations are safe for concurrent access from multiple threads.
Public Functions
-
inline InMemoryEntityIndex()¶
Creates a new, empty in-memory entity index.
-
inline void addAll(Collection<SanctionedEntity> entities)¶
Adds all entities in the given collection to the index.
- Parameters:
entities – the entities to add, must not be null
-
inline void add(SanctionedEntity entity)¶
Adds a single entity to the index.
- Parameters:
entity – the entity to add, must not be null
-
inline void clear()¶
Removes all entities from the index.
-
inline int size()¶
Returns the total number of entities in the index.
- Returns:
entity count, always non-negative
-
inline Collection<SanctionedEntity> all()¶
Returns an unmodifiable view of all entities in the index.
- Returns:
all entities, never null
-
inline Collection<SanctionedEntity> findBySource(ListSource source)¶
Returns all entities from a specific sanctions list source.
- Parameters:
source – the list source to filter by, must not be null
- Returns:
matching entities, never null
-
inline Optional<SanctionedEntity> findById(String id)¶
Looks up a single entity by its source-specific ID.
- Parameters:
id – the entity ID, must not be null
- Returns:
the entity if found, or empty
-
inline IndexStats stats()¶
Returns statistical information about the index contents.
- Returns:
current index statistics, never null
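The layout described for InMemoryEntityIndex (a concurrent map keyed on entity ID plus a secondary index by source) might look roughly like the sketch below. This is an illustration only, not the library's code: IndexSketch, its Entity record, and the use of a plain String for the source are hypothetical stand-ins for SanctionedEntity and ListSource.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: primary map by ID plus a secondary index by source. */
final class IndexSketch {
    /** Stand-in for SanctionedEntity; only the fields needed here (assumption). */
    record Entity(String id, String source, String name) {}

    private final Map<String, Entity> byId = new ConcurrentHashMap<>();
    private final Map<String, Map<String, Entity>> bySource = new ConcurrentHashMap<>();

    void add(Entity e) {
        Entity previous = byId.put(e.id(), e);
        // If the same ID was previously stored under another source,
        // drop the stale entry from the secondary index.
        if (previous != null && !previous.source().equals(e.source())) {
            Map<String, Entity> old = bySource.get(previous.source());
            if (old != null) {
                old.remove(e.id());
            }
        }
        bySource.computeIfAbsent(e.source(), s -> new ConcurrentHashMap<>()).put(e.id(), e);
    }

    Optional<Entity> findById(String id) {
        return Optional.ofNullable(byId.get(id));
    }

    /** Returns an immutable snapshot, so callers never see a half-updated view. */
    Collection<Entity> findBySource(String source) {
        Map<String, Entity> entities = bySource.get(source);
        return entities == null ? List.of() : List.copyOf(entities.values());
    }

    int size() {
        return byId.size();
    }
}
```

Each map is individually safe for concurrent use, but the two updates in add are not atomic as a pair. A reader could briefly observe an entity in one map and not the other, and a real implementation may need to handle this differently.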
Match Engine
-
interface MatchEngine¶
Service Provider Interface for sanctions screening match engines.
Implementations encapsulate a specific matching algorithm (e.g., exact match, fuzzy match, phonetic match) and produce scored results against the entities in an EntityIndex.
Public Functions
-
List<MatchResult> screen(ScreeningRequest request, EntityIndex index)¶
Screens the given request against all applicable entities in the index.
Results are filtered by the request’s threshold and optional entity type / source filters, then returned in descending score order.
- Parameters:
request – the screening request containing the name and filters
index – the entity index to screen against
- Returns:
matching results sorted by score descending, never null
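The post-processing that screen is documented to perform (drop results below the threshold, then order by descending score) could be sketched as follows. The Result record is a hypothetical stand-in for MatchResult, and treating the threshold as inclusive is an assumption; the source does not say whether a score equal to the threshold is kept.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of threshold filtering and score ordering. */
final class ScreenFilterSketch {
    /** Stand-in for MatchResult (assumption). */
    record Result(String entityId, double score) {}

    static List<Result> filterAndSort(List<Result> raw, double threshold) {
        return raw.stream()
                // Assumption: scores equal to the threshold are kept.
                .filter(r -> r.score() >= threshold)
                .sorted(Comparator.comparingDouble(Result::score).reversed())
                .toList();
    }
}
```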
Warning
Doxygen could not find the following classes in its XML output for project “sieve” (directory: /home/runner/work/sieve-aml/sieve-aml/docs/../doxygen/xml), so they are not documented here: MatchResult and ScreeningRequest, both in dev::sieve::core::match.
-
class FuzzyMatchEngine : public MatchEngine¶
Match engine that uses Jaro-Winkler fuzzy string similarity.
Compares the screening query against each entity’s primary name and all aliases, keeping the best (highest) score per entity. Results below the request’s threshold are discarded.
Public Functions
-
inline FuzzyMatchEngine(NormalizedNameCache nameCache, NgramIndex ngramIndex)¶
Creates a fuzzy match engine with shared name cache and n-gram index.
-
inline FuzzyMatchEngine(NormalizedNameCache nameCache)¶
Creates a fuzzy match engine with a shared name cache (no n-gram filtering).
-
inline FuzzyMatchEngine()¶
Creates a fuzzy match engine with its own name cache and n-gram index.
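The "best score per entity" rule described for FuzzyMatchEngine can be illustrated with a small helper that scores the query against the primary name and every alias and keeps the maximum. BestScoreSketch and its pluggable similarity function are hypothetical; in the real engine the similarity is Jaro-Winkler.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch: keep the highest score across primary name and aliases. */
final class BestScoreSketch {
    static double bestScore(String query,
                            String primaryName,
                            List<String> aliases,
                            ToDoubleBiFunction<String, String> similarity) {
        double best = similarity.applyAsDouble(query, primaryName);
        for (String alias : aliases) {
            best = Math.max(best, similarity.applyAsDouble(query, alias));
        }
        return best;
    }
}
```

An entity is then kept only if this best score meets the request's threshold.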
-
class ExactMatchEngine : public MatchEngine¶
Match engine that performs exact (post-normalization) name comparison.
Names are normalized by lowercasing, trimming, and collapsing whitespace before comparison. Checks the entity’s primary name and all aliases. Produces a score of 1.0 for exact matches and 0.0 otherwise.
Public Functions
-
inline ExactMatchEngine(NormalizedNameCache nameCache, NgramIndex ngramIndex)¶
Creates an exact match engine with shared name cache and n-gram index.
-
inline ExactMatchEngine(NormalizedNameCache nameCache)¶
Creates an exact match engine with a shared name cache (no n-gram filtering).
-
inline ExactMatchEngine()¶
Creates an exact match engine with its own name cache and n-gram index.
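The normalization steps listed for ExactMatchEngine (lowercasing, trimming, collapsing whitespace) might be sketched as below. NormalizeSketch is a hypothetical name, and the choice of Locale.ROOT and of the \s+ pattern are assumptions; the library may normalize differently in detail.

```java
import java.util.Locale;

/** Hypothetical sketch of post-normalization exact matching. */
final class NormalizeSketch {
    /** Trims, collapses runs of whitespace to one space, and lowercases. */
    static String normalize(String name) {
        return name.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Scores 1.0 when the normalized forms are equal, 0.0 otherwise. */
    static double exactScore(String query, String candidate) {
        return normalize(query).equals(normalize(candidate)) ? 1.0 : 0.0;
    }
}
```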
-
class CompositeMatchEngine : public MatchEngine¶
Composite match engine that delegates to multiple underlying engines.
Runs all registered engines, deduplicates results by entity ID, and keeps the highest score per entity. The final result list is sorted by score in descending order.
Public Functions
-
inline CompositeMatchEngine(List<MatchEngine> engines)¶
Creates a composite engine delegating to the given engines.
- Parameters:
engines – the match engines to delegate to, must not be null or empty
- Throws:
NullPointerException – if engines is null
IllegalArgumentException – if engines is empty
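The merge step described for CompositeMatchEngine (deduplicate by entity ID, keep the highest score, sort descending) could be sketched as follows. CompositeSketch and its Result record are hypothetical stand-ins, and the per-engine result lists are passed in directly rather than produced by real engines.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of merging results from several engines. */
final class CompositeSketch {
    /** Stand-in for MatchResult (assumption). */
    record Result(String entityId, double score) {}

    static List<Result> merge(List<List<Result>> perEngineResults) {
        Map<String, Result> bestById = new LinkedHashMap<>();
        for (List<Result> results : perEngineResults) {
            for (Result r : results) {
                // Keep whichever result for this entity has the higher score.
                bestById.merge(r.entityId(), r, (a, b) -> a.score() >= b.score() ? a : b);
            }
        }
        List<Result> merged = new ArrayList<>(bestById.values());
        merged.sort(Comparator.comparingDouble(Result::score).reversed());
        return merged;
    }
}
```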
-
class NgramIndex¶
Trigram-based inverted index for fast candidate selection.
At build time, every entity’s normalized primary name and alias names are decomposed into overlapping 3-character trigrams. An inverted map is constructed from each trigram to the set of entity IDs containing that trigram.
At query time, the query string is decomposed into trigrams, the candidate entity IDs are collected by trigram overlap, and only entities sharing a minimum fraction of trigrams are returned. This typically reduces the candidate set from tens of thousands to tens of entities.
Thread-safe. Automatically rebuilds when the underlying index size changes.
Public Functions
-
inline void ensureBuilt(EntityIndex index, NormalizedNameCache nameCache)¶
Ensures the index is built and up-to-date for the given entity index.
- Parameters:
index – the entity index
nameCache – the pre-normalized name cache (must already be built)
-
inline Collection<SanctionedEntity> candidates(String normalizedQuery)¶
Returns candidate entities whose names share trigrams with the given normalized query.
Candidates are ranked by the number of shared trigrams and filtered by a minimum overlap ratio. The result is a subset of all entities — typically 1%.
- Parameters:
normalizedQuery – the query string, already normalized
- Returns:
candidate entities, never null
-
inline int size()¶
Returns the total number of indexed entities.
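The build-time and query-time steps described for NgramIndex can be sketched with a small trigram inverted index. This is a simplified illustration under stated assumptions: TrigramSketch is a hypothetical name, it returns entity IDs rather than SanctionedEntity objects, the minimum overlap is a caller-supplied parameter, and it is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a trigram inverted index for candidate selection. */
final class TrigramSketch {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Decomposes a string into its distinct overlapping 3-character substrings. */
    static Set<String> trigrams(String s) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    /** Build time: map every trigram of the name to the entity ID. */
    void index(String entityId, String normalizedName) {
        for (String gram : trigrams(normalizedName)) {
            postings.computeIfAbsent(gram, k -> new HashSet<>()).add(entityId);
        }
    }

    /** Query time: IDs sharing at least minOverlap of the query's trigrams, best first. */
    List<String> candidates(String normalizedQuery, double minOverlap) {
        Set<String> grams = trigrams(normalizedQuery);
        Map<String, Integer> shared = new HashMap<>();
        for (String gram : grams) {
            for (String id : postings.getOrDefault(gram, Set.of())) {
                shared.merge(id, 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : shared.entrySet()) {
            if (e.getValue() >= minOverlap * grams.size()) {
                result.add(e.getKey());
            }
        }
        // Rank by shared trigram count, breaking ties by ID for determinism.
        result.sort((a, b) -> {
            int byCount = shared.get(b) - shared.get(a);
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return result;
    }
}
```

In this sketch a query shorter than three characters produces no trigrams and therefore no candidates. How the real index handles such queries is not stated in the source.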
-
class NormalizedNameCache¶
Cache of pre-normalized entity names for use by match engines.
Normalizing names (lowercasing, trimming, collapsing whitespace) is expensive when repeated for every entity on every query. This cache pre-computes normalized forms once when entities are loaded and serves them on subsequent lookups, eliminating redundant work.
Thread-safe. Automatically rebuilds when the index size changes (indicating new data).
Public Functions
-
inline record NormalizedEntry(String primaryName, List<String> aliases)¶
Pre-normalized names for a single entity.
- Parameters:
primaryName – the normalized primary name
aliases – the normalized alias names, in the same order as the entity’s alias list
-
inline void ensureBuilt(EntityIndex index)¶
Ensures the cache is built and up-to-date for the given index.
If the index size has changed since the last build, the cache is rebuilt. This method should be called once at the start of each screening operation.
- Parameters:
index – the entity index to cache names for
-
inline NormalizedEntry get(SanctionedEntity entity)¶
Returns the pre-normalized names for the given entity, computing on cache miss.
- Parameters:
entity – the entity to look up
- Returns:
the pre-normalized entry, never null
-
inline void invalidate()¶
Clears the cache, forcing a full rebuild on the next ensureBuilt call.
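The size-triggered rebuild described for NormalizedNameCache might work along the lines of the sketch below. NameCacheSketch is hypothetical: it caches a single normalized string per ID instead of a NormalizedEntry, takes a plain map instead of an EntityIndex, and exposes a rebuild counter purely so the behavior can be observed.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a cache that rebuilds when the source size changes. */
final class NameCacheSketch {
    private final Map<String, String> normalizedById = new ConcurrentHashMap<>();
    private int builtForSize = -1; // size of the source at the last rebuild
    private int rebuildCount = 0;  // for observation only

    static String normalize(String name) {
        return name.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Rebuilds only when the number of entries differs from the last build. */
    synchronized void ensureBuilt(Map<String, String> rawNamesById) {
        if (rawNamesById.size() == builtForSize) {
            return;
        }
        normalizedById.clear();
        rawNamesById.forEach((id, name) -> normalizedById.put(id, normalize(name)));
        builtForSize = rawNamesById.size();
        rebuildCount++;
    }

    /** Serves the cached form, computing and storing it on a cache miss. */
    String get(String id, String rawName) {
        return normalizedById.computeIfAbsent(id, k -> normalize(rawName));
    }

    /** Clears the cache so the next ensureBuilt call rebuilds it. */
    synchronized void invalidate() {
        normalizedById.clear();
        builtForSize = -1;
    }

    synchronized int rebuildCount() {
        return rebuildCount;
    }
}
```

One consequence of using size as the staleness signal: replacing an entry without changing the total count does not trigger a rebuild in this sketch, so an explicit invalidate call would be needed in that case.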
Algorithms
-
class JaroWinkler¶
Pure-Java implementation of the Jaro-Winkler string similarity algorithm.
Jaro-Winkler is a string metric commonly used in record linkage and name matching. It produces a similarity score between 0.0 (no similarity) and 1.0 (exact match), with a prefix bonus that favors strings sharing a common prefix.
This implementation follows the original Winkler (1990) formulation with a default prefix scaling factor of 0.1 and a maximum prefix length of 4 characters.
Public Static Functions
-
static inline double similarity(String s1, String s2)¶
Computes the Jaro-Winkler similarity between two strings.
Both strings are compared as-is (no normalization is applied). Callers should pre-process strings (e.g., lowercasing, trimming) before invoking this method if case-insensitive comparison is desired.
- Parameters:
s1 – the first string, may be null or empty
s2 – the second string, may be null or empty
- Returns:
similarity score in the range [0.0, 1.0]
-
static inline double similarityWithThreshold(String s1, String s2, double threshold)¶
Computes the Jaro-Winkler similarity, returning 0.0 early if it cannot meet the threshold.
- Parameters:
s1 – the first string
s2 – the second string
threshold – minimum score; returns 0.0 if the result cannot meet this
- Returns:
similarity score, or 0.0 if below threshold
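For readers unfamiliar with the metric, a compact Jaro-Winkler sketch using the parameters quoted above (prefix scaling factor 0.1, maximum prefix length 4) is shown below. It is an independent illustration, not the library's JaroWinkler class. It assumes non-null inputs, treats two equal strings (including two empty strings) as a perfect match, and applies the prefix bonus unconditionally; the real implementation may differ on each of these points.

```java
/** Hypothetical, self-contained Jaro-Winkler sketch. */
final class JaroWinklerSketch {
    static double jaro(String s1, String s2) {
        if (s1.equals(s2)) {
            return 1.0; // assumption: equal strings, even empty ones, score 1.0
        }
        int len1 = s1.length();
        int len2 = s2.length();
        if (len1 == 0 || len2 == 0) {
            return 0.0;
        }
        // Characters match only if they are within this distance of each other.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Count matched characters that appear in a different order.
        int outOfOrder = 0;
        for (int i = 0, j = 0; i < len1; i++) {
            if (!matched1[i]) {
                continue;
            }
            while (!matched2[j]) {
                j++;
            }
            if (s1.charAt(i) != s2.charAt(j)) {
                outOfOrder++;
            }
            j++;
        }
        double m = matches;
        double transpositions = outOfOrder / 2.0;
        return (m / len1 + m / len2 + (m - transpositions) / m) / 3.0;
    }

    static double similarity(String s1, String s2) {
        double jaro = jaro(s1, s2);
        int maxPrefix = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < maxPrefix && s1.charAt(prefix) == s2.charAt(prefix)) {
            prefix++;
        }
        // Winkler bonus: scaling factor 0.1 per shared prefix character.
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }
}
```

For the textbook pair MARTHA and MARHTA this gives a Jaro score of about 0.944 and a Jaro-Winkler score of about 0.961.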
Ingestion
-
interface ListProvider¶
Service Provider Interface for fetching and parsing a specific sanctions list.
Each implementation is responsible for a single ListSource: downloading the raw data, parsing it into SanctionedEntity records, and tracking metadata for delta detection.
Subclassed by dev.sieve.ingest.eu.EuConsolidatedProvider, dev.sieve.ingest.ofac.OfacSdnProvider, dev.sieve.ingest.uk.UkHmtProvider, dev.sieve.ingest.un.UnConsolidatedProvider
Public Functions
-
ListSource source()¶
Returns the sanctions list source this provider handles.
- Returns:
the list source, never null
-
ListMetadata metadata()¶
Returns metadata from the most recent successful fetch.
- Returns:
the metadata, never null
-
List<SanctionedEntity> fetch()¶
Fetches and parses the sanctions list into normalized entities.
- Throws:
ListIngestionException – if fetching or parsing fails
- Returns:
the parsed entities, never null
-
boolean hasUpdates(ListMetadata previousMetadata)¶
Checks whether the remote list has been updated since the given metadata snapshot.
Implementations should use lightweight mechanisms such as the HTTP ETag header or similar response headers to avoid downloading the full file.
- Parameters:
previousMetadata – metadata from a prior fetch to compare against
- Returns:
true if the remote list has changed, false otherwise
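A lightweight update check of the kind hasUpdates describes could be built on an HTTP HEAD request and an ETag comparison, as in the sketch below. EtagCheckSketch and both of its methods are hypothetical and do not reflect how the bundled providers work. Treating a missing ETag as "changed" is a conservative assumption made for the example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

/** Hypothetical sketch of ETag-based change detection. */
final class EtagCheckSketch {
    /** Issues a HEAD request and returns the ETag header, if the server sent one. */
    static Optional<String> fetchEtag(HttpClient client, URI listUrl)
            throws IOException, InterruptedException {
        HttpRequest head = HttpRequest.newBuilder(listUrl)
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> response =
                client.send(head, HttpResponse.BodyHandlers.discarding());
        return response.headers().firstValue("ETag");
    }

    /** Assumption: when either ETag is unknown, report a change to be safe. */
    static boolean hasUpdates(Optional<String> previousEtag, Optional<String> currentEtag) {
        if (previousEtag.isEmpty() || currentEtag.isEmpty()) {
            return true;
        }
        return !previousEtag.get().equals(currentEtag.get());
    }
}
```

A provider would typically store the ETag alongside its other metadata after each successful fetch and pass it back in on the next check.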
-
class IngestionOrchestrator¶
Orchestrates the ingestion of sanctions lists from all registered ListProviders.
Runs each provider, merges results into the EntityIndex, and produces a detailed IngestionReport. Supports both full and selective (source-filtered) ingestion runs.
Public Functions
-
inline IngestionOrchestrator(List<ListProvider> providers)¶
Creates an orchestrator with the given list of providers.
- Parameters:
providers – the providers to orchestrate, must not be null or empty
- Throws:
NullPointerException – if providers is null
IllegalArgumentException – if providers is empty
-
inline IngestionReport ingest(EntityIndex index)¶
Runs all registered providers and loads their entities into the given index.
- Parameters:
index – the entity index to populate
- Returns:
an ingestion report summarizing the results
-
inline IngestionReport ingest(EntityIndex index, Set<ListSource> sources)¶
Runs only the providers matching the given sources and loads their entities into the index.
Providers not in the set are reported as ProviderResult.Status#SKIPPED.
- Parameters:
index – the entity index to populate
sources – the sources to include, or null to run all providers
- Returns:
an ingestion report summarizing the results
-
inline ListMetadata getMetadata(ListSource source)¶
Returns cached metadata for the given source from the last successful fetch.
- Parameters:
source – the list source
- Returns:
the metadata, or null if the source has not been fetched
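A source-filtered ingestion run of the kind described above might be structured like the following sketch. Everything here is a hypothetical stand-in: Provider replaces ListProvider, a list of strings replaces the EntityIndex, and a map of statuses replaces IngestionReport. Only the SKIPPED status comes from the documentation. The SUCCESS and FAILED statuses, the treatment of a null source set as "run everything", and the choice to continue after one provider fails are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical sketch of selective ingestion across several providers. */
final class OrchestratorSketch {
    /** SKIPPED is documented; SUCCESS and FAILED are assumed for illustration. */
    enum Status { SUCCESS, FAILED, SKIPPED }

    /** Stand-in for ListProvider: a source name plus a fetch function. */
    record Provider(String source, Supplier<List<String>> fetch) {}

    static Map<String, Status> ingest(List<Provider> providers,
                                      Set<String> sources,
                                      List<String> index) {
        Map<String, Status> report = new LinkedHashMap<>();
        for (Provider provider : providers) {
            // Assumption: a null filter means every provider runs.
            if (sources != null && !sources.contains(provider.source())) {
                report.put(provider.source(), Status.SKIPPED);
                continue;
            }
            try {
                index.addAll(provider.fetch().get());
                report.put(provider.source(), Status.SUCCESS);
            } catch (RuntimeException e) {
                // Assumption: one failing provider does not abort the run.
                report.put(provider.source(), Status.FAILED);
            }
        }
        return report;
    }
}
```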