Data Flow¶
This page describes how data flows through Sieve — from sanctions list ingestion through to screening query responses.
Ingestion Flow¶
sequenceDiagram
participant O as IngestionOrchestrator
participant P as ListProvider (e.g., OFAC)
participant X as XML Source
participant I as EntityIndex
O->>P: fetch()
P->>X: HTTP GET (with ETag)
X-->>P: XML response
P->>P: StAX streaming parse
P-->>O: List<SanctionedEntity>
O->>I: addAll(entities)
Note over I: ConcurrentHashMap<br/>keyed by entity ID
Each ListProvider implementation:
Fetches the XML document from the official source URL
Parses it with StAX streaming (constant memory, handles large files)
Maps XML elements to the unified
SanctionedEntitydomain modelReturns a list of entities to the orchestrator
The IngestionOrchestrator coordinates multiple providers, collects
results, and produces an IngestionReport with per-source statistics.
Screening Flow¶
sequenceDiagram
participant C as Client
participant SC as ScreeningController
participant CE as CompositeMatchEngine
participant NC as NormalizedNameCache
participant NG as NgramIndex
participant JW as JaroWinkler
C->>SC: POST /api/v1/screen
SC->>CE: screen(request, index)
CE->>CE: ExactMatchEngine.screen()
CE->>NC: ensureBuilt(index)
CE->>NG: ensureBuilt(index, nameCache)
CE->>NG: candidates("vladimir putin")
NG-->>CE: ~50 entities
loop For each candidate
CE->>NC: get(entity)
NC-->>CE: NormalizedEntry
CE->>JW: similarityWithThreshold()
JW-->>CE: score
end
CE->>CE: FuzzyMatchEngine.screen()
Note over CE: Same flow as above
CE->>CE: Deduplicate by entity ID
CE-->>SC: List<MatchResult>
SC-->>C: 200 OK + JSON response
Entity Model¶
The unified domain model represents sanctioned entities from all list sources:
classDiagram
class SanctionedEntity {
+String id
+EntityType entityType
+ListSource listSource
+NameInfo primaryName
+List~NameInfo~ aliases
+List~Address~ addresses
+List~Identifier~ identifiers
+List~SanctionsProgram~ programs
+List~String~ nationalities
+String remarks
+Instant lastUpdated
}
class NameInfo {
+String fullName
+String firstName
+String lastName
+NameType nameType
+NameStrength nameStrength
+ScriptType scriptType
}
class Address {
+String street
+String city
+String stateOrProvince
+String postalCode
+String country
+String fullAddress
}
class Identifier {
+IdentifierType type
+String value
+String country
+String issueDate
+String expirationDate
}
class SanctionsProgram {
+String code
+String description
}
SanctionedEntity --> NameInfo : primaryName
SanctionedEntity --> "0..*" NameInfo : aliases
SanctionedEntity --> "0..*" Address : addresses
SanctionedEntity --> "0..*" Identifier : identifiers
SanctionedEntity --> "0..*" SanctionsProgram : programs
All domain objects are immutable Java 21 records.
Index Architecture¶
flowchart TD
subgraph "EntityIndex (interface)"
IM["InMemoryEntityIndex<br/><small>ConcurrentHashMap</small>"]
JPA["JpaEntityIndex<br/><small>PostgreSQL + Hibernate</small>"]
end
subgraph "Acceleration Structures"
NC["NormalizedNameCache<br/><small>Pre-normalized names</small>"]
NG["NgramIndex<br/><small>Trigram inverted index</small>"]
end
IM --> NC
IM --> NG
JPA --> NC
JPA --> NG
The EntityIndex interface is implemented by:
InMemoryEntityIndex — default, uses
ConcurrentHashMapwith secondary index byListSourceJpaEntityIndex — PostgreSQL persistence with Hibernate, activated via the
postgresSpring profile
Both implementations work with the same acceleration structures
(NormalizedNameCache and NgramIndex) for matching.