Correlation model
The collector's job is to turn three independent streams into one curated row per interaction. This page documents exactly how that join works, because the join's confidence is itself a data-quality signal you review.
The authoritative implementation is build_curated() in
FDA_Collector.py; a KQL-native equivalent (BuildFdaInteractionsKql) exists for
spot-checks and validation.
Step 1: pair prompts to responses (source A)
Graph rows arrive as individual messages (InteractionType = userPrompt or aiResponse). Within each
ConversationId, sorted by time, every aiResponse is paired with the most recent preceding userPrompt:
for each conversation:
    sort messages by CreatedDateTime
    for each aiResponse R:
        P = last userPrompt with P.time <= R.time
        emit interaction(Question=P.body, Answer=R.body, Timestamp=R.time, PromptTime=P.time, User, ThreadId)
Each emitted row is a candidate interaction carrying the question, the answer, the user identity, and a time bracket [PromptTime, Timestamp].
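The pairing rule above can be sketched in Python as follows. This is a minimal illustration, not the code in build_curated(): the row shape (dicts with `User` and `Body` keys), the use of ConversationId as ThreadId, and the choice to skip a response that has no preceding prompt are all assumptions made for the example.

```python
from collections import defaultdict

def pair_prompts_to_responses(messages):
    """Pair each aiResponse with the most recent preceding userPrompt.

    `messages` is a list of dicts; the field names User and Body are
    assumptions for this sketch, not confirmed Graph column names.
    """
    by_conversation = defaultdict(list)
    for m in messages:
        by_conversation[m["ConversationId"]].append(m)

    interactions = []
    for conv_id, msgs in by_conversation.items():
        msgs.sort(key=lambda m: m["CreatedDateTime"])
        last_prompt = None
        for m in msgs:
            if m["InteractionType"] == "userPrompt":
                last_prompt = m
            elif m["InteractionType"] == "aiResponse" and last_prompt is not None:
                # Assumption: a response with no earlier prompt is skipped.
                interactions.append({
                    "ThreadId": conv_id,  # assumption: ThreadId mirrors ConversationId
                    "User": last_prompt["User"],
                    "Question": last_prompt["Body"],
                    "Answer": m["Body"],
                    "PromptTime": last_prompt["CreatedDateTime"],
                    "Timestamp": m["CreatedDateTime"],
                })
    return interactions
```

Because the last prompt is not cleared after a match, two consecutive responses pair with the same prompt, which is what "every aiResponse is paired with the most recent preceding userPrompt" implies.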
Step 2: attach executed DAX (source C) by user + time window
For each paired interaction, candidate DAX executions are those where:
- ExecutingUser matches the interaction's User (case-insensitive), and
- the DAX timestamp falls inside [PromptTime − window, Timestamp + window].
The window defaults to ±90 seconds (CORR_WINDOW_SEC). All matching executions are kept, sorted by time, and
attached as the ordered DaxQueries array — an FDA turn frequently emits several validation/probe queries before
the final one. The last execution in the window is treated as the primary ExecutedDax.
flowchart LR
P[userPrompt<br/>PromptTime] --> R[aiResponse<br/>Timestamp]
subgraph window["match window [PromptTime − 90s … Timestamp + 90s]"]
d1[DAX exec 1] --> d2[DAX exec 2] --> d3[DAX exec 3<br/>= primary ExecutedDax]
end
R -.->|same ExecutingUser<br/>same model| window
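As a rough Python sketch of the window match described in this step: the DAX row keys `Timestamp` and `QueryText` are hypothetical names chosen for the example, and treating the window bounds as inclusive is an assumption. Only `CORR_WINDOW_SEC`, `ExecutingUser`, `DaxQueries`, and `ExecutedDax` come from the text.

```python
from datetime import datetime, timedelta

CORR_WINDOW_SEC = 90  # default padding stated in the text

def attach_dax(interaction, dax_rows, window_sec=CORR_WINDOW_SEC):
    """Return a copy of `interaction` with DaxQueries and ExecutedDax attached.

    DAX rows are dicts; the keys Timestamp and QueryText are hypothetical.
    """
    pad = timedelta(seconds=window_sec)
    start = interaction["PromptTime"] - pad
    end = interaction["Timestamp"] + pad
    user = interaction["User"].casefold()  # case-insensitive user match

    matches = sorted(
        (d for d in dax_rows
         if d["ExecutingUser"].casefold() == user
         and start <= d["Timestamp"] <= end),  # assumption: inclusive bounds
        key=lambda d: d["Timestamp"],
    )
    out = dict(interaction)
    out["DaxQueries"] = [d["QueryText"] for d in matches]
    # The last execution in the window is treated as the primary one.
    out["ExecutedDax"] = matches[-1]["QueryText"] if matches else None
    return out
```

Keeping every match in `DaxQueries` rather than only the last one preserves the probe queries that precede the final query in a turn.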
Step 3: score match confidence
| MatchConfidence | Meaning |
|---|---|
| Exact | At least one DAX execution falls strictly inside [PromptTime, Timestamp] — the strongest signal that this DAX belongs to this turn. |
| Windowed | DAX matched only within the ±window padding (before the prompt or after the response). Plausible, but flagged. |
| Unmatched | No DAX execution matched. The interaction has question/answer text but no recovered DAX. |
The review app colour-codes this (Exact green, Windowed amber, Unmatched red) so reviewers see join
uncertainty rather than having it hidden.
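The three levels reduce to a small decision function. The sketch below assumes `dax_times` holds the timestamps of executions that already passed the window match, and it treats the [PromptTime, Timestamp] bounds as inclusive; the exact boundary handling in the collector is not stated here, so that part is an assumption.

```python
def match_confidence(prompt_time, response_time, dax_times):
    """Classify a join given the timestamps of window-matched DAX executions."""
    if not dax_times:
        return "Unmatched"
    # Assumption: an execution exactly on either bound counts as inside.
    if any(prompt_time <= t <= response_time for t in dax_times):
        return "Exact"
    return "Windowed"

# Colour mapping used by the review app, as described in the text.
CONFIDENCE_COLOUR = {"Exact": "green", "Windowed": "amber", "Unmatched": "red"}
```

A single execution inside the bracket is enough for Exact, even when other executions matched only through the padding.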
Step 4: keep DAX orphans (no interaction matched)
Executed-DAX rows that matched no paired interaction are not discarded. Each becomes its own curated row with:
- InteractionId = "dax-" + corr_key(...),
- MatchConfidence = "Unmatched",
- Sources = ["monitoring"],
- empty Question/Answer (no text surface saw it).
This guarantees no executed DAX is lost, even when Graph coverage is incomplete or absent. If Graph returns no rows at all, the curated build is monitoring-only: DAX without question/answer text.
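A sketch of how orphan rows could be produced. The arguments to corr_key are not shown on this page, so the example takes the key function as a parameter instead of guessing its signature. `ExecutionId` and `QueryText` are hypothetical keys used to identify and carry a DAX execution, and representing the empty Question/Answer as empty strings is also an assumption.

```python
def build_orphan_rows(dax_rows, matched_execution_ids, corr_key):
    """Turn DAX executions that matched no interaction into curated rows.

    `corr_key` is any callable that maps a DAX row to a key string.
    """
    rows = []
    for d in dax_rows:
        if d["ExecutionId"] in matched_execution_ids:
            continue  # already attached to a paired interaction
        rows.append({
            "InteractionId": "dax-" + corr_key(d),
            "MatchConfidence": "Unmatched",
            "Sources": ["monitoring"],
            "Question": "",
            "Answer": "",
            "ExecutedDax": d["QueryText"],
        })
    return rows
```

With an empty `matched_execution_ids` set, every execution becomes an orphan row, which corresponds to the monitoring-only build described above.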
The correlation key
CorrelationKey is stored on every curated row. It is a deterministic, content-derived id (user + minute bucket +
model) used for de-duplication and as a stable handle for orphan DAX rows.
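One way such a key could be derived is sketched below. Only the three ingredients (user, minute bucket, model) come from the text; the function name `example_corr_key`, the SHA-256 hash, the `|` separator, the lowercasing of the user, and the 16-character truncation are illustrative assumptions, not the collector's actual scheme.

```python
import hashlib
from datetime import datetime

def example_corr_key(user, timestamp, model):
    """Hypothetical deterministic key from user + minute bucket + model."""
    # Truncate to the minute so events in the same minute share a bucket.
    minute_bucket = timestamp.replace(second=0, microsecond=0).isoformat()
    raw = "|".join([user.casefold(), minute_bucket, model])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
```

Because the key depends only on content, re-running the collector over the same rows yields the same ids, which is what makes de-duplication by id possible.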
De-duplication
Two layers protect against double-counting when the collector re-scans its trailing LOOKBACK_HOURS window:
- Before append, the collector queries FdaInteractions for the InteractionIds it is about to write and drops any that already exist.
- Within Raw_* reads, arg_max(IngestedAt, *) keeps the newest copy per natural key (the KQL-native builder and analyst queries use this pattern).
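Both layers can be mimicked in a few lines of Python. In this sketch `existing_ids` stands in for the result of the lookup against FdaInteractions, and the natural-key columns are passed in by the caller because this page does not list them; both are assumptions for illustration.

```python
def newest_per_key(raw_rows, key_fields):
    """Python analogue of KQL `summarize arg_max(IngestedAt, *) by <key>`."""
    newest = {}
    for row in raw_rows:
        key = tuple(row[f] for f in key_fields)
        if key not in newest or row["IngestedAt"] > newest[key]["IngestedAt"]:
            newest[key] = row
    return list(newest.values())

def drop_already_curated(new_rows, existing_ids):
    """Drop rows whose InteractionId is already present in the curated table."""
    return [r for r in new_rows if r["InteractionId"] not in existing_ids]
```

The first function protects reads from duplicate raw copies; the second protects writes from re-appending an interaction that an earlier run already curated.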
Latency and back-fill
Audit/Graph records can lag the live interaction by anywhere from a few minutes to roughly 30 minutes; workspace monitoring is near-real-time.
Because the collector always re-scans a trailing window (LOOKBACK_HOURS, default 48), late-arriving records
are picked up on a subsequent run and de-duplicated against what was already curated.
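The effect of the trailing window can be simulated with a toy run loop. This is a simplified model, not the collector's scheduling code: `collect_once` is a hypothetical helper, rows are plain dicts with `InteractionId` and `Timestamp`, and an in-memory set stands in for the curated table.

```python
from datetime import datetime, timedelta

LOOKBACK_HOURS = 48  # default stated in the text

def collect_once(now, available_rows, curated_ids, lookback_hours=LOOKBACK_HOURS):
    """Simulate one run: curate rows inside the trailing window that are new."""
    start = now - timedelta(hours=lookback_hours)
    fresh = [
        r for r in available_rows
        if start <= r["Timestamp"] <= now
        and r["InteractionId"] not in curated_ids
    ]
    curated_ids.update(r["InteractionId"] for r in fresh)
    return fresh
```

A record whose event time is 22:50 but which only becomes visible after the midnight run is still inside the window at the next run, so it is curated then, while rows curated earlier are skipped.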
KQL-native validation
BuildFdaInteractionsKql(lookback, windowSec) reproduces this join in pure KQL for validation or as a fallback
when you want to spot-check the notebook's output without re-running it. The notebook remains authoritative; the
function is for cross-checking. See KQL functions.
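A cross-check can be as simple as diffing the two outputs by InteractionId. The sketch below assumes both result sets have already been loaded into Python lists of dicts; how the KQL function is invoked and how its results are fetched is not covered here, and `diff_curated` and the compared fields are hypothetical choices for the example.

```python
def diff_curated(notebook_rows, kql_rows, fields=("MatchConfidence", "ExecutedDax")):
    """Compare two curated outputs keyed by InteractionId."""
    nb = {r["InteractionId"]: r for r in notebook_rows}
    kq = {r["InteractionId"]: r for r in kql_rows}
    return {
        "only_in_notebook": sorted(nb.keys() - kq.keys()),
        "only_in_kql": sorted(kq.keys() - nb.keys()),
        # Ids present on both sides whose compared fields disagree.
        "mismatched": sorted(
            i for i in nb.keys() & kq.keys()
            if any(nb[i].get(f) != kq[i].get(f) for f in fields)
        ),
    }
```

Since the notebook is authoritative, any id reported under `mismatched` or `only_in_kql` is a prompt to inspect the KQL function rather than the curated table.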