SDK replay notebook
fabric/notebooks/FDA_SDK_Replay.py is optional. Live M365 calls do not expose chain-of-thought; to approximate
the reasoning leg of the triple, this notebook re-asks sampled questions through the Fabric Data Agent SDK,
which returns run-steps including the generated queries and errors.
It's a reconstruction, not the original turn
The SDK replay re-runs questions on the FDA side. The reasoning/steps it captures are what the agent does now, for that question — not a recording of the original M365 interaction. Use it for side-by-side comparison and tuning insight, not as the production record. The SDK is in preview and only runs inside Fabric.
What it produces
Results land in Raw_SdkRuns (see data model), so
reconstructed reasoning sits alongside the production triple. A side-by-side comparison query is provided in
Analyst queries → SDK replay vs production.
Flow
flowchart LR
eh[(FdaInteractions)] -->|sample recent questions| q[questions]
q --> sdk[FabricOpenAIClient<br/>ask + get_run_details]
sdk --> rows[RunId · Answer · GeneratedQueries · Steps · Status]
rows --> raw[(Raw_SdkRuns)]
rows -.-> lake[lakehouse Files/<br/>fda_sdk_replay.json]
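As a rough illustration of the row shape in the flow above, the sketch below models one replayed question and serializes a batch to JSON text of the kind that could be written to fda_sdk_replay.json. The field names come from the diagram; the field types, the `SdkRunRow` and `rows_to_json` names, and the sample values (including the `completed` status) are assumptions made for illustration, not the notebook's actual code.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SdkRunRow:
    """One replayed question. Field names follow the flow diagram; types are assumed."""
    RunId: str
    Answer: str
    GeneratedQueries: list = field(default_factory=list)
    Steps: list = field(default_factory=list)
    Status: str = "unknown"


def rows_to_json(rows):
    """Serialize a batch of rows to JSON text suitable for offline diffing."""
    return json.dumps([asdict(r) for r in rows], indent=2, ensure_ascii=False)


# Hypothetical sample row; real values come from the SDK run details.
rows = [
    SdkRunRow(
        RunId="run-001",
        Answer="42 units",
        GeneratedQueries=['EVALUATE ROW("n", 42)'],
        Status="completed",
    )
]
payload = rows_to_json(rows)
```

Keeping the same five fields for both the table ingest and the JSON file means the offline copy can be diffed against Raw_SdkRuns without any remapping.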
Cell-by-cell
| Cell | Responsibility |
|---|---|
| install | %pip install -U fabric-data-agent-sdk |
| [parameters] | DATA_AGENT_NAME, SAMPLE_FROM, EH_QUERY_URI, EH_DATABASE, SAMPLE_SIZE |
| [sample-questions] | Reads recent distinct questions from FdaInteractions using the notebook user's interactive token |
| [replay] | For each question: fda.ask(question), then fda.get_run_details(question); pulls steps and extracts any step whose serialized form contains EVALUATE into GeneratedQueries |
| [persist] | Inline-ingests the batch into Raw_SdkRuns; also writes a local fda_sdk_replay.json to the lakehouse for offline diffing |
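The extraction rule in the [replay] cell (keep any step whose serialized form contains EVALUATE) can be sketched roughly as follows. This is a minimal sketch, not the notebook's code: the `extract_generated_queries` helper, the case-sensitive match, and the shape of the sample step dictionaries are all assumptions, and real SDK run-steps may look different.

```python
import json


def extract_generated_queries(steps, marker="EVALUATE"):
    """Return the serialized form of every step that mentions the marker."""
    queries = []
    for step in steps:
        # Strings are used as-is; anything else is serialized to JSON text.
        serialized = step if isinstance(step, str) else json.dumps(step, default=str)
        if marker in serialized:
            queries.append(serialized)
    return queries


# Hypothetical step payloads; real SDK run-steps may be shaped differently.
sample_steps = [
    {"type": "tool_call", "name": "plan", "output": "choose semantic model"},
    {"type": "tool_call", "name": "dax", "arguments": "EVALUATE TOPN(5, Sales)"},
    {"type": "message", "content": "Here are the top five rows."},
]
generated = extract_generated_queries(sample_steps)
```

Matching on the serialized step is deliberately coarse: it catches DAX wherever it appears in the step, at the cost of also keeping the surrounding step metadata in GeneratedQueries.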
Auth difference vs the collector
Unlike the collector (service principal), the replay notebook reads/writes with the notebook
user's interactive identity (notebookutils.credentials.getToken). Replay batches are small, so inline management
ingestion is simplest and avoids configuring a separate ingest endpoint.
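To make the inline-ingest idea concrete, here is a minimal sketch that builds the text of a Kusto `.ingest inline` management command from a small batch. The `build_inline_ingest` helper, the column order, and the choice to JSON-encode list values and flatten line breaks are assumptions for illustration; obtaining the token and submitting the command to the query URI are left out.

```python
import csv
import io
import json


def build_inline_ingest(table, rows, columns):
    """Build a Kusto `.ingest inline` command with one CSV line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        cells = []
        for col in columns:
            value = row.get(col, "")
            if not isinstance(value, str):
                # Lists and dicts are stored as JSON text in a single cell.
                value = json.dumps(value)
            # Flatten line breaks so each record stays on one physical line.
            cells.append(value.replace("\r", " ").replace("\n", " "))
        writer.writerow(cells)
    return f".ingest inline into table {table} <|\n{buffer.getvalue()}"


# Assumed column order; the real Raw_SdkRuns schema is defined in the data model.
columns = ["RunId", "Answer", "GeneratedQueries", "Steps", "Status"]
batch = [{
    "RunId": "run-001",
    "Answer": "42 units, total",
    "GeneratedQueries": ["EVALUATE Sales"],
    "Steps": [],
    "Status": "completed",
}]
command = build_inline_ingest("Raw_SdkRuns", batch, columns)
```

Because the whole batch travels inside the command text, this approach only suits small batches, which matches the replay use case.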
When to use it
- Investigating why a particular question pattern fails or produces slow DAX — the steps show the agent's tool calls and generated queries.
- Validating tuning changes (new example queries, model edits) by replaying a known-bad question set and comparing SdkGeneratedQueries against ProdExecutedDax.
- Building a labelled regression set of questions to re-run after each FDA change.
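For the comparison of SdkGeneratedQueries against ProdExecutedDax, a whitespace-insensitive diff is often enough to spot what a tuning change altered. The sketch below uses Python's standard `difflib`; the `normalize_dax` and `diff_queries` helpers and the sample queries are hypothetical, and the normalization collapses all runs of whitespace (including inside string literals), so it is not DAX-aware.

```python
import difflib


def normalize_dax(query):
    """Collapse whitespace so formatting-only differences do not show up in the diff."""
    return [" ".join(line.split()) for line in query.strip().splitlines() if line.strip()]


def diff_queries(sdk_query, prod_query):
    """Return unified-diff lines between a production query and a replayed query."""
    return list(difflib.unified_diff(
        normalize_dax(prod_query),
        normalize_dax(sdk_query),
        fromfile="ProdExecutedDax",
        tofile="SdkGeneratedQueries",
        lineterm="",
    ))


# Hypothetical queries for one question, before and after a tuning change.
prod = "EVALUATE\n  TOPN(5, Sales)"
sdk = "EVALUATE\n    TOPN(10,   Sales)"
changes = diff_queries(sdk, prod)
```

An empty result means the replayed query matches production up to whitespace, which is a convenient pass condition for a regression set.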