SDK replay notebook

fabric/notebooks/FDA_SDK_Replay.py is optional. Live M365 calls do not expose chain-of-thought; to approximate the reasoning leg of the triple, this notebook re-asks sampled questions through the Fabric Data Agent SDK, which returns run-steps including the generated queries and errors.

It's a reconstruction, not the original turn

The SDK replay re-runs questions on the FDA side. The reasoning/steps it captures are what the agent does now, for that question — not a recording of the original M365 interaction. Use it for side-by-side comparison and tuning insight, not as the production record. The SDK is in preview and only runs inside Fabric.

What it produces

Results land in Raw_SdkRuns (see data model), so reconstructed reasoning sits alongside the production triple. A side-by-side comparison query is provided in Analyst queries → SDK replay vs production.

Flow

flowchart LR
    eh[(FdaInteractions)] -->|sample recent questions| q[questions]
    q --> sdk[FabricOpenAIClient<br/>ask + get_run_details]
    sdk --> rows[RunId · Answer · GeneratedQueries · Steps · Status]
    rows --> raw[(Raw_SdkRuns)]
    rows -.-> lake[lakehouse Files/<br/>fda_sdk_replay.json]
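
The dotted edge in the flow, the optional JSON copy, might look roughly like the sketch below. It assumes each row is a plain dict carrying the five fields named in the diagram. `write_replay_file` and its `target_dir` argument are hypothetical names, and the `/lakehouse/default/Files` path mentioned in the docstring is the usual mount point for a notebook's default lakehouse, not something confirmed by the notebook source.

```python
import json
from pathlib import Path

# Field names taken from the flow diagram above.
ROW_FIELDS = ("RunId", "Answer", "GeneratedQueries", "Steps", "Status")


def write_replay_file(rows, target_dir, filename="fda_sdk_replay.json"):
    """Write replay rows to a JSON file and return its path.

    Hypothetical helper: inside Fabric, target_dir would typically be
    "/lakehouse/default/Files" (assumes a default lakehouse is attached).
    """
    # Keep only the expected fields so the file mirrors the table columns.
    cleaned = [{key: row.get(key) for key in ROW_FIELDS} for row in rows]
    path = Path(target_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    # default=str keeps non-JSON values (for example datetimes) from raising.
    path.write_text(json.dumps(cleaned, indent=2, default=str), encoding="utf-8")
    return path
```

Writing the same columns to the file as to `Raw_SdkRuns` keeps an offline diff comparable with what was ingested.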

Cell-by-cell

| Cell | Responsibility |
| --- | --- |
| install | `%pip install -U fabric-data-agent-sdk` |
| `[parameters]` | `DATA_AGENT_NAME`, `SAMPLE_FROM`, `EH_QUERY_URI`, `EH_DATABASE`, `SAMPLE_SIZE` |
| `[sample-questions]` | Reads recent distinct questions from `FdaInteractions` using the notebook user's interactive token |
| `[replay]` | For each question: `fda.ask(question)`, then `fda.get_run_details(question)`; pulls steps and extracts any step whose serialized form contains `EVALUATE` into `GeneratedQueries` |
| `[persist]` | Inline-ingests the batch into `Raw_SdkRuns`; also writes a local `fda_sdk_replay.json` to the lakehouse for offline diffing |
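
The extraction rule in the `[replay]` cell can be illustrated with a small sketch. The step dictionaries below are a hypothetical shape, since the actual run-step structure returned by the SDK is not shown here. `extract_generated_queries` and `build_row` are illustrative names, not functions from the notebook.

```python
import json


def extract_generated_queries(steps, marker="EVALUATE"):
    """Return the serialized form of every step that mentions the marker."""
    queries = []
    for step in steps:
        # Serialize first so the match also covers nested fields; default=str
        # guards against values json cannot encode natively.
        serialized = json.dumps(step, default=str)
        if marker in serialized:  # case-sensitive substring match
            queries.append(serialized)
    return queries


def build_row(run_id, answer, steps, status):
    """Shape one replay result into the columns shown in the flow diagram."""
    return {
        "RunId": run_id,
        "Answer": answer,
        "GeneratedQueries": extract_generated_queries(steps),
        "Steps": steps,
        "Status": status,
    }


# Hypothetical step shape, for illustration only.
example_steps = [
    {"tool": "dax", "output": "EVALUATE TOPN(5, Sales)"},
    {"tool": "message", "output": "Here are the top 5 rows."},
]
example_row = build_row("run-1", "Top 5 rows", example_steps, "completed")
```

Matching on the serialized step is deliberately loose. It catches DAX wherever it appears in the step, at the cost of also catching any step that merely mentions the word `EVALUATE`.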

Auth difference vs the collector

Unlike the collector (service principal), the replay notebook reads/writes with the notebook user's interactive identity (notebookutils.credentials.getToken). Replay batches are small, so inline management ingestion is simplest and avoids configuring a separate ingest endpoint.
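
A minimal sketch of the interactive-identity path follows. It assumes the token audience is the Eventhouse query URI (the `EH_QUERY_URI` parameter). The `token_provider` argument and both function names are hypothetical, added only so the logic can be exercised outside Fabric, where `notebookutils` is not importable.

```python
def get_user_token(audience, token_provider=None):
    """Return a token for the audience using the notebook user's identity."""
    if token_provider is None:
        # notebookutils is available only inside a Fabric notebook session.
        import notebookutils

        token_provider = notebookutils.credentials.getToken
    return token_provider(audience)


def auth_headers(token):
    """Build HTTP headers that carry the bearer token."""
    return {"Authorization": f"Bearer {token}"}
```

Inside the notebook, `auth_headers(get_user_token(EH_QUERY_URI))` would use the signed-in user's token, so no client secret needs to be stored for the replay.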

When to use it

- Investigating why a particular question pattern fails or produces slow DAX; the steps show the agent's tool calls and generated queries.
- Validating tuning changes (new example queries, model edits) by replaying a known-bad question set and comparing SdkGeneratedQueries against ProdExecutedDax.
- Building a labelled regression set of questions to re-run after each FDA change.
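
For the comparison and regression use cases, a rough offline check could look like the sketch below. It assumes the replayed and production queries have already been exported as plain strings. All function names are hypothetical, and the whitespace normalization is a simplification that would also collapse whitespace inside DAX string literals.

```python
import difflib


def normalize_query(query):
    """Collapse runs of whitespace so layout differences are ignored."""
    return " ".join(query.split())


def compare_queries(sdk_query, prod_query):
    """Compare a replayed query with the production one."""
    sdk_norm = normalize_query(sdk_query)
    prod_norm = normalize_query(prod_query)
    similarity = difflib.SequenceMatcher(None, sdk_norm, prod_norm).ratio()
    return {"match": sdk_norm == prod_norm, "similarity": round(similarity, 3)}


def regression_report(cases):
    """Build one result per (question, sdk_query, prod_query) tuple."""
    return [
        {"question": question, **compare_queries(sdk_query, prod_query)}
        for question, sdk_query, prod_query in cases
    ]
```

A low similarity score after a tuning change marks a question whose generated query has shifted and is worth inspecting by hand. It does not by itself say whether the new query is better or worse.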