
SDK replay notebook

fabric/notebooks/FDA_SDK_Replay.py is optional. Live M365 calls do not expose chain-of-thought, so to approximate the reasoning leg of the triple, this notebook re-asks sampled questions through the Fabric Data Agent SDK, which returns run steps, including the generated queries and any errors.

It's a reconstruction, not the original turn

The SDK replay re-runs questions on the FDA side. The steps it captures show what the agent does now for that question; they are not a recording of the original M365 interaction. Use it for side-by-side comparison and tuning insight, not as the production record. The SDK is in preview and runs only inside Fabric.

What it produces

Results land in Raw_SdkRuns (see data model), so reconstructed reasoning sits alongside the production triple. A side-by-side comparison query is provided in Analyst queries → SDK replay vs production.

Flow

```mermaid
flowchart LR
    eh[(FdaInteractions)] -->|sample recent questions| q[questions]
    q --> sdk[FabricOpenAIClient<br/>ask + get_run_details]
    sdk --> rows[RunId · Answer · GeneratedQueries · Steps · Status]
    rows --> raw[(Raw_SdkRuns)]
    rows -.-> lake[lakehouse Files/<br/>fda_sdk_replay.json]
```
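The first edge of the flow, sampling recent distinct questions, can be approximated in plain Python. This is a minimal sketch, assuming SAMPLE_FROM acts as a lookback window (modelled here as a `timedelta`) and that interaction rows arrive as timezone-aware `(timestamp, question)` pairs. The real cell queries FdaInteractions in the Eventhouse; `sample_recent_questions` is a hypothetical helper, and treating case or whitespace variants as the same question is an assumption.

```python
from datetime import datetime, timedelta, timezone


def sample_recent_questions(interactions, sample_from: timedelta, sample_size: int, now=None) -> list[str]:
    """Return up to sample_size distinct questions, most recent first.

    Hypothetical helper: assumes rows are (tz-aware timestamp, question) pairs
    and that SAMPLE_FROM is a lookback window.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - sample_from
    seen: set[str] = set()
    picked: list[str] = []
    # Walk newest to oldest so each question is kept at its latest occurrence
    for ts, question in sorted(interactions, key=lambda r: r[0], reverse=True):
        if ts < cutoff:
            break  # every remaining row is older than the lookback window
        key = question.strip().lower()  # assumption: case/whitespace variants are one question
        if key and key not in seen:
            seen.add(key)
            picked.append(question.strip())
            if len(picked) >= sample_size:
                break
    return picked
```

Walking the rows newest first lets the loop stop as soon as it leaves the window or fills the sample.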

Cell-by-cell

| Cell | Responsibility |
|---|---|
| install | %pip install -U fabric-data-agent-sdk |
| [parameters] | DATA_AGENT_NAME, SAMPLE_FROM, EH_QUERY_URI, EH_DATABASE, SAMPLE_SIZE |
| [sample-questions] | Reads recent distinct questions from FdaInteractions using the notebook user's interactive token |
| [replay] | For each question: fda.ask(question), then fda.get_run_details(question); pulls steps and extracts any step whose serialized form contains EVALUATE into GeneratedQueries |
| [persist] | Inline-ingests the batch into Raw_SdkRuns; also writes a local fda_sdk_replay.json to the lakehouse for offline diffing |
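The step filter in the replay cell and the local JSON dump in the persist cell can be sketched as below. This assumes run steps are JSON-serializable objects (anything else falls back to `str`); the SDK's actual step shape is not described on this page, so the sketch takes steps as input rather than calling `fda.get_run_details`. `build_row`, `write_replay_file`, and the extra `Question` column are hypothetical; the other columns come from the flow above.

```python
import json
from typing import Any


def extract_generated_queries(steps: list[Any]) -> list[str]:
    """Keep every step whose serialized form contains the DAX keyword EVALUATE."""
    queries = []
    for step in steps:
        # default=str lets non-JSON types (datetimes, SDK objects) serialize instead of raising
        serialized = json.dumps(step, default=str)
        # Literal, case-sensitive match, mirroring "contains EVALUATE" in the cell description
        if "EVALUATE" in serialized:
            queries.append(serialized)
    return queries


def build_row(question: str, run_id: str, answer: str, steps: list[Any], status: str) -> dict:
    """Hypothetical row shape: the flow's columns plus the question that was replayed."""
    return {
        "RunId": run_id,
        "Question": question,
        "Answer": answer,
        "GeneratedQueries": extract_generated_queries(steps),
        "Steps": steps,
        "Status": status,
    }


def write_replay_file(rows: list[dict], path: str) -> None:
    """Write the batch as one JSON array, e.g. to a lakehouse Files/ path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, default=str, indent=2)
```

Because the filter keeps the whole serialized step, GeneratedQueries holds the surrounding tool-call context, not just the DAX text.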

Auth difference vs the collector

Unlike the collector, which authenticates as a service principal, the replay notebook reads and writes with the notebook user's interactive identity (notebookutils.credentials.getToken). Replay batches are small, so inline ingestion through a management command is simplest and avoids configuring a separate ingest endpoint.
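A minimal sketch of inline ingestion with the user's token, assuming the token has already been obtained via notebookutils.credentials.getToken and that EH_QUERY_URI is the Eventhouse query endpoint. It builds a Kusto `.ingest inline` management command and wraps it in a POST to the REST `/v1/rest/mgmt` endpoint without sending it. The order in `COLUMNS` is a hypothetical stand-in for the Raw_SdkRuns schema: inline CSV maps values to columns by position, so it must match the real table. Collapsing newlines inside values to spaces is a conservative assumption to keep one record per line.

```python
import csv
import io
import json
import urllib.request

# Hypothetical column order; must match the Raw_SdkRuns table definition
COLUMNS = ["RunId", "Question", "Answer", "GeneratedQueries", "Steps", "Status"]


def to_csv_payload(rows: list[dict]) -> str:
    """Render rows as CSV, JSON-encoding lists/dicts so they can land in dynamic columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # csv module handles quoting of commas/quotes
    for row in rows:
        cells = []
        for col in COLUMNS:
            value = row.get(col, "")
            if not isinstance(value, str):
                value = json.dumps(value, default=str)
            # Assumption: keep each record on a single line of the inline payload
            cells.append(value.replace("\r", " ").replace("\n", " "))
        writer.writerow(cells)
    return buf.getvalue()


def build_ingest_request(query_uri: str, database: str, token: str, rows: list[dict]) -> urllib.request.Request:
    """Build (but do not send) the management call carrying an .ingest inline command."""
    command = ".ingest inline into table Raw_SdkRuns <|\n" + to_csv_payload(rows)
    body = json.dumps({"db": database, "csl": command}).encode("utf-8")
    return urllib.request.Request(
        url=query_uri.rstrip("/") + "/v1/rest/mgmt",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

In a notebook you would pass the request to `urllib.request.urlopen` and check the response. Building the request separately keeps the payload easy to inspect before anything is written.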

When to use it

  • Investigating why a particular question pattern fails or produces slow DAX — the steps show the agent's tool calls and generated queries.
  • Validating tuning changes (new example queries, model edits) by replaying a known-bad question set and comparing SdkGeneratedQueries against ProdExecutedDax.
  • Building a labelled regression set of questions to re-run after each FDA change.
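The comparison in the second bullet can be sketched as a per-question set difference. This assumes both SdkGeneratedQueries and ProdExecutedDax have already been reduced to plain DAX strings (the replay cell stores whole serialized steps, so some extraction would come first). Only whitespace is normalized, so differences in casing or argument order still count as mismatches; `diff_queries` and `QueryDiff` are hypothetical names.

```python
import re
from dataclasses import dataclass, field


def normalize_dax(query: str) -> str:
    """Collapse runs of whitespace so formatting-only differences don't count."""
    return re.sub(r"\s+", " ", query).strip()


@dataclass
class QueryDiff:
    question: str
    shared: set[str] = field(default_factory=set)
    sdk_only: set[str] = field(default_factory=set)
    prod_only: set[str] = field(default_factory=set)

    @property
    def identical(self) -> bool:
        # True when replay and production issued the same (normalized) queries
        return not self.sdk_only and not self.prod_only


def diff_queries(question: str, sdk_queries: list[str], prod_queries: list[str]) -> QueryDiff:
    """Compare replayed queries with production queries for one question."""
    sdk = {normalize_dax(q) for q in sdk_queries if q.strip()}
    prod = {normalize_dax(q) for q in prod_queries if q.strip()}
    return QueryDiff(question, shared=sdk & prod, sdk_only=sdk - prod, prod_only=prod - sdk)
```

Running `diff_queries` over a labelled regression set after each FDA change gives, per question, the queries that appeared or disappeared.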