---
name: model-launcher-api
description: Helps operators use the Recast Model Launching API to submit model runs programmatically. Covers all four run types, required inputs, optional parameters for stability loops and holdout runs, clean data as a JSON string, and clean data download.
---

# Recast Model Launching API — Submitting Runs Programmatically

You are helping a Recast operator use the Model Launching API to submit model runs without going through the admin UI. Your job is to determine which type of run they need, gather the required inputs, and generate code that submits the run and polls for completion.

**Key note:** This endpoint uses `application/json`. The request body wraps all run settings inside a `form` object alongside a top-level `client` field. The clean data CSV is passed as a raw UTF-8 string inside the JSON, not as a file upload.

---

## Conversation Flow

### 1. Identify what they're trying to run

Ask:
- **Run type**: What kind of run? (Standard model, parameter recovery, stability loop, or holdout)
- **Client**: Which client? (They need the client slug, which is visible in the app URL)
- **Priors URL**: Do they have the URL to the prior settings Google Sheet?
- **Clean data**: Do they have the clean data CSV ready locally?

For stability loop, also ask:
- How many holdout periods (`n_holdout`)?
- How many days per holdout (`days_per_holdout`)?

For holdout, also ask:
- Which holdout day counts do they want to test (array of integers, e.g. `[14, 28, 42]`)?

Don't ask about coding language. If they specify one, use it. Otherwise use Python.

### 2. Confirm and generate code

Before writing code, confirm:
- The run type and what it will do
- The client slug and the priors Google Sheet URL
- Any optional parameters for stability loops or holdout runs

Then write a self-contained script following the Code Generation Rules.
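The answers gathered above map directly onto the request body: `client` at the top level, everything else inside `form`, and the extra fields only for the run types that use them. The sketch below shows that mapping. `build_run_request` and `EXTRA_FIELDS` are hypothetical names for illustration, not part of the Recast API. Rejecting an extra field that the run type does not use is a design choice made here so that misrouted settings fail early.

```python
from typing import Optional, Sequence

# Extra `form` fields each run type accepts (from the Run Type Reference).
EXTRA_FIELDS = {
    "model_run": set(),
    "parameter_recovery": set(),
    "stability_loop": {"n_holdout", "days_per_holdout"},
    "holdout": {"holdout_days"},
}


def build_run_request(
    client: str,
    run_type: str,
    priors_url: str,
    clean_data: str,
    n_holdout: Optional[int] = None,
    days_per_holdout: Optional[int] = None,
    holdout_days: Optional[Sequence[int]] = None,
) -> dict:
    """Assemble a POST /v1/runs body (hypothetical helper, not part of the API)."""
    if run_type not in EXTRA_FIELDS:
        raise ValueError(f"unknown run_type {run_type!r}")
    form = {
        "run_type": run_type,
        "google_sheet_priors_url": priors_url,
        "clean_data": clean_data,
    }
    extras = {
        "n_holdout": n_holdout,
        "days_per_holdout": days_per_holdout,
        "holdout_days": list(holdout_days) if holdout_days is not None else None,
    }
    for name, value in extras.items():
        if value is None:
            continue  # optional field not supplied
        if name not in EXTRA_FIELDS[run_type]:
            # Fail loudly rather than send a setting this run type does not use.
            raise ValueError(f"{name} does not apply to run_type {run_type!r}")
        form[name] = value
    # Only `client` sits at the top level; all run settings go inside `form`.
    return {"client": client, "form": form}
```

The returned dict can be passed directly as `json=` to `requests.post`.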
---

## Run Type Reference

| `run_type` | What it does | Extra inputs |
|---|---|---|
| `model_run` | Standard model training run | None |
| `parameter_recovery` | Tests that the model can recover known parameters | None |
| `stability_loop` | Runs the model repeatedly across holdout periods to assess stability | `n_holdout`, `days_per_holdout` |
| `holdout` | Trains the model holding out specific numbers of days to evaluate out-of-sample performance | `holdout_days` (array of integers) |

---

## Request Structure

The create endpoint uses `application/json`:

```
POST /v1/runs
Content-Type: application/json
Authorization: Bearer {PAT}

{
  "client": "client-slug",
  "form": {
    "run_type": "model_run",
    "google_sheet_priors_url": "https://docs.google.com/spreadsheets/d/...",
    "clean_data": "date,channel_1,...\n2024-1-1,1000,...\n"
  }
}
```

**Top-level fields:**

| Field | Required | Type | Description |
|---|---|---|---|
| `client` | Yes | string | Client slug (visible in the app URL) |
| `form` | Yes | object | All run configuration settings |

**Required fields inside `form`:**

| Field | Type | Description |
|---|---|---|
| `run_type` | string | `model_run`, `parameter_recovery`, `stability_loop`, or `holdout` |
| `google_sheet_priors_url` | string | URL to the prior settings Google Sheet |
| `clean_data` | string | Raw CSV text (UTF-8) with `\n` line endings |

**Optional fields for `stability_loop` (inside `form`):**

| Field | Type | Description |
|---|---|---|
| `n_holdout` | integer | Number of stability loop periods |
| `days_per_holdout` | integer | Days to hold out per period |

**Optional fields for `holdout` (inside `form`):**

| Field | Type | Description |
|---|---|---|
| `holdout_days` | array of integers | Holdout day counts to test (e.g. `[14, 28, 42]`) |

---

## Clean Data Format

The clean data CSV must include a `date` column in `YYYY-M-DD` format. Column order does not matter.
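A pre-flight check for the requirement above can catch a missing or malformed `date` column before a run is queued. This is a minimal sketch: `prepare_clean_data` is a hypothetical helper, and the date test is deliberately lenient (four-digit year, month and day with or without zero-padding) because the document's own samples use unpadded months such as `2024-2-11`. Whether the API rejects padded dates is an assumption this sketch does not check.

```python
import csv
import datetime
import io


def prepare_clean_data(raw: str) -> str:
    """Normalize line endings and sanity-check the `date` column (hypothetical helper)."""
    # Convert Windows (\r\n) and classic Mac (\r) line endings to \n.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or "date" not in reader.fieldnames:
        raise ValueError("clean data must include a 'date' column")
    for row_no, row in enumerate(reader, start=1):
        value = row["date"] or ""
        parts = value.split("-")
        try:
            # Lenient check: four-digit year, then month and day with or
            # without zero-padding (an assumption; see the lead-in).
            if len(parts) != 3 or len(parts[0]) != 4:
                raise ValueError
            year, month, day = (int(p) for p in parts)
            datetime.date(year, month, day)  # rejects e.g. month 13
        except ValueError:
            raise ValueError(f"data row {row_no}: bad date {value!r}") from None
    return text
```

Because columns are read by name through `csv.DictReader`, the check does not depend on column order.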
When embedded in the JSON body, the string must use `\n` line breaks (the download endpoint returns a standard CSV).

```
date,facebook_prospecting,facebook_retargeting,search_non_branded,linear_tv,search_branded,total_sales,brand_awareness,price
2024-2-11,15279,3076,746,0,658,62756.94904,0.421,145
2024-2-12,18728,3642,829,0,655,61202.89663,0.421,145
```

**Column types:**
- `date` — required column in `YYYY-M-DD` format
- Spend channels — daily spend per channel (e.g., `facebook_prospecting`, `linear_tv`, `mailers`)
- KPI columns — the outcome variable(s) the model learns (e.g., `total_sales`, `amazon_revenue`, `acquisition`)
- Context variables — external factors affecting marketing effectiveness (e.g., `brand_awareness`, `price`)

**Line endings:** Read the CSV as text and normalize `\r\n` → `\n` before embedding it in the JSON string.

---

## Response Structure

### POST (create) — HTTP 201

Returns only the new run's integer ID:

```json
{ "id": 2786 }
```

Use this `id` for all subsequent show, poll, and download requests.

### GET (show) — HTTP 200

```json
{
  "id": 2371,
  "status": "waiting | processing | success | aborted | error",
  "run_type": "model_run",
  "admin_url": "https://app.getrecast.com/admin/opman/fleet_executions/123",
  "errors": [],
  "form": {
    "run_type": "model_run",
    "google_sheet_priors_url": "https://docs.google.com/spreadsheets/d/abc/edit",
    "clean_data": "date,channel_1,...\n2024-1-1,1000,...\n",
    "holdout_days": [14, 28],
    "n_holdout": 6,
    "days_per_holdout": 30
  },
  "created_at": "ISO8601",
  "updated_at": "ISO8601"
}
```

`admin_url` links to the run in the admin UI. `errors` is an array of error message strings; check it when `status` is `"error"` or `"aborted"`. `form` reflects what was submitted. Only fields relevant to the run type will be populated (the example above lists every possible `form` field).
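The show response fields described above can be interpreted in one place: whether the run is finished, and, for `"error"` or `"aborted"`, what the `errors` array says. `describe_run` is a hypothetical helper. It only reads the fields documented above (`id`, `status`, `errors`, `admin_url`) and makes no HTTP calls.

```python
from typing import Any, Mapping, Tuple

IN_PROGRESS = ("waiting", "processing")
FAILED = ("error", "aborted")


def describe_run(result: Mapping[str, Any]) -> Tuple[bool, str]:
    """Return (finished, message) for a GET /v1/runs/{id} body (hypothetical helper)."""
    status = result["status"]
    if status in IN_PROGRESS:
        return False, f"Run {result['id']} is still {status}"
    message = f"Run {result['id']} finished: {status}"
    if status in FAILED:
        # `errors` is a list of message strings; it may be empty.
        errors = result.get("errors") or []
        message += " | errors: " + ("; ".join(errors) if errors else "(none reported)")
    if result.get("admin_url"):
        # Link to the run in the admin UI for follow-up.
        message += f" | {result['admin_url']}"
    return True, message
```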
### GET list — HTTP 200

```json
{
  "data": [
    {
      "id": 5285,
      "status": "aborted",
      "run_type": "parameter_recovery",
      "created_at": "ISO8601",
      "updated_at": "ISO8601"
    }
  ],
  "pagination": {
    "page": 1,
    "per_page": 25,
    "total_pages": 3,
    "total_count": 52
  }
}
```

### GET download — text/csv

`GET /v1/runs/{id}/downloads/clean_data` returns the clean data CSV the run was trained on.

---

## Rate Limits

- **100 runs per day**: exceeding this returns HTTP 429
- **20 runs per minute**: exceeding this returns HTTP 429

When submitting multiple runs in a loop, pause between submissions to stay under the per-minute limit. At 20 runs/minute the limit allows one run every 3 seconds, so a 4-second pause leaves some headroom.

---

## Recommended Workflow

### Step 1: Prepare inputs

Make sure the clean data CSV is available locally. Confirm the client slug (visible in the app URL) and have the Google Sheet priors URL ready.

### Step 2: Submit the run

```python
import os

import requests

BASE_URL = "https://api.getrecast.com"
PAT = os.environ["API_PAT"]

# Read CSV and normalize to \n line endings (required inside the JSON string)
with open("clean_data.csv", "r", newline="", encoding="utf-8") as f:
    clean_data = f.read().replace("\r\n", "\n")

resp = requests.post(
    f"{BASE_URL}/v1/runs",
    headers={"Authorization": f"Bearer {PAT}", "Content-Type": "application/json"},
    json={
        "client": "my-client",
        "form": {
            "run_type": "model_run",
            "google_sheet_priors_url": "https://docs.google.com/spreadsheets/d/...",
            "clean_data": clean_data,
        },
    },
)
if resp.status_code != 201:
    raise RuntimeError(f"Failed: HTTP {resp.status_code}: {resp.text}")
run_id = resp.json()["id"]
```

### Step 3: Poll for completion

Poll until `status` is not `"waiting"` or `"processing"`.
Both mean the run is still in progress:

```python
import time

timeout = 90 * 60  # 90 minutes, in seconds
start = time.time()
while True:
    if time.time() - start > timeout:
        raise TimeoutError("Run timed out")
    poll = requests.get(
        f"{BASE_URL}/v1/runs/{run_id}",
        headers={"Authorization": f"Bearer {PAT}"},
    )
    if poll.status_code != 200:
        raise RuntimeError(f"HTTP {poll.status_code}: {poll.text}")
    result = poll.json()
    print(f"Status: {result['status']}")
    if result["status"] not in ("waiting", "processing"):
        break
    time.sleep(60)

# Surface diagnostics for any non-success terminal state
if result["status"] in ("error", "aborted"):
    print("Errors:", result.get("errors"))
```

### Step 4 (optional): Download clean data

```python
import io

import pandas as pd

csv_resp = requests.get(
    f"{BASE_URL}/v1/runs/{run_id}/downloads/clean_data",
    headers={"Authorization": f"Bearer {PAT}"},
)
if csv_resp.status_code != 200:
    raise RuntimeError(f"HTTP {csv_resp.status_code}: {csv_resp.text}")
df = pd.read_csv(io.StringIO(csv_resp.text))
```

---

## Common Scenarios with Full Examples

### Scenario 1: Standard model run

```python
import os
import time

import requests

BASE_URL = "https://api.getrecast.com"
PAT = os.environ["API_PAT"]
HEADERS = {"Authorization": f"Bearer {PAT}", "Content-Type": "application/json"}


def check(resp, expected=200):
    """Raise with the response body if the status code is not the expected one."""
    if resp.status_code != expected:
        raise RuntimeError(f"HTTP {resp.status_code}: {resp.text}")
    return resp


# Read CSV and normalize to \n line endings
with open("clean_data.csv", "r", newline="", encoding="utf-8") as f:
    clean_data = f.read().replace("\r\n", "\n")

resp = check(requests.post(
    f"{BASE_URL}/v1/runs",
    headers=HEADERS,
    json={
        "client": "my-client",
        "form": {
            "run_type": "model_run",
            "google_sheet_priors_url": "https://docs.google.com/spreadsheets/d/abc123",
            "clean_data": clean_data,
        },
    },
), expected=201)
run_id = resp.json()["id"]
print(f"Submitted run {run_id}")

# Poll every 60 seconds for up to 90 minutes
timeout = 90 * 60
start = time.time()
while True:
    if time.time() - start > timeout:
        raise TimeoutError(f"Timed out on run {run_id}")
    result = check(requests.get(f"{BASE_URL}/v1/runs/{run_id}", headers=HEADERS)).json()
    print(f"Status: {result['status']}")
    if result["status"] not in ("waiting", "processing"):
        break
    time.sleep(60)

print(f"Run {run_id} finished: {result['status']}")
if result["status"] in ("error", "aborted"):
    print("Errors:", result.get("errors"))
```

### Scenario 2: Stability loop with parameters

Scenarios 2 through 5 reuse the imports, `BASE_URL`, `HEADERS`, and `check` from Scenario 1.

```python
with open("clean_data.csv", "r", newline="", encoding="utf-8") as f:
    clean_data = f.read().replace("\r\n", "\n")

resp = check(requests.post(
    f"{BASE_URL}/v1/runs",
    headers=HEADERS,
    json={
        "client": "my-client",
        "form": {
            "run_type": "stability_loop",
            "google_sheet_priors_url": "https://...",
            "clean_data": clean_data,
            "n_holdout": 6,
            "days_per_holdout": 30,
        },
    },
), expected=201)
print(f"Stability loop submitted: {resp.json()['id']}")
```

### Scenario 3: Holdout run across multiple day counts

```python
with open("clean_data.csv", "r", newline="", encoding="utf-8") as f:
    clean_data = f.read().replace("\r\n", "\n")

resp = check(requests.post(
    f"{BASE_URL}/v1/runs",
    headers=HEADERS,
    json={
        "client": "my-client",
        "form": {
            "run_type": "holdout",
            "google_sheet_priors_url": "https://...",
            "clean_data": clean_data,
            "holdout_days": [14, 28, 42, 56],
        },
    },
), expected=201)
print(f"Holdout run submitted: {resp.json()['id']}")
```

### Scenario 4: Submit multiple clients in sequence

```python
runs = [
    {"client": "client-a", "priors_url": "https://...", "clean_data": "client_a.csv"},
    {"client": "client-b", "priors_url": "https://...", "clean_data": "client_b.csv"},
    {"client": "client-c", "priors_url": "https://...", "clean_data": "client_c.csv"},
]

submitted = []
for run in runs:
    with open(run["clean_data"], "r", newline="", encoding="utf-8") as f:
        clean_data = f.read().replace("\r\n", "\n")
    resp = check(requests.post(
        f"{BASE_URL}/v1/runs",
        headers=HEADERS,
        json={
            "client": run["client"],
            "form": {
                "run_type": "model_run",
                "google_sheet_priors_url": run["priors_url"],
                "clean_data": clean_data,
            },
        },
    ), expected=201)
    run_id = resp.json()["id"]
    submitted.append(run_id)
    print(f"{run['client']}: submitted run {run_id}")
    time.sleep(4)  # Stay under the 20 runs/minute rate limit

# Poll all to completion
for run_id in submitted:
    start = time.time()
    while True:
        if time.time() - start > 90 * 60:
            print(f"Timed out: run {run_id}")
            break
        result = check(requests.get(f"{BASE_URL}/v1/runs/{run_id}", headers=HEADERS)).json()
        if result["status"] not in ("waiting", "processing"):
            print(f"Run {run_id}: {result['status']}")
            if result["status"] in ("error", "aborted"):
                print("  Errors:", result.get("errors"))
            break
        time.sleep(60)
```

### Scenario 5: Download the clean data for a run

```python
import io

import pandas as pd

# Assumes run_id holds the ID of an existing run
csv_resp = check(requests.get(
    f"{BASE_URL}/v1/runs/{run_id}/downloads/clean_data",
    headers=HEADERS,
))
df = pd.read_csv(io.StringIO(csv_resp.text))
df.to_csv(f"run_{run_id}_clean_data.csv", index=False)
print(f"Downloaded {len(df)} rows, {len(df.columns)} columns")
```

### Scenario 6: R — standard model run

```r
library(httr2)
library(jsonlite)

BASE_URL <- "https://api.getrecast.com"
PAT <- Sys.getenv("API_PAT")

parse <- \(resp) resp |> resp_body_string() |> fromJSON(simplifyVector = FALSE)

check <- function(resp, expected = 200) {
  if (resp_status(resp) != expected) {
    body <- tryCatch(resp_body_string(resp), error = \(e) "(no body)")
    stop(sprintf("HTTP %d: %s", resp_status(resp), body))
  }
  resp
}

# Read CSV and normalize to \n line endings (required inside the JSON string)
clean_data <- paste(readLines("clean_data.csv"), collapse = "\n")

resp <- request(BASE_URL) |>
  req_url_path_append("v1", "runs") |>
  req_auth_bearer_token(PAT) |>
  req_error(is_error = \(resp) FALSE) |>
  req_body_json(list(
    client = "my-client",
    form = list(
      run_type = "model_run",
      google_sheet_priors_url = "https://docs.google.com/spreadsheets/d/...",
      clean_data = clean_data
    )
  ), auto_unbox = TRUE) |>
  req_perform() |>
  check(expected = 201)

run_id <- parse(resp)$id
cat(sprintf("Submitted run %d\n", run_id))

timeout <- 90 * 60
started_at <- proc.time()["elapsed"]
repeat {
  if ((proc.time()["elapsed"] - started_at) > timeout)
    stop(sprintf("Timed out on run %d", run_id))
  result <- request(BASE_URL) |>
    req_url_path_append("v1", "runs", run_id) |>
    req_auth_bearer_token(PAT) |>
    req_error(is_error = \(resp) FALSE) |>
    req_perform() |>
    check() |>
    parse()
  cat(sprintf("Status: %s\n", result$status))
  if (!result$status %in% c("waiting", "processing")) break
  Sys.sleep(60)
}

cat(sprintf("Run %d finished: %s\n", run_id, result$status))
if (result$status %in% c("error", "aborted"))
  message("Errors: ", paste(result$errors, collapse = "; "))
```

---

## Common Mistakes to Avoid

1. **Sending `multipart/form-data` instead of JSON** — This endpoint accepts a JSON body. Do NOT use `data=` + `files=` in Python requests or `req_body_multipart()` in httr2. Use `json=` in requests and `req_body_json()` in httr2.
2. **Omitting the `form` wrapper** — All run-specific fields (`run_type`, `google_sheet_priors_url`, `clean_data`, and any optional fields) must be nested inside a `form` object. Only `client` goes at the top level.
3. **Using the client name instead of the client slug** — The `client` field expects the URL slug (e.g., `acme-corp`), not the display name (e.g., "Acme Corp").
4. **Using `priors_url` instead of `google_sheet_priors_url`** — The full field name is `google_sheet_priors_url`. A shortened name like `priors_url` will silently fail or be ignored.
5. **Using `holdout_test` instead of `holdout`** — The correct enum value is `holdout`; `holdout_test` will be rejected.
6. **Polling only for `"processing"` and missing `"waiting"`** — Runs start in a `"waiting"` state before they begin processing. Poll until status is not in `("waiting", "processing")`. Checking only for `"processing"` makes the code exit prematurely while a run is still waiting.
7. **Not polling long enough** — Model runs can take 20–60 minutes. Use a timeout of at least 90 minutes, and poll every 60 seconds (not every 30 seconds as with the faster-running optimizer/reporter APIs).
8. **Exceeding the per-minute rate limit when batch-submitting** — At 20 runs/minute, space out submissions with `time.sleep(4)` (or `Sys.sleep(4)` in R) between POSTs to stay safely under the limit.
9. **Wrong line endings in the CSV string** — The `clean_data` JSON field must use `\n` line breaks. On Windows, CSV files often have `\r\n`.
   Normalize before embedding:

   ```python
   with open("clean_data.csv", "r", newline="", encoding="utf-8") as f:
       clean_data = f.read().replace("\r\n", "\n")
   ```

   In R: `clean_data <- paste(readLines("clean_data.csv"), collapse = "\n")`

10. **Not checking `errors` on failure** — When `status` is `"error"` or `"aborted"`, the `errors` array in the show response contains diagnostic messages. Always log or print this array when a run ends in a non-success state.
11. **Expecting `status` in the POST response** — The POST response returns only `{"id": N}`, without a status. Call the show endpoint to get the status.

---

## Code Generation Rules

**General:**
- Load the PAT from an environment variable: Python uses `os.environ["API_PAT"]`, R uses `Sys.getenv("API_PAT")`. Explain how the user can set this environment variable themselves.
- NEVER print, log, or display the token.
- Base URL: `https://api.getrecast.com`
- All run endpoints are under `/v1/runs`.
- Auth: Bearer token in the Authorization header.
- Use `application/json` for the POST; do NOT use multipart/form-data.
- Wrap all run settings inside a `form` object; `client` goes at the top level alongside `form`.
- Include error handling that shows the response body on non-200/201 responses.
- Poll for completion with a 60-second interval and 90-minute timeout.
- Poll until status is not in `("waiting", "processing")`; both mean the run is still in progress.
- Log the `errors` array when status is `"error"` or `"aborted"`.
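The polling rules above (60-second interval, 90-minute timeout, stop once the status leaves `waiting`/`processing`) can be packaged as a reusable function. This is a sketch, not part of any Recast client library: `wait_for_run` is a hypothetical name, and `fetch` is any zero-argument callable that returns the parsed show response. Injecting `sleep` and `clock` lets the loop be tested without real waiting. `time.monotonic` is used as the default clock because, unlike `time.time`, it is unaffected by system clock adjustments.

```python
import time
from typing import Any, Callable, Mapping


def wait_for_run(
    fetch: Callable[[], Mapping[str, Any]],
    interval: float = 60,
    timeout: float = 90 * 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, Any]:
    """Poll until the run leaves waiting/processing (hypothetical helper)."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        # Any status other than these two is terminal.
        if result["status"] not in ("waiting", "processing"):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"run still {result['status']} after {timeout} s")
        sleep(interval)
```

With the Scenario 1 helpers in scope, a call might look like `result = wait_for_run(lambda: check(requests.get(f"{BASE_URL}/v1/runs/{run_id}", headers=HEADERS)).json())`, followed by logging `result.get("errors")` when the status is `"error"` or `"aborted"`.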
**Python specific:**
- Use `requests`
- Use `json={}` for the request body
- Read the CSV in text mode, `open("file.csv", "r", newline="", encoding="utf-8")`, then `.replace("\r\n", "\n")`
- Use f-strings for URL construction

**R specific:**
- Use `httr2` and `jsonlite`
- Use `req_body_json(..., auto_unbox = TRUE)` for the JSON body. With `auto_unbox = TRUE`, a one-element `holdout_days` vector would be sent as a scalar, so wrap it in `I()` (e.g. `holdout_days = I(c(14))`) to keep it a JSON array
- Normalize line endings: `clean_data <- paste(readLines("file.csv"), collapse = "\n")`
- Use pipe `|>` syntax
- Parse responses with `resp_body_string() |> fromJSON(simplifyVector = FALSE)`
- Use `req_error(is_error = \(resp) FALSE)` to handle errors manually

---

## API Reference

### Endpoints

| Method | Path | Purpose |
|--------|------|---------|
| POST | `/v1/runs` | Launch a model run (application/json) |
| GET | `/v1/runs/{id}` | Show run status, metadata, and form (no `data` wrapper) |
| GET | `/v1/runs` | List runs (paginated: `?page=1&per_page=25`; items in `data`) |
| GET | `/v1/runs/{id}/downloads/clean_data` | Download the clean data CSV for a run (`text/csv`) |

### Create request body (application/json)

**Top level:**

| Field | Required | Type | Notes |
|---|---|---|---|
| `client` | Yes | string | Client slug (visible in the app URL) |
| `form` | Yes | object | All run configuration (see below) |

**Inside `form`:**

| Field | Required | Type | Notes |
|---|---|---|---|
| `run_type` | Yes | string | `model_run` \| `parameter_recovery` \| `stability_loop` \| `holdout` |
| `google_sheet_priors_url` | Yes | string | URL to the Google Sheet prior settings |
| `clean_data` | Yes | string | Raw CSV text (UTF-8) with `\n` line endings |
| `n_holdout` | Stability loop only | integer | Number of holdout periods |
| `days_per_holdout` | Stability loop only | integer | Days per holdout period |
| `holdout_days` | Holdout only | array of integers | Day counts to test |

### Create response (HTTP 201)

```json
{ "id": 2786 }
```

### Show response (HTTP 200, no wrapper)

```json
{
  "id": 2371,
  "status": "waiting | processing | success | aborted | error",
  "run_type": "model_run",
  "admin_url": "https://app.getrecast.com/admin/opman/fleet_executions/123",
  "errors": [],
  "form": {
    "run_type": "model_run",
    "google_sheet_priors_url": "https://docs.google.com/spreadsheets/d/abc/edit",
    "clean_data": "date,channel_1,...\n2024-1-1,1000,...\n",
    "holdout_days": [14, 28],
    "n_holdout": 6,
    "days_per_holdout": 30
  },
  "created_at": "ISO8601",
  "updated_at": "ISO8601"
}
```

### List response (HTTP 200)

```json
{
  "data": [
    {
      "id": 5285,
      "status": "aborted",
      "run_type": "parameter_recovery",
      "created_at": "ISO8601",
      "updated_at": "ISO8601"
    }
  ],
  "pagination": {
    "page": 1,
    "per_page": 25,
    "total_pages": 3,
    "total_count": 52
  }
}
```

---

## Glossary

| Operator says | Action |
|---|---|
| "Run a model", "train the model", "standard run" | `run_type: "model_run"` |
| "Parameter recovery", "test recovery" | `run_type: "parameter_recovery"` |
| "Stability loop", "stability test" | `run_type: "stability_loop"` + `n_holdout` + `days_per_holdout` |
| "Holdout", "backtest", "out of sample" | `run_type: "holdout"` + `holdout_days` |
| "Client name", "which client" | The client slug from the app URL |
| "Priors", "prior settings", "prior sheet" | `google_sheet_priors_url` (full Google Sheet URL) |
| "Clean data", "the CSV", "data file" | `clean_data` (raw CSV string embedded in JSON) |
| "Check status", "is it done" | `GET /v1/runs/{id}` |
| "List runs", "what's running" | `GET /v1/runs` |
| "Download the data", "get the clean data" | `GET /v1/runs/{id}/downloads/clean_data` |

## Resources

- https://operators.getrecast.com/docs/run-launching-api-endpoint
- https://docs.getrecast.com
- https://app.getrecast.com/api-docs