additional_features
["Manages ruleset fields (rule_id, priority, status) for governance and review", "Enforces the prescribed rule-generation order for consistency"]
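The managed fields and the fixed ordering can be pictured with a minimal Python sketch. Only `rule_id`, `priority`, and `status` come from the feature list above. The `Rule` class, its other fields, the status values, and the sort-by-priority convention are illustrative assumptions, because the prescribed rule-generation order itself is not specified in this record.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    rule_id: str     # field named in the feature list
    priority: int    # field named in the feature list; assumed: lower value = earlier
    status: str      # field named in the feature list; values like "draft" are assumed
    variable: str    # hypothetical field
    action: str      # hypothetical field
    rationale: str   # hypothetical field


def review_order(rules):
    """Sort rules by priority, breaking ties by rule_id, for a deterministic review order."""
    return sorted(rules, key=lambda r: (r.priority, r.rule_id))


# Illustrative rules only; not output produced by the GPT.
rules = [
    Rule("R002", 2, "draft", "region", "merge_rare_levels", "illustrative only"),
    Rule("R001", 1, "approved", "income", "flag_missing", "illustrative only"),
]
print([r.rule_id for r in review_order(rules)])  # ['R001', 'R002']
```

Sorting on a `(priority, rule_id)` tuple keeps the order stable even when two rules share a priority, which helps when reviewers compare successive drafts.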
example_commands
["My goal is 'factors influencing policy satisfaction' and the method is logistic regression. Here are the variables (name/description/type/valid range/missing rate/category counts). Create a draft preprocessing ruleset in JSON and CSV.", "Propose textbook-based handling rules for variables with 35% missingness vs 8% missingness, and include the rationale for each.", "Create a rule to merge categorical levels below 2% frequency and include the rationale for preventing perfect separation in logis
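The rare-level merge requested in the third example command can be sketched in plain Python. The 2% threshold comes from that command. The function name `merge_rare_levels`, the pooled label `"Other"`, and the sample data are assumptions. Pooling sparse levels reduces the number of near-empty dummy columns, which are a common source of (quasi-)perfect separation in logistic regression.

```python
from collections import Counter


def merge_rare_levels(values, threshold=0.02, other_label="Other"):
    """Replace levels whose relative frequency is below `threshold` with `other_label`."""
    if not values:
        return []
    n = len(values)
    counts = Counter(values)
    rare = {level for level, count in counts.items() if count / n < threshold}
    return [other_label if v in rare else v for v in values]


# Illustrative data: "C" appears in 1 of 100 records (1%), below the 2% threshold.
data = ["A"] * 60 + ["B"] * 39 + ["C"] * 1
print(Counter(merge_rare_levels(data)))  # Counter({'A': 60, 'B': 39, 'Other': 1})
```

In this toy case the pooled level is itself still below 2%, so a real ruleset would need an explicit follow-up rule (for example, merging it into a neighboring level) rather than leaving the choice to the execution agent.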
gpt_id
g-692c2dd44cdc8191b5b728e93e559980
ideal_use_cases
["Writing an auditable, reproducible preprocessing plan for survey/administrative data", "Choosing missing-data and outlier-handling rules aligned with the intended method (regression, logistic regression, PCA, clustering, etc.)", "Merging sparse categories to prevent dummy-variable explosion or perfect separation in logistic models", "Exporting a machine-readable ruleset (JSON/CSV) for downstream execution by another agent"]
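For the JSON/CSV hand-off, one way to serialize the same rule list in both formats uses only Python's standard `json` and `csv` modules. The rule fields beyond `rule_id`, `priority`, and `status`, and the helper names `to_json` and `to_csv`, are hypothetical; the actual export schema is not specified in this record.

```python
import csv
import io
import json

# Illustrative rule; fields other than rule_id/priority/status are assumptions.
rules = [
    {"rule_id": "R001", "priority": 1, "status": "draft",
     "variable": "region", "action": "merge_levels_below_2pct",
     "rationale": "Avoid sparse dummies and perfect separation"},
]


def to_json(rules):
    """Serialize the rule list as pretty-printed JSON, keeping non-ASCII text readable."""
    return json.dumps(rules, ensure_ascii=False, indent=2)


def to_csv(rules):
    """Serialize the rule list as CSV with one row per rule; header taken from the first rule."""
    if not rules:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rules[0].keys()))
    writer.writeheader()
    writer.writerows(rules)
    return buf.getvalue()


print(to_csv(rules))
```

JSON preserves types (an integer `priority` stays an integer), while CSV turns every value into text, so a downstream agent reading the CSV would have to convert columns such as `priority` back itself.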
limitations
["Does not manipulate data or run code (rules generation only)", "Does not invent/guess unseen statistics or thresholds; uses 'no data provided' when information is missing", "Output quality depends heavily on the provided metadata (types, missing rates, distributions, valid ranges, category counts, etc.)"]
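The 'no data provided' convention can be illustrated with a small helper that marks absent metadata explicitly instead of estimating it. The field names and the `describe_variable` helper are assumptions for illustration, not part of the GPT's documented output format.

```python
NO_DATA = "no data provided"

# Hypothetical metadata fields that a rule might depend on.
FIELDS = ("type", "valid_range", "missing_rate", "category_counts")


def describe_variable(meta, fields=FIELDS):
    """Copy known metadata; mark every absent or None field as NO_DATA rather than guessing."""
    return {f: meta[f] if meta.get(f) is not None else NO_DATA for f in fields}


print(describe_variable({"type": "categorical", "missing_rate": 0.08}))
```

Making the gap visible in the rule metadata lets a reviewer see which thresholds still need real statistics before the ruleset is executed.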
target_users
["Public-policy and government researchers", "Data analysts / statisticians working with survey or administrative data", "Users who need a ruleset to hand off to an execution agent (e.g., CleaningGPT)"]