DataPrepCodeGen – SPSS/R Data Preparation Code Generator

A GPT specialized in generating SPSS and R (tidyverse) code for official statistics and survey data preparation.

Overview
Version: v1.0.0
Created: 2025-12-11
Updated: 2025-12-11
policy, statistics, research-assistant
dataprepcodegen, dataprep-gpt
Key functions
  • Generate SPSS data preparation syntax based on survey/statistical metadata and codebooks
  • Translate SPSS preparation syntax into equivalent R (tidyverse) data preparation code (excluding analytical code)
  • Create variable labels, value labels, missing value definitions, and display formats for official statistics datasets
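As a rough illustration of the labelling and missing-value functions listed above, the R sketch below attaches a variable label, value labels, and a user-defined missing code to a small data frame. The dataset, variable names, and codes (`sex`, `age`, 9, 999) are hypothetical, and using `haven::labelled_spss()` is one possible way to mirror SPSS VARIABLE LABELS, VALUE LABELS, and MISSING VALUES, not necessarily the code this GPT produces.

```r
library(dplyr)
library(haven)

# Hypothetical survey extract; names and codes are illustrative only
survey <- tibble(
  id  = 1:5,
  sex = c(1, 2, 2, 9, 1),
  age = c(34, 51, 999, 27, 45)
)

prepared <- survey %>%
  mutate(
    # Roughly mirrors VARIABLE LABELS, VALUE LABELS, and MISSING VALUES sex (9)
    sex = labelled_spss(
      sex,
      labels    = c("Male" = 1, "Female" = 2, "No answer" = 9),
      na_values = 9,
      label     = "Sex of respondent"
    ),
    # Roughly mirrors MISSING VALUES age (999), applied here as a recode to NA
    age = na_if(age, 999)
  )
```

Keeping code 9 as a labelled user-missing value preserves the SPSS distinction between "declared missing" and system-missing, while recoding 999 to `NA` is the simpler choice when that distinction is not needed downstream.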
Technical details
_id: dataprepcodegen
gpt_id: dataprepcodegen
viz1: public
viz2: show_url
language: en
Other fields
additional_features
  • Interprets metadata schemas (variable roles, semantic types, groups) to draft type conversions, grouped sums/means, and index variables
  • Adds concise Korean comments to SPSS and R code to document each preparation step for official statistics workflows
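The metadata-driven drafting described above could look roughly like the following R sketch. The metadata layout (`variable`, `role`, `group` columns), the variable names, and the choice of `na.rm = TRUE` are all assumptions made for illustration; a real metadata schema may be organized differently.

```r
library(dplyr)

# Hypothetical metadata: one row per variable, with a role and an item group
metadata <- tibble(
  variable = c("resp_id", "q1_a", "q1_b", "q1_c", "wt"),
  role     = c("id", "item", "item", "item", "weight"),
  group    = c(NA, "satisfaction", "satisfaction", "satisfaction", NA)
)

# Hypothetical raw data in which every column was read as character
survey <- tibble(
  resp_id = c("001", "002", "003"),
  q1_a    = c("4", "2", "5"),
  q1_b    = c("3", "2", NA),
  q1_c    = c("5", "1", "4"),
  wt      = c("1.2", "0.8", "1.0")
)

# Look up variable names from the metadata instead of hard-coding them
items        <- metadata %>% filter(group == "satisfaction") %>% pull(variable)
numeric_vars <- metadata %>% filter(role %in% c("item", "weight")) %>% pull(variable)

prepared <- survey %>%
  # Type conversion for variables whose role implies a numeric type
  mutate(across(all_of(numeric_vars), as.numeric)) %>%
  # Grouped sum and mean across the items of one group (simple index variables)
  mutate(
    satisfaction_sum  = rowSums(across(all_of(items)), na.rm = TRUE),
    satisfaction_mean = rowMeans(across(all_of(items)), na.rm = TRUE)
  )
```

Whether item nonresponse should be ignored (`na.rm = TRUE`) or should propagate to the index is a methodological decision that the metadata or codebook would need to settle.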
example_commands
  • Use this survey metadata to generate SPSS VARIABLE LABELS, VALUE LABELS, and MISSING VALUES syntax.
  • Convert this SPSS data preparation script into R tidyverse code, excluding any analytical procedures.
  • Given these variable roles (id, weight, psu, strata), create a reusable SPSS preparation template.
  • From this codebook, draft SPSS FORMATS and any necessary type conversion code for income, age, and date variables.
  • Generate SPSS VARSTOCASES syntax to reshape these repeated-measures va
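For the VARSTOCASES request in the list above, a wide-to-long reshape in R is commonly written with `tidyr::pivot_longer()`. The sketch below assumes hypothetical repeated-measures columns named `score_t1` to `score_t3`; it shows the general shape of such a translation rather than the GPT's actual output.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide file: one row per respondent, one column per wave
wide <- tibble(
  id       = c(1, 2),
  score_t1 = c(10, 20),
  score_t2 = c(11, 21),
  score_t3 = c(12, 22)
)

long <- wide %>%
  pivot_longer(
    cols            = starts_with("score_t"),
    names_to        = "wave",                 # plays the role of an index variable
    names_prefix    = "score_t",              # strip the prefix so only the wave number remains
    names_transform = list(wave = as.integer),
    values_to       = "score"
  ) %>%
  arrange(id, wave)
```

The result has one row per respondent and wave, with `wave` stored as an integer so it can be sorted or filtered numerically.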
ideal_use_cases
  • Automating SPSS variable/value labeling and missing value declarations from questionnaires or metadata files
  • Refactoring legacy SPSS preparation scripts into cleaner, documented syntax
  • Producing parallel SPSS and R (tidyverse) preparation scripts for the same dataset to support migration
limitations
  • Does not generate or interpret statistical analysis code (e.g., regression, hypothesis tests); it focuses strictly on data preparation.
  • Does not perform file system operations, database connections, or OS-level commands.
target_users
  • Statisticians and data managers working with official statistics and survey microdata
  • Practitioners responsible for SPSS-based data cleaning and preparation
  • Analysts transitioning from SPSS to R who need parallel preparation scripts