Jeongmil Item Extraction – Precision Questionnaire Extractor

Precisely extracts questionnaire items and options from messy Excel sheets, automatically normalizing merged cells and producing a clean JSON structure.

Overview
survey, questionnaire, excel, data-preprocessing, normalization, merged-cells, schema, research-assistant, jeongmil-munhang, precision-questionnaire-extractor, survey-qna-extractor
Key functions
  • Automatically detects merged cells and fills values losslessly (horizontal/vertical/block merges)
  • Detects header zones (question code/text/options) and identifies the main data block
  • Infers table orientation (row_questions vs column_questions) and re-interprets when needed
  • Normalizes tables using forward fill to separate question/option fields
  • Builds question, sub-question (parent_question_id, item_id), and options[] structures
  • Applies rule-based patterns to represent skip/branching logic (routing/conditions)
  • Detects missing/extra questions by comparing expected vs extracted code lists
  • Flags sensitive expressions and wording to avoid, and assigns review flags (needs_review, etc.)
  • Outputs results in a survey JSON template (metadata/questions/extraction_report)
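Lossless merged-cell filling can be sketched as copying each merged range's anchor (top-left) value into every cell the range covers, which handles horizontal, vertical, and block merges with the same loop. This is a minimal sketch, not the tool's implementation: it assumes the sheet has already been read into a list of row lists and that merged ranges arrive as hypothetical `(min_row, min_col, max_row, max_col)` tuples with 0-based inclusive bounds (readers such as openpyxl report merged ranges with 1-based bounds, so a conversion step would be needed).

```python
from typing import Any, List, Optional, Tuple

Grid = List[List[Optional[Any]]]
# Hypothetical merge format: (min_row, min_col, max_row, max_col), 0-based, inclusive.
MergeRange = Tuple[int, int, int, int]


def fill_merged_cells(grid: Grid, merges: List[MergeRange]) -> Grid:
    """Return a copy of grid in which every cell of a merged range holds the
    value of that range's top-left (anchor) cell."""
    filled = [list(row) for row in grid]  # copy so the input grid is untouched
    for min_r, min_c, max_r, max_c in merges:
        anchor = grid[min_r][min_c]
        for r in range(min_r, max_r + 1):
            for c in range(min_c, max_c + 1):
                filled[r][c] = anchor
    return filled


if __name__ == "__main__":
    sheet = [
        ["Q1", "Satisfaction with service", "1", "Very satisfied"],
        [None, None, "2", "Satisfied"],
        [None, None, "3", "Dissatisfied"],
    ]
    # Q1's code and text cells were merged vertically across three option rows.
    merges = [(0, 0, 2, 0), (0, 1, 2, 1)]
    for row in fill_merged_cells(sheet, merges):
        print(row)
```

Returning a copy keeps the raw grid available, so a later step can still tell which cells were originally empty.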
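The forward-fill normalization and the question, sub-question, and options[] structure can be sketched together. This assumes a row_questions layout with four hypothetical columns (question code, question text, option code, option label) and a hypothetical `Q2_1` style code convention for sub-items. The field names parent_question_id, item_id, options, needs_review, metadata, questions, and extraction_report come from the description above, but their exact shape here is an assumption, not the tool's actual template.

```python
import json
import re
from typing import Any, Dict, List, Optional

Row = List[Optional[str]]

# Hypothetical convention: a sub-question code such as "Q2_1" means
# parent question "Q2", item "1".
SUB_CODE = re.compile(r"^(?P<parent>[A-Za-z]+\d+)_(?P<item>\w+)$")


def forward_fill(rows: List[Row], columns: List[int]) -> List[Row]:
    """Copy the last non-empty value downward in the given columns."""
    last: Dict[int, Optional[str]] = {c: None for c in columns}
    out: List[Row] = []
    for row in rows:
        row = list(row)  # copy so the caller's rows are not modified
        for c in columns:
            if row[c] not in (None, ""):
                last[c] = row[c]
            else:
                row[c] = last[c]
        out.append(row)
    return out


def build_survey(rows: List[Row]) -> Dict[str, Any]:
    """Group (code, text, option_code, option_label) rows into a survey dict."""
    filled = forward_fill(rows, columns=[0, 1])
    questions: Dict[str, Dict[str, Any]] = {}
    for code, text, opt_code, opt_label in filled:
        if code is None:
            continue  # rows before the first question code cannot be attached
        if code not in questions:
            m = SUB_CODE.match(code)
            questions[code] = {
                "question_id": code,
                "text": text,
                "parent_question_id": m.group("parent") if m else None,
                "item_id": m.group("item") if m else None,
                "options": [],
            }
        if opt_code is not None or opt_label is not None:
            questions[code]["options"].append({"code": opt_code, "label": opt_label})
    for q in questions.values():
        # In this sketch a question with no options is flagged for review;
        # open-ended items would need a separate rule.
        q["needs_review"] = not q["options"]
    return {
        "metadata": {"orientation": "row_questions"},
        "questions": list(questions.values()),
        "extraction_report": {"question_count": len(questions)},
    }


if __name__ == "__main__":
    rows = [
        ["Q1", "Age group", "1", "Under 30"],
        [None, None, "2", "30 or older"],
        ["Q2_1", "Trust in government", "1", "Low"],
        [None, None, "2", "High"],
    ]
    print(json.dumps(build_survey(rows), indent=2, ensure_ascii=False))
```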
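Flagging sensitive or to-be-avoided wording can be approximated with a categorized term list matched on word boundaries. The lexicon below and the `sensitive_categories` field are hypothetical illustrations; the tool's own lexicon and any flags beyond needs_review are not documented here.

```python
import re
from typing import Dict, List

# Hypothetical review lexicon: category -> terms. A real list would be
# maintained by survey methodologists; these entries are only examples.
SENSITIVE_TERMS: Dict[str, List[str]] = {
    "income": ["income", "salary", "debt"],
    "health": ["diagnosis", "disease", "medication"],
    "leading": ["don't you agree", "obviously"],
}


def flag_question(question: dict) -> dict:
    """Return a copy with sensitive_categories set and needs_review raised
    if any term matches the question text."""
    text = question.get("text") or ""
    hits: List[str] = []
    for category, terms in SENSITIVE_TERMS.items():
        for term in terms:
            # Word boundaries keep "debt" from matching inside "indebted".
            if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
                hits.append(category)
                break  # one hit is enough to record this category
    flagged = dict(question)
    flagged["sensitive_categories"] = hits
    flagged["needs_review"] = bool(question.get("needs_review")) or bool(hits)
    return flagged
```

A term list only surfaces candidates; the flag routes the item to a human rather than deciding anything itself.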
Technical details
_id: g-6890d378ceb08191ab29b9408453be71
gpt_id: g-6890d378ceb08191ab29b9408453be71
viz1: public
viz2: show_url
language: en
Other fields
additional_features
  • Pattern library for conditions/branching rules to structure routing logic
  • Terminology and unit/scale dictionaries to provide standardization hints
  • Schema-aligned output using a survey JSON template (metadata/questions/extraction_report)
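A pattern library for routing statements might map regular expressions onto a structured form, always keeping the original wording in routing.condition_raw and filling routing.parsed_logic only when a pattern matches, as one of the example commands describes. The pattern names, the regexes, and the parsed_logic shape below are hypothetical; they cover only simple English "If Qn = v, go to Qm" and "Skip to Qm" phrasings.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical pattern library: (rule name, compiled regex). First match wins,
# so the more specific conditional rule is listed before the unconditional one.
ROUTING_PATTERNS: List[Tuple[str, re.Pattern]] = [
    (
        "if_equals_goto",
        re.compile(
            r"if\s+(?P<source>Q\d+)\s*(?:=|is)\s*(?P<values>\d+(?:\s*(?:,|or)\s*\d+)*)"
            r"\s*,?\s*(?:go|skip)\s+to\s+(?P<target>Q\d+)",
            re.IGNORECASE,
        ),
    ),
    ("unconditional_goto", re.compile(r"(?:go|skip)\s+to\s+(?P<target>Q\d+)", re.IGNORECASE)),
]


def parse_routing(raw: str) -> Dict[str, Any]:
    """Keep the raw text; add parsed_logic only when a pattern matches."""
    parsed: Optional[Dict[str, Any]] = None
    for name, pattern in ROUTING_PATTERNS:
        m = pattern.search(raw)
        if not m:
            continue
        groups = m.groupdict()
        parsed = {"rule": name, "target": groups["target"].upper()}
        if "source" in groups:
            parsed["source"] = groups["source"].upper()
            parsed["values"] = [int(v) for v in re.findall(r"\d+", groups["values"])]
        break
    return {
        "condition_raw": raw,
        "parsed_logic": parsed,
        "needs_review": parsed is None,  # unparsed routing goes to a human
    }
```

Keeping condition_raw unconditionally means nothing is lost when a statement falls outside the pattern library.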
example_commands
  • Extract questions/options from this Excel while handling merged cells automatically, and output survey JSON.
  • Assume row_questions first; if it fails, re-interpret as column_questions and normalize again.
  • Build expected_question_list from question-code patterns and include a missing/extra question report.
  • Preserve routing text in routing.condition_raw, and parse it into routing.parsed_logic when possible.
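The "row_questions first, then column_questions" fallback can be sketched as a heuristic: read the first column as question codes, and if too few cells look like codes, transpose the grid and try the first row instead. The code regex and the 0.6 threshold are hypothetical choices for illustration, not values taken from the tool.

```python
import re
from typing import List, Optional, Tuple

Grid = List[List[Optional[str]]]

# Hypothetical question-code pattern, e.g. "Q1", "Q12", "Q5_1".
CODE = re.compile(r"^Q\d+(?:_\w+)?$", re.IGNORECASE)


def code_ratio(cells: List[Optional[str]]) -> float:
    """Share of non-empty cells that look like question codes."""
    values = [c.strip() for c in cells if c and c.strip()]
    if not values:
        return 0.0
    return sum(bool(CODE.match(v)) for v in values) / len(values)


def transpose(grid: Grid) -> Grid:
    """Swap rows and columns, padding ragged rows with None."""
    if not grid:
        return []
    width = max(len(row) for row in grid)
    padded = [list(row) + [None] * (width - len(row)) for row in grid]
    return [list(col) for col in zip(*padded)]


def detect_orientation(grid: Grid, threshold: float = 0.6) -> Tuple[str, Grid]:
    """Try row_questions first (codes down column 0); fall back to
    column_questions (codes across row 0) by transposing the grid."""
    if code_ratio([row[0] if row else None for row in grid]) >= threshold:
        return "row_questions", grid
    turned = transpose(grid)
    if code_ratio([row[0] if row else None for row in turned]) >= threshold:
        return "column_questions", turned
    return "unknown", grid  # neither reading fits; leave for human review
```

Returning the transposed grid lets the same downstream normalization run unchanged on either layout.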
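Building expected_question_list from question-code patterns and reporting missing or extra questions might look like the following. This sketch assumes, hypothetically, that main questions are numbered consecutively: every `Qn` found anywhere in the source cells (including routing references) defines the range from the smallest to the largest number, so numbering gaps also surface as missing items. The report keys are illustrative, not the tool's documented schema.

```python
import re
from typing import Dict, Iterable, List, Optional

# Hypothetical main-question code pattern: "Q" plus a number, e.g. "Q7".
# The trailing \b rejects sub-codes like "Q4_1", since "_" is a word character.
MAIN_CODE = re.compile(r"\bQ(\d+)\b")


def expected_question_list(source_cells: Iterable[Optional[str]]) -> List[str]:
    """Assume consecutive numbering: fill the range min..max of all
    Q-numbers seen in the source."""
    numbers = {int(n) for cell in source_cells if cell for n in MAIN_CODE.findall(cell)}
    if not numbers:
        return []
    return [f"Q{n}" for n in range(min(numbers), max(numbers) + 1)]


def missing_extra_report(expected: List[str], extracted: List[str]) -> Dict[str, List[str]]:
    """Compare expected codes with extracted codes, preserving list order."""
    expected_set, extracted_set = set(expected), set(extracted)
    return {
        "expected_question_list": expected,
        "missing_questions": [q for q in expected if q not in extracted_set],
        "extra_questions": [q for q in extracted if q not in expected_set],
    }
```

For example, a source mentioning Q1, Q2, and Q4 yields an expected list of Q1 through Q4, so Q3 is reported as missing even though it never appears.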
ideal_use_cases
  • Convert merged-cell, multi-header questionnaire Excel into structured question/option JSON
  • Generate a QA report for missing/extra questions compared to the source patterns
  • Organize matrix/sub-items into a parent-child hierarchy
  • Preserve and structure skip/branching statements for downstream routing design
  • Auto-flag potentially sensitive expressions to speed up human review
limitations
  • Highly complex layouts or image-based (scanned) inputs may increase needs_review flags
  • Does not invent questions/options not present in the source; uncertain structures are kept to a minimum and flagged
  • Branching/condition parsing quality depends on how clearly the source expresses routing rules
target_users
  • Survey/research operations staff (design & QA)
  • Data preprocessing/coding staff (codebook & variable dictionary)
  • Policy/statistics/academic survey analysts