Jeongmil Item Extraction – Precision Questionnaire Extractor

Precisely extracts questionnaire items and options from messy Excel sheets into a clean JSON structure, automatically normalizing merged cells along the way.

Overview

URL

https://chatgpt.com/g/g-6890d378ceb08191ab29b9408453be71-jeongmil-munhangcucul-precesion-questionnaire-extractor

Version

v1.0.0

Created

2025-12-14

Updated

2025-12-14

survey, questionnaire, excel, data-preprocessing, normalization, merged-cells, schema, research-assistant

jeongmil-munhang, precision-questionnaire-extractor, survey-qna-extractor

Key functions

Automatically detects merged cells (horizontal, vertical, and block merges) and fills their values losslessly
Detects header zones (question code/text/options) and identifies the main data block
Infers table orientation (row_questions vs column_questions) and re-interprets when needed
Normalizes tables using forward fill to separate question/option fields
Builds question, sub-question (parent_question_id, item_id), and options[] structures
Applies rule-based patterns to represent skip/branching logic (routing/conditions)
Detects missing/extra questions by comparing expected vs extracted code lists
Flags sensitive expressions and expressions to avoid, and assigns review flags (needs_review, etc.)
Outputs results in a survey JSON template (metadata/questions/extraction_report)
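
The merged-cell handling and forward fill listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the extractor's actual implementation: the sheet is assumed to be already loaded as a list of rows, and merged ranges are assumed to arrive as hypothetical 0-based, inclusive (first_row, first_col, last_row, last_col) tuples (for example, converted from the merged ranges an Excel reader such as openpyxl reports). fill_merged_cells and forward_fill are illustrative names.

```python
from typing import Any, List, Optional, Tuple

# Hypothetical representation: 0-based, inclusive (first_row, first_col, last_row, last_col)
MergedRange = Tuple[int, int, int, int]
Grid = List[List[Optional[Any]]]


def fill_merged_cells(grid: Grid, merged: List[MergedRange]) -> Grid:
    """Return a copy of grid in which every cell of a merged range holds the anchor value."""
    filled = [list(row) for row in grid]  # copy rows so the input grid is left untouched
    for r0, c0, r1, c1 in merged:
        anchor = grid[r0][c0]  # a merged range keeps its value in the top-left cell
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                filled[r][c] = anchor
    return filled


def forward_fill(values: List[Optional[Any]]) -> List[Optional[Any]]:
    """Replace each None with the most recent non-None value before it."""
    out: List[Optional[Any]] = []
    last: Optional[Any] = None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


if __name__ == "__main__":
    sheet = [
        ["Q1", "Satisfaction", None, None],
        [None, "Very", "Somewhat", "Not"],
    ]
    # "Q1" spans two rows (vertical merge); "Satisfaction" spans three columns (horizontal merge)
    for row in fill_merged_cells(sheet, [(0, 0, 1, 0), (0, 1, 0, 3)]):
        print(row)
    print(forward_fill(["Q1", None, None, "Q2", None]))
```

Copying the top-left anchor value into every cell of a range treats horizontal, vertical, and block merges the same way. Forward fill is only safe on columns that act as grouping keys (such as question codes); applying it to answer or option cells would turn genuine blanks into copied values.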

Technical details

_id

g-6890d378ceb08191ab29b9408453be71

gpt_id

g-6890d378ceb08191ab29b9408453be71

viz1

public

viz2

show_url

language

Other fields

additional_features

Pattern library for conditions/branching rules to structure routing logic
Terminology and unit/scale dictionaries to provide standardization hints
Schema-aligned output using a survey JSON template (metadata/questions/extraction_report)
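
A pattern library for routing text might look roughly like the sketch below. The regular expressions, the operator map, and the parse_routing helper are hypothetical; the condition_raw and parsed_logic field names follow the example commands in this listing, and needs_review follows the review flags it describes. The raw text is always kept, and structured logic is added only when a pattern matches.

```python
import re
from typing import Any, Dict, List

# Hypothetical pattern library; a real one would cover many more phrasings and languages.
ROUTING_PATTERNS: List[re.Pattern] = [
    # e.g. "If Q3 = 1, go to Q5" or "if Q3 is 2 skip to Q7"
    re.compile(
        r"\bif\s+(?P<src>[A-Z]+\d+)\s*(?P<op>!=|=|is not|is)\s*(?P<val>\w+)"
        r"\s*,?\s*(?:go|skip)\s+to\s+(?P<dst>[A-Z]+\d+)",
        re.IGNORECASE,
    ),
    # e.g. "Q3=1 -> Q5"
    re.compile(
        r"(?P<src>[A-Z]+\d+)\s*(?P<op>!=|=)\s*(?P<val>\w+)\s*->\s*(?P<dst>[A-Z]+\d+)",
        re.IGNORECASE,
    ),
]

# Map the matched operator text onto a canonical comparison operator
OP_MAP = {"=": "==", "is": "==", "!=": "!=", "is not": "!="}


def parse_routing(text: str) -> Dict[str, Any]:
    """Always keep the raw text; add parsed_logic only when a pattern matches."""
    for pattern in ROUTING_PATTERNS:
        m = pattern.search(text)
        if m:
            return {
                "condition_raw": text,
                "parsed_logic": {
                    "source": m.group("src").upper(),
                    "operator": OP_MAP[m.group("op").lower()],
                    "value": m.group("val"),
                    "goto": m.group("dst").upper(),
                },
                "needs_review": False,
            }
    # No pattern matched: keep the source text verbatim and leave it for a human
    return {"condition_raw": text, "parsed_logic": None, "needs_review": True}


if __name__ == "__main__":
    for raw in ["If Q3 = 1, go to Q5", "Q4!=2 -> Q9", "Only for respondents living alone"]:
        print(parse_routing(raw))
```

Keeping condition_raw unconditionally means a pattern that fails to match never loses information; it only leaves the statement unstructured and flagged.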

example_commands

"Extract questions/options from this Excel file while handling merged cells automatically, and output survey JSON."
"Assume row_questions first; if that fails, re-interpret as column_questions and normalize again."
"Build expected_question_list from question-code patterns and include a missing/extra question report."
"Preserve routing text in routing.condition_raw, and parse it into routing.parsed_logic when possible."
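
One way to build expected_question_list and a missing/extra report is sketched below, assuming (hypothetically) that question codes are an uppercase prefix plus a number (e.g. Q12) and are numbered contiguously within each prefix. Codes are collected from all source text, including routing statements that mention questions the extraction may have dropped. The helper names are illustrative, not the tool's actual API.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical code convention: uppercase prefix + number, e.g. "Q12".
# Sub-item codes such as "Q3_2" are deliberately not matched here.
CODE_RE = re.compile(r"\b([A-Z]+)(\d+)\b")


def expected_question_list(source_texts: Iterable[str]) -> List[str]:
    """Assume contiguous numbering per prefix and expect 1..max for every prefix seen."""
    max_by_prefix: Dict[str, int] = {}
    for text in source_texts:
        for prefix, num in CODE_RE.findall(text):
            max_by_prefix[prefix] = max(max_by_prefix.get(prefix, 0), int(num))
    return [f"{p}{i}" for p in sorted(max_by_prefix) for i in range(1, max_by_prefix[p] + 1)]


def missing_extra_report(expected: List[str], extracted: List[str]) -> Dict[str, List[str]]:
    """Compare code lists in both directions, preserving the original order."""
    expected_set, extracted_set = set(expected), set(extracted)
    return {
        "missing": [c for c in expected if c not in extracted_set],
        "extra": [c for c in extracted if c not in expected_set],
    }


if __name__ == "__main__":
    texts = ["Q1. Age", "Q2. Gender", "Q4. Income (if Q2 = 1, go to Q5)"]
    expected = expected_question_list(texts)
    print(expected)
    # "Q6" stands in for a code the extractor produced that the source numbering does not support
    print(missing_extra_report(expected, ["Q1", "Q2", "Q4", "Q6"]))
```

The contiguity assumption fails for questionnaires that skip numbers on purpose, so gaps are better reported for review than auto-corrected.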

ideal_use_cases

Convert merged-cell, multi-header questionnaire Excel files into structured question/option JSON
Generate a QA report of missing/extra questions compared with the source patterns
Organize matrix/sub-items into a parent-child hierarchy
Preserve and structure skip/branching statements for downstream routing design
Auto-flag potentially sensitive expressions to speed up human review
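
Organizing matrix sub-items into a parent-child hierarchy could look like the following sketch. It assumes a hypothetical code convention in which sub-items are written <parent>_<n> (e.g. Q5_1); the parent_question_id and item_id field names come from the key functions above, while build_hierarchy and the other keys are illustrative. If a sub-item's parent stem was not extracted, the sketch creates an empty placeholder marked needs_review instead of inventing question text.

```python
import re
from typing import Any, Dict, List

# Hypothetical convention: sub-item codes are "<parent>_<n>", e.g. "Q5_1"
SUB_RE = re.compile(r"^(?P<parent>[A-Za-z]+\d+)_\d+$")


def _new_question(question_id: str) -> Dict[str, Any]:
    # Starts as an empty placeholder; text is filled in only if the stem row exists
    return {"question_id": question_id, "text": None, "sub_questions": [], "needs_review": True}


def build_hierarchy(rows: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Group flat {"code", "text"} rows into questions with nested sub-questions."""
    questions: Dict[str, Dict[str, Any]] = {}  # insertion order = order of first appearance
    for row in rows:
        m = SUB_RE.match(row["code"])
        if m:
            parent_id = m.group("parent")
            parent = questions.setdefault(parent_id, _new_question(parent_id))
            parent["sub_questions"].append({
                "parent_question_id": parent_id,
                "item_id": row["code"],
                "text": row["text"],
            })
        else:
            question = questions.setdefault(row["code"], _new_question(row["code"]))
            question["text"] = row["text"]
            question["needs_review"] = False
    return list(questions.values())


if __name__ == "__main__":
    rows = [
        {"code": "Q5", "text": "Rate each service"},
        {"code": "Q5_1", "text": "Bus"},
        {"code": "Q5_2", "text": "Subway"},
        {"code": "Q6_1", "text": "Item whose stem is missing"},
    ]
    for q in build_hierarchy(rows):
        print(q)
```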

limitations

Highly complex layouts or image-based (scanned) inputs may increase the number of needs_review flags
Does not invent questions or options absent from the source; uncertain structures are kept minimal and flagged for review
Branching/condition parsing quality depends on how clearly the source states its routing rules
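
The flag-rather-than-invent behavior described in these limitations can be illustrated with a small rule-based checker. This is a sketch under assumptions: the question dictionary layout (text, type, options), the review_reasons key, the type labels, and the sensitive-term list are hypothetical placeholders, not the extractor's actual schema or dictionary.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical term list; a real deployment would maintain its own reviewed dictionary
SENSITIVE_TERMS = ("income", "religion", "sexual orientation", "criminal record")
CHOICE_TYPES = ("single_choice", "multiple_choice")  # hypothetical type labels


def assign_review_flags(question: Dict[str, Any],
                        sensitive_terms: Iterable[str] = SENSITIVE_TERMS) -> Dict[str, Any]:
    """Return a copy of question with needs_review and review_reasons filled in."""
    reasons: List[str] = []
    text = (question.get("text") or "").strip()
    if not text:
        reasons.append("missing_text")
    if question.get("type") in CHOICE_TYPES and not question.get("options"):
        # Options are left empty rather than guessed; a reviewer supplies them
        reasons.append("missing_options")
    lowered = text.lower()
    hits = [term for term in sensitive_terms if term in lowered]
    if hits:
        reasons.append("sensitive_expression:" + ",".join(hits))
    flagged = dict(question)  # shallow copy; the input dict is not modified
    flagged["needs_review"] = bool(reasons)
    flagged["review_reasons"] = reasons
    return flagged


if __name__ == "__main__":
    print(assign_review_flags({"question_id": "Q7", "text": "What is your monthly income?",
                               "type": "single_choice", "options": []}))
```

Recording the reasons alongside the boolean flag lets a reviewer see why an item was flagged without re-running the checks.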

target_users

Survey/research operations staff (design and QA)
Data preprocessing/coding staff (codebook and variable dictionary)
Policy, statistics, and academic survey analysts