CPU Emulator Refactoring Plan Overview Refactor /Users/johandahlin/dev/pcemu/src/cpu.c to reduce code duplication, unify patterns, and improve maintainability. The file is currently 5600+ lines with significant duplication across opcode handlers. Key Problems Identified 1. Memory Access Inconsistency: read_ea32() and write_ea32() have NO segment limit checking, while 16-bit versions have _checked variants that are often ignored 2. Duplicated ModR/M Pattern: Same 5-line decode pattern repeated 74+ times across opcodes 3. Flag Functions Duplication: 8 pairs of separate 8/16-bit and 32-bit flag functions 4. ALU Operation Repetition: ADD/SUB/AND/OR/XOR/ADC/SBB each have 6 opcode variants with nearly identical code (~400 lines duplicated) 5. Register Access: 6 nearly identical switch statements for get/set reg8/16/32 6. Shift/Rotate: No helper functions, logic repeated 6+ times Refactoring Plan Phase 1: Unified Memory Access with Segment Limit Checking Create new helper functions that combine segment limit checking with memory access: // Read with automatic limit checking - returns false on exception static bool read_mem8(machine_t *m, int is_reg, uint32_t ea, uint8_t *out); static bool read_mem16(machine_t *m, int is_reg, uint32_t ea, uint16_t *out); static bool read_mem32(machine_t *m, int is_reg, uint32_t ea, uint32_t *out); // Write with automatic limit checking - returns false on exception static bool write_mem8(machine_t *m, int is_reg, uint32_t ea, uint8_t val); static bool write_mem16(machine_t *m, int is_reg, uint32_t ea, uint16_t val); static bool write_mem32(machine_t *m, int is_reg, uint32_t ea, uint32_t val); Benefits: - Single source of truth for segment limit checking - Impossible to forget limit checks - Fixes existing bugs in CMP 32-bit, OR/AND/XOR 8-bit Phase 2: Unified Flag Update Functions Consolidate flag functions using bitwidth parameter: // Single function handles all sizes static void update_flags_add(cpu_t *cpu, uint64_t result, uint32_t a, uint32_t b, int bits); static void update_flags_sub(cpu_t *cpu, uint64_t result, uint32_t a, uint32_t b, int bits); static void update_flags_adc(cpu_t *cpu, uint64_t result, uint32_t a, uint32_t b, int carry, int bits); static void update_flags_sbb(cpu_t *cpu, uint64_t result, uint32_t a, uint32_t b, int borrow, int bits); static void update_flags_logic(cpu_t *cpu, uint32_t result, int bits); static void update_flags_inc(cpu_t *cpu, uint32_t result, uint32_t val, int bits); static void update_flags_dec(cpu_t *cpu, uint32_t result, uint32_t val, int bits); // New helper for shift/rotate (currently no helper exists) static void update_flags_shift(cpu_t *cpu, uint32_t result, uint32_t original, int shift_count, int bits, int shift_type); Remove separate *32() variants - use 64-bit intermediates to handle all sizes. Phase 3: Operand Access Macros Create macros to reduce boilerplate in opcode handlers: // Decode ModR/M and get operand - common pattern used 74+ times #define DECODE_RM(size) \ uint8_t modrm = fetch_byte(m); \ int is_reg; \ uint32_t ea = decode_modrm(m, modrm, &is_reg, NULL, 3) // Read operand with automatic size handling and limit check #define READ_RM(var, size) \ do { \ if (!read_mem##size(m, is_reg, ea, &var)) return; \ } while(0) // Write operand with automatic size handling and limit check #define WRITE_RM(val, size) \ do { \ if (!write_mem##size(m, is_reg, ea, val)) return; \ } while(0) Phase 4: ALU Operation Templates Create unified ALU operation dispatcher to reduce ADD/SUB/AND/OR/XOR/ADC/SBB duplication: typedef enum { ALU_ADD, ALU_OR, ALU_ADC, ALU_SBB, ALU_AND, ALU_SUB, ALU_XOR, ALU_CMP } alu_op_t; // Single function handles all ALU operations for all sizes static void do_alu_rm_r(machine_t *m, alu_op_t op, int size); // r/m, r static void do_alu_r_rm(machine_t *m, alu_op_t op, int size); // r, r/m static void do_alu_acc_imm(machine_t *m, alu_op_t op, int size); // AL/AX/EAX, imm This reduces ~400 lines of ALU opcode handlers to ~100 lines. Phase 5: Group Opcode Handlers Consolidate extended opcode groups (0x80-0x83, 0xF6-0xF7, 0xFE-0xFF): // Group 1: ADD/OR/ADC/SBB/AND/SUB/XOR/CMP with immediate static void do_group1(machine_t *m, uint8_t opcode, uint8_t modrm); // Group 3: TEST/NOT/NEG/MUL/IMUL/DIV/IDIV static void do_group3(machine_t *m, uint8_t opcode, uint8_t modrm); // Group 4/5: INC/DEC/CALL/JMP/PUSH static void do_group5(machine_t *m, uint8_t opcode, uint8_t modrm); Phase 6: Register Access Consolidation Replace 6 switch-based functions with array-based access: // Register arrays (already exist as unions in cpu_t) static uint8_t* get_reg8_ptr(cpu_t *cpu, int reg); static uint16_t* get_reg16_ptr(cpu_t *cpu, int reg); static uint32_t* get_reg32_ptr(cpu_t *cpu, int reg); Implementation Order 1. Phase 1: Memory access helpers (fixes bugs, foundation for other changes) 2. Phase 2: Flag update consolidation (reduces 17 functions to ~10) 3. Phase 3: Operand macros (enables cleaner opcode handlers) 4. Phase 4: ALU templates (biggest code reduction) 5. Phase 5: Group handlers (consolidates extended opcodes) 6. Phase 6: Register access (minor cleanup) Files to Modify - /Users/johandahlin/dev/pcemu/src/cpu.c - Main refactoring target - /Users/johandahlin/dev/pcemu/src/cpu.h - May need new type definitions Expected Results - Code reduction: ~30-40% fewer lines (5600 → ~3500) - Bug fixes: CMP 32-bit limit check, OR/AND/XOR 8-bit limit check, read_ea32/write_ea32 limit checks - Maintainability: Single source of truth for common patterns - Testability: Easier to unit test individual helper functions Verification 1. Build with make clean && make 2. Run test suite: python3 test_parallel.py 3. Verify pass rate is maintained or improved (currently 98.38%) 4. Check specific opcodes that had limit check bugs: 0x38/0x39 (CMP), 0x08/0x20/0x30 (OR/AND/XOR)