Architecture Overview
This project is a localization pipeline that translates a large, proprietary Java-based simulation application into Japanese. The software contains a large body of technical, lore-rich text spread across more than 6,700 interactive nodes, environmental descriptions, and proprietary Java logic.
Unlike localizing a standard web application, localizing a proprietary legacy Java engine requires reverse-engineering its text pipelines and establishing a workflow that can translate large datasets without breaking the application's compilation, UI rendering, or underlying script calls.
Engineering Challenges Solved
Bytecode Engineering & Architectural Pivot
Localizing this application required overcoming a fundamental limitation: the proprietary UI engine used display strings as internal lookup keys, so translating text at runtime broke those lookups and crashed the application. After discovering this through systematic reverse engineering, I pivoted from a dynamic JVM agent architecture to static bytecode rewriting using Javassist's constant pool manipulation. Rewriting the constants in the class files themselves injected the Japanese text as UTF-8 without crashes and solved the encoding constraints without requiring access to the original source code.
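The failure mode above can be modeled in a few lines of plain Java. This is a minimal sketch, not the engine's code and not the Javassist rewrite itself: the `DisplayKeyDemo` class, the "Launch Probe" label, and the `onLaunchProbe` handler name are all hypothetical. It only illustrates why translating a label at draw time breaks a lookup keyed on the original string, while replacing every occurrence of the constant before the code runs keeps key and label in agreement.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

final class DisplayKeyDemo {
    // Hypothetical stand-in for the engine: the visible label doubles as the lookup key.
    // The operator models whatever value the string constant holds when the class runs.
    static Map<String, String> buildRegistry(UnaryOperator<String> constant) {
        Map<String, String> handlers = new HashMap<>();
        handlers.put(constant.apply("Launch Probe"), "onLaunchProbe");
        return handlers;
    }

    // Runtime translation: the registry still holds the original key,
    // but the label that comes back from the UI has been translated.
    static String runtimeTranslated(Map<String, String> dictionary) {
        Map<String, String> registry = buildRegistry(UnaryOperator.identity());
        String shown = dictionary.getOrDefault("Launch Probe", "Launch Probe");
        return registry.get(shown); // key mismatch: returns null
    }

    // Static rewriting: every occurrence of the constant is replaced up front,
    // so the registry key and the displayed label are the same translated string.
    static String staticallyRewritten(Map<String, String> dictionary) {
        UnaryOperator<String> rewritten = s -> dictionary.getOrDefault(s, s);
        Map<String, String> registry = buildRegistry(rewritten);
        String shown = rewritten.apply("Launch Probe");
        return registry.get(shown);
    }
}
```

In the sketch, the runtime path returns `null` for any translated label, which stands in for the crash described above; the static path resolves the handler because the key was translated together with the label.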
Agentic AI Translation Pipeline
Blind machine translation failed to capture the technical and evocative nuance of the simulation data. I designed an Agile-like workflow that deploys specialized LLM subagents. These agents read from a strict Translation Memory (TM) database to keep context and terminology consistent across large data chunks. The pipeline uses an automated LLM critic loop, and regex substitution temporarily hides invariant terms and $variables behind safe placeholders, preserving functional logic during translation.
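The placeholder step can be sketched as follows. This is an illustration under stated assumptions, not the project's actual implementation: the `PlaceholderMasker` class, the `__PH<n>__` token format, and the assumed variable syntax (`$` followed by an identifier) are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderMasker {
    // Assumed variable syntax: '$' followed by an identifier.
    private static final String VARIABLE = "\\$[A-Za-z_][A-Za-z0-9_]*";
    // Hypothetical placeholder format: __PH0__, __PH1__, ...
    private static final Pattern TOKEN = Pattern.compile("__PH(\\d+)__");

    /** Masked text plus the original spans, indexed by placeholder number. */
    static final class Masked {
        final String text;
        final List<String> originals;
        Masked(String text, List<String> originals) {
            this.text = text;
            this.originals = originals;
        }
    }

    /** Replaces invariant terms and $variables with numbered placeholders. */
    static Masked mask(String source, List<String> invariantTerms) {
        List<String> terms = new ArrayList<>(invariantTerms);
        // Longest terms first, so a short term never shadows a longer one.
        terms.sort((a, b) -> b.length() - a.length());
        List<String> alternatives = new ArrayList<>();
        for (String term : terms) {
            alternatives.add(Pattern.quote(term)); // match the term literally
        }
        alternatives.add(VARIABLE);
        Matcher m = Pattern.compile(String.join("|", alternatives)).matcher(source);

        List<String> originals = new ArrayList<>();
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(source, last, m.start());
            out.append("__PH").append(originals.size()).append("__");
            originals.add(m.group());
            last = m.end();
        }
        out.append(source.substring(last));
        return new Masked(out.toString(), originals);
    }

    /** Puts the original spans back, wherever the translator moved the placeholders. */
    static String restore(String translated, List<String> originals) {
        Matcher m = TOKEN.matcher(translated);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(translated, last, m.start());
            out.append(originals.get(Integer.parseInt(m.group(1))));
            last = m.end();
        }
        out.append(translated.substring(last));
        return out.toString();
    }
}
```

Numbering the placeholders lets the translated sentence reorder them, which matters for Japanese word order. A production pipeline would presumably also verify that every placeholder survives translation before restoring; that check is omitted here.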
Strategic Outcomes
- Zero-Crash Localization: Successfully merged thousands of translated nodes back into the original application while leaving parameters, rule IDs, and script calls completely untouched, as verified by SHA-256 integrity checks.
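One way such an integrity check could work is to hash only the fields that must not change, before and after the merge, and compare the digests. The sketch below assumes that approach; the source does not say exactly what was hashed, and the `IntegrityCheck` class and the sample field values in the usage note are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

final class IntegrityCheck {
    /** SHA-256 over the non-translatable fields (e.g. rule IDs, script calls), as lowercase hex. */
    static String sha256Hex(List<String> invariantFields) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (String field : invariantFields) {
                byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
                // Length prefix keeps ["ab", "c"] distinct from ["a", "bc"].
                digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                digest.update(bytes);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** True when the invariant fields are byte-for-byte identical after the merge. */
    static boolean untouched(List<String> before, List<String> after) {
        return sha256Hex(before).equals(sha256Hex(after));
    }
}
```

For example, `IntegrityCheck.untouched(List.of("rule_001", "openDoor($id)"), List.of("rule_001", "openDoor($id)"))` returns `true`, while changing any character in either field on one side makes it return `false`.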