
LEGACY JAVA APPLICATION LOCALIZATION PIPELINE

2026-06-01
Localization, Artificial Intelligence, Reverse Engineering
Tech Stack: Java, Python, Git

An agentic AI-driven localization pipeline and bytecode manipulation toolkit for a proprietary Java-based simulation application.

Architecture Overview

This project is an end-to-end localization pipeline for translating a large, proprietary Java-based simulation application into Japanese. The application carries a large volume of highly technical, lore-rich text: more than 6,700 complex interactive nodes, plus environmental descriptions and text embedded in proprietary Java logic.

Unlike localizing a standard web application, localizing a proprietary legacy Java engine means reverse-engineering its text pipelines and building a workflow that can translate very large datasets without breaking the application's compilation, UI rendering, or underlying script calls.


Engineering Challenges Solved

Bytecode Engineering & Architectural Pivot

Localizing this application meant overcoming a fundamental limitation: the proprietary UI engine used display strings as internal lookup keys, so translating text at runtime broke those lookups and crashed the application. After uncovering this through systematic reverse engineering, I pivoted from a dynamic JVM agent architecture to static bytecode rewriting, using Javassist to manipulate the constant pool of each class. Rewriting the string constants in the class files themselves injects the Japanese text directly, without crashes, and resolves the encoding constraints without requiring access to the original source code.
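
To make the idea of constant pool rewriting concrete, here is a minimal sketch in plain Python. It is not the Javassist-based toolkit described above; it only illustrates the class-file layout that such a tool works on. The function name `rewrite_string_constants` and the `translations` mapping are hypothetical. The sketch assumes every replaced string stays within the Basic Multilingual Plane and contains no NUL character, because only then does standard UTF-8 match the modified UTF-8 used inside class files.

```python
import struct

# Payload sizes (bytes after the tag byte) for fixed-width constant pool
# entries in the JVM class file format. Tag 1 (Utf8) is variable-length.
FIXED = {3: 4, 4: 4, 5: 8, 6: 8, 7: 2, 8: 2, 9: 4, 10: 4, 11: 4,
         12: 4, 15: 3, 16: 2, 17: 4, 18: 4, 19: 2, 20: 2}


def rewrite_string_constants(class_bytes, translations):
    """Swap Utf8 entries that back string literals, per `translations`.

    Hypothetical helper: only Utf8 entries referenced by a CONSTANT_String
    (tag 8) are touched, so class, field, and method names stay as they are.
    """
    if class_bytes[:4] != b"\xca\xfe\xba\xbe":
        raise ValueError("not a class file")
    (count,) = struct.unpack_from(">H", class_bytes, 8)

    # Pass 1: walk the pool, remembering which Utf8 slots are string literals.
    pos, index = 10, 1
    entries, string_targets = [], set()
    while index < count:
        tag = class_bytes[pos]
        if tag == 1:
            (length,) = struct.unpack_from(">H", class_bytes, pos + 1)
            payload = class_bytes[pos + 3:pos + 3 + length]
            pos += 3 + length
        elif tag in FIXED:
            payload = class_bytes[pos + 1:pos + 1 + FIXED[tag]]
            pos += 1 + FIXED[tag]
            if tag == 8:
                string_targets.add(struct.unpack(">H", payload)[0])
        else:
            raise ValueError(f"unknown constant pool tag {tag}")
        entries.append((index, tag, payload))
        index += 2 if tag in (5, 6) else 1  # Long and Double occupy two slots

    # Pass 2: rebuild the pool, replacing matching literals byte-for-byte.
    table = {k.encode("utf-8"): v.encode("utf-8") for k, v in translations.items()}
    out = bytearray(class_bytes[:10])
    for idx, tag, payload in entries:
        out.append(tag)
        if tag == 1:
            if idx in string_targets:
                payload = table.get(payload, payload)
            if len(payload) > 0xFFFF:
                raise ValueError("Utf8 entry exceeds 65535 bytes")
            out += struct.pack(">H", len(payload))
        out += payload
    return bytes(out) + class_bytes[pos:]  # everything after the pool is unchanged
```

Because pool indices do not change, the bytecode that loads these constants needs no adjustment. One caveat of this simplified version: a Utf8 entry can be shared between a string literal and a symbol name, and a real tool would need to detect that case before rewriting.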

Agentic AI Translation Pipeline

Naive machine translation failed to capture the highly technical and evocative nuance of the simulation data. I designed an iterative, Agile-style workflow built around specialized LLM subagents. These agents read from a strict Translation Memory (TM) database so that terminology and context stay consistent across large chunks of data. The pipeline combines an automated LLM critic loop with regex substitution that temporarily hides invariant terms and $variables behind safe placeholders, preserving functional logic during translation.
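
The placeholder step can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the `[[n]]` placeholder format, the `$name` variable syntax, the function names `mask` and `unmask`, and the sample terms are all hypothetical, and the sketch assumes the source text never contains `[[n]]` itself.

```python
import re

# Assumed variable syntax: a dollar sign followed by an identifier.
VARIABLE = r"\$[A-Za-z_][A-Za-z0-9_]*"
PLACEHOLDER = re.compile(r"\[\[(\d+)\]\]")


def mask(text, invariant_terms=()):
    """Hide $variables and invariant terms behind numbered placeholders."""
    # Longest terms first so a longer term is not split by a shorter one.
    terms = sorted(invariant_terms, key=len, reverse=True)
    pattern = re.compile("|".join([VARIABLE] + [re.escape(t) for t in terms]))
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"[[{len(saved) - 1}]]"

    return pattern.sub(stash, text), saved


def unmask(text, saved):
    """Restore the originals, rejecting output that lost or duplicated a placeholder."""
    seen = sorted(int(n) for n in PLACEHOLDER.findall(text))
    if seen != list(range(len(saved))):
        raise ValueError("placeholder lost or duplicated during translation")
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], text)
```

The model only ever sees the masked text, so it is free to reorder placeholders to fit Japanese word order, while the check in `unmask` catches any translation that dropped or repeated one.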


Strategic Outcomes

  • Zero-Crash Localization: Merged thousands of translated nodes back into the original application while leaving parameters, rule IDs, and script calls completely untouched, as verified by SHA-256 integrity checks.
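
One way such an integrity check could work is to hash everything except the translatable text before and after the merge. The sketch below assumes a hypothetical node layout (dictionaries with `id`, `text`, and `script` fields) and a hypothetical helper name, `structural_digest`; the project's real data format is not described here.

```python
import hashlib
import json


def structural_digest(nodes, text_field="text"):
    """SHA-256 over every field of every node except the translatable text."""
    digest = hashlib.sha256()
    for node in nodes:
        skeleton = {k: v for k, v in node.items() if k != text_field}
        # Sorted keys give a stable serialization regardless of dict order.
        line = json.dumps(skeleton, sort_keys=True, ensure_ascii=False)
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

If the digest of the merged nodes matches the digest of the originals, the non-text fields survived the round trip byte-for-byte; any edit to a rule ID or script call changes the hash.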

View Original Repository (GitHub)
