
LEGACY JAVA APPLICATION LOCALIZATION PIPELINE

2026-06-01
Localization, Artificial Intelligence, Reverse Engineering
Tech Stack: Java, Python, Git

An agentic AI-driven localization pipeline and bytecode manipulation toolkit for a proprietary Java-based simulation application.

Architecture Overview

This project is a complete localization pipeline built to translate a large, proprietary Java-based simulation application into Japanese. The application contains a great deal of technical, lore-rich text: more than 6,700 complex interactive nodes, environmental descriptions, and strings embedded in proprietary Java logic.

Unlike a standard web application, localizing a proprietary legacy Java engine meant first reverse-engineering its text pipelines, then building a workflow robust enough to translate large datasets without breaking the application's compilation, UI rendering, or underlying script calls.

Localization Pipeline & Bytecode Injection Architecture


Offline localization engine (CLI orchestrator):
  1. Extraction: an AST extractor traverses the class AST and detects placeholders with regular expressions.
  2. Preflight: a translation memory backed by an SQLite cache, with fuzzy matching above 85%.
  3. Translation: the Gemma translator, using contextual prompts and variable shielding.
  4. Critic loop: a critic validator that runs format assertions and glossary checks.

Proprietary application runtime (dynamic JVM execution):
  • Static overrides (90% of text): properties and CSV overrides, dialogue rules tables, and faction JSON overrides, loaded natively via resource paths.
  • Javassist runtime agent (10% of text): a classloader bytecode hook that dynamically swaps constant-pool entries for hardcoded string literals (zero-crash surgical interception).
  • System UI renderer: draws the text on the viewport. Output: 100% Japanese, with tamper-free stability.
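Phase 1's regex placeholders and Phase 3's variable shielding fit together: format tokens are swapped for opaque markers before text reaches the model, then restored afterward. A minimal sketch, assuming tokens look like printf-style `%s`/`%d`, MessageFormat-style `{0}`, or `$name` variables (the application's real token syntax is not documented here; the `⟦n⟧` marker format and the function names are hypothetical):

```python
import re

# Hypothetical token syntax: printf-style (%s, %d, %1$s), MessageFormat-style ({0}),
# and $variable references. Adjust to whatever the real source text uses.
PLACEHOLDER_RE = re.compile(r"%(?:\d+\$)?[sdf]|\{\d+\}|\$[A-Za-z_]\w*")
MARKER_RE = re.compile(r"⟦(\d+)⟧")


def shield(text: str) -> tuple[str, list[str]]:
    """Replace each placeholder with an opaque numbered marker."""
    tokens: list[str] = []

    def _swap(match: re.Match) -> str:
        tokens.append(match.group(0))
        return f"⟦{len(tokens) - 1}⟧"

    return PLACEHOLDER_RE.sub(_swap, text), tokens


def unshield(text: str, tokens: list[str]) -> str:
    """Restore placeholders; raise if markers were lost, duplicated, or invented."""
    found = sorted(int(m) for m in MARKER_RE.findall(text))
    if found != list(range(len(tokens))):
        raise ValueError(f"marker mismatch: expected {len(tokens)}, got {found}")
    return MARKER_RE.sub(lambda m: tokens[int(m.group(1))], text)


shielded, tokens = shield("Deploy %d units to {0}?")
print(shielded)  # Deploy ⟦0⟧ units to ⟦1⟧?
model_output = "⟦1⟧に⟦0⟧ユニットを配備しますか？"  # stand-in for the translator's reply
print(unshield(model_output, tokens))  # {0}に%dユニットを配備しますか？
```

Restoration fails loudly instead of guessing, so a dropped or invented marker surfaces as an error rather than as a broken format string in the running application.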



Engineering Challenges Solved

Bytecode Engineering & Hybrid Architecture Pivot

Localizing this application required overcoming a fundamental limitation: the proprietary UI engine used display strings as internal lookup keys, so translating a string at runtime also changed the key that other code used to find it. This made purely dynamic runtime translation unstable.

To resolve this, I pivoted from a pure dynamic JVM agent to a two-layer hybrid delivery architecture:

  1. Static Data Override Layer (90% of content): Override bundles for structured CSVs, configuration JSONs, and dialogue tables are placed on the application's resource path and loaded through its normal resource loading, with no bytecode interception.
  2. Surgical Runtime Agent Layer (10% of content): A lightweight JVM agent built on Javassist constant pool manipulation is reserved for hardcoded string literals inside obfuscated classes (such as main menu labels and system warnings). It rewrites those literals' constant-pool entries as the classes load, keeping runtime substitution narrow and crash-free.
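The runtime layer itself is a Javassist agent and is not reproduced here. To make the constant-pool idea concrete, the sketch below performs an equivalent rewrite in Python on raw class-file bytes, following the JVM class-file format: it appends a new `CONSTANT_Utf8` entry for each translated literal and repoints the matching `CONSTANT_String` entry at it. Appending rather than editing in place matters because one Utf8 entry may also serve as a field, method, or class name elsewhere in the class, and changing it would break linking. The function names and the `translations` mapping are hypothetical; a real agent would apply a transformation like this inside a `java.lang.instrument.ClassFileTransformer` as classes load.

```python
import struct

# Byte sizes of constant-pool entry bodies (after the 1-byte tag), per JVMS 4.4.
# Tag 1 (CONSTANT_Utf8) is variable-length and handled separately.
_FIXED_SIZES = {3: 4, 4: 4, 5: 8, 6: 8, 7: 2, 8: 2, 9: 4, 10: 4, 11: 4,
                12: 4, 15: 3, 16: 2, 17: 4, 18: 4, 19: 2, 20: 2}


def to_modified_utf8(text: str) -> bytes:
    """Encode text the way the JVM stores CONSTANT_Utf8 (modified UTF-8)."""
    out = bytearray()
    utf16 = text.encode("utf-16-be")
    for i in range(0, len(utf16), 2):
        unit = (utf16[i] << 8) | utf16[i + 1]  # one UTF-16 code unit
        if 0x0001 <= unit <= 0x007F:
            out.append(unit)
        elif unit <= 0x07FF:  # also covers U+0000, which becomes C0 80
            out += bytes([0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)])
        else:  # includes surrogate halves, each encoded separately
            out += bytes([0xE0 | (unit >> 12),
                          0x80 | ((unit >> 6) & 0x3F),
                          0x80 | (unit & 0x3F)])
    return bytes(out)


def swap_string_literals(classfile: bytes, translations: dict[str, str]) -> bytes:
    """Repoint CONSTANT_String entries whose text appears in `translations`."""
    if classfile[:4] != b"\xca\xfe\xba\xbe":
        raise ValueError("not a class file")
    (count,) = struct.unpack_from(">H", classfile, 8)
    wanted = {to_modified_utf8(k): to_modified_utf8(v) for k, v in translations.items()}

    pool = bytearray(classfile[:10])  # header up to and including constant_pool_count
    utf8_at: dict[int, bytes] = {}  # pool index -> raw Utf8 bytes
    string_refs: list[tuple[int, int]] = []  # (offset of string_index in pool, Utf8 index)
    pos, index = 10, 1
    while index < count:
        tag = classfile[pos]
        if tag == 1:
            (length,) = struct.unpack_from(">H", classfile, pos + 1)
            utf8_at[index] = classfile[pos + 3:pos + 3 + length]
            size = 3 + length
        else:
            size = 1 + _FIXED_SIZES[tag]
            if tag == 8:  # CONSTANT_String: resolve after the full pass (forward refs)
                (target,) = struct.unpack_from(">H", classfile, pos + 1)
                string_refs.append((len(pool) + 1, target))
        pool += classfile[pos:pos + size]
        pos += size
        index += 2 if tag in (5, 6) else 1  # Long/Double occupy two slots

    next_index = count
    appended = bytearray()
    for offset, target in string_refs:
        replacement = wanted.get(utf8_at.get(target, b""))
        if replacement is None:
            continue
        appended += struct.pack(">BH", 1, len(replacement)) + replacement
        struct.pack_into(">H", pool, offset, next_index)
        next_index += 1
    if next_index > 0xFFFF:
        raise ValueError("constant pool overflow")
    struct.pack_into(">H", pool, 8, next_index)
    return bytes(pool + appended + classfile[pos:])
```

Because `ldc` instructions refer to the `CONSTANT_String` entry by index, and that index does not change, no method bytecode has to be rewritten.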

Modular CLI Tooling & Automated Translation Memory

Manually running multiple isolated cleanup and validation scripts proved unsustainable. I consolidated the entire workflow into a unified command-line orchestrator (app_localizer_cli) executing a deterministic 8-phase pipeline.
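The eight phases are not listed individually here; only the four offline stages in the architecture diagram are named. A minimal sketch of a deterministic, fail-fast phase runner, using those four names as stand-ins for the full sequence (the `Context` fields and all function names are hypothetical): phases run in a fixed order over a sorted input list, and the first phase that records errors stops the run before later phases see bad data.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    """Hypothetical shared state passed from phase to phase."""
    entries: list[str]
    errors: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


Phase = Callable[[Context], None]


def run_pipeline(ctx: Context, phases: list[tuple[str, Phase]]) -> bool:
    """Run phases in the given order; stop at the first phase that records errors."""
    ctx.entries.sort()  # a fixed input order keeps runs reproducible
    for name, phase in phases:
        phase(ctx)
        ctx.log.append(name)
        if ctx.errors:
            ctx.log.append(f"aborted after {name}: {len(ctx.errors)} error(s)")
            return False
    return True


# Stub phases named after the diagram's offline stages; real ones would do the work.
def extract(ctx: Context) -> None:
    """Phase 1 stub."""


def preflight(ctx: Context) -> None:
    """Phase 2 stub."""


def translate(ctx: Context) -> None:
    """Phase 3 stub."""


def critic(ctx: Context) -> None:
    """Phase 4 stub: rejects blank entries as an example check."""
    ctx.errors += [e for e in ctx.entries if not e.strip()]


PHASES: list[tuple[str, Phase]] = [
    ("extraction", extract),
    ("preflight", preflight),
    ("translation", translate),
    ("critic", critic),
]
```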

To reduce LLM token usage, the pipeline integrates a local SQLite Translation Memory (TM). Before routing a string to the translation model, the CLI checks the TM for exact and fuzzy matches (>85% similarity). Newly translated blocks then pass through an LLM critic loop that audits glossary compliance, length limits, and placeholder stability; verified results are written back to the TM.
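A minimal sketch of the preflight lookup and the verified write-back, assuming a single SQLite table keyed by source text and using Python's `difflib.SequenceMatcher` ratio as the similarity score (the original's fuzzy metric is not specified; the 0.85 cutoff mirrors the threshold above). The schema, function names, placeholder syntax, and `max_len` limit are hypothetical, and `critic_issues` is a rule-based stand-in for the LLM critic that checks only the mechanical conditions listed: placeholders preserved, glossary terms used, and a length limit.

```python
import re
import sqlite3
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical placeholder syntax (printf, MessageFormat, $variables).
PLACEHOLDER_RE = re.compile(r"%(?:\d+\$)?[sdf]|\{\d+\}|\$[A-Za-z_]\w*")


def open_tm(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tm (source TEXT PRIMARY KEY, target TEXT NOT NULL)"
    )
    return conn


def tm_lookup(conn: sqlite3.Connection, source: str,
              threshold: float = 0.85) -> Optional[tuple[str, float]]:
    """Return (target, score) for an exact match or the best fuzzy match above threshold."""
    row = conn.execute("SELECT target FROM tm WHERE source = ?", (source,)).fetchone()
    if row is not None:
        return row[0], 1.0
    best: Optional[tuple[str, float]] = None
    for cand_source, cand_target in conn.execute("SELECT source, target FROM tm"):
        score = SequenceMatcher(None, source, cand_source).ratio()
        if score > threshold and (best is None or score > best[1]):
            best = (cand_target, score)
    return best


def critic_issues(source: str, target: str, glossary: dict[str, str],
                  max_len: int = 80) -> list[str]:
    """Rule-based stand-in for the critic: placeholders, glossary, length."""
    issues = []
    if sorted(PLACEHOLDER_RE.findall(source)) != sorted(PLACEHOLDER_RE.findall(target)):
        issues.append("placeholder mismatch")
    for term, required in glossary.items():
        if term in source and required not in target:
            issues.append(f"glossary: {term!r} should be {required!r}")
    if len(target) > max_len:
        issues.append(f"longer than {max_len} characters")
    return issues


def tm_store(conn: sqlite3.Connection, source: str, target: str,
             glossary: dict[str, str]) -> list[str]:
    """Write a translation back to the TM only if the critic reports no issues."""
    issues = critic_issues(source, target, glossary)
    if not issues:
        with conn:  # commits the insert on success
            conn.execute(
                "INSERT OR REPLACE INTO tm (source, target) VALUES (?, ?)", (source, target)
            )
    return issues
```

A fuzzy hit comes back with its score so the caller can choose a policy, for example reusing exact matches directly and passing fuzzy ones to the model as reference translations. The linear scan is adequate for a few thousand rows; a much larger TM would need a cheaper candidate filter before `SequenceMatcher`.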


Strategic Outcomes

  • Zero-Crash Localization: Merged thousands of translated nodes back into the application while leaving its original parameters and script calls untouched, verified with SHA-256 integrity checks.
  • Performance Stability: Moving 90% of translations to native static resource overrides eliminated classloader desyncs and the memory overhead of runtime interception.
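The SHA-256 check in the first outcome can be sketched at the record level. Assuming each node is a record with translatable text fields alongside parameter and script fields (the field names `id`, `text`, and `script` below are hypothetical), hashing a canonical serialization of everything except the text fields before and after the merge shows whether any non-text value changed. This is one plausible approach, not a description of the original tooling.

```python
import hashlib
import json


def fingerprint(record: dict, text_fields: set[str]) -> str:
    """SHA-256 over every field except the translatable ones."""
    frozen = {k: v for k, v in record.items() if k not in text_fields}
    canonical = json.dumps(frozen, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_records(original: list[dict], merged: list[dict],
                    key: str, text_fields: set[str]) -> list[str]:
    """Return the ids of records whose non-text fields differ after the merge."""
    before = {r[key]: fingerprint(r, text_fields) for r in original}
    after = {r[key]: fingerprint(r, text_fields) for r in merged}
    if before.keys() != after.keys():
        raise ValueError("record set changed during merge")
    return sorted(k for k in before if before[k] != after[k])


original = [{"id": "n1", "text": "Open the hatch", "script": "open_hatch(2)"}]
merged = [{"id": "n1", "text": "ハッチを開ける", "script": "open_hatch(2)"}]
print(changed_records(original, merged, "id", {"text"}))  # []
```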

