# Bundle formats and unpacking

> Detection order and BundleFormat variants (webpack4/5, browserify, SystemJS, esbuild/Bun, AMD, scope-hoisted); raw vs full unpack; multi-file and directory scan semantics.

- Repository: pionxzh/wakaru
- GitHub: https://github.com/pionxzh/wakaru
- Human docs: https://grok-wiki.com/public/docs/pionxzh-wakaru-77a438a6cc6b
- Complete Markdown: https://grok-wiki.com/public/docs/pionxzh-wakaru-77a438a6cc6b/llms-full.txt

## Source Files

- `docs/architecture.md`
- `crates/core/src/unpacker/mod.rs`
- `crates/core/src/unpacker/webpack4.rs`
- `crates/core/src/unpacker/webpack5.rs`
- `crates/core/src/driver/unpack.rs`
- `crates/cli/src/discovery.rs`

---

---
title: Bundle formats and unpacking
description: Detection order and BundleFormat variants (webpack4/5, browserify, SystemJS, esbuild/Bun, AMD, scope-hoisted); raw vs full unpack; multi-file and directory scan semantics.
---

Wakaru splits bundled JavaScript into per-module source files, then runs the decompile rule pipeline on each module. Unpacking is format-driven: the core unpacker parses the input once, tries detectors in a fixed order, and returns an `UnpackResult` with extracted module code strings. The driver then either emits those strings as-is (`unpack_raw`) or runs the two-phase multi-module decompile pipeline (`unpack`).

## Detection pipeline

Detection starts in `try_unpack_bundle`, which parses the source as an ES module and walks a fixed candidate list. **First match wins** — later detectors never run once a format matches.

```mermaid
flowchart TD
    A[Parse source as ES module] --> B{detect_bundle_candidate<br/>on top-level module}
    B -->|match| Z[Return UnpackResult]
    B -->|no match| C[Collect UMD/AMD unwrap candidates]
    C --> D{detect_bundle_candidate<br/>on each unwrapped module}
    D -->|match| Z
    D -->|no match| E[AMD detector on top-level module]
    E -->|match| Z
    E -->|no match| F[No bundle detected]
```

On the **top-level module**, `detect_bundle_candidate` runs in this order:

1. **webpack5** — IIFE/arrow bootstrap with module factory array or object
2. **webpack5 runtime entry** — standalone runtime IIFE (`require.m`, `require.f`, `require.e`, …) when allowed
3. **webpack4** — `(function(modules){…})([…])` or object-form with `__webpack_require__`
4. **webpack5 chunk** — JSONP chunk push (`webpackChunk…push([[ids], {modules}])`)
5. **browserify** — nested IIFE with module object and entry array
6. **SystemJS** — top-level `System.register(…)` calls
7. **esbuild / Bun** — `__export`, `__commonJS`, or `__esm` helper shapes

If the top-level parse does not match, Wakaru unwraps **UMD and AMD wrapper factories** and retries the same candidate list on inner module bodies (without runtime-entry detection). If that still fails, a dedicated **AMD** pass handles `define(…)`-only modules and plain UMD bodies.

When structural detection returns `None`, the driver can still split the file in **auto mode** using the scope-hoisted heuristic (see below).

## BundleFormat variants

`BundleFormat` is the tag recorded on every successful unpack. It appears in `UnpackOutput.detected_formats` and drives format-specific normalization.

| `BundleFormat` | `as_str()` | Typical input shape | Notes |
|---|---|---|---|
| `Webpack5` | `webpack5` | IIFE bootstrap, runtime entry, or JSONP chunk | Runtime-only files become a single `entry.js` module |
| `Webpack4` | `webpack4` | IIFE with array- or object-form module table | Renames factory params, rewrites `require(id)`, normalizes runtime helpers |
| `Browserify` | `browserify` | `(function e(t,n,r){…})({id:[fn,deps]}, …, [entries])` | Standalone browserify bundles |
| `SystemJs` | `systemjs` | `System.register(name, deps, factory)` | May re-detect nested dynamic-export bundles |
| `Esbuild` | `esbuild` | `__export` / `__commonJS` / `__esm` helpers | Bun emits the same helper shapes; pure ESM without markers falls through |
| `Amd` | `amd` | `define(id, deps, factory)` or UMD-wrapped factories | Runs after wrapper unwrap attempts |
| `ScopeHoisted` | `scope-hoisted` | Large flat top-level with reference-graph clusters | Heuristic split, not a bundler-specific AST matcher |

Each extracted module is an `UnpackedModule` with `id`, `is_entry`, `code`, and `filename`. Filenames come from bundler paths (webpack object keys), numeric indices (`module-N.js`, `entry.js`), or heuristic cluster names.

## Format-specific behavior

### Webpack 5

Three detection paths share the `Webpack5` format tag:

- **Bootstrap bundle** — empty-param IIFE whose body contains the module table and `__webpack_require__`-style runtime.
- **Runtime entry** — IIFE whose body matches webpack5 runtime signatures (`require.m`, `require.f`, `require.e`, `require.u`, `require.t`). The entire source is kept as one `entry.js` module so chunk-loading code stays intact for multi-file unpack.
- **JSONP chunk** — `(self.webpackChunk… = … \|\| []).push([[chunkIds], { numericId: factory, … }])`. Multiple push statements in one file are merged.

For multi-file inputs, `detect_chunk_ids` harvests chunk IDs from each input so numeric `require` references can be rewritten across entry and chunk files.

### Webpack 4

Detects the classic webpack4 IIFE: unary `!function(…)` or plain call, with array-form (`[factory, …]`) or object-form (`{"./src/index.js": factory, …}`) module tables.

Extraction runs `normalize_extracted_webpack_module` on each factory:

- Rename `module` / `exports` / `require` factory parameters
- Rewrite numeric or string `require(id)` to output filenames
- Normalize `require.n` getter access
- Apply `WebpackRuntimeNormalizer` while **preserving** `require.r` / `require.d` ESM markers for the later rule pipeline

### Browserify

Matches the browserify standalone pattern: outer call whose callee is itself a call, first argument is the modules object (`{id: [factory, deps], …}`), third argument is the entry-id array.

### SystemJS

Collects `System.register` calls. Each register becomes a module; dynamic-export bundles may be unpacked again by re-invoking the main detector on reconstructed source.

### Esbuild and Bun

Looks for bundler helper symbols and factory declarations (`var X = __commonJS({ "path"(exports, module) { … } })`). Scope-hoisted ESM is detected via `__export(ns, { name: () => binding })` namespace boundaries.

Pure scope-hoisted ESM from Rollup or Vite **without** `__export` or `__commonJS` markers does not match this detector. Those bundles rely on the scope-hoisted heuristic in auto mode (or single-file decompile).

### AMD

Handles files that are only `define(…)` calls, plus plain UMD module bodies exposed after wrapper peeling.

### Scope-hoisted (heuristic)

`scope_hoist::split_scope_hoisted` is a **reference-graph clusterer**, not a bundler signature matcher. It:

1. Optionally unwraps a top-level IIFE
2. Requires at least 10 top-level declarations
3. Builds a reference graph and union-finds clusters
4. Emits modules only when **two or more** clusters are found

This path sets `allow_cycle_premerge: false` because import cycles among heuristic splits are not safe to pre-merge.

## Auto vs strict detection

CLI unpack mode controls `DecompileOptions.heuristic_split`:

<ParamField body="--unpack / --unpack=auto" type="flag" default>
Structural detectors run first. When they return `None`, scope-hoisted heuristic splitting is attempted. Directory scans also accept heuristic scope-hoisted files.
</ParamField>

<ParamField body="--unpack=strict" type="flag">
Structural detection only. No scope-hoisted fallback. Unrecognized bundles fall back to single-file decompile (explicit file inputs) or are skipped (directory scans).
</ParamField>

Nested re-splitting of already-detected modules is a separate, stricter gate: it requires **both** `heuristic_split` and `RewriteLevel::Aggressive`. When enabled, each extracted module is re-examined with the scope-hoisted clusterer and split further only if resulting import paths resolve.

## Raw vs full unpack

Detection is identical for raw and full unpack — `try_unpack_bundle_raw` delegates to `try_unpack_bundle`. The difference is what happens **after** extraction.

### Full unpack (`unpack` / `unpack_files`)

```
detect → extract modules → two-phase decompile pipeline per module → emit
```

Phase 1 (parallel): parse each module → rules through `UnEsm` → collect cross-module facts.

Phase 2 (parallel): parse again → through-`UnEsm` → cross-module late pass → `UnTemplateLiteral` through `UnReturn` → emit.

When no bundle is detected in auto mode, Wakaru tries scope-hoisted splitting; if that yields multiple modules it unpacks them (with `DceMode::Off` for the heuristic path). Otherwise it falls back to **single-file decompile** and returns one `module.js`.

### Raw unpack (`unpack_raw` / `unpack_files_raw`)

```
detect → extract modules → emit code strings (no rule pipeline)
```

Raw output keeps bundler-specific shapes the decompile pipeline is meant to recover — webpack ESM markers (`require.r`, `require.d`), export getters, and similar. Extractors still apply **bundler-coupled normalization** at the extraction boundary (factory param rename, module ID rewrites, runtime wrapper removal).

For heuristic scope-hoisted splits in raw mode, Wakaru applies a narrow `UnEsm`-only normalization pass so the output is runnable standalone. Normalization failures produce `RawNormalizationFailed` warnings and preserve unparsed code.

If nothing is detected, raw unpack returns the original source as a single `module.js` without running rules.

<AccordionGroup>
<Accordion title="When to use raw vs full">
Use **raw** to inspect extraction boundaries, debug detector output, or compare against `webpack4_unpack_raw` snapshots. Use **full** for readable, idiomatic source — that is the normal `wakaru --unpack` path.
</Accordion>
</AccordionGroup>

## Multi-file unpack

Pass multiple explicit files (entry + chunks) or a scanned directory. `unpack_files` / `unpack_files_raw` process every input independently, then merge.

<Steps>
<Step title="Detect each input">
Each file runs the same detector chain. Non-bundle files in multi-file mode become **fallback modules** (their source is kept under the input basename) rather than being dropped.
</Step>

<Step title="Stabilize the merged module set">
`prepare_multi_source_modules` assigns unique output filenames across inputs and builds a `NumericRewritePlan`. Unambiguous numeric webpack module IDs map to final filenames so `require(529)` in an entry can point at `529.js` from a chunk file. **Duplicate numeric IDs across inputs are left unrewritten** to avoid merging unrelated webpack runtimes from the same directory scan.
</Step>

<Step title="Run the pipeline once">
The combined module set goes through either the two-phase decompile pipeline (full) or raw emission with numeric rewrites (raw). Cross-module facts in full unpack can see imports/exports from every input file.
</Step>
</Steps>

A single-input call delegates to `unpack` / `unpack_raw` and does not run merge preparation.

## Directory scan semantics

When `--unpack` receives a **directory**, the CLI walks it recursively:

- Includes `.js`, `.mjs`, `.cjs`
- Skips hidden files/directories (names starting with `.`)
- Skips `node_modules`
- Sorts discovered paths for stable ordering

Each candidate is filtered through `is_detected_unpack_input`:

- Structural `try_unpack_bundle` match, **or**
- In auto mode, scope-hoisted heuristic that yields more than one module

Non-matching files are **skipped** — they are not decompiled or copied. If a directory scan finds no detected files, the CLI errors with `no bundle or chunk files detected in directory input`.

Explicit **file** inputs do not use detect-only filtering: unrecognized bundles fall back to single-file decompile (auto/full) or raw passthrough.

## Entry points

<CodeGroup>
```bash CLI
# Full unpack with auto detection
wakaru --unpack bundle.js -o out/

# Structural detection only
wakaru --unpack=strict bundle.js -o out/

# Extraction without decompile rules
wakaru --unpack --raw bundle.js -o out/

# Entry + chunk files
wakaru --unpack entry.js chunk.123.js -o out/
```

```rust Core API
use wakaru_core::{unpack, unpack_files, unpack_raw, DecompileOptions, UnpackInput};

let output = unpack(source, DecompileOptions::default())?;
let output = unpack_files(inputs, options)?;
let output = unpack_raw(source, &options)?;
```
</CodeGroup>

<ResponseField name="UnpackOutput" type="object">
<Expandable title="fields">
<ResponseField name="modules" type="Vec<(String, String)>">
Output filename paired with decompiled or raw module source.
</ResponseField>
<ResponseField name="detected_formats" type="Vec<BundleFormat>">
Formats matched across all inputs (deduplicated).
</ResponseField>
<ResponseField name="warnings" type="Vec<UnpackWarning>">
Per-module parse, normalization, or cycle warnings. Consult troubleshooting for `UnpackWarningKind` codes.
</ResponseField>
</Expandable>
</ResponseField>

## Related pages

<CardGroup cols={2}>
<Card title="Unpack bundles" href="/unpack-bundles">
Operational guide for `--unpack` modes, `--raw`, multi-file inputs, directory rules, and `--force`.
</Card>
<Card title="Decompile pipeline" href="/decompile-pipeline">
What happens after extraction: resolver marks, staged rules, and parallel execution.
</Card>
<Card title="Cross-module facts" href="/cross-module-facts">
Two-phase unpack barrier and Phase 2 rules that consume facts from other modules.
</Card>
<Card title="Esbuild and Browserify recipe" href="/esbuild-browserify-recipe">
End-to-end unpacking for `__export` / `__commonJS` bundles and browserify standalone output.
</Card>
<Card title="Webpack bundle recipe" href="/webpack-bundle-recipe">
Entry + chunk workflows for webpack4 and webpack5 test fixtures.
</Card>
<Card title="Core API reference" href="/core-api-reference">
`unpack`, `unpack_files`, `unpack_raw`, and `DecompileOptions` fields.
</Card>
</CardGroup>
