
Development Workflow


This guide covers building from source, running tests, and contributing changes to OpenDataLoader PDF.

Prerequisites

Before you begin, ensure you have the following installed:

Tool      Version   Purpose
Java      11+       Core engine
Maven     3.8+      Java build system
Python    3.10+     Python bindings
uv        Latest    Python package management
Node.js   20+       Node.js bindings
pnpm      Latest    Node.js package management

Verify your setup:

java -version
mvn --version
python --version
uv --version
node --version
pnpm --version
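
The commands above only print versions; comparing them against the minimums in the table is left to the reader. A minimal sketch of that comparison in POSIX shell follows. The helper names (version_ge, first_version, tool_output, check_tool, check_prereqs) are hypothetical and not part of the repository, and uv and pnpm are checked for presence only because the table lists no minimum for them.

```shell
# Succeeds when dotted version $1 is >= dotted version $2 (numeric, per component).
version_ge() {
  _a=$1 _b=$2
  while [ -n "$_a" ] || [ -n "$_b" ]; do
    _x=${_a%%.*} _y=${_b%%.*}
    [ "${_x:-0}" -gt "${_y:-0}" ] && return 0
    [ "${_x:-0}" -lt "${_y:-0}" ] && return 1
    # Drop the component just compared; missing components count as 0.
    case $_a in *.*) _a=${_a#*.} ;; *) _a= ;; esac
    case $_b in *.*) _b=${_b#*.} ;; *) _b= ;; esac
  done
  return 0
}

# Reads version output on stdin and prints the first dotted number in it.
first_version() {
  grep -oE '[0-9]+(\.[0-9]+)*' | head -n 1
}

# Runs a tool only if it is installed, merging stderr (java -version prints there).
tool_output() {
  command -v "$1" >/dev/null 2>&1 && "$@" 2>&1
}

# $1 = label, $2 = minimum version, $3 = raw version output
check_tool() {
  _found=$(printf '%s\n' "$3" | first_version)
  if [ -n "$_found" ] && version_ge "$_found" "$2"; then
    echo "ok: $1 $_found (minimum $2)"
  else
    echo "FAIL: $1 ${_found:-not found} (minimum $2)"
    return 1
  fi
}

# Checks every tool from the prerequisites table; returns 1 if any check fails.
check_prereqs() {
  _status=0
  check_tool Java    11   "$(tool_output java -version)"    || _status=1
  check_tool Maven   3.8  "$(tool_output mvn --version)"    || _status=1
  check_tool Python  3.10 "$(tool_output python --version)" || _status=1
  check_tool uv      0    "$(tool_output uv --version)"     || _status=1  # presence only
  check_tool Node.js 20   "$(tool_output node --version)"   || _status=1
  check_tool pnpm    0    "$(tool_output pnpm --version)"   || _status=1  # presence only
  return "$_status"
}
```

Source the snippet and call check_prereqs to get one line per tool. The comparison is numeric per component, so 3.8 correctly ranks below 3.10.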

OS-Specific Install Commands

Tool      macOS (Homebrew)              Ubuntu/Debian                                     Windows
Java 17   brew install --cask temurin   sudo apt install openjdk-17-jdk                   Adoptium installer
Maven     brew install maven            sudo apt install maven                            Download or use WSL
uv        brew install uv               curl -LsSf https://astral.sh/uv/install.sh | sh   powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
pnpm      brew install pnpm             npm install -g pnpm                               npm install -g pnpm

Windows users: We recommend WSL 2 for the smoothest development experience. All shell scripts (./scripts/*.sh) assume a Unix-like environment.

Git LFS

Some test fixtures are stored with Git LFS. Install it before cloning:

# macOS
brew install git-lfs

# Ubuntu/Debian
sudo apt install git-lfs

# Then initialize
git lfs install
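
If the repository was cloned before Git LFS was installed, LFS-tracked files are checked out as small text pointers rather than the real fixtures, and tests that read them may fail in confusing ways. The sketch below detects such pointers by their first line, which in the Git LFS pointer format begins with "version https://git-lfs.github.com/spec/v1". The function names are hypothetical, and the directory to scan is an argument because this guide does not say where the fixtures live.

```shell
# Succeeds when $1 is a Git LFS pointer file rather than real content.
is_lfs_pointer() {
  [ -f "$1" ] &&
    head -c 64 "$1" | grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Prints every pointer file found under directory $1, one path per line.
list_lfs_pointers() {
  find "$1" -type f | while IFS= read -r _file; do
    if is_lfs_pointer "$_file"; then
      printf '%s\n' "$_file"
    fi
  done
}
```

If list_lfs_pointers prints any paths for your fixture directory, running git lfs pull from the repository root downloads the real files.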

Build & Test

Quick Start (Local Development)

Run tests for each package independently:

# Java tests
./scripts/test-java.sh

# Python tests
./scripts/test-python.sh

# Node.js tests
./scripts/test-node.sh
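
When a change touches more than one package, it can help to run all three scripts in one pass and see every failure instead of stopping at the first. A minimal sketch, assuming it is run from the repository root; run_suites is a hypothetical helper, not one of the repository's scripts.

```shell
# Runs each command given as an argument, keeps going after a failure,
# and returns 1 if any of them failed.
run_suites() {
  _failed=""
  for _suite in "$@"; do
    echo "==> $_suite"
    if ! "$_suite"; then
      _failed="$_failed $_suite"
    fi
  done
  if [ -n "$_failed" ]; then
    echo "Failed:$_failed" >&2
    return 1
  fi
  echo "All suites passed"
}
```

For example: run_suites ./scripts/test-java.sh ./scripts/test-python.sh ./scripts/test-node.sh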

Full CI Build

Build all packages (Java, Python, Node.js) in one command:

./scripts/build-all.sh

Build Java Only

mvn clean install -f java/pom.xml

Successful builds produce artifacts under java/opendataloader-pdf-cli/target, including the shaded CLI JAR.

Run the CLI from Source

After building, run the CLI directly:

java -jar java/opendataloader-pdf-cli/target/opendataloader-pdf-cli-<VERSION>.jar [options] <INPUT>
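
The version in the JAR name changes between releases, so a small lookup saves retyping it. The sketch below picks the first matching JAR under the target directory and skips -sources, -javadoc, and -tests artifacts. Whether the build produces those extra artifacts is an assumption, and find_cli_jar is a hypothetical helper.

```shell
# Prints the path of the built CLI JAR, or fails with a hint if none exists.
# $1 = target directory (defaults to the path named in this guide).
find_cli_jar() {
  _target=${1:-java/opendataloader-pdf-cli/target}
  for _jar in "$_target"/opendataloader-pdf-cli-*.jar; do
    case $_jar in
      *-sources.jar | *-javadoc.jar | *-tests.jar) continue ;;
    esac
    # An unmatched glob stays literal, so confirm the file really exists.
    [ -f "$_jar" ] || continue
    printf '%s\n' "$_jar"
    return 0
  done
  echo "No CLI JAR found in $_target; run the Java build first" >&2
  return 1
}
```

Used from the repository root: java -jar "$(find_cli_jar)" [options] <INPUT>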

Refer to the CLI Options Reference for the full flag list.

Code Generation

Warning: After changing CLI options in Java, you must run npm run sync. This regenerates options.json and all Python/Node.js bindings. Forgetting this silently breaks the wrappers.

CLI options and JSON schema documentation are auto-generated from source files. This ensures consistency across all language bindings.

Note: Reference documentation MDX files (CLI options, JSON schema, convert options) are generated by CI at release time and pushed to the opendataloader.org repository. They are not tracked in this repo. Manual documentation also lives in opendataloader.org.

Auto-Generated Files (Do Not Edit)

The following files are generated by npm run sync — edit the Java source instead:

  • options.json
  • node/opendataloader-pdf/src/cli-options.generated.ts
  • node/opendataloader-pdf/src/convert-options.generated.ts
  • python/opendataloader-pdf/src/opendataloader_pdf/cli_options_generated.py
  • python/opendataloader-pdf/src/opendataloader_pdf/convert_generated.py
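
Because stale generated files fail silently, a local check before committing can be useful: fingerprint the files, regenerate them, and compare. If anything changed, the committed copies were out of date. This is a sketch only; fingerprint and check_generated_fresh are hypothetical helpers, and it assumes the regenerate command is run from the directory where npm run sync works.

```shell
# Generated files listed in this guide, one path per line.
GENERATED_FILES="options.json
node/opendataloader-pdf/src/cli-options.generated.ts
node/opendataloader-pdf/src/convert-options.generated.ts
python/opendataloader-pdf/src/opendataloader_pdf/cli_options_generated.py
python/opendataloader-pdf/src/opendataloader_pdf/convert_generated.py"

# Prints a checksum line per file, or a "missing" marker if it does not exist.
fingerprint() {
  for _f in "$@"; do
    if [ -f "$_f" ]; then
      cksum "$_f"
    else
      printf 'missing %s\n' "$_f"
    fi
  done
}

# Runs the regenerate command given as arguments and compares fingerprints.
# Returns 0 if nothing changed, 1 if files changed, 2 if the command failed.
check_generated_fresh() {
  # Unquoted on purpose: split the list into one argument per path.
  _before=$(fingerprint $GENERATED_FILES)
  "$@" || return 2
  _after=$(fingerprint $GENERATED_FILES)
  if [ "$_before" != "$_after" ]; then
    echo "Generated files changed; review and commit the regenerated output" >&2
    return 1
  fi
}
```

For example: check_generated_fresh npm run sync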

Available Commands

Command                    Description
npm run sync               Full sync: export options from Java + generate all docs
npm run sync-options       Export options from Java + generate option docs
npm run sync-schema        Generate schema docs
npm run generate-options   Generate option docs only (without Java export)
npm run generate-schema    Generate schema docs only

After Modifying Java CLI Options

npm run sync-options

This exports options from Java and generates:

Generated File                                                              Purpose
options.json                                                                CLI options source of truth
node/opendataloader-pdf/src/cli-options.generated.ts                        Node.js CLI options
node/opendataloader-pdf/src/convert-options.generated.ts                    Node.js convert options
python/opendataloader-pdf/src/opendataloader_pdf/cli_options_generated.py   Python CLI options
python/opendataloader-pdf/src/opendataloader_pdf/convert_generated.py       Python convert options

After Modifying JSON Schema

Edit schema.json directly, then:

npm run generate-schema

This generates:

Generated File       Purpose
public/schema.json   Public schema for web access

Full Sync

To regenerate everything (options + schema):

npm run sync

Project Structure

opendataloader-pdf/
├── java/                          # Core Java engine
│   ├── opendataloader-pdf-core/   # Main library
│   └── opendataloader-pdf-cli/    # CLI application
├── python/                        # Python package
├── node/                          # Node.js package
└── scripts/                       # Build & test scripts

Code Style

  • Java: Follow existing patterns in the codebase
  • Python: PEP 8 with type hints
  • TypeScript: ESLint configuration in project
