Development Workflow
This guide covers building from source, running tests, and contributing changes to OpenDataLoader PDF, with prerequisites for Windows, macOS, and Linux.
Prerequisites
Before you begin, ensure you have the following installed:
| Tool | Version | Purpose |
|---|---|---|
| Java | 11+ | Core engine |
| Maven | 3.8+ | Java build system |
| Python | 3.10+ | Python bindings |
| uv | Latest | Python package management |
| Node.js | 20+ | Node.js bindings |
| pnpm | Latest | Node.js package management |
Verify your setup:
java -version
mvn --version
python --version
uv --version
node --version
pnpm --version
OS-Specific Install Commands
| Tool | macOS (Homebrew) | Ubuntu/Debian | Windows |
|---|---|---|---|
| Java 17 | brew install --cask temurin | sudo apt install openjdk-17-jdk | Adoptium installer |
| Maven | brew install maven | sudo apt install maven | Download or use WSL |
| uv | brew install uv | curl -LsSf https://astral.sh/uv/install.sh \| sh | powershell -c "irm https://astral.sh/uv/install.ps1 \| iex" |
| pnpm | brew install pnpm | npm install -g pnpm | npm install -g pnpm |
Windows users: We recommend WSL 2 for the smoothest development experience. All shell scripts (./scripts/*.sh) assume a Unix-like environment.
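Once everything is installed, the individual version checks can be folded into one pass that reports only what is missing. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper name, and the tool list is simply copied from the prerequisites table.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    # `command -v` is the POSIX way to test whether a command resolves.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tool names taken from the prerequisites table.
missing=$(missing_tools java mvn python uv node pnpm)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined onto one line.
  echo "Missing tools:" $missing
else
  echo "All prerequisite tools found."
fi
```

This only confirms that each command resolves. It does not compare versions against the minimums in the table, so the individual `--version` commands are still worth running once.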
Git LFS
Some test fixtures are stored with Git LFS. Install it before cloning:
# macOS
brew install git-lfs
# Ubuntu/Debian
sudo apt install git-lfs
# Then initialize
git lfs install
Build & Test
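Before running any tests, it can be worth confirming that the Git LFS fixtures were actually fetched. A clone made without Git LFS leaves small text pointer files in place of the real content, and a pointer file's first line is `version https://git-lfs.github.com/spec/v1`. The sketch below looks for such files. The helper names are hypothetical, and the directory to scan is left as an argument because this guide does not say where the fixtures live.

```shell
#!/bin/sh
# Succeed if the file is still a Git LFS pointer rather than real content.
is_lfs_pointer() {
  [ -f "$1" ] || return 1
  # Compare the first line against the LFS pointer header (fixed string, whole line).
  head -n 1 "$1" | grep -qxF 'version https://git-lfs.github.com/spec/v1'
}

# Print every file under the given directory that is still an LFS pointer.
list_lfs_pointers() {
  find "$1" -type f | while IFS= read -r f; do
    if is_lfs_pointer "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

If `list_lfs_pointers <dir>` prints anything, running `git lfs pull` in the repository should replace the pointers with the real files.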
Quick Start (Local Development)
Run tests for each package independently:
# Java tests
./scripts/test-java.sh
# Python tests
./scripts/test-python.sh
# Node.js tests
./scripts/test-node.sh
Full CI Build
Build all packages (Java, Python, Node.js) in one command:
./scripts/build-all.sh
Build Java Only
mvn clean install -f java/pom.xml
Successful builds produce artifacts under java/opendataloader-pdf-cli/target, including the shaded CLI JAR.
Run the CLI from Source
After building, run the CLI directly:
java -jar java/opendataloader-pdf-cli/target/opendataloader-pdf-cli-<VERSION>.jar [options] <INPUT>
Refer to the CLI Options Reference for the full flag list.
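Because `<VERSION>` changes between releases, a small helper can resolve the JAR path instead of hard-coding it. This is a sketch under assumptions: `find_cli_jar` is a hypothetical name, and skipping `-sources` and `-javadoc` JARs is a precaution in case the build emits them, which this guide does not confirm.

```shell
#!/bin/sh
# Print the path of the first opendataloader-pdf-cli-*.jar in the given
# target directory, skipping sources/javadoc JARs. Fails if none is found.
find_cli_jar() {
  for jar in "$1"/opendataloader-pdf-cli-*.jar; do
    case "$jar" in
      *-sources.jar|*-javadoc.jar) continue ;;
    esac
    # An unmatched glob stays literal, so confirm the file really exists.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  return 1
}
```

With the function defined, the run command becomes `java -jar "$(find_cli_jar java/opendataloader-pdf-cli/target)" [options] <INPUT>`. If several versions have accumulated in `target`, the first match in glob order wins, so running `mvn clean` beforehand avoids surprises.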
Code Generation
Warning: After changing CLI options in Java, you must run npm run sync. This regenerates options.json and all Python/Node.js bindings. Forgetting this silently breaks the wrappers.
CLI options and JSON schema documentation are auto-generated from source files. This ensures consistency across all language bindings.
Note: Reference documentation MDX files (CLI options, JSON schema, convert options) are generated by CI at release time and pushed to the opendataloader.org repository. They are not tracked in this repo. Manual documentation also lives in opendataloader.org.
Auto-Generated Files (Do Not Edit)
The following files are generated by npm run sync — edit the Java source instead:
- options.json
- node/opendataloader-pdf/src/cli-options.generated.ts
- node/opendataloader-pdf/src/convert-options.generated.ts
- python/opendataloader-pdf/src/opendataloader_pdf/cli_options_generated.py
- python/opendataloader-pdf/src/opendataloader_pdf/convert_generated.py
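Since a forgotten sync fails silently, one way to catch it is to rerun the generator and check whether any generated file changed. The sketch below does this with checksums. `snapshot` and `up_to_date_after` are hypothetical helpers and are not part of the repository's scripts.

```shell
#!/bin/sh
# Print a checksum line per file. Missing files are recorded explicitly
# so that creating or deleting a file also counts as a change.
snapshot() {
  for f in "$@"; do
    if [ -f "$f" ]; then
      cksum "$f"
    else
      printf 'missing %s\n' "$f"
    fi
  done
}

# Usage: up_to_date_after "<command>" file...
# Succeeds only if the command ran successfully AND left every file unchanged.
up_to_date_after() {
  cmd=$1
  shift
  before=$(snapshot "$@")
  sh -c "$cmd" >/dev/null || return 1
  after=$(snapshot "$@")
  [ "$before" = "$after" ]
}

# Example call (paths from the list above; run from the repository root):
#   up_to_date_after "npm run sync" options.json \
#     node/opendataloader-pdf/src/cli-options.generated.ts \
#     python/opendataloader-pdf/src/opendataloader_pdf/cli_options_generated.py \
#     || echo "Generated files were stale; commit the regenerated versions."
```

In a Git checkout, running `npm run sync` followed by `git diff --exit-code` gives a similar signal for tracked files. The checksum version is shown because it does not depend on repository state.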
Available Commands
| Command | Description |
|---|---|
| npm run sync | Full sync: export options from Java + generate all docs |
| npm run sync-options | Export options from Java + generate option docs |
| npm run sync-schema | Generate schema docs |
| npm run generate-options | Generate option docs only (without Java export) |
| npm run generate-schema | Generate schema docs only |
After Modifying Java CLI Options
npm run sync-options
This exports options from Java and generates:
| Generated File | Purpose |
|---|---|
| options.json | CLI options source of truth |
| node/opendataloader-pdf/src/cli-options.generated.ts | Node.js CLI options |
| node/opendataloader-pdf/src/convert-options.generated.ts | Node.js convert options |
| python/opendataloader-pdf/src/opendataloader_pdf/cli_options_generated.py | Python CLI options |
| python/opendataloader-pdf/src/opendataloader_pdf/convert_generated.py | Python convert options |
After Modifying JSON Schema
Edit schema.json directly, then:
npm run generate-schema
This generates:
| Generated File | Purpose |
|---|---|
| public/schema.json | Public schema for web access |
Full Sync
To regenerate everything (options + schema):
npm run sync
Project Structure
opendataloader-pdf/
├── java/ # Core Java engine
│ ├── opendataloader-pdf-core/ # Main library
│ └── opendataloader-pdf-cli/ # CLI application
├── python/ # Python package
├── node/ # Node.js package
└── scripts/ # Build & test scripts
Code Style
- Java: Follow existing patterns in the codebase
- Python: PEP 8 with type hints
- TypeScript: ESLint configuration in project
Resources
- CLI Options Reference — All available command-line options
- JSON Schema — Output format specification
- Javadoc — Java API reference
- Contributing Guide — How to submit changes