Clarity-OMR vs Audiveris: 5 OMR Accuracy Tests
A deep-dive comparison of Clarity-OMR's machine learning approach against Audiveris's traditional computer vision for optical music recognition — with real benchmark data on 10 classical piano pieces.

The best open-source optical music recognition software depends on what you're scanning. Audiveris is the safer all-around pick with a 44.0 average quality score across 10 classical piano benchmarks. But Clarity-OMR — a brand-new ML-based OMR tool — scores roughly double to nearly triple Audiveris's numbers on clean, rhythmic pieces like Bartók (69.5 vs 25.9) and Joplin's The Entertainer (66.2 vs 33.9).
The real takeaway: OMR accuracy depends heavily on your source material. A developer going by [Clarity___ on r/MachineLearning](https://www.reddit.com/r/MachineLearning/comments/1ru8okj/p_ive_trained_my_own_omr_model_optical_music/) just dropped an entirely ML-driven optical music recognition pipeline that converts sheet music PDFs into MusicXML. It's called Clarity-OMR, and it's going head-to-head with Audiveris — the reigning open-source OMR engine that musicians have relied on for years. Both are free. Both output MusicXML. But they take radically different approaches to the same problem, and those differences produce some surprising results.
Let's break it down.
| Feature | Clarity-OMR | Audiveris |
|---|---|---|
| Approach | Deep learning (DaViT + Transformer) | Hybrid CV + NN glyph classifier |
| Input Format | Sheet music PDFs | PDFs and images |
| Output Format | MusicXML | MusicXML |
| Avg Quality Score | 42.8 (mir_eval, 10 pieces) | 44.0 (mir_eval, 10 pieces) |
| Best Single Score | 69.5 (Bartók) | ~44 (consistent across pieces) |
| Open Source | Yes | Yes |
| GPU Required | Yes (CUDA) | No |
| Language | Python / PyTorch | Java |
| Price | Free | Free |
| Maturity | New (2026) | Established (10+ years) |
Clarity-OMR's architecture is honestly pretty clever in the way it breaks the problem apart. Think of it like an assembly line with four specialized stations instead of one worker trying to do everything at once.
Stage 1: Staff Detection (YOLO). A YOLO model scans each PDF page and identifies individual staves. This is the divide-and-conquer step — instead of processing an entire page end-to-end (which would blur fine details), Clarity-OMR crops each staff and processes them individually at 192px height. That resolution choice matters. It preserves grace notes, articulation marks, and dynamic markings that full-page approaches tend to smear into noise.
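The resize math behind that choice is simple aspect-ratio scaling. A minimal sketch, assuming a hypothetical `(x0, y0, x1, y1)` bounding-box format from the detector (the function name and format are illustrative, not from the Clarity-OMR repo):

```python
def scaled_crop_size(bbox, target_height=192):
    """Compute the output size for a staff crop resized to a fixed
    192 px height while preserving aspect ratio, so small symbols
    (grace notes, articulations) keep their detail.
    `bbox` is a hypothetical (x0, y0, x1, y1) staff bounding box."""
    x0, y0, x1, y1 = bbox
    width, height = x1 - x0, y1 - y0
    scale = target_height / height
    return round(width * scale), target_height

# A staff detected on a 300 DPI page scan:
print(scaled_crop_size((150, 400, 2330, 560)))  # → (2616, 192)
```

Because width scales with the same factor as height, a wide staff simply produces a wide crop; nothing gets squashed, which is what keeps dense markings legible.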
Stage 2: Recognition (DaViT + Transformer Decoder). Each cropped staff image hits a DaViT-Base encoder paired with a custom Transformer decoder using RoPE positional encoding. The decoder outputs tokens from a 487-element music vocabulary — essentially a specialized language where each "word" represents a musical element like a note, rest, barline, or dynamic marking. The model uses DoRA rank-64 on all linear layers, a parameter-efficient fine-tuning technique that keeps the model size manageable without sacrificing quality.
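To make the vocabulary idea concrete, here's a toy detokenizer. The token names and IDs below are invented for illustration; the real 487-element vocabulary is defined by Clarity-OMR itself:

```python
# A toy slice of a music vocabulary in the spirit of Clarity-OMR's
# 487-token set; these names and IDs are invented for illustration.
VOCAB = {
    0: "<bos>", 1: "<eos>", 2: "barline",
    3: "clef-G2", 4: "time-4/4",
    5: "note-C4_quarter", 6: "note-E4_quarter",
    7: "note-G4_half", 8: "rest-quarter",
}

def detokenize(token_ids):
    """Map decoder output IDs back to symbolic music elements,
    dropping the special begin/end tokens."""
    return [VOCAB[t] for t in token_ids if VOCAB[t] not in ("<bos>", "<eos>")]

# One decoded staff: clef, time signature, three notes, barline.
print(detokenize([0, 3, 4, 5, 6, 7, 2, 1]))
```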
Stage 3: Grammar-Constrained Beam Search (FSA). This is the architecture's most distinctive piece. A finite state automaton enforces structural validity during decoding. The model literally can't output musically impossible sequences. Beat consistency, chord well-formedness, measure completeness — all checked in real time as tokens are generated.
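A stripped-down sketch of the beat-consistency piece of such a constraint, using an invented four-token vocabulary (this shows the general FSA idea, not Clarity-OMR's actual grammar):

```python
from fractions import Fraction

# Beat values for a toy vocabulary; illustrative, not Clarity-OMR's token set.
DURATION = {
    "note_quarter": Fraction(1, 4), "note_half": Fraction(1, 2),
    "rest_quarter": Fraction(1, 4), "barline": Fraction(0),
}

def allowed_next(tokens_so_far, measure_len=Fraction(4, 4)):
    """Return the set of tokens the grammar permits next: a token must
    keep the current measure at or under its full length, and 'barline'
    is legal only when the measure is exactly full. During beam search,
    every other token would be masked out before sampling."""
    filled = Fraction(0)
    for t in tokens_so_far:
        filled = Fraction(0) if t == "barline" else filled + DURATION[t]
    legal = set()
    for tok, dur in DURATION.items():
        if tok == "barline":
            if filled == measure_len:
                legal.add(tok)
        elif filled + dur <= measure_len:
            legal.add(tok)
    return legal

# Three quarter notes into a 4/4 bar: a half note would overflow,
# and a barline would be premature.
print(sorted(allowed_next(["note_quarter"] * 3)))  # → ['note_quarter', 'rest_quarter']
```

The payoff is exactly the guarantee described above: a five-beat 4/4 measure is unreachable, because the token that would create it never enters the beam.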
Stage 4: MusicXML Export. The validated token sequence gets converted to standard MusicXML.
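The export step can be sketched with the Python standard library alone. This is a minimal illustration of emitting a single MusicXML `<note>` element, not Clarity-OMR's actual writer:

```python
import xml.etree.ElementTree as ET

def note_to_musicxml(step, octave, duration_divs, note_type):
    """Render one validated token as a MusicXML <note> element.
    A minimal sketch of the export stage; element names follow the
    MusicXML standard, the function itself is illustrative."""
    note = ET.Element("note")
    pitch = ET.SubElement(note, "pitch")
    ET.SubElement(pitch, "step").text = step
    ET.SubElement(pitch, "octave").text = str(octave)
    ET.SubElement(note, "duration").text = str(duration_divs)
    ET.SubElement(note, "type").text = note_type
    return note

xml = ET.tostring(note_to_musicxml("C", 4, 1, "quarter"), encoding="unicode")
print(xml)
```

A real exporter also wraps notes in `<measure>` and `<part>` elements and declares the `<divisions>` time unit, but the element-building pattern is the same.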
Audiveris takes the opposite road entirely. It combines classical computer vision techniques — image processing, connected component analysis, and rule-based systems — with a neural network glyph classifier introduced in version 5.x. The result is a hybrid approach where symbol detection uses trained classifiers, but the overall pipeline still relies heavily on hand-crafted rules for layout analysis and musical interpretation.

The advantage? Rock-solid predictability. No GPU needed. No Python environment to wrangle. Download a JAR file, double-click, and go. And because Audiveris has been refined over many years of community contribution, it handles a genuinely wide variety of input formats and edge cases.
But even with NN classifiers for symbol recognition, the overall pipeline still depends on hand-crafted rules for layout analysis. Each new notation style or engraving quirk can require someone to manually add another rule. End-to-end ML models like Clarity-OMR learn these patterns from data instead.
According to [benchmarks shared by the developer](https://www.reddit.com/r/MachineLearning/comments/1ru8okj/p_ive_trained_my_own_omr_model_optical_music/), tested on 10 classical piano pieces with mir_eval, the headline numbers are 69.5 vs 25.9 on the Bartók selection and 66.2 vs 33.9 on The Entertainer.
Those aren't marginal differences. Clarity-OMR is scoring roughly double to nearly triple Audiveris on these pieces. For clean, professionally typeset, rhythmically clear classical music, the ML approach is in a different league.
Clarity-OMR's best performances are 2-3x better than Audiveris on the same pieces. But averages tell a different story — and averages are what matter in production.
The overall average across all 10 test pieces gives Audiveris a slight edge: 44.0 vs 42.8. That means Clarity-OMR's worst-case performances drag its average below an engine that doesn't use ML at all.
The developer is refreshingly honest about why: Clarity-OMR struggles "when the notes aren't properly on the stave." Slightly offset noteheads, unusual engraving conventions, dense polyphonic textures — these trip up the model. As of March 2026, this is the main limitation.
So the variance is the story. Audiveris is the tortoise — steady, predictable, mediocre everywhere. Clarity-OMR is the hare — brilliant when conditions align, fragile when they don't.
This is where Clarity-OMR's grammar FSA really earns its keep. By enforcing structural rules during decoding, the output is musically coherent even when individual note recognition stumbles. You won't get a 4/4 measure with five beats. You won't get a chord that violates basic voice-leading constraints.
Audiveris can sometimes output technically valid XML that's musically nonsensical — a misidentified time signature cascading into wrong beat groupings for an entire system. It's like a spell-checker that knows every word but can't tell if a sentence makes sense.
The grammar FSA is Clarity-OMR's secret weapon. It's like having a music theory professor checking every measure as it's decoded — catching structural errors that raw accuracy scores don't capture.
No sugarcoating this one — Audiveris wins in a landslide.

Audiveris: install Java 17+, download the package, and launch the GUI with drag-and-drop PDF import. Straightforward if you already have Java.
Clarity-OMR: install Python, install PyTorch with CUDA support, clone the inference repo from GitHub, download the model weights from Hugging Face, and run from the command line. If you've never set up a Python ML environment before, expect to spend at least an hour on dependencies alone (and that's if CUDA cooperates on the first try).
For a musician who just wants to digitize some Chopin? Audiveris. No question.
Both tools are fully open-source, but they invite fundamentally different kinds of contributions.
Audiveris needs Java developers who understand both music notation and image processing — a narrow intersection. Adding support for a new symbol type means writing new detection rules, new template matchers, new heuristics. It scales linearly with effort.
Clarity-OMR needs ML engineers and (critically) better training data. The training code is open-source, so anyone with GPU access can experiment. And the developer has identified clear improvement paths: better polyphonic training data, smarter grammar constraints, and more diverse synthetic score rendering. Add more data, retrain, get better results. That's a scaling curve hand-written rules can't match.
As of March 2026, Clarity-OMR is at version 1.0 with enormous room to grow. Audiveris has had over a decade to mature, and its improvement curve has naturally flattened.
Both tools are completely free and open-source. The only cost difference is hardware.
| Cost Factor | Clarity-OMR | Audiveris |
|---|---|---|
| Software License | Free (open source) | Free (open source) |
| Minimum Hardware | NVIDIA GPU with CUDA | Any CPU |
| Cloud GPU Cost | ~$0.50-2.00/hour if needed | $0 |
| Typical Setup Time | 30-60 minutes | 5 minutes |
The hidden cost with Clarity-OMR is the GPU requirement. If you don't own a CUDA-capable card, you're looking at either renting cloud GPU time or simply not using it. That's a meaningful barrier for casual users.
Don't skip this part. The developer benchmarked both tools on 10 classical piano pieces using mir_eval, the standard evaluation framework for music information retrieval. Here's the standout data:
| Piece | Clarity-OMR Score | Audiveris Score | Margin |
|---|---|---|---|
| Bartók (selected work) | 69.5 | 25.9 | +43.6 Clarity-OMR |
| The Entertainer (Joplin) | 66.2 | 33.9 | +32.3 Clarity-OMR |
| Overall Average (10 pieces) | 42.8 | 44.0 | -1.2 (Audiveris) |
The pattern is clear. Clarity-OMR's ceiling is dramatically higher than Audiveris's — but its floor is lower too. If you know your source material is clean and well-typeset, you can confidently pick Clarity-OMR. If you're processing a mixed batch of unknown quality? Audiveris gives you safer, more predictable results.
If you could cherry-pick scores, Clarity-OMR would clearly outperform Audiveris. But you don't get to cherry-pick in production.
As of March 2026, optical music recognition is hitting an inflection point. Rule-based approaches like Audiveris have been the default for over a decade. But ML-based approaches are catching up fast, and Clarity-OMR proves a single developer with the right architecture can reach competitive results against years of accumulated engineering.

The developer behind Clarity-OMR mentions a fourth possible approach: combining model-based recognition with general-purpose vision models. Imagine feeding a difficult passage to both Clarity-OMR and a vision-language model, then merging the results. That hybrid strategy could cover each system's blind spots — something worth watching as vision-language models continue to improve.
But let's not get ahead of ourselves. Right now, neither tool delivers perfect OMR. The overall quality scores (42.8 and 44.0 out of 100) tell you that optical music recognition remains a genuinely hard problem. Sheet music packs an absurd amount of information density into tiny spatial areas — far more than text OCR, which is the problem most people compare it to.
The tools that will win this space are the ones that can scale with data. And that favors the ML approach.
For reliability today: Audiveris. It's mature, CPU-friendly, and handles diverse inputs with less variance. A 44.0 vs 42.8 average gap is small, but Audiveris achieves it consistently.
For peak performance on clean scores: Clarity-OMR, and it's not close. Scoring 69.5 where Audiveris hits 25.9 isn't incremental improvement — it's a generational jump on the right material.
For long-term potential: Clarity-OMR. ML approaches improve with more data and compute. Rule-based approaches improve with more hand-written rules. One of those paths scales. The other doesn't.
For most users today: Start with Audiveris. If you're processing clean classical scores and have GPU access, test Clarity-OMR on your specific material. Both tools are free — the best approach is to run both on a sample and compare the MusicXML output directly.
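That side-by-side comparison doesn't have to be manual. A stdlib sketch that extracts each engine's pitch sequence from its MusicXML output and flags where they disagree (the element names come from the MusicXML standard; the comparison logic is just an illustration):

```python
import xml.etree.ElementTree as ET

def pitch_sequence(musicxml_text):
    """Extract the ordered (step, octave) pitches from a MusicXML string,
    so two engines' outputs can be compared note-by-note."""
    root = ET.fromstring(musicxml_text)
    return [
        (p.findtext("step"), p.findtext("octave"))
        for p in root.iter("pitch")
    ]

# Tiny stand-in documents; in practice, read each tool's exported file.
a = "<score><note><pitch><step>C</step><octave>4</octave></pitch></note></score>"
b = "<score><note><pitch><step>D</step><octave>4</octave></pitch></note></score>"
seq_a, seq_b = pitch_sequence(a), pitch_sequence(b)
mismatches = [i for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]
print(mismatches)  # note indices where the two engines disagree
```

Rhythm, dynamics, and layout errors won't show up in a pitch-only diff, so treat this as a first-pass screen before listening to or eyeballing the outputs.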
This is a space worth watching. Clarity-OMR is exactly the kind of scrappy, well-architected open-source project that tends to snowball with community support. Give it better training data and a year of iteration, and these benchmarks could look very different.
Sources
- [I've trained my own OMR model (optical music recognition)](https://www.reddit.com/r/MachineLearning/comments/1ru8okj/p_ive_trained_my_own_omr_model_optical_music/), posted by the developer on r/MachineLearning
FAQ

Does Clarity-OMR work on handwritten sheet music?
No. As of March 2026, Clarity-OMR is designed for professionally typeset sheet music PDFs. The model struggles even with typeset scores where notes are slightly offset from staff lines. Handwritten notation would require a completely different training dataset and likely architectural changes. For handwritten scores, commercial tools like PlayScore or manual transcription remain more practical options.
What GPU do you need to run Clarity-OMR?
You need an NVIDIA GPU with CUDA support. The model uses a DaViT-Base encoder with DoRA rank-64, which is relatively lightweight by modern ML standards. A card with 6-8 GB of VRAM (like an RTX 3060 or better) should handle inference comfortably. If you don't have a local GPU, cloud options like Google Colab Pro or Lambda Labs work — expect to pay roughly $0.50-2.00 per hour of processing time.
Can Clarity-OMR export MIDI?
Not directly. Clarity-OMR outputs MusicXML only. However, MusicXML can be easily converted to MIDI using free tools like MuseScore (import the MusicXML, export as MIDI) or command-line utilities like music21 in Python. The MusicXML output preserves richer musical information than MIDI — dynamics, articulations, text markings — so it's actually the better intermediate format.
Does Clarity-OMR run on Apple Silicon Macs?
Not natively. Clarity-OMR requires CUDA, which is NVIDIA-only. Apple Silicon Macs use Metal for GPU acceleration, and PyTorch's MPS backend doesn't support all the operations Clarity-OMR uses. Your best option is running it on a cloud GPU instance (Google Colab, AWS, or Lambda Labs) and uploading your PDFs. Audiveris runs natively on any Mac since it's Java-based and CPU-only.
How long does Clarity-OMR take per page?
Processing time depends on the number of staves per page and your GPU. Each staff goes through four pipeline stages (YOLO detection, DaViT encoding, beam search decoding, MusicXML export). On a modern GPU like an RTX 4070, expect roughly 5-15 seconds per page for a typical piano score with two staves. Orchestral scores with 15+ staves per page will take proportionally longer. Audiveris is generally faster per page since it runs on CPU without the overhead of neural network inference.