Parallel HTML Parsing with HSV
A proof of concept demonstrating parallel parsing of HTML-like structured documents using the HSV format.
Background
HTML parsing has been sequential for 30 years. Research attempts achieved limited results:
| Project | Year | Approach | Result |
|---|---|---|---|
| HPar | 2013 | Speculative data-parallel | 2.4x on 4 cores |
| ZOOMM | 2013 | Parallel browser engine | 2x (whole engine) |
| Servo | 2017 | Off-main-thread parsing | Tokenization only |
HSV solves this by changing the representation, not the parser.
How It Works
- Represent HTML as HSV - use control characters instead of angle brackets
- Split at delimiters - a single O(n) scan for FS (ASCII 0x1C), which HSV uses as its record separator
- Parse chunks in parallel - no state synchronization needed
- Reconstruct - results are independent, just collect them
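The four steps above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the project's actual code: the `Record` type, the `parseChunk` and `ParseParallel` names, and the use of US (0x1F) as a key/value separator are all hypothetical, and the real HSV grammar may differ.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Assumed delimiters for this sketch: FS (0x1C) between records,
// US (0x1F) between a key and its value. The real HSV grammar may differ.
const (
	fs = "\x1c"
	us = "\x1f"
)

// Record is an illustrative parse result, not a type from the HSV project.
type Record struct {
	Key, Value string
}

// parseChunk parses one record without touching any shared state.
func parseChunk(chunk string) Record {
	key, value, _ := strings.Cut(chunk, us)
	return Record{Key: key, Value: value}
}

// ParseParallel splits doc at FS, parses each chunk in its own goroutine,
// and collects the results in document order.
func ParseParallel(doc string) []Record {
	chunks := strings.Split(doc, fs) // one linear scan over the input
	out := make([]Record, len(chunks))
	var wg sync.WaitGroup
	for i, c := range chunks {
		wg.Add(1)
		go func(i int, c string) {
			defer wg.Done()
			out[i] = parseChunk(c) // each goroutine writes only its own slot
		}(i, c)
	}
	wg.Wait()
	return out
}

func main() {
	// Angle brackets and ampersands appear literally; nothing is escaped.
	doc := "tag" + us + "<div>" + fs + "text" + us + "a & b"
	for _, r := range ParseParallel(doc) {
		fmt.Printf("%s=%q\n", r.Key, r.Value)
	}
}
```

Because every goroutine writes to a distinct index of a pre-sized slice, no mutex is needed and the output order matches the input order.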
Run Tests
go test -v
Run Benchmarks
go test -bench=. -benchmem
Results
| Size | Chunks | Sequential | Parallel |
|---|---|---|---|
| 100 | 100 | 68µs | 77µs |
| 500 | 500 | 360µs | 349µs |
| 1000 | 1000 | 646µs | 637µs |
| 2000 | 2000 | 1.45ms | 1.40ms |
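Parallel pulls ahead at around 500 elements, but only by about 1-3% in these runs; at 100 elements it takes about 13% longer, most likely because scheduling overhead outweighs the small amount of work per chunk. For real HTML processing (DOM building, rendering), where each chunk carries more work, the advantage should be larger, though that is not measured here.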
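To make the margins concrete, the speedup for each row can be computed as sequential time divided by parallel time. The sketch below uses only the figures from the table (converted to microseconds); the `Speedup` helper is a hypothetical name introduced for illustration.

```go
package main

import "fmt"

// Speedup returns sequential time divided by parallel time.
// Values above 1.0 mean the parallel run was faster.
func Speedup(seqMicros, parMicros float64) float64 {
	return seqMicros / parMicros
}

func main() {
	// Rows copied from the results table, with times in microseconds.
	rows := []struct {
		size     int
		seq, par float64
	}{
		{100, 68, 77},
		{500, 360, 349},
		{1000, 646, 637},
		{2000, 1450, 1400},
	}
	for _, r := range rows {
		fmt.Printf("%5d  %.2fx\n", r.size, Speedup(r.seq, r.par))
	}
}
```

This prints 0.88x, 1.03x, 1.01x, and 1.04x for the four sizes, so the crossover sits between 100 and 500 elements.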
Key Points
- No escaping: <div>, &, and "quotes" are preserved literally in HSV
- Trivial parallelization: ~50 lines of code
- Verified correctness: Sequential and parallel produce identical results
- No speculation, no state synchronization: chunks are independent, so parsing work can in principle scale linearly with cores
Why HSV Succeeds Where Others Struggled
HPar needed speculative parallelization with rollback. Servo moved tokenization off-thread but kept DOM construction sequential. Both fight HTML's stateful parsing model.
HSV changes the question: instead of "how do we parallelize HTML parsing?" it asks "why use a format that requires sequential parsing?"
It's the difference between building a faster horse and building a car.
References
HPar (2013)
Zhijia Zhao, Michael Bebenita, Dave Herman, Jianhua Sun, and Xipeng Shen. "HPar: A practical parallel parser for HTML—taming HTML complexities for parallel parsing." ACM Transactions on Architecture and Code Optimization (TACO), Vol. 10, No. 4, Article 44, December 2013. https://research.csc.ncsu.edu/picture/publications/papers/taco14.pdf
ZOOMM (2013)
Calin Cascaval, Seth Fowler, Pablo Montesinos-Ortego, Wayne Piekarski, Mehrdad Reshadi, Behnam Robatmili, Michael Weber, and Vrajesh Bhavsar. "ZOOMM: A parallel web browser engine for multicore mobile devices." Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '13), February 2013. https://dl.acm.org/doi/10.1145/2442516.2442543
Servo (2017)
"Off main thread HTML parsing in Servo." Servo Blog, August 2017. https://servo.org/blog/2017/08/23/gsoc-parsing/
ParDOM (2011)
Wei Lu and Dennis Gannon. "A data parallel algorithm for XML DOM parsing." Proceedings of the 2007 Workshop on Service-Oriented Computing Performance. https://www.researchgate.net/publication/221412394_A_data_parallel_algorithm_for_XML_DOM_parsing
See Also
HSV was created by Danslav Slavenskoj, Lingenic LLC, 2026.
Dedicated to the public domain under CC0 1.0.