Ambiguity-Injected NL2VIS Data Synthesizer
We developed a data synthesizer that systematically injects ambiguity into seed visualizations. This approach gives precise control over which types of ambiguity are introduced while keeping every output meaningful and interpretable.
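As a minimal sketch of the idea (not the released synthesizer): starting from a seed chart spec, selected specification slots are relaxed so that one underspecified query maps to several valid visualizations. The dict-based spec, the `AMBIGUITY_RULES` table, and the slot names below are illustrative assumptions.

```python
import itertools

# Hypothetical seed visualization, represented as a flat chart spec.
SEED = {"mark": "bar", "x": "region", "y": "sales", "transform": "sum"}

# Hypothetical relaxation rules: each slot lists the specs an
# underspecified natural-language query could plausibly match.
AMBIGUITY_RULES = {
    # Channel Encoding (CE): the query does not pin down the mark type.
    "mark": ["bar", "line", "area"],
    # Data Transformation (DT): "total sales" could mean sum or mean.
    "transform": ["sum", "mean"],
}

def inject_ambiguity(seed, slots):
    """Enumerate all chart specs reachable by relaxing the given slots,
    yielding the one-to-many query-to-visualization mapping."""
    options = [AMBIGUITY_RULES[s] for s in slots]
    variants = []
    for combo in itertools.product(*options):
        spec = dict(seed)
        spec.update(dict(zip(slots, combo)))
        variants.append(spec)
    return variants

# Relaxing both slots yields 3 marks x 2 transforms = 6 target charts.
charts = inject_ambiguity(SEED, ["mark", "transform"])
```

Keeping the seed spec among the enumerated variants ensures the original, unambiguous visualization remains one of the valid answers.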
Benchmark Comparison
nvBench 2.0 introduces several key innovations compared to existing NL2VIS benchmarks, particularly its explicit handling of query ambiguity and support for one-to-many mapping between queries and visualizations.
Benchmark Statistics
nvBench 2.0 includes a diverse range of natural language query styles and chart types, ensuring comprehensive coverage for evaluating NL2VIS systems.
nvBench 2.0 also reports detailed statistics on ambiguity types and patterns, showing the distribution and frequency of each ambiguity category.
Table 4: Ambiguity count at each reasoning step.
This table shows the distribution of ambiguities across different reasoning steps in the nvBench 2.0 dataset, highlighting which steps in the visualization process are most prone to ambiguity.
Table 5: Statistics of ambiguity patterns.
Our dataset contains diverse ambiguity patterns, with Channel Encoding (CE) being the most common type of ambiguity (88.06%), followed by Data Transformation (DT) ambiguities (46.00%). Many samples contain multiple types of ambiguity, highlighting the complexity of real-world visualization requests.
Step-NL2VIS for Ambiguous NL2VIS
We propose Step-NL2VIS, an LLM-based model trained on nvBench 2.0, which addresses ambiguity by incorporating a step-wise reasoning process and leveraging preference optimization.
Preference Optimization with Step-DPO
Step-DPO performs preference optimization on step-wise pairs of correct and incorrect reasoning steps, providing the model with dense process supervision signals and improving accuracy at each step.
The Step-DPO objective is:

$$\mathcal{L}(\theta) = -\mathbb{E}_{(x,\, s_{1\sim k-1},\, s_{win},\, s_{lose}) \sim D_p}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(s_{win} \mid x, s_{1\sim k-1})}{\pi_{\mathrm{ref}}(s_{win} \mid x, s_{1\sim k-1})} - \beta \log \frac{\pi_\theta(s_{lose} \mid x, s_{1\sim k-1})}{\pi_{\mathrm{ref}}(s_{lose} \mid x, s_{1\sim k-1})}\right)\right]$$

where $D_p$ is the step-wise preference dataset, $\pi_\theta(\cdot \mid x, s_{1\sim k-1})$ is the policy model being optimized, $\pi_{\mathrm{ref}}(\cdot \mid x, s_{1\sim k-1})$ is the reference model, and $\beta$ controls the divergence between the optimized policy and the reference model.
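For a single preference pair, the inner term of this objective can be sketched in plain Python. The function below is an illustration, not the training code: each argument is the log-probability of the winning (correct) or losing (incorrect) next step $s_k$ given the query $x$ and the shared prefix of steps $s_{1\sim k-1}$.

```python
import math

def step_dpo_loss(logp_theta_win, logp_ref_win,
                  logp_theta_lose, logp_ref_lose, beta=0.1):
    """Step-DPO loss for one step-wise preference pair.

    margin > 0 means the policy, relative to the reference model,
    prefers the correct step over the incorrect one; the loss
    -log(sigmoid(margin)) then falls below log(2).
    """
    margin = beta * ((logp_theta_win - logp_ref_win)
                     - (logp_theta_lose - logp_ref_lose))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference model exactly, the margin is zero and the loss equals $\log 2$; training pushes the margin positive, shrinking the loss toward zero.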
Experiments
We evaluate the performance of various models on the ambiguous NL2VIS task using nvBench 2.0, comparing our Step-NL2VIS model against state-of-the-art approaches.
Overall Performance
The table below presents the comprehensive performance evaluation of different models on nvBench 2.0. Our proposed Step-NL2VIS achieves state-of-the-art performance across most metrics.
Citation
If you find nvBench 2.0 useful for your work, please cite:
@article{luo2024nvbench2,
  author = {Luo, Tianqi and Huang, Chuhan and Shen, Leixian and Li, Boyan and Shen, Shuyu and Zeng, Wei and Tang, Nan and Luo, Yuyu},
  title  = {nvBench 2.0: A Benchmark for Natural Language to Visualization under Ambiguity},
}
License