In this talk, we explore a preference learning setting where participants select their top $k$ preferred items from an individualized display set. We introduce a distance-based ranking model, inspired by the Mallows model, built on a novel distance function called the Reverse Major Index (RMJ). Although computing (ranked) choice probabilities requires summing over all permutations, the RMJ-based model admits simple closed-form expressions for them. This property enables effective methods for inferring the model parameters from (ranked) choice data, with theoretically proven consistency. Comprehensive numerical studies demonstrate the model’s favorable generalization power, robustness, and computational efficiency.
Additionally, we use the model to investigate the relationship between the richness of the feedback structure (represented by $k$) and the efficiency of feedback collection. We formulate an active preference learning problem in which a company sequentially determines the display sets and collects top-$k$ ranked choices from customers, aiming to identify the globally top-ranked candidate with as few samples as possible. We assess the informational efficiency of various values of $k$ using the (asymptotic) sample complexity under optimal sequential feedback collection procedures. Our findings reveal that while informational efficiency increases with $k$, a value as small as $k=2$ often yields efficiency close to (or sometimes equal to) the full-efficiency level.