Modeling Creative Selling with Verbal, Vocal, and Visual Features in the Streaming Economy: Predictive, Interpretable, and Generative AI

On hypercompetitive livestreaming platforms such as Twitch and TikTok, a streamer's creative selling ability has taken on unprecedented significance. However, modeling streamer creative selling scientifically in a manner that is automatic, scalable, and consistent with theory is challenging. This research customizes multimodal transformer-based deep learning algorithms to measure streamer creative selling. Our algorithm outperforms a host of benchmark deep learning models. It also reveals that capturing creative selling depends on two things: the multimodal representations of lower-level verbal, vocal, and visual features, and their crossmodal interactions (verbal-vocal, verbal-visual, vocal-visual, and verbal-vocal-visual). In an upstream analysis, we validate our algorithm by showing that predicted creative selling is indeed correlated with four higher-level theoretical factors: verbal originality and appropriateness, vocal arousal, and visual body motion. Further, in a downstream analysis, we show that streamers with higher creative selling scores tend to generate more product sales. Interestingly, this relationship is amplified for streamers with higher reputation and for hedonic and high-price products. Moreover, a follow-up field experiment that uses ChatGPT to generate videos with exogenous variation corroborates the causal impact of creative selling on customer responses, lending further external validity to our algorithmic measure of creative selling. Platforms can leverage our algorithm to rank streamers by creative selling score for advertising and promotion activities. Streamers can use it to improve the verbal, vocal, and visual cues of their creative selling and thereby increase sales in streaming commerce.
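The abstract does not specify how the crossmodal interactions enter the model. A common way to build them in a transformer is crossmodal attention, in which one modality's sequence supplies the queries and another modality's sequence supplies the keys and values. The following is a generic sketch in pure Python and is an assumption, not the authors' architecture. It enumerates the four interaction groups named above and implements single-head scaled dot-product crossmodal attention without learned projections. The helper names `crossmodal_groups` and `crossmodal_attention` are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical modality labels matching the three feature streams in the abstract.
MODALITIES = ("verbal", "vocal", "visual")


def crossmodal_groups(modalities=MODALITIES):
    """Return every interaction of two or more modalities (pairs, then the triple)."""
    groups = []
    for k in range(2, len(modalities) + 1):
        groups.extend(combinations(modalities, k))
    return groups


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def crossmodal_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections.

    `queries` come from one modality (e.g. verbal time steps), while `keys`
    and `values` come from another (e.g. vocal time steps), so each output
    row is a mixture of the second modality's features, weighted by how
    well they match the first modality at that step.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


if __name__ == "__main__":
    print(crossmodal_groups())
    # Toy sequences: 2 verbal steps attending over 2 vocal steps (dimension 2).
    verbal = [[1.0, 0.0], [0.0, 1.0]]
    vocal = [[0.5, 0.5], [1.0, -1.0]]
    print(crossmodal_attention(verbal, vocal, vocal))
```

Running `crossmodal_groups()` yields `[('verbal', 'vocal'), ('verbal', 'visual'), ('vocal', 'visual'), ('verbal', 'vocal', 'visual')]`, the same four interactions listed in the abstract. A full model would add learned query/key/value projections, multiple heads, and a fusion step for the three-way group. These details are omitted here because the abstract does not describe them.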