"Previously on ..." From Recaps to Story Summarization

Recap of Season 8 Episode 22 (S08E22) shown at the beginning of Episode 23 (S08E23) of the series 24.

The Story: In a tense series of events, Jack Bauer pursues justice at the cost of a peace treaty, seeking to expose the conspirators behind recent events. He enlists journalist Meredith Reed to publish evidence implicating the Russians, but President Allison Taylor intervenes, ordering Meredith's arrest and seizing the incriminating documents. At CTU NY, Chloe O'Brian informs Cole Ortiz about Jack's ally, Jim Ricker, as they scramble to support Jack's mission. Jack, using drastic measures, forces Former President Charles Logan to confess details implicating Russian President Yuri Suvarov in the conspiracy, including the order to kill Renee Walker. As the stakes escalate and alliances fracture, Jack's relentless pursuit of truth and justice jeopardizes international relations and personal safety alike.

Abstract

We introduce multimodal story summarization by leveraging TV episode recaps — short video sequences interweaving key story moments from previous episodes to bring viewers up to speed. We propose PlotSnap, a dataset featuring two crime thriller TV shows with rich recaps and long episodes of 40 minutes. Story summarization labels are unlocked by matching recap shots to corresponding substories in the episode. We propose a hierarchical model TaleSumm that processes entire episodes by creating compact shot and dialog representations, and predicts importance scores for each video shot and dialog utterance by enabling interactions between local story groups. Unlike traditional summarization, our method extracts multiple plot points from long videos. We present a thorough evaluation on story summarization, including promising cross-series generalization. TaleSumm also shows good results on classic video summarization benchmarks.
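As a rough illustration of how recap shots could be turned into labels, the sketch below marks the episode shot most similar to each recap shot as important. This is a minimal sketch under stated assumptions, not the matching procedure used for PlotSnap: the cosine similarity, the `threshold` value, the function names, and the use of one feature vector per shot are all hypothetical, and it labels single shots rather than whole substories.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_episode_shots(recap_feats, episode_feats, threshold=0.9):
    """Return a 0/1 label per episode shot.

    A shot gets label 1 if it is the best match of some recap shot and
    the similarity reaches `threshold` (an assumed, illustrative value).
    """
    if not episode_feats:
        return []
    labels = [0] * len(episode_feats)
    for recap in recap_feats:
        sims = [cosine(recap, shot) for shot in episode_feats]
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] >= threshold:
            labels[best] = 1
    return labels


# Toy 2-D features (hypothetical): three episode shots, two recap shots.
episode = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
recap = [[0.0, 2.0], [3.0, 3.0]]
labels = label_episode_shots(recap, episode)
```

In this toy example the two recap shots point in the same direction as the second and third episode shots, so only those two receive a positive label.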

TaleSumm

(A) TaleSumm ingests all video shots and dialogs of the episode and encodes them using (B) and (C). Based on temporal order, we combine tokens into local story groups (the illustration shows small groups of 2 shots and 0-2 utterances). To each group, we append a group token and add multiple embeddings, before feeding them to the episode-level Transformer $\mathsf{ET}$. For each shot or dialog token, a linear classifier predicts its importance. (B) Video shot encoder. For each frame, representations from multiple backbones are fused using attention ($\boxplus$). We feed these to a shot Transformer encoder $\mathsf{ST}$, and tap a shot-level representation from the $\mathsf{CLS}$ token. (C) The utterance encoder uses a fine-tuned language model and average pooling across all words of the utterance. (D) The self-attention mask illustrates the block-diagonal self-attention structure across the episode. Group tokens across the episode (purple squares) communicate with each other. (E) Multiple embeddings are added to the tokens to capture modality type, time, and membership in a local story group.
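The attention pattern described in (D) can be sketched as a boolean mask: tokens attend within their own local story group, and group tokens additionally attend to each other across the episode. This is a minimal sketch, assuming each group's tokens are laid out contiguously with the group token appended last; the function name, the token layout, and the convention that `True` means "may attend" are assumptions for illustration (some libraries use the opposite convention).

```python
def build_story_group_mask(group_sizes):
    """Build a block-diagonal attention mask with cross-group links.

    group_sizes: number of shot/dialog tokens in each local story group.
    Each group contributes its tokens followed by one group token.
    Returns (mask, is_group_token), where mask[i][j] is True if token i
    may attend to token j.
    """
    group_id = []
    is_group_token = []
    for g, size in enumerate(group_sizes):
        group_id.extend([g] * (size + 1))
        is_group_token.extend([False] * size + [True])

    n = len(group_id)
    mask = [
        [
            # same local story group, or both are group tokens
            group_id[i] == group_id[j]
            or (is_group_token[i] and is_group_token[j])
            for j in range(n)
        ]
        for i in range(n)
    ]
    return mask, is_group_token


# Two groups: the first with 2 tokens, the second with 3 tokens.
mask, is_group_token = build_story_group_mask([2, 3])
```

With this layout, indices 2 and 6 are the group tokens. They can attend to each other, while an ordinary token such as index 0 cannot attend to tokens of the second group.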

Qualitative Analysis

TaleSumm predictions on S06E22 of 24 (test set). The filled plot (Ours) illustrates the importance score profile over time, where orange patches indicate story segments selected for summarization. Annotations are shown below: ground-truth (GT), fandom (F), and human annotated (H). We number the grouped frames representing the predicted contiguous orange chunks as shot groups (SG-n), e.g., this episode has 7 SGs. This episode stands out due to its rapid and significant story advancements, where each sub-story holds apparent importance. The story: Amid the high-stakes sequence depicted in the selected groups 1-3, Zhou Yong's team captures Josh Bauer, leading to a firefight with Jack Bauer, who seeks Josh's location. Negotiations with Phillip Bauer over Josh's return for a vital circuit board escalate global tensions between Russia and the USA. Simultaneously, Mike Doyle defies Jack's wishes and departs with Josh by helicopter (segment 7). In parallel, Lisa, backed by Tom Lennox, confronts a Russian agent, leading to her injury (segments 4, 6). Morris attempts to console Nadia for Milo's loss at CTU (segment 5). Escalating global tensions and the imminent showdown mark the episode.
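The shot groups (SG-n) referred to in these captions are contiguous runs of selected shots. One simple way to derive them from an importance score profile is sketched below; the fixed `threshold` of 0.5 and the function name are assumptions for illustration, and the actual selection rule behind the plots may differ.

```python
def shot_groups(scores, threshold=0.5):
    """Group contiguous shots whose score reaches `threshold`.

    Returns a list of (first_shot_index, last_shot_index) pairs, one per
    contiguous selected chunk, in temporal order.
    """
    groups = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new chunk begins here
        elif start is not None:
            groups.append((start, i - 1))  # the chunk ended at the previous shot
            start = None
    if start is not None:
        groups.append((start, len(scores) - 1))  # chunk runs to the final shot
    return groups


# Hypothetical per-shot importance scores for a short clip.
scores = [0.1, 0.8, 0.9, 0.2, 0.7]
groups = shot_groups(scores)
```

Here shots 1-2 form the first group and shot 4 forms the second, so this toy profile would have 2 SGs.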

TaleSumm predictions on S05E21 of 24 (test set). This episode has 5 SGs. This episode stands out due to its rapid and significant story advancements, where each sub-story holds apparent importance. The human annotations also deviate somewhat from the ground-truth. Importantly, our model considers opinions from all the sources. The story: In SG-1,2, President Logan, feigning surprise, learns from Admiral Kirkland that Flight 520, now under Jack's control, is a potential threat. Despite Mike's doubts about Jack's intentions, Kirkland urges immediate action, advocating for shooting down the plane. Logan, feigning shock, reluctantly authorizes the attack. Karen alerts Jack to the order, leading to a tense situation. As the plane assumes a landing profile, Kirkland suggests calling off the strike, but President Logan insists on taking it down. Further, in SG-3, Graem criticizes Logan's decision, emphasizing the importance of capturing Jack, but Logan assures Graem of recapturing him. Meanwhile, in SG-4, Jack, having secured incriminating evidence, vows to make Logan pay for President Palmer's assassination. In a surprising turn, as shown in SG-5, President Logan contemplates suicide, but an unexpected call from Miles Papazian presents an alternative: the destruction of the recording. Encouraging Miles to act, Logan faces a critical juncture in the unfolding crisis. Overall, the episode sets the stage for a series of dramatic events, stressing the depth of deceit and the potential consequences for key characters.

TaleSumm predictions on S06E20 of 24 (test set). This episode has 8 SGs. The story: The White House directs CTU to locate Cheng, as depicted in SG-1, who possesses a Russian sub-circuit board that threatens national security. In SG-2, President Suvarov warns of military consequences if the Chinese agent with the circuit board isn't intercepted. SG-3,4,7 show how Lennox suspects a spy within the administration and uncovers Lisa's treason. President Noah Daniels instructs Lisa to bring the component back by misleading her partner, Mark Bishop. In SG-5,6, Jack questions Audrey about Cheng, leading to a standoff with Doyle. Audrey mentions "Bloomfield", prompting research by Chloe. In his holding room, Heller warns Jack to stay away from Audrey due to the deadly consequences associated with him (SG-8). Intricate relationships and the imminent threat of international conflict mark the overall content of this episode.

TaleSumm predictions on S07E22 of 24 (test set). This episode also has 8 SGs. The story: In a tense sequence, as shown in SG-1, Jack resorts to torture to extract information from Harbinson about the impending attack but is left empty-handed. In SG-2, following the murder of Jonas Hodges, Olivia Taylor faces scrutiny from the Justice Department. Meeting with Martin Collier, she denies transferring funds, revealing a sinister plot. Meanwhile, SG-3 shows Kim Bauer's plans are disrupted by a flight delay, leading to a strained father-daughter relationship. SG-4,5 display how Jack, aided by Chloe O'Brian and Renee Walker, captures Tony Almeida and interrogates him about a dangerous canister, followed by Renee uncovering Jibraan's location, and a high-stakes exchange ensues at the Washington Center station. Jack detonates the canister, succumbing to its effects. As a consequence (SG-6), Cara Bowden reports Tony's failure to Alan Wilson, adding tension to the unfolding crisis. Olivia returns to the White House, explaining her absence to Aaron Pierce (SG-7), which connects back to SG-2. The narrative takes a dire turn as Cara blackmails Jack for the safety of Kim (SG-8), introducing a new layer of suspense and complexity to the unfolding events. The presence of SG-1,6,7 (absent in GT) highlights our model's ability to complete the overall story arc.

Acknowledgement

We thank the Bank of Baroda for partial travel support, and IIIT-H's faculty seed grant and Adobe Research India for funding. Special thanks to Varun Gupta for assisting with the experiments, Hardik Mittal for his help with this project page, and the Katha-AI group members for user studies.

BibTeX

@inproceedings{singh2024previously,
  title={{"Previously on ..." From Recaps to Story Summarization}},
  author={Aditya Kumar Singh and Dhruv Srivastava and Makarand Tapaswi},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024},
}