Title, authors, venue | Cited by | Year |
End-to-end dense video captioning with parallel decoding T Wang, R Zhang, Z Lu, F Zheng, R Cheng, P Luo ICCV 2021, 6847-6857, 2021 | 150 | 2021 |
Event-centric hierarchical representation for dense video captioning T Wang, H Zheng, M Yu, Q Tian, H Hu IEEE Transactions on Circuits and Systems for Video Technology 31 (5), 1890-1900, 2020 | 67 | 2020 |
Caption anything: Interactive image description with diverse multimodal controls T Wang*, J Zhang*, J Fei*, Y Ge, H Zheng, Y Tang, Z Li, M Gao, S Zhao, ... arXiv preprint arXiv:2305.02677, 2023 | 40 | 2023 |
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix T Wang, W Jiang, Z Lu, F Zheng, R Cheng, C Yin, P Luo ICML 2022, 2022 | 26 | 2022 |
Set-level guidance attack: Boosting adversarial transferability of vision-language pre-training models D Lu, Z Wang, T Wang, W Guan, H Gao, F Zheng Proceedings of the IEEE/CVF International Conference on Computer Vision, 102-111, 2023 | 12 | 2023 |
Knowledge-aware prompt tuning for generalizable vision-language models B Kan, T Wang, W Lu, X Zhen, W Guan, F Zheng Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 11 | 2023 |
π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation C Wu, T Wang, Y Ge, Z Lu, R Zhou, Y Shan, P Luo International Conference on Machine Learning, 37713-37727, 2023 | 10 | 2023 |
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline T Geng, T Wang, J Duan, R Cong, F Zheng IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023 | 10 | 2023 |
Dense-captioning events in videos: SYSU submission to ActivityNet Challenge 2020 T Wang, H Zheng, M Yu CVPR Workshops, 2020 | 10 | 2020 |
Image caption with endogenous–exogenous attention T Wang, H Hu, C He Neural Processing Letters 50, 431-443, 2019 | 10 | 2019 |
Video understanding with large language models: A survey Y Tang, J Bi, S Xu, L Song, S Liang, T Wang, D Zhang, J An, J Lin, R Zhu, ... arXiv preprint arXiv:2312.17432, 2023 | 8 | 2023 |
Accelerating Vision-Language Pretraining with Free Language Modeling T Wang, Y Ge, F Zheng, R Cheng, Y Shan, X Qie, P Luo IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023 | 6 | 2023 |
Transferable decoding with visual entities for zero-shot image captioning J Fei, T Wang, J Zhang, Z He, C Wang, F Zheng Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 6 | 2023 |
LLMVA-GEBC: Large language model with video adapter for generic event boundary captioning Y Tang, J Zhang, X Wang, T Wang, F Zheng arXiv preprint arXiv:2306.10354, 2023 | 4 | 2023 |
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos T Wang*, J Zhang*, F Zheng, W Jiang, R Cheng, P Luo arXiv preprint arXiv:2303.06378, 2023 | 4 | 2023 |
Semantic-aware pretraining for dense video captioning T Wang, Z Liu, F Zheng, Z Lu, R Cheng, P Luo arXiv preprint arXiv:2204.07449, 2022 | 4 | 2022 |
Multi-modal segment assemblage network for ad video editing with importance-coherence reward Y Tang, S Xu, T Wang, Q Lin, Q Lu, F Zheng Proceedings of the Asian Conference on Computer Vision, 3519-3535, 2022 | 4 | 2022 |
PTVD: A Large-Scale Plot-Oriented Multimodal Dataset Based on Television Dramas C Li, X Peng, T Wang, Y Ge, M Liu, X Xu, Y Wang, Y Shan arXiv preprint arXiv:2306.14644, 2023 | 1 | 2023 |
Show, Tell and Rephrase: Diverse Video Captioning via Two-Stage Progressive Training Z Liu, T Wang, J Zhang, F Zheng, W Jiang, K Lu IEEE Transactions on Multimedia, 2022 | 1 | 2022 |
Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer X Li, T Wang, J Zhao, S Mao, J Wang, F Zheng, X Peng, X Li arXiv preprint arXiv:2404.17205, 2024 | | 2024 |