Junlin Yang’s Personal Website
Hi, I’m Junlin Yang, a third-year undergraduate student in the Department of Computer Science and Technology at Tsinghua University. Currently, I am fortunate to be a research intern at the XLANG Lab at The University of Hong Kong, under the guidance of Prof. Tao Yu. In the past, I’ve had the privilege of interning at the Tsinghua Pervasive HCI Group, where I was advised by Prof. Chun Yu and Prof. Yuanchun Shi.
I am actively seeking PhD opportunities for Fall 2026. Feel free to reach out if you’re interested in my research, looking for collaboration, or just want to chat!
Research Interest
I am particularly interested in Machine Learning and Human-Computer Interaction, with a focus on NLP (especially Language Grounding), Multimodal Learning, Neuro-Symbolic Concepts, and Reinforcement Learning. My recent research focuses on building embodied agents, especially computer agents, that excel at solving human tasks and collaborating effectively with people. I remain curious and open to exploring a wide range of research questions in ML and HCI. My ultimate aim is to guide technological development with insights derived from human and social behavior, so that technology better serves humanity and contributes meaningfully to society.
Research Experience
AgentNet: Multimodal Computer Agent Data Scaling
2024.7 - present Co-lead
Advised by Prof. Tao Yu, The University of Hong Kong
Motivation: Current GUI agent datasets lack long-horizon tasks, real-world usage scenarios, and coverage across diverse applications. We therefore introduce AgentNet, a diverse, challenging, real-world dataset of computer usage scenarios aggregated from well-defined application contexts.
Method: We developed an efficient data collection system backed by carefully designed algorithms. After crowdsourcing and verification, we analyzed the data to gain insights and ran extensive experiments to enhance our computer-use agents and investigate the characteristics of human-generated GUI data.
Results: We acquired tens of thousands of long trajectories. Leveraging this data, we trained VLMs and explored the scaling laws of VLMs as computer-use agents.
Decompositional Learning for GUI Visual Grounding based on GUI Elements
2024.11 - present Co-lead
Advised by Prof. Caiming Xiong and Prof. Tao Yu, The University of Hong Kong
Motivation: Current work on GUI visual grounding mainly focuses on improving an agent’s ability to localize positions from instructions and screenshots. However, drawing on neuro-symbolic thinking, we argue that cognition (identifying what an icon or component is, how to interact with it, and whether it matches the description) is equally crucial but often overlooked.
Method: We collect GUI elements (icons, components, etc.) and build grounding data based on them. Then, we apply algorithms to enhance the VLM’s localization and cognition abilities, thereby improving its grounding capabilities in a more comprehensive manner.
Results (Expected): We aim to validate that a bottom-up, component-based learning approach, combined with traditional localization-based learning methods, can fundamentally improve the grounding ability of GUI agents.
VideoAgentTrek: Extracting Agent Trajectories from Hindsight Videos
2024.7 - present
Advised by Prof. Tao Yu, The University of Hong Kong
Motivation: Numerous online computer-use videos go underutilized for GUI agent learning. Inspired by OpenAI’s Video PreTraining (VPT), we set out to enable GUI agents to learn from online videos.
Method: We trained an Inverse Dynamics Model (IDM) to transform unlabeled online tutorial videos into pretraining trajectory data for autonomous agents.
Results (Expected): We pretrain a vision-language model (VLM) on the large-scale trajectory data extracted from online videos to enhance its GUI agent abilities.
EchoMind: Enhancing Group Discussions through Human-AI Collaborative Issue Mapping
2023.10 - 2024.9
Advised by Prof. Chun Yu and Prof. Yuanchun Shi, Tsinghua University
Motivation: In group discussions, diverse perspectives are combined while a facilitator guides the conversation. However, the facilitator may struggle to keep track of the conversation and structure its content, leading to unproductive outcomes.
Method: We built a collaborative system that visualizes discussion knowledge through real-time issue mapping, leveraging Large Language Models (LLMs).
Results: User studies indicate that EchoMind helps clarify objectives and enhances productivity.
Project Experience
MartialArtsLM: Pretraining and Fine-tuning a Model Capable of Answering Questions about Martial Arts Novels
2023.8 - 2023.9
- Pretrained the language model (LM) using preprocessed data from novels by Louis Cha.
- Fine-tuned the LM with varying data volumes and iteration counts on an estimated 400,000 Q&A pairs in 12 categories synthesized by an LLM (Large Language Model).
- Built an interactive dialog system for Q&A with the LM using Gradio.
What’s the Buzz: Analysis of the Trend of Technology News Popularity on Chinese Websites in 2023
2023.8 - 2023.9
- Crawled 8,495 news articles from sina.com from 2023.1 to 2023.8, and built an information retrieval system using methods including an inverted index and TF-IDF weighting.
- Discovered and categorized trending tech events using TfidfVectorizer, k-means clustering, and t-SNE visualization.
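The retrieval side of this pipeline can be sketched as a minimal inverted index with TF-IDF scoring. This is an illustrative toy, not the project's code: the three-document corpus and all function names here are placeholders standing in for the 8,495 crawled articles.

```python
# Minimal inverted index + TF-IDF retrieval sketch (hypothetical corpus).
import math
from collections import defaultdict

docs = {
    0: "large language model agents for computer use",
    1: "new smartphone release dominates tech news",
    2: "language model pretraining on web data",
}

# Inverted index: term -> set of document ids containing that term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def tfidf(term, doc_id):
    """TF-IDF weight of a term within one document."""
    tokens = docs[doc_id].split()
    tf = tokens.count(term) / len(tokens)          # term frequency
    idf = math.log(len(docs) / len(index[term]))   # inverse document frequency
    return tf * idf

def search(query):
    """Return ids of documents containing any query term, best match first."""
    scores = defaultdict(float)
    for term in query.split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += tfidf(term, doc_id)
    return sorted(scores, key=scores.get, reverse=True)
```

A real system would add tokenization, stopword removal, and persistence; scikit-learn's `TfidfVectorizer` handles the weighting at scale for the clustering step.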
Courses
GPA: 3.91/4.00
Selected courses with a grade of 4.0
CS
Introduction to Artificial Intelligence, Fundamentals of Computer Science, Foundation of Programming, Programming Training, Software Engineering, Computer Network, Formal Languages and Automata, Foundation of Object-Oriented Programming, Discrete Mathematics (2), etc.
Math
Probability Theory and Mathematical Statistics, Calculus A, Linear Algebra, Advanced Topics in Linear Algebra, Introduction to Complex Analysis, University Physics, etc.
Awards and Honors
- Overall Excellence Scholarship, Tsinghua University, 2024
- Overall Excellence Scholarship, Tsinghua University, 2023
- Freshman Scholarship, Tsinghua University, 2022
- Outstanding Student Cadre, Tsinghua University, 2023
Languages
- Mandarin (Native), English (Fluent)
- TOEFL: 110 (R:29, L:29, S:25, W:27)
Skills
- Curiosity and passion for research; strong sense of teamwork and collaboration skills
- Experience in pretraining and fine-tuning LMs and VLMs, and in designing agent frameworks
- Experience with OpenRLHF, PyTorch, and scikit-learn
- Programming languages: Python > C/C++ > SystemVerilog
Service and Leadership
- Class Monitor, Tsinghua University, 2022.9 - 2023.9
- Honored with the Outstanding Class Award
- Organized several activities, including lab visits, corporate visits, and interviews with PhD students in HCI, AI, and related fields, to help classmates explore research areas and plan for the future.
- Founder of an Alumni Mutual-Help Platform, Shenzhen Experimental School, 2023.1 - 2024.1
- The WeChat Official Account gained over 50k reads and over 2.5k followers
- Disseminated comprehensive knowledge about university academic life to high school students from diverse economic, family, and educational backgrounds.
Miscellaneous
Senior Mentors
At Tsinghua, I was fortunate to meet some incredibly kind, talented, and supportive seniors, including Yuxuan Li and Zirui Cheng, who gave me my first real insight into the world of research. During my internship at HKU, I was lucky to work closely with Tianbao Xie and Yiheng Xu, who patiently guided me step by step through agent-based and multimodal research and helped me gain a deeper understanding of academic life. I’m also grateful to have collaborated with Xinyuan Wang and Bowen Wang on projects like AgentNet. I learned much from them, and together we created meaningful work.
Hobbies
- Athletics: Tennis, Badminton, Jogging
- Arts: Music, film, reading