Junlin Yang’s Personal Website

Hi, I’m Junlin Yang, a third-year student in the Department of Computer Science and Technology at Tsinghua University. Currently, I am fortunate to be a research intern at the XLANG Lab at The University of Hong Kong, under the guidance of Prof. Tao Yu. In the past, I’ve had the privilege of interning at the Tsinghua Pervasive HCI Group, where I was advised by Prof. Chun Yu and Prof. Yuanchun Shi.

I am actively seeking PhD opportunities for 2026 Fall. Feel free to reach out if you’re interested in my research, looking for collaboration, or just want to chat!

Research Interest

I am particularly interested in Machine Learning and Human-Computer Interaction, with a focus on NLP (especially Language Grounding), Multimodal Learning, Neuro-Symbolic Concepts and Reasoning, and Reinforcement Learning. My recent research focuses on building embodied agents, especially computer agents, that excel at solving human tasks and collaborating effectively with people. I remain curious and open to exploring a wide range of research questions in ML and HCI. My ultimate aim is to guide technological development through insights derived from human and social behavior, so that technology can better serve humanity and contribute meaningfully to society.


Research Experience

AgentNet: Multimodal Computer Agent Data Scaling

2024.7 - present Co-lead

Advised by Prof. Tao Yu, The University of Hong Kong

  • Motivation: Current GUI agent datasets lack long-horizon tasks, real-world usage scenarios, and coverage across diverse applications. We therefore introduce AgentNet, a diverse, challenging, real-world dataset of computer usage scenarios aggregated from well-defined application contexts.

  • Method: We developed an efficient data collection system. After crowdsourcing and verification, we analyzed the collected data for insights and performed extensive experiments to enhance the capabilities of our computer-use agents and investigate the characteristics of human-generated GUI data.

  • Results: We acquired tens of thousands of long trajectories. Leveraging this data, we trained VLMs and explored the scaling law of VLMs as computer-use agents.

Decompositional Learning for GUI Visual Grounding based on GUI Elements

2024.11 - present Co-lead

Advised by Prof. Caiming Xiong and Prof. Tao Yu, The University of Hong Kong

  • Motivation: Current work on GUI visual grounding mainly focuses on improving an agent’s ability to localize target positions from instructions and screenshots. However, drawing on neuro-symbolic thinking, we argue that cognition, i.e., identifying what an icon or component is, how to interact with it, and whether it matches the description, is also crucial but often overlooked.

  • Method: We collect GUI elements (icons, components, etc.) and build grounding data based on them. Then, we apply algorithms to enhance the VLM’s localization and cognition abilities, thereby improving its grounding capabilities in a more comprehensive manner.

  • Results (Expected): We aim to validate that a bottom-up, component-based learning approach, combined with traditional localization-based learning methods, can fundamentally improve the grounding ability of GUI agents.

VideoAgentTrek: Extracting Agent Trajectories from Hindsight Videos

2024.7 - present

Advised by Prof. Tao Yu, The University of Hong Kong

  • Motivation: Numerous online computer-use videos remain underutilized for GUI agent learning. Inspired by OpenAI’s Video PreTraining (VPT), we set out to enable GUI agents to learn from online videos.

  • Method: We trained an Inverse Dynamics Model (IDM) to transform unlabeled online tutorial videos into pretraining trajectory data for autonomous agents.

  • Results (Expected): Using these transformed trajectories, we pretrained a vision-language model (VLM) on large-scale data derived from online videos to enhance its GUI agent abilities.

EchoMind: Enhancing Group Discussions through Human-AI Collaborative Issue Mapping

2023.10 - 2024.9

Advised by Prof. Chun Yu and Prof. Yuanchun Shi, Tsinghua University

  • Motivation: In group discussions, diverse perspectives are combined while a facilitator guides the conversation. However, the facilitator may struggle to keep track of the conversation and structure its content, leading to unproductive outcomes.

  • Method: We built a collaborative system that visualizes discussion knowledge through real-time issue mapping, leveraging Large Language Models (LLMs).

  • Results: User studies indicate that EchoMind helps clarify objectives and enhances productivity.

Project Experience

MartialArtsLM: Pretraining and Fine-tuning a Model Capable of Answering Questions about Martial Arts Novels

2023.8 - 2023.9

  • Pretrained the LM (language model) on preprocessed text from novels by Louis Cha.
  • Fine-tuned the LM with varying data volumes and iteration counts on roughly 400,000 Q&A pairs across 12 categories, synthesized by an LLM (large language model).
  • Built an interactive dialogue system for Q&A with the LM using Gradio.

What’s the Buzz: Analysis of the Trend of Technology News Popularity on Chinese Websites in 2023

2023.8 - 2023.9

  • Crawled 8,495 news articles from sina.com spanning 2023.1 to 2023.8, and built an information retrieval system using methods including an inverted index and TF-IDF weighting.
  • Discovered and categorized trending tech events using TfidfVectorizer, the k-means algorithm, and t-SNE.
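The inverted-index-plus-TF-IDF retrieval described above can be sketched in a few lines of pure Python. This is a minimal illustration, not the project’s code: the toy documents are hypothetical stand-ins for the crawled articles, and the real project used scikit-learn’s TfidfVectorizer for the clustering step.

```python
import math
from collections import Counter, defaultdict

# Toy corpus standing in for the crawled news articles (hypothetical titles).
docs = {
    0: "new ai chip launched by tech firm",
    1: "ai model beats benchmark on language tasks",
    2: "tech firm reports record quarterly earnings",
}

# Inverted index: term -> set of document ids containing that term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def tfidf(term, doc_id):
    """TF-IDF weight of a term in one document (raw tf, log idf)."""
    tf = Counter(docs[doc_id].split())[term]
    df = len(index[term])
    return tf * math.log(len(docs) / df) if df else 0.0

def search(query):
    """Score every document that shares at least one term with the query."""
    scores = Counter()
    for term in query.split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += tfidf(term, doc_id)
    return scores.most_common()

print(search("ai tech"))  # documents matching more query terms rank first
```

The inverted index keeps query time proportional to the number of matching documents rather than the corpus size, which is what makes this approach scale to thousands of articles.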

Courses

GPA: 3.91/4.00

Selected courses with a 4.0 grade

CS

Introduction to Artificial Intelligence, Fundamentals of Computer Science, Foundation of Programming, Programming Training, Software Engineering, Computer Network, Formal Languages and Automata, Foundation of Object-Oriented Programming, Discrete Mathematics (2), etc.

Math

Probability Theory and Mathematical Statistics, Calculus A, Linear Algebra, Advanced Topics in Linear Algebra, Introduction to Complex Analysis, University Physics, etc.

Awards and Honors

  • Overall Excellence Scholarship, Tsinghua University, 2024
  • Overall Excellence Scholarship, Tsinghua University, 2023
  • Freshman Scholarship, Tsinghua University, 2022
  • Outstanding Student Cadre, Tsinghua University, 2023

Languages

  • Mandarin (Native), English (Fluent)
  • TOEFL: 110 (R:29, L:29, S:25, W:27)

Skills

  • Curiosity and passion for research; strong sense of teamwork and collaboration skills
  • Experience in pretraining and fine-tuning LMs and VLMs, and in designing agent frameworks
  • Experience with OpenRLHF, PyTorch, and scikit-learn
  • Programming languages: Python > C/C++ > SystemVerilog

Service and Leadership

  • Class Monitor, Tsinghua University, 2022.9 - 2023.9
    • Honored with Outstanding Class Award
    • Organized several activities, including lab visits, corporate visits, and interviews with PhD students in HCI, AI, and related fields, to help classmates explore research areas and plan their futures.
  • Founder of an Alumni Mutual-Help Platform, Shenzhen Experimental School, 2023.1 - 2024.1
    • The WeChat Official Account gained over 50k reads and over 2.5k followers
    • Disseminated comprehensive knowledge about university academic life to high school students from diverse economic, family, and cognitive backgrounds.

Miscellaneous

Senior Mentors

At Tsinghua, I was fortunate to meet some incredibly kind, talented, and supportive seniors, including Yuxuan Li and Zirui Cheng, who gave me my first real insight into the world of research. During my internship at HKU, I was lucky to work closely with Tianbao Xie and Yiheng Xu, who patiently guided me step by step through agent-based and multimodal research and helped me gain a deeper understanding of academic life. I’m also grateful to have collaborated with Xinyuan Wang and Bowen Wang on projects like AgentNet. I learned much from them, and together we created meaningful work.

Hobbies

  • Athletics: Tennis, Badminton, Jogging
  • Arts: Music, Film, Reading