About Me 关于我

I am a graduate student pursuing a Master of Science in Data Science at the University of Michigan, Ann Arbor (LSA). I received my Bachelor of Engineering in Data Science and Big Data Technology from the School of Computer Science and Technology at Tongji University, Shanghai, in July 2025.

In the spring semester of 2023, I studied as an exchange student in the Department of Electrical Engineering at National Taiwan University, Taipei.

我目前就读于密歇根大学安娜堡(LSA学院)数据科学理学硕士(Master of Science in Data Science)专业。本科毕业于同济大学计算机科学与技术学院数据科学与大数据技术专业(2025年7月)。

2023年春季学期,我赴台北国立台湾大学电机工程学系进行交换学习。

Research Interests 研究兴趣

Speculative Decoding  |  Stream Video Agent  |  Remote Sensing Cloud Image  |  Vector Font Generation  |  Character Inpainting

Education 教育经历

University of Michigan, Ann Arbor 密歇根大学安娜堡
Master of Science in Data Science — LSA College 数据科学理学硕士 — LSA学院
Ann Arbor, MI, USA 美国密歇根州安娜堡
Tongji University 同济大学
Bachelor of Engineering — Data Science and Big Data Technology 工学学士 — 数据科学与大数据技术
Shanghai, China 中国·上海
National Taiwan University 国立台湾大学
Exchange Program — Dept. of Electrical Engineering 交换项目 — 电机工程学系
Taipei, China 中国·台北

Publications 发表成果

TMLR Sub. Yifan Zhang, Yuren Wang, Yunta Hsieh, Xin Wang, Ping Zhang, Ziyi Yang, Jianing Ma, Zesen Zhao, Boyuan Zheng, Hei Ting Una Chan, Jiarui Li, Xueshen Liu, Kunxiao Gao, Yanheng Shang, Ruoyan Zhang, Ruiyao Liu, Jingxuan Zhang, Junchen Li, Zhongwei Wan, Ziheng Zhang et al. (2026). "Speculative Decoding for Multimodal Models: A Survey." Submitted to TMLR, Apr 2026. Preprints.org
ECCV'26 Sub. Hui Shen, Xin Wang, Ping Zhang, Yunta Hsieh, Boyuan Zheng, Qi Han, Zhongwei Wan, Ziheng Zhang, Jingxuan Zhang, Jing Xiong, Ziyuan Liu, Yifan Zhang, Hangrui Cao, Chenyang Zhao, Mi Zhang (2026). "MMSpec: Benchmarking Speculative Decoding for Vision-Language Models." ECCV 2026 Conference Submission, Feb 2026. arXiv:2603.14989
ECCV'26 Sub. Ruiyao Liu, Hui Shen, Ping Zhang, Yunta Hsieh, Yifan Zhang, Jing Xu, Sicheng Chen, Junchen Li, Jiawei Lu, Jianing Ma, Jiaqi Mo, Qi Han, Zhen Zhang, Zhongwei Wan, Jing Xiong, Xin Wang, Ziyuan Liu, Hangrui Cao, Ngai Wong (2026). "MathGen: Benchmarking Mathematical Generation for Text-to-Image Models." ECCV 2026 Conference Submission, Feb 2026. arXiv:2603.27959
Preprint Yifan Zhang, Qian Chen, Yi Liu, Wengen Li, Jihong Guan (2026). "SADER: Structure-Aware Diffusion Framework with DEterministic Resampling for Multi-Temporal Remote Sensing Cloud Removal." arXiv preprint, 2026. arXiv:2602.00536
CVPR'26 Sub. Jiawei Ma, Haolong Li, Yifan Zhang, Song Yang, Chen Ye (2025). "Autoregressive Extraction of Chinese Character Strokes with Diffusion Models." CVPR 2026 Conference Submission, Dec 2025.
AAAI'24 Haolong Li, Chenghao Du, Ziheng Jiang, Yifan Zhang, Jiawei Ma, Chen Ye (2024). "Towards Automated Chinese Ancient Character Restoration: A Diffusion-Based Method with a New Dataset." Proceedings of the AAAI Conference on Artificial Intelligence, 38(4): 3073–3081. doi:10.1609/aaai.v38i4.28090
ICBASE'24 Shatong Zhu, Yifan Zhang (2023). "Coordinated Optimization and Configuration Optimization of Wind, Photovoltaics and Energy Storage based on Particle Swarm Optimization Algorithm." ICBASE 2024, Dec 2023.
Patent [Filed, 3rd author] A Cold Diffusion Model-based Method for Restoration of Inscription Text 【已申请,第三作者】基于冷扩散模型的碑文文字修复方法
Patent [Filed, 2nd author] A Method to Synthesize an Ancient Chinese Inscription Dataset That Simulates Real Erosion 【已申请,第二作者】一种模拟真实侵蚀效果的古代汉字碑刻数据集合成方法

Research Experience 科研经历

Key Laboratory of Embedded System and Service Computing, Ministry of Education — Tongji University 教育部嵌入式系统与服务计算重点实验室 — 同济大学
Undergraduate Researcher  |  Supervisor: Prof. Chen Ye 本科研究员  |  指导教师:叶晨 教授
  • Introduced a new Chinese Ancient Rubbing and Manuscript Character Dataset (ARMCD), comprising 15,553 real-world ancient single-character images across 42 rubbings and manuscripts.
  • Generated synthetic training masks by overlaying local erosion-region masks extracted from real-world eroded images.
  • Proposed DiffACR (Diffusion Model for Automated Chinese Ancient Character Restoration), a cold diffusion-based, end-to-end method for the automated Chinese ancient character restoration (ACACR) task that treats the synthesis of eroded images as a special form of cold diffusion on pristine images.
  • Extracted prior masks directly from eroded images to guide the restoration process.
  • Published one conference paper (AAAI'24) and filed two derivative design patents (currently under review).
  • 构建了新的古代碑刻与手稿汉字数据集 ARMCD,收录来自42种碑刻与手稿的15,553张真实汉字图像。
  • 通过叠加真实侵蚀图像的局部侵蚀区域掩码,生成合成掩码用于训练。
  • 提出基于冷扩散模型的方法 DiffACR,以端到端方式实现自动化古汉字修复(ACACR任务)。
  • 从侵蚀图像中直接提取先验掩码,为修复区域提供更精确的引导信息。
  • 已发表AAAI'24论文一篇;完成2项外观设计专利(审查中)。

Project Experience 项目经历

Automatic Construction of Knowledge Graphs based on Wikipedia 基于维基百科的知识图谱自动构建
  • Conducted zero-shot named entity recognition and relation extraction using pre-trained language models (BERT-based CKIP, Ernie-based UIE, MetaNER), extracting 21,960 entities of 19 types from Chinese Wikipedia.
  • Performed zero-shot knowledge triplet extraction via OpenNRE and Ernie-UIE, extracting 3,999 inter-entity relationships and entity attributes.
  • Performed entity merging and disambiguation using context-based and SimBERT-based knowledge embeddings.
  • Built a knowledge graph with Neo4j to visualize entities, relationships, and attributes.
  • 使用预训练语言模型(BERT系CKIP、Ernie-UIE、MetaNER)进行零样本命名实体识别与关系抽取,从中文维基百科提取19类共21,960个实体。
  • 通过OpenNRE和Ernie-UIE进行知识三元组抽取,提取3,999条实体间关系及属性。
  • 利用上下文嵌入和SimBERT知识嵌入完成实体消歧与合并。
  • 使用Neo4j构建并可视化知识图谱,展示实体、关系及属性信息。
Research on Financial Big Data based on Machine Learning 基于机器学习的金融大数据研究
  • Collected CSI 300 index stock data; performed data preprocessing and feature engineering.
  • Applied deep learning models (RNN, LSTM, CNN, Transformer) for time-series analysis and stock price forecasting.
  • Trained and evaluated models, plotted learning curves, and compared model performance.
  • 采集沪深300指数股票数据,进行数据预处理和特征工程。
  • 使用RNN、LSTM、CNN、Transformer等深度学习模型进行时序分析和股价预测。
  • 完成模型训练与测试,绘制学习曲线,对比各模型性能。
Large Language Models in Software Engineering Development Lifecycle 大语言模型在软件工程生命周期中的应用
  • Conducted a comprehensive survey on LLM applications across the software engineering lifecycle (planning, design, development, testing, maintenance).
  • Explored LLM support for requirements extraction, code generation, test case creation, code review, and vulnerability repair.
  • Analyzed potential benefits and challenges of LLM-assisted software engineering.
  • 对LLM在软件工程全生命周期(规划、设计、开发、测试、维护)中的应用进行系统调研。
  • 探讨LLM在需求提取、代码生成、测试用例创建、代码审查及漏洞修复等方面的支持能力。
  • 分析LLM辅助软件工程的潜在优势与挑战。
Intelligent Vessel Monitoring System 智能船舶监控系统
  • Developed a comprehensive real-time vessel information and trajectory management platform with monitoring and status analysis features.
  • Collected marine vessel data; preprocessed features and stored data using Django's database backend.
  • Designed CRUD functionality and implemented chart visualization with front-end/back-end interaction, enabling real-time query.
  • 开发实时船舶信息与轨迹管理平台,实现船舶运行状态监控与分析。
  • 采集海洋船舶数据进行预处理,使用Django数据库进行存储管理。
  • 设计增删改查功能,通过前后端交互实现图表可视化与实时查询。
AI Algorithm Simulation for Texas Hold'em 德州扑克AI算法仿真
  • Implemented a reinforcement learning algorithm combining D3QN (Dueling Double DQN) with Monte Carlo methods to simulate Texas Hold'em and calculate win/loss ratios.
  • Enabled AI-based strategy selection during Texas Hold'em gameplay.
  • 使用D3QN与蒙特卡洛方法实现强化学习算法,模拟德州扑克并计算胜负率。
  • 实现基于AI的策略选择,提升德州扑克游戏决策能力。
Research on Dynamic Social Networks in Financial Fraud Detection & Prevention 动态社会网络在金融欺诈检测中的研究
  • Processed docking point features and enriched graph nodes/edges through feature engineering.
  • Applied multiple GNN variants (GCN, GAT, GAT-V2, SAGE, EMBSAGE) for financial fraud detection.
  • Compared accuracy and efficiency across models.
  • 处理对接点特征,通过特征工程丰富图节点和边的属性信息。
  • 使用多种图神经网络变体(GCN、GAT、GAT-V2、SAGE、EMBSAGE)进行金融欺诈检测。
  • 比较分析各模型的准确率与效率。
Prediction of Criminal Sentence Reduction Length 罪犯减刑时长预测
Phase 1 — CCF Big Data & Computing Intelligence Contest  |  Tongji University 第一阶段 — CCF大数据与计算智能大赛  |  同济大学
  • Applied ensemble learning with text processing, fine-tuned a pre-trained ERNIE model, and extended the training set via text-translation data augmentation.
  • Ranked 7th out of 381 teams in the National Division of the 10th CCF Big Data & Computing Intelligence Contest.
  • 采用集成学习结合文本处理,微调预训练ERNIE模型;通过文本翻译进行数据增强扩充训练集。
  • 获第十届CCF大数据与计算智能大赛全国赛道第7名(共381支队伍)。
Phase 2 — University of Michigan, Ann Arbor  |  Supervisor: Prof. David Jurgens 第二阶段 — 密歇根大学安娜堡  |  指导教师:David Jurgens 教授
  • Extended the task to U.S. federal sentencing downward departure prediction using the Monitoring of Federal Criminal Sentences dataset; designed a full preprocessing pipeline including departure filtering, feature consolidation, and ordinal target transformation.
  • Compared six models spanning classical ML (ElasticNet, XGBoost, CatBoost) and deep learning (LSTM, BERT, ERNIE); BERT achieved the best overall performance by leveraging contextualized representations of fact text synthesized from the structured features.
  • 将任务扩展至美国联邦量刑下行偏离预测,使用联邦刑事量刑监测数据集,设计完整预处理流程,包括案例筛选、特征整合与目标变量序数化转换。
  • 对比6种模型(ElasticNet、XGBoost、CatBoost;LSTM、BERT、ERNIE),通过将结构化特征转化为事实文本,BERT凭借上下文感知表示取得最佳整体效果。

Internship 实习经历

Baidu — Software Engineer 百度 — 开发工程师
Beijing, China 中国·北京
  • Wrote functional test cases and conducted functional testing of Baidu Cloud Compute (BCC) in 3212 Haiguang and Haiyang intelligent computing environments.
  • Carried out performance testing (memory, CPU, disk, system) and compared performance across different versions of Haiguang hardware.
  • Studied the system architecture, API, and back-end development principles of cloud computing servers.
  • 编写功能测试用例,在3212海光与海洋智能计算环境中对百度云计算(BCC)进行功能测试。
  • 开展内存、CPU、磁盘、系统等维度的性能测试,比较不同版本海光硬件的性能差异。
  • 学习云计算服务器的系统架构、API接口及后端开发原理。

Honors & Awards 荣誉与奖项

  • Second Prize, The 16th "Chinese Society for Electrical Engineering Cup" National University Students Electrical Math Modeling Competition  |  05/2024
  • Second Prize (Non-mathematical Category), The 14th National Student Mathematics Competition  |  01/2023
  • Rank 7/381, The 10th CCF Big Data & Computing Intelligence Contest — National Division, "Prediction of Criminal Sentence Reduction Length"  |  12/2022
  • Second Prize (Non-mathematical Category), The 14th Shanghai Mathematics Competition for College Students (Higher Education Press Cup)  |  12/2022
  • Third Prize, China Undergraduate Mathematical Contest in Modeling — Shanghai  |  12/2021
  • Third Prize (Non-Physics A), The 38th National Physics Competition for College Students — Shanghai  |  12/2021
  • Second Prize in Non-Physics Category, Tongji University Physics Competition  |  10/2021
  • Third Prize of Undergraduate Group, The 11th MathorCup College Mathematical Modeling Challenge  |  06/2021
  • Third Prize, Tongji University Mathematical Contest in Modeling  |  06/2021
  • 二等奖,第十六届"中国电机工程学会杯"全国高校电工数学建模竞赛  |  2024.05
  • 二等奖(非数学专业组),第十四届全国大学生数学竞赛  |  2023.01
  • 全国第7名(381支队伍),第十届CCF大数据与计算智能大赛 — "罪犯减刑时长预测"赛道  |  2022.12
  • 二等奖(非数学专业组),第十四届上海市高校大学生数学竞赛(高教社杯)  |  2022.12
  • 三等奖,中国本科数学建模竞赛上海赛区  |  2021.12
  • 三等奖(非物理专业A类),第三十八届全国大学生物理竞赛上海赛区  |  2021.12
  • 二等奖(非物理专业组),同济大学物理竞赛  |  2021.10
  • 本科生组三等奖,第十一届MathorCup高校数学建模挑战赛  |  2021.06
  • 三等奖,同济大学数学建模竞赛  |  2021.06

Skills & Hobbies 技能与爱好

Computer Skills 计算机技能
C++, C, Python, MySQL, C#, SQL Server, JavaScript, HTML, CSS
Languages 语言能力
Chinese (native)  |  English (proficient, IELTS 7.5) 中文(母语) | 英语(熟练,IELTS 7.5)
Hobbies 爱好
Calligraphy (Level 10)  |  Accordion (Level 7) 书法(十级) | 手风琴(七级)