Hi, I'm Yue Jiang.

Programmer, System Builder, Cat Lover

A passionate computer system developer. I’m excited to build high-performance large-scale systems.

About Me

I am a Master’s student at Carnegie Mellon University. I have a solid background in systems programming and software development, with internships across several industries, including a tech company, a proprietary trading firm, and an AI startup. Here are a few technologies I've been working with recently:
  • Storage Systems
  • Database Systems
  • Distributed Systems
  • Computer Networks
  • Software Design
  • Cloud Computing
  • Machine Learning

Experience

Software Development Intern - Splunk Inc
June 2023 - August 2023

I worked on Ingest Actions: a key feature for data masking, filtering, and routing.

  • Developed routing of less frequently used data in Parquet format from Splunk to S3 to help customers reduce storage costs. Parquet support is also required by over 50% of customers using Splunk Federated Search-S3.
  • Added support for 2 new licenses to Ingest Actions, leading to an estimated 40% increase in its exposure to customers.
  • Participated in the QA testing process for Ingest Actions, identifying and fixing 2 critical bugs.
C++ Development Intern - Higgs Asset
February 2022 - August 2022

I worked on OpsAdapter, a market data and trading interface system that supports trading and the quantitative research team.

  • Built low-latency interfaces that fetch market data from various exchanges and parse it into unified data formats.
  • Developed trading interfaces between exchanges and Higgs, supporting different kinds of queries, order placement, order cancellation, and responses to asynchronous callbacks from exchanges under various situations.
  • Used multi-threading, concurrency control, and modern C++ features to speed up the interfaces, reducing the network packet loss rate from 10% to 3% and enabling traders to capture and respond to trading signals immediately.
Software Development Intern - Megvii
July 2020 - September 2020

Hubble, a Data Management System:

  • Implemented the Hubble backend for data storage management using the Django framework and development tools such as Celery, Supervisor, Logging, Sentry, and Anaconda.
  • Set up ELK with Docker for system log management, enabling automated analysis of system performance.
Software Development Intern - Megvii
July 2019 - September 2019

Sisyphus, a Data Transfer System:

  • Implemented core functions of Sisyphus and developed data transfer scripts compatible with various scenarios.
  • Designed and developed the frontend of Sisyphus and connected it to the backend with APIs.

Education

2022 - Present
Master of Information Technology Strategy
Carnegie Mellon University
GPA: 3.92 out of 4.00

Coursework

  • 15-746 Storage Systems
  • 15-513 Introduction to Computer Systems
  • 10-601 Introduction to Machine Learning
  • 14-741 Introduction to Information Security
  • 15-640 Distributed Systems
  • 15-719 Advanced Cloud Computing
  • 17-514 Principles of Software Construction
  • 15-641 Computer Networks
2017 - 2021
Bachelor of Science in Computer Science
Nankai University
GPA: 90.8 out of 100

Honors

  • National Scholarship 2018 & 2019

Coursework

  • Principles of Compilers
  • Computer Networks
  • Operating Systems
  • Introduction to Algorithms
  • Data Structures
  • Big Data Analytics and Applications
  • Principles of Information Retrieval Systems

Projects

C++/C, AWS, File System, Deduplication, Snapshot, Cache
CloudFS
A hybrid file system spanning SSD and cloud storage.
  • Implemented a hybrid file system with a local SSD and a cloud storage service similar to Amazon S3, maintaining small objects as well as metadata on the SSD and large objects on the cloud storage.

  • Applied block-level deduplication based on Rabin Fingerprinting to reduce cloud storage cost by around 50%.

  • Added support for generating consistent CloudFS snapshots (backups), which can be used to restore the file system to a previous state or to alter its current state.

  • Leveraged the spare capacity on the SSD as a cache for cloud-backed data using a Least Recently Used (LRU) policy. This improved performance and further reduced cloud service costs from an average of $37 to $18.59 across 10 workloads.
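
The LRU policy named in the last bullet can be illustrated with a minimal in-memory sketch. This is not the CloudFS code: the class name LruCache, the string keys and values, and the entry-count capacity are all assumptions made for illustration (a real SSD cache would more likely track bytes and file paths).

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache for illustration only; not the CloudFS implementation.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Copies the cached value into *out and marks the key as most recently used.
    // Returns false on a cache miss.
    bool get(const std::string& key, std::string* out) {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        order_.splice(order_.begin(), order_, it->second);
        *out = it->second->second;
        return true;
    }

    // Inserts or updates a key, evicting the least recently used entry when full.
    void put(const std::string& key, const std::string& value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = value;
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (order_.size() == capacity_) {
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, value);
        index_[key] = order_.begin();
    }

    std::size_t size() const { return order_.size(); }

private:
    typedef std::pair<std::string, std::string> Entry;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used, back = eviction victim
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Pairing a linked list with a hash map keeps both lookup and eviction at constant time on average, which is the usual reason this layout is chosen for LRU caches.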

C++/C, Database, Hash Table, Concurrency Control, Query Execution
BusTub
A Relational Database Management System.
  • Implemented a buffer pool manager that moves physical pages between main memory and disk, supporting databases larger than the memory available to the system.

  • Implemented a concurrent disk-backed hash table using an extendible hashing scheme for fast data retrieval.

C++/C, SSD, Garbage Collection, Firmware
MyFTL
A Flash Translation Layer of an emulated Solid State Drive.
  • Designed a hybrid log-block mapping scheme to translate read/write requests on logical blocks into low-level SSD operations, as well as garbage collection policies such as LRU and Cost-Benefit, as proposed for log-structured file systems.

  • Proposed a dynamic mapping scheme inspired by log-structured file systems and a Cost-Endurance-Benefit garbage collection policy, improving wear-leveling performance by 10x and reducing write amplification by 30%.

Java, RPC, Distributed Protocol, Distributed File System
Distributed Systems
Building blocks for a distributed system.
  • Built a TCP-based RPC system that supports concurrent remote file operations.

  • Designed a check-on-use caching protocol for a distributed file system using session semantics.

  • Developed distributed transactions with two-phase commit, utilizing logging to persistent storage for failure recovery.
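
As a rough illustration of the two-phase commit flow in the last bullet, here is a minimal single-process sketch. It is written in C++ rather than the project's Java, and everything in it is an assumption: the names Participant, Decision, and run_two_phase_commit are hypothetical, and the log is an in-memory vector standing in for persistent storage. Networking, timeouts, and crash recovery are omitted.

```cpp
#include <string>
#include <vector>

// Hypothetical types for illustration; not the project's actual interfaces.
enum class Decision { Commit, Abort };

struct Participant {
    bool will_vote_yes;  // simulated vote this participant returns in phase 1
    bool committed;
    bool aborted;

    explicit Participant(bool vote)
        : will_vote_yes(vote), committed(false), aborted(false) {}

    bool prepare() { return will_vote_yes; }
    void commit() { committed = true; }
    void abort() { aborted = true; }
};

// Phase 1 collects votes; the decision is logged; phase 2 broadcasts it.
Decision run_two_phase_commit(std::vector<Participant>& participants,
                              std::vector<std::string>& log) {
    log.push_back("BEGIN");

    // Phase 1: a single "no" vote is enough to abort the transaction.
    bool all_yes = true;
    for (auto& p : participants) {
        if (!p.prepare()) {
            all_yes = false;
            break;
        }
    }

    // Record the decision before notifying anyone, so that a coordinator
    // restarting after a crash could replay it from the log.
    log.push_back(all_yes ? "COMMIT" : "ABORT");

    // Phase 2: deliver the same decision to every participant.
    for (auto& p : participants) {
        if (all_yes) {
            p.commit();
        } else {
            p.abort();
        }
    }

    log.push_back("END");
    return all_yes ? Decision::Commit : Decision::Abort;
}
```

Writing the decision to the log before phase 2 is the step that makes recovery possible: after a failure, the logged outcome tells the coordinator which message to resend.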

Get in Touch

Feel free to email me!