Integrate the SW stack with DNN frameworks (e.g., PyTorch) to support eager mode, the model export path, a custom kernel interface, and the DNN compiler.
Collaborate with other teams, including the compiler and algorithm teams, to achieve native integration between the SW stack and DNN frameworks.
Design and implement pre-processing modules for DNN model graphs, including sub-graph pattern matching/replacement and graph partitioning, to enable model parallelism and optimize end-to-end inference performance.
Participate in designing and implementing the user-facing interface of the LLM stack.
Write and maintain API references and development documentation.
Minimum Qualifications
Bachelor’s degree in Computer Science or equivalent work experience.
Experience with ML/DNN frameworks, such as TensorFlow, PyTorch, or JAX.
Strong programming skills with 3+ years of experience in Python.
Excellent communication skills for gathering and clarifying requirements.
Ability to identify inefficiencies in programs and processes, with a proactive approach to proposing sustainable solutions.
Preferred Qualifications
Deep understanding of transformer-based DNN model architecture.
2+ years of experience working in AI research.
Familiarity with PyTorch 2.0 technologies (e.g., TorchDynamo, TorchInductor) or DNN compiler technologies (e.g., Triton, MLIR).
Proficiency in programming, particularly in CUDA or Rust.
Experience developing production-grade software for customers.
Understanding of LLM serving optimization techniques, such as mixture of experts, paged attention, speculative decoding, chunked prefill, or continuous batching.
Experience with Hugging Face Transformers, PEFT, TGI, vLLM, or TensorRT-LLM.
Proven track record of contributing to open-source projects.
Knowledge of testing and CI/CD pipelines, such as Jenkins or similar tools.