Job Description
Keywords: ONNX / OpenVINO / TVM / llama.cpp / vLLM
Advisory Engineer, Large Model On-Device Inference (C++)
Job Description:
• Responsible for the architecture design, development, maintenance, optimization, and innovative exploration of AI on-device inference engines.
• Responsible for verifying and analyzing product stability, performance, and accuracy.
• Responsible for module- or operator-level (OP) optimization and acceleration.
Requirements:
• Master’s degree or higher in Computer Science, Networking, Communications, or a related field.
• Familiar with operating system principles, with extensive programming and product development experience on Windows/Linux systems.
• Proficient in C/C++ programming, including modern standards such as C++11; familiar with the STL; knowledgeable in common scripting languages such as Shell and Python.
• Solid understanding of network communication principles; proficient in developing applications over TCP/UDP/HTTP; experienced with mainstream network programming models.
• Familiar with at least one inference engine (e.g., ONNX Runtime, OpenVINO, TVM) at the level of application, underlying principles, and customized development, including engine architecture, operators, and model conversion tools.
• Hands-on development experience with llama.cpp is a plus.
• Prior experience with parallel computing frameworks such as CUDA or SYCL is a plus.