Abstract: Large Language Models (LLMs) empower many modern software systems and are required to be highly accurate and reliable. Evaluating LLMs poses challenges due to the high costs of manual ...