
AI QA Engineer

Full-time · Local · up to 57K

Job Information

Technologies: Cypress, Playwright, Software QA, LLMOps
Salary: up to 57K
About

AI QA Engineer

Barcelona (hybrid)
Permanent contract
Salary: up to 57K


Role Overview

We’re seeking an AI QA Engineer who can merge strong quality assurance practices with next-generation AI evaluation. This role focuses on guaranteeing the reliability, safety, and correctness of code produced by both humans and AI systems. You will build testing frameworks that validate AI-generated pull requests, track model performance, and ensure that AI-assisted development meets rigorous engineering standards.

Working alongside software engineers and machine learning specialists, you’ll help define how quality is measured in an environment where humans and AI collaborate. Your work will shape the benchmarks, datasets, and evaluation methodologies used to improve AI coding agents over time.

Key Responsibilities

  • Develop and execute test strategies for code produced by humans as well as AI systems.

  • Design regression testing approaches tailored to AI-assisted software development workflows.

  • Partner with ML engineers to assess model accuracy, failure patterns, and quality indicators.

  • Monitor automated pipelines that incorporate AI scoring and validation steps.

  • Build and maintain benchmark datasets used for fine-tuning (SFT/DPO) based on QA findings.

  • Establish and update quality thresholds for PR-generation models and AI-assisted code suggestions.

  • Foster a QA discipline centered on metrics, continuous learning, and measurable model improvement.
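To illustrate the kind of quality gate the responsibilities above describe, here is a minimal Python sketch. All names, metrics, and thresholds (`PRQualityReport`, a 95% test pass rate, zero lint errors) are hypothetical illustrations, not details from the posting or the company's actual tooling.

```python
# Hypothetical sketch of a quality threshold for AI-generated pull requests.
# Metric names and threshold values are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class PRQualityReport:
    tests_passed: int
    tests_total: int
    lint_errors: int


def pass_rate(report: PRQualityReport) -> float:
    """Fraction of the test suite the AI-generated PR passes."""
    if report.tests_total == 0:
        return 0.0
    return report.tests_passed / report.tests_total


def meets_threshold(report: PRQualityReport,
                    min_pass_rate: float = 0.95,
                    max_lint_errors: int = 0) -> bool:
    """Gate: allow the PR through only if it clears both quality bars."""
    return (pass_rate(report) >= min_pass_rate
            and report.lint_errors <= max_lint_errors)


# Example: 48/50 tests pass (96%) with clean lint, so the gate opens.
report = PRQualityReport(tests_passed=48, tests_total=50, lint_errors=0)
print(meets_threshold(report))  # True
```

In practice such a check would run as a CI/CD step, with the thresholds tuned from the benchmark datasets and model metrics mentioned above.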

Requirements

  • At least 3 years of experience in QA, focused on testing web or mobile applications.

  • Strong understanding of testing methodologies and automation tools (Cypress, Playwright, etc.).

  • Experience integrating tests into CI/CD workflows and modern delivery pipelines.

  • Ability to interpret metrics, detect trends, and assess AI behavior against quality standards.

  • Hands-on experience testing APIs with tools like Postman or Charles Proxy.

  • Interest in evaluating large language models and participating in model-monitoring processes.

Nice to Have

  • Familiarity with LLMOps platforms (Weights & Biases, LangSmith, PromptLayer).

  • Experience proposing custom evaluation criteria for AI-generated outputs.

  • Scripting knowledge in Python or JavaScript to build or extend QA utilities.

  • Understanding of prompt engineering concepts or AI-generated test creation.
