Using LLMs to Bridge the Gaps in QA Test Plans at Firefox
The study explores using LLMs to automatically generate test plans for Firefox features, aiming to reduce the manual effort of traditional QA planning and close its blind spots.
Overview
As software systems grow in scale and complexity, ensuring their reliability becomes increasingly challenging due to factors like diverse platforms, rapid release cycles, and evolving user needs. This places greater demands on software quality assurance (QA), where comprehensive test planning is crucial. However, this process is manual and time-consuming, and even skilled QA engineers may overlook critical scenarios. To bridge this gap, we used an LLM, GPT-4 Turbo, to generate test plans for eight Firefox features and evaluated them for novelty, validity, and relevance. Our results showed that 27% of the test cases in the LLM-generated plans surfaced previously missed test scenarios that the QA team deemed valuable, while 50.5% replicated existing ones. Although the remaining 22.5% were invalid or out of scope, our approach shows potential for improving test coverage. In this paper, we share our experience with this methodology and offer insights for SE/QA practitioners integrating LLMs into their workflows.
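To make the workflow concrete, here is a minimal Python sketch of how one might prompt GPT-4 Turbo to draft a test plan for a single feature via the OpenAI API. The prompt wording, feature description, and temperature setting are illustrative assumptions, not the study's actual prompt or configuration.

```python
# Minimal sketch: asking GPT-4 Turbo to draft a test plan for one feature.
# The prompt text and feature description below are illustrative assumptions,
# not the prompt used in the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

feature_description = """
Feature: Firefox tab pinning
Pinned tabs stay anchored at the left of the tab strip, shrink to icon
size, and persist across browser restarts.
"""

prompt = (
    "You are a senior QA engineer. Given the feature description below, "
    "write a test plan as a numbered list of test cases. For each case, "
    "give a title, preconditions, steps, and the expected result.\n\n"
    + feature_description
)

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,  # low temperature keeps the plan focused and consistent
)

print(response.choices[0].message.content)
```

In a setup like this, the generated test cases would then be triaged by the QA team into the three buckets reported above: novel and valuable, duplicates of existing cases, or invalid/out of scope.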