ISTQB Certified Tester - Testing with Generative AI (CT-GenAI) Online Practice
Last updated: March 9, 2026
You can use these online practice questions to gauge how well you know the ISTQB CT-GenAI exam material before deciding whether to register for the exam.
If you want to pass the exam and cut your preparation time by 35%, you can opt for the CT-GenAI dumps (latest real exam questions), which currently include 125 exam questions with answers.
Answer:
Explanation:
In structured prompt engineering, the Role component (also known as a Persona) is used to set the perspective, expertise, and tone of the LLM’s response. By assigning the role of a "senior test manager," the tester instructs the model to adopt the specific domain knowledge, vocabulary, and professional standards associated with that position. This technique is highly effective because LLMs are trained on vast datasets containing diverse professional documents; invoking a specific persona helps the model narrow its "latent space" to retrieve information relevant to that specific field. For instance, a senior test manager persona will prioritize risk management, resource allocation, and high-level strategy, whereas a "junior developer" persona might focus more on syntax and local unit tests. While Context (Option B) provides the background of the project and Instruction (Option A) defines the specific task to be performed, the Role serves as the foundation for how those instructions are interpreted. This ensures the generated testware aligns with the expected professional seniority and organizational maturity required for high-stakes environments like a payments platform.
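As a minimal sketch, a structured prompt with an explicit Role section might be assembled like this (the section labels and wording are illustrative, not prescribed by the syllabus):

```python
def build_prompt(role, context, instruction):
    """Assemble a structured prompt; the Role line sets the perspective
    from which the Instruction will be interpreted."""
    return (
        f"Role: You are a {role}.\n"
        f"Context: {context}\n"
        f"Instruction: {instruction}"
    )

prompt = build_prompt(
    role="senior test manager at a payments platform",
    context="We are planning regression testing for the checkout service.",
    instruction="Draft a risk-based test strategy outline.",
)
print(prompt.splitlines()[0])
# → Role: You are a senior test manager at a payments platform.
```

Swapping only the `role` argument (e.g., to "junior developer") changes how the same instruction is answered, which is exactly the effect described above.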
Answer:
Explanation:
Hallucinations (where an LLM generates factually incorrect or nonsensical information) occur primarily when the model lacks sufficient specific information and "fills in the gaps" using probabilistic patterns from its training data. The most effective mitigation strategy is "grounding," which involves providing the model with detailed, project-specific context. By including technical specifications, existing API schemas, business rules, and identified constraints within the prompt, the tester restricts the model’s operational space to the "project realities." This ensures the model does not have to guess or improvise details about the System Under Test (SUT). In contrast, randomizing prompts (Option B) or relying on generic examples (Option C) increases the likelihood of inconsistent and inaccurate outputs. Furthermore, using higher temperature settings (Option D) actually encourages creativity and randomness, which is the opposite of the precision required for testing and significantly increases the risk of hallucinations. Therefore, rich contextual grounding is the technical foundation for reliable AI-assisted test analysis.
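A grounded prompt can be sketched as follows; the API schema and business rules below are hypothetical placeholders for real project artifacts:

```python
# Illustrative grounding: embed project-specific facts in the prompt so the
# model does not have to guess details about the SUT. All data is invented.
API_SCHEMA = '{"POST /payments": {"amount": "integer (cents)", "currency": "ISO 4217"}}'
BUSINESS_RULES = [
    "Refunds must reference an existing payment ID.",
    "Amounts above 10000 EUR require manual approval.",
]

def grounded_prompt(task):
    rules = "\n".join(f"- {r}" for r in BUSINESS_RULES)
    return (
        "Use ONLY the facts below; if information is missing, say so "
        "instead of guessing.\n\n"
        f"API schema:\n{API_SCHEMA}\n\n"
        f"Business rules:\n{rules}\n\n"
        f"Task: {task}"
    )

p = grounded_prompt("Derive test conditions for the refund flow.")
```

The explicit "say so instead of guessing" instruction, combined with the schema and rules, narrows the model's operational space to the project realities described above.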
Answer:
Explanation:
LLMOps (Large Language Model Operations) is the set of practices used to manage the lifecycle of LLMs in production. When an organization integrates an AI chatbot into its test processes, the primary operational concern is maintaining data privacy and minimizing security risks, especially if using third-party APIs. Unlike traditional software, LLMs are "black boxes" that process every piece of data sent to them. A core LLMOps responsibility is ensuring that any "Prompt Data" (code, requirements, or logs) is not used by the provider to train their public models and that the communication channels are fully secured. While scalability (Option A) and latency (Option C) are important technical metrics, they are secondary to the catastrophic legal and reputational risk of a data breach. LLMOps in a testing context involves implementing data masking tools, monitoring for "Prompt Injection" attacks, and managing the "Grounding" data in vector databases to ensure it remains current and protected. This ensures the AI remains a safe and reliable asset within the enterprise testing ecosystem, rather than a liability for the organization’s intellectual property.
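The data-masking responsibility mentioned above can be sketched as a simple guardrail; real deployments would use dedicated masking tools, and these regexes are illustrative only:

```python
import re

# Hypothetical LLMOps guardrail: mask obvious PII before prompt data
# leaves the organizational boundary toward a third-party API.
def mask_pii(text):
    # Replace e-mail addresses with a placeholder token.
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "<EMAIL>", text)
    # Replace card-like runs of 13-16 digits (with optional separators).
    text = re.sub(r"\b(?:\d[ -]?){13,16}\b", "<CARD>", text)
    return text

log = "User jane.doe@example.com paid with 4111 1111 1111 1111."
print(mask_pii(log))
```

Masking is applied to the prompt payload itself, so even if the provider logs requests, the sensitive identifiers never leave the enterprise boundary.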
Answer:
Explanation:
In the context of LLM inference, Temperature is a hyperparameter that controls the randomness or "creativity" of the model's output. When the temperature is set high, the model's probability distribution is "flattened," meaning it is more likely to select less-probable tokens, leading to more diverse and sometimes unpredictable text. For software testing, where precision and repeatability are paramount, lowering the temperature (Option C) is the standard practice. A temperature of 0.0 makes the model "deterministic," meaning it will consistently choose the token with the highest probability. This narrows the sampling distribution and significantly reduces variability between runs. While a larger context window (Option D) allows the model to process more information, it does not directly control the randomness of token selection. Similarly, the "learning rate" (Option B) is a parameter used during the training or fine-tuning phase, not during inference. For generating test cases or scripts that must follow strict logic, a lower temperature ensures that the model remains focused and produces consistent results.
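The "flattening" effect of temperature can be shown directly on a toy probability distribution (a sketch of the standard temperature-scaled softmax; real inference stacks special-case temperature 0.0 as a pure argmax to avoid division by zero):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax: low T sharpens the distribution toward
    the top token; high T flattens it toward uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                  # raw model scores for 3 candidate tokens
low = softmax_with_temperature(logits, 0.1)    # near-deterministic: top token ~1.0
high = softmax_with_temperature(logits, 10.0)  # flattened: close to uniform
```

With temperature 0.1 the top token absorbs essentially all of the probability mass (repeatable output), while at 10.0 the three tokens become nearly interchangeable, which is why high temperatures produce variable text.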
Answer:
Explanation:
Few-shot prompting is the technique of providing a few examples (exemplars) within the prompt to demonstrate the desired task and output format to the LLM. In this scenario, providing 10 existing, high-quality test cases acts as a "pattern" for the model to follow. This is significantly more effective than "Zero-shot prompting" (Option D), where the model is given a task without examples and may deviate from the specific organizational format required (e.g., specific JSON structures or assertion styles). While "Prompt chaining" (Option A) is useful for breaking down complex tasks into sub-tasks, the primary need here is pattern recognition and replication, which is the core strength of Few-shot learning. "Meta prompting" (Option C) involves having the AI write the prompt itself, which is unnecessary when the team already has clear examples. By using Few-shot prompting, the tester "conditions" the model's latent space to prioritize the provided format, ensuring that all 500 generated test cases maintain consistency with the HTTP methods, headers, and assertion logic defined in the exemplars.
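Assembling a few-shot prompt from existing exemplars can be sketched like this (the two exemplar test cases and the Request/Expected format are invented for illustration):

```python
# Exemplar pairs drawn from the team's existing, approved test cases.
exemplars = [
    ("GET /orders/{id} with a valid id", "expect 200 and an order body"),
    ("GET /orders/{id} with an unknown id", "expect 404 and an error body"),
]

def few_shot_prompt(exemplars, new_requirement):
    """Prefix the task with worked examples so the model replicates
    their format for the new requirement."""
    shots = "\n\n".join(
        f"Request: {req}\nExpected: {exp}" for req, exp in exemplars
    )
    return (
        "Generate a test case in the same format as the examples.\n\n"
        f"{shots}\n\nRequest: {new_requirement}\nExpected:"
    )

p = few_shot_prompt(exemplars, "DELETE /orders/{id} without an auth token")
```

Ending the prompt at "Expected:" invites the model to complete the final pair in the demonstrated style, which is the conditioning effect described above.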
Answer:
Explanation:
In a structured prompt, "Input Data" (or Reference Data) provides the specific subject matter that the model must process or analyze. The statement provided consists of factual identifiers and specific entities related to the System Under Test (SUT), such as the version number, the specific module name, reference IDs for existing tests, and a specific defect record. These elements serve as the raw material for the LLM's task. This differs from "Instructions" (Option C), which would be the command (e.g., "Analyze the following..."), or "Constraints" (Option B), which would define the boundaries of the task (e.g., "Do not include T-115"). "Output Format" (Option D) would define how the result should look (e.g., "Provide a JSON list"). By clearly labeling this section as Input Data, the tester helps the model distinguish between the "what" (the data) and the "how" (the instructions), which is a key principle of structured prompt engineering aimed at improving the accuracy of AI-generated analysis.
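The separation of "what" from "how" can be sketched as a prompt template with the four labeled sections discussed above (the SUT identifiers in the example are hypothetical):

```python
def structured_prompt(instruction, input_data, constraints, output_format):
    """Assemble a prompt whose labeled sections keep the data (Input Data)
    clearly separated from the commands (Instruction, Constraints, Output Format)."""
    return (
        f"Instruction: {instruction}\n"
        f"Input Data: {input_data}\n"
        f"Constraints: {constraints}\n"
        f"Output Format: {output_format}"
    )

p = structured_prompt(
    instruction="Analyze the following module for regression risk.",
    input_data="SUT v2.3, module PaymentGateway, tests T-101..T-114, defect D-88",
    constraints="Do not include T-115.",
    output_format="Provide a JSON list.",
)
```

Because each section is explicitly labeled, the model can treat the version numbers and reference IDs as material to analyze rather than as instructions to follow.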
Answer:
Explanation:
Prompt Chaining is a technique where a complex task is decomposed into several smaller, sequential steps, where the output of one step serves as the context or input for the next. This is far more reliable than a "one-shot" approach (Option A) because it reduces the cognitive load on the LLM and allows for intermediate verification. In the scenario of test analysis, the most logical and effective chain begins by extracting discrete test conditions from the raw requirements. Once these conditions are established, the next "link" in the chain is to prioritize them based on risk (impact and likelihood), which requires the model to reason specifically about the importance of each condition. The final step is to map these prioritized conditions back to the original requirements to identify any "coverage gaps." This systematic flow (Option B) mirrors the professional test analysis process defined in the ISTQB/CT-GenAI standards. By following this sequence, the tester ensures that the AI-generated output is logically derived and thorough, providing a clear "audit trail" from the initial requirement to the final prioritized test suite.
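The three-link chain described above can be sketched with a stubbed model call; in practice `call_llm` would invoke a real model, and each intermediate output would be reviewed before feeding the next step:

```python
def call_llm(prompt):
    """Stub standing in for a real model call; echoes the prompt's start
    so the chain's data flow is visible."""
    return f"[model output for: {prompt[:40]}...]"

def analyse(requirements):
    # Link 1: extract discrete test conditions from raw requirements.
    conditions = call_llm(f"Extract test conditions from: {requirements}")
    # Link 2: prioritize the extracted conditions by risk.
    prioritised = call_llm(f"Prioritise by risk (impact x likelihood): {conditions}")
    # Link 3: map back to requirements to find coverage gaps.
    gaps = call_llm(f"Map back to requirements and list coverage gaps: {prioritised}")
    return conditions, prioritised, gaps
```

Each link's output is passed verbatim into the next prompt, which is what produces the audit trail from requirement to prioritized suite.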
Answer:
Explanation:
A/B testing, also known as split testing, is a systematic empirical method used to compare two versions of a prompt (Version A and Version B) to determine which one performs better based on predefined evaluation metrics. In the realm of LLMs, where outputs can be stochastic (probabilistic), A/B testing is essential for mitigating inconsistency. When a team encounters vague or varying results for a user story, simply modifying the prompt iteratively (Option B) may improve the result but does not provide a statistical or objective basis for why one version is superior. By running A/B tests, testers can evaluate prompts against specific KPIs such as accuracy, relevance, format adherence, or the absence of hallucinations. This process involves sending the same input data through both prompt versions multiple times and scoring the outputs. The version that consistently yields the "stronger wording" or more precise testware is then selected as the production standard. This data-driven approach is a cornerstone of prompt engineering in professional environments, ensuring that the most effective linguistic structures are utilized to maximize the model's performance and reliability.
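A toy version of this loop can be sketched as follows; `run_and_score` is a stand-in for sending the prompt to the model and scoring the output against KPIs such as format adherence, and its fixed base scores are invented to make the comparison visible:

```python
import random

def run_and_score(prompt_version, rng):
    """Stub scorer: pretend version B consistently produces better-scoring
    output, with some run-to-run noise to mimic stochastic model behavior."""
    base = 0.8 if prompt_version == "B" else 0.6
    return min(1.0, max(0.0, base + rng.uniform(-0.1, 0.1)))

def ab_test(runs=50, seed=42):
    """Send the same input through both prompt versions `runs` times and
    select the version with the higher mean score."""
    rng = random.Random(seed)
    mean = lambda xs: sum(xs) / len(xs)
    score_a = mean([run_and_score("A", rng) for _ in range(runs)])
    score_b = mean([run_and_score("B", rng) for _ in range(runs)])
    return "B" if score_b > score_a else "A"
```

Running each version many times before comparing means is what turns a stochastic output into an objective, repeatable selection criterion.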
Answer:
Explanation:
A mature GenAI strategy for software testing must move beyond "hype" and focus on tangible value and operational feasibility. Selecting an LLM based on measurable test outcomes (such as reduction in test design time, increase in defect detection, or script accuracy) ensures that the AI investment directly supports the organization’s Quality Assurance goals. Furthermore, the model must be compatible with current infrastructure. This includes considerations for data security (on-prem vs. cloud), API integration capabilities, and cost-per-token efficiency. While vendor visibility (Option A) can be a factor, it is not a guarantee of task-specific performance. Prioritizing creativity over compliance (Option B) is highly risky for testing, where precision and policy adherence are paramount. Similarly, while broad functionality (Option C) is useful, it often results in "jack-of-all-trades" models that may not perform as well as specialized or instruction-tuned models on specific testing tasks. Strategic alignment requires a balance between model performance, organizational security requirements, and clear KPIs.
Answer:
Explanation:
The statement that "Strict GDPR compliance eliminates all privacy risk" is incorrect because compliance is a legal and procedural framework, not a foolproof technical shield against all possible risks. Even within a GDPR-compliant environment, risks such as "model inversion" attacks, accidental data leakage through "membership inference," or the unintentional generation of Sensitive Personally Identifiable Information (SPII) can still occur. Data privacy in GenAI is complex because LLMs function by processing and sometimes retaining patterns from the data they are fed. As noted in the CT-GenAI syllabus, some tools may process data in ways that are not fully transparent (Option A), and outputs can inadvertently include snippets of sensitive data used during the prompting or training phase (Option B). Furthermore, failing to adhere to regulations like GDPR or the EU AI Act certainly leads to legal and financial exposure (Option D). Therefore, while compliance frameworks significantly mitigate risk, they do not "eliminate" it; a robust GenAI strategy requires ongoing technical controls, data masking, and human oversight to manage residual privacy threats effectively.
Answer:
Explanation:
A successful Generative AI strategy for testing is heavily dependent on the quality of the data used for grounding (RAG) and prompting. The principle of "Garbage In, Garbage Out" is magnified with LLMs; therefore, a key strategic pillar is the prioritization of accurate, relevant, and high-quality input data. This involves establishing defined quality procedures to ensure that the requirements, codebases, and historical defect logs fed into the model are "clean" and representative of the current system state. Strategy must avoid the "unfiltered" approach (Option C), as including contradictory or obsolete data can lead to hallucinations or irrelevant test cases. While synthetic data (Option D) is a powerful tool for privacy, it cannot entirely replace the nuanced reality found in secured enterprise data. Furthermore, legacy data (Option A) often contains valuable insights for regression testing. Consequently, the strategy should focus on building a robust data pipeline that ensures only verified, contextually appropriate information is utilized, thereby increasing the reliability of AI-generated testware and ensuring it aligns with the organization's quality standards.
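A quality gate for grounding data can be sketched as a simple admissibility check; the freshness threshold, field names, and sample records are all hypothetical:

```python
from datetime import date

def is_admissible(record, today=date(2026, 1, 1), max_age_days=365):
    """Hypothetical grounding-data gate: admit only records that are
    reasonably current and pass a basic completeness check before they
    reach the RAG store."""
    fresh = (today - record["updated"]).days <= max_age_days
    complete = bool(record.get("text", "").strip())
    return fresh and complete

records = [
    {"id": "REQ-1", "text": "Refunds need approval.", "updated": date(2025, 11, 2)},
    {"id": "REQ-2", "text": "", "updated": date(2025, 12, 1)},           # incomplete
    {"id": "REQ-3", "text": "Old VAT rule.", "updated": date(2019, 3, 5)},  # stale
]
admitted = [r["id"] for r in records if is_admissible(r)]
```

Only verified, current records pass the gate, which is the "clean pipeline" principle described above; real pipelines would add contradiction and duplication checks on top.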
Answer:
Explanation:
This scenario describes a failure in the model's ability to apply logical weight to specific domain concepts, specifically in the context of Risk-Based Testing (RBT). When an LLM ranks a low-impact UI element (a tooltip) higher than a critical functional failure (payment processing), it demonstrates a "Reasoning error in risk calculation logic." While LLMs can follow formulas like Risk = Likelihood × Impact, they may lack the deep semantic understanding of "Impact" within a specific business domain unless explicitly guided. This is not necessarily a hallucination (Option C), as the model isn't necessarily inventing facts, but rather misapplying the logic of prioritization. It is also distinct from dataset bias (Option D), which would involve a systematic skewing across all outputs. In professional testing, this type of error highlights the necessity of "human-in-the-loop" verification. Testers must review AI-generated prioritizations to ensure that the logical deductions align with the actual business risk and technical criticality of the features being tested.
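The Risk = Likelihood × Impact formula from the text can be sketched with an explicit impact scale, so a payment failure cannot rank below a tooltip defect (the impact classes, weights, and findings are illustrative):

```python
# Hypothetical impact scale making domain criticality explicit.
IMPACT = {"cosmetic": 1, "minor": 2, "major": 4, "critical": 5}

def risk_score(likelihood, impact_class):
    """Risk = Likelihood x Impact, with impact taken from the explicit scale."""
    return likelihood * IMPACT[impact_class]

findings = [
    ("Tooltip text misaligned", risk_score(likelihood=3, impact_class="cosmetic")),
    ("Payment processing fails", risk_score(likelihood=2, impact_class="critical")),
]
ranked = sorted(findings, key=lambda f: f[1], reverse=True)
```

Encoding the impact weights explicitly (rather than leaving "Impact" to the model's judgment) is one way a human-in-the-loop can guide the prioritization the text describes.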
Answer:
Explanation:
Differentiating between prompting techniques is essential for a tester to select the right tool for the task. Few-shot prompting is characterized by providing the model with a few examples of inputs and desired outputs, allowing it to learn the pattern and format. Prompt Chaining involves breaking a complex task into a sequence of smaller, interconnected prompts, where the output of one step becomes the input for the next (e.g., first extract requirements, then generate test cases from those requirements). Meta-prompting is a more advanced technique where the user asks the LLM to help design, write, or refine the prompt itself, essentially using the AI as a "prompt engineer" to optimize the instructions.
Option D correctly identifies these core characteristics. Options A, B, and C contain fundamental mischaracterizations: for instance, Few-shot by definition requires examples, and Chaining uses a sequence of prompts rather than a single one. Mastering these distinctions allows testers to move from simple "chatting" to sophisticated AI orchestration that can handle complex, multi-stage testing workflows with high reliability.
Answer:
Explanation:
Data security is a paramount concern when using GenAI in testing, as test environments often contain sensitive business logic or PII (Personally Identifiable Information). To protect this data "at rest" (stored in databases or vector stores) and "in transit" (being sent to the LLM), a combination of technical controls is required. Role-Based Access Control (RBAC) is a fundamental security pillar that ensures only authorized individuals or services can access specific datasets or trigger GenAI workflows. This prevents unauthorized users from feeding sensitive enterprise data into public AI models. While encryption (which Option A wrongly replaces with obfuscation) and TLS (which Option C wrongly suggests disabling) are essential technical layers for protecting data in transit, RBAC provides the organizational "gatekeeping" necessary to manage who can interact with the AI system. In a professional GenAI strategy, testers must ensure that the tools they use adhere to strict access policies, ensuring that the "Input Data" used for prompting remains within the secured organizational boundary and is not leaked to unauthorized entities or public training sets.
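A minimal RBAC gatekeeper for GenAI workflows might look like this (the roles, permissions, and function names are invented for illustration):

```python
# Hypothetical role-to-permission mapping for a GenAI testing workflow.
PERMISSIONS = {
    "test_manager": {"submit_prompt", "view_output"},
    "contractor": {"view_output"},
}

def authorize(role, action):
    """Return True only if the role has been granted the requested action."""
    return action in PERMISSIONS.get(role, set())

def submit_prompt(role, prompt):
    """Gatekeeper: refuse to send enterprise data to the model unless the
    caller's role carries the submit_prompt permission."""
    if not authorize(role, "submit_prompt"):
        raise PermissionError(f"role '{role}' may not submit prompts")
    return f"submitted: {prompt}"
```

Placing the check in front of every model call ensures prompt data crosses the organizational boundary only for authorized roles, complementing encryption and TLS on the transport layer.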
Answer:
Explanation:
ISO/IEC 42001:2023 is the international standard for an AI Management System (AIMS). It is designed to help organizations develop, provide, or use AI systems responsibly by providing a certifiable framework of requirements and controls. In a software testing context, this standard is vital for establishing governance, ensuring that GenAI tools are used consistently and ethically across the lifecycle. NIST AI RMF 1.0 (Option B) is a highly respected framework, but it is a set of voluntary guidelines for managing risk, not a "requirement standard" for a management system. ISO/IEC 23053:2022 (Option C) provides a general framework for AI using machine learning but lacks the comprehensive "management system" scope found in 42001. Finally, the EU AI Act (Option D) is a regulation (law), not a technical standard. For a test organization looking to align its GenAI strategy with international best practices and achieve formal certification, ISO/IEC 42001 is the definitive standard to follow, as it covers the organizational processes, data handling, and risk management necessary for high-quality AI operations.