Developing AI-Enabled Database Solutions Online Practice
Last updated: March 30, 2026
You can work through these online practice questions to gauge how well you know the Microsoft DP-800 exam material before deciding whether to register for the exam.
If you want to pass the exam with a 100% success rate and save 35% of your preparation time, choose the DP-800 dumps (latest real exam questions), which currently include 61 exam questions and answers.

Answer:
Explanation:
For the vector portion, the correct choice is VECTOR_DISTANCE with ORDER BY distance ascending. The requirement is to build a combined weighted formula using the actual vector distance, and Microsoft documents that VECTOR_DISTANCE returns the exact distance between two vectors. Since a lower distance means greater similarity, ascending distance is the right direction for ranking.
VECTOR_SEARCH is for ANN retrieval, but this hotspot specifically asks for a weighted formula based on distance, so VECTOR_DISTANCE is the appropriate operator.
For the keyword portion, the correct choice is CONTAINSTABLE on description and return ranked matches. Microsoft documents that CONTAINSTABLE returns a RANK column from 0 through 1000, which is exactly what is needed for weighted scoring in a hybrid formula.
For the final ranking expression, the best choice is order by (distance * 0.6) + ((1.0 - RANK/1000.0) * 0.4). This works because vector distance is a lower-is-better metric, while full-text RANK is a higher-is-better metric. Dividing RANK by 1000 normalizes it to the documented range, and subtracting from 1.0 converts it into a lower-is-better term so both components can be combined consistently in one ascending score. This final step is a sound inference based on Microsoft’s documented distance semantics and full-text rank range.
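Putting the three selections together, the weighted formula could be applied in a query like the following sketch. The table dbo.Products, its product_id key, the embedding column, and the model name MyEmbeddingModel are all assumptions for illustration; the pattern joins CONTAINSTABLE's RANK to the exact VECTOR_DISTANCE:

```sql
-- Sketch only: table, key column, embedding column, and model name are assumed.
DECLARE @query nvarchar(400) = N'lightweight waterproof hiking jacket';
DECLARE @query_vector VECTOR(1536) =
    AI_GENERATE_EMBEDDINGS(@query USE MODEL MyEmbeddingModel);

SELECT TOP (10)
    p.product_id,
    p.description
FROM dbo.Products AS p
JOIN CONTAINSTABLE(dbo.Products, description, @query) AS ft
    ON ft.[KEY] = p.product_id
ORDER BY
    -- Both terms are lower-is-better: raw distance, and 1 - normalized RANK.
    (VECTOR_DISTANCE('cosine', p.embedding, @query_vector) * 0.6)
  + ((1.0 - ft.RANK / 1000.0) * 0.4) ASC;
```

Because both terms are normalized to lower-is-better, a single ascending ORDER BY ranks the hybrid results consistently.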
Answer:
Explanation:
The best use case for RAG is answering user questions based on company-specific knowledge. Microsoft defines RAG as a pattern that augments a language model with a retrieval system that provides grounding data at inference time, which is exactly what you need when responses must be based on the latest transactional and reference data, must avoid retraining/fine-tuning, and should be able to include citations or references to source data.
The other options do not fit as well:
Summarizing free-form user input does not inherently require retrieval from DB1.
Training a custom model contradicts the requirement to avoid retraining/fine-tuning.
Generating marketing slogans is a creative generation task, not a grounding-and-citation scenario.
RAG is specifically strong when answers must come from your organization's own changing knowledge.

Answer:
Explanation:
The first statement is Yes. Embeddings are used to represent the semantic meaning of content, and vector search is for conceptually similar matches over that content. Here, the semantically meaningful fields are product_name, category, and description. Using those together supports natural-language search, while brand and price can be handled as structured filters outside the embedding itself. This is an inference from Microsoft’s guidance that vector search works over embeddings representing content meaning, while filters remain part of the nonvector query pipeline.
The second statement is No. price changes multiple times per day and is a structured numeric attribute, not stable semantic content. Since the requirement already says customers can apply structured filters for brand and price, price does not need to be embedded into the text. Embedding volatile numeric values would also make embeddings stale faster without improving the semantic-search objective. This is again an inference grounded in Microsoft’s distinction between vector similarity over content and filtering/sorting over nonvector fields.
The third statement is Yes. In SQL Server’s vector type, the default underlying base type is float32 unless float16 is specified explicitly.
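A minimal table sketch consistent with these three statements might look like the following; the table and column names are illustrative, not from the scenario:

```sql
-- Sketch: illustrative catalog table; VECTOR(1536) components default to float32.
CREATE TABLE dbo.ProductCatalog
(
    product_id   int IDENTITY(1,1) PRIMARY KEY,
    product_name nvarchar(200) NOT NULL,
    category     nvarchar(100) NOT NULL,
    description  nvarchar(max) NOT NULL,
    brand        nvarchar(100) NOT NULL,   -- structured filter, not embedded
    price        decimal(10,2) NOT NULL,   -- volatile; filtered, not embedded
    embedding    VECTOR(1536)  NULL        -- from product_name + category + description
);
```

The embedding covers only the semantically meaningful text fields, while brand and price stay in the nonvector filter pipeline.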

Answer:
Explanation:
The correct recommendation is to retrieve grounding data from knowledge_base and, at inference time, generate query embeddings and run a vector similarity search.
The chatbot currently answers some general HR questions but fails when policies change and when users ask about internal policy documents by title or category. That is exactly the kind of problem RAG is meant to solve: ground the LLM in the organization’s proprietary content instead of relying on the model’s training data or unrelated transactional tables. Microsoft’s RAG guidance states that RAG extends LLMs by grounding responses in your own content and that, for agentic retrieval, knowledge bases unify knowledge sources for retrieval.
So the grounding data should come from knowledge_base, because that table stores the HR policy documents and already includes fields like title, content, category, and embedding. Those are the fields directly tied to the missing and outdated policy answers.
By contrast:
employee_profiles and benefits_enrollment are operational HR tables, not the authoritative store for policy-document grounding.
PDF exports of the policies would be inferior to querying the indexed/structured knowledge base already prepared for retrieval.
The LLM training data is specifically the wrong source when the issue is outdated internal content.
For the retrieval step, Microsoft’s guidance says to use embeddings for vector queries and notes that vector similarity search matches concepts, not exact terms. This is especially important because users ask about policy documents by title or category and also phrase questions in ways that might not exactly match document wording. Generating a query embedding and then running a vector similarity search is the appropriate retrieval step in a RAG pipeline.
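The inference-time retrieval step can be sketched as follows. The model name HrEmbeddingModel is an assumption, and the knowledge_base column names follow the scenario description:

```sql
-- Sketch: HrEmbeddingModel is an assumed registered external embedding model.
DECLARE @question nvarchar(1000) = N'How many vacation days do new hires receive?';
DECLARE @q VECTOR(1536) =
    AI_GENERATE_EMBEDDINGS(@question USE MODEL HrEmbeddingModel);

-- Vector similarity matches concepts, not exact policy-document wording.
SELECT TOP (5) kb.title, kb.category, kb.content
FROM dbo.knowledge_base AS kb
ORDER BY VECTOR_DISTANCE('cosine', kb.embedding, @q) ASC;  -- most similar first
```

The retrieved chunks would then be passed to the LLM as grounding context, optionally with title and category as citation metadata.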
Answer:
Explanation:
Because the SQL Server 2025 instance has no outbound network connectivity, the embedding model cannot rely on a remote REST endpoint such as Azure AI Foundry or Azure OpenAI. Microsoft’s CREATE EXTERNAL MODEL documentation includes a local deployment pattern using ONNX Runtime running locally with local runtime/model paths. That is the right design when embeddings must be generated inside the SQL Server instance without external network access. Microsoft explicitly documents a local ONNX Runtime example for SQL Server 2025 and notes the required local runtime setup and model path configuration.
The permission requirement is handled by granting the application user access to use the external embeddings model. Microsoft's AI_GENERATE_EMBEDDINGS documentation states that, as a prerequisite, you must create an external model of type EMBEDDINGS that is accessible via the correct grants, roles, and/or permissions. Among the choices, the exam-appropriate action is to grant execute permission on the external model object to AIApplicationUser so only that database user can run embedding generation through the model.
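A hedged sketch of the two steps follows. The exact option names should be verified against the CREATE EXTERNAL MODEL documentation; the path, model name, and runtime values here are placeholders:

```sql
-- Sketch only: option names/values approximate the documented local ONNX pattern.
CREATE EXTERNAL MODEL LocalEmbeddingModel
WITH (
    LOCATION   = 'C:\onnx\models\text-embedding',  -- local model path, no network
    API_FORMAT = 'ONNX Runtime',                   -- local runtime deployment
    MODEL_TYPE = EMBEDDINGS,
    MODEL      = 'my-embedding-model'
);

-- Restrict embedding generation to the application user.
GRANT EXECUTE ON EXTERNAL MODEL::LocalEmbeddingModel TO AIApplicationUser;
```

This keeps all embedding generation inside the SQL Server instance, which is what the no-outbound-connectivity constraint requires.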
Answer:
Explanation:
The requirement is to ensure embeddings are updated every time the underlying content changes without relying on a nightly batch job. The right design is to enable change tracking on the source table so an external process can identify which rows changed and regenerate embeddings only for those rows. Microsoft documents that change detection mechanisms are used to pick up new and updated rows incrementally, which is the right pattern when you need near-continuous refresh instead of full nightly rebuilds.
This is better than:
A. fixed-size chunking, which affects chunk strategy but not change detection.
B. a smaller embedding model, which affects model cost/latency but not update triggering.
C. table triggers, which would push embedding-maintenance logic directly into write operations and is generally not the best design for AI-processing pipelines. The question specifically asks for a solution that replaces the nightly batch requirement, not one that performs heavyweight work inline during every transaction.
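The change-tracking design can be sketched as follows; the table name dbo.Articles and its key column are assumptions standing in for the scenario's source table:

```sql
-- Sketch: dbo.Articles is an assumed source table with primary key article_id.
ALTER DATABASE CURRENT
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Articles ENABLE CHANGE_TRACKING;

-- The external refresh process persists the last synchronized version and
-- re-embeds only the rows that changed since then.
DECLARE @last_sync_version bigint = 0;  -- stored between runs by the process
SELECT a.article_id, a.content
FROM CHANGETABLE(CHANGES dbo.Articles, @last_sync_version) AS ct
JOIN dbo.Articles AS a ON a.article_id = ct.article_id;
```

The embedding work itself runs outside the write path, so transactions stay lightweight while embeddings are refreshed near-continuously.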

Answer:
Explanation:
The correct insertion at line 22 is FOR JSON PATH, WITHOUT_ARRAY_WRAPPER.
The request body for the Azure OpenAI chat completions call must be a single JSON object containing the messages array with both the system/user content and the retrieved chunks. Microsoft documents that FOR JSON PATH is the preferred way to shape JSON output, especially when you want precise control over nested property names like messages[0].role and messages[1].content.
The key detail is WITHOUT_ARRAY_WRAPPER. By default, FOR JSON returns results enclosed in square brackets as a JSON array. Microsoft documents that WITHOUT_ARRAY_WRAPPER removes those brackets so a single JSON object is produced instead. That is exactly what is needed here for @payload, because the stored procedure is building one request body, not an array of request bodies.
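The pattern can be sketched as follows; the model name, message text, and variable names are illustrative. The inner FOR JSON PATH builds the messages array (wrapped in JSON_QUERY so it is not re-escaped as a string), while the outer FOR JSON PATH, WITHOUT_ARRAY_WRAPPER yields a single object:

```sql
DECLARE @user_question nvarchar(max) = N'What is the return policy?';

DECLARE @payload nvarchar(max) =
(
    SELECT
        N'gpt-4o' AS [model],
        JSON_QUERY((
            SELECT m.[role], m.[content]
            FROM (VALUES
                    (N'system', N'Answer using only the supplied context.'),
                    (N'user',   @user_question)
                 ) AS m([role], [content])
            FOR JSON PATH            -- inner call: produces the messages array
        )) AS [messages]
    FOR JSON PATH, WITHOUT_ARRAY_WRAPPER  -- outer call: one object, no [ ]
);
-- @payload now holds one JSON object: {"model":"gpt-4o","messages":[...]}
```

Without WITHOUT_ARRAY_WRAPPER, @payload would be wrapped in brackets and the endpoint would reject it as an array rather than a request object.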

Answer:
Explanation:
These are the correct selections for a hybrid query that uses semantic ranking in Azure AI Search.
Use k = 50 because Microsoft explicitly recommends that when you combine semantic ranking with vector queries, you should set k to 50 so the semantic ranker has enough candidates to rerank. If you use a smaller value such as 10, semantic ranking can receive too few inputs, which is exactly why some queries return fewer results than expected.
Use queryType = "semantic" because captions and answers are only available on semantic queries. Microsoft documents that captions is valid only when the query type is semantic, and semantic answers are returned only for semantic queries.
Use captions = "extractive" because semantic captions are extractive passages pulled from the top-ranked documents. Microsoft’s REST documentation states that the valid captions option here is extractive and that it defaults to none if not specified.
Use answers = "extractive" because semantic answers in Azure AI Search are extractive, not generated. Microsoft documents that semantic answers are verbatim passages recognized as answers and the REST API lists extractive as the answer-return option.
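Put together, a hybrid semantic request body might look like the sketch below. The vector field name, semantic configuration name, and query embedding are placeholders (the vector is truncated to three components for readability):

```json
{
  "search": "parental leave policy",
  "vectorQueries": [
    {
      "kind": "vector",
      "vector": [0.018, -0.042, 0.007],
      "fields": "embedding",
      "k": 50
    }
  ],
  "queryType": "semantic",
  "semanticConfiguration": "default-semantic-config",
  "captions": "extractive",
  "answers": "extractive",
  "top": 10
}
```

Note that k = 50 controls the candidate pool handed to the semantic reranker, while top controls how many reranked results are returned to the caller.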

Answer:
Explanation:
The correct mapping is:
FOR JSON PATH
JSON_VALUE
JSON_QUERY
To serialize the retrieved rows from knowledge_base, the correct command is FOR JSON PATH. Microsoft documents that FOR JSON formats query results as JSON, and PATH mode is the standard way to shape relational rows into JSON for downstream application or AI use.
To extract the answer field from the response, the correct command is JSON_VALUE because answer is a single scalar field. Microsoft states that JSON_VALUE is used to extract a scalar value from JSON text.
To extract the embeddings to store in query_cache, the correct command is JSON_QUERY because embeddings are returned as a JSON array, not a scalar. Microsoft states that JSON_QUERY extracts an object or array from JSON text, which is exactly the right behavior for an embeddings payload.
The unused options are not the best fit here:
OPENJSON is mainly for shredding JSON into rows and columns.
AI_GENERATE_CHUNKS is for chunking text, not extracting fields from a response payload.
VECTOR_DISTANCE computes similarity between vectors and is unrelated to JSON extraction.
FOR XML PATH produces XML, not JSON.
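The scalar-versus-array distinction can be sketched with a toy response payload (the field values are illustrative):

```sql
DECLARE @response nvarchar(max) = N'{
    "answer": "Employees accrue 15 vacation days per year.",
    "embedding": [0.011, -0.027, 0.093]
}';

SELECT
    JSON_VALUE(@response, '$.answer')    AS answer_text,     -- scalar field
    JSON_QUERY(@response, '$.embedding') AS embedding_array; -- JSON array
```

JSON_VALUE on $.embedding would return NULL in default (lax) mode because the path points at an array, which is why JSON_QUERY is required for the embeddings.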
Answer:
Explanation:
When you change embedding models, the stored vectors should be treated as belonging to a different embedding space unless you intentionally keep the entire corpus consistent. Microsoft’s vector guidance notes that when most or all embeddings are replaced with fresh embeddings from a new model, the recommended practice is to reload the new embeddings and, for large-scale replacement scenarios, consider dropping and recreating the vector index afterward so search quality remains predictable.
This question also says applications must continue to use VECTOR_SEARCH without runtime errors. VECTOR_SEARCH requires compatible vector dimensions, and the vector column already exists. Azure OpenAI documentation shows that text-embedding-ada-002 is fixed at 1536 dimensions and text-embedding-3-small supports up to 1536 dimensions. That means the migration can remain compatible with a VECTOR(1536) column, but the right implementation step is still to re-embed the existing rows so the table does not contain a mixed corpus produced by different models.
The other options are not appropriate:
B. Normalizing vectors does not solve a model-migration problem.
C. Converting the vector column to nvarchar(max) would break vector-native search design.
D. A vector index improves performance, but it does not migrate old embeddings to the new model.
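A hedged sketch of the re-embedding step follows, assuming an external model named TextEmbedding3Small has been registered for the new model and that the table is small enough for one pass (large tables should be updated in batches):

```sql
-- Sketch only: model and table names are assumptions.
-- Re-embed every row so the corpus lives in a single embedding space.
UPDATE kb
SET embedding = AI_GENERATE_EMBEDDINGS(kb.content USE MODEL TextEmbedding3Small)
FROM dbo.knowledge_base AS kb;
-- Afterward, consider dropping and recreating the vector index so search
-- quality over the replaced corpus remains predictable.
```

Because both models produce 1536-dimension vectors, the existing VECTOR(1536) column and VECTOR_SEARCH calls keep working throughout the migration.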

Answer:
Explanation:
The first correct selection is embedding because a vector index must be created on the vector column, not on a scalar distance column or a text column such as product_name. Microsoft’s CREATE VECTOR INDEX documentation shows that the index is created directly on the vector-valued column, for example ON product_embeddings(embedding).
The second correct selection is VECTOR_SEARCH because the requirement is to use a supplied natural language query vector and search against the indexed embeddings. Microsoft documents that VECTOR_SEARCH is the Transact-SQL function for approximate nearest neighbor vector retrieval and that it applies to SQL database in Microsoft Fabric as well as other supported SQL platforms.
This also matches the shown code pattern:
declare a vector variable such as @query_vector VECTOR(1536),
create a vector index on dbo.Products(embedding),
query with VECTOR_SEARCH(... SIMILAR_TO = @query_vector, METRIC = 'cosine', TOP_N = 10).
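A sketch of that pattern in Transact-SQL follows. The VECTOR_SEARCH and CREATE VECTOR INDEX option names reflect the preview documentation and should be verified against it; the table, columns, and model name are assumptions:

```sql
-- Sketch only: index options and names should be checked against current docs.
CREATE VECTOR INDEX vix_products_embedding
ON dbo.Products(embedding)
WITH (METRIC = 'cosine', TYPE = 'diskann');

DECLARE @query_vector VECTOR(1536) =
    AI_GENERATE_EMBEDDINGS(N'noise-cancelling headphones' USE MODEL MyEmbeddingModel);

SELECT t.product_id, t.product_name, s.distance
FROM VECTOR_SEARCH(
         TABLE      = dbo.Products AS t,
         COLUMN     = embedding,
         SIMILAR_TO = @query_vector,
         METRIC     = 'cosine',
         TOP_N      = 10
     ) AS s
ORDER BY s.distance;
```

The index and the query must use the same distance metric for the ANN index to be used.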
Answer:
Explanation:
The correct answer is B because the problem is not the vector search operator itself. The problem is that embeddings are becoming stale when article content changes. Microsoft documents that change data capture (CDC) tracks insert, update, and delete operations on source tables, which makes it the right mechanism to identify only the rows that changed.
This also best satisfies the requirement to minimize CPU usage on SalesDB. With CDC, the database only records the row changes, and the embedding regeneration work can be moved to an external process such as an Azure Functions app. That avoids running embedding generation inline inside the database for every update and avoids repeatedly recalculating embeddings for unchanged rows. In contrast, an hourly full-table regeneration would be extremely wasteful on a table with two million frequently updated articles, and a trigger that calls embedding generation per row would push expensive AI work into the transactional path of the database.
Option A is incorrect because changing from VECTOR_SEARCH to VECTOR_DISTANCE does not regenerate embeddings; it only changes the retrieval method. Microsoft states that VECTOR_SEARCH is the ANN search function, while VECTOR_DISTANCE performs exact distance calculation, so neither option addresses stale embedding data.
So the right design is:
use CDC to detect only changed articles,
process those changes outside the database,
regenerate embeddings only for changed rows,
write back the refreshed embeddings for current semantic search results.
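The CDC side of that design can be sketched as follows; dbo.Articles is an assumed table name, and CDC requires SQL Server Agent to be running:

```sql
-- Enable CDC at the database level, then on the source table.
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Articles',
    @role_name     = NULL;  -- no gating role

-- The external process (e.g. an Azure Functions app) reads net changes for a
-- window of log sequence numbers and re-embeds only those rows.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Articles');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_Articles(@from_lsn, @to_lsn, N'all');
```

Net changes collapse multiple updates to the same article into one row, so each changed article is re-embedded at most once per refresh window.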

Answer:
Explanation:
The correct function is VECTOR_SEARCH because the requirement is to perform approximate nearest neighbor (ANN) queries. Microsoft’s SQL documentation states that VECTOR_SEARCH is the function used for vector similarity search, and that an ANN index is used only with VECTOR_SEARCH when a compatible vector index exists on the target column. By contrast, VECTOR_DISTANCE calculates an exact distance and does not use a vector index for ANN retrieval.
The correct distance metric is cosine distance. Microsoft documents that VECTOR_SEARCH supports cosine, dot, and euclidean metrics, and Microsoft guidance specifically notes that cosine similarity is commonly used for text embeddings. It also states that retrieval of the most similar texts to a given text typically functions better with cosine similarity, and that Azure OpenAI embeddings rely on cosine similarity to compute similarity between a query and documents. Since both NotesEmbeddings and DescriptionEmbeddings are text-derived embeddings and the goal is to minimize the impact of different chunk sizes, cosine is the best choice because it compares direction/angle rather than being as sensitive to vector magnitude as Euclidean distance.

Answer:
Explanation:
The correct answer is Option B because the requirement is to call an Azure OpenAI REST endpoint from SQL Server 2025 while providing the highest level of security, and the instance already has a managed identity enabled. For Microsoft’s SQL AI features, the preferred secure pattern is to use a database scoped credential with IDENTITY = 'Managed Identity' instead of storing an API key. Microsoft documents that SQL Server 2025 supports managed identity for external AI endpoints, and for Azure OpenAI the credential secret uses the Cognitive Services resource identifier: {"resourceid":"https://cognitiveservices.azure.com"}.
So line 02 should be:
WITH IDENTITY = 'Managed Identity',
SECRET = '{"resourceid":"https://cognitiveservices.azure.com"}';
Why the other options are incorrect:
A and D use HTTP header or query-string credentials with an API key, which is less secure than managed identity because a secret key must be stored and rotated manually. Microsoft recommends managed identity where supported to avoid embedded secrets.
C mixes Managed Identity with an api-key secret, which is not the correct pattern for Azure OpenAI managed-identity authentication.
E uses an invalid identity value for this scenario. The accepted credential identities for external REST endpoint calls include HTTPEndpointHeaders, HTTPEndpointQueryString, Managed Identity, and Shared Access Signature.
Because the endpoint is Azure OpenAI and the question explicitly asks for the highest security, managed identity with the Cognitive Services resource ID is the Microsoft-aligned answer.
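The full credential statement could be sketched as follows. The endpoint name is a placeholder, and a database master key must already exist before a database scoped credential can be created:

```sql
-- Sketch: the credential name matches the Azure OpenAI endpoint URL that the
-- AI functions will call; contoso-openai is a placeholder resource name.
CREATE DATABASE SCOPED CREDENTIAL [https://contoso-openai.openai.azure.com]
WITH IDENTITY = 'Managed Identity',
     SECRET   = '{"resourceid":"https://cognitiveservices.azure.com"}';
```

With this in place, no API key is stored in the database; authentication flows through the instance's managed identity, which must also be granted an appropriate role on the Azure OpenAI resource.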

Answer:
Explanation:
The correct choice is Option A because it provides the relevant semantic context the LLM needs while avoiding an unnecessary field that would add tokens without improving answer quality.
For LLM grounding and RAG-style context, Microsoft guidance emphasizes mapping and sending the fields that contain text pertinent to the use case. In this FAQ scenario, the useful context is the ProductName, the Question, and the Answer. Those three fields help the model understand both the subject domain and the actual Q&A pair. By contrast, FaqId is just a technical identifier and generally adds no semantic value for response generation, so including it wastes tokens.
That is why Option A is better than the others:
Option A keeps the meaningful text fields and removes the low-value identifier.
Option B is too minimal because it includes only the answer text as Prompt, which strips away the product and question context the LLM may need for accurate grounding.
Option C keeps FaqId but omits ProductName, which can be important disambiguating context.
Option D includes everything, but that does not minimize token usage because it keeps the unnecessary FaqId.
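For illustration, the Option A context shape could be produced like this sketch; the table name dbo.ProductFaq is an assumption, while the column names come from the scenario:

```sql
-- Send only the semantically useful fields as grounding context; omit FaqId,
-- which is a technical identifier that adds tokens without adding meaning.
SELECT f.ProductName, f.Question, f.Answer
FROM dbo.ProductFaq AS f
FOR JSON PATH;
```

The resulting JSON array carries the product, question, and answer context the LLM needs while keeping token usage minimal.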