
Text Clustering Based on DeepSeek Reasoning

Published 2025-3-31 08:25

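This article shows developers how to build and understand a new text clustering approach, using the DeepSeek reasoning model to explain the inference results.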

It explores reasoning in large language models (LLMs) and introduces DeepSeek, an excellent tool for explaining inference results and for building machine learning systems that end users can trust more.

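By default, a machine learning model is a black box: it does not offer out-of-the-box explanations for its decisions (explainable AI, or XAI). This article shows how to use the DeepSeek model to bring explanation, or reasoning, capabilities to machine learning.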

Approach

First, we build custom embeddings and an embedding function to create a vector data store, then use the DeepSeek model to perform the reasoning.

The following simple flowchart shows the whole process.

[Figure: flowchart of text clustering with DeepSeek reasoning]

Data

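(1) We use a news article dataset to identify the category of new articles. The dataset is available for download on Kaggle.

(2) From the dataset, we use short_description for the vector embeddings and the category feature to assign the appropriate label to each article.

(3) The dataset is fairly clean and needs no preprocessing.

(4) We load the dataset with the pandas library and split it into training and test sets with scikit-learn.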

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_json('./News_Category_Dataset_v3.json', lines=True)

# Separate features (X) and target (y)
X = df.drop('category', axis=1)
y = df['category']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

train_df = pd.concat([X_train, y_train], axis=1)
test_df = pd.concat([X_test, y_test], axis=1)
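As a dependency-free illustration of what this step does, the sketch below parses JSON Lines records (one JSON object per line, which is what `lines=True` tells pandas to expect) and makes a seeded 70/30 split. The helpers `load_records` and `split_records` and the sample records are hypothetical; this is only a sketch of the idea, not a replacement for `train_test_split`.

```python
import json
import random


def load_records(lines):
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def split_records(records, test_size=0.3, seed=42):
    """Shuffle a copy of the records with a fixed seed, then cut off a test fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sample lines mimicking the dataset's two relevant fields.
sample = [
    '{"short_description": "Shakespeare in the park tonight.", "category": "ENTERTAINMENT"}',
    '{"short_description": "Police recover stolen goods.", "category": "CRIME"}',
    '{"short_description": "Ten beaches worth the trip.", "category": "TRAVEL"}',
    '',
]
records = load_records(sample)
train, test = split_records(records, test_size=0.3)
# Keep only what the article uses: the text to embed and its label.
pairs = [(r["short_description"], r["category"]) for r in train]
```

Fixing the seed makes the split reproducible, which plays the same role as `random_state=42` above.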

Generating Text Embeddings

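We use the following libraries for text embedding:

  • langchain: for creating example prompts and the semantic similarity selector
  • langchain_chroma: for creating the embeddings and storing them in a data store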

import json

import requests
from chromadb import Documents, EmbeddingFunction
from langchain_chroma import Chroma
from langchain_core.embeddings import Embeddings
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

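Next, we build the custom embeddings and embedding function. These custom classes let us query a model deployed on a local or remote instance.

For models deployed on a remote instance, readers can add the necessary security mechanisms (HTTPS, data encryption, and so on) and call the REST endpoint to retrieve the model embeddings.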
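The embedding code sends a request body with `model`, `input` and `encoding_format` fields and reads the vector from `data[0]["embedding"]` in the reply. Assuming the endpoint follows that OpenAI-style format, the sketch below isolates the payload construction and the response parsing so both can be checked without a running server. The function names and the sample response body are hypothetical.

```python
import json


def build_embedding_payload(text, model="text-embedding-nomic-embed-text-v1.5"):
    """Build the JSON body for one embedding request (same fields as the article's code)."""
    return {"model": model, "input": text, "encoding_format": "float"}


def parse_embedding_response(body):
    """Extract the first embedding vector from an OpenAI-style JSON response body."""
    parsed = json.loads(body)
    return [float(x) for x in parsed["data"][0]["embedding"]]


# Hypothetical response body shaped like an OpenAI-compatible embeddings reply.
fake_body = json.dumps({
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.1, -0.2, 0.3]}],
    "model": "text-embedding-nomic-embed-text-v1.5",
})
payload = build_embedding_payload("French police recovered stolen merchandise.")
vector = parse_embedding_response(fake_body)
```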

import requests
from chromadb import Documents, EmbeddingFunction
# LangChain's Embeddings base class; chromadb's `Embeddings` is only a type
# alias for a list of vectors, so it cannot serve as a base class here.
from langchain_core.embeddings import Embeddings


class MyEmbeddings(Embeddings):

    def __init__(self):
        # Server address and port (replace with your actual values)
        self.url = ""
        # Request headers
        self.headers = {
            "Content-Type": "application/json"
        }
        # Use any text embedding model of your choice
        self.model = "text-embedding-nomic-embed-text-v1.5"

    def embed_documents(self, texts):
        # Embed each text with a separate request
        return [self.embed_query(text) for text in texts]

    def embed_query(self, text):
        # Build a fresh request body per call instead of mutating shared state
        payload = {
            "model": self.model,
            "input": text,
            "encoding_format": "float",
        }
        response = requests.post(self.url, headers=self.headers, json=payload)
        response.raise_for_status()
        # Expected response shape: {"data": [{"embedding": [...]}], ...}
        return response.json()["data"][0]["embedding"]


class MyEmbeddingFunction(EmbeddingFunction):

    def __call__(self, input: Documents):
        # Chroma expects one embedding vector per input document,
        # not an embeddings object
        return MyEmbeddings().embed_documents(input)

We define a simple function that creates a semantic similarity selector for the news articles. The selector builds the vector embeddings from the training dataset.

def create_semantic_similarity_selector(train_df):

    # Format for each example; only needed if the selected examples are later
    # fed into a FewShotPromptTemplate (the selector itself does not use it).
    example_prompt = PromptTemplate(
        input_variables=["input", "output"],
        template="Input: {input}\nOutput: {output}",
    )

    # Each example pairs an article's short description with its category label.
    examples = [
        {"input": row["short_description"], "output": row["category"]}
        for _, row in train_df.iterrows()
    ]

    semantic_similarity_selector = SemanticSimilarityExampleSelector.from_examples(
        # The list of examples available to select from.
        examples,
        # The embedding class used to produce embeddings which are used to measure semantic similarity.
        MyEmbeddings(),
        # The VectorStore class that is used to store the embeddings and do a similarity search over.
        Chroma,
        # The number of examples to produce.
        k=1,
        # Embed only the description, so the category label does not leak into the vectors.
        input_keys=["input"],
    )

    return semantic_similarity_selector
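With `k=1`, the selector returns the single stored example closest to the query, and its `output` becomes the predicted category; conceptually this is a 1-nearest-neighbour classifier. The sketch below shows that idea with cosine similarity over made-up three-dimensional vectors. Chroma's actual distance metric and index may differ, so treat it as an illustration of the principle, not of Chroma's internals; `nearest_label` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_label(query_vec, examples):
    """Return the label of the stored example most similar to query_vec (k=1)."""
    best = max(examples, key=lambda ex: cosine_similarity(query_vec, ex["vector"]))
    return best["output"]


# Hypothetical "embeddings" for three labelled training examples.
store = [
    {"vector": [0.9, 0.1, 0.0], "output": "TRAVEL"},
    {"vector": [0.1, 0.9, 0.1], "output": "ENTERTAINMENT"},
    {"vector": [0.0, 0.1, 0.9], "output": "CRIME"},
]
prediction = nearest_label([0.2, 0.8, 0.3], store)  # closest to the ENTERTAINMENT vector
```

Because only one neighbour votes, a single close but mislabelled or ambiguous training example decides the prediction, which is one reason explanations of mismatches are useful.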

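We call the function above to generate embeddings for the news articles. Note that this step can be slow, because every training example is sent to the embedding endpoint as a separate request; it can be parallelized to run faster.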
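One way to parallelize it, assuming the embedding function is safe to call concurrently and the endpoint tolerates parallel requests, is a thread pool: the work is I/O-bound (mostly waiting on HTTP responses), so threads help despite Python's GIL. In the sketch, `fake_embed` is a hypothetical stand-in for a per-text call such as `MyEmbeddings().embed_query`.

```python
from concurrent.futures import ThreadPoolExecutor


def embed_all(texts, embed_one, max_workers=8):
    """Embed texts concurrently; Executor.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(embed_one, texts))


def fake_embed(text):
    """Hypothetical stand-in for an HTTP embedding call: a tiny deterministic vector."""
    return [float(len(text)), float(text.count(" "))]


texts = ["Shakespeare under the night sky", "Armed robbery in France", "Casino mogul"]
vectors = embed_all(texts, fake_embed, max_workers=4)
```

Since results come back in input order, each vector stays aligned with its text and label. If the endpoint is a single local model server, a modest `max_workers` is sensible, as it may process requests one at a time anyway.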

semantic_similarity_selector = create_semantic_similarity_selector(train_df)

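The Chroma vector data store holds the vector representations of the news articles together with their labels. The embeddings in the store are then used to run semantic similarity searches for the articles in the test dataset and to check the accuracy of the approach.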
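Checking accuracy comes down to comparing each predicted category with the actual one. A minimal sketch, assuming the predictions and true labels are available as two parallel lists; the sample values are hypothetical and simply echo the three cases in the results. Counting (actual, predicted) pairs also shows which categories get confused with each other.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def confusion_counts(predicted, actual):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


predicted = ["TRAVEL", "CRIME", "MEDIA"]
actual = ["ENTERTAINMENT", "WORLD NEWS", "MEDIA"]
acc = accuracy(predicted, actual)
mixups = confusion_counts(predicted, actual)
```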

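Next, we call the DeepSeek REST endpoint, passing it the answer returned by the semantic similarity selector together with the actual answer from the test dataset. These are packaged into a context that contains the information the DeepSeek model needs for its reasoning.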

import json

import requests


def explain_model_result(text, model_answer, actual_answer):
    # REST end point for deepseek model.
    url = ""

    # Request headers
    headers = {
        "Content-Type": "application/json"
    }

    promptJson = {
        "question": 'Using the text, can you explain why the model answer and actual answer match or do not match ?',
        "model_answer": model_answer,
        "actual_answer": actual_answer,
        "context": text,
    }
    prompt = json.dumps(promptJson)

    # Request data (replace with your prompt)
    data = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "stream": True
    }
    captured_explanation = ""
    with requests.post(url, headers=headers, json=data, stream=True) as response:
        if response.status_code != 200:
            print(f"Request failed with status code {response.status_code}")
            return captured_explanation
        # The stream is expected to be server-sent events: lines of the form
        # "data: {json}", ending with "data: [DONE]".
        for raw_line in response.iter_lines():
            line = raw_line.decode("utf-8")
            if not line.startswith("data:"):
                continue  # skip blank keep-alive lines
            event = line[len("data:"):].strip()
            if event == "[DONE]":
                break
            choices = json.loads(event).get("choices") or []
            if choices:
                # Some chunks (e.g. the first) carry only the role, no text
                captured_explanation += choices[0].get("delta", {}).get("content") or ""

    return captured_explanation

The following code iterates over the test dataset and obtains an explanation from the DeepSeek model for each article.

rows = []
for _, row in test_df.iterrows():
    example = {"input": row["short_description"]}
    # k=1, so the selector returns the single nearest training example
    model_result_category = semantic_similarity_selector.select_examples(example)
    model_answer = model_result_category[0]["output"]
    actual_answer = row["category"]
    rows.append({
        "input": example["input"],
        "model_answer": model_answer,
        "actual_answer": actual_answer,
        "explanation": explain_model_result(example["input"], model_answer, actual_answer),
    })

results_df = pd.DataFrame(rows, columns=["input", "model_answer", "actual_answer", "explanation"])

Results

Below are the results for a few scenarios from the test dataset.

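The first example is a case where the answer obtained through semantic similarity does not match the actual answer in the test dataset. This insight lets us judge whether the model's prediction was reasonable. The text inside the think tags shows the DeepSeek model's initial thought process as it works through the problem statement, before it composes the answer for the end user.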
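Because the reasoning arrives wrapped in `<think>` tags ahead of the final answer, an application might want to show the two separately. A minimal sketch, assuming a single `<think>...</think>` block per response as in the examples that follow; `split_reasoning` is a hypothetical helper, not part of any DeepSeek SDK.

```python
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response):
    """Split a response into (reasoning, answer) using <think>...</think> tags.

    If no think block is present, reasoning is empty and the whole text is the answer.
    """
    match = THINK_BLOCK.search(response)
    if not match:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = (response[:match.start()] + response[match.end():]).strip()
    return reasoning, answer


raw = "<think>\nThe text is about an armed robbery.\n</think>\n\nCRIME fits the event."
reasoning, answer = split_reasoning(raw)
```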

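In the first example, DeepSeek explains why the model got the answer wrong: the model considered only the TRAVEL aspect of the sentence and ignored the ENTERTAINMENT aspect. This suggests the embedding model needs more ENTERTAINMENT examples to train on.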

<think>

Okay, so I'm trying to figure out why the model answer is "TRAVEL" and the actual answer is "ENTERTAINMENT." The context given is about going to Central Park and the Delacorte Theater to see Shakespeare under the night sky, with rain expected.

First, I need to understand what each term refers to. TRAVEL relates to visiting places, moving around, or tourism in general. ENTERTAINMENT is more about enjoyable activities, like theater, concerts, movies, etc., that provide pleasure.

Looking at the context: the person went to Central Park, which is a place they traveled to. Then they went to the Delacorte Theater for a Shakespeare performance. So, the main activities are both TRAVEL and ENTERTAINMENT because they're going out, seeing a play.

But why would there be a mismatch between model and actual answers? Maybe the model didn't consider that travel can also include attending theater or other performances as part of the entertainment aspect. So the actual answer is broader, including both aspects, while the model focuses only on TRAVEL.

Alternatively, perhaps the context doesn't emphasize one over the other. The key points are moving to a place (TRAVEL) and attending an event for enjoyment (ENTERTAINMENT). Depending on the focus, either could be considered the main answer, but typically, when someone goes somewhere to attend an event, both travel and entertainment elements come into play.

So, the model answer might have only picked TRAVEL, not considering ENTERTAINMENT. The actual answer includes both aspects, hence the mismatch.

</think>

The discrepancy between the model answer "TRAVEL" and the actual answer "ENTERTAINMENT" arises from the context provided. While travel is indeed part of the narrative—visiting Central Park—the primary focus is on attending a theater performance, which falls under entertainment. The actual answer captures both elements: moving to a place (travel) and engaging in an enjoyable event (entertainment). Thus, the model may have overlooked the entertainment aspect, considering only travel, whereas the actual response encompasses both activities.

**Answer:**

The model answer "TRAVEL" focuses on visiting places, while the actual answer "ENTERTAINMENT" includes attending a theater performance. The context highlights both elements, leading to the mismatch in answers.

Next, consider a scenario where the model's answer does not match the actual answer, yet the model's prediction is still reasonable.

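In the second example, an armed robbery in France is a crime (CRIME), so the model is not necessarily wrong to label the article CRIME. DeepSeek's reasoning offers a sound justification for this result.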

<think>

Okay, so I'm trying to figure out why the model answer is "CRIME" and the actual answer is "WORLD NEWS" given the context: "French police recovered the merchandise stolen in an armed robbery on Wednesday."

First, I need to understand what each part of this question is asking. The user wants me to explain why these two answers match or don't match based on the provided text. So, maybe I should start by analyzing the context and seeing how it relates to both CRIME and WORLD NEWS.

Looking at the context: It's about French police recovering stolen merchandise from an armed robbery. An armed robbery is definitely a crime, so that makes sense why "CRIME" is an answer. The model answer being CRIME aligns with the event described because it's a criminal act—a robbery.

Now, why is WORLD NEWS also an answer? Well, if I think about news in general, events like this are newsworthy. So perhaps the actual answer is considering that this incident is part of world news because it involves police action and theft, which can be significant enough to be reported internationally.

But wait, does the context specify whether it's considered world news or not? It just mentions French police recovering stolen goods in an armed robbery on Wednesday. Without more details about the scale or international implications, I'm not sure if it's explicitly world news. Maybe WORLD NEWS is a broader category that includes such events as part of their reporting.

So, why do they match and why don't they? They both are relevant because CRIME is directly related to the event, while WORLD NEWS is considering the newsworthiness of the event. But perhaps the model answer is more focused on the specific event being a crime, whereas WORLD NEWS is taking a broader view of news events.

I'm also thinking about how each answer approaches the context differently. The model answer is zeroing in on the type of event (crime), while the actual answer is considering it as a newsworthy event that might be reported in world news sections or channels.

So, to sum up, both answers are relevant but approach the context from different angles—Crime specifically relates to the event's nature, whereas WORLD NEWS broadens it into the realm of news reporting.

</think>

The model answer "CRIME" and the actual answer "WORLD NEWS" both address the provided context, but they do so from different perspectives.

**CRIME**: The context explicitly describes an armed robbery, which is a criminal act. Therefore, "CRIME" directly relates to the nature of the event.

**WORLD NEWS**: While the event is indeed newsworthy, the term "WORLD NEWS" broadens the scope beyond the specific details, considering it as part of international or general news reporting.

In essence, both answers are relevant but focus on different aspects: "CRIME" highlights the event's nature as a criminal act, while "WORLD NEWS" recognizes its potential newsworthiness.

Finally, consider the reasoning when the model's answer matches the actual answer.

The reasoning behind a matching result increases trust and builds confidence in the model's predictions.

<think>

Okay, so I'm trying to understand why the model answer and the actual answer both say "MEDIA" when explaining the context. The question is asking me to explain whether they match or not based on the given text.

First, let me look at the context provided: "The newsroom is awash with fears about the casino mogul's intentions after his refusal to admit ownership." This sentence mentions a casino mogul and their refusal to own up to something. That refusal is causing fear in the newsroom regarding their intentions.

Now, looking at both the model answer and the actual answer, they both are "MEDIA". I think this might be an abbreviation or shorthand for media-related reasons. The context talks about the newsroom, which is a place associated with media organizations or journalists. So, when there's fear about someone's intentions, especially related to ownership, it makes sense that the media would be concerned and involved.

Wait, but why specifically does it say "MEDIA"? Maybe it's short for something else or maybe it's just a coincidence. However, given that the context is in a newsroom setting, it's likely referring to media-related issues.

Let me break it down: The casino mogul refusing ownership could imply some kind of hidden agenda or control. Since this is in a newsroom, which deals with reporting and information, the refusal might be seen as trying to hide something. This fear could lead to media coverage or reporting on it, hence the mention of MEDIA.

So, putting it together, both answers refer to the involvement or concern of media outlets regarding the casino mogul's actions. Therefore, they match because the context clearly ties into a media-related situation.

</think>

The model answer and actual answer both state "MEDIA" because the context provided refers to a newsroom setting, where concerns about a casino mogul's intentions are likely related to media involvement or reporting. The refusal to admit ownership sparks fear, suggesting possible media coverage or concerns, thus aligning both answers with the media-related context.

**Answer:** Both answers match because the context involves a newsroom, indicating media-related concerns over a casino mogul's intentions.

Conclusion

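Reasoning gives users valuable insight into what a model can do, highlighting both its strengths and its weaknesses. It deepens their understanding of the system and points to ways of improving it.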

Imagine a model that only returns its decisions without any explanation: users would never understand why it made those decisions and could struggle to trust it.

Original title: Text Clustering With Deepseek Reasoning, by Kalpan Dharamshi

© Copyright belongs to the author. Please credit the source when reposting; otherwise legal liability will be pursued.