Zzzyt语录判断器

打完NOIP后要退役了,闲着无聊写了Zzzyt语言判断器。

第一步:获取数据

从Skype上下载聊天历史数据,并进行处理,可以参考这篇文章HHS Blog

这是我的源码供参考修改:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
import json

print("text,label")

d=dict()

def deal(s):
if s==None:
return None

moved=""
s=s.replace(",",",").replace("&apos;","'").replace("&lt;","<").replace("&gt;",">").replace("&quot;","\"").replace("\n","\\n").replace("\r","").replace("&amp;","&")
layer=0
for i in range(0,min(50,len(s))):
if s[i]=='<':
layer+=1
elif s[i]=='>':
layer-=1
elif layer==0:
moved+=s[i]
return moved
with open("messages.json","r") as f:
data=json.loads("\n".join(f.readlines()))

for conv in data['conversations']:
for msg in conv['MessageList']:
usr=msg['displayName']

if usr==None:
continue
txt=deal(msg['content'])
# print(msg['content'],'->',txt)
if usr=="Zzzyt":
d[txt]="Zzzyt"
elif txt not in d:
d[txt]="Other"
# if txt!=None and txt not in d:
# d[txt]=usr
# elif d[txt]!=usr and d[txt]!="Zzzyt:
# d[txt]="Common"

count=0
for i in d:
count+=1
if d[i]=="Zzzyt":
print(i,"Zzzyt",sep=",")
else:
print(i,"Not Zzzyt",sep=",")

第二步 导入数据到Create ML

将输出文件导入到Create ML中,进行简单训练。 (毕竟我也是ML弟弟,也懒得花时间真正去写tf什么的)

第三步 查看效果

啊啊啊

经过简单测试,本模型已经可以简单识别Zzzyt语言习惯,比如im,:?等常用短语表情。

下载

GDrive

Github