
Data Analysis: Papers

  • Mainly some statistics on published papers (work in progress)


TIFS2024

[Figure: word cloud of the 11 Android security papers from TIFS 2024]

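  • Compared with other venues, TIFS accepts papers across a fairly even spread of topics
  • So although I only selected the 11 papers accepted in 2024 that are strongly related to Android security, the word cloud above still reveals some trends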

IEEE S&P

[Figure: word cloud of 96 Android-related IEEE S&P papers, 2012-2024]

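  • This covers 96 papers strongly related to Android published at IEEE S&P from 2012 to 2024, and it is indeed privacy-heavy
  • The word cloud suggests that over the past decade a large share of mobile security work, on both the attack and the defense side, has revolved around permissions: When should a permission be granted? How many permissions should be granted, and to what? When should a permission be revoked?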
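A word cloud only hints at this trend. A minimal sketch for putting a number on it, assuming a plain case-insensitive regex match is good enough, counts the share of papers whose title plus abstract mention permissions. `share_mentioning` is a hypothetical helper, and the three toy documents are invented for illustration, not real S&P papers; for real numbers, `texts` would come from the same `S&P.xlsx` export that the appendix script reads.

```python
import re


def share_mentioning(texts, pattern):
    """Return the fraction of documents whose text matches `pattern` (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    texts = list(texts)
    if not texts:
        return 0.0  # avoid dividing by zero on an empty corpus
    hits = sum(1 for text in texts if regex.search(text))
    return hits / len(texts)


# Hypothetical toy corpus (title + abstract per paper), NOT real S&P data
toy_texts = [
    "Toy paper 1: revoking runtime permissions on Android",
    "Toy paper 2: WebView bugs in hybrid apps",
    "Toy paper 3: install-time permissions revisited",
]

share = share_mentioning(toy_texts, r"\bpermission")
print(f"{share:.0%} of papers mention permissions")  # prints "67% of papers mention permissions"
```

The `\b` anchor lets `permission`, `permissions` and `Permission` all match. Keyword matching also counts papers that mention permissions only in passing, so the result is a rough indicator rather than a topic classification.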

Appendix

  • Word cloud generation script
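import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# 1. Read the Excel file exported from IEEE
df = pd.read_excel("S&P.xlsx")

# 2. Combine the title column ("论文名称") with the abstract column;
#    fillna("") keeps one missing cell from turning the whole row into NaN
texts = df["论文名称"].fillna("") + " " + df["Abstract"].fillna("")

# 3. Custom stop words: generic terms that would otherwise dominate the cloud
custom_stop_words = {
    "app", "apps", "application", "applications", "attack", "attacks", "mobile",
    "devices", "android", "research", "paper", "google", "software", "analysis",
    "study", "security", "used", "use", "using", "new", "accuracy",
}

# 4. Compute TF-IDF weights. max_features keeps the 50 most frequent terms;
#    removing the custom stop words inside the vectorizer means none of those
#    50 slots is wasted on a generic term
vectorizer = TfidfVectorizer(
    stop_words=list(ENGLISH_STOP_WORDS | custom_stop_words),
    max_features=50,
)
tfidf_matrix = vectorizer.fit_transform(texts)

# 5. Sum each term's TF-IDF weight over all papers so word size reflects weight
#    (joining bare keywords into one string would give every word the same size)
weights = np.asarray(tfidf_matrix.sum(axis=0)).ravel()
frequencies = dict(zip(vectorizer.get_feature_names_out(), weights))

# 6. Generate the word cloud from the weights
wordcloud = WordCloud(width=800, height=400, background_color="white").generate_from_frequencies(frequencies)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()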