How to Get AI to Do a Data Analyst's Work: Task 1


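Background:

OpenAI has opened up its API, and a few days ago Peking University released a tool called ChatExcel. The two events have nothing to do with each other, but engineers are like that: they always find something to keep themselves busy. I bragged in a tech chat group that if OpenAI opened its API, I could build a ChatExcel too, and an even better one than theirs. It would need to: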

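1.Accept requests in natural language

2.Understand the user's request precisely

3.Return accurate analysis results

4.Present the results as a visual report

5.Ideally, present them as a PPT as well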

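Then came the usual road of filling the hole I had dug for myself. To get a proof of concept (POC) out quickly at minimal cost, I used the OpenAI API together with the ChatGPT web interface. In a real product all of this would be wrapped up and done through the OpenAI API, and the user would see only a box for typing requests and a place to upload a CSV table. Here I am only probing the upper and lower bounds of the product, so please allow me this harmless bit of cheating.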

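The approach:

1.After the user uploads a table, parse the header to extract metadata for use in the later analysis of the user's request

2.Start from a structured description as input and have the OpenAI API generate the code for the automated data analysis (in a product, OpenAI could turn the user's loosely worded request into this structured input)

3.Extract the generated Python code and save it as a .py file

4.Run the Python script through Python's os module and write the visualization out as HTML so it can be opened with a click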
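
Steps 1 and 2 could be sketched roughly as follows. This is only an illustration under my own assumptions, not the code used in the experiment: the function names `read_header_metadata`, `guess_type` and `build_prompt`, the two-way type guess, and the prompt wording are all hypothetical. It uses only the standard library and stops before the actual API call.

```python
import csv
import io


def read_header_metadata(csv_text, sample_rows=5):
    """Return each column name with a rough type guess (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Look at only the first few data rows to keep the guess cheap.
    samples = [row for _, row in zip(range(sample_rows), reader)]
    meta = []
    for idx, name in enumerate(header):
        values = [row[idx] for row in samples if idx < len(row)]
        meta.append({"name": name, "type": guess_type(values)})
    return meta


def guess_type(values):
    """Guess 'number' if every sampled value parses as a float, else 'text'."""
    if not values:
        return "unknown"
    try:
        for value in values:
            float(value)
    except ValueError:
        return "text"
    return "number"


def build_prompt(meta, task, csv_name):
    """Turn the metadata and the user's task into a structured prompt string."""
    columns = ", ".join(f"{m['name']} ({m['type']})" for m in meta)
    return (
        f"You are given a CSV file named {csv_name} with columns: {columns}.\n"
        f"Task: {task}\n"
        "Reply with a single runnable Python script and nothing else."
    )
```

The returned string is what would be sent to the model; keeping the column names and guessed types in the prompt gives the model the exact identifiers to use in the generated script.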

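Here is how it worked in practice:

Calling the API with a natural-language description produces a reply that contains code. The code comes back all jumbled together, so it has to be parsed out

For the code parsing I cut a corner and had ChatGPT extract the result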
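
In a product this parsing step could probably be done locally instead of by ChatGPT. The sketch below assumes the model wraps its code in triple-backtick fences, which is common but not guaranteed, and falls back to the whole reply otherwise. The name `extract_python_code` is hypothetical.

```python
import re

# Three backticks, built this way so the marker never appears literally here.
FENCE = "`" * 3
FENCE_RE = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_python_code(response_text):
    """Pull fenced code out of a model reply; fall back to the raw text."""
    blocks = FENCE_RE.findall(response_text)
    if blocks:
        # Join several fenced blocks into one script, separated by blank lines.
        return "\n\n".join(block.strip() for block in blocks) + "\n"
    return response_text.strip() + "\n"
```

Replies that mix fences for other languages are not handled here; a real implementation would need to check the language tag of each block.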

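The code extracted from the OpenAI output is shown below. It runs into conflicts between package versions; to validate the idea quickly I gave up on resolving them and had ChatGPT generate code for the task afresh.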

    import pandas as pd
    import jieba
    from pyecharts import WordCloud

    # read in the data from the CSV file
    data = pd.read_csv('product_reviews.csv')

    # split product reviews into individual words
    reviews = data['Product Review']
    word_freq = {}
    for review in reviews:
        words = jieba.cut(review)
        for word in words:
            if word in word_freq:
                word_freq[word] += 1
            else:
                word_freq[word] = 1

    # sort the words by frequency
    sorted_word_freq = sorted(word_freq.items(), key=lambda x: x[1], reverse=True)

    # print the top 10 most frequent words
    print('Top 10 most frequent words:')
    for word, freq in sorted_word_freq[:10]:
        print(f'{word}: {freq}')

    # create a word cloud of the top 50 most frequent words
    wordcloud = WordCloud(width=800, height=620)
    wordcloud.add("", sorted_word_freq[:50], word_size_range=[20, 100])
    wordcloud.render('wordcloud.html')

ChatGPT parsing the task and generating code

The code above has a few small bugs

So I went ahead and tested generating the code with ChatGPT directly

    import pandas as pd
    import jieba
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt
    from collections import Counter

    # read the data from the table
    df = pd.read_csv('product_reviews.csv')

    # create a list of stop words
    stop_words = ['的', '了', '是', '我', '你', '他', '她', '我们', '你们', '他们']

    # tokenize the product reviews and count the frequency of each word
    words_list = []
    for review in df['Product Review']:
        words = jieba.lcut(review)
        words_list.extend(words)
    words_freq = Counter(words_list)

    # remove stop words from the word frequency dictionary
    for stop_word in stop_words:
        words_freq.pop(stop_word, None)

    # sort the word frequency dictionary by descending order of frequency
    sorted_words_freq = sorted(words_freq.items(), key=lambda x: x[1], reverse=True)

    # print the top 10 most frequent words
    print('Top 10 most frequent words in product reviews:')
    for word, freq in sorted_words_freq[:10]:
        print(f'{word}: {freq}')

    # create a word cloud using the top 50 most frequent words;
    # generate_from_frequencies expects a dict, so the (word, count) pairs are converted
    top_words = dict(words_freq.most_common(50))
    wordcloud = WordCloud(background_color='white', width=800, height=400).generate_from_frequencies(top_words)

    # plot the word cloud
    plt.figure(figsize=(12, 6))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    plt.show()

    # save the word cloud; to_file writes an image, so a .png name is used rather than .html
    wordcloud.to_file('wordcloud.png')

It even listed which packages need to be installed

I had ChatGPT generate some data for testing

    import csv
    import random

    # list of product names
    product_names = ['Product A', 'Product B', 'Product C', 'Product D', 'Product E']

    # generate the product review data
    product_reviews = []
    for i in range(1000):
        # pick a product name at random
        product_name = random.choice(product_names)
        # generate a review for that product
        product_review = f"This is a great {product_name}!"
        # randomly generate the exposure count and the click count
        num_exposures = random.randint(1, 100)
        click_count = random.randint(0, num_exposures)
        # append the row to the list of product reviews
        product_reviews.append([product_name, product_review, num_exposures, click_count])

    # write the product review data to a CSV file
    with open('product_reviews.csv', mode='w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        # write the header row
        writer.writerow(['Product Name', 'Product Review', 'Number of Exposures', 'Click Count'])
        # write the data rows
        writer.writerows(product_reviews)

Putting the pieces together and testing the result
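
The save-and-run part of the pipeline (steps 3 and 4 of the approach) might look like the sketch below. It is an assumption-laden illustration rather than the code from this experiment: it swaps the os module for `subprocess.run`, which captures output and supports a timeout, and the names `save_and_run` and `generated_analysis.py` are hypothetical. Generated code is untrusted, so a real product would want to run it in an isolated environment.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def save_and_run(code, workdir, script_name="generated_analysis.py", timeout=60):
    """Write generated code to a .py file in workdir and execute it there."""
    # Fail early on syntax errors instead of launching a broken script.
    compile(code, script_name, "exec")
    script_path = Path(workdir) / script_name
    script_path.write_text(code, encoding="utf-8")
    # Run with the current interpreter; output files land in workdir.
    return subprocess.run(
        [sys.executable, script_name],
        cwd=workdir,
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = save_and_run("print('analysis finished')", tmp)
        print(result.returncode, result.stdout.strip())
```

The returned object carries `returncode`, `stdout` and `stderr`, so a failed run can be fed back to the model as the next prompt instead of being fixed by hand.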

The generated data

The word cloud produced by the code

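Summary:

1.Overall, ChatGPT is already very capable and handles almost every step of the task well; the only problems were minor ones with package versions and data conversion (list vs. dictionary)

2.Once a concrete task is broken down and the steps are chained together properly, ChatGPT can basically handle a real production job end to end

3.Building an AI application on a single system, ChatGPT or the OpenAI API plus a simple chained business workflow, gives impressive results

4.Product interaction will become more natural and simpler in the future

5.As for the minor problems in point 1, I think fine-tuning on domain-specific code could resolve them completely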
