Writing Code with AI -- Testing GitHub Copilot
Posted by 有数可据
(Screenshot: the Copilot homepage.)
To use Copilot, you first need VS Code, into which you install the Copilot extension. Installing it is not enough, though: you also have to apply for a technical-preview account. A few days ago I finally received the email saying my application had been approved, so today, with a free day at home, I opened VS Code to try it out.
Scraping an image
The first experiment: scrape the CSDN logo.
I created a new file and started writing a comment. By the time I had typed "retri", Copilot already knew what I wanted to say and offered to complete the whole sentence. Impressive!
I pressed Tab, and the sentence was completed.
Next I typed "import", and Copilot was already suggesting that I import urllib. Impressive!
After just the three letters of "def", a suggestion for the entire function appeared. Impressive!
Then I typed "if", and Copilot filled that in too.
Full of anticipation, I ran the program. The result: disaster:
AttributeError: module 'urllib' has no attribute 'urlopen'
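(The suggestion was Python 2 style; in Python 3, urlopen lives in urllib.request. Had I wanted to keep urllib, the minimal fix would have been something like this:)

# Python 3 moved urlopen into the urllib.request submodule
from urllib.request import urlopen

data = urlopen('https://www.csdn.net').read()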
But I'm more familiar with the requests module anyway, so I changed the first line to import requests and regenerated the code. The result:
# find logo in csdn.net
# retrieve the logo and save it in the same directory
import requests
from bs4 import BeautifulSoup
import os

def get_logo(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    logo = soup.find('link', rel='shortcut icon')
    if logo:
        logo_url = logo['href']
        r = requests.get(logo_url)
        if r.status_code == 200:
            with open(os.path.basename(logo_url), 'wb') as f:
                f.write(r.content)
            print('logo saved')
        else:
            print('logo not found')
    else:
        print('logo not found')

if __name__ == "__main__":
    get_logo('http://www.csdn.net')
It ran, but what it downloaded turned out to be the favicon, not the logo. 😦 In hindsight that is exactly what the code asks for: a <link rel="shortcut icon"> tag is how a page declares its favicon.
Oh well. If Copilot can't find the logo, I'll find it myself.
So I changed the comment to:
# download and save https://img-home.csdnimg.cn/images/20201124032511.png
# to the same folder
The generated code:
# download and save https://img-home.csdnimg.cn/images/20201124032511.png
# to the same folder
import requests
from bs4 import BeautifulSoup
import os

def download():
    url = 'https://img-home.csdnimg.cn/images/20201124032511.png'
    root = './img'
    path = root + url.split('/')[-1]
    try:
        if not os.path.exists(root):
            os.mkdir(root)
        if not os.path.exists(path):
            r = requests.get(url)
            with open(path, 'wb') as f:
                f.write(r.content)
                f.close()  # redundant: the with block already closes the file
            print('file saved successfully')
        else:
            print('file already exists')
    except:
        print('download failed')

if __name__ == '__main__':
    download()
This time the CSDN logo finally downloaded successfully.
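One caveat I noticed afterwards: path = root + url.split('/')[-1] concatenates without a separator, so the file actually lands in the current directory under the name img20201124032511.png, while the ./img directory the code creates stays empty. The idiomatic fix would be os.path.join:

path = os.path.join(root, url.split('/')[-1])  # './img/20201124032511.png'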
I figured this might be because CSDN uses anti-scraping techniques (the logo image isn't even named "logo"), which is why it couldn't be found. That gave me an idea: why not go scrape Xiaomi's famous 2-million-yuan logo instead?
After I supplied the comment, the generated code was:
# find logo from https://www.mi.com/index.html
# and download it to local
import os
import requests
from bs4 import BeautifulSoup

def download():
    url = 'https://www.mi.com/index.html'
    r = requests.get(url)
    r.raise_for_status()
    soup = BeautifulSoup(r.text, 'html.parser')
    img = soup.find('img', class_='logo')
    print(img)  # None if no <img class="logo"> exists; the next line then raises TypeError
    img_url = img['src']
    img_name = img['alt']
    img_path = os.path.join('.', img_name)
    if not os.path.exists(img_path):
        with open(img_path, 'wb') as f:
            f.write(requests.get(img_url).content)
        print('Downloading %s' % img_name)
    else:
        print('%s already exists' % img_name)

if __name__ == '__main__':
    download()
When I ran it, no logo was found. I opened Xiaomi's site myself and stared at it for ages without locating the logo's address either; most likely the logo is drawn as a CSS background image or inline SVG rather than an <img> tag, so soup.find('img', class_='logo') has nothing to match. No wonder Copilot couldn't find it. Apparently a 2-million-yuan logo is not something ordinary folks get to hotlink at will!
Pandas filtering
For the second example, I planned to write a filtering program: read two CSV files, subtract the second from the first, and save the result as a third file.
The generated code:
# read two csv files using pandas,
# subtract the data in the second file from the first file,
# and write the result to a new csv file.
import pandas as pd
import csv
import sys

def subtract():
    # read the two csv files
    df1 = pd.read_csv(sys.argv[1])
    df2 = pd.read_csv(sys.argv[2])
    # subtract the second file from the first file
    df3 = df1 - df2
    # write the result to a new csv file
    df3.to_csv(sys.argv[3], index=False)

if __name__ == '__main__':
    subtract()
It failed at runtime:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
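The error is fair: df1 - df2 is element-wise arithmetic subtraction, aligned on index and columns, which works for numeric frames but not for string columns. A minimal sketch of what that line actually computes, with made-up numeric data:

import pandas as pd

df1 = pd.DataFrame({'a': [3, 5, 7]})
df2 = pd.DataFrame({'a': [1, 2, 3]})
print(df1 - df2)  # element-wise result: 2, 3, 4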
So I revised the comment and regenerated:
# read two csv files, a.csv and b.csv
# remove every row in a.csv that appears in b.csv
# write the result to a new csv file.
import os
import csv
import pandas as pd

def main():
    # get the current working directory
    cwd = os.getcwd()
    # get the path to the files
    a_path = os.path.join(cwd, "a.csv")
    b_path = os.path.join(cwd, "b.csv")
    # read the files into pandas dataframes
    a_df = pd.read_csv(a_path)
    b_df = pd.read_csv(b_path)
    # remove rows from a_df that appear in b_df
    a_df = a_df[~a_df.index.isin(b_df.index)]
    # write the result to a new csv file
    a_df.to_csv("a_minus_b.csv", index=False)

if __name__ == "__main__":
    main()
This time it at least used index-based filtering, but that still isn't what I wanted: it compares row numbers, not row contents. Fine, I'll go look on StackOverflow myself.
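For the record, the recipe you would most likely find there for this row-level set difference is a left merge with indicator=True; a minimal sketch with made-up data:

import pandas as pd

a_df = pd.DataFrame({'id': [1, 2, 3], 'val': ['x', 'y', 'z']})
b_df = pd.DataFrame({'id': [2], 'val': ['y']})

# merge on all shared columns; the '_merge' column marks where each row was found
merged = a_df.merge(b_df, how='left', indicator=True)
result = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(result)  # keeps the rows (1, 'x') and (3, 'z')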
LeetCode test
I tested it on LeetCode problem 14; the result was as follows:
# 14. Longest Common Prefix
# Easy
# Write a function to find the longest common prefix string amongst an array of strings.
# If there is no common prefix, return an empty string "".
# Example 1:
# Input: strs = ["flower","flow","flight"]
# Output: "fl"
# Example 2:
# Input: strs = ["dog","racecar","car"]
# Output: ""
# Explanation: There is no common prefix among the input strings.
# Constraints:
# 1 <= strs.length <= 200
# 0 <= strs[i].length <= 200
# strs[i] consists of only lower-case English letters.
class Solution(object):
    def longestCommonPrefix(self, strs):
        """
        :type strs: List[str]
        :rtype: str
        """
        # everything above this line I copy-pasted from
        # https://leetcode.com/problems/longest-common-prefix/
        # everything below was generated by Copilot
        if not strs:
            return ""
        if len(strs) == 1:
            return strs[0]
        min_len = min([len(s) for s in strs])
        for i in range(min_len):
            for j in range(len(strs)):
                if strs[j][i] != strs[0][i]:
                    return strs[0][:i]
        return strs[0][:min_len]

if __name__ == "__main__":
    a = Solution()
    print(a.longestCommonPrefix(["flower","flow","flight"]))
    print(a.longestCommonPrefix(["dog","racecar","car"]))
    print(a.longestCommonPrefix(["a","b","c"]))
    print(a.longestCommonPrefix(["a"]))
    print(a.longestCommonPrefix([""]))
    print(a.longestCommonPrefix([]))
The printed output (the other four calls return the empty string, which prints as a blank line):
fl
a
This looks impressive: it can even solve LeetCode problems. But let me venture a bold guess: Copilot's training data probably included LeetCode solutions, in which case solving LeetCode problems is not surprising at all.
Machine learning code
Same routine: I provided the comment, and Copilot provided the code:
# read iris datasets from sklearn.datasets
# predict the class of the samples in the test set
# return the predicted classes
# print accuracy, f1 score, recall, precision
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score

def predict(X_train, y_train, X_test, y_test, k):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    print("Accuracy: {:.2f}".format(accuracy_score(y_test, y_pred)))
    print("F1 score: {:.2f}".format(f1_score(y_test, y_pred)))
    print("Recall: {:.2f}".format(recall_score(y_test, y_pred)))
    print("Precision: {:.2f}".format(precision_score(y_test, y_pred)))

def main():
    iris = load_iris()
    X = iris.data
    y = iris.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    predict(X_train, y_train, X_test, y_test, 5)

if __name__ == "__main__":
    main()
Another error:
ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].
The cause: f1_score, recall_score, and precision_score default to average='binary', which only makes sense for two-class targets, while iris has three classes. I modified the predict function accordingly:
def predict(X_train, y_train, X_test, y_test, k):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    print("Accuracy: {:.2f}".format(accuracy_score(y_test, y_pred)))
    print("F1 score: {:.2f}".format(f1_score(y_test, y_pred, average='macro')))
    print("Recall: {:.2f}".format(recall_score(y_test, y_pred, average='macro')))
    print("Precision: {:.2f}".format(precision_score(y_test, y_pred, average='macro')))
Tested again, and it passed.
So yes, there were errors, but overall I'm happy: it saves me a lot of typing, and the few mistakes only need small fixes.
Summary
To get good results out of Copilot, you still have to give it very explicit instructions. To download an image, for example, you have to tell it the image's URL. Often, after fussing with it for ages, you would have done better just searching for the code online. It can solve common problems; uncommon ones, not necessarily. Still, this is only the beginning, and I believe Copilot will keep getting more accurate. When that day comes, programmers who are good at using Copilot won't have to work overtime, and those who don't use it will be working overtime every day.
Even with Copilot, you still need to understand the code yourself. Take the pandas filtering program above: the code is actually wrong, yet it happened to pass my test. If you don't understand why it behaves the way it does and push it straight to production, the consequences are easy to imagine.
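To make that concrete, here is a tiny demonstration with made-up data of why the generated filter is wrong: a_df.index.isin(b_df.index) compares the default RangeIndex, that is, row numbers, so it simply drops the first len(b_df) rows of a_df no matter what they contain:

import pandas as pd

a = pd.DataFrame({'x': [10, 20, 30]})
b = pd.DataFrame({'x': [99]})        # 99 appears nowhere in a
print(a[~a.index.isin(b.index)])     # yet row 0 (x=10) is dropped anyway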
One more observation from testing: its speed is very unstable, and it often sits there unresponsive for a long time, possibly because of the network. I hope Copilot will eventually run servers in China so that the latency becomes predictable.