Writing Code with AI -- A GitHub Copilot Test
Posted by 织网者Eric
(Screenshot: the Copilot official website)
To use Copilot, you first need to install VS Code and the Copilot extension. Even then you can't use it right away; you also have to apply for a technical-preview account. A few days ago I finally received the email approving my preview access, so today, while at home, I hurried to open VS Code and give it a try.
Scraping Images
My first experiment: scrape the CSDN logo.
I created a new file and started writing a comment. By the time I had typed "retri", Copilot already knew what I wanted to say and completed the whole sentence. Impressive!
I pressed Tab and the line was completed.
Next I typed import, and Copilot immediately suggested importing urllib. Impressive!
After just the three letters of def, it suggested the entire function. Impressive!
Then I typed if:
Full of anticipation, I ran the program. The result: tragedy.
AttributeError: module 'urllib' has no attribute 'urlopen'
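The error is easy to explain: in Python 3, urlopen lives in the urllib.request submodule, but Copilot had generated Python 2-style urllib.urlopen. A minimal Python 3 equivalent looks like this:

from urllib.request import urlopen

# fetch the page the Python 3 way
html = urlopen('http://www.csdn.net').read()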
I'm more familiar with the requests module anyway, so I changed the first line of code to import requests and regenerated. The result:
# find logo in csdn.net
# retrieve the logo and save it in the same directory
import requests
from bs4 import BeautifulSoup
import os

def get_logo(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    logo = soup.find('link', rel='shortcut icon')
    if logo:
        logo_url = logo['href']
        r = requests.get(logo_url)
        if r.status_code == 200:
            with open(os.path.basename(logo_url), 'wb') as f:
                f.write(r.content)
            print('logo saved')
        else:
            print('logo not found')
    else:
        print('logo not found')

if __name__ == "__main__":
    get_logo('http://www.csdn.net')
After running it, I found that what it downloaded wasn't the logo but the favicon. 😦
Fine, fine. If Copilot can't find the logo, I'll find it myself.
So I changed the comment to:
# download and save https://img-home.csdnimg.cn/images/20201124032511.png
# to the same folder
The generated code:
# download and save https://img-home.csdnimg.cn/images/20201124032511.png
# to the same folder
import requests
from bs4 import BeautifulSoup
import os

def download():
    url = 'https://img-home.csdnimg.cn/images/20201124032511.png'
    root = './img'
    path = root + url.split('/')[-1]
    try:
        if not os.path.exists(root):
            os.mkdir(root)
        if not os.path.exists(path):
            r = requests.get(url)
            with open(path, 'wb') as f:
                f.write(r.content)
                f.close()
            print('File saved successfully')
        else:
            print('File already exists')
    except:
        print('Download failed')

if __name__ == '__main__':
    download()
This time the CSDN logo downloaded successfully.
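One quirk worth pointing out in the generated code: path = root + url.split('/')[-1] is missing a separator, so the file actually lands in the current directory as ./img20201124032511.png rather than inside ./img. The usual fix is os.path.join:

path = os.path.join(root, url.split('/')[-1])  # './img/20201124032511.png'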
My guess is that CSDN uses some anti-scraping measures (the logo image isn't actually named "logo"), which is why it couldn't be found. That gave me another idea: go scrape Xiaomi's famous 2-million-yuan logo.
Given the comment, the generated code was:
# find logo from https://www.mi.com/index.html
# and download it to local
import os
import requests
from bs4 import BeautifulSoup

def download():
    url = 'https://www.mi.com/index.html'
    r = requests.get(url)
    r.raise_for_status()
    soup = BeautifulSoup(r.text, 'html.parser')
    img = soup.find('img', class_='logo')
    print(img)
    img_url = img['src']
    img_name = img['alt']
    img_path = os.path.join('.', img_name)
    if not os.path.exists(img_path):
        with open(img_path, 'wb') as f:
            f.write(requests.get(img_url).content)
        print('Downloading %s' % img_name)
    else:
        print('%s already exists' % img_name)

if __name__ == '__main__':
    download()
The result: logo not found. I opened the Xiaomi homepage and stared at it for a long time without spotting the logo's URL myself, so no wonder Copilot couldn't find it either. Apparently a 2-million-yuan logo isn't something a broke coder like me can just walk off with!
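In hindsight, there is probably a mundane explanation: many sites render their logo as a CSS background image or an inline SVG, so there is no <img class="logo"> element to find at all. One fallback worth trying (just a sketch, and not guaranteed to work for mi.com) is the page's og:image metadata, which often points at a representative brand image:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.mi.com/index.html')
soup = BeautifulSoup(r.text, 'html.parser')

# og:image is a common meta tag that points at a site's preview/brand image
og = soup.find('meta', property='og:image')
print(og['content'] if og else 'no og:image tag on this page')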
Pandas Filtering
For the second example, I planned to write a filtering program: read two CSV files, subtract the second from the first, and save the result as a third CSV.
The generated code:
# read two csv files using pandas,
# subtract the data in the second file from the first file,
# and write the result to a new csv file.
import pandas as pd
import csv
import sys

def subtract():
    # read the two csv files
    df1 = pd.read_csv(sys.argv[1])
    df2 = pd.read_csv(sys.argv[2])
    # subtract the second file from the first file
    df3 = df1 - df2
    # write the result to a new csv file
    df3.to_csv(sys.argv[3], index=False)

if __name__ == '__main__':
    subtract()
It failed at runtime:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
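The error makes sense: the minus operator on DataFrames does element-wise arithmetic aligned on index and columns, and any text column read from a CSV comes in as strings, which can't be subtracted. A minimal reproduction with made-up data:

import pandas as pd

df1 = pd.DataFrame({'name': ['a', 'b'], 'value': [3, 5]})
df2 = pd.DataFrame({'name': ['a', 'b'], 'value': [1, 2]})

print(df1['value'] - df2['value'])  # numeric column: subtracts fine
print(df1 - df2)                    # TypeError: the 'name' column holds strings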
So I revised the comment and regenerated:
# read two csv files, a.csv and b.csv
# remove every row in a.csv that appears in b.csv
# write the result to a new csv file.
import os
import csv
import pandas as pd

def main():
    # get the current working directory
    cwd = os.getcwd()
    # get the path to the files
    a_path = os.path.join(cwd, "a.csv")
    b_path = os.path.join(cwd, "b.csv")
    # read the files into pandas dataframes
    a_df = pd.read_csv(a_path)
    b_df = pd.read_csv(b_path)
    # remove rows from a_df that appear in b_df
    a_df = a_df[~a_df.index.isin(b_df.index)]
    # write the result to a new csv file
    a_df.to_csv("a_minus_b.csv", index=False)

if __name__ == "__main__":
    main()
This time it used index-based filtering, which is not what I wanted at all. Forget it, I'll go look on StackOverflow myself.
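For the record, the row-level set difference I actually wanted is a well-known StackOverflow recipe: merge with indicator=True and keep the rows found only on the left. A sketch, assuming a.csv and b.csv have identical columns:

import pandas as pd

a_df = pd.read_csv('a.csv')
b_df = pd.read_csv('b.csv')

# tag each row of a_df with where it was found, then keep the 'left_only' rows
merged = a_df.merge(b_df.drop_duplicates(), how='left', indicator=True)
result = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
result.to_csv('a_minus_b.csv', index=False)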
LeetCode Test
I tested it on LeetCode problem 14. The result:
# 14. Longest Common Prefix
# Easy
# 4845
# 2366
# Add to List
# Share
# Write a function to find the longest common prefix string amongst an array of strings.
# If there is no common prefix, return an empty string "".
# Example 1:
# Input: strs = ["flower","flow","flight"]
# Output: "fl"
# Example 2:
# Input: strs = ["dog","racecar","car"]
# Output: ""
# Explanation: There is no common prefix among the input strings.
# Constraints:
# 1 <= strs.length <= 200
# 0 <= strs[i].length <= 200
# strs[i] consists of only lower-case English letters.
class Solution(object):
    def longestCommonPrefix(self, strs):
        """
        :type strs: List[str]
        :rtype: str
        """
        # Everything above this line was copy-pasted from
        # https://leetcode.com/problems/longest-common-prefix/
        # Everything below was generated by Copilot
        if not strs:
            return ""
        if len(strs) == 1:
            return strs[0]
        min_len = min([len(s) for s in strs])
        for i in range(min_len):
            for j in range(len(strs)):
                if strs[j][i] != strs[0][i]:
                    return strs[0][:i]
        return strs[0][:min_len]

if __name__ == "__main__":
    a = Solution()
    print(a.longestCommonPrefix(["flower","flow","flight"]))
    print(a.longestCommonPrefix(["dog","racecar","car"]))
    print(a.longestCommonPrefix(["a","b","c"]))
    print(a.longestCommonPrefix(["a"]))
    print(a.longestCommonPrefix([""]))
    print(a.longestCommonPrefix([]))
The output (the empty-string results print as blank lines):
fl
a
It looks impressive; it can even solve LeetCode problems. But I'd boldly guess that Copilot was trained on LeetCode solutions, so solving a LeetCode problem is hardly surprising.
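For comparison, the standard library already solves this in one line: despite the path-flavored name, os.path.commonprefix does plain character-wise prefix matching on any list of strings (my own sketch, not Copilot output):

import os

def longest_common_prefix(strs):
    # character-wise longest common prefix; returns '' for an empty list
    return os.path.commonprefix(strs)

print(longest_common_prefix(["flower", "flow", "flight"]))  # fl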
Machine Learning Code
Same procedure: I supplied the comment and Copilot produced the code:
# read iris datasets from sklearn.datasets
# predict the class of the samples in the test set
# return the predicted classes
# print accuracy, f1 score, recall, precision
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score

def predict(X_train, y_train, X_test, y_test, k):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    print("Accuracy: {:.2f}".format(accuracy_score(y_test, y_pred)))
    print("F1 score: {:.2f}".format(f1_score(y_test, y_pred)))
    print("Recall: {:.2f}".format(recall_score(y_test, y_pred)))
    print("Precision: {:.2f}".format(precision_score(y_test, y_pred)))

def main():
    iris = load_iris()
    X = iris.data
    y = iris.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    predict(X_train, y_train, X_test, y_test, 5)

if __name__ == "__main__":
    main()
Another error:
ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].
The cause: f1_score, recall_score, and precision_score default to average='binary', which only applies to two-class targets, while iris has three classes. I modified the predict function as follows:
def predict(X_train, y_train, X_test, y_test, k):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    print("Accuracy: {:.2f}".format(accuracy_score(y_test, y_pred)))
    print("F1 score: {:.2f}".format(f1_score(y_test, y_pred, average='macro')))
    print("Recall: {:.2f}".format(recall_score(y_test, y_pred, average='macro')))
    print("Precision: {:.2f}".format(precision_score(y_test, y_pred, average='macro')))
I ran it again and it passed.
Despite the errors, I still think this is pretty good: it saves me a lot of typing, and the few mistakes take only a moment to fix.
Summary
To get good results from Copilot, you need to give it very explicit instructions; for the image download, for example, you have to tell it the exact URL. Often, after fiddling for a while, you'd have been better off just searching for the code online. It handles common problems; uncommon ones, not necessarily. Still, this is only the beginning, and I believe Copilot will keep getting more accurate. When that day comes, programmers who use Copilot well won't need to work overtime, and those who don't will be working overtime every day.
Even with Copilot, you still need to understand the code yourself. Take the pandas filtering program above: the code is actually wrong, but it happened to pass my test. If you don't understand why it works and push it straight to production, the consequences are easy to imagine.
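To make that concrete: the generated filter drops rows of a.csv by positional index, not by content. With made-up data like this, it silently removes the wrong row:

import pandas as pd

a_df = pd.DataFrame({'id': [1, 2, 3]})  # stand-in for a.csv
b_df = pd.DataFrame({'id': [3]})        # stand-in for b.csv; only id=3 overlaps

# b_df.index is [0], so this drops a_df's row 0 (id=1) and keeps id=3,
# the exact opposite of removing the rows that appear in b.csv
print(a_df[~a_df.index.isin(b_df.index)])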
During testing I also found its speed very unstable; it often sat unresponsive for a long time, and I don't know whether the network was to blame. I hope Copilot will eventually host servers in China so the speed is guaranteed.