Parse method of JsonParser drastically slows down my code

Posted: 2017-05-20 16:11:30

Question:

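I am working on a project that should extract data from a JSON file (it contains information about Polish envoys) and do some calculations using only that data.

The code executes correctly, but one method drastically slows everything down. I am not the best at describing things, so let me just show my Jsonreader class: Gist link (the method is used on lines 17, 43 and 50). The code looks a bit messy, but it works fine, except for the fragments that use the jsonparser.parse method. It takes about 2 seconds per envoy, which is unacceptable. I have to change those few lines, but I don't know how. I was thinking of mapping the JSON file to objects and then working on those, but I am not sure whether that is a good option. (Sorry for my bad grammar.)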

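Comments:

Are you sure that the parse() method is slow, and not that retrieving the JSON text from the HTTPS server using getContent() is slow?

@Andreas It could be the getContent() method. I can't be 100% sure, because I was checking the time for each line of the code, not for each method. How should I test it? And if it is this method, how can I speed it up?

The only way to speed it up is to use multiple threads.

@Andreas Can a beginner like me do that?

I can't answer that. Only you can.

Answer 1: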

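How can I check whether the getContent method is the problem?

You can verify it indirectly: check the performance of your service API in the network debugger tab of a web browser, or measure the time of a plain wget, e.g. time wget YOUR_URL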
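The same check can also be done from inside the program by timing the two steps separately: first drain the reader into a String (with a network-backed reader, this is where the waiting would show up), then parse the String. The following is only a minimal sketch under stated assumptions: PhaseTimer, Timed, time and readFully are hypothetical names, and a StringReader plus a trivial length computation stand in for the real HTTP reader and for the call to the JSON parser.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

final class PhaseTimer {

    // Holds the value produced by a step together with its elapsed time
    static final class Timed<T> {
        final T value;
        final long nanos;

        Timed(final T value, final long nanos) {
            this.value = value;
            this.nanos = nanos;
        }
    }

    // Runs the step once and records how long it took
    static <T> Timed<T> time(final Supplier<T> step) {
        final long start = System.nanoTime();
        final T value = step.get();
        return new Timed<>(value, System.nanoTime() - start);
    }

    // Drains the reader completely; with a network-backed reader the I/O wait happens here
    static String readFully(final Reader reader) {
        try {
            final StringBuilder text = new StringBuilder();
            final char[] buffer = new char[8192];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
            return text.toString();
        } catch (final IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(final String... args) {
        // Stand-in for the HTTP reader: swap in the real reader to measure the download
        final Timed<String> download = time(() -> readFully(new StringReader("{\"id\":1}")));
        // Stand-in for parsing: swap in a parser call on download.value
        final Timed<Integer> parse = time(() -> download.value.length());
        System.out.printf("download: %d ms, parse: %d ms%n",
                download.nanos / 1_000_000, parse.nanos / 1_000_000);
    }
}
```

If the first number dominates, the parser is not the bottleneck, and changing the parsing code is unlikely to help.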

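I agree with Andreas: I doubt that the parse method is the root of all evil. In fact, it is not. If you look closely at the Gist, you will see that the parse method accepts a delegate reader, which in turn uses the underlying input stream that is "connected" to the remote host. I/O is usually a very time-consuming operation, network I/O especially. On top of that, establishing the HTTP connection is an expensive step here. On my machine I ended up with the following average timings:

- making an HTTP request: about 1.50..2.00 seconds for the first request and about 0.50..1.00 seconds for subsequent requests;
- reading the data: ~0.80 seconds (whether it is just reading to the end or JSON parsing does not matter, Gson is really fast; you can even profile this in a browser with its network debugger, or by timing a command-line download such as the wget call above if you use a Unix terminal).

The other point Andreas suggested is to use multiple threads to run independent tasks in parallel. This can speed things up, but probably not dramatically in your case, because the service itself does not respond very fast, unfortunately.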

Executing SingleThreadedDemo...
Executing SingleThreadedDemo took 1063935ms         = ~17:43
Executing MultiThreadedDemo...
Executing MultiThreadedDemo took 353044ms           = ~5:53

Running the demos again later gave the following results (about 3 times faster; I have no idea what really caused the earlier slowdown):

Executing SingleThreadedDemo...
Executing SingleThreadedDemo took 382249ms          = ~6:22
Executing MultiThreadedDemo...
Executing MultiThreadedDemo took 130502ms           = ~2:11
Executing MultiThreadedDemo...
Executing MultiThreadedDemo took 110119ms           = ~1:50
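The speed-up in these numbers is what one would expect from work that mostly waits on the network: a thread pool lets the waits overlap. The sketch below illustrates only that effect and is not part of the demos. IoBoundDemo, fetch and LATENCY_MS are hypothetical, and Thread.sleep stands in for a blocking HTTP request, so the absolute timings it prints say nothing about the real service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class IoBoundDemo {

    // Simulated latency per request (an assumption, not a measured value)
    private static final long LATENCY_MS = 50;

    // Stand-in for one blocking HTTP request
    static int fetch(final int id) {
        try {
            Thread.sleep(LATENCY_MS);
        } catch (final InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return id * 2;
    }

    // Issues the requests one after another
    static List<Integer> fetchSequentially(final int count) {
        final List<Integer> results = new ArrayList<>();
        for (int id = 0; id < count; id++) {
            results.add(fetch(id));
        }
        return results;
    }

    // Issues the requests through a fixed pool so that the waits overlap
    static List<Integer> fetchInParallel(final int count, final int threads) {
        final ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            final List<Callable<Integer>> tasks = new ArrayList<>();
            for (int id = 0; id < count; id++) {
                final int finalId = id;
                tasks.add(() -> fetch(finalId));
            }
            final List<Integer> results = new ArrayList<>();
            // invokeAll returns the futures in the same order as the tasks
            for (final Future<Integer> future : executor.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (final InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } catch (final ExecutionException ex) {
            throw new IllegalStateException(ex.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(final String... args) {
        long start = System.currentTimeMillis();
        fetchSequentially(20);
        System.out.printf("sequential took %dms%n", System.currentTimeMillis() - start);
        start = System.currentTimeMillis();
        fetchInParallel(20, 4);
        System.out.printf("parallel took %dms%n", System.currentTimeMillis() - start);
    }
}
```

Because the threads spend most of their time blocked rather than computing, a pool larger than the number of CPU cores may still help for this kind of work; how far that goes depends on the remote service.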

AbstractDemo.java

The class below violates some good OOP design principles, but it is kept this way so as not to inflate the total number of classes.

import com.google.gson.Gson;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.Function;

import static java.lang.System.currentTimeMillis;
import static java.lang.System.err;
import static java.lang.System.out;

// EnvoyData and the SerializedData* classes are not shown in this answer (presumably they are the asker's own classes)
abstract class AbstractDemo
        implements Callable<List<EnvoyData>> {

    // Gson is thread-safe
    private static final Gson gson = new Gson();

    // JsonParser is thread-safe: https://groups.google.com/forum/#!topic/google-gson/u6hq2OVpszc
    private static final JsonParser jsonParser = new JsonParser();

    interface IPointsAndYearbooksConsumer {

        void acceptPointsAndYearbooks(SerializedDataPoints points, SerializedDataYears yearbooks);

    }

    interface ITripsConsumer {

        void acceptTrips(SerializedDataTrips trips);

    }

    AbstractDemo() {
    }

    protected abstract List<EnvoyData> doCall()
            throws Exception;

    // This implementation measures time (in milliseconds) taken for each demo call
    @Override
    public final List<EnvoyData> call()
            throws Exception {
        final String name = getClass().getSimpleName();
        final long start = currentTimeMillis();
        try {
            out.printf("Executing %s...\n", name);
            final List<EnvoyData> result = doCall();
            out.printf("Executing %s took %dms\n", name, currentTimeMillis() - start);
            return result;
        } catch ( final Exception ex ) {
            err.printf("Executing %s took %dms\n", name, currentTimeMillis() - start);
            throw ex;
        }
    }

    // This is a generic method that encapsulates pagination and lets you iterate over the service pages in a for-each manner
    static Iterable<JsonElement> jsonRequestsAt(final URL startUrl, final Function<? super JsonObject, URL> nextLinkExtractor, final JsonParser jsonParser) {
        return () -> new Iterator<JsonElement>() {
            private URL nextUrl = startUrl;

            @Override
            public boolean hasNext() {
                return nextUrl != null;
            }

            @Override
            public JsonElement next() {
                if ( nextUrl == null ) {
                    throw new NoSuchElementException();
                }
                try ( final Reader reader = readFrom(nextUrl) ) {
                    final JsonElement root = jsonParser.parse(reader);
                    nextUrl = nextLinkExtractor.apply(root.getAsJsonObject());
                    return root;
                } catch ( final IOException ex ) {
                    throw new RuntimeException(ex);
                }
            }
        };
    }

    // Just a helper method to iterate over the start response
    static Iterable<JsonElement> getAfterwords()
            throws MalformedURLException {
        return jsonRequestsAt(
                afterwordsUrl(),
                root -> {
                    try {
                        final JsonElement next = root.get("Links").getAsJsonObject().get("next");
                        return next != null ? new URL(next.getAsString()) : null;
                    } catch ( final MalformedURLException ex ) {
                        throw new RuntimeException(ex);
                    }
                },
                jsonParser
        );
    }

    // Just extract points and yearbooks.
    // You can return a custom data holder class, but this one uses consuming-style passing the results via its parameter consumer
    static void extractPointsAndYearbooks(final Reader reader, final IPointsAndYearbooksConsumer consumer) {
        final JsonObject expensesJsonObject = jsonParser.parse(reader)
                .getAsJsonObject()
                .get("layers")
                .getAsJsonObject()
                .get("wydatki")
                .getAsJsonObject();
        final SerializedDataPoints points = gson.fromJson(expensesJsonObject.get("punkty").getAsJsonArray(), SerializedDataPoints.class);
        final SerializedDataYears yearbooks = gson.fromJson(expensesJsonObject.get("roczniki").getAsJsonArray(), SerializedDataYears.class);
        consumer.acceptPointsAndYearbooks(points, yearbooks);
    }

    // The same as above but for another type of response
    static void extractTrips(final Reader reader, final ITripsConsumer consumer) {
        final JsonElement tripsJsonElement = jsonParser.parse(reader)
                .getAsJsonObject()
                .get("layers")
                .getAsJsonObject()
                .get("wyjazdy");
        final SerializedDataTrips trips = tripsJsonElement.isJsonArray()
                ? gson.fromJson(tripsJsonElement.getAsJsonArray(), SerializedDataTrips.class)
                : null;
        consumer.acceptTrips(trips);
    }

    // It might be a constant field, but the next methods are dynamic (parameter-dependent), so let them all be similar
    // Checked exceptions are not that evil, and let the call-site decide what to do with them
    static URL afterwordsUrl()
            throws MalformedURLException {
        return new URL("https://api-v3.mojepanstwo.pl/dane/poslowie.json");
    }

    // The same as above
    static URL afterwordsUrl(final int page)
            throws MalformedURLException {
        return new URL("https://api-v3.mojepanstwo.pl/dane/poslowie.json?_type=objects&page=" + page);
    }

    // The same as above
    static URL tripsUrl(final int envoyId)
            throws MalformedURLException {
        return new URL("https://api-v3.mojepanstwo.pl/dane/poslowie/" + envoyId + ".json?layers[]=wyjazdy");
    }

    // The same as above
    static URL expensesUrl(final int envoyId)
            throws MalformedURLException {
        return new URL("https://api-v3.mojepanstwo.pl/dane/poslowie/" + envoyId + ".json?layers[]=wydatki");
    }

    // Since jsonParser is encapsulated
    static JsonElement parseJsonElement(final Reader reader) {
        return jsonParser.parse(reader);
    }

    // A helper method to return a reader for the given URL
    static Reader readFrom(final URL url)
            throws IOException {
        final HttpURLConnection request = (HttpURLConnection) url.openConnection();
        request.connect();
        // Decode explicitly instead of relying on the platform default charset (assumes the service responds in UTF-8)
        return new BufferedReader(new InputStreamReader(request.getInputStream(), StandardCharsets.UTF_8));
    }

    // Waits for all futures used in multi-threaded demo
    // Not sure how good this method is since I'm not an expert in concurrent programming unfortunately
    // Caveat: iterators of concurrent collections are only weakly consistent, so futures added by
    // other threads while this loop is running are not guaranteed to be visited
    static void waitForAllFutures(final Iterable<? extends Future<?>> futures)
            throws ExecutionException, InterruptedException {
        final Iterator<? extends Future<?>> iterator = futures.iterator();
        while ( iterator.hasNext() ) {
            final Future<?> future = iterator.next();
            future.get();
            iterator.remove();
        }
    }

}
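The pagination idea behind jsonRequestsAt can be tried without any network or Gson dependency. The following is a simplified sketch of the same pattern, not the code above: PaginationSketch, Page, pagesFrom and allItems are hypothetical names, and an in-memory Map stands in for the remote service that returns a "next" link with each page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

final class PaginationSketch {

    // A hypothetical page: its items plus the key of the next page (null on the last page)
    static final class Page {
        final List<String> items;
        final String next;

        Page(final List<String> items, final String next) {
            this.items = items;
            this.next = next;
        }
    }

    // Follows "next" links lazily: a page is loaded only when the loop asks for it
    static Iterable<Page> pagesFrom(final String startKey, final Function<String, Page> loader) {
        return () -> new Iterator<Page>() {
            private String nextKey = startKey;

            @Override
            public boolean hasNext() {
                return nextKey != null;
            }

            @Override
            public Page next() {
                if (nextKey == null) {
                    throw new NoSuchElementException();
                }
                final Page page = loader.apply(nextKey);
                nextKey = page.next;
                return page;
            }
        };
    }

    // Collects the items of all pages, in page order
    static List<String> allItems(final String startKey, final Map<String, Page> store) {
        final List<String> items = new ArrayList<>();
        for (final Page page : pagesFrom(startKey, store::get)) {
            items.addAll(page.items);
        }
        return items;
    }

    public static void main(final String... args) {
        final Map<String, Page> store = new HashMap<>();
        store.put("page1", new Page(Arrays.asList("a", "b"), "page2"));
        store.put("page2", new Page(Arrays.asList("c"), null));
        System.out.println(allItems("page1", store)); // prints [a, b, c]
    }
}
```

Since each page is only known after the previous one has been loaded, this style of iteration is inherently sequential; that is why it runs on a single thread.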

SingleThreadedDemo.java

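The simplest demo. The whole data pull is executed in a single thread, so it tends to be the slowest demo here. It is completely thread-safe, has no fields, and can therefore be declared as a singleton.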

import com.google.gson.JsonArray;
import com.google.gson.JsonElement;

import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

final class SingleThreadedDemo
        extends AbstractDemo {

    private static final Callable<List<EnvoyData>> singleThreadedDemo = new SingleThreadedDemo();

    private SingleThreadedDemo() {
    }

    static Callable<List<EnvoyData>> getSingleThreadedDemo() {
        return singleThreadedDemo;
    }

    @Override
    protected List<EnvoyData> doCall()
            throws IOException {
        final List<EnvoyData> envoys = new ArrayList<>();
        for ( final JsonElement afterwordJsonElement : getAfterwords() ) {
            final JsonArray dataObjectArray = afterwordJsonElement.getAsJsonObject().get("Dataobject").getAsJsonArray();
            for ( final JsonElement dataObjectElement : (Iterable<JsonElement>) dataObjectArray::iterator ) {
                final int envoyId = dataObjectElement.getAsJsonObject().get("id").getAsInt();
                try ( final Reader expensesReader = readFrom(expensesUrl(envoyId)) ) {
                    extractPointsAndYearbooks(expensesReader, (points, yearbooks) -> {
                        // ... consume points and yearbooks here
                    });
                }
                try ( final Reader tripsReader = readFrom(tripsUrl(envoyId)) ) {
                    extractTrips(tripsReader, trips -> {
                        // ... consume trips here
                    });
                }
            }
        }
        return envoys;
    }

}

MultiThreadedDemo.java

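Unfortunately, I am really weak in Java concurrency, so these multi-threaded demos can probably be improved a lot. This semi-multi-threaded demo uses two approaches:

- a single thread to iterate over the pages;
- multiple threads to fetch the points, yearbooks and trips data.

Also note that this demo (and the other multi-threaded demo below) is not fail-safe: if any exception occurs in a submitted task, the executor service background threads will not terminate properly. So you may want to make it fail-safe and robust yourself.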

import com.google.gson.JsonArray;
import com.google.gson.JsonElement;

import java.io.Reader;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

import static java.util.Collections.synchronizedList;

final class MultiThreadedDemo
        extends AbstractDemo {

    private final ExecutorService executorService;

    private MultiThreadedDemo(final ExecutorService executorService) {
        this.executorService = executorService;
    }

    static Callable<List<EnvoyData>> getMultiThreadedDemo(final ExecutorService executorService) {
        return new MultiThreadedDemo(executorService);
    }

    @Override
    protected List<EnvoyData> doCall()
            throws InterruptedException, ExecutionException, MalformedURLException {
        final List<EnvoyData> envoys = synchronizedList(new ArrayList<>());
        final Collection<Future<?>> futures = new ConcurrentLinkedQueue<>();
        // The pages are iterated in the calling thread; only the per-envoy requests go to the pool
        for ( final JsonElement afterwordJsonElement : getAfterwords() ) {
            final JsonArray dataObjectArray = afterwordJsonElement.getAsJsonObject().get("Dataobject").getAsJsonArray();
            for ( final JsonElement dataObjectElement : (Iterable<JsonElement>) dataObjectArray::iterator ) {
                final int envoyId = dataObjectElement.getAsJsonObject().get("id").getAsJsonPrimitive().getAsInt();
                submitExtractPointsAndYearbooks(futures, envoyId);
                submitExtractTrips(futures, envoyId);
            }
        }
        waitForAllFutures(futures);
        return envoys;
    }

    private void submitExtractPointsAndYearbooks(final Collection<? super Future<?>> futures, final int envoyId) {
        futures.add(executorService.submit(() -> {
            try ( final Reader expensesReader = readFrom(expensesUrl(envoyId)) ) {
                extractPointsAndYearbooks(expensesReader, (points, yearbooks) -> {
                    // ... consume points and yearbooks here
                });
                return null;
            }
        }));
    }

    private void submitExtractTrips(final Collection<? super Future<?>> futures, final int envoyId) {
        futures.add(executorService.submit(() -> {
            try ( final Reader tripsReader = readFrom(tripsUrl(envoyId)) ) {
                extractTrips(tripsReader, trips -> {
                    // ... consume trips here
                });
                return null;
            }
        }));
    }

}
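One possible way to make such a run more robust is to own the executor inside a try/finally block, cancel whatever is still pending once any task fails, and always shut the pool down. The sketch below shows that idea in isolation. It is not a drop-in change for the demo: FailSafeSketch and runAll are hypothetical names, and the tasks are plain Callable<Void> stand-ins for the HTTP tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FailSafeSketch {

    // Runs all tasks on a temporary pool and always shuts the pool down.
    // Returns true if every task succeeded and false if any task threw.
    static boolean runAll(final List<Callable<Void>> tasks, final int threads) {
        final ExecutorService executor = Executors.newFixedThreadPool(threads);
        final List<Future<Void>> futures = new ArrayList<>();
        try {
            for (final Callable<Void> task : tasks) {
                futures.add(executor.submit(task));
            }
            for (final Future<Void> future : futures) {
                future.get(); // rethrows a task failure as ExecutionException
            }
            return true;
        } catch (final ExecutionException ex) {
            return false;
        } catch (final InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Cancel whatever has not finished yet; this has no effect on completed futures
            for (final Future<Void> future : futures) {
                future.cancel(true);
            }
            executor.shutdown();
        }
    }

    public static void main(final String... args) {
        final List<Callable<Void>> tasks = new ArrayList<>();
        tasks.add(() -> null);
        tasks.add(() -> {
            throw new IllegalStateException("simulated failure");
        });
        System.out.println(runAll(tasks, 2)); // prints false
    }
}
```

Returning a boolean keeps the sketch short; real code would more likely rethrow the cause so that the caller can see what went wrong.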

MultiThreadedEstimatedPagesDemo.java

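This one is an enhanced version of the previous demo: it also submits executor service tasks to iterate over the service pages. To do that, the number of pages has to be detected in advance. Knowing the number of pages lets the https://...poslowie.json?...page=... URLs be processed in parallel. Note that if more than one page is found, the remaining iterations start from the 2nd page, so the first page is not requested twice.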
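The page count comes down to a ceiling division of the total number of records by the number of records on the first page. As a small illustration, here is an integer-only variant with a guard for an empty first page; with floating-point division, a zero page size and a positive total would yield Infinity, which Java casts to Integer.MAX_VALUE. PageCountSketch is a hypothetical name and the numbers are made up, not taken from the real service.

```java
final class PageCountSketch {

    // Ceiling division on non-negative integers; returns 0 when the first page
    // is empty instead of dividing by zero
    static int estimateTotalPages(final int totalElements, final int elementsPerPage) {
        if (elementsPerPage <= 0) {
            return 0;
        }
        return (totalElements + elementsPerPage - 1) / elementsPerPage;
    }

    public static void main(final String... args) {
        // Hypothetical numbers: 460 records served 50 per page need 10 requests
        System.out.println(estimateTotalPages(460, 50)); // prints 10
    }
}
```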

import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonIOException;
import com.google.gson.JsonObject;
import com.google.gson.JsonSyntaxException;

import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

import static java.lang.Math.ceil;
import static java.util.Collections.synchronizedList;

final class MultiThreadedEstimatedPagesDemo
        extends AbstractDemo {

    private final ExecutorService executorService;

    private MultiThreadedEstimatedPagesDemo(final ExecutorService executorService) {
        this.executorService = executorService;
    }

    static Callable<List<EnvoyData>> getMultiThreadedEstimatedPagesDemo(final ExecutorService executorService) {
        return new MultiThreadedEstimatedPagesDemo(executorService);
    }

    @Override
    protected List<EnvoyData> doCall()
            throws IOException, JsonIOException, JsonSyntaxException, InterruptedException, ExecutionException {
        final List<EnvoyData> envoys = synchronizedList(new ArrayList<>());
        final JsonObject page1RootJsonObject;
        final int totalPages;
        // The first page is fetched synchronously because it is needed to estimate the page count
        try ( final Reader page1Reader = readFrom(afterwordsUrl()) ) {
            page1RootJsonObject = parseJsonElement(page1Reader).getAsJsonObject();
            totalPages = estimateTotalPages(page1RootJsonObject);
        }
        final Collection<Future<?>> futures = new ConcurrentLinkedQueue<>();
        // The already downloaded first page is processed in the pool without requesting it again
        futures.add(executorService.submit(() -> {
            final JsonArray dataObjectArray = page1RootJsonObject.getAsJsonObject().get("Dataobject").getAsJsonArray();
            for ( final JsonElement dataObjectElement : (Iterable<JsonElement>) dataObjectArray::iterator ) {
                final int envoyId = dataObjectElement.getAsJsonObject().get("id").getAsInt();
                submitExtractPointsAndYearbooks(futures, envoyId);
                submitExtractTrips(futures, envoyId);
            }
            return null;
        }));
        // The remaining pages are requested in parallel, starting from the 2nd page
        for ( int page = 2; page <= totalPages; page++ ) {
            final int finalPage = page;
            futures.add(executorService.submit(() -> {
                try ( final Reader reader = readFrom(afterwordsUrl(finalPage)) ) {
                    final JsonElement afterwordJsonElement = parseJsonElement(reader);
                    final JsonArray dataObjectArray = afterwordJsonElement.getAsJsonObject().get("Dataobject").getAsJsonArray();
                    for ( final JsonElement dataObjectElement : (Iterable<JsonElement>) dataObjectArray::iterator ) {
                        final int envoyId = dataObjectElement.getAsJsonObject().get("id").getAsInt();
                        submitExtractPointsAndYearbooks(futures, envoyId);
                        submitExtractTrips(futures, envoyId);
                    }
                }
                return null;
            }));
        }
        waitForAllFutures(futures);
        return envoys;
    }

    // Assumes that every page except the last one holds as many elements as the first page
    private static int estimateTotalPages(final JsonObject rootJsonObject) {
        final int elementsPerPage = rootJsonObject.get("Dataobject").getAsJsonArray().size();
        final int totalElements = rootJsonObject.get("Count").getAsInt();
        return (int) ceil((double) totalElements / elementsPerPage);
    }

    private void submitExtractPointsAndYearbooks(final Collection<? super Future<?>> futures, final int envoyId) {
        futures.add(executorService.submit(() -> {
            try ( final Reader expensesReader = readFrom(expensesUrl(envoyId)) ) {
                extractPointsAndYearbooks(expensesReader, (points, yearbooks) -> {
                    // ... consume points and yearbooks here
                });
                return null;
            }
        }));
    }

    private void submitExtractTrips(final Collection<? super Future<?>> futures, final int envoyId) {
        futures.add(executorService.submit(() -> {
            try ( final Reader tripsReader = readFrom(tripsUrl(envoyId)) ) {
                extractTrips(tripsReader, trips -> {
                    // ... consume trips here
                });
                return null;
            }
        }));
    }

}
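In this demo the page tasks add new futures to the queue while the main thread is already waiting, so the waiting loop has to cope with a collection that keeps growing. One way to do that, sketched here under the assumption that every task enqueues its follow-up futures before it completes, is to poll the queue until it is empty instead of iterating over it. DrainQueueSketch, drain and runDemo are hypothetical names, and the tasks only increment a counter.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class DrainQueueSketch {

    // Waits until the queue is empty. This relies on every task enqueueing its
    // follow-up futures before it completes, so they are visible once get() returns.
    static void drain(final Queue<Future<?>> futures)
            throws ExecutionException, InterruptedException {
        Future<?> future;
        while ((future = futures.poll()) != null) {
            future.get();
        }
    }

    // One "page" task submits childCount follow-up tasks; returns how many follow-ups ran
    static int runDemo(final int childCount) {
        final ExecutorService executor = Executors.newFixedThreadPool(4);
        final Queue<Future<?>> futures = new ConcurrentLinkedQueue<>();
        final AtomicInteger completed = new AtomicInteger();
        try {
            futures.add(executor.submit(() -> {
                for (int i = 0; i < childCount; i++) {
                    futures.add(executor.submit(() -> {
                        completed.incrementAndGet();
                    }));
                }
            }));
            drain(futures);
            return completed.get();
        } catch (final InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } catch (final ExecutionException ex) {
            throw new IllegalStateException(ex.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(final String... args) {
        System.out.println(runDemo(5)); // prints 5
    }
}
```

The parent task never waits for its children, so the fixed-size pool cannot deadlock in this arrangement.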

Test.java

And the demo itself:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;

import static java.lang.Runtime.getRuntime;
import static java.util.concurrent.Executors.newFixedThreadPool;

public final class Test {

    private Test() {
    }

    public static void main(final String... args)
            throws Exception {
        runSingleThreadedDemo();
        runMultiThreadedDemo();
        runMultiThreadedEstimatedPagesDemo();
    }

    private static void runSingleThreadedDemo()
            throws Exception {
        final Callable<?> singleThreadedDemo = SingleThreadedDemo.getSingleThreadedDemo();
        singleThreadedDemo.call();
    }

    private static void runMultiThreadedDemo()
            throws Exception {
        final ExecutorService executorService = newFixedThreadPool(getRuntime().availableProcessors());
        final Callable<?> demo = MultiThreadedDemo.getMultiThreadedDemo(executorService);
        demo.call();
        executorService.shutdown();
    }

    private static void runMultiThreadedEstimatedPagesDemo()
            throws Exception {
        final ExecutorService executorService = newFixedThreadPool(getRuntime().availableProcessors());
        final Callable<?> demo = MultiThreadedEstimatedPagesDemo.getMultiThreadedEstimatedPagesDemo(executorService);
        demo.call();
        executorService.shutdown();
    }

}

Comments:

Thank you! Your post helped me a lot. :)
