Splitting Large Data Sets Across Threads
Posted by 熊猫太郎
Approach:
1. Borrowing the idea of pagination, split the data into ranges, one per worker thread.
2. Each thread independently reads and processes its own range.
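The range arithmetic behind step 1 can be sketched on its own. This is a minimal, self-contained version of the pagination math used below; the class and method names (`RangeSplit`, `split`) are illustrative, not from the original code:

```java
public class RangeSplit {

    // Split `size` items into `threads` (start, limit) ranges.
    // The last range absorbs the remainder, mirroring the approach above.
    public static int[][] split(int size, int threads) {
        int count = size / threads;   // rows per slice
        int mod = size % threads;     // leftover rows
        int[][] ranges = new int[threads][2];
        for (int i = 0; i < threads; i++) {
            ranges[i][0] = i * count;
            ranges[i][1] = (i == threads - 1) ? count + mod : count;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] r : split(103, 10)) {
            System.out.println("start=" + r[0] + ", limit=" + r[1]);
        }
    }
}
```

With `size = 103` and 10 threads, the first nine slices get 10 rows each and the last slice gets 13, so no row is dropped.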
Step 1 implementation

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class IndexIntiTools {

    // Number of tasks still running; used by callers to detect completion.
    public static AtomicInteger runflag = new AtomicInteger();
    // Thread-safe collection of results, used for testing.
    public static List<Object> syncList = new CopyOnWriteArrayList<Object>();

    private static int idxThreadCount = 10;
    private static Executor ex = Executors.newFixedThreadPool(idxThreadCount);

    /**
     * Build the index by splitting `size` rows across the thread pool.
     *
     * @param hql  query to run for each slice
     * @param size total number of rows
     */
    public static void build(String hql, int size) {
        int pagecount = idxThreadCount;
        int count = size / pagecount;   // rows per slice
        int mod = size % pagecount;     // remainder goes to the last slice
        List<Runnable> runList = new ArrayList<Runnable>(pagecount);
        IndexExecutor idxExecutor;
        for (int i = 0; i < pagecount; i++) {
            if (i == (pagecount - 1)) {
                // The last slice picks up the leftover rows.
                idxExecutor = new IndexExecutor(hql, i * count, count + mod);
            } else {
                idxExecutor = new IndexExecutor(hql, i * count, count);
            }
            runList.add(idxExecutor);
        }
        for (Runnable runnable : runList) {
            runflag.incrementAndGet();
            ex.execute(runnable);
        }
    }
}
```
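The `runflag` counter is incremented once per submitted task and decremented when a task finishes, so a caller can poll it to know when all slices are done. Here is a minimal, self-contained sketch of that wait pattern (the class name `RunFlagWait` and the no-op tasks are illustrative, not from the original code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class RunFlagWait {

    // Submit n no-op tasks, then busy-wait on the counter the way
    // the article's runflag field is intended to be used.
    public static int runAndWait(int n) throws InterruptedException {
        AtomicInteger runflag = new AtomicInteger();
        ExecutorService ex = Executors.newFixedThreadPool(4);
        for (int i = 0; i < n; i++) {
            runflag.incrementAndGet();
            // Each task decrements the counter when it finishes.
            ex.execute(runflag::decrementAndGet);
        }
        while (runflag.get() > 0) {
            Thread.sleep(5);   // poll until every task has finished
        }
        ex.shutdown();
        return runflag.get(); // 0 once all tasks are done
    }

    public static void main(String[] args) throws Exception {
        System.out.println("remaining: " + runAndWait(10));
    }
}
```

Polling works, but `java.util.concurrent.CountDownLatch` is the more idiomatic tool for "wait until N tasks finish" and avoids the sleep loop.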
Step 2 implementation

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class IndexExecutor implements Runnable {

    private static final Log log = LogFactory.getLog(IndexExecutor.class);

    private int start;
    private int limit;
    private String hql;

    public IndexExecutor(String hql, int start, int limit) {
        this.hql = hql;
        this.start = start;
        this.limit = limit;
    }

    @Override
    public void run() {
        log.info("hql:" + hql + ",start:" + start + ",limit:" + limit);
        // Query the database for this slice; the original elides the call:
        // List<Object> list = query(hql, start, limit);
        List<Object> list = new CopyOnWriteArrayList<Object>();
        log.info(list);
        IndexIntiTools.syncList.addAll(list);
        // Mark this task as finished.
        IndexIntiTools.runflag.decrementAndGet();
    }
}
```
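Putting the two steps together, here is a self-contained, runnable sketch of the whole pattern. The database query is replaced by a stand-in that simply generates the row ids in each range, and a `CountDownLatch` is used to wait for completion instead of polling a counter; the names (`ParallelSliceDemo`, `collect`) are illustrative, not from the original code:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelSliceDemo {

    // Thread-safe result list, like syncList in the article.
    static final List<Integer> results = new CopyOnWriteArrayList<Integer>();

    // Split `size` rows across `threads` tasks and collect every row.
    public static int collect(int size, int threads) throws InterruptedException {
        results.clear();
        int count = size / threads;   // rows per slice
        int mod = size % threads;     // last slice absorbs the remainder
        CountDownLatch done = new CountDownLatch(threads);
        ExecutorService ex = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            final int start = i * count;
            final int limit = (i == threads - 1) ? count + mod : count;
            ex.execute(() -> {
                // Stand-in for the database query: generate ids in range.
                for (int j = start; j < start + limit; j++) {
                    results.add(j);
                }
                done.countDown();
            });
        }
        done.await();   // block until every slice has finished
        ex.shutdown();
        return results.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("collected " + collect(25, 4) + " rows");
    }
}
```

With 25 rows and 4 threads, three slices get 6 rows and the last gets 7, so all 25 rows end up in the result list regardless of scheduling order.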