AI Assistance: A New Engine for the Data Annotation Industry | 曼孚科技

Posted 2020-12-19


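(I) Introduction

Java 8 added a new abstraction to its API, the Stream, which lets you process data declaratively.
A Stream provides a higher-level abstraction for computing on and expressing operations over Java collections, in an intuitive style that resembles querying a database with SQL.
The Stream API can make Java programmers considerably more productive and helps them write efficient, clean, concise code.
In this style, the collection of elements to be processed is treated as a stream. The stream travels through a pipeline and can be processed at each stage of the pipeline, for example by filtering, sorting, or aggregating.
The elements pass through intermediate operations in the pipeline, and a terminal operation finally produces the result of that processing.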
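
The pipeline described above can be sketched as follows. This is a minimal illustration, not code from the original article: the class name PipelineSketch, the method shortNamesUpper, and the sample names are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PipelineSketch {
    // Source -> intermediate operations (filter, sorted, map) -> terminal operation (collect).
    static List<String> shortNamesUpper(List<String> names) {
        return names.stream()
                .filter(n -> n.length() <= 3)    // intermediate: keep names of at most 3 characters
                .sorted()                        // intermediate: natural (alphabetical) order
                .map(String::toUpperCase)        // intermediate: transform each element
                .collect(Collectors.toList());   // terminal: runs the pipeline and gathers the result
    }

    public static void main(String[] args) {
        // prints [BOB, EVE, TOM]
        System.out.println(shortNamesUpper(Arrays.asList("tom", "alice", "bob", "eve")));
    }
}
```

Nothing is computed until the terminal collect() call; the intermediate calls only describe the stages.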

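(II) Classification

Creating a Stream: obtain a stream over the elements of a collection, for example by calling its stream() method.
Intermediate operations: a chain of intermediate methods that further process the data, such as filtering or searching. For example, filter() filters the elements.
Terminal operations: a final method that completes the processing of the elements. For example, forEach() prints the elements that passed the filter.

Short-circuiting: an intermediate operation is short-circuiting if, given an infinite (unbounded) Stream as input, it may return a finite new Stream. A terminal operation is short-circuiting if, given an infinite Stream as input, it may compute its result in finite time. When you operate on an infinite Stream and want the work to finish in finite time, having a short-circuiting operation in the pipeline is a necessary but not sufficient condition.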
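
A small sketch of both kinds of short-circuiting operation on an infinite Stream. The class and method names (ShortCircuitSketch, firstSquares, hasMultipleOf) are hypothetical and exist only for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ShortCircuitSketch {
    // limit() is a short-circuiting intermediate operation: infinite Stream in, finite Stream out.
    static List<Integer> firstSquares(int n) {
        return Stream.iterate(1, i -> i + 1)
                .map(i -> i * i)
                .limit(n)
                .collect(Collectors.toList());
    }

    // anyMatch() is a short-circuiting terminal operation: it stops at the first match.
    static boolean hasMultipleOf(int k) {
        return Stream.iterate(1, i -> i + 1).anyMatch(i -> i % k == 0);
    }

    public static void main(String[] args) {
        System.out.println(firstSquares(4));   // prints [1, 4, 9, 16]
        System.out.println(hasMultipleOf(7));  // prints true
    }
}
```

This also shows why the condition is not sufficient: Stream.iterate(1, i -> i + 2).anyMatch(i -> i % 2 == 0) contains a short-circuiting operation but never finishes, because no element ever matches.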

Operation	Methods
Intermediate	concat(), distinct(), filter(), flatMap(), map(), peek(), skip(), sorted(), parallel(), sequential(), unordered(), limit()
Terminal	collect(), count(), forEach(), forEachOrdered(), max(), min(), reduce(), toArray(), allMatch(), anyMatch(), noneMatch(), findAny(), findFirst(), iterator()
Short-circuiting	allMatch(), anyMatch(), noneMatch(), findAny(), findFirst(), limit()

(III) Usage

1. Creating a Stream

1.1 Creating with Stream's static factory methods

1.1.1 of

The resulting Stream is finite; its length is the number of elements passed in.

Stream<Integer> integerStream = Stream.of(1, 2, 3);
Stream<String> stringStream = Stream.of("a");

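1.1.2 generate

Returns an infinite Stream whose elements are provided by a Supplier. Supplier is a functional interface with a single get() method that returns a value of the generic type. Successive calls may return the same value or different values; there is no particular requirement.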

Stream<Double> streamA = Stream.generate(new Supplier<Double>() {
    @Override
    public Double get() {
        return java.lang.Math.random();
    }
});
Stream<Double> streamB = Stream.generate(() -> java.lang.Math.random());
Stream<Double> streamC = Stream.generate(java.lang.Math::random);

1.1.3 iterate

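Also returns an infinite Stream. Unlike generate, it produces an infinite, sequential, ordered Stream by repeatedly applying a function f to a given seed, so the elements are seed, f(seed), f(f(seed)), and so on without end.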

Stream<Integer> stream = Stream.iterate(1, item -> item + 1);
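
Because generate and iterate both return infinite Streams, they are normally paired with limit() before a terminal operation. The sketch below shows this; the names InfiniteSourcesSketch, powersOfTwo, and repeat are made up for the example.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InfiniteSourcesSketch {
    // iterate: seed, f(seed), f(f(seed)), ... truncated to the first n elements.
    static List<Integer> powersOfTwo(int n) {
        return Stream.iterate(1, x -> x * 2)
                .limit(n)
                .collect(Collectors.toList());
    }

    // generate: every element comes from an independent call to the Supplier.
    static List<String> repeat(String s, int n) {
        return Stream.generate(() -> s)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(powersOfTwo(5));   // prints [1, 2, 4, 8, 16]
        System.out.println(repeat("ab", 3));  // prints [ab, ab, ab]
    }
}
```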

1.1.4 empty

Returns an empty sequential Stream that contains no elements.

Stream<Object> empty = Stream.empty();

1.2 The Collection interface and the Arrays class

1.2.1 Collection interface

List<Integer> list = new ArrayList<>();
list.add(1);
list.add(2);
list.add(3);
Stream<Integer> stream = list.stream();

1.2.2 Arrays

int[] arr = {1,2,3};
IntStream stream = Arrays.stream(arr);
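
One detail worth spelling out: Arrays.stream on an int[] returns the primitive IntStream, while Stream.of on the same int[] produces a Stream with a single element, the array itself. The sketch below illustrates the difference; ArrayStreamSketch and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ArrayStreamSketch {
    // IntStream has primitive-specific operations such as sum().
    static int sum(int[] values) {
        return Arrays.stream(values).sum();
    }

    // boxed() converts an IntStream into a Stream<Integer>.
    static List<Integer> toList(int[] values) {
        return Arrays.stream(values).boxed().collect(Collectors.toList());
    }

    // Stream.of(int[]) yields a Stream<int[]> holding one element: the whole array.
    static long elementsSeenByStreamOf(int[] values) {
        return Stream.of(values).count();
    }

    public static void main(String[] args) {
        int[] arr = {1, 2, 3};
        System.out.println(sum(arr));                     // prints 6
        System.out.println(toList(arr));                  // prints [1, 2, 3]
        System.out.println(elementsSeenByStreamOf(arr));  // prints 1
    }
}
```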

2. Intermediate

Intermediate operations transform or restrict a Stream, turning the original Stream into a new Stream.

2.1 concat()

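Joins two Streams into a single Stream. If both input Streams are ordered, the new Stream is ordered as well; if either input Stream is parallel, the new Stream is parallel; and when the new Stream is closed, the close handlers of both input Streams are invoked.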

Stream<Integer> concat = Stream.concat(Stream.of(1, 2, 3), Stream.of(4, 5));
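
The three properties of concat() can be checked with a short sketch. The class ConcatSketch and its methods are assumptions made for illustration; onClose() and isParallel() are standard Stream methods.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ConcatSketch {
    // Elements of the first Stream come before those of the second.
    static List<Integer> joined() {
        return Stream.concat(Stream.of(1, 2, 3), Stream.of(4, 5))
                .collect(Collectors.toList());
    }

    // The result is parallel if either input is parallel.
    static boolean parallelIfEitherIs() {
        return Stream.concat(Stream.of(1).parallel(), Stream.of(2)).isParallel();
    }

    // Closing the concatenated Stream runs the close handlers of both inputs.
    static int closeHandlersRun() {
        AtomicInteger closed = new AtomicInteger();
        Stream<Integer> a = Stream.of(1).onClose(closed::incrementAndGet);
        Stream<Integer> b = Stream.of(2).onClose(closed::incrementAndGet);
        try (Stream<Integer> c = Stream.concat(a, b)) {
            c.forEach(x -> { });
        }
        return closed.get();
    }

    public static void main(String[] args) {
        System.out.println(joined());              // prints [1, 2, 3, 4, 5]
        System.out.println(parallelIfEitherIs());  // prints true
        System.out.println(closeHandlersRun());    // prints 2
    }
}
```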

2.2 distinct()

Removes duplicate elements from the original Stream; the new Stream contains no duplicates.

Stream<Integer> distinct = Stream.of(1, 2, 3, 3, 4, 5, 4).distinct();

2.3 filter()

Filters the original Stream by the given condition. The new Stream contains only the elements that satisfy the condition; the rest are dropped.

Stream<Integer> stream = Stream.of(1, 2, 3, 3, 4, 5, 4).filter(item -> item <= 3);

2.4 map()

Applies the given function to each element of the Stream; the new Stream contains only the converted elements.

The library already provides three primitive-typed variants: mapToDouble, mapToInt, and mapToLong.

Stream<String> streamA = Stream.of("a", "b", "c").map(item -> item.toUpperCase());
IntStream streamB = Stream.of("1", "2", "3").mapToInt(item -> Integer.parseInt(item));

2.5 flatMap()

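Takes a function as its argument, replaces each value in the stream with another stream, and then joins all of those streams into a single stream.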

String[] arr1 = {"a", "b", "c", "d"};
String[] arr2 = {"e", "f", "c", "d"};
String[] arr3 = {"h", "j", "c", "d"};
Stream.of(arr1, arr2, arr3).flatMap(Arrays::stream).forEach(System.out::println);

2.6 peek()

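Lets you peek at the elements flowing through the stream: peek() runs the given action on each element as it passes and returns a Stream of the same elements. It is mainly useful for debugging.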
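
The original gives no example for peek(), so here is a minimal sketch. PeekSketch and its method names are hypothetical. It records the elements that pass through peek() and also shows that, being an intermediate operation, peek() does nothing until a terminal operation runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PeekSketch {
    // Records each element in 'seen' as it flows past, then upper-cases it.
    static List<String> upperWithTrace(List<String> seen) {
        return Stream.of("a", "b", "c")
                .peek(seen::add)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Without a terminal operation the pipeline never runs, so peek() sees nothing.
    static int seenWithoutTerminalOperation() {
        List<String> seen = new ArrayList<>();
        Stream<String> pending = Stream.of("a", "b").peek(seen::add);
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        System.out.println(upperWithTrace(seen));               // prints [A, B, C]
        System.out.println(seen);                               // prints [a, b, c]
        System.out.println(seenWithoutTerminalOperation());     // prints 0
    }
}
```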

2.7 skip()

Skips the first n elements.

String[] arr = {"a", "b", "d", "c"};
Arrays.stream(arr).skip(2).forEach(System.out::println);

2.8 sorted()

Sorts the elements of the stream.

String[] arr = {"a", "b", "d", "c"};
Arrays.stream(arr).sorted(Comparator.naturalOrder()).forEach(System.out::println);

3. Short-circuiting

3.1 allMatch()

Returns whether all elements match the predicate.

String[] arr = new String[]{"b", "ab", "abc", "abcd", "abcde"};
Boolean aBoolean = Stream.of(arr).allMatch(x -> x.startsWith("a"));
System.out.println(aBoolean);

3.2 anyMatch()

Returns whether at least one element matches the predicate.

String[] arr = new String[]{"b", "ab", "abc", "abcd", "abcde"};
Boolean aBoolean = Stream.of(arr).anyMatch(x -> x.startsWith("a"));
System.out.println(aBoolean);

3.3 noneMatch()

Returns whether no element matches the predicate.

String[] arr = new String[]{"b", "ab", "abc", "abcd", "abcde"};
Boolean aBoolean = Stream.of(arr).noneMatch(x -> x.startsWith("a"));
System.out.println(aBoolean);

3.4 findAny()

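Returns some element of the stream (here, one that passed the filter). The operation ends as soon as any such element is found, and it need not be the first one, which makes it well suited to parallel streams.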

String[] arr = new String[]{"b", "ab", "abc", "abcd", "abcde"};
Optional<String> optional = Stream.of(arr).parallel().filter(x -> x.length() > 3).findAny();
optional.ifPresent(System.out::println);

3.5 findFirst()

Returns the first element.

String[] arr = new String[]{"b", "ab", "abc", "abcd", "abcde"};
String str = Stream.of(arr).parallel().filter(x -> x.length() > 3).findFirst().orElse("nothing");
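
The difference between the two find operations on a parallel stream can be sketched like this. FindSketch and its methods are hypothetical helpers. findFirst() respects the encounter order of an ordered stream, while findAny() may return any matching element.

```java
import java.util.Optional;
import java.util.stream.Stream;

class FindSketch {
    // Always the first match in encounter order, even when run in parallel.
    static Optional<String> firstLongerThan(String[] words, int n) {
        return Stream.of(words).parallel().filter(w -> w.length() > n).findFirst();
    }

    // Any match; which one is returned may vary from run to run on a parallel stream.
    static Optional<String> anyLongerThan(String[] words, int n) {
        return Stream.of(words).parallel().filter(w -> w.length() > n).findAny();
    }

    public static void main(String[] args) {
        String[] arr = {"b", "ab", "abc", "abcd", "abcde"};
        System.out.println(firstLongerThan(arr, 3).orElse("nothing"));   // prints abcd
        System.out.println(anyLongerThan(arr, 3).orElse("nothing"));     // abcd or abcde
        System.out.println(firstLongerThan(arr, 10).orElse("nothing"));  // prints nothing
    }
}
```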

3.6 limit()

Restricts the stream to its first n elements.

String[] arr = {"a", "b", "d", "c"};
Arrays.stream(arr).limit(2).forEach(System.out::println);
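
skip() and limit() are often combined to take one "page" out of a larger sequence. The following is only a sketch of that idea; PagingSketch, page, pageIndex, and pageSize are names invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PagingSketch {
    // Returns the zero-based page 'pageIndex' of size 'pageSize'.
    static <T> List<T> page(List<T> items, int pageIndex, int pageSize) {
        return items.stream()
                .skip((long) pageIndex * pageSize)  // drop the earlier pages
                .limit(pageSize)                    // keep at most one page
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "d", "c");
        System.out.println(page(items, 0, 2));  // prints [a, b]
        System.out.println(page(items, 1, 2));  // prints [d, c]
        System.out.println(page(items, 2, 2));  // prints []
    }
}
```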

4. Terminal

4.1 collect()

Collects the results.

// Assumes static imports from java.util.stream.Collectors (toList, toSet, toMap, toCollection, groupingBy, partitioningBy)
// and a Student[] named students whose elements provide getName() and getScore().
String[] arr = new String[]{"b", "ab", "abc", "abcd", "abcde"};
// Collect the strings longer than 3 characters into a List
List<String> list = Stream.of(arr).filter(x -> x.length() > 3).collect(toList());
list.forEach(System.out::println);
// Collect the strings longer than 3 characters into a Set
Set<String> set = Stream.of(arr).filter(x -> x.length() > 3).collect(toSet());
set.forEach(System.out::println);
// Use the student's name as the key and the score as the value; when keys collide, store the sum of the values
Map<String, Integer> scoreByName = Arrays.stream(students).collect(toMap(Student::getName, Student::getScore, (s, a) -> s + a));
scoreByName.forEach((x, y) -> System.out.println(x + "->" + y));
// Collect into a specific collection type
HashSet<String> hashSet = Arrays.stream(arr).collect(toCollection(HashSet::new));
hashSet.forEach(System.out::println);

// Group by name
Map<String, List<Student>> byName = Arrays.stream(students).collect(groupingBy(Student::getName));
byName.forEach((x, y) -> System.out.println(x + "->" + y));
// Partition into two groups by whether the score is above 50; the key is a Boolean
Map<Boolean, List<Student>> byPassed = Arrays.stream(students).collect(partitioningBy(x -> x.getScore() > 50));
byPassed.forEach((x, y) -> System.out.println(x + "->" + y));
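
The snippets above use a Student type and a students array that the article never defines. The sketch below fills that gap with a hypothetical Student class (name and score only, an assumption) so that the toMap, groupingBy, and partitioningBy collectors can be run end to end.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CollectSketch {
    // Hypothetical Student type; the article's own definition is not shown.
    static final class Student {
        private final String name;
        private final int score;
        Student(String name, int score) { this.name = name; this.score = score; }
        String getName() { return name; }
        int getScore() { return score; }
    }

    // toMap with a merge function: scores of students sharing a name are summed.
    static Map<String, Integer> totalScoreByName(Student[] students) {
        return Arrays.stream(students)
                .collect(Collectors.toMap(Student::getName, Student::getScore, Integer::sum));
    }

    // groupingBy: one list of students per distinct name.
    static Map<String, List<Student>> groupByName(Student[] students) {
        return Arrays.stream(students).collect(Collectors.groupingBy(Student::getName));
    }

    // partitioningBy with a downstream collector: count students above and not above 50.
    static Map<Boolean, Long> countByPassed(Student[] students) {
        return Arrays.stream(students)
                .collect(Collectors.partitioningBy(s -> s.getScore() > 50, Collectors.counting()));
    }

    public static void main(String[] args) {
        Student[] students = {
            new Student("Ann", 40), new Student("Bob", 70), new Student("Ann", 30)
        };
        System.out.println(totalScoreByName(students).get("Ann"));    // prints 70
        System.out.println(groupByName(students).get("Ann").size());  // prints 2
        System.out.println(countByPassed(students).get(true));        // prints 1
    }
}
```

Integer::sum is used here as the merge function; it behaves the same as the (s, a) -> s + a lambda in the snippet above.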

4.2 count()

Counts the elements.

String[] arr = {"a", "b", "d", "c"};
System.out.println(Stream.of(arr).count());

4.3 forEach()

Performs an action on each element of the stream.

String[] arr = {"a", "b", "c", "d"};
Arrays.stream(arr).forEach(System.out::println); // a b c d
Arrays.stream(arr).parallel().forEach(System.out::println); // order not guaranteed, e.g. c d a b

4.4 forEachOrdered()

Performs an action on each element of the stream, in the stream's encounter order.

String[] arr = {"a", "b", "c", "d"};
Arrays.stream(arr).forEachOrdered(System.out::println);//abcd
Arrays.stream(arr).parallel().forEachOrdered(System.out::println);//abcd
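
To make the ordering guarantee observable without relying on console output, the sketch below collects the elements into a list instead of printing them. OrderSketch and its methods are hypothetical, and a synchronized list is used because forEach on a parallel stream may call the action from several threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class OrderSketch {
    // forEachOrdered keeps the encounter order even on a parallel stream.
    static List<String> viaForEachOrdered(String[] arr) {
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        Arrays.stream(arr).parallel().forEachOrdered(out::add);
        return out;
    }

    // forEach on a parallel stream visits every element, but in no particular order.
    static List<String> viaForEach(String[] arr) {
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        Arrays.stream(arr).parallel().forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        String[] arr = {"a", "b", "c", "d"};
        System.out.println(viaForEachOrdered(arr));  // prints [a, b, c, d]
        System.out.println(viaForEach(arr));         // same four elements, order may vary
    }
}
```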

4.5 max()

Returns the maximum element according to the given Comparator.

String[] arr = {"b","ab","abc","abcd","abcde"};
Stream.of(arr).max(Comparator.comparing(String::length)).ifPresent(System.out::println);

4.6 min()

Returns the minimum element according to the given Comparator.

String[] arr = {"b","ab","abc","abcd","abcde"};
Stream.of(arr).min(Comparator.comparing(String::length)).ifPresent(System.out::println);

4.7 reduce()

Combines the values of the Stream into a single final result using the given accumulation function.

Optional<Integer> optional = Stream.of(1, 2, 3).reduce((x, y) -> x + y);
System.out.println(optional.get());
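
Besides the one-argument form shown above, reduce() has a form that takes an identity value and a form that also takes a combiner. The sketch below shows all three; ReduceSketch and its method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class ReduceSketch {
    // Identity form: returns the identity (0) for an empty stream, so no Optional is needed.
    static int sum(List<Integer> xs) {
        return xs.stream().reduce(0, Integer::sum);
    }

    // One-argument form: returns an empty Optional when there are no elements.
    static Optional<Integer> max(List<Integer> xs) {
        return xs.stream().reduce(Integer::max);
    }

    // Three-argument form: the result type (Integer) differs from the element type (String);
    // the combiner merges partial results when the stream runs in parallel.
    static int totalLength(List<String> words) {
        return words.stream().reduce(0, (acc, w) -> acc + w.length(), Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3)));               // prints 6
        System.out.println(max(Collections.<Integer>emptyList()));    // prints Optional.empty
        System.out.println(totalLength(Arrays.asList("ab", "cde")));  // prints 5
    }
}
```

With the one-argument form, calling get() on the result throws for an empty stream, so orElse() or ifPresent() is the safer way to read it.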

4.8 toArray()

Collects the elements into an array.

String[] arr = new String[]{"b", "ab", "abc", "abcd", "abcde"};
String[] strings = Stream.of(arr).filter(x -> x.length() > 3).toArray(String[]::new);
System.out.println(Arrays.toString(strings));

http://cnblogs.com/andywithu/p/7404101.html
