Java Codility training: Genomic-range-query

[Title]: java codility training Genomic-range-query
[Posted]: 2013-10-23 21:35:28
[Question]:

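The task is:

A non-empty zero-indexed string S is given. String S consists of N characters from the set of upper-case English letters A, C, G, T.

This string actually represents a DNA sequence, and the upper-case letters represent single nucleotides.

You are also given non-empty zero-indexed arrays P and Q consisting of M integers. These arrays represent queries about minimal nucleotides. We represent the letters of string S as the integers 1, 2, 3, 4 in arrays P and Q, where A = 1, C = 2, G = 3, T = 4, and we assume that A < C < G < T.

Query K requires you to find the minimal nucleotide in the range (P[K], Q[K]), where 0 ≤ P[i] ≤ Q[i] < N.

For example, consider string S = GACACCATA and arrays P, Q such that: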

P[0] = 0    Q[0] = 8
P[1] = 0    Q[1] = 2
P[2] = 4    Q[2] = 5
P[3] = 7    Q[3] = 7

The minimal nucleotides from these ranges are as follows:

    (0, 8) is A identified by 1,
    (0, 2) is A identified by 1,
    (4, 5) is C identified by 2,
    (7, 7) is T identified by 4.

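Write a function:

class Solution { public int[] solution(String S, int[] P, int[] Q); }

that, given a non-empty zero-indexed string S consisting of N characters and two non-empty zero-indexed arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.

The sequence should be returned as: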

    a Results structure (in C), or
    a vector of integers (in C++), or
    a Results record (in Pascal), or
    an array of integers (in any other programming language).

For example, given the string S = GACACCATA and arrays P, Q such that:

P[0] = 0    Q[0] = 8
P[1] = 0    Q[1] = 2
P[2] = 4    Q[2] = 5
P[3] = 7    Q[3] = 7

the function should return the values [1, 1, 2, 4], as explained above.

Assume that:

    N is an integer within the range [1..100,000];
    M is an integer within the range [1..50,000];
    each element of array P, Q is an integer within the range [0..N − 1];
    P[i] ≤ Q[i];
    string S consists only of upper-case English letters A, C, G, T.

Complexity:

    expected worst-case time complexity is O(N+M);
    expected worst-case space complexity is O(N), 
         beyond input storage 
         (not counting the storage required for input arguments).

Elements of input arrays can be modified.

My solution is:

class Solution {
    public int[] solution(String S, int[] P, int[] Q) {
        final char c[] = S.toCharArray();
        final int answer[] = new int[P.length];
        int tempAnswer;
        char tempC;

        for (int iii = 0; iii < P.length; iii++) {
            tempAnswer = 4;
            for (int zzz = P[iii]; zzz <= Q[iii]; zzz++) {
                tempC = c[zzz];
                if (tempC == 'A') {
                    tempAnswer = 1;
                    break;
                } else if (tempC == 'C') {
                    if (tempAnswer > 2) {
                        tempAnswer = 2;
                    }
                } else if (tempC == 'G') {
                    if (tempAnswer > 3) {
                        tempAnswer = 3;
                    }
                }
            }
            answer[iii] = tempAnswer;
        }

        return answer;
    }
}

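This is not optimal. I believe it should be done within a single loop; any hints on how I can achieve that?

You can check the quality of your solution at https://codility.com/train/ (the test name is Genomic-range-query).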

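[Question discussion]:

This may not be the right place for this question. Try Code Review.
For future reference, this problem is known as Range Minimum Query; as the answer shows, you can handle range queries in O(1) given O(N) preprocessing.

[Answer 1]:

This is a solution that scored 100 out of 100 on codility.com. Please read about prefix sums to understand it: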

public static int[] solveGenomicRange(String S, int[] P, int[] Q) {
    // a jagged array holds the prefix sums for each of the A, C and G nucleotides;
    // we don't need prefix sums for T, you will see why.
    int[][] genoms = new int[3][S.length()+1];
    // if the char is found at index i, its flag is set to 1, otherwise it stays 0;
    // 3 short values are needed for this reason
    short a, c, g;
    for (int i=0; i<S.length(); i++) {
        a = 0; c = 0; g = 0;
        if ('A' == (S.charAt(i))) {
            a=1;
        }
        if ('C' == (S.charAt(i))) {
            c=1;
        }
        if ('G' == (S.charAt(i))) {
            g=1;
        }
        // here we calculate prefix sums. To learn what prefix sums are, look here: https://codility.com/media/train/3-PrefixSums.pdf
        genoms[0][i+1] = genoms[0][i] + a;
        genoms[1][i+1] = genoms[1][i] + c;
        genoms[2][i+1] = genoms[2][i] + g;
    }

    int[] result = new int[P.length];
    // here we go through the provided P[] and Q[] arrays as intervals
    for (int i=0; i<P.length; i++) {
        int fromIndex = P[i];
        // we need to add 1 to Q[i],
        // because genoms[0][0], genoms[1][0] and genoms[2][0]
        // are 0 by default; see genoms[0][i+1] = genoms[0][i] + a; above
        int toIndex = Q[i]+1;
        if (genoms[0][toIndex] - genoms[0][fromIndex] > 0) {
            result[i] = 1;
        } else if (genoms[1][toIndex] - genoms[1][fromIndex] > 0) {
            result[i] = 2;
        } else if (genoms[2][toIndex] - genoms[2][fromIndex] > 0) {
            result[i] = 3;
        } else {
            result[i] = 4;
        }
    }

    return result;
}
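
To make the off-by-one in toIndex concrete, here is a minimal sketch that builds the prefix-sum row for a single nucleotide and counts its occurrences in an inclusive range. The names PrefixSumDemo, prefixCounts and countInRange are illustrative assumptions and are not part of the answer above.

```java
final class PrefixSumDemo {
    // prefix[i] = number of occurrences of `nucleotide` in s[0..i-1];
    // prefix[0] is always 0, which is why the array has length + 1 entries.
    static int[] prefixCounts(String s, char nucleotide) {
        int[] prefix = new int[s.length() + 1];
        for (int i = 0; i < s.length(); i++) {
            prefix[i + 1] = prefix[i] + (s.charAt(i) == nucleotide ? 1 : 0);
        }
        return prefix;
    }

    // Occurrences in the inclusive range [from, to]: the count up to and
    // including `to` (stored at to + 1) minus the count before `from`.
    static int countInRange(int[] prefix, int from, int to) {
        return prefix[to + 1] - prefix[from];
    }

    public static void main(String[] args) {
        int[] a = prefixCounts("GACACCATA", 'A');
        System.out.println(java.util.Arrays.toString(a)); // [0, 0, 1, 1, 2, 2, 2, 3, 3, 4]
        System.out.println(countInRange(a, 4, 5));        // 0: no A in "CC"
        System.out.println(countInRange(a, 0, 2));        // 1: one A in "GAC"
    }
}
```

A positive count for A settles a query immediately; only when it is zero do the C and G rows need to be consulted, and T is what remains when all three are zero.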
    

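[Discussion]:

This is a clever solution. Although Codility hints at using prefix sums, it was hard to see how to apply them to this case. Thanks.
The comments make it easy to understand :)
Typo. fromIndex = P[i]+1; //+1 is not required
Nice solution :), though I don't understand why toIndex = Q[i] + 1
gmuhammad, I added comments above: we have to add 1 because our jagged array holds zeros at genoms[n][0] by default (the prefix sums start from zero). So the jagged array is one element larger to make room for those zeros: int[][] genoms = new int[3][S.length()+1]; (S.length()+1, the length plus 1, to keep the default zeros)

[Answer 2]:

A simple, elegant, domain-specific 100/100 solution in JS, with comments!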

function solution(S, P, Q) {
    var N = S.length, M = P.length;

    // dictionary to map nucleotide to impact factor
    var impact = {A : 1, C : 2, G : 3, T : 4};

    // nucleotide total count in DNA
    var currCounter = {A : 0, C : 0, G : 0, T : 0};

    // how many times each nucleotide has occurred by the time we reach S[i]
    var counters = [];

    // result
    var minImpact = [];

    var i;

    // count nucleotides; counters ends up with N + 1 entries
    for(i = 0; i <= N; i++) {
        counters.push({A: currCounter.A, C: currCounter.C, G: currCounter.G});
        if (i < N) {
            currCounter[S[i]]++;
        }
    }

    // for every query
    for(i = 0; i < M; i++) {
        var from = P[i], to = Q[i] + 1;

        // compare count of A at the start of query with count at the end of query
        // if counter was changed then query contains A
        if(counters[to].A - counters[from].A > 0) {
            minImpact.push(impact.A);
        }
        // same thing for C and the other nucleotides with a higher impact factor
        else if(counters[to].C - counters[from].C > 0) {
            minImpact.push(impact.C);
        }
        else if(counters[to].G - counters[from].G > 0) {
            minImpact.push(impact.G);
        }
        else { // one of the counters MUST have changed, so it's T
            minImpact.push(impact.T);
        }
    }

    return minImpact;
}

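[Discussion]:

[Answer 3]:

Java, 100/100, but without cumulative/prefix sums! I stash the index of the last occurrence of each of the 3 lower nucleotides in the array "map". Later I check whether that last index lies between P and Q. If it does, that nucleotide is returned; if none is found, the answer is the top one (T):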

class Solution {

    int[][] lastOccurrencesMap;

    public int[] solution(String S, int[] P, int[] Q) {
        int N = S.length();
        int M = P.length;

        int[] result = new int[M];
        lastOccurrencesMap = new int[3][N];
        int lastA = -1;
        int lastC = -1;
        int lastG = -1;

        for (int i = 0; i < N; i++) {
            char c = S.charAt(i);

            if (c == 'A') {
                lastA = i;
            } else if (c == 'C') {
                lastC = i;
            } else if (c == 'G') {
                lastG = i;
            }

            lastOccurrencesMap[0][i] = lastA;
            lastOccurrencesMap[1][i] = lastC;
            lastOccurrencesMap[2][i] = lastG;
        }

        for (int i = 0; i < M; i++) {
            int startIndex = P[i];
            int endIndex = Q[i];

            int minimum = 4;
            for (int n = 0; n < 3; n++) {
                int lastOccurence = getLastNucleotideOccurrence(startIndex, endIndex, n);
                if (lastOccurence != 0) {
                    minimum = n + 1;
                    break;
                }
            }

            result[i] = minimum;
        }
        return result;
    }

    int getLastNucleotideOccurrence(int startIndex, int endIndex, int nucleotideIndex) {
        int[] lastOccurrences = lastOccurrencesMap[nucleotideIndex];
        int endValueLastOccurenceIndex = lastOccurrences[endIndex];
        if (endValueLastOccurenceIndex >= startIndex) {
            return nucleotideIndex + 1;
        } else {
            return 0;
        }
    }
}
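
As a small worked trace of the last-occurrence idea, the sketch below builds the array for a single nucleotide on the sample string and checks two of the sample ranges. The names LastOccurrenceDemo, lastOccurrences and occursInRange are made up for illustration and do not appear in the answer.

```java
final class LastOccurrenceDemo {
    // last[i] = index of the most recent `nucleotide` at or before position i,
    // or -1 if it has not been seen yet.
    static int[] lastOccurrences(String s, char nucleotide) {
        int[] last = new int[s.length()];
        int seen = -1;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == nucleotide) {
                seen = i;
            }
            last[i] = seen;
        }
        return last;
    }

    // The inclusive range [from, to] contains the nucleotide exactly when its
    // last occurrence up to `to` is not before `from` (-1 is always before).
    static boolean occursInRange(int[] last, int from, int to) {
        return last[to] >= from;
    }

    public static void main(String[] args) {
        int[] lastA = lastOccurrences("GACACCATA", 'A');
        System.out.println(java.util.Arrays.toString(lastA)); // [-1, 1, 1, 3, 3, 3, 6, 6, 8]
        System.out.println(occursInRange(lastA, 4, 5));       // false: the last A before index 5 is at 3
        System.out.println(occursInRange(lastA, 0, 2));       // true: A at index 1
    }
}
```

Each query therefore costs at most three array lookups, so the overall running time stays at O(N + M) without any subtraction of counts.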
    


[Discussion]:

[Answer 4]:

Here is the solution, in case anyone is still interested.

class Solution {
    public int[] solution(String S, int[] P, int[] Q) {
        int[] answer = new int[P.length];
        char[] chars = S.toCharArray();
        int[][] cumulativeAnswers = new int[4][chars.length + 1];

        for (int iii = 0; iii < chars.length; iii++) {
            if (iii > 0) {
                for (int zzz = 0; zzz < 4; zzz++) {
                    cumulativeAnswers[zzz][iii + 1] = cumulativeAnswers[zzz][iii];
                }
            }

            switch (chars[iii]) {
                case 'A':
                    cumulativeAnswers[0][iii + 1]++;
                    break;
                case 'C':
                    cumulativeAnswers[1][iii + 1]++;
                    break;
                case 'G':
                    cumulativeAnswers[2][iii + 1]++;
                    break;
                case 'T':
                    cumulativeAnswers[3][iii + 1]++;
                    break;
            }
        }

        for (int iii = 0; iii < P.length; iii++) {
            for (int zzz = 0; zzz < 4; zzz++) {

                if ((cumulativeAnswers[zzz][Q[iii] + 1] - cumulativeAnswers[zzz][P[iii]]) > 0) {
                    answer[iii] = zzz + 1;
                    break;
                }

            }
        }

        return answer;
    }
}
        
    

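[Discussion]:

The first part of the algorithm builds a count matrix with one row per symbol x and one column per input character y. The second part is the key to the trick. I am trying to understand if ((cumulativeAnswers[zzz][Q[iii] + 1] - cumulativeAnswers[zzz][P[iii]]) > 0) { answer[iii] = zzz + 1; break; } Why does this work?
Maybe I get it: each column holds the number of occurrences of each symbol so far. When the difference between two elements of the same row is positive, that symbol was added somewhere between those two positions. So the first positive difference we find gives us the minimum!

[Answer 5]:

If anyone cares about C: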

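#include <stdlib.h>
#include <string.h>

struct Results solution(char *S, int P[], int Q[], int M) {
    int i, b, N, prevA, prevC, prevG, *pA, *pC, *pG;
    struct Results result;

    result.A = malloc(sizeof(int) * M);
    result.M = M;

    // calculate prefix sums: pX[i] counts the occurrences of X in S[0..i]
    N = strlen(S);
    pA = malloc(sizeof(int) * N);
    pC = malloc(sizeof(int) * N);
    pG = malloc(sizeof(int) * N);
    pA[0] = S[0] == 'A' ? 1 : 0;
    pC[0] = S[0] == 'C' ? 1 : 0;
    pG[0] = S[0] == 'G' ? 1 : 0;
    for (i = 1; i < N; i++) {
        pA[i] = pA[i - 1] + (S[i] == 'A' ? 1 : 0);
        pC[i] = pC[i - 1] + (S[i] == 'C' ? 1 : 0);
        pG[i] = pG[i - 1] + (S[i] == 'G' ? 1 : 0);
    }

    for (i = 0; i < M; i++) {
        b = Q[i];
        // counts before the range start; zero when the range starts at index 0,
        // which avoids reading pA[-1], pC[-1] and pG[-1]
        prevA = P[i] > 0 ? pA[P[i] - 1] : 0;
        prevC = P[i] > 0 ? pC[P[i] - 1] : 0;
        prevG = P[i] > 0 ? pG[P[i] - 1] : 0;

        if ((pA[b] - prevA) > 0) {
            result.A[i] = 1;
        } else if ((pC[b] - prevC) > 0) {
            result.A[i] = 2;
        } else if ((pG[b] - prevG) > 0) {
            result.A[i] = 3;
        } else {
            result.A[i] = 4;
        }
    }

    // release the temporary prefix-sum arrays
    free(pA);
    free(pC);
    free(pG);

    return result;
}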

[Discussion]:

[Answer 6]:

Here is my solution using a segment tree: O(N) to build the tree plus O(log N) per query, i.e. O(N + M log N) time overall.

public class DNAseq {

    public static void main(String[] args) {
        String S="CAGCCTA";
        int[] P={2, 5, 0};
        int[] Q={4, 5, 6};
        int [] results=solution(S,P,Q);
        System.out.println(results[0]);
    }

    static class segmentNode {
        int l;
        int r;
        int min;
        segmentNode left;
        segmentNode right;
    }

    public static segmentNode buildTree(int[] arr,int l,int r){
        if(l==r){
            segmentNode n=new segmentNode();
            n.l=l;
            n.r=r;
            n.min=arr[l];
            return n;
        }
        int mid=l+(r-l)/2;
        segmentNode le=buildTree(arr,l,mid);
        segmentNode re=buildTree(arr,mid+1,r);
        segmentNode root=new segmentNode();
        root.left=le;
        root.right=re;
        root.l=le.l;
        root.r=re.r;

        root.min=Math.min(le.min,re.min);

        return root;
    }

    public static int getMin(segmentNode root,int l,int r){
        if(root.l>r || root.r<l){
            return Integer.MAX_VALUE;
        }
        if(root.l>=l&& root.r<=r) {
            return root.min;
        }
        return Math.min(getMin(root.left,l,r),getMin(root.right,l,r));
    }

    public static int[] solution(String S, int[] P, int[] Q) {
        int[] arr=new int[S.length()];
        for(int i=0;i<S.length();i++){
            switch (S.charAt(i)) {
            case 'A':
                arr[i]=1;
                break;
            case 'C':
                arr[i]=2;
                break;
            case 'G':
                arr[i]=3;
                break;
            case 'T':
                arr[i]=4;
                break;
            default:
                break;
            }
        }

        segmentNode root=buildTree(arr,0,S.length()-1);
        int[] result=new int[P.length];
        for(int i=0;i<P.length;i++){
            result[i]=getMin(root,P[i],Q[i]);
        }
        return result;
    }
}
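
For comparison, here is a sketch of the same range-minimum idea using a flat array instead of linked nodes. This is a swapped-in, bottom-up layout and not the answer's implementation; the class name SegmentTreeDemo and its methods are hypothetical. It assumes a non-empty string containing only A, C, G and T, as the task guarantees.

```java
final class SegmentTreeDemo {
    private final int n;
    // tree[n..2n-1] hold the leaves; tree[i] = min(tree[2i], tree[2i+1]) for 1 <= i < n.
    private final int[] tree;

    SegmentTreeDemo(int[] values) {
        n = values.length;
        tree = new int[2 * n];
        System.arraycopy(values, 0, tree, n, n);
        for (int i = n - 1; i >= 1; i--) {
            tree[i] = Math.min(tree[2 * i], tree[2 * i + 1]);
        }
    }

    // Minimum over the inclusive range [from, to]; visits O(log n) nodes.
    int min(int from, int to) {
        int result = Integer.MAX_VALUE;
        int lo = from + n;
        int hi = to + n + 1; // exclusive upper bound in leaf coordinates
        while (lo < hi) {
            if ((lo & 1) == 1) {
                result = Math.min(result, tree[lo++]);
            }
            if ((hi & 1) == 1) {
                result = Math.min(result, tree[--hi]);
            }
            lo >>= 1;
            hi >>= 1;
        }
        return result;
    }

    static int[] solution(String s, int[] p, int[] q) {
        int[] impact = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            impact[i] = "ACGT".indexOf(s.charAt(i)) + 1; // A=1, C=2, G=3, T=4
        }
        SegmentTreeDemo tree = new SegmentTreeDemo(impact);
        int[] result = new int[p.length];
        for (int i = 0; i < p.length; i++) {
            result[i] = tree.min(p[i], q[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] r = solution("CAGCCTA", new int[]{2, 5, 0}, new int[]{4, 5, 6});
        System.out.println(java.util.Arrays.toString(r)); // [2, 4, 1]
    }
}
```

Either layout answers each query in O(log N), so the total is O(N + M log N); that is slightly above the O(N + M) the task asks for, although it works for any value alphabet rather than just four letters.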
 

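[Discussion]:

I like this idea; this problem is a great fit for a SegmentTree.
I wonder what your performance result was for "extreme_large - all max range".
SegmentTree really is a perfect idea for this problem. Thanks for pointing it out.

[Answer 7]:

Here is my solution. It got 100%. Of course, I first had to look up and study prefix sums a little.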

public int[] solution(String S, int[] P, int[] Q){

    int[] result = new int[P.length];

    int[] factor1 = new int[S.length()];
    int[] factor2 = new int[S.length()];
    int[] factor3 = new int[S.length()];
    int[] factor4 = new int[S.length()];

    int factor1Sum = 0;
    int factor2Sum = 0;
    int factor3Sum = 0;
    int factor4Sum = 0;

    for(int i=0; i<S.length(); i++){
        switch (S.charAt(i)) {
        case 'A':
            factor1Sum++;
            break;
        case 'C':
            factor2Sum++;
            break;
        case 'G':
            factor3Sum++;
            break;
        case 'T':
            factor4Sum++;
            break;
        default:
            break;
        }
        factor1[i] = factor1Sum;
        factor2[i] = factor2Sum;
        factor3[i] = factor3Sum;
        factor4[i] = factor4Sum;
    }

    for(int i=0; i<P.length; i++){

        int start = P[i];
        int end = Q[i];

        if(start == 0){
            if(factor1[end] > 0){
                result[i] = 1;
            }else if(factor2[end] > 0){
                result[i] = 2;
            }else if(factor3[end] > 0){
                result[i] = 3;
            }else{
                result[i] = 4;
            }
        }else{
            if(factor1[end] > factor1[start-1]){
                result[i] = 1;
            }else if(factor2[end] > factor2[start-1]){
                result[i] = 2;
            }else if(factor3[end] > factor3[start-1]){
                result[i] = 3;
            }else{
                result[i] = 4;
            }
        }

    }

    return result;
}
    

[Discussion]:

[Answer 8]:

Here is my JavaScript solution, which scored 100% on Codility:

function solution(S, P, Q) {
    let total = [];
    let min;

    for (let i = 0; i < P.length; i++) {
        const substring = S.slice(P[i], Q[i] + 1);
        if (substring.includes('A')) {
            min = 1;
        } else if (substring.includes('C')) {
            min = 2;
        } else if (substring.includes('G')) {
            min = 3;
        } else if (substring.includes('T')) {
            min = 4;
        }
        total.push(min);
    }
    return total;
}

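[Discussion]:

We have to check the time complexity of the String.includes() method. If it is O(String.length), then this solution is O(P * S), which is not ideal. I am also unsure about the time complexity of String.slice().
The time complexity of this isn't great, but I'm surprised you can get 100% on Codility. I'll keep slice and includes in mind for the future, lol

[Answer 9]:

Here is a C# solution. The basic idea is much the same as in the other answers, but it may be cleaner: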

using System;

class Solution
{
    public int[] solution(string S, int[] P, int[] Q)
    {
        int N = S.Length;
        int M = P.Length;
        char[] chars = {'A','C','G','T'};

        //Calculate accumulates
        int[,] accum = new int[3, N+1];
        for (int i = 0; i <= 2; i++)
        {
            for (int j = 0; j < N; j++)
            {
                if(S[j] == chars[i]) accum[i, j+1] = accum[i, j] + 1;
                else accum[i, j+1] = accum[i, j];
            }
        }

        //Get minimal nucleotides for the given ranges
        int diff;
        int[] minimums = new int[M];
        for (int i = 0; i < M; i++)
        {
            minimums[i] = 4;
            for (int j = 0; j <= 2; j++)
            {
                diff = accum[j, Q[i]+1] - accum[j, P[i]];
                if (diff > 0)
                {
                    minimums[i] = j+1;
                    break;
                }
            }
        }

        return minimums;
    }
}
    

[Discussion]:

Welcome to SO. Don't just post code. You could give pseudocode, since the question is about Java, or you could explain the algorithm. Good luck. :)

[Answer 10]:
import java.util.Arrays;
import java.util.HashMap;

class Solution {

    static HashMap<Character, Integer> characterMapping = new HashMap<Character, Integer>() {{
        put('A',1);
        put('C',2);
        put('G',3);
        put('T',4);
    }};

    public static int minimum(int[] arr) {

        if (arr.length ==1) return arr[0];

        int smallestIndex = 0;
        for (int index = 0; index<arr.length; index++) {
            if (arr[index]<arr[smallestIndex]) smallestIndex=index;
        }
        return arr[smallestIndex];
    }

    public int[] solution(String S, int[] P, int[] Q) {
        final char[] characterInput = S.toCharArray();
        final int[] integerInput = new int[characterInput.length];

        for(int counter=0; counter < characterInput.length; counter++) {
            integerInput[counter] = characterMapping.get(characterInput[counter]);
        }

        int[] result = new int[P.length];

        //assuming P and Q have the same length
        for(int index =0; index<P.length; index++) {

            if (P[index]==Q[index]) {
                result[index] = integerInput[P[index]];
                // continue (not break), so the remaining queries are still answered
                continue;
            }
            final int[] subArray = Arrays.copyOfRange(integerInput, P[index], Q[index]+1);
            final int minimumValue = minimum(subArray);
            result[index]= minimumValue;
        }
        return result;
    }
}
    

[Discussion]:

[Answer 11]:

Here is a 100% Scala solution:

def solution(S: String, P: Array[Int], Q: Array[Int]): Array[Int] = {

    val resp = for (ind <- 0 to P.length - 1) yield {

      val sub = S.substring(P(ind), Q(ind) + 1)

      var factor = 4

      if (sub.contains("A")) factor = 1
      else if (sub.contains("C")) factor = 2
      else if (sub.contains("G")) factor = 3

      factor
    }

    return resp.toArray
}

  

And the performance: https://codility.com/demo/results/trainingEUR4XP-425/

[Discussion]:

[Answer 12]:

Hope this helps.

public int[] solution(String S, int[] P, int[] K) {
    // write your code in Java SE 8
    char[] sc = S.toCharArray();
    int[] A = new int[sc.length];
    int[] G = new int[sc.length];
    int[] C = new int[sc.length];

    int prevA =-1,prevG=-1,prevC=-1;

    for(int i=0;i<sc.length;i++){
        if(sc[i]=='A') {
            prevA=i;
        } else if(sc[i] == 'G') {
            prevG=i;
        } else if(sc[i] =='C') {
            prevC=i;
        }
        A[i] = prevA;
        G[i] = prevG;
        C[i] = prevC;
        //System.out.println(A[i]+ " "+G[i]+" "+C[i]);
    }
    int[] result = new int[P.length];

    for(int i=0;i<P.length;i++){
        //System.out.println(A[P[i]]+ " "+A[K[i]]+" "+C[P[i]]+" "+C[K[i]]+" "+P[i]+" "+K[i]);

        if(A[K[i]] >=P[i] && A[K[i]] <=K[i]){
            result[i] =1;
        }
        else if(C[K[i]] >=P[i] && C[K[i]] <=K[i]) {
            result[i] =2;
        } else if(G[K[i]] >=P[i] && G[K[i]] <=K[i]){
            result[i] =3;
        }
        else{
            result[i]=4;
        }
    }

    return result;
}
    

[Discussion]:

[Answer 13]:

In case anyone is still interested in this exercise, here is my Python solution (100/100 on Codility):

def solution(S, P, Q):

    count = []
    for i in range(3):
        count.append([0]*(len(S)+1))

    for index, i in enumerate(S):
        count[0][index+1] = count[0][index] + ( i =='A')
        count[1][index+1] = count[1][index] + ( i =='C')
        count[2][index+1] = count[2][index] + ( i =='G')

    result = []

    for i in range(len(P)):
      start = P[i]
      end = Q[i]+1

      if count[0][end] - count[0][start]:
          result.append(1)
      elif count[1][end] - count[1][start]:
          result.append(2)
      elif count[2][end] - count[2][start]:
          result.append(3)
      else:
          result.append(4)

    return result

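[Discussion]:

Finally, a concise Python solution! Thanks.

[Answer 14]:

Python solution with explanation

The idea is to keep, for each nucleotide X, an auxiliary array in which position i (ignoring position zero) holds the number of occurrences of X so far. So if we need the number of occurrences of X from position f to position t, we can use the following equation:

aux(t) - aux(f)

In the code below, f is P[i] and t is Q[i] + 1.

The time complexity is:

O(N+M)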

def solution(S, P, Q):
    n = len(S)
    m = len(P)

    aux = [[0 for i in range(n+1)] for i in [0,1,2]]

    for i,c in enumerate(S):
        aux[0][i+1] = aux[0][i] + ( c == 'A' )
        aux[1][i+1] = aux[1][i] + ( c == 'C' )
        aux[2][i+1] = aux[2][i] + ( c == 'G' )

    result = []

    for i in range(m):
        fromIndex , toIndex = P[i] , Q[i] +1
        if   aux[0][toIndex] - aux[0][fromIndex] > 0:
            r = 1
        elif aux[1][toIndex] - aux[1][fromIndex] > 0:
            r = 2
        elif aux[2][toIndex] - aux[2][fromIndex] > 0:
            r = 3
        else:
            r = 4

        result.append(r)

    return result

[Discussion]:

[Answer 15]:

Here is a Swift 4 solution to the same problem. It is based on @codebusta's solution above:

public func solution(_ S : inout String, _ P : inout [Int], _ Q : inout [Int]) -> [Int] {
    var impacts = [Int]()
    var prefixSum = [[Int]]()
    for _ in 0..<3 {
        let array = Array(repeating: 0, count: S.count + 1)
        prefixSum.append(array)
    }

    for (index, character) in S.enumerated() {
        var a = 0
        var c = 0
        var g = 0

        switch character {
        case "A":
            a = 1

        case "C":
            c = 1

        case "G":
            g = 1

        default:
            break
        }

        prefixSum[0][index + 1] = prefixSum[0][index] + a
        prefixSum[1][index + 1] = prefixSum[1][index] + c
        prefixSum[2][index + 1] = prefixSum[2][index] + g
    }

    for tuple in zip(P, Q) {
        if prefixSum[0][tuple.1 + 1] - prefixSum[0][tuple.0] > 0 {
            impacts.append(1)
        } else if prefixSum[1][tuple.1 + 1] - prefixSum[1][tuple.0] > 0 {
            impacts.append(2)
        } else if prefixSum[2][tuple.1 + 1] - prefixSum[2][tuple.0] > 0 {
            impacts.append(3)
        } else {
            impacts.append(4)
        }
    }

    return impacts
}
 

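[Discussion]:

[Answer 16]:

pshemek's solution keeps itself within the space complexity limit (O(N)), even with the 2-d array and the answer array, because a constant (4) is used for the 2-d array. That solution also meets the required computational complexity, whereas mine is O(N^2), although the actual computational cost is much lower because it skips over entire ranges that include the minimum value.

I gave it a try, but mine ended up using more space; it is more intuitive to me, though (C#):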

public static int[] solution(String S, int[] P, int[] Q)
{
    const int MinValue = 1;
    Dictionary<char, int> stringValueTable = new Dictionary<char,int>() { {'A', 1}, {'C', 2}, {'G', 3}, {'T', 4} };

    char[] inputArray = S.ToCharArray();
    int[,] minRangeTable = new int[S.Length, S.Length]; // The meaning of this table is [x, y] where x is the start index and y is the end index and the value is the min range - if 0 then it is the min range (whatever that is)
    for (int startIndex = 0; startIndex < S.Length; ++startIndex)
    {
        int currentMinValue = 4;
        int minValueIndex = -1;
        for (int endIndex = startIndex; (endIndex < S.Length) && (minValueIndex == -1); ++endIndex)
        {
            int currentValue = stringValueTable[inputArray[endIndex]];
            if (currentValue < currentMinValue)
            {
                currentMinValue = currentValue;
                if (currentMinValue == MinValue) // We can stop iterating - because anything with this index in its range will always be minimal
                    minValueIndex = endIndex;
                else
                    minRangeTable[startIndex, endIndex] = currentValue;
            }
            else
                minRangeTable[startIndex, endIndex] = currentValue;
        }

        if (minValueIndex != -1) // Skip over this index - since it is minimal
            startIndex = minValueIndex; // We would have a "+ 1" here - but the "auto-increment" in the for statement will get us past this index
    }

    int[] result = new int[P.Length];
    for (int outputIndex = 0; outputIndex < result.Length; ++outputIndex)
    {
        result[outputIndex] = minRangeTable[P[outputIndex], Q[outputIndex]];
        if (result[outputIndex] == 0) // We could avoid this if we initialized our 2-d array with 1's
            result[outputIndex] = 1;
    }

    return result;
}

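In pshemek's answer, the "trick" in the second loop is simply that once you have found a range containing the minimal value, you don't need to keep iterating. Not sure if that helps.

[Discussion]:

[Answer 17]:

PHP 100/100 solution: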

function solution($S, $P, $Q) {
    $S      = str_split($S);
    $len    = count($S);
    $lep    = count($P);
    $arr    = array();
    $result = array();
    $clone  = array_fill(0, 4, 0);
    for($i = 0; $i < $len; $i++){
        $arr[$i] = $clone;
        switch($S[$i]){
            case 'A':
                $arr[$i][0] = 1;
                break;
            case 'C':
                $arr[$i][1] = 1;
                break;
            case 'G':
                $arr[$i][2] = 1;
                break;
            default:
                $arr[$i][3] = 1;
                break;
        }
    }
    for($i = 1; $i < $len; $i++){
        for($j = 0; $j < 4; $j++){
            $arr[$i][$j] += $arr[$i - 1][$j];
        }
    }
    for($i = 0; $i < $lep; $i++){
        $x = $P[$i];
        $y = $Q[$i];
        for($a = 0; $a < 4; $a++){
            $sub = 0;
            if($x - 1 >= 0){
                $sub = $arr[$x - 1][$a];
            }
            if($arr[$y][$a] - $sub > 0){
                $result[$i] = $a + 1;
                break;
            }
        }
    }
    return $result;
}

[Discussion]:

[Answer 18]:

This program scored 100 and, performance-wise, has an edge over the other Java code listed above!

The code can be found here.

import java.util.Arrays;

public class GenomicRange {

    final int Index_A=0, Index_C=1, Index_G=2, Index_T=3;
    final int A=1, C=2, G=3, T=4;

    public static void main(String[] args) {

        GenomicRange gen = new GenomicRange();
        int[] M = gen.solution( "GACACCATA", new int[] { 0,0,4,7 } , new int[] { 8,2,5,7 } );
        System.out.println(Arrays.toString(M));
    }

    public int[] solution(String S, int[] P, int[] Q) {

        int[] M = new int[P.length];
        char[] charArr = S.toCharArray();
        int[][] occCount = new int[3][S.length()+1];

        int charInd = getChar(charArr[0]);

        if(charInd!=3) {
            occCount[charInd][1]++;
        }

        for(int sInd=1; sInd<S.length(); sInd++) {

            charInd = getChar(charArr[sInd]);

            if(charInd!=3) {
                occCount[charInd][sInd+1]++;
            }

            occCount[Index_A][sInd+1]+=occCount[Index_A][sInd];
            occCount[Index_C][sInd+1]+=occCount[Index_C][sInd];
            occCount[Index_G][sInd+1]+=occCount[Index_G][sInd];
        }

        for(int i=0;i<P.length;i++) {

            int a,c,g;

            if(Q[i]+1>=occCount[0].length) continue;

            a =  occCount[Index_A][Q[i]+1] - occCount[Index_A][P[i]];
            c =  occCount[Index_C][Q[i]+1] - occCount[Index_C][P[i]];
            g =  occCount[Index_G][Q[i]+1] - occCount[Index_G][P[i]];

            M[i] = a>0? A : c>0 ? C : g>0 ? G : T;
        }

        return M;
    }

    private int getChar(char c) {

        return ((c=='A') ? Index_A : ((c=='C') ? Index_C : ((c=='G') ? Index_G : Index_T)));
    }
}


[Discussion]:

[Answer 19]:

Here is a simple 100% JavaScript solution.

function solution(S, P, Q) {
    var A = [];
    var C = [];
    var G = [];
    var T = [];
    var result = [];
    var i = 0;

    S.split('').forEach(function(a) {
        if (a === 'A') {
            A.push(i);
        } else if (a === 'C') {
            C.push(i);
        } else if (a === 'G') {
            G.push(i);
        } else {
            T.push(i);
        }

        i++;
    });

    function hasNucl(typeArray, start, end) {
        return typeArray.some(function(a) {
            return a >= P[j] && a <= Q[j];
        });
    }

    for(var j=0; j<P.length; j++) {
        if (hasNucl(A, P[j], P[j])) {
            result.push(1)
        } else if (hasNucl(C, P[j], P[j])) {
            result.push(2);
        } else if (hasNucl(G, P[j], P[j])) {
            result.push(3);
        } else {
            result.push(4);
        }
    }

    return result;
}
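
The answer above stores the positions of each nucleotide and scans them for every query. Because those positions are collected in increasing order, the range check can instead be done with a binary search. The Java sketch below illustrates that variation; it is a swapped-in technique rather than the answer's own method, and the names PositionIndexDemo, positionsOf and anyInRange are hypothetical.

```java
final class PositionIndexDemo {
    // Sorted indices at which `nucleotide` occurs in s.
    static int[] positionsOf(String s, char nucleotide) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == nucleotide) count++;
        }
        int[] positions = new int[count];
        int next = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == nucleotide) positions[next++] = i;
        }
        return positions;
    }

    // True if some stored position lies in the inclusive range [from, to].
    // Binary search finds the first position >= from, then compares it with `to`.
    static boolean anyInRange(int[] sortedPositions, int from, int to) {
        int lo = 0;
        int hi = sortedPositions.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedPositions[mid] < from) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo < sortedPositions.length && sortedPositions[lo] <= to;
    }

    static int[] solution(String s, int[] p, int[] q) {
        int[][] positions = { positionsOf(s, 'A'), positionsOf(s, 'C'), positionsOf(s, 'G') };
        int[] result = new int[p.length];
        for (int i = 0; i < p.length; i++) {
            result[i] = 4; // T unless a lower-impact nucleotide is found
            for (int k = 0; k < 3; k++) {
                if (anyInRange(positions[k], p[i], q[i])) {
                    result[i] = k + 1;
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] r = solution("GACACCATA", new int[]{0, 0, 4, 7}, new int[]{8, 2, 5, 7});
        System.out.println(java.util.Arrays.toString(r)); // [1, 1, 2, 4]
    }
}
```

This brings each query down to O(log N), for O(N + M log N) overall; the prefix-sum answers remain the way to reach the O(N + M) bound the task states.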

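[Discussion]:

I think your code contains some mistakes and compile errors. For example, you call hasNucl with P[j] as both the second and third arguments. Also, inside that function's declaration you use j, which is not defined in that scope.

[Answer 20]:

Perl 100/100 solution: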

sub solution {
    my ($S, $P, $Q)=@_; my @P=@$P; my @Q=@$Q;

    # prefix sums for A, C and G; each array starts with a leading zero
    my @_A = (0); my @_C = (0); my @_G = (0); my @ret = ();
    foreach (split //, $S)
    {
        push @_A, $_A[-1] + ($_ eq 'A' ? 1 : 0);
        push @_C, $_C[-1] + ($_ eq 'C' ? 1 : 0);
        push @_G, $_G[-1] + ($_ eq 'G' ? 1 : 0);
    }

    foreach my $i (0..$#P)
    {
        my $from_index = $P[$i];
        my $to_index = $Q[$i] + 1;
        if ( $_A[$to_index] - $_A[$from_index] > 0 )
        {
            push @ret, 1;
            next;
        }
        if ( $_C[$to_index] - $_C[$from_index] > 0 )
        {
            push @ret, 2;
            next;
        }
        if ( $_G[$to_index] - $_G[$from_index] > 0 )
        {
            push @ret, 3;
            next;
        }
        push @ret, 4;
    }

    return @ret;
}

[Discussion]:

[Answer 21]:

Java 100/100

class Solution {
    public int[] solution(String S, int[] P, int[] Q) {
        int     qSize       = Q.length;
        int[]   answers     = new int[qSize];

        char[]  sequence    = S.toCharArray();
        int[][] occCount    = new int[3][sequence.length+1];

        int[] geneImpactMap = new int['G'+1];
        geneImpactMap['A'] = 0;
        geneImpactMap['C'] = 1;
        geneImpactMap['G'] = 2;

        // occCount[k][0] stays 0, so every row is a plain prefix sum
        for(int i = 0; i < sequence.length; i++) {
            occCount[0][i+1] = occCount[0][i];
            occCount[1][i+1] = occCount[1][i];
            occCount[2][i+1] = occCount[2][i];

            if(sequence[i] != 'T') {
                occCount[geneImpactMap[sequence[i]]][i+1]++;
            }
        }

        for(int j = 0; j < qSize; j++) {
            for(int k = 0; k < 3; k++) {
                if(occCount[k][Q[j]+1] - occCount[k][P[j]] > 0) {
                    answers[j] = k+1;
                    break;
                }

                answers[j] = 4;
            }
        }

        return answers;
    }
}

 

[Discussion]:

[Answer 22]:

In Ruby (100/100)

def interval_sum x,y,p
    p[y+1] - p[x]
end

def solution(s,p,q)

    #Hash of arrays with prefix sums

    p_sums = {}
    respuesta = []


    %w(A C G T).each do |letter|
        p_sums[letter] = Array.new s.size+1, 0
    end 

    (0...s.size).each do |count|
        %w(A C G T).each do |letter|
            p_sums[letter][count+1] = p_sums[letter][count] 
        end if count > 0

        case s[count]
        when 'A'
            p_sums['A'][count+1] += 1
        when 'C'
            p_sums['C'][count+1] += 1
        when 'G'
            p_sums['G'][count+1] += 1
        when 'T'
            p_sums['T'][count+1] += 1
        end 

    end




    (0...p.size).each do |count|


        x = p[count]
        y = q[count]


        if interval_sum(x, y, p_sums['A']) > 0 then
            respuesta << 1
            next
        end 

        if interval_sum(x, y, p_sums['C']) > 0 then
            respuesta << 2
            next
        end 

        if interval_sum(x, y, p_sums['G']) > 0 then
            respuesta << 3
            next
        end 

        if interval_sum(x, y, p_sums['T']) > 0 then
            respuesta << 4
            next
        end 

    end

    respuesta

end

[Discussion]:

[Answer 23]:

Simple PHP 100/100 solution

function solution($S, $P, $Q) {
    $result = array();
    for ($i = 0; $i < count($P); $i++) {
        $from = $P[$i];
        $to = $Q[$i];
        $length = $from >= $to ? $from - $to + 1 : $to - $from + 1;
        $new = substr($S, $from, $length);

        if (strpos($new, 'A') !== false) {
            $result[$i] = 1;
        } else {
            if (strpos($new, 'C') !== false) {
                $result[$i] = 2;
            } else {
                if (strpos($new, 'G') !== false) {
                    $result[$i] = 3;
                } else {
                    $result[$i] = 4;
                }
            }
        }
    }
    return $result;
}

[Discussion]:

Just my opinion: doing a strpos means an extra search, which adds to the complexity.

[Answer 24]:

Here is my Java (100/100) solution:

class Solution {
    private ImpactFactorHolder[] mHolder;
    private static final int A = 0, C = 1, G = 2, T = 3;

    public int[] solution(String S, int[] P, int[] Q) {
        mHolder = createImpactHolderArray(S);

        int queriesLength = P.length;
        int[] result = new int[queriesLength];

        for (int i = 0; i < queriesLength; ++i) {
            int value = 0;
            if (P[i] == Q[i]) {
                value = lookupValueForIndex(S.charAt(P[i])) + 1;
            } else {
                value = calculateMinImpactFactor(P[i], Q[i]);
            }
            result[i] = value;
        }
        return result;
    }

    public int calculateMinImpactFactor(int P, int Q) {
        int minImpactFactor = 3;

        for (int nucleotide = A; nucleotide <= T; ++nucleotide) {
            int qValue = mHolder[nucleotide].mOcurrencesSum[Q];
            // mOcurrencesSum[i] counts occurrences in [0, i], so the count in
            // [P, Q] is sum[Q] - sum[P - 1], treating sum[-1] as 0.
            // (Subtracting sum[P] would miss a nucleotide sitting exactly at P.)
            int pValue = P > 0 ? mHolder[nucleotide].mOcurrencesSum[P - 1] : 0;

            if (qValue - pValue > 0) {
                minImpactFactor = nucleotide;
                break;
            }
        }
        return minImpactFactor + 1;
    }

    public int lookupValueForIndex(char nucleotide) {
        int value = 0;
        switch (nucleotide) {
            case 'A':
                value = A;
                break;
            case 'C':
                value = C;
                break;
            case 'G':
                value = G;
                break;
            case 'T':
                value = T;
                break;
            default:
                break;
        }
        return value;
    }

    public ImpactFactorHolder[] createImpactHolderArray(String S) {
        int length = S.length();
        ImpactFactorHolder[] holder = new ImpactFactorHolder[4];
        holder[A] = new ImpactFactorHolder(1, 'A', length);
        holder[C] = new ImpactFactorHolder(2, 'C', length);
        holder[G] = new ImpactFactorHolder(3, 'G', length);
        holder[T] = new ImpactFactorHolder(4, 'T', length);
        int i = 0;
        for (char c : S.toCharArray()) {
            int nucleotide = lookupValueForIndex(c);
            ++holder[nucleotide].mAcum;
            // store the running count of every nucleotide at position i
            holder[nucleotide].mOcurrencesSum[i] = holder[nucleotide].mAcum;
            holder[A].mOcurrencesSum[i] = holder[A].mAcum;
            holder[C].mOcurrencesSum[i] = holder[C].mAcum;
            holder[G].mOcurrencesSum[i] = holder[G].mAcum;
            holder[T].mOcurrencesSum[i] = holder[T].mAcum;
            ++i;
        }

        return holder;
    }

    private static class ImpactFactorHolder {
        public ImpactFactorHolder(int impactFactor, char nucleotide, int length) {
            mImpactFactor = impactFactor;
            mNucleotide = nucleotide;
            mOcurrencesSum = new int[length];
            mAcum = 0;
        }

        int mImpactFactor;
        char mNucleotide;
        int[] mOcurrencesSum;
        int mAcum;
    }
}

Link: https://codility.com/demo/results/demoJFB5EV-EG8/ I am looking forward to implementing a segment tree similar to @Abhishek Kumar's solution.

【Discussion】:

【Reference solution 25】:

My C++ solution

#include <string>
#include <vector>
using namespace std;

vector<int> solution(string &S, vector<int> &P, vector<int> &Q) {

    // impactCount_X[i] = number of X in S[i..N-1]; the extra slot at index N stays 0
    vector<int> impactCount_A(S.size()+1, 0);
    vector<int> impactCount_C(S.size()+1, 0);
    vector<int> impactCount_G(S.size()+1, 0);

    int lastTotal_A = 0;
    int lastTotal_C = 0;
    int lastTotal_G = 0;
    for (int i = (signed)S.size()-1; i >= 0; --i) {
        switch(S[i]) {
            case 'A':
                ++lastTotal_A;
                break;
            case 'C':
                ++lastTotal_C;
                break;
            case 'G':
                ++lastTotal_G;
                break;
        }

        impactCount_A[i] = lastTotal_A;
        impactCount_C[i] = lastTotal_C;
        impactCount_G[i] = lastTotal_G;
    }

    vector<int> results(P.size(), 0);

    for (size_t i = 0; i < P.size(); ++i) {
        int pIndex = P[i];
        int qIndex = Q[i];

        // occurrences inside [pIndex, qIndex]
        int numA = impactCount_A[pIndex]-impactCount_A[qIndex+1];
        int numC = impactCount_C[pIndex]-impactCount_C[qIndex+1];
        int numG = impactCount_G[pIndex]-impactCount_G[qIndex+1];

        if (numA > 0) {
            results[i] = 1;
        }
        else if (numC > 0) {
            results[i] = 2;
        }
        else if (numG > 0) {
            results[i] = 3;
        }
        else {
            results[i] = 4;
        }
    }

    return results;
}

【Discussion】:

【Reference solution 26】:

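/* 100/100 C++ solution using prefix sums. First, each character is converted to an integer in the variable nuc. Then a two-dimensional vector accumulates the occurrences of each nucleotide x in S in its respective prefix_sum[s][x]. After that we only need to find the lowest nucleotide that occurs in each interval K.
*/
#include <string>
#include <vector>
using namespace std;

vector<int> solution(string &S, vector<int> &P, vector<int> &Q) {

    int n=S.size();
    int m=P.size();
    vector<vector<int> > prefix_sum(n+1,vector<int>(4,0));
    int nuc;

    //prefix occurrence sum
    for (int s=0;s<n; s++) {
        nuc = S.at(s) == 'A' ? 1 : (S.at(s) == 'C' ? 2 : (S.at(s) == 'G' ? 3 : 4) );
        for (int u=0;u<4;u++) {
            prefix_sum[s+1][u] = prefix_sum[s][u] + ((u+1)==nuc?1:0);
        }
    }

    //find minimal impact factor in each interval K
    int lower_impact_factor;

    for (int k=0;k<m;k++) {

        lower_impact_factor=4;
        // scan G, C, A so that the smallest nucleotide present wins
        for (int u=2;u>=0;u--) {
            if (prefix_sum[Q[k]+1][u] - prefix_sum[P[k]][u] != 0)
                lower_impact_factor = u+1;
        }
        // the input array P is reused to hold the answers
        P[k]=lower_impact_factor;
    }

    return P;
}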

【Discussion】:

【Reference solution 27】:
class Solution {
    static public int[] solution(String S, int[] P, int[] Q) {
        // write your code in Java SE 8

        // A[i], C[i], G[i] = occurrences in S[i..N-1]; index N stays 0
        int A[] = new int[S.length() + 1], C[] = new int[S.length() + 1], G[] = new int[S.length() + 1];

        int last_a = 0, last_c = 0, last_g = 0;

        int results[] = new int[P.length];
        int p = 0, q = 0;
        for (int i = S.length() - 1; i >= 0; i -= 1) {
            switch (S.charAt(i)) {
                case 'A': {
                    last_a += 1;
                    break;
                }
                case 'C': {
                    last_c += 1;
                    break;
                }

                case 'G': {
                    last_g += 1;
                    break;
                }

            }
            A[i] = last_a;
            G[i] = last_g;
            C[i] = last_c;
        }


        for (int i = 0; i < P.length; i++) {
            p = P[i];
            q = Q[i];

            if (A[p] - A[q + 1] > 0) {
                results[i] = 1;
            } else if (C[p] - C[q + 1] > 0) {
                results[i] = 2;
            } else if (G[p] - G[q + 1] > 0) {
                results[i] = 3;
            } else {
                results[i] = 4;
            }

        }
        return results;
    }
}

【Discussion】:

【Reference solution 28】:

Scala solution 100/100

import scala.annotation.switch
import scala.collection.mutable

object Solution {
  def solution(s: String, p: Array[Int], q: Array[Int]): Array[Int] = {

    val n = s.length

    // def, not val: each use below builds a fresh zero-filled buffer
    def arr = mutable.ArrayBuffer.fill(n + 1)(0L)

    val a = arr
    val c = arr
    val g = arr
    val t = arr

    for (i <- 1 to n) {
      def inc(z: mutable.ArrayBuffer[Long]): Unit = z(i) = z(i - 1) + 1L

      def shift(z: mutable.ArrayBuffer[Long]): Unit = z(i) = z(i - 1)

      val char = s(i - 1)
      (char: @switch) match {
        case 'A' => inc(a); shift(c); shift(g); shift(t);
        case 'C' => shift(a); inc(c); shift(g); shift(t);
        case 'G' => shift(a); shift(c); inc(g); shift(t);
        case 'T' => shift(a); shift(c); shift(g); inc(t);
      }
    }

    val r = mutable.ArrayBuffer.fill(p.length)(0)

    for (i <- p.indices) {
      val start = p(i)
      val end = q(i) + 1
      r(i) =
        if (a(start) != a(end)) 1
        else if (c(start) != c(end)) 2
        else if (g(start) != g(end)) 3
        else if (t(start) != t(end)) 4
        else 0
    }

    r.toArray
  }
}

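【Discussion】:

【Reference solution 29】:

I think I am using dynamic programming here. This is my solution. It needs little space. The code is kept clean; it is only meant to show my idea.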

class Solution {
    public int[] solution(String S, int[] P, int[] Q) {
        // preDominator[i] = index of the nearest earlier letter that is
        // strictly smaller than S[i], or -1 if there is none
        int[] preDominator = new int[S.length()];
        int A = -1;
        int C = -1;
        int G = -1;
        int T = -1;

        for (int i = 0; i < S.length(); i++) {
            char c = S.charAt(i);
            if (c == 'A') {
                A = i;
                preDominator[i] = -1;
            } else if (c == 'C') {
                C = i;
                preDominator[i] = A;
            } else if (c == 'G') {
                G = i;
                preDominator[i] = Math.max(A, C);
            } else {
                T = i;
                preDominator[i] = Math.max(Math.max(A, C), G);
            }
        }

        int N = preDominator.length;
        int M = Q.length;
        int[] result = new int[M];
        for (int i = 0; i < M; i++) {
            int p = P[i];
            // clamp to the last valid index (Q[i] <= N - 1 by the constraints)
            int q = Math.min(N - 1, Q[i]);
            // walk back from q, jumping to ever smaller letters while they
            // still lie inside [p, q]
            for (int j = q;;) {
                if (preDominator[j] < p) {
                    char c = S.charAt(j);
                    if (c == 'A') {
                        result[i] = 1;
                    } else if (c == 'C') {
                        result[i] = 2;
                    } else if (c == 'G') {
                        result[i] = 3;
                    } else {
                        result[i] = 4;
                    }
                    break;
                }
                j = preDominator[j];
            }
        }
        return result;
    }
}
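
To see why the backward walk in the query loop stops quickly, it can help to look at the preDominator table for the sample string from the task. The sketch below rebuilds only that table, following the rule used in the answer; the class name PreDominatorTrace and the method name build are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;

class PreDominatorTrace {
    // For each index i: index of the nearest earlier letter strictly smaller
    // than s.charAt(i), or -1 if no smaller letter occurs before i.
    static int[] build(String s) {
        int[] pre = new int[s.length()];
        int a = -1, c = -1, g = -1; // last positions of A, C and G so far
        for (int i = 0; i < s.length(); i++) {
            switch (s.charAt(i)) {
                case 'A':
                    pre[i] = -1; // nothing is smaller than A
                    a = i;
                    break;
                case 'C':
                    pre[i] = a;
                    c = i;
                    break;
                case 'G':
                    pre[i] = Math.max(a, c);
                    g = i;
                    break;
                default: // 'T'
                    pre[i] = Math.max(Math.max(a, c), g);
                    break;
            }
        }
        return pre;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(build("GACACCATA")));
    }
}
```

For GACACCATA this prints [-1, -1, 1, -1, 3, 3, -1, 6, -1]. Taking the query (4, 5), the walk starts at j = 5 (a C), finds preDominator[5] = 3, which is left of 4, and stops with the answer 2. Every jump lands on a strictly smaller letter, so with four letters a query makes at most three jumps.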

【Discussion】:

【Reference solution 30】:

I implemented a segment tree solution in Kotlin

import kotlin.math.*

fun solution(S: String, P: IntArray, Q: IntArray): IntArray {

    val a = IntArray(S.length)
    for (i in S.indices) {
        a[i] = when (S[i]) {
            'A' -> 1
            'C' -> 2
            'G' -> 3
            'T' -> 4
            else -> throw IllegalStateException()
        }
    }

    val segmentTree = IntArray(2*nextPowerOfTwo(S.length)-1)
    constructSegmentTree(a, segmentTree, 0, a.size-1, 0)

    val result = IntArray(P.size)
    for (i in P.indices) {
        result[i] = rangeMinQuery(segmentTree, P[i], Q[i], 0, a.size-1, 0)
    }
    return result
}

fun constructSegmentTree(input: IntArray, segmentTree: IntArray,  low: Int,  high: Int,  pos: Int) {

    if (low == high) {
        segmentTree[pos] = input[low]
        return
    }
    val mid = (low + high)/2
    constructSegmentTree(input, segmentTree, low, mid, 2*pos+1)
    constructSegmentTree(input, segmentTree, mid+1, high, 2*pos+2)
    segmentTree[pos] = min(segmentTree[2*pos+1], segmentTree[2*pos+2])
}

fun rangeMinQuery(segmentTree: IntArray, qlow:Int, qhigh:Int ,low:Int, high:Int, pos:Int): Int {

    // node range lies fully inside the query range
    if (qlow <= low && qhigh >= high) {
        return segmentTree[pos]
    }
    // node range lies fully outside the query range
    if (qlow > high || qhigh < low) {
        return Int.MAX_VALUE
    }
    val mid = (low + high)/2
    return min(rangeMinQuery(segmentTree, qlow, qhigh, low, mid, 2*pos+1), rangeMinQuery(segmentTree, qlow, qhigh, mid+1, high, 2*pos+2))
}

fun nextPowerOfTwo(n:Int): Int {
    var count = 0
    var number = n
    if (number > 0 && (number and (number - 1)) == 0) return number
    while (number != 0) {
        number = number shr 1
        count++
    }
    return 1 shl count
}
【Discussion】:
