<<  < 2013 - >  >>
1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31




Description

Biologists finally invent techniques of repairing DNA that contains segments causing kinds of inherited diseases. For the sake of simplicity, a DNA is represented as a string containing characters 'A', 'G' , 'C' and 'T'. The repairing techniques are simply to change some characters to eliminate all segments causing diseases. For example, we can repair a DNA "AAGCAG" to "AGGCAC" to eliminate the initial causing disease segments "AAG", "AGC" and "CAG" by changing two characters. Note that the repaired DNA can still contain only characters 'A', 'G', 'C' and 'T'.

You are to help the biologists to repair a DNA by changing least number of characters.

Input

The input consists of multiple test cases. Each test case starts with a line containing one integers N (1 ≤ N ≤ 50), which is the number of DNA segments causing inherited diseases.
The following N lines gives N non-empty strings of length not greater than 20 containing only characters in "AGCT", which are the DNA segments causing inherited disease.
The last line of the test case is a non-empty string of length not greater than 1000 containing only characters in "AGCT", which is the DNA to be repaired.

The last test case is followed by a line containing one zeros.

Output

For each test case, print a line containing the test case number( beginning with 1) followed by the
number of characters which need to be changed. If it's impossible to repair the given DNA, print -1.

Sample Input

2
AAA
AAG
AAAG    
2
A
TG
TGAATG
4
A
G
C
T
AGT
0

Sample Output

Case 1: 1
Case 2: 4
Case 3: -1

题目大意:给定至多50条,带有遗传病的致病基因序列,每条序列的长度不超过20,再给你一个长度不超过1000的待修改的序列,要求你修改这个序列中的若干字母,使得它的任意一个子串都不包含致病基因。

显然,有多个待匹配的串,就要想到自动机了。先把所有致病的序列加入到自动机,并且在tire树上的最后一个字符打上标记true,意思为这个字符(状态)是某个致病基因。构造自动机的同时要注意,状态转移需要的是tire图,而不是tire树,所以我们在构造好的自动机上需要外加两步操作
一、对于当前节点cur,如果在其fail链上某节点上有tag为true的标记,那么从那一点起到当前节点的tag标记都应该赋为true,因为那个状态肯定是当前这个状态的子状态,果断标记排除。
二、某些节点可能并不包含所有ATGC的next标记,但是状态转移时允许的,故因补上缺失的next标记。其实很简单:next[cur][i]=next[fail[cur]][i],即把它的fail指针指向它的fail指针的第i个孩子。

于是就是赤裸裸的DP了,f[i][j]表示扫描到第i的长度,在自动机上达到j状态时,为了不踩到地雷(即不包含任意一个致病基因)需要修改的最少的字母数。
所以
f[i+1][next[j][k]]=min(f[i+1][next[j][k]],f[i][j])(第i+1个字符的标号为k)
f[i+1][next[j][k]]=min(f[i+1][next[j][k]],f[i][j]+1) (第i+1个字符的标号不为k,这时就肯定需要修改一个点了)
这样答案就比较显然了(注:上面DP循环的任意状态都必须判断是否踩到了地雷)
下面附一份AC代码:


#i nclude <iostream>
#i nclude <cstdio>
#i nclude <cstring>
#i nclude <queue>
#define maxn 1010
using namespace std;

int next[maxn][4],f[maxn][maxn],N,n,cod[500],fail[maxn],L;
bool tag[maxn];
char s[maxn];


void insert()
{
    int cur=0,tep;
    for (int i=0; s[i]; i++)
    {
        tep=cod[int(s[i])];
        if (next[cur][tep]==0)
            next[cur][tep]=++N;
        cur=next[cur][tep];
    }
    tag[cur]=true;
}

void buildAC()
{
    queue<int> Q;
    int cur,child,k;
    Q.push(0);
    while (!Q.empty())
    {
        cur=Q.front();
        Q.pop();
        for (int i=0; i<4; i++)
        {
            child=next[cur][i];
            if (child)
            {
                Q.push(child);
                if (cur==0) fail[child]=0;
                else
                {
                    k=fail[cur];
                    while (k && next[k][i]==0) k=fail[k];
                    if (next[k][i]) fail[child]=next[k][i];
                        else fail[child]=0;
                }
                if (tag[fail[child]]) tag[child]=true;
            }
            else
            {
                k=fail[cur];
                next[cur][i]=next[k][i];
            }
        }
    }
}

int main()
{
    int tep,ans,cas=0;
    cod['A']=0; cod['T']=1;
    cod['G']=2; cod['C']=3;
    while (scanf("%d",&n) && n)
    {
        memset(next,0,sizeof next);
        memset(tag,false,sizeof tag);
        memset(f,1,sizeof f);
        memset(fail,0,sizeof fail);
        N=0;  ans=~0U>>1;
        while (n--)
        {
            scanf("%s",s);
            insert();
        }
        buildAC();
        scanf("%s",s);
        L=strlen(s);
        f[0][0]=0;
        for (int i=0; s[i]; i++)
        {
            tep=cod[int(s[i])];
            for (int j=0; j<=N; j++)
            {
                if (tag[j]) continue;
                if (f[i][j]==f[1009][1009]) continue;
                for (int k=0; k<4; k++)
                {
                    if (tag[next[j][k]]) continue;
                    if (tep==k)
                        f[i+1][next[j][k]]=min(f[i+1][next[j][k]],f[i][j]);
                    else f[i+1][next[j][k]]=min(f[i+1][next[j][k]],f[i][j]+1);
                }
            }
        }
        for (int i=0; i<=N; i++)
            if (f[L][i]<ans) ans=f[L][i];
        printf("Case %d: ",++cas);
        if (ans<f[1009][1009]) printf("%d\n",ans);
            else printf("-1\n");
    }
    return 0;
}


^_^

发表评论:
天涯博客欢迎您!