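The present invention relates to techniques for predicting pseudouridine modification sites in RNA sequences produced during gene transcription, and in particular to a method for predicting pseudouridine modification sites based on a convolutional neural network.
Background: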
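During gene transcription, many RNAs undergo modification. More than one hundred RNA modifications have been identified to date. These covalent RNA modifications have been studied with chemical methods for about twelve years. Some of them occur at many sites within an organism. They affect the secondary and tertiary structure of RNA and the rate and fidelity of gene expression, help maintain RNA stability, ensure correct decoding of RNA on the ribosome and prevent certain diseases, and are therefore important for the normal performance of biological functions.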
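Among these more than one hundred modifications, pseudouridine (Ψ) was the first to be discovered and is by far the most abundant. Pseudouridine is well known to occur in tRNA, rRNA, sRNA and other non-coding RNAs; later, Thomas M. Carlile et al. used high-throughput sequencing to show that pseudouridine modification sites also exist in the mRNA of human and yeast cells. Pseudouridine is an isomer of uridine, formed under certain conditions by the transfer of a covalent bond. In eukaryotes, for example, pseudouridylation is mainly catalysed by box H/ACA RNPs. Each hairpin of a box H/ACA RNA has two bulges; the RNA recognizes a specific target sequence and base-pairs with it at the lower bulge, and a specific enzyme then acts on the uracil to the right of the unpaired region at the top of the bulge. The uracil base is rotated 180° about the axis through positions 3 and 6, and C5 is then rotated clockwise to the bottom, so that the original C-N bond to the ribose becomes a C-C bond, forming pseudouridine.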
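Pseudouridine can alter RNA structure: it increases base stacking, improves base pairing and rigidifies the ribose-phosphate backbone. It is directly or indirectly associated with neurological diseases such as Parkinson's disease and with X-linked dyskeratosis congenita, a bone marrow failure syndrome. Because of its particular structure and chemical properties, and its biological and medical significance, pseudouridine sites are attracting increasing attention. To identify pseudouridine sites, a high-throughput sequencing technique called Ψ-seq was proposed (Carlile, T. M. et al. Pseudouridine profiling reveals regulated mRNA pseudouridylation in yeast and human cells. Nature 515, 143 (2014)) and used to map Ψ sites comprehensively and at high resolution in several species. However, this technique relies on sequencing genomic sequences: it is costly and time-consuming, and sequencing becomes increasingly difficult as sequence length grows. There is therefore an urgent need for more convenient computational algorithms that extract information about pseudouridine sites and then predict them.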
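Currently, Li, Y. et al. (Li, Y. H., Zhang, G. & Cui, Q. PPUS: a web server to predict PUS-specific pseudouridine sites. Bioinformatics 31, 3362 (2015)) and Chen, W. et al. (Wei, C., Hua, T., Jing, Y., Hao, L. & Chou, K. C. iRNA-PseU: identifying RNA pseudouridine sites. Molecular Therapy Nucleic Acids 5, e332 (2016)) truncate gene sequences into fragments and encode them, with Chen, W. et al. additionally including the physicochemical properties of nucleotides in the encoding, and then use the LIBSVM algorithm for feature extraction and classification to identify pseudouridine sites. The accuracy of LIBSVM-based feature extraction and classification leaves room for improvement, so a more effective algorithm for extracting sequence features is needed to predict pseudouridine sites more accurately.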
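Summary of the invention:
The object of the present invention is to overcome the shortcomings of the prior art by providing a method for predicting pseudouridine modification sites based on a convolutional neural network. The method improves the accuracy of pseudouridine site prediction and makes such prediction easier to apply widely.
The technical solution achieving the object of the invention is as follows:
A method for predicting pseudouridine modification sites based on a convolutional neural network, comprising the following steps: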
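1) Dataset preparation and conversion: select the datasets of three species, yeast, human and mouse (Mus musculus), from the paper Wei, C., Hua, T., Jing, Y., Hao, L. & Chou, K. C. iRNA-PseU: identifying RNA pseudouridine sites. Molecular Therapy Nucleic Acids 5, e332 (2016); each dataset consists of positive samples containing a pseudouridine site and negative samples containing none. Encode these datasets, converting each sample of the human and mouse datasets into a 20×20 matrix and each sample of the yeast dataset into a 20×30 matrix;
2) Building and training the convolutional neural network model: construct the structure of a convolutional neural network (CNN); use the positive and negative samples converted into matrices in step 1) as the CNN input, keeping positive and negative samples balanced; tune the number of CNN layers and the number and size of the convolution kernels; then use the tuned CNN to extract features from the dataset sequences and train a model containing the feature vectors;
3) Truncating and encoding the sequence to be predicted: arrange the entire sequence to be examined in FASTA format, i.e. the first character of the first line is '>', followed by a description of the sequence, and the next line holds the sequence to be predicted; truncate the sequence with a sliding window of the same length as the dataset samples of step 1), so that each truncated fragment has the same form as a dataset sample, and convert each fragment into the matrix form of step 1);
4) Feature extraction and prediction: feed the converted result of step 3) in as the prediction set; after feature extraction by the CNN, predict the input sequence with the CNN model trained in step 2); then slide the window towards the end of the sequence to be predicted and repeat the truncation and conversion of step 3) and step 4) until the end of the whole sequence is reached, finally obtaining the predicted pseudouridine sites.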
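Step 3) expects the whole sequence in FASTA form. A minimal sketch of reading such input might look like the following; the function name `read_single_fasta` is hypothetical, and joining several sequence lines and rejecting characters other than A, C, G and U are assumptions that go beyond the one-line format described above.

```python
def read_single_fasta(text: str) -> tuple[str, str]:
    """Return (description, sequence) from a single-record FASTA string.

    Hypothetical helper: the first non-empty line must start with '>'.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("the first line must start with '>'")
    description = lines[0][1:].strip()
    # The text puts the sequence on the next line; joining any further lines
    # is an assumption that tolerates wrapped FASTA files.
    sequence = "".join(lines[1:]).upper()
    invalid = set(sequence) - set("ACGU")
    if not sequence or invalid:
        raise ValueError(f"sequence must be non-empty RNA (A, C, G, U); bad: {sorted(invalid)}")
    return description, sequence
```

For example, `read_single_fasta(">my transcript\nagaucu")` yields `("my transcript", "AGAUCU")`.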
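The encoding in step 1) is as follows. RNA sequences contain four ribonucleotides, A, U, G and C; taking any two in order gives 16 combinations, and 16-dimensional shift encoding is applied so that each combination is encoded as a 16-dimensional column vector. For a sample sequence, the two adjacent nucleotides at the left end are encoded first; the window is then shifted right by one nucleotide and the next two adjacent nucleotides are encoded, and this is repeated until the last nucleotide is reached. In this way every pair of adjacent nucleotides is converted into a 16-dimensional column vector. This encoding alone is not sufficient; to capture features more accurately, the chemical properties of the nucleotides, listed in Table 1, are added. Dimension 17 represents the ring structure of the first nucleotide of the pair (purine '1', pyrimidine '0'); dimension 18 represents its functional group (amino '1', keto '0'); dimension 19 represents the strength of the hydrogen bonds it forms in complementary base pairing (strong '1', weak '0'); dimension 20 is the proportion of nucleotides of the same type as the first nucleotide of the pair within the sample with its last nucleotide removed. A sample sequence of L+R+1 nucleotides contains L+R adjacent pairs and is therefore converted into a 20×(L+R) matrix.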
Table 1 Chemical properties of ribonucleotides
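The encoding above can be sketched in Python as follows. This is an illustrative reading of the text, not the reference implementation: the order of the 16 dinucleotides in the one-hot block is not stated (Fig. 3 is not reproduced), so the lexicographic order AA, AC, ..., UU is an assumption, and because Table 1 is not reproduced either, the values for dimensions 17 to 19 assume the usual grouping of A and G as purines, A and C as amino, and G and C as strong (three hydrogen bonds). The name `encode_sample` is hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Assumed ordering of the 16 dinucleotides: AA, AC, AG, AU, CA, ..., UU.
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]

# Dimensions 17-19 with the 1/0 conventions of the text:
# ring (purine = 1), functional group (amino = 1), hydrogen bond (strong = 1).
CHEM = {"A": (1, 1, 0), "C": (0, 1, 1), "G": (1, 0, 1), "U": (0, 0, 0)}


def encode_sample(seq: str) -> list[list[float]]:
    """Encode an RNA sample of L+R+1 nucleotides as a 20 x (L+R) matrix (list of rows)."""
    seq = seq.upper()
    if len(seq) < 2 or set(seq) - set(NUCLEOTIDES):
        raise ValueError("need at least two nucleotides from A, C, G, U")
    prefix = seq[:-1]  # the sample with its last nucleotide removed
    columns = []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        first = pair[0]
        one_hot = [0.0] * 16  # dimensions 1-16: shift (one-hot) encoding of the pair
        one_hot[DINUCLEOTIDES.index(pair)] = 1.0
        chem = [float(v) for v in CHEM[first]]  # dimensions 17-19
        freq = prefix.count(first) / len(prefix)  # dimension 20
        columns.append(one_hot + chem + [freq])
    # Transpose the per-pair columns into 20 rows.
    return [list(row) for row in zip(*columns)]
```

For a 21-nucleotide human or mouse sample this returns a 20×20 matrix, and for a 31-nucleotide yeast sample a 20×30 matrix, matching step 1).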
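Application of sequence feature extraction with a convolutional neural network to pseudouridine site prediction.
The method uses the convolutional neural network algorithm of deep learning to extract sequence features and make predictions.
Beneficial effects of the method: pseudouridine plays an important role in normal biological function, so pseudouridine sites need to be predicted accurately. A convolutional neural network can automatically mine deep latent features of the data; compared with the support vector machine (SVM) algorithm used in the prior art, it extracts sequence features better and thereby improves the accuracy of pseudouridine site prediction.
The method improves the accuracy of pseudouridine site prediction and makes such prediction easier to apply widely.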
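Description of the drawings
Fig. 1 is a flow chart of the method of the embodiment;
Fig. 2 is a schematic diagram of the formation of a pseudouridine site in the embodiment;
Fig. 3 is a schematic diagram of the shift encoding of a sequence in the embodiment;
Fig. 4 is a schematic diagram of the CNN structure in the embodiment, taking human as an example.
Detailed description
The invention is further described below with reference to the drawings and the embodiment, which do not limit the invention.
Embodiment: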
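Pseudouridine is an isomer of uridine. During RNA transcription, as shown in Fig. 2, enzymatic catalysis rotates the uracil base 180° about the axis through positions 3 and 6, and C5 is then rotated clockwise to the bottom, so that the original C-N bond to the ribose becomes a C-C bond, forming pseudouridine.
Referring to Fig. 1, a method for predicting pseudouridine modification sites based on a convolutional neural network comprises the following steps: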
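1) Dataset preparation and conversion: select the datasets of three species, yeast, human and mouse (Mus musculus), from the paper Wei, C., Hua, T., Jing, Y., Hao, L. & Chou, K. C. iRNA-PseU: identifying RNA pseudouridine sites. Molecular Therapy Nucleic Acids 5, e332 (2016); each dataset consists of positive samples containing a pseudouridine site and negative samples containing none. Encode these datasets, converting each sample of the human and mouse datasets into a 20×20 matrix and each sample of the yeast dataset into a 20×30 matrix;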
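The specific encoding is as follows. Shift encoding is performed first, as shown in Fig. 3. RNA sequences contain four ribonucleotides, A, U, G and C; taking any two in order gives 16 combinations, and 16-dimensional shift encoding is applied so that each combination is encoded as a 16-dimensional column vector. For a sample sequence, the two adjacent nucleotides at the left end are encoded first; the window is then shifted right by one nucleotide and the next two adjacent nucleotides are encoded, and this is repeated until the last nucleotide is reached. In this way every pair of adjacent nucleotides is converted into a 16-dimensional column vector. This encoding alone is not sufficient; to capture features more accurately, the chemical properties of the nucleotides, listed in Table 1, are added. Dimension 17 represents the ring structure of the first nucleotide of the pair (purine '1', pyrimidine '0'); dimension 18 represents its functional group (amino '1', keto '0'); dimension 19 represents the strength of the hydrogen bonds it forms in complementary base pairing (strong '1', weak '0'); dimension 20 is the proportion of nucleotides of the same type as the first nucleotide of the pair within the sample with its last nucleotide removed. For example, the encoding result R(AGAUCU) of the sequence 'AGAUCU' is given by Eq. (1):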
Table 1 Chemical properties of ribonucleotides
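Equation (1) itself is not reproduced in this text, and its one-hot rows depend on the dinucleotide ordering of Fig. 3. Rows 17 to 20 of R(AGAUCU), however, follow directly from the rules above, assuming the usual grouping (purines A and G; amino A and C; strong G and C). 'AGAUCU' has five adjacent pairs, AG, GA, AU, UC and CU, whose first nucleotides are A, G, A, U and C. With the last nucleotide removed, 'AGAUC' contains two A and one each of G, U and C, which gives row 20. Columns are AG, GA, AU, UC, CU; rows are ring structure, functional group, hydrogen bond and frequency:

$$
R_{17:20}(\mathrm{AGAUCU}) =
\begin{pmatrix}
1 & 1 & 1 & 0 & 0\\
1 & 0 & 1 & 0 & 1\\
0 & 1 & 0 & 0 & 1\\
2/5 & 1/5 & 2/5 & 1/5 & 1/5
\end{pmatrix}
$$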
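2) Building and training the CNN model: construct the CNN structure; use the positive and negative samples converted into matrices in step 1) as the CNN input, keeping positive and negative samples balanced; tune the number of CNN layers and the number and size of the convolution kernels (Fig. 4 shows the tuned CNN structure for human); then use the tuned CNN to extract features from the dataset sequences and train a model containing the feature vectors;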
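The text does not give the layer count, kernel numbers or kernel sizes of the tuned network in Fig. 4, so the following is only a hypothetical, dependency-free sketch of how an encoded 20×W sample can flow through a CNN: one convolution layer whose kernels span all 20 feature rows and slide along the sequence axis, ReLU, global max pooling, and a sigmoid output read as the probability of a pseudouridine site. The weights are random and untrained; in practice the network would be trained with a deep-learning framework on the balanced positive and negative samples. All names and hyperparameters (`n_filters=8`, `k=3`) are assumptions.

```python
import math
import random


def conv1d_relu(x, kernels, biases):
    """Valid 1-D convolution along the sequence (column) axis, followed by ReLU.

    x       : 20 x W matrix (rows = feature dimensions, columns = positions)
    kernels : F kernels, each a 20 x k matrix covering all feature rows
    returns : F feature maps, each of length W - k + 1
    """
    rows, width = len(x), len(x[0])
    maps = []
    for kern, bias in zip(kernels, biases):
        k = len(kern[0])
        fmap = []
        for j in range(width - k + 1):
            s = bias
            for r in range(rows):
                for t in range(k):
                    s += kern[r][t] * x[r][j + t]
            fmap.append(max(0.0, s))
        maps.append(fmap)
    return maps


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_params(n_filters=8, k=3, rows=20, seed=0):
    """Random, untrained parameters with hypothetical hyperparameters."""
    rng = random.Random(seed)
    return {
        "kernels": [[[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(rows)]
                    for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "dense_w": [rng.gauss(0.0, 0.1) for _ in range(n_filters)],
        "dense_bias": 0.0,
    }


def predict_proba(x, params):
    """Convolution -> ReLU -> global max pooling -> dense layer -> sigmoid."""
    maps = conv1d_relu(x, params["kernels"], params["conv_bias"])
    pooled = [max(fmap) for fmap in maps]
    z = params["dense_bias"] + sum(w * p for w, p in zip(params["dense_w"], pooled))
    return sigmoid(z)
```

Letting each kernel cover all 20 rows treats every column as one position-wise feature vector, and global max pooling removes the dependence on width, so the same sketch accepts both 20×20 (human, mouse) and 20×30 (yeast) inputs.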
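3) Truncating and encoding the sequence to be predicted: the whole sequence to be predicted is truncated and encoded with a sliding window. Arrange the entire sequence to be examined in FASTA format, i.e. the first character of the first line is '>', followed by a description of the sequence, and the next line holds the sequence to be predicted. Truncate it with a sliding window of the same length as the dataset samples of step 1), so that each fragment has the same form as a dataset sample. The truncation is therefore given by Eq. (2): taking the candidate site U as the reference, L nucleotides are taken upstream and R nucleotides downstream, giving a fragment of L+R+1 nucleotides: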
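$$S(\mathrm{U}) = N_{-L}\,N_{-(L-1)}\,N_{-(L-2)}\cdots N_{-2}\,N_{-1}\,\mathrm{U}\,N_{+1}\,N_{+2}\cdots N_{+(R-2)}\,N_{+(R-1)}\,N_{+R} \qquad (2)$$
where $N_{-i}$ is the i-th nucleotide upstream of the candidate U and $N_{+j}$ is the j-th nucleotide downstream of it.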
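In line with the sample lengths of the datasets, L = R = 10 if the sequence to be predicted comes from human or mouse, and L = R = 15 if it comes from yeast; each truncated fragment is then converted into the matrix form of step 1);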
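Eq. (2) together with the species-specific flank lengths can be expressed as a small helper. The function name and species keys below are hypothetical, and raising an error for U sites too close to either end of the sequence is an assumption, since the text does not say how such sites are handled.

```python
# Flank lengths (L, R) from the text: human and mouse 10/10, yeast 15/15.
FLANKS = {"human": (10, 10), "mouse": (10, 10), "yeast": (15, 15)}


def extract_window(seq: str, pos: int, species: str) -> str:
    """Return S(U) = N_-L ... N_-1 U N_+1 ... N_+R centred on seq[pos] (0-based)."""
    left, right = FLANKS[species]
    if not 0 <= pos < len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos] != "U":
        raise ValueError(f"position {pos} holds {seq[pos]!r}, not U")
    # Assumption: sites without a full L/R context are rejected, not padded.
    if pos - left < 0 or pos + right >= len(seq):
        raise ValueError("window would run past an end of the sequence")
    return seq[pos - left: pos + right + 1]
```

A human window is thus 21 nucleotides and a yeast window 31, which the step 1) encoding turns into 20×20 and 20×30 matrices.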
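4) Feature extraction and prediction: feed the converted result of step 3) in as the prediction set; after feature extraction by the CNN, predict the input sequence with the CNN model trained in step 2); then slide the window towards the end of the sequence to be predicted and repeat the truncation and conversion of step 3) and step 4) until the end of the whole sequence is reached, finally obtaining the predicted pseudouridine sites over the whole sequence to be predicted.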
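Steps 3) and 4) together amount to a scan along the sequence towards its end. In the sketch below, `scan_sequence` is a hypothetical name and `score` is a stand-in for "encode the window as in step 1) and run the trained CNN"; the 0.5 decision threshold and the skipping of U sites without full flanks are assumptions not stated in the text.

```python
from typing import Callable


def scan_sequence(seq: str, left: int, right: int,
                  score: Callable[[str], float], threshold: float = 0.5) -> list[int]:
    """Slide a (left + right + 1)-nt window towards the 3' end of seq and return
    the 0-based positions of U sites whose window scores >= threshold."""
    hits = []
    # Only centres with a complete upstream and downstream context are visited.
    for pos in range(left, len(seq) - right):
        if seq[pos] != "U":
            continue  # only U can be a pseudouridine site
        window = seq[pos - left: pos + right + 1]
        if score(window) >= threshold:
            hits.append(pos)
    return hits
```

With L = R = 10 for human, `scan_sequence(seq, 10, 10, score)` returns the positions predicted as pseudouridine sites; keeping the scorer pluggable separates the sliding-window logic from the trained model.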
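Fig. 1 shows the steps of CNN-based pseudouridine site prediction. First, the datasets are prepared, encoded and converted into matrices; second, the CNN model is built and trained on the matrices converted from the datasets; next, the sequence to be predicted is truncated with a sliding window and the fragments are encoded; finally, after feature extraction by the CNN, the input sequences are predicted with the trained model.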
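Experimental example:
Predictions were made for the three species using three independent test sets, S(4), S(5) and S(6), from human, yeast and mouse respectively. S(4) and S(5) are taken from the paper (Wei, C., Hua, T., Jing, Y., Hao, L. & Chou, K. C. iRNA-PseU: identifying RNA pseudouridine sites. Molecular Therapy Nucleic Acids 5, e332 (2016)), while S(6) was constructed separately for the method of this embodiment. Each of S(4), S(5) and S(6) contains 100 positive samples containing a pseudouridine site and 100 negative samples without one. The prediction results are shown in Table 2: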
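Table 2: Comparison of the prediction results of the method of this embodiment with those of the only two existing predictors
Table 2 shows that the CNN used in the method of this embodiment clearly outperforms PPUS and iRNA-PseU, the only two predictors currently available, both of which are based on the support vector machine (SVM) algorithm.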
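Sequence listing
<110>Guilin University of Electronic Technology
<120>Method for predicting pseudouridine modification sites based on a convolutional neural network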
<141>2017-10-20
<160>3
<170>SIPOSequenceListing 1.0
<210>2
<211>6200
<212>RNA
<213>Saccharomyces cerevisiae
<400>2
cuaucaucgcugaucucccacucccugaucugaagaggucaucgguucgauuccgguugc60
guguaagaugcaagaguucgaaucucuuagcaagcgaaagauuagaaaucuuuugggcuu120
ugccgguuaaggcgaaagauuagaaaucuuuuggguuuaggaccgagcuuuuaguggaug180
ucaucaggacacuucugauguuucaaaagauauuccagguacuggacgagaaucgcagaa240
caauuugacguagauguuuguuguucacccacaacugaagaguugucgaguuuuuugagg300
uuaagaaugaaaggucgaaaaaguuucaggcaguuucucagcguugggcccccgguucga360
uuccgggcuugcugguaaaauccaacguugccaucguugggccuaagcgcaagugguuua420
gugguaaaauccaagguuaaggcgaaagauuagaaaucuuuuggggcgaaagauuagaaa480
ucuuuugggcuuugccggcuucauuaacauguacuucaacuacggaaguggagaucaucg540
guucaaauccgauuggaauuugguuuucaaguguaauaggcuacgugaucagugguucaa600
gacgucgccuuuacacggcguagugguuaucacuuucgguuuugauccggacacuuucgg660
uuuugauccggacaaccccgguaauugaucuauguuguagcugcgcuggcggcaacucca720
guucuuuaucuucuuucuccgcuggcgucugacuucuaaucagaagauuauggguucuuc780
cgugauaguuuaauggucagaaugggcagaaugggcgcuugucgcgugccagaucgggug840
ccagaucgggguucaauuccccgucgcgagaaaaagccaaugaugagauacaagccauua900
ucgacauaugcugguuacauggcaguagaagaauauacauucuauuaucgaaccuggcca960
ugaaacaagauuucuguagcauacucgcuucauacuuguuuucuuuuuugugccuuuguu1020
acguugcuuuguggaaguucgaaacuccaaaguaugagugauggaaguguaguuauccgg1080
agaucagggucaaaucuucguugaccgucaauuacaugcagcacaaauuuguagacaggc1140
ugguuugaggauuacuuggacauuaacgguucuccuauucaagacaaaaguguucuuuca1200
ucugcaguguuggcguacagauuguaguuguggcugcuaccuuuuuuaauguccguuucu1260
augauugggcuauuguucgaagguaaugccuugaucagaagacuguugguccuuaguucg1320
auccugagugcgagcagcagauugcaaaucuguugguccuuaguuuauccgauauagugu1380
aacggcuaucacauccguggagaccgggguucgacuccccguaucggguauguuauuuau1440
guaacggguaugcgaacauucuuuuuuugauguaauaggauaagcuugcuguucuuuuca1500
guguaacaacugaaaugacuguaguaucuguucuuuucaguguaacaacuguguaguauc1560
uguucuuuucaguguaacaacaaguguaguaucuguucuuuucaguguaacacaagugua1620
guaucuguucuuuucaguguaacaucaaguguaguaucuguucuuuucaguguaucauug1680
uucuuggauuucaaugggugcugucuaaauuucgccacuguagaugaagaagacgaaaaa1740
ugagaagaguguagauguauuauccuuccaagauagacuauguaaugguaaagaacauau1800
ggcggcgggugccuuuggagcagcaaucgaugguguggucacuguaagagauuggcccca1860
ccauggacgagccuguaguauacaacgguaaacaaaggucuuccuaugauuccggcguuc1920
gucuuucucauacccuguagaccagaccucucuagaauacuuugaagguuuaaccgagga1980
aaugcguggagaccgggguucgacuccccguaucguuauccgauauaguguaacggcuau2040
cacaucggacacuucugauguuucaaaagauauuccauaacugugggaauacucagguau2100
cguaagauguaagaugcaagaguucgaaucucuuagcaaacaauuuucacaguuuaaggc2160
caagaacaaggcccguuuacacauuuugauacaaccguagacgggaggucccggguucga2220
gucccggcucgcgauucucgcuuagggugcgggaggucccggggcgugcgacuguuaauc2280
gcaagaucgugagucgcaagaucgugaguucaacccucacuggggcguugggcccccggu2340
ucgauuccgggcuugcugguaaaauccaacguugccaucguugggcccuagcgcaagugg2400
uuuagugguaaaauccaaggcuguguucuucuuucuaaauucccuaucgggaaaaacccg2460
uugcuagaagcgcaacuggugaaaaaaguucagaauugcagaaaaguggugagugguuuc2520
cuaguguaucagccacuaucggcauaagguuagggguucgagcccccuacagggcaaucg2580
guagcgcguaugacucuuaaucauaaacaaaagaagcuguuccagagagcccaagccgga2640
caaccccgguucgaauccggguaggacacuuucgguuuugauccggacaaccccgguggu2700
uaucacuuucgguuuugauccggacaacuagugguuaucacuuucgguuuugauccggau2760
guugccgcuaaguguaaggaagucgguauccugguauauucuauauacucacuuauuacu2820
uuucugguauauucuauauacucacuuauuacaauggcucuuuuuguuauucgaaagcuu2880
acauaaaaaguucggcuaucucuugggcucugccucugcccgcgcugguucaaauccugc2940
uggugcauggaugauauuuguaguauggcggaaaacguggagaucaucgguucaaauccg3000
auuggaaauacuauucaguuucucagauauagguugcagcaauuggaaaaaucuauuaac3060
ccagaugaaccagugcgucuacuauuacucggccaaauauucguaauuugagaucucugc3120
aaaacaaugcaaaacaaugcaccuccuggcaaaaacaucaauaaacaucaaugucaauug3180
uuugaacgucaaugaacgucaauucuuguucguuguccgcaagcaauuaauauggcuugu3240
aauggaaacaagcaaaaacaagcaagaucuucccauaccguuucccaccguuuccccugc3300
auguagaaugcaacgauaucaauguuuaaucauaacagaucaaagagcaucaaagagcag3360
ugguacuacagaugcgucaagguacgcauaagcgugaaccccggucgacgccggucgacg3420
auacauacagagcuguuacaauauagcaaaggacaguagaaaccugaguaauccugagua3480
auggaucuuugaaugauauuaacugauauuaacgaaaaugaagagcuccaaaaugcucca3540
aaauuuccauagaaaaaucagcgaauuccccaaggaaaaauagcgaaaccagaaagguua3600
augcgggaagauuacauugccuugaaaugccuugaaacaaccuccaagcuugggagauga3660
ggagaucucgucguuuaagaaccaagucaaaccaagucauucgguaacaaguuccaagac3720
guuccaagacauuacugucgaaccucaaucccguagguaaaguguauuuagugagggaac3780
gccgcgacaagugaucauccauuuauugugacauauugugacacuguaucauuccuuuca3840
aaccaaacauauuacugcaucaaucuggucaugucuggucaugucaugcuuucugacuuu3900
gauuuaagauacaaaaauuuguucagauggauucagauggauucagaacuaauuccuuug3960
uugguacuuggcuguacuccauuuaaaggagauaauucacgucaaauuuccacaugauaa4020
ggaaguuucgggaaguuucgaagaauuguaaagaccugauacuucuucaaaaaaguucag4080
uggucguucuuaccccccucuaauaccugcauuaaaugauaacuccuuuuauauugucuu4140
gcaauaaacacccgaaacgaugaugaaauugaugaggcugauccaggcugauccauucca4200
ugauuuuaauucuaugcuacucugaaaauuauaccuacggaaaaauuaucuuaugauaaa4260
aauguaaaaaauuauuuaaaacgagaaagugaaugaaaaauauaauaucauauaauauca4320
uuuauugucugauaaugcuguacguaccauccgcaucaguggauauccaaugauauccaa4380
ugauaguaauuucgcgaguuuacgcgaguuuauccguugcuguuauauuaucauauauua4440
ucacuuuuuaauauucuuuucaaaggauuccuuccgcaauucuucugaaauacugcucgc4500
caguuuuuuguucuuccacguaauccccuuauuaacggagauuugauuucucccagcacc4560
gauucgagugaguacguuuucaaauauguucaaauaugcuuaaucugaucucuucugcgg4620
ccgaucugugccauuauaguaagcagugccaagcagugccacuugucuaauauaagauga4680
ugauuuuaccguuuucuggggacaucaugauacaucaugauaucauuugguacauaauga4740
acagauaauuggauuucuugcauuuuuugcgauuuuuugcgauuauggcuuguugaccau4800
ucacaaccauucacaaaaguuggucuaacauaauuuuaaguccuuguaauauucuagcuu4860
uugagucucugggagugguaaaucuacugaccaucuucuuuuauccaaucaucuggcaag4920
uccuuaauuuucaucucuaaaauuuagauauggacguuuguggacguuugagauauuuuc4980
guauuucugccuauuucugccaauucuuccuuuaacugugacuaacugugaccguacuga5040
uucgcuuucccuugcuuucccuuugaauuuuuauuauacccucucauuacuugcuuaucu5100
gaauuuuuuuccauuuuguuugccuauccuuccaucugaugacuugugaugacuugaaau5160
guucugacagguaagauucucaacauucuuaauccaaacgauguccuucuccugcuugug5220
uauuaaaggacaugaaauauuucgcuacauguaauggaagaucaucuguagauucguauc5280
uguaguuucucaucagcaagaucuuucaaaaacgcuugauuugcuggcaccucuuaauag5340
cgcuuguuucugcuugcucuacccucuaggugaacguuuaaucugacauccgggaaguuu5400
gaugugaaguauucugcuaaccguucagguccuucaccuguuuguccaagggaguaaggg5460
gacuuucuggcuuuuuuuuuuacgaaacucuuccucaucaucuucagccucaacauuuuc5520
caaccgcaacuucuuguucuugcuuaugcccugcuuauuguggguugucccgccauuauu5580
cgccauuauuguuaauagauucaacaaaauauuuaucauugaaaauucacgugaucgcaa5640
uagaucgcaauauuccgucaggagugauaaauaucgucauugcacaaauuaguuuauuau5700
ucacuauuauucacgacucuuaacaacgacaauuuuagacaggucguccguagauauuua5760
cauaaauacuacacagacuacuauuagaauuugcgaaaauuugcgaaggauuuaccgaag5820
aaaagcacagaaaagcacagaccuuauugagcuuuugaaucaauaaccaggaguuucaaa5880
aacaaacaggcacuuuucauugaucuauuugauaaaucugccacuagaguccaaucuacg5940
cgacuuauugcaacuuauugcauuccuuggaaggugaaagucuugcacgaugguccaguu6000
auugaugaauuuuuauuuggccauucaacuucauaaguggucgguaagguaccaggaaag6060
uuucugaaaccaucccaaucuuuacucuacuuauuauccauugcaucccccugaauaucu6120
uauuuuagcauuagucaacauuagucaaagaaaugaagcgguucguuuugguucguuuua6180
uugauagaaaacaggacagu6200
<210>3
<211>4200
<212>RNA
<213>Homo sapiens
<400>3
gcuaaacagguacugcugggcuuauugagugucuacuguguggauaaacuguuacgcaua60
uauuugucgguguuaacaaaauggucgggccuaguucaaaccuuuuuuuuaaguauacag120
gggucuggccggucuguagcggaucacuagcuaucgcuucucggccuuugaaaguaacuu180
ugcccgagcacuauucuguuaaaaucaggagcagcugccuuuccaacagcccaaaaugac240
uuucguucuucuuucagauacuuacauaguuuuccgaaucaacuuugccguguugacuca300
aaguuacucuccuuccuacccaccuuucccagaaguggacaauauauuaaauggauugag360
gacaauauauuaaauggaguguaguaucuguucuuaucaaaguguaguaucuguucuuau420
ucaaguguaguaucuguucuuagaucaaguguaguaucuguucuucucggccuuuuggcu480
aagaguaaucgcuucucggccuuugaguaaucgcuucucggccuugauguauuguuugca540
cucuucaugauucuauuauaguauucuuguuuuuguauuguugcuccuuucuuuuuuuug600
gccuuucucgcuaaacagguacugcugggcccauuaucgcuucucggccuucauuaucgc660
uucucggccuuuuguaauauuuuaucccuggacuaguaucuguucuuaucaguuguagua720
ucuguucuuaucaguguguaguaucuguucuuaucaaaguguaguaucuguucuuauuca780
aguguaguaucuguucuuagaucaaguguaguaucuguuguauugagugucuacugugug840
uuuucaucacuauggcuuagcgcaucaaaacuucacuuuuugauuggugguauaguggug900
agcgauaaaaggcuaauauccagaggucccugguucgaucccgggagacugaagaucuaa960
agguccgggagagcguuagacugaagaugggagagcguuagacugaagaaauccuuucua1020
aauugcaugcauaaaaaguuuuuucuucagagaguauggauuccgauaugaaagacauga1080
auaagaacugaugacuuucaauuaucugugugagccuuuucuuuguuuguaacuagccau1140
cagguaagccaagaucuucucggccuuuuggcuaagauccaucgcuucucggccuuuggg1200
cccagggugcuguggagaauuguccuccuucugaagcccccuccuuuucugaggaaggug1260
auuggaacgauacagagaagaagacuauacuuucagggaucagcgccccaauuauuauga1320
cuguaaguuauuuugcucucacuggcaauuugguuccaccacaucacucaauacuuaccu1380
ggcaggcacucaauacuuaccuggcagcuggcugcuguaggucuuuucauuguugauauu1440
ugcccagcagggccucaguuagcucucaagucccaugguguaaugguuagcguuagcacu1500
cuggacuuugaaggacuuugaauccagcgauccgcgauccgaguucaaaucucgcgaucc1560
gaguucaaaucucggucauuuuauguauauuuaucaccuuuccaguuacuccuuauauaa1620
guuauuuugcucucacugucaaguguaguaucuguucuugguaggugaguuuaaagucuu1680
cucuuaccuguuaaaaucagggcaacagaguucaacuaucuccauuugcuguuacucugg1740
agaucaaguguaguaucuguucuuguaaaaggguuacucucauacuuuuauuauuuggau1800
gaauaucuucucggccuuuuggcuaagaacuaucgcuucucggccuuuaaacuaucgcuu1860
cucggccuucccuggagguuccaauccugcuucuccaugauucgugcaucucuaauuaug1920
cuggacuguuuuauuggaacgauacagagaagaauauuucucauuucuuuuaguuauacu1980
aaaauuggaacgauaauuggaacgauacagagaagaacacgcaaauucgugaagcguaag2040
uguaguaucuguucuuauucaaguguaguaucuguucuuucaaguguaguaucuguucuu2100
gugauauaacucaguggcagaggccuuggauuucauccccagggagagggagugggaaca2160
ggauuugcaagacuccuaguaccuuguguagcaaugguguccaggaguaacaaguucagg2220
uucaccgcaaagucacucuauucugaucccaaagguuuacuuaauguuuagguuccuguu2280
gcuugccaucuaagagguuuguuguccuauuggaagucuuuuccuuuaaagucucuuagc2340
aucagacacuuaagagagagaaugagaaucaucguggaaugaauagacuuaacugucagg2400
aggcugucuuacguacacaauugcauguggaagcugcaauaacucauuccuacagcccca2460
caaacgguuuaagcuugagucacaauaaucaucauuucauuccuucaaauaaaaaaaaau2520
cauuucugaauucagauguaucuaucauaguuggguuuaagaaucagaacauuggguaua2580
uuccaccauggugucugggagcacacauuaccccucccuucccgcaccaacgaucugcuu2640
gugaacagagcuuuaguccagagcaagcccccgccuuuuuuucuguuguaaauuuuguua2700
ugcaauuaauuuagaggaauagggaaaguggacgugucuguuguuucucaaggguccgga2760
cuguuugacacugaugaaugcuuucucaaaaguuuaaacaguuucauuuggaaguagggu2820
cgccuuaagucaacaucacagaugcuccagcaggcaaccauauguuuagaaauaaaacca2880
gccgcggugccagcaaagaacagacacauuacuugaacuuguucugaguucuacugucuu2940
acccaaaugcucggaaacucucuuaugacugugacuucagaaaaagaaggauuccaaaga3000
caaacucaaauucuuagaugaccaaggcagacaguaggaagaguaauggaaauccuuuug3060
uuuuguuguucuguuguugucaagugcaaaaauauaauuuguugaauaugugugcuucug3120
uccuacuacauuucuuccauuuuuaauuaaaaaguagagcuaggacccacucuuguuccu3180
guacucacuguaggaccccaccuaaaaguauaauccugagaguucacgcugagccuuuuc3240
ucucucuuccugaaaacugaaguguucccaaagcuauguguaaagguuugguucucaucu3300
cucucucucucucucucuuguagguggguaguaggugagcagcugggaguuaaauacucu3360
guggaaccucucuaguuaaaaguaaccagucugugggaaguaaaagcaacauucccugcu3420
ggaggcuccaggauccuaagggacgucuguacucuaaggggacauuuaaauugcaucucc3480
cucauuaaaugaugacugaugcuacuauguuuaaacauuggauuuaacguuuauuucauu3540
guuuuuauuucacugugggucugggcuuuaagacccucauuuuagcugccuagccuucag3600
augaagggggggucucugcuaauuauacaucuggaguucagccuucagaacuugucagcc3660
acccuacccuacuuggaccaugucuugaaaagacaagugguugacuuuggguuucuuaug3720
uguuuguuuguuuguuuguuuguuuugcuccugacaccaccacccucuuuuaaguagauu3780
gugaccagaauaguaacuaaaauguugaauuuauuugcuuaacaaauguggcucuaaauu3840
uuaaggaucauuaugaaagaugaauagcuccccuuucucugcuugugaacacguaugcca3900
auggacucugcucccguguuacagugugaccuaacuuuggauacuuuuuccucuauaguu3960
aaccacauuaauuucaaaauugcagagaaauggaucacuuugcaucaguagggcugguaa4020
auugaaauacuggaccaucacauauuuccuggugcuucuuuguuuauucauuuggcuauu4080
ccauuguuccuguaccaucaaucuuucucaguuugugaacaugagcucuugagauucauu4140
caggaggucucagaacacuaaggcuuuauugucuccuaaucuuaacucuuggggcuggua4200
<210>3
<211>4200
<212>RNA
<213>Mus musculus
<400>3
gacucugcuguuccaaaggacaacccagaauuauuauuucuuauucuugguuuuuuuuuu60
cucauguacuuuguagugguuuaucugccuuuguuugaucugagcuauucuuauauuugu120
uuuuuagcuucugggguuugugauucuucugcacugcugucccgagaccucgcugcuuuu180
cucaagcaaaugccccaccucuggacaaguggcccugcacuaugauauauguucucaggg240
uuagauccccauugccaguggcuucauugguggcuguucacuguauugggggaaaacaaa300
uccuuauucagccuccccaggagguuccaauccugcgggacccgacuuauuccuuagcgg360
ucagcccuccgugugcuuuuacagacaauuucaaagucaguuggugguauuaaagaagac420
guccucacuguacagugccaaaacaaagauguucuuuugucucauuuggauuugcauucc480
agcuacuaagacuuguugguagcccaccucuuccuuaagccugcugcauggaugcuaugc540
accccagaaguuuuuguacaggcagauaagaagcaaguauuaggaccacugguggcagug600
gaagcaccaccugcuacucuacccaccaaaagguaccgcuuuccucaaguggucuacaag660
cuacacgugguuccuuuuugaauuuguaaggacguaacaucuguauauuuaaucgaaggc720
acacuuucagccagcgucuuugaaauauuaguuucaucuuaacagauuagugccuuugga780
ucccaaguuccuggugaacgcugcugcuuuucaugguccacccagugacuaacaucugcc840
gcgcugucuuuuccgaucucguacauggagguuccucugggggggcugcggcuacuucug900
cacaucggcucuguagacacuuucuugcccagacuauauaauggcuuguuaaugauuuuu960
uuuuuucuuuuggcucuaugagcucuggacuccaaacuucaucauggcgcucagcuacua1020
caaccagagaguaauggguuagaaaccaucaguaauggguuagaaaccaucacacucugc1080
uuacggucagacucuggacuuuuacauccacgaccucuugucaucccuggaagcccucuu1140
gucaucccuggaagcccaggagcccuacacuucuguagaccgggguucaauuccuagaga1200
ccgggguucaauuccuagccucuuuccguaccauucuagacuaacucuguagaacagucu1260
uuaucuuguauucauuguaccgaugcugagguacugccucgcuccaccuuuuuaccaaaa1320
ugucugugaugucuacaaaguucuacuccaagaguaugcggcagcagaauuauauuauau1380
ggacaccuacuacuacuacuacuaacuugaagcuguuuauaguagacugauggccgacua1440
acuaaaaagccacgauguguaucgagccacaugcucacuuuauuauccaugggacuccuc1500
uuuccaucggaaaaguugagcauauguucucagaguuugacauuucgaaaucugugguca1560
guauuuuaauaauuagcuuuaaaguuauaaaaagcaaacagaugucuuuguacccagagc1620
ugcuauuuuuagauacuuagucaacuuuuaaaauaccaccauaggcaguuacauucugca1680
guucuuucuuucuuuuuuuuucggaccagcuuacagagucagcugcuacauuuacauaga1740
gugcaguuucuuuuuucagauuuuuuuguaucacuuuguagaccaacuaggaaucuacag1800
auuaagugaagcucuuuauauaguugagucuguaacauuccugaugaucuucauaaugua1860
ucccuuacaggguccuuccuacaagaaagaacuuuuaauauuaguagcagaauuuuuacu1920
aucuauccauuacagccaguccuguggcuugcuagccuggaguucuaaucuucagaucuu1980
gauuuaacagcagaggaaaaaggcauauagaaaauuugugacaguguagcugugauucag2040
ggcccgguucaugacccggcagucuucguuugucagucaaaaagaagccuuuagugugug2100
ucaacccaccugcucucuguagacaguuugcuauuggggugaauuuagauauucaucuag2160
caagguggggcaguaauaucuuacccauuuuucauagaugucuuuucauaggcaauguaa2220
gcuuuuacccagcaccuguagaacaggucuguuuuggugagaaucuguuuggugagaauc2280
uguuuugguggaacgggaauacccugcaugccucuauugcucauucugggccagcucauc2340
auguagauguuggaucuucuccuaaaggcuuugacagacccacuggucauagucacuccc2400
uggauuguauugcucgcaaaaucacuuaugcuucccuauaauuuucuggcccuucuacac2460
uguuaugucauuuuguuucuugacugagucuaucugugugaccauagguuucauacugug2520
ucuagugcacuagugacauuucccaauucaguuugauuuuuuugagguauuauaaugucc2580
aaugaagcaaggcuaacaaagccuagucaucaugucaaguggacauaugggaaaguaaaa2640
uucaccaaauucaccauaauacuagccaccauauguguaaggaauuauaacagggaguuu2700
guaaucgguuuucagagcauccauugucacucuacacaaguagcagucaucgccuuagua2760
auacaugaucuuuagaguauuauacucuacucauugauuugacuacauauuucuuaauga2820
aaacuaugcucaacuuuuugcacaaugggaagacuuaaccuguacauagcuguuuaauuu2880
cucugacucacgauaucccuguuuccaaugugaagacaggcaguguaaauagagaugaug2940
gcagaaugccugaauuuauggauugacuggcugggagccuucgcaaugaguauuaauuaa3000
uacagagagagaauaggauauggaguuagagacguggaaacaaugccauggagacuacag3060
aguugacagccugaacuuguacagcacauuaaucuacuggaaaguauaacugggaagauu3120
uccaggaacauaaaauguauuugacucuugcuccaaauaauaaauuaagggggccuuaca3180
cucacuggacagucacccccucugaugagcuguagaguuggacuauucuggugagcuuag3240
ucccugggccaugccgcuugguuaguguguuuugugcucuuuaaaguugagugauauacc3300
ucauauauacaacacaccgaagagacacucaggguccuauaucuuuugcuguugagggac3360
caaugcagguucaaggugacacacacuagguuuagagucauguguucugugaucagugga3420
ccaucuguauggcugagaugaaauuuguccuucaucucaccauuguguagcccgauccuc3480
uccucuugaugccuacucauuuuucaguucuuacuucugccagagucuccugcuaaguuu3540
cccauggaacauguacaucugaaucuuugcaaccaagcagugacugaccuuuaauuuggc3600
aucuuugaguuggaaaucccaguacagaguaaccauuagcugauaaaugagugagacaua3660
aagcucugagcaggcauugcaaugauaaaaugaauaaacacaggacuaacuuuacauacu3720
uaauuacuucauaacugcaaaauaggaauuauagaucucucauaaugcuuugcccagugc3780
uacugggugcuacuguacaauaagcagccgauauagguaguucccacccaaguaauucau3840
ucuccaguuuaccuugcaaaaccagaugcagagauaggccccaauaaagaggaaaugguc3900
auaauuuuauuuaauauuucccauauacaccucauuaucugcuguaccucauuauauaug3960
aucuauuuuuuagucuuccuuagcaggcccaacucucagcuuuaugacuccuuacgccag4020
cucucugaggccgcagauaauguauuuguguugaauaaaucccucacacacagauagcag4080
cacauagcagcucaucgggcuuucauaucacaucaccagccucucauuagauccuugauu4140
accacugugccuucuucacacagauguuugacacucaauuuuuacccccuucugauuuac4200