There are two sources of Mitragyna speciosa data that we are aware of.
A search of NCBI returns 40 sequences. These are well-annotated sequences totalling 344,569 bp, including two chloroplast sequences, ITS sequences and many genes of interest. They come from several research groups around the world and were generated with a variety of sequencing technologies. Anyone can download them as a FASTA file posted here.
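A quick way to confirm the sequence count and total length after downloading the FASTA file is to sum the record lengths. This is a minimal Python sketch. The function names are our own, and it assumes the record ID is the first word of each header line.

```python
def fasta_lengths(lines):
    """Map each FASTA record ID to its sequence length.

    Assumes the ID is the first whitespace-delimited token after '>'.
    """
    lengths = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence lines may be wrapped, so accumulate across lines.
            lengths[current] += len(line)
    return lengths


def summarize(lengths):
    """Return a one-line summary: number of records and total bases."""
    return f"{len(lengths)} sequences, {sum(lengths.values()):,} bp"
```

Run on the downloaded file (opened with `open(...)` and passed to `fasta_lengths`), `summarize` should report 40 sequences and 344,569 bp if the download matches the figures above.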
Below is a BLASTN output table of these 40 sequences aligned to our assembly. All 40 have hits. Some SNPs are expected, since the Red Vein Thai sample we sequenced is a different strain from those sequenced previously.
The ironclad evidence that our assembly is in fact Mitragyna speciosa is the perfect, 100% identity BLAST hit to the published ITS sequence AB249645.1.
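A table like this can be produced and checked locally with NCBI BLAST+. As a sketch, assume a command such as `blastn -query ncbi_sequences.fasta -subject assembly.fasta -outfmt "6 std qlen" -out hits.tsv` (the file names are placeholders). The Python below reads that tabular output, keeps the best-scoring hit per query, and lists queries with a full-length, gap-free, mismatch-free hit, which is the kind of match described for AB249645.1. The function name and this definition of "perfect" are our assumptions.

```python
import csv

# Column order for `-outfmt "6 std qlen"`: the 12 standard columns, then qlen.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore", "qlen"]


def summarize_hits(lines):
    """Return (best, perfect) from tabular BLAST output lines.

    best maps each query ID to its highest-bitscore hit (as a dict);
    perfect is a sorted list of queries with a 100% identity hit that has
    no mismatches or gaps and spans the whole query.
    """
    best = {}
    perfect = set()
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(COLUMNS, row))
        query = hit["qseqid"]
        if query not in best or float(hit["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = hit
        full_length = int(hit["qstart"]) == 1 and int(hit["qend"]) == int(hit["qlen"])
        exact = (float(hit["pident"]) == 100.0
                 and int(hit["mismatch"]) == 0
                 and int(hit["gapopen"]) == 0)
        if exact and full_length:
            perfect.add(query)
    return best, sorted(perfect)
```

Checking `len(best)` against the number of query sequences shows whether every query has at least one hit.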
There is also an SRA archive of 1.3M reads from the FDA. Downloading it requires specialized software and familiarity with command-line interfaces.
| Run | Spots | Bases | Size | GC content | Published | Access Type |
| L=248, 100% | L=248, 100% |
We attempted to assemble these data with little success. On closer inspection, the FASTQ file appears to contain the forward and reverse reads concatenated into a single 499 bp read.
This can be seen in a plot of AT content by read position, which shows a spike at the strand-flipping base at position 250. This is consistent with the SRA table above. To make use of these data, the forward and reverse reads need to be split apart so that assemblers do not treat each one as a contiguous 499 bp read. The strand-flipping base, along with the first and last bases (which appear to be adapter-derived), also needs to be trimmed. We are still working on this.
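The AT check and the split described above can be sketched in Python. This assumes a 499 bp layout of one adapter-derived base, 248 forward bases, the strand-flipping base at position 250, 248 reverse bases, and a final adapter-derived base; we read the two L=248 entries in the SRA table as the forward and reverse read lengths. The function names, the `/1` and `/2` header suffixes, and leaving the second read in its sequenced orientation are our assumptions, not a tested protocol. `at_profile` is pure Python, so running it on a subsample of reads is advisable.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it)
            next(it)  # '+' separator line
            qual = next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header}") from None
        yield header, seq, qual


def at_profile(seqs):
    """Fraction of A/T bases at each position (index 0 = position 1)."""
    at, total = [], []
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(total):
                at.append(0)
                total.append(0)
            total[i] += 1
            if base in "AT":
                at[i] += 1
    return [a / t for a, t in zip(at, total)]


# Assumed layout of each concatenated read (0-based slices):
READ_LEN = 499
FWD = slice(1, 249)    # positions 2-249: forward read (248 bases)
REV = slice(250, 498)  # positions 251-498: reverse read (248 bases)
# Dropped: positions 1 and 499 (adapter-derived), position 250 (strand flip).


def split_concatenated(records):
    """Yield (read1, read2) record pairs split out of concatenated reads."""
    for header, seq, qual in records:
        if len(seq) != READ_LEN:
            continue  # skip reads that do not match the assumed layout
        name = header.split()[0]
        yield ((name + "/1", seq[FWD], qual[FWD]),
               (name + "/2", seq[REV], qual[REV]))


def write_fastq(handle, record):
    header, seq, qual = record
    handle.write(f"{header}\n{seq}\n+\n{qual}\n")


def split_file(in_path, r1_path, r2_path):
    """Write paired R1/R2 FASTQ files from a concatenated FASTQ file."""
    with open(in_path) as src, open(r1_path, "w") as r1, open(r2_path, "w") as r2:
        for read1, read2 in split_concatenated(read_fastq(src)):
            write_fastq(r1, read1)
            write_fastq(r2, read2)
```

Skipping reads of other lengths is a deliberate choice; counting how many are skipped would show whether the assumed layout holds across the whole run.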
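The Unix command cut can truncate each read to its first 200 bases, which discards the reverse half of each concatenated read:
cut -c 1-200 in.fastq > out.fastq
Because cut acts on every line, it shortens the quality lines exactly as it shortens the sequence lines (and truncates any header longer than 200 characters), so the output is still valid FASTQ. These reads map to the assembly, but with elevated error rates toward their 3' ends.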
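For comparison, here is a Python sketch of the same truncation that touches only the sequence and quality lines and leaves headers intact. It assumes strict four-line FASTQ records with no wrapped or blank lines. The `start` parameter is our addition: `start=1` would also drop the first base, which appears to be adapter-derived.

```python
def truncate_fastq(lines, start=0, end=200):
    """Keep bases [start, end) of each read and of its quality string.

    With the defaults this matches `cut -c 1-200` on sequence and quality
    lines; header and '+' lines are passed through unchanged.
    Assumes strict 4-line FASTQ records (no wrapped or blank lines).
    """
    for i, raw in enumerate(lines):
        line = raw.rstrip("\n")
        if i % 4 in (1, 3):  # 2nd line = sequence, 4th line = quality
            line = line[start:end]
        yield line + "\n"


def truncate_file(in_path, out_path, start=0, end=200):
    """Apply truncate_fastq to a whole file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(truncate_fastq(src, start, end))
```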