Import OTU Data
The Explicet program provides a graphical user interface (GUI) for exploring OTU data but does not generate the OTU data itself. Instead, taxonomic data for a project are imported, in multiple formats, from a variety of sequence-processing programs, including the RDP Classifier [2], ARB NDS [5,7], Mothur [10], and Qiime [1]. Data import is fully incremental, so new data may be imported into an existing project at any time.
Note: The import format, date, time, and file location of OTU data are logged in Edit → Project Settings.
A. OTU Data File Format
- The import file must be a delimited file (i.e., delimited by tabs, commas, or spaces).
B. Import a File
- A single import file contains the data for multiple libraries (a.k.a. samples).
File → Import → File → …
1. Three-line (legacy)
- One file with three lines for each OTU result:
- (line 1) library name and sequence name separated by a tab
- (line 2) this second line is ignored
- (line 3) OTU names separated by slash/tab/semi-colon/colon/comma/period/space
- Example:
TPN0001B <tab> TPN0001B_000001
- A
- /Root/Bacteria/"Bacteroidetes"/"Bacteroidia"/"Bacteroidales"
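The three-line layout can be read with a short parser. The Python sketch below is an illustration, not Explicet's own importer. It assumes every record occupies exactly three lines with no blank lines between records, and it strips the double quotes that surround some names in the example, which is an assumption about how such quotes should be treated. The delimiter set is the one listed above for OTU names.

```python
import re
from typing import List, Tuple

# OTU-name delimiters listed in the manual: slash, tab, semicolon, colon,
# comma, period, space.
LEVEL_SPLIT = re.compile(r"[/\t;:,. ]+")


def split_lineage(text: str) -> List[str]:
    """Split an OTU name into taxonomic levels, dropping empty pieces
    and (an assumption) any surrounding double quotes."""
    return [p.strip('"') for p in LEVEL_SPLIT.split(text.strip()) if p.strip('"')]


def parse_three_line(text: str) -> List[Tuple[str, str, List[str]]]:
    """Parse records of 'library<TAB>sequence', an ignored line, and the
    OTU name. Returns (library, sequence, levels) tuples."""
    lines = text.splitlines()
    if len(lines) % 3 != 0:
        raise ValueError("expected a multiple of three lines")
    records = []
    for i in range(0, len(lines), 3):
        library, sequence = lines[i].split("\t", 1)
        # lines[i + 1] is skipped: the format says the second line is ignored
        records.append((library.strip(), sequence.strip(), split_lineage(lines[i + 2])))
    return records


sample = (
    "TPN0001B\tTPN0001B_000001\n"
    "A\n"
    '/Root/Bacteria/"Bacteroidetes"/"Bacteroidia"/"Bacteroidales"\n'
)
for record in parse_three_line(sample):
    print(record)
# -> ('TPN0001B', 'TPN0001B_000001', ['Root', 'Bacteria', 'Bacteroidetes', 'Bacteroidia', 'Bacteroidales'])
```

Because space and period are both listed as delimiters, a name containing either character is split into separate levels by this sketch.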
2. Two-line (fasta style)
- Requires two files...
- File 1 = file with two lines for each OTU result:
- (line 1) >sequence name
- (line 2) OTU names separated by slash/tab/semi-colon/colon/comma/period/space
- Example:
>TPN0001B_000001
- /Root/Bacteria/"Bacteroidetes"/"Bacteroidia"/"Bacteroidales"
- File 2 = file mapping sequences to libraries, with one line for each OTU result:
- (line 1) sequence name and library name separated by a tab
- Example:
TPN0001B_000001 <tab> TPN0001B
- Note: It is the user’s responsibility to create the mapping file, for instance using a spreadsheet.
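Importing this format amounts to joining File 1 and File 2 on the sequence name. The Python sketch below illustrates that join and counts sequences per OTU in each library; it is not Explicet's importer. It assumes no sequence name appears twice in File 1, strips double quotes from names (an assumption), and the second record in the sample data is made up for illustration.

```python
import re
from collections import Counter
from typing import Dict, Tuple

# OTU-name delimiters listed in the manual: slash, tab, semicolon, colon,
# comma, period, space.
LEVEL_SPLIT = re.compile(r"[/\t;:,. ]+")


def split_lineage(text: str) -> Tuple[str, ...]:
    """Split an OTU name into levels; empty pieces and (assumed) quotes are dropped."""
    return tuple(p.strip('"') for p in LEVEL_SPLIT.split(text.strip()) if p.strip('"'))


def parse_two_line(text: str) -> Dict[str, Tuple[str, ...]]:
    """File 1: map sequence name -> OTU levels from '>name' / OTU-name line pairs."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 2:
        raise ValueError("expected an even number of non-blank lines")
    otus = {}
    for header, otu in zip(lines[0::2], lines[1::2]):
        if not header.startswith(">"):
            raise ValueError(f"expected a '>' header line, got {header!r}")
        otus[header[1:].strip()] = split_lineage(otu)
    return otus


def parse_mapping(text: str) -> Dict[str, str]:
    """File 2: map sequence name -> library name from 'sequence<TAB>library' lines."""
    mapping = {}
    for line in text.splitlines():
        if line.strip():
            sequence, library = line.split("\t", 1)
            mapping[sequence.strip()] = library.strip()
    return mapping


def count_by_library(otus: Dict[str, Tuple[str, ...]], mapping: Dict[str, str]) -> Dict[str, Counter]:
    """Join the two files on sequence name and count sequences per OTU per library."""
    counts: Dict[str, Counter] = {}
    for sequence, levels in otus.items():
        library = mapping[sequence]  # raises KeyError if File 2 lacks this sequence
        counts.setdefault(library, Counter())[levels] += 1
    return counts


fasta = (
    ">TPN0001B_000001\n"
    '/Root/Bacteria/"Bacteroidetes"/"Bacteroidia"/"Bacteroidales"\n'
    ">TPN0001B_000002\n"  # made-up second record for illustration
    "/Root/Bacteria\n"
)
mapping = "TPN0001B_000001\tTPN0001B\nTPN0001B_000002\tTPN0001B\n"
print(count_by_library(parse_two_line(fasta), parse_mapping(mapping)))
```

Failing loudly on an unmapped sequence, rather than skipping it, makes gaps in a hand-built mapping file easy to spot.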
3. One-line (ARB NDS)
- Requires two files...
- File 1 = file with one line for each OTU result, such as that emitted by the NDS export function of ARB:
- (line 1) sequence name and OTU name separated by a tab
- Example:
TPN0001B_000001 <tab> /Root/Bacteria/"Bacteroidetes"/"Bacteroidia"/"Bacteroidales"
- File 2 = file mapping sequences to libraries, with one line for each OTU result:
- (line 1) sequence name and library name separated by a tab
- Example:
TPN0001B_000001 <tab> TPN0001B
- Note: It is the user’s responsibility to create the mapping file, for instance using a spreadsheet.
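Because the mapping file must be supplied by the user, it can sometimes be generated instead of typed by hand. The Python sketch below assumes a hypothetical naming convention in which every sequence name is its library name followed by an underscore and a serial number, as in the TPN0001B_000001 example. The helper `library_from_sequence` encodes that assumed rule; if your sequence names do not follow such a rule, this approach does not apply. The sketch reads a one-line (ARB NDS style) file and emits one `sequence<TAB>library` line per record.

```python
from typing import List


def library_from_sequence(sequence: str) -> str:
    """Hypothetical naming rule: the library is the text before the last underscore."""
    library, sep, _ = sequence.rpartition("_")
    if not sep:
        raise ValueError(f"no underscore in sequence name {sequence!r}")
    return library


def build_mapping_lines(nds_text: str) -> List[str]:
    """Create 'sequence<TAB>library' lines for every record of a one-line
    file whose first tab-separated field is the sequence name."""
    lines = []
    for row in nds_text.splitlines():
        if row.strip():
            sequence = row.split("\t", 1)[0].strip()
            lines.append(f"{sequence}\t{library_from_sequence(sequence)}")
    return lines


nds = 'TPN0001B_000001\t/Root/Bacteria/"Bacteroidetes"/"Bacteroidia"/"Bacteroidales"\n'
print("\n".join(build_mapping_lines(nds)))  # -> TPN0001B_000001<TAB>TPN0001B
```

Writing these lines to a text file, one per line, produces a File 2 in the layout described above.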
4. RDP Classifier
- Explicet accepts RDP Classifier output in both its local and web formats. Requires two files...
- File 1 = file with one line for each OTU result:
- (line 1) sequence name and OTU name with bootstrap scores
- This file is formatted as exported by RDP; please refer to the RDP website for more information.
Upon import, the user can change the Bootstrap Threshold applied to the OTUs being imported; the Explicet default is 80%. For more information on recommended bootstrap thresholds, see the RDP website.
- File 2 = file mapping sequences to libraries, with one line for each OTU result:
- (line 1) sequence name and library name separated by a tab
- Example:
TPN0001B_000001 <tab> TPN0001B
- Note: It is the user’s responsibility to create the mapping file, for instance using a spreadsheet.
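The manual does not spell out how the Bootstrap Threshold is applied. A common interpretation, assumed here and not confirmed for Explicet, is that an OTU name is truncated at the first taxonomic rank whose bootstrap confidence falls below the threshold. The Python sketch below takes a lineage that has already been parsed into (taxon, confidence) pairs, with confidences as fractions between 0 and 1; parsing the RDP export itself is left out because its exact column layout is not described here. The scores in the example are illustrative, not real RDP output.

```python
from typing import List, Sequence, Tuple


def apply_bootstrap_threshold(
    ranks: Sequence[Tuple[str, float]], threshold: float = 0.80
) -> List[str]:
    """Keep taxa from the root downward, stopping at the first rank whose
    bootstrap confidence is below the threshold (scores as fractions 0-1)."""
    kept = []
    for taxon, confidence in ranks:
        if confidence < threshold:
            break  # this rank and every deeper rank are treated as unreliable
        kept.append(taxon)
    return kept


assignment = [  # illustrative values only
    ("Root", 1.0),
    ("Bacteria", 1.0),
    ("Bacteroidetes", 0.97),
    ("Bacteroidia", 0.85),
    ("Bacteroidales", 0.62),
]
print("/".join(apply_bootstrap_threshold(assignment)))        # -> Root/Bacteria/Bacteroidetes/Bacteroidia
print("/".join(apply_bootstrap_threshold(assignment, 0.50)))  # -> Root/Bacteria/Bacteroidetes/Bacteroidia/Bacteroidales
```

Lowering the threshold keeps deeper, less certain assignments; raising it trades resolution for confidence.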
5. OTU Table Counts
- This is the most compact and efficient way to import OTU data. Instead of listing each individual sequence, the OTU table lists each OTU name once, followed by the number of times it appears in each library:
- (top line) column headings separated by a tab: the OTU name column, a Total column, and one column per library name, as in the example
- (following lines) the OTU name, total count, and per-library counts separated by a tab
- Example:
OTU Name <tab> Total <tab> HV1-1-BtRSc <tab> HV10-BtRSc <tab> HV2-1-BtRSc
root <tab> 9,049 <tab> 326 <tab> 198
Bacteria <tab> 2 <tab> 2 <tab> 0
Bacteria;Acidobacteria;Acidobacteria;RB41 <tab> 1 <tab> 0 <tab> 0
- Note: OTU table count files can be exported from Explicet, Mothur, and Qiime.
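The compaction described above amounts to counting sequences per OTU and library. The Python sketch below collapses hypothetical per-sequence (library, OTU name) pairs into a tab-delimited table with the same column layout as the example (OTU Name, Total, then one column per library). It writes plain integers without thousands separators and sorts OTU names alphabetically; both are choices of this sketch, not documented Explicet requirements.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_otu_table(assignments: Iterable[Tuple[str, str]], libraries: List[str]) -> str:
    """Collapse per-sequence (library, OTU name) pairs into a tab-delimited
    OTU table: one row per OTU, a Total column, then one column per library."""
    counts = Counter(assignments)  # (library, OTU name) -> number of sequences
    otu_names = sorted({otu for _, otu in counts})
    rows = ["\t".join(["OTU Name", "Total", *libraries])]
    for otu in otu_names:
        per_library = [counts[(library, otu)] for library in libraries]
        rows.append("\t".join([otu, str(sum(per_library)), *map(str, per_library)]))
    return "\n".join(rows)


# Made-up per-sequence assignments using the library names from the example.
libraries = ["HV1-1-BtRSc", "HV10-BtRSc", "HV2-1-BtRSc"]
assignments = [
    ("HV1-1-BtRSc", "Bacteria"),
    ("HV1-1-BtRSc", "Bacteria"),
    ("HV10-BtRSc", "Bacteria;Acidobacteria"),
    ("HV2-1-BtRSc", "Bacteria;Acidobacteria"),
]
print(build_otu_table(assignments, libraries))
```

Four sequence records become two table rows here; with thousands of sequences per library the saving is much larger. Sequences assigned to a library not listed in `libraries` are left out of the table.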
C. Import a Directory
- The directory contains a separate file for each library. Each file's name (minus the file extension) is used as the library name for the sequences in that file.
File → Import → Directory → …
1. Two-line (fasta style) / One-line (ARB NDS) / RDP Classifier
- Each individual file is formatted as described above in “Import a File”; however, no mapping file is needed, because each file name (minus the file extension) serves as the library name.
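The directory convention can be illustrated with Python's `pathlib`, whose `Path.stem` gives a file name minus its final extension. The sketch below is a hypothetical helper, not part of Explicet: it handles only fasta-style (two-line) files and collects the sequence names from each file under the library name taken from that file's name.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def sequences_by_library(directory: Path) -> Dict[str, List[str]]:
    """For fasta-style files, map each file's stem (name minus its final
    extension) to the sequence names found on its '>' header lines."""
    result: Dict[str, List[str]] = {}
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        names = [
            line[1:].strip()
            for line in path.read_text().splitlines()
            if line.startswith(">")
        ]
        result[path.stem] = names
    return result


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "TPN0001B.fasta").write_text(
        '>TPN0001B_000001\n/Root/Bacteria/"Bacteroidetes"/"Bacteroidia"/"Bacteroidales"\n'
    )
    print(sequences_by_library(Path(tmp)))  # -> {'TPN0001B': ['TPN0001B_000001']}
```

`Path.stem` removes only the last extension, so a file named `sample.v2.fasta` would yield the library name `sample.v2`; whether Explicet strips one extension or all of them is not stated here.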