# igrep

A fast CUDA implementation of the agrep algorithm for approximate nucleotide sequence matching.

The input to igrep is twofold:

- A genome to search. A total of 26 assembled genomes were collected from ftp://ftp.ncbi.nih.gov/genomes. Their sizes range from 0.19 Gnt to 3.50 Gnt, for 44 Gnt in total.
- A set of queries. A query consists of a pattern over the alphabet A, C, G, T, N, followed by an edit distance. N is a wildcard that matches any of A, C, G, or T in the genome. The pattern length must be between 1 and 64. The edit distance must be between 0 and 9 and must not exceed the pattern length. Substitutions, insertions, and deletions each cost one unit of edit distance. Each job processes up to 10,000 queries.
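
The query rules above can be illustrated with a small CPU-only sketch in Python. It is not igrep's CUDA kernel; it only follows the Wu-Manber style bit-parallel recurrence that agrep is based on. The names `validate_query` and `find_end_positions` are hypothetical. Two further assumptions: reported ending positions are 0-based, and any genome character other than A, C, G, or T matches nothing. The `limit` parameter mirrors the cap of 1,000 matches per query.

```python
ALPHABET = "ACGT"


def validate_query(pattern, k):
    """Check a query against the limits stated in this README."""
    if not 1 <= len(pattern) <= 64:
        raise ValueError("pattern length must be between 1 and 64")
    if any(c not in "ACGTN" for c in pattern):
        raise ValueError("pattern may only contain A, C, G, T, N")
    if not 0 <= k <= 9 or k > len(pattern):
        raise ValueError("edit distance must be 0..9 and <= pattern length")


def find_end_positions(genome, pattern, k, limit=1000):
    """Return 0-based ending positions of matches within edit distance k."""
    validate_query(pattern, k)
    m = len(pattern)
    full = (1 << m) - 1          # keeps every state vector to m bits
    accept = 1 << (m - 1)        # bit set when the whole pattern matched

    # masks[c] has bit i set if pattern[i] matches genome character c.
    masks = {c: 0 for c in ALPHABET}
    for i, p in enumerate(pattern):
        for c in ALPHABET:
            if p == c or p == "N":   # N is a wildcard in the pattern
                masks[c] |= 1 << i

    # R[d] bit i: pattern[0..i] matches a suffix of the text read so far
    # with at most d edits. A prefix of length d matches the empty text.
    R = [(1 << d) - 1 for d in range(k + 1)]
    ends = []
    for pos, c in enumerate(genome):
        mask = masks.get(c, 0)   # assumption: non-ACGT matches nothing
        prev = R[0]              # old value of R[d - 1]
        R[0] = ((R[0] << 1) | 1) & mask
        for d in range(1, k + 1):
            old = R[d]
            R[d] = (
                (((old << 1) | 1) & mask)   # match
                | prev                      # extra character in the genome
                | (prev << 1)               # substitution
                | (R[d - 1] << 1)           # pattern character skipped
                | 1
            ) & full
            prev = old
        if R[k] & accept:
            ends.append(pos)
            if len(ends) == limit:
                break
    return ends
```

For example, `find_end_positions("TTACCTTT", "ACGT", 1)` reports a single match ending at position 5, where `ACCT` differs from the pattern by one substitution.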

The output from igrep is twofold:

- log.csv: summary of queries and results.
- pos.csv: ending positions of matches. For each query, up to 1,000 matches will be returned.

Hongjian Li, Bing Ni, Man-Hon Wong, and Kwong-Sak Leung. A Fast CUDA Implementation of Agrep Algorithm for Approximate Nucleotide Sequence Matching. *9th IEEE Symposium on Application Specific Processors (SASP)*, pp.74-77, San Diego, United States, 5-6 June 2011. DOI: 10.1109/SASP.2011.5941082