Broman KW, Heath SC (2007) Managing and manipulating genetic data. In preparation for Barnes MR, Gray IC (eds) Bioinformatics for Geneticists, 2nd edition, Wiley, pp. 17-31

The ever-increasing size and complexity of genetic data have made it more and more important for geneticists to learn computer programming: for efficiency, to avoid introducing errors into data, and to make simple what would otherwise be infeasible. Because the most fundamental task in genetic data analysis involves the manipulation of data files, we recommend proficiency in a computer language, such as Perl, in which the manipulation of text files is most natural. We describe the essential issues in the management and manipulation of genetic data: never modify data by hand, be organized and keep notes, and plan for the future but get the job done. We focus on human linkage data, though the basic principles apply to all types of data. We provide some example Perl code to give the reader a flavor of the language and to highlight features of Perl that are especially valuable for this type of work.
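The following is a minimal Perl sketch of the kind of scripted data manipulation the abstract advocates. It is our own illustration and is not taken from the chapter. It assumes a whitespace-delimited pedigree file in the pre-MAKEPED LINKAGE layout, with six leading columns (family, individual, father, mother, sex, affection status) followed by pairs of allele columns. It recodes nucleotide letters to integers. The coding A=1, C=2, G=3, T=4 (0 = missing) and the name `recode_line` are hypothetical choices made for this example. The script stops with an error on any unexpected allele code rather than passing it through silently, so that mistakes surface instead of propagating into the data.

```perl
use strict;
use warnings;

# Hypothetical allele coding (an assumption for this sketch):
# nucleotide letters map to integers, and 0 means a missing allele.
my %code = (A => 1, C => 2, G => 3, T => 4, 0 => 0);

# Recode one pedigree line. The first six columns (family, individual,
# father, mother, sex, affection status) are copied unchanged; every
# remaining column is treated as an allele and looked up in %code.
sub recode_line {
    my ($line) = @_;
    my @f = split ' ', $line;    # split on runs of whitespace
    die "too few columns: $line\n" if @f < 6;
    die "odd number of allele columns: $line\n" if (@f - 6) % 2;
    for my $i (6 .. $#f) {
        my $allele = uc $f[$i];  # accept lower-case letters too
        die "unexpected allele '$f[$i]' in: $line\n"
            unless exists $code{$allele};
        $f[$i] = $code{$allele};
    }
    return join ' ', @f;
}

# Small in-memory example: two founders and their affected child.
my @ped = (
    "1 1 0 0 1 1 A C G G",
    "1 2 0 0 2 1 A A 0 0",
    "1 3 1 2 1 2 a C G T",
);
print recode_line($_), "\n" for @ped;
```

In practice, the script would read the original file line by line and write the recoded lines to a new file. The raw data stay untouched, and the script itself records exactly what was done.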
