Research Computing staff helps students, postdocs get started with high-performance computing

For students and postdocs with limited computing experience, the idea of doing research with Purdue’s community cluster supercomputers can be intimidating. ITaP Research Computing staff can help demystify the process.

Basudev Chowdhury, a former Purdue postdoctoral researcher now at the University of Massachusetts Medical School, studies a genetic mutation found in approximately 40 percent of patients with clear cell renal cell carcinoma, the most common form of kidney cancer.

Although Chowdhury was trained as a bench biologist, he soon realized that doing the kind of genomic analysis necessary for his research required computational resources beyond the standard workstations available in his lab.

“The nature of science has evolved so much in the last decade,” says Chowdhury. “The cost of sequencing has fallen and it’s now so much easier to do sequencing and gather comprehensive genomic information in ways we couldn’t envision earlier.”

To get started using Purdue’s community clusters, he attended a series of Research Computing workshops offered each semester that focus on helping new users learn the basics of high-performance computing. There, he got to know ITaP senior scientific applications analysts Gladys Andino and Steve Kelley, and began working with them one-on-one and at Research Computing Coffee Breaks. With their help, he was able to use Purdue’s Rice community cluster to perform DNA and RNA sequencing analysis.

“They helped me a lot in terms of getting accustomed to the supercomputing resources on campus and analyzing my data using these resources,” says Chowdhury. In just under six months, he’s gone from not knowing anything about UNIX to being able to confidently edit scripts, run jobs on the clusters and process his data as necessary.

Marwa El Hindawy, a graduate student in food science, has also worked individually with Andino to learn how to use the community clusters for her research into the effects of different carbohydrate structures on intestinal cells.

“I really started from scratch,” says El Hindawy. “Gladys helped me with all the basics of UNIX and how to use modules and clusters. And then we worked with my data, and she showed me how to open it, extract it and analyze it.” The two are still meeting weekly to work with El Hindawy’s data.

ITaP Research Computing’s spring 2018 training series will begin on Tuesday, Jan. 23, with UNIX 101, a class designed for those who have no previous experience with UNIX-based systems like the community clusters. An advanced UNIX class and a Clusters 101 workshop about high-performance computing with Purdue’s community clusters will follow in February. Registration for the workshops is open online, and slides from the workshops are also available online.

In collaboration with the Department of Biological Sciences, ITaP Research Computing will also offer an eight-week class this spring to teach students the sequencing techniques Chowdhury and El Hindawy are using. The course, BIOL 59500, High-Performance Computing for the Life Sciences, is designed for advanced undergraduates and graduate students who are interested in learning how to use Purdue’s community clusters for applications in genomics and bioinformatics, with a focus on hands-on analysis of differential gene expression using RNA-Seq data. No previous experience in UNIX, programming or genomics is required.

Andino will be an instructor for the class, along with Michael Gribskov, a professor of biological sciences and computer science, and Nadia Atallah, a bioinformatician at the Purdue Center for Cancer Research. For more information about the course, contact Andino at gandino@purdue.edu.

For more information about ITaP Research Computing training opportunities or Purdue’s research computing resources, email rcac-help@purdue.edu.

Writer: Adrienne Miller, science and technology writer, Information Technology at Purdue (ITaP), 765-496-8204, mill2027@purdue.edu

Last updated: January 18, 2018