Overview

This MOOC Repository covers Science Clouds. Lecture topics include the use of Clouds in genetic research, a specific higher education cluster called FutureGrid, and virtual computing via Clouds. When you have completed your time here, please help us to improve our selection by providing feedback about lectures, quizzes, and the site in general.
Enroll (a Google account is required)

Instructors

Geoffrey Fox
Geoffrey Fox received a Ph.D. in Theoretical Physics from Cambridge University and is now professor of Informatics and Computing, and Physics at Indiana University, where he is director of the Digital Science Center and Associate Dean for Research and Graduate Studies at the School of Informatics and Computing. He previously held positions at Caltech, Syracuse University and Florida State University. He has supervised the PhDs of 65 students and published around 1000 papers in physics and computer science, with an h-index of 63 and over 21000 citations. He currently works on applying computer science to Bioinformatics, Sensor Clouds, Earthquake and Ice-sheet Science, and Particle Physics. He is principal investigator of FutureGrid – a facility to enable development of new approaches to computing. He is involved in several projects to enhance the capabilities of Minority Serving Institutions, including the eHumanity portal. He has experience in online education and its use in MOOCs for areas like Data and Computational Science. He is a Fellow of APS and ACM.
Michael Schatz
Michael Schatz is an assistant professor in the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory. His research interests include developing large-scale sequence analysis methods for de novo assembly, variation detection, and related analysis. In recent years, Schatz has pioneered the use of parallel and cloud computing technologies in the life sciences, by developing some of the leading NGS analysis systems with them. Schatz received his Ph.D. and M.S. in Computer Science from the University of Maryland in 2010 and 2008, and his B.S. in Computer Science from Carnegie Mellon University in 2000. For more information see: http://schatzlab.cshl.edu.
Thomas J. Hacker
Tom Hacker is an Associate Professor of Computer and Information Technology at Purdue University. Dr. Hacker’s research interests center around high performance computing and networking on the operating system and middleware layers. Recently, his research has focused on cloud computing, cyberinfrastructure, scientific workflows, and data-oriented cyberinfrastructure. Dr. Hacker is also Co-Leader for Information Technology for the Network for Earthquake Engineering Simulation (NEES), which brings together researchers from fourteen universities across the country to share innovations in earthquake research and engineering. In addition, he recently received a four-year grant through the University of Stavanger from the Norwegian Center for International Cooperation (SIU) to develop a joint education and research program in data analytics, high performance, and cloud computing.
Abhishek Chandra
Abhishek Chandra is an Associate Professor in the Department of Computer Science and Engineering at the University of Minnesota. His research interests are in the areas of Operating Systems and Distributed Systems, with recent focus on performance and resource management in large-scale systems such as Clouds, Grids, and Data centers. He received his B.Tech. degree in Computer Science and Engineering from IIT Kanpur, India, and M.S. and PhD degrees in Computer Science from the University of Massachusetts Amherst. He is a recipient of the NSF CAREER Award and IBM Faculty Award; his PhD dissertation was nominated for the ACM Dissertation Award, and he is a co-author on two Best Paper/Student Paper Awards.
Jonathan Klinginsmith
Jonathan Klinginsmith is a Ph.D. Candidate in the Computer Science Department at Indiana University.
Jerome Mitchell
Jerome Mitchell is pursuing a Ph.D. in computer science at Indiana University and is interested in coupling the fields of computer and polar science. He has participated in the United States Antarctic Program (USAP), where he collaborated with a multidisciplinary team of engineers and scientists to design a mobile robot for harsh polar environments to autonomously collect ice sheet data, decrease the human footprint of polar expeditions, and enhance measurement precision. His current work includes using computer vision techniques to help polar scientists understand the bedrock and internal layers in radar imagery, as well as using graphics processing units for signal processing algorithms. He has also been involved in facilitating workshops to educate faculty and students on the importance of parallel and distributed computing.
Thilina Gunarathne
Thilina Gunarathne is a PhD candidate at the School of Informatics and Computing, Indiana University. His research interests are in the fields of distributed & parallel computing, cloud computing, many/multicore systems and SOA. His current research focuses on exploring architectures and programming models for scalable parallel computing on cloud environments. He received his B.Sc. (Computer Science and Engineering) from the University of Moratuwa, Sri Lanka in 2006 and M.Sc. (Computer Science) from Indiana University in 2009.
Hui Li
Hui Li is a PhD student at the School of Informatics and Computing, Indiana University. His research interests are in the fields of distributed & parallel computing and cloud computing. His current research focuses on data-parallel runtimes and higher-level programming language interfaces. He has worked on the DryadLINQ and Twister projects.
Judy Qiu
Dr. Judy Qiu is an Assistant Professor of Computer Science at Indiana University. Her areas of study include parallel and distributed systems, Cloud/Grid computing and high performance computing. She directs the SALSA project, which encompasses data-intensive computing at the intersection of Cloud and multicore technologies. Her research extends beyond MapReduce to support iterative algorithms in data mining and machine learning, and her research team has released both Java and Azure versions of the Twister iterative MapReduce system.
Manish Parashar
Manish Parashar is Professor of Electrical and Computer Engineering at Rutgers University. He is also the founding Director of the Rutgers Discovery Informatics Institute (RDI2), the NSF Cloud and Autonomic Computing Center (CAC) at Rutgers (CAC@Rutgers) and The Applied Software Systems Laboratory (TASSL), and is Associate Director of the Rutgers Center for Information Assurance (RUCIA). He recently served as Program Director in the Office of Cyberinfrastructure (OCI) at the National Science Foundation (NSF), where he managed an approximately $150 Million research portfolio in the areas of software sustainability, computational and data-enabled science and engineering, and cloud computing. At NSF, he established and led the crosscutting Software Infrastructure for Sustained Innovation (SI2) program and the CI TraCS Computational Science Fellowship programs, was involved in establishing the Computing in the Cloud (CiC) program, and worked on the NSF-wide Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21) initiative. His research interests are in the broad area of Parallel and Distributed Computing with a focus on Computational and Data-Enabled Science and Engineering. Manish has held a visiting position at the eScience Institute at Edinburgh, UK (2009-2010), a joint research appointment with the Center for Subsurface Modeling, The University of Texas at Austin (1996-2006), and a visiting position at the Laboratoire d'InfoRmatique en Images et Systemes d'information (LIRIS), Lyon, France. He has also been a visiting fellow at the Department of Computer Science and DOE ASCI/ASAP Center, California Institute of Technology (2000-2001), at the DOE ASCI/ASAP FLASH Center, University of Chicago (1998), and at the Max-Planck Institute in Potsdam, Germany (1994-1998).
Manish received the IBM Faculty Award in 2008 and 2010, the Tewkesbury Fellowship from University of Melbourne, Australia (2006), the Rutgers Board of Trustees Award for Excellence in Research (The Award) (2004-2005), the NSF CAREER Award (1999), TICAM (now ICES) (University of Texas at Austin) Distinguished Fellowship (1999-2001), and the Enrico Fermi Scholarship, Argonne National Laboratory (1996). He is a Fellow of AAAS, Fellow of IEEE / IEEE Computer Society, and Senior Member of ACM. Manish serves on the editorial boards and organizing committees of a large number of journals and international conferences and workshops. He has also served as a panelist for NSF, DoE and other funding agencies, and regularly reviews technical articles for journals and conferences. At Rutgers, he serves in leadership roles on various University, School and Department level committees and is actively involved in curriculum development, especially in the area of applied parallel and distributed computing and computational and data-intensive computing. Manish has co-authored a significant number of technical publications, including papers in international journals and conferences, invited papers and presentations, and book chapters. He has also co-authored/edited books, conference proceedings and journal special issues and has presented a large number of keynotes and distinguished seminars. He has also developed and deployed several software systems that are widely used, including CometCloud, DataSpaces/DIMES/DART, Discover, AutoMate (Accord, Rudder/Comet, Meteor, Squid, Topos, Pawn, DAIS, SESAME), GrACE/DAGH, MACE, Pragma/ARMaDA and the CORBA CoG Kit. His research and software were part of the Help Defeat Cancer Project on the IBM World Community Grid, and are also being considered for commercial deployment. Manish received a BE degree in Electronics and Telecommunications from Bombay University, India, and MS and Ph.D. degrees in Computer Engineering from Syracuse University.
Gideon Juve
Gideon Juve is a Computer Scientist at USC Information Sciences Institute. His research interests include computational workflows, scientific computing, high-throughput parallel computing, grid computing, and cloud computing.
Renato Figueiredo
Renato Figueiredo is an Associate Professor of Electrical and Computer Engineering at the University of Florida. He is also affiliated with the Advanced Computing and Information Systems (ACIS) Laboratory, and with the NSF I/UCRC Center for Autonomic Computing (CAC). His research focuses on Virtualization in distributed systems (virtual machines, networks, file systems), Autonomic computing, Peer-to-peer systems, Social networks and their applications in systems design, Cloud computing, and Computer architecture. He currently serves as associate editor of the IEEE Transactions on Computers and the Cluster Computing Journal, and has served on several technical program committees, including the HPDC, SC, ICAC, Cluster, CCGrid, and SBAC-PAD conferences. He is the lead of FutureGrid’s Training, Education, and Outreach Team.
Martin Swany
Martin Swany is an Associate Professor of Computer Science at Indiana University. He is also Associate Director, Data to Insight Center of the Pervasive Technology Institute, and Associate Director, Indiana Center for Translational Network Research and Education (InCNTRE). His research areas include compilers, computer networks, cyberinfrastructure and e-Science, High Performance Computing, Parallel and Distributed Computing, and Software and Systems.
Rick Stevens
Rick Stevens is a professor at the University of Chicago in the Department of Computer Science and holds senior fellow appointments in the University’s Computation Institute and the Institute for Genomics and Systems Biology, where he teaches and supervises graduate students in the areas of computational biology, collaboration and visualization technology, and computer architecture. He co-founded and has co-directed the University of Chicago/Argonne Computation Institute, which provides an intellectual home for large-scale interdisciplinary projects involving computation. He is also Associate Laboratory Director responsible for Computing, Environment, and Life Sciences research at Argonne National Laboratory. Recently Rick has been co-leading the DOE planning effort for exascale computing research aiming to develop computer systems 1,000 times faster than current supercomputers and apply these systems to fundamental problems in science including genomic analysis, whole cell modeling, climate models and problems in fundamental physics and energy technology development.
XiaoFeng Wang
XiaoFeng Wang received his Ph.D. in Computer Engineering from Carnegie Mellon University in 2004 and then joined Indiana University at Bloomington, first as assistant professor (Aug. 2004 to Jun. 2010) and then as associate professor (after Jun. 2010). He served as acting director of the Security Informatics Program at IU from Jan. 2010 to Dec. 2010. Dr. Wang is a well-recognized active researcher in information security and privacy. His group continuously publishes at leading security venues and vigorously pursues innovative and high-impact research directions. He has also been actively serving the research community, participating in the program committees and organization committees of numerous conferences and workshops.
Jamie Kinney
Jamie Kinney joined Amazon.com in 1998 and has served in a number of roles including Software Development Engineer, Database Engineer, Data Warehouse Architect and Strategic Alliance Manager. He is currently a member of the Amazon Web Services team where he helps the scientific community leverage the cloud for High Performance Computing and Big Data applications. Jamie works closely with customers like NASA/JPL, the US Department of Energy, and research institutions around the world. He holds a degree in Marine Biology from the University of Miami and is an amateur astronomer and space enthusiast.
Lavanya Ramakrishnan
Lavanya Ramakrishnan is a scientist at the Lawrence Berkeley National Lab. Previously she worked as a research staff member at Renaissance Computing Institute and MCNC in North Carolina. She has master's and doctoral degrees from Indiana University and a bachelor's degree in computer engineering from VJTI, University of Mumbai. Her primary areas of interest are in software tools for high performance computing, data-intensive computing and distributed systems (grids, clouds, etc.) for scientific applications. She is interested in data management, workflow tools, resource management, monitoring and adaptation for performance and fault tolerance.
Andrew J. Younge
Andrew J. Younge is a Ph.D. Candidate in Computer Science at Indiana University at Bloomington, and currently a visiting researcher at the University of Southern California's Information Sciences Institute. Andrew’s research interests include Cloud Computing, Cyberinfrastructure, and Distributed Systems with specializations in virtualization and high-performance and parallel computing. He is an active member of the FutureGrid project, an NSF-funded experimental Cloud testbed for scientific researchers at Indiana University. He received his Bachelor of Science and Master of Science degrees from the Department of Computer Science at Rochester Institute of Technology (RIT) in 2008 and 2010, respectively. During this time, Andrew worked as a Graduate Researcher on the Cyberaide Project in the Service Oriented Cyberinfrastructure Laboratory and as a Research Assistant in experimental Social Psychology. Andrew also completed an internship at the University of Maryland, College Park in 2007, where he contributed to the Lattice Project, a regional Grid computing system to support advanced scientific research projects.

Repository

  • Unit 1 - Introduction (Geoffrey Fox)

  • Unit 2 - Biology in the Clouds (Michael Schatz)

  • Unit 3 - Infrastructure Used: FutureGrid (Geoffrey Fox)

  • Unit 4 - Virtualization on HPC (Thomas J. Hacker)

  • Unit 5 - Running MapReduce in Non-Traditional Environments (Abhishek Chandra)

  • Unit 6 - Virtual Clusters Supporting MapReduce in Cloud (Jonathan Klinginsmith)

  • Unit 7 - Hadoop and HDFS (Jerome Mitchell)

  • Unit 8 - HBase and Bigtable Storage (Thilina Gunarathne)

  • Unit 9 - Data Mining with Twister Iterative MapReduce (Hui Li)

  • Unit 10 - Twister Introduction (Judy Qiu)

  • Unit 11 - Commercial IaaS/PaaS I: AWS: The Platform for Data Science (Jamie Kinney)

  • Unit 12 - Commercial IaaS/PaaS II: Azure and Twister4Azure (Thilina Gunarathne)

  • Unit 13 - Federating HPC, Cyberinfrastructure and Clouds using CometCloud I (Manish Parashar)

  • Unit 14 - Scientific Workflows in the Cloud (Gideon Juve)

  • Unit 15 - Cloud Technology: Virtual Private Clusters: Virtual Appliances and Networks in the Cloud (Renato Figueiredo)

  • Unit 16 - Clouds and the Network (Martin Swany)

  • Unit 17 - Applications of Cloud: DOE Systems Biology Knowledgebase (Rick Stevens)

  • Unit 18 - Cloud Technology: Cloud Security: New Challenges and New Opportunities (XiaoFeng Wang)

  • Unit 19 - Magellan: Evaluating Cloud Computing for Science (Lavanya Ramakrishnan)

  • Unit 20 - Cloud Technology: GPU on Clouds (Andrew J. Younge)