
Microsoft will provide the cloud computing research projects selected by NSF through its rigorous peer-review process with two years of access to the Windows Azure cloud computing platform, which provides on-demand computing and storage to host, scale and manage Web applications on the Internet through Microsoft data centers. Microsoft will also provide a support team to help researchers quickly integrate cloud technology into their research.
Microsoft researchers and developers will work with grant recipients to equip them with a set of common tools, applications and data collections that can be shared with the broad academic community.
The following projects, each led by the named principal investigator (PI), have received NSF funding to participate in the NSF/Microsoft cloud computing collaboration:
Cornell University (Kenneth Birman) Building Scalable Trust in Cloud Computing. A growing spectrum of societally critical, highly sensitive applications is shifting toward cloud computing to benefit from lower costs. These include medical applications, from treatment to surgical procedures. Several issues have yet to be addressed, including high availability, secure access, fault tolerance, privacy preservation and real-time responsiveness. Researchers will explore the consistency issues that arise as cloud computing applications are applied to large-scale systems, contributing to a scientific foundation for scalable trust in cloud computing.
J. Craig Venter Institute Inc. (Audrey Tovchigrechko) Bettering Interactive Protein-Protein Docking
Understanding the detailed mechanism of protein-protein interactions is essential in molecular biology. Modeling these interactions in 3D, known as "protein-protein docking," is computationally intensive. Because of the problem's complexity, docking often yields only a set of putative structural arrangements of the complex. Combined with certain molecular experiments, however, computational docking can be a powerful tool. The Azure cloud platform will be used to address two limitations of current protein-protein docking paradigms: insufficient scalability and lack of interactivity.
State University of New York at Buffalo (Tevfik Kosar) Enhancing Stork Data Scheduler for Azure. The Stork Data Scheduler has been actively used in many application areas, including coastal hazard prediction and storm surge modeling; oil flow and reservoir uncertainty analysis; numerical relativity and black hole collisions; educational video processing and behavioral assessment; digital sky imaging; and multiscale computational fluid dynamics. The research aims to develop and enhance Stork to support the Azure environment and to mitigate the end-to-end data-handling bottleneck in data-intensive cloud computing applications. The researchers expect the result to dramatically change how domain scientists perform research by making it easier to share raw data and experience.
University of California, San Diego (Kenneth Yocum) Utilizing Continuous Bulk Processing. The rapid data "deluge" in scientific enterprises gives rise to many exciting data-mining opportunities. The project explores an alternative data-processing architecture for cloud computing, continuous bulk processing, that fundamentally improves computing efficiency to reduce costs and enhance data-mining capabilities. A key facet of the approach is allowing analytics to be updated, rather than recomputed, when new data arrives. The ultimate reach of this incremental approach will determine how users can trade cost for performance in incremental analytics.
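To illustrate the difference between updating and recomputing an analytic, here is a minimal sketch that assumes the "analytic" is a simple per-key mean. The names `IncrementalStats` and `recompute_mean` are hypothetical, and this shows only the general incremental-computation idea, not the project's continuous bulk processing architecture.

```python
from collections import defaultdict


class IncrementalStats:
    """Per-key count and mean, maintained incrementally as batches arrive.

    Hypothetical illustration: each update scans only the new batch,
    instead of rereading all previously seen data.
    """

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def update(self, batch):
        # Fold each new (key, value) record into the stored state.
        for key, value in batch:
            self.count[key] += 1
            self.total[key] += value

    def mean(self, key):
        return self.total[key] / self.count[key]


def recompute_mean(all_records, key):
    # Baseline: rescan the full history every time a result is needed.
    values = [v for k, v in all_records if k == key]
    return sum(values) / len(values)


if __name__ == "__main__":
    day1 = [("sensor-a", 2.0), ("sensor-b", 5.0), ("sensor-a", 4.0)]
    day2 = [("sensor-a", 6.0), ("sensor-b", 7.0)]

    stats = IncrementalStats()
    stats.update(day1)
    stats.update(day2)  # touches only day2's records

    # Both approaches agree; the incremental one never rereads day1.
    print(stats.mean("sensor-a"))                   # 4.0
    print(recompute_mean(day1 + day2, "sensor-a"))  # 4.0
```

The saving comes from keeping a small amount of state (a count and a running total per key) between updates. The cost side of the trade-off is that this state must be stored and maintained, and not every analytic decomposes this neatly.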
University of Colorado, Boulder (Richard Han) Enabling Mobile Cloud Computing. The main goal is to define and develop a common cloud computing framework that will stimulate the design and development of next-generation mobile applications. Future mobile applications will increasingly be context-specific, demand more cloud resources and require real-time performance. The team has identified a representative next-generation mobile application called the "VideoLense tricorder." By addressing the new and unique challenges this application poses for traditional cloud services, the team will investigate key research questions to enable such futuristic mobile applications.
University of Michigan, Ann Arbor (Qiaozhu Mei) Refining Language Models Using Web-Scale Language Networks. Rich online communities continuously produce an overload of text data in which interesting information, topics, events, opinions, behaviors, intents, rumors, needs and even scientific discoveries may be buried. Statistical language models enable efficient retrieval of that information and the discovery of interesting patterns in the text, affording insight into the people who created it. The quality and performance of language models are usually limited by sparse data, mismatched context and neglect of the connections between words and phrases. Researchers will refine these models using large, web-scale datasets and the power of the cloud. Through experiments in health informatics, they aim to gain insight into how to seek and route information more effectively.

University of North Carolina at Charlotte (Zhengchang Su) Predicting Transcription Factor Binding Sites for Genes. Although huge advances have been made in identifying the gene-coding DNA sequences in bacterial genomes using computational methods, the understanding of regulatory DNA sequences is limited by the lack of efficient computational and experimental methods for predicting them.
These researchers will capitalize on the recently reduced time and cost of sequencing a genome to improve knowledge of gene regulatory systems in single-celled organisms. The knowledge gained will deepen scientists' understanding of biology, with a broad expected impact on applications such as renewable energy production, environmental protection and disease prevention.
University of South Carolina Research Fund (Jonathan Goodall) and the University of Virginia (Marty A. Humphrey) Managing Large Watershed Systems. Understanding hydrologic systems at the scale of large watersheds is critically important to society when it faces extreme events, such as floods and droughts, or concerns about water quality. Climate change and a growing population further complicate watershed-scale prediction, placing additional stress and uncertainty on future hydrologic conditions.
The project will advance hydrologic science and water resource management by creating and using a cloud-enabled hydrologic model and data-processing workflows to examine the Savannah River Basin in the southeastern United States. This will provide the detail and scale needed to address fundamental research questions about quantifying the impacts of climate change on water resources.
University of Southern California (Viktor Prasanna) Tackling Large-Scale Graph Problems. Adoption of cloud computing has been impeded by concerns about corporate governance, data privacy and security, and by regulations such as the Health Insurance Portability and Accountability Act (HIPAA) that mandate auditing of data storage and access. Managing private clouds that offer scalability is expensive, and integrating public and private clouds seamlessly is not easy.
More daunting still is the complexity of developing applications that follow the cloud programming paradigm and derive the most benefit from cloud infrastructure. The project will devise a framework to address these challenges and enhance the availability and efficiency of the cloud. Researchers plan to demonstrate the framework using applications in real-time search, ranking and semantic association discovery for healthcare and energy informatics.
University of Texas at Austin (Michael Walfish) Storing Data with Minimal Trust. Researchers will determine how to build a cloud storage service under minimal trust assumptions, without clients having to assume that providers will always operate correctly.
Issues particularly relevant to cloud storage, where the storage service is operated by a party other than the data owner, include software bugs, correlated manufacturing defects, misconfigured servers, operator error, malicious insiders, bankruptcy, fires and more.
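A standard building block for not trusting a storage provider is client-side integrity checking: the client remembers a cryptographic digest of each object it stores and rejects any returned object whose digest does not match. The minimal sketch below assumes a Python dictionary standing in for the provider and uses hypothetical names (`UntrustedStore`, `VerifyingClient`, `IntegrityError`). It is not the researchers' design. It detects corrupted or substituted data, but on its own it cannot recover the data or keep it available when a provider fails.

```python
import hashlib
import hmac


class UntrustedStore:
    """Stand-in for a cloud provider: a dict whose contents may be altered."""

    def __init__(self):
        self.blobs = {}

    def put(self, name, data):
        self.blobs[name] = data

    def get(self, name):
        return self.blobs[name]


class IntegrityError(Exception):
    pass


class VerifyingClient:
    """Keeps a SHA-256 digest per object locally and checks every read."""

    def __init__(self, store):
        self.store = store
        self.digests = {}  # trusted, client-side state

    def put(self, name, data: bytes):
        self.digests[name] = hashlib.sha256(data).digest()
        self.store.put(name, data)

    def get(self, name) -> bytes:
        data = self.store.get(name)
        expected = self.digests[name]
        actual = hashlib.sha256(data).digest()
        # compare_digest compares the two digests in constant time.
        if not hmac.compare_digest(expected, actual):
            raise IntegrityError(f"object {name!r} failed verification")
        return data


if __name__ == "__main__":
    store = UntrustedStore()
    client = VerifyingClient(store)
    client.put("results.csv", b"id,value\n1,42\n")
    print(client.get("results.csv"))  # passes verification

    store.blobs["results.csv"] = b"id,value\n1,99\n"  # simulated bug or insider
    try:
        client.get("results.csv")
    except IntegrityError as err:
        print("rejected:", err)
```

Because the client stores the digest of the most recent write, a stale copy served by the provider is also rejected. Tolerating the failures themselves (for example, by replicating across independent providers) would require more machinery than this sketch shows.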
University of Washington (Magdalena Balazinska) Understanding Relational Data Markets. While today's cloud computing systems offer simple pricing schemes for storage and computing resources, the economics of data sharing are poorly understood and only coarsely supported.
This effort will develop models and infrastructure to establish relational data markets in the cloud, build a prototype system that implements these models, and support both data pricing and ad hoc data sharing. Users will be able to sell their data in the cloud, choosing how to price both the data and query results, and to buy and combine data from different providers, possibly reselling it in turn. The system will also support efficient, fair data sharing between individual scientists.
Virginia Tech (Wuchun Feng) Conducting Intensive Biocomputing.
With DNA sequencers in the life sciences able to generate a terabyte of data a minute, the size of DNA sequence databases will increase 10-fold every 18 months. This will create a need for computational power to increase 50 times faster than Moore's Law. To keep pace, scientists and engineers must increasingly rely on high-performance computing (HPC), which is costly and difficult to access and use. The research aims to create next-generation, efficient data management and analysis software for large-scale, data-intensive scientific applications in the cloud. The team will leverage recent experience in delivering reliable computing over volatile cloud resources to make this software more robust, striving to eliminate the assumption of "no hardware failures" or "very infrequent failures" made by traditional HPC data management techniques.
Virginia Tech (Kwa-Sur Tam) Effectively and Widely Using Renewable Energy Sources. An accurate forecast is key to the effective use of weather-dependent renewable energy sources, such as wind and solar. Weather forecasting is a complex and data-intensive computing process. The research will develop a Forecast-as-a-Service (FaaS) framework that enables the combined use of different types of data from different sources in new prediction models, to synthesize more accurate forecasts, and that supports on-demand delivery of different types of forecasts at different levels of detail and for varying prices, to accommodate renewable energy users with different needs and budgets.