PIs: Andy Li, Yugi Lee, Alen Malony
DeepCloud is designed as an open, software-defined ecosystem for researchers at different levels, with the following salient features and transformative impacts. It is one of the first massively scalable, multi-tenant open cloud platforms with full-fledged building blocks and comprehensive shared stores (app, model, knowledge, data) for deep learning (DL) research and applications. It is designed for deep sharing of models, knowledge, data, apps, and computing resources by the community and for the community, leveraging recent progress in open source communities. It offers comprehensive suites of services to manage the full life cycle of deep learning research and applications. We design and implement user-friendly interfaces and developer-friendly programming models, SDKs, and APIs. We offer pluggable modules to compose new DL models, execute DL jobs with large data input/output, automatically parallelize DL jobs across large-scale hybrid CPU and GPU resources, enable flexible composition and decomposition at runtime, and measure the performance of algorithms and systems. In this way, DeepCloud dramatically lowers the barrier to entry by providing apps, models, knowledge (via transfer learning), and resources. We also provide DL-as-a-Rack (DLaaR), in the form of preconfigured racks for other campuses, and DL container as a service (DLaaS), with self-configurable software packages to be deployed on local clusters at other campuses or industry partners.