IBM is working with NASA to leverage the space agency’s vast amount of data to address the challenges of the changing climate here on Earth.
The two organizations have developed an open-source geospatial foundation model built with IBM’s watsonx.ai platform and trained on massive amounts of data from NASA’s satellites as part of a project announced in February. That foundation model will now be openly available on Hugging Face, a platform that hosts a growing number of open AI models and whose core mission is to make AI technologies available to a broad community.
By putting the foundation model on Hugging Face and making it widely available beyond researchers and scientists, IBM and NASA officials hope it will spur the development of other AI models, projects, and innovations in climate and Earth science.
“The essential role of open-source technologies to accelerate critical areas of discovery such as climate change has never been clearer,” Sriram Raghavan, vice president of IBM Research AI, said in a statement. “By combining IBM’s foundation model efforts aimed at creating flexible, reusable AI systems with NASA’s repository of Earth-satellite data, and making it available on the leading open-source AI platform, Hugging Face, we can leverage the power of collaboration to implement faster and more impactful solutions that will improve our planet.”
A Foundation for More AI Models
In announcing the project six months ago, the organizations said the goal was to create reusable foundation models to analyze the petabytes of text and remote-sensing data from NASA and make it easier for scientists to build AI applications to answer specific questions.
Foundation models have become integral to training on vast amounts of data like that collected by NASA, and doing so at scale. Rather than being trained on labeled data, foundation models can be trained on unlabeled data and then adapted for specific jobs using some targeted, labeled data to create a more customized model.
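The two-stage idea described above can be shown with a deliberately tiny sketch. It is a toy analogy, not the way the IBM and NASA model was trained: simple clustering stands in for pretraining on unlabeled data, and naming the clusters from two labeled examples stands in for fine-tuning. The values, the "water" and "land" labels, and all function names are hypothetical and chosen only for illustration.

```python
# Toy analogy only: clustering stands in for pretraining on unlabeled data,
# and naming clusters from a few labeled points stands in for fine-tuning.
# All values and labels below are made up; this is not NASA data.

def pretrain_centroids(values, iterations=10):
    """Unsupervised step: 2-means clustering on unlabeled 1-D values.

    Assumes the values are not all identical, so both clusters stay non-empty.
    """
    lo, hi = min(values), max(values)  # deterministic starting centroids
    for _ in range(iterations):
        near_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        near_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo = sum(near_lo) / len(near_lo)
        hi = sum(near_hi) / len(near_hi)
    return [lo, hi]


def nearest(centroids, value):
    """Return the index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def fine_tune(centroids, labeled):
    """Supervised step: name each cluster using a handful of labeled examples."""
    names = {}
    for value, label in labeled:
        names[nearest(centroids, value)] = label
    return names


def predict(centroids, names, value):
    """Classify a new value by the name of its nearest cluster."""
    return names[nearest(centroids, value)]


# Many unlabeled observations, only two labeled ones.
unlabeled = [0.10, 0.12, 0.15, 0.09, 0.11, 0.80, 0.85, 0.78, 0.82, 0.90]
labeled = [(0.10, "water"), (0.85, "land")]

centroids = pretrain_centroids(unlabeled)
names = fine_tune(centroids, labeled)

print(predict(centroids, names, 0.13))
print(predict(centroids, names, 0.79))
```

The structure comes from the ten unlabeled values, and the two labeled examples only attach names to it. That mirrors, at a very small scale, why a pretrained model can need less labeled data for a new task.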
“We believe that foundation models have the potential to change the way observational data is analyzed and help us to better understand our planet,” said Kevin Murphy, NASA’s chief science data officer. “By open sourcing such models and making them available to the world, we hope to multiply their impact.”
The model was trained by IBM and NASA using data collected by the space agency’s Harmonized Landsat Sentinel-2 satellite project over a year across the continental US and fine-tuned using labeled data for flood and burn-scar mapping.
According to IBM, the model was trained on Vela, its AI supercomputer, and leveraged the PyTorch open machine learning framework and ecosystem libraries to train and tune it on labeled images of floods and burn scars created by wildfires. The new foundation model has shown a 15% accuracy improvement over other deep-learning models in mapping floods and fires, using half as much labeled data.
IBM and NASA fine-tuned the base model for flood and fire mapping, but others may do the same for such tasks as tracking deforestation, predicting crop yields, or detecting and monitoring greenhouse gases, they said. IBM and NASA are working with Clark University to adapt the model for such applications as time-series segmentation and similarity research.
Available and Accessible
Building the foundation model through open source resources and making it available on Hugging Face are part of larger efforts by IBM and NASA – and central to what Hugging Face is doing – to democratize access to AI and the technologies around it, both to give more organizations access to them and to spread their benefits.
IBM is doing that with watsonx, an AI and data platform announced in May and an evolution of its Watson AI project (of Jeopardy fame in 2011) aimed at advancing foundation models and generative AI. A commercial version of the new foundation model will be available later this year through IBM’s Environmental Intelligence Suite.
NASA’s decade-long Open-Source Science Initiative is designed to make its code, data, and AI models available to a wider population beyond a small community of scientists and researchers.
NASA estimates that by next year, scientists will have 250,000TB of data from new missions. Right now the agency is holding 70PB of Earth science data, a number that is expected to grow to 600PB by 2030 with the launch of a dozen new missions, including the Surface Water and Ocean Topography (SWOT) and NASA-ISRO SAR missions.
That’s a lot of data for scientists and researchers to analyze. Building the foundation model – and making it available via Hugging Face – will help.
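To put the storage figures above on a common scale, the short calculation below converts terabytes to petabytes and computes the implied growth in the Earth science archive. It assumes decimal units (1PB = 1,000TB), which the article does not specify, and the variable names are illustrative only.

```python
# Assumption: decimal (SI) units, 1 PB = 1,000 TB. The article does not say
# whether decimal or binary units are meant.
TB_PER_PB = 1000

new_mission_tb = 250_000   # data expected from new missions
current_pb = 70            # Earth science data held today
projected_pb = 600         # projected holdings by 2030

new_mission_pb = new_mission_tb / TB_PER_PB
growth_factor = projected_pb / current_pb

print(f"{new_mission_tb:,} TB = {new_mission_pb:g} PB")
print(f"{current_pb} PB -> {projected_pb} PB is about {growth_factor:.1f}x growth")
```

Under that assumption, 250,000TB is 250PB, and the archive would grow to roughly 8.6 times its current size by 2030.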
“AI remains a science-driven field, and science can only progress through information sharing and collaboration,” said Jeff Boudier, head of product and growth at Hugging Face. “This is why open-source AI and the open release of models and datasets are so fundamental to the continued progress of AI.”