4CeeD: An Advanced Service and Data Management System for Digital Materials and Nanotechnology Environments
Digital materials and nanotechnology environments are facing major challenges dealing with the 5Vs of materials data: Volume, Variety, Value, Velocity, and Veracity. This data is generated by scientific instruments such as SEM (Scanning Electron Microscope), TEM (Transmission Electron Microscope), AFM (Atomic Force Microscope) and others.
We see the volume of materials data generated not only during one microscopy session, but over the longer timespan of many experiments, which can take months or years to yield new insights. We see the variety of data and metadata, since scientific instruments provide researchers not only with large-scale images, but also with a number of instrument setup parameters, sensor measurements (e.g., pressure and temperature measurements to test new materials characteristics), and environmental data (e.g., humidity and vibration around instruments). We see value challenges, since not all captured data contribute to insights. Velocity of materials data is another issue: as scientific instruments improve, the speed at which data is generated increases. Finally, the veracity of data can be an issue, since very different groups of users (e.g., novice students, senior students, staff, faculty) collect materials and nanotechnology data, and not all of it may be trustworthy and accurate.
To address some of the 5Vs data problems, we have designed, developed, deployed, and further optimized 4CeeD (Capture, Curate, Coordinate, Correlate, Distribute), a distributed system that advances the services and data management infrastructure for digital materials and nanotechnology environments. This web-based private cloud system is ideal for midsize materials and nanotechnology laboratories.
On the 4CeeD instrument side, materials researchers use a web browser to upload their instrument data and metadata, similar to the Dropbox approach. The difference, however, is that the 4CeeD web-based interface lets you not only upload large-scale instrument images, but also specify their metadata and recipes and associate them digitally with the uploaded images. In the past, this metadata was captured only in paper lab notebooks and was not associated digitally with uploaded images.
4CeeD’s private cloud side provides advanced storage and a modern Docker-based compute facility. Five major services are offered: (1) all pre-specified instrument metadata is indexed and associated with instrument images; (2) new metadata is extracted from instrument images, indexed, and associated with those images; (3) all data and metadata are searchable; (4) all data and metadata can be previewed and visualized effectively; and (5) new extraction, analytics, and visualization services can be written over materials data and metadata using Jupyter Notebook, which is integrated with the 4CeeD data management. The service cloud infrastructure relies on a modified version of the Clowder data management system, developed jointly by the NCSA and 4CeeD development teams. The underlying 4CeeD data management system lets you organize your data into spaces and datasets, which are higher-level abstractions above regular files and file folders. This organization makes it possible to build datasets that correspond to a microscopy session, and data spaces that correspond to a long-term experiment on a particular instrument or set of instruments.
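The space and dataset organization described above can be pictured with a small sketch. This is only an illustration of the hierarchy, not the 4CeeD or Clowder API: the class names `File`, `Dataset`, and `Space`, the `find_files` helper, and the sample file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class File:
    """One uploaded file, e.g. an instrument image, plus its metadata."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Dataset:
    """Groups the files of one microscopy session (hypothetical model)."""
    name: str
    files: List[File] = field(default_factory=list)


@dataclass
class Space:
    """Groups the datasets of one long-term experiment (hypothetical model)."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)

    def find_files(self, key: str, value: str) -> List[File]:
        """Return every file in the space whose metadata has key == value."""
        return [
            f
            for ds in self.datasets
            for f in ds.files
            if f.metadata.get(key) == value
        ]


# Hypothetical example data: two sessions within one experiment.
session_1 = Dataset("session-1", [File("grain_01.tif", {"instrument": "SEM"})])
session_2 = Dataset("session-2", [File("tip_scan.dat", {"instrument": "AFM"})])
experiment = Space("thin-film-study", [session_1, session_2])

sem_files = experiment.find_files("instrument", "SEM")
print([f.name for f in sem_files])
```

Because metadata travels with each file, a query at the space level can cut across sessions, which is the kind of correlation that paper lab notebooks make difficult.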
4CeeD is becoming a valuable data and service tool for two laboratories at the University of Illinois at Urbana-Champaign: the Nano-and-Micro-Technology Lab and the Materials Research Lab. We anticipate that more labs and researchers across the globe will adopt the 4CeeD infrastructure as the 5Vs of materials data become more and more acute.
This article on 4CeeD was contributed by Klara Nahrstedt, the Ralph and Catherine Fisher Professor of Computer Science and Director, Coordinated Science Laboratory, at the University of Illinois at Urbana–Champaign.
Meet the Clowder Architect: Luigi Marini
True to its dictionary definition, a group of cats, the Clowder framework was built to support cohesion across individuality: namely, the integration of data formats, representations, and analytics across multiple research domains.
Seven years ago, Luigi created the first commit for what would become the Clowder framework but was initially known as Medici. In scaling up the system to handle larger datasets and computational loads, it soon became clear that many of the initial software design decisions would need to be revisited. As part of a partnership with the National Archives and Records Administration, Luigi redesigned the original Medici system from the ground up and renamed it Clowder.
Since its launch, the Clowder core has received more than 5000 commits from over 30 committers across the nation, and more than 50 metadata extractors have been developed across more than 15 projects. As the software architect of Clowder, Luigi focuses on ensuring that different project contributions co-exist to support a wide spectrum of research communities, from earth sciences to digital humanities. This has not been an easy task, but thanks to the latest CSSI NSF grant, Luigi is optimistic about freshening up the codebase with the help of the Clowder community in order to improve the system for the next decade of research data management.
In his 19 years at NCSA, Luigi has had the privilege of developing cyberinfrastructure for a wide variety of domains. As data-driven science and research continue to grow, his hope is that robust and flexible generic data frameworks, like Clowder, will become an indispensable part of successful scientific collaboration, not just by helping researchers navigate big data’s data deluge, but also by supporting the unique datasets and transdisciplinary methods prevalent in big data’s long tail.
What did you miss in our last webinar?
On the first Friday of every month, we hold a webinar open to anyone with the link. The link is posted on the Clowder website, and reminders are sent to everyone on the mailing list. These monthly All Paws Webinars are for the Clowder community to learn about the status of the project and future directions. For more information, please join the mailing list or ask on Slack.
During the November 18 webinar we had presentations by Todd Nicholson and Colter Wehmeier on new capabilities that are being added to Clowder.
Todd spoke about the new mobile app being developed for materials research as part of the 4CeeD project (see our featured project partner story above).
The app currently in development will allow you to take pictures in the lab or on the move, and automatically upload them to Clowder directly from your phone. You can also browse existing content, right in the app!
Colter spoke about Wikar, a mobile app capable of displaying virtual objects in real-world space using augmented reality. Wikar uses Clowder as a backend to store the digital objects and related metadata.
You can see a summary or view a copy of these presentations as well as past webinars here.
What did you think of the All Paws Meeting?
We are happy to report a 93% satisfaction rating with the All Paws Meeting that was held in July 2019! The location was a hit, but overall people said that the room was too small, with not enough space to spread out for conversations, and that the meeting was too short. Most suggested holding the meeting during the week, over 1.5 to 2 days, to allow more time for lightning talks and breakout sessions. The main takeaway is that everyone would like to do it again! If you haven’t had a chance to share your input on the All Paws Meeting, it’s not too late. Please fill out the survey. Thank you!
Clowder v1.8.0 has been released! This version brings:
- the ability for users to provide run time parameters when manually submitting extractions;
- improvements to the search web interface and web API endpoints;
- a new tree-like folder view that starts at spaces and goes down to files (available in the “Tree View” tab on the home page);
- minor improvements to the pagination and extraction pages;
- improvements to the Docker build;
- support for MongoDB 3.6;
- internal optimizations of MongoDB queries;
- and a variety of bug fixes.
For the full list please see the changelog.
Clowder is open source software that helps researchers manage data. It is designed to support any data format and multiple research domains. When new data is added to the system, whether via the web front-end or through its web service API, a cluster of extraction services processes the data to extract interesting metadata and create web-based data visualizations.