The Environmental Data Science ⛰ 🌳 🏙️ ❄️ 🔥 🌊 online Collaboration Cafe
This HackMD is based on The Turing Way collaboration cafe template
A permanent document exists on HackMD at https://hackmd.io/@environmental-ds/collaboration-cafe; it is regularly updated with the empty template for the next event.
25 January 2022 | TBC¶
Thank you for joining The Environmental Data Science’s online Collaboration Cafe!¶
We’re delighted to have you here ☕ ✨ 🍰
When? 25 January 2022, 14:00 - 16:00 UTC (see in your time zone)
Next call: 22 February 2022
What? The Environmental Data Science is a community that aims to learn about and discuss good research practices for applying existing Data Science and AI solutions to better understand planet Earth across multiple environmental settings. Collaboration Cafes are online coworking calls for anyone interested in learning about and discussing Data Science and AI themes relevant to environmental studies.
Who? Everyone interested in reproducible, ethical, and inclusive Data Science and research for environmental studies is welcome to join all or any part of The Environmental Data Science project, community, and/or this call.
How? Join Zoom Meeting https://turing-uk.zoom.us/j/6779579342?pwd=L25scnhXUUNmVjFsc0hRWTAzTVJ1dz09
==The waiting room is enabled. The host of this call will let you in.==
All questions, comments, and recommendations are welcome!
Code of conduct¶
Useful links¶
All about online Collaboration Cafes
Sign up below¶
Name + Icebreaker question + an emoji to represent it (emoji cheatsheet)
(Remember that this is a public document. You can use a pseudonym if you’d prefer.) ==If you are new to HackMD, please see this document for a short guide (right click, open in a new window): https://hackmd.io/@turingway/hackmd-guide.==
Agenda¶
Schedule¶
| Duration | Activity |
|---|---|
| Start | 👋 Welcome, code of conduct review |
| 10 mins | Introductions and personal goal setting |
| 25 mins | 🍅 1st Pomodoro session |
| 5 mins | ☕️ Break |
| 20 mins | 🍅 2nd Pomodoro session |
| 5 mins | ☕️ Break |
| 20 mins | 🍅 3rd Pomodoro session |
| 5 mins | ☕️ Break |
| 30 mins | Open discussion: celebrations, reflections and future directions |
| 5 mins | 👋 Close |
Breakout rooms: Topic proposals¶
If you have an idea for a topic you’d like to discuss in a breakout room, please add it below and put your name next to it. If you like one of the topics that are already suggested, please add your name next to that one. Teamwork makes the dream work. For more information about breakout rooms see the description on GitHub.
Topics for breakout / Names
Notes and questions¶
Request for reviews!¶
Feedback at the end of the call¶
Notes from the last call:¶
Archive: 23 November 2021 | FAIR data in Environmental Sciences¶
The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.
Name + What is your recent favorite resource or tool or app or software? + an emoji to represent it (emoji cheatsheet)
Alejandro + The Turing Way + :milky_way:
Bea + scivision + :koala:
Conversation Starters¶
Advertise and promote your event or anything exciting you’re working on. ✨
None
Breakout rooms: Topic proposals¶
Topics for breakout / Names
Main stage (silent mode)
Bea: working on the submission of her PhD thesis.
Alejandro:
adding helpful resources about FAIR and example of research repositories for Environmental Sciences.
checking which sample data within the Environmental Data Science book can be curated in the Environmental Data Science Zenodo community.
Notes and questions¶
Alejandro
Useful resources about FAIR :
FAIR Cookbook: an online resource for the Life Sciences with recipes to make and keep data FAIR.
The Turing Way: The FAIR Principles: light introduction to FAIR principles, pointing to key resources in the topic.
Library Carpentry: FAIR Data and Software: lesson exploring the meaning of FAIR elements.
Research data platforms:
General
re3data.org: initiative indexing research data platforms by content topic and knowledge domain.
Stats datacite: dashboard mapping the registration of persistent identifiers (DOIs) for research data and other research outputs.
Environmental science (list of platforms with the highest number of total DOIs registrations according to Stats datacite):
PANGAEA: earth system research.
Environmental Data Initiative (EDI): platform suited to curate environmental data, includes code snippets to import the data across multiple programming languages (python, R).
Other interesting FAIR-driven platforms:
ROHub: research object management platform supporting the preservation and lifecycle management of scientific investigations, research campaigns and operational processes. It implements FAIR digital objects and specific metadata for data-cube in Earth Science.
Challenges of FAIR data repositories for Environmental sciences (ES):
ES data is structured as tabular data collected in the field or laboratory (see further discussion in BEXIS2).
Making FAIR-enabled data available can be daunting for many ES researchers and organisations due to the lack of awareness, efficient data management tools, infrastructure and skills (see further discussion in BEXIS2).
Spatio-temporal data (data cubes): this seems to be addressed by novel research object management platforms such as ROHub.
Request for reviews!¶
None
Feedback at the end of the call¶
Alejandro: Few participants in this particular Collaboration Cafe. We should restructure the promotion strategy, proposing new topics and/or changing the format for upcoming Collaboration Cafes in 2022 :face_with_monocle:.
Archive: 26 October 2021 - Reproducibility in Environmental Science¶
Name + Share a song that expresses your personality + an emoji to represent it (emoji cheatsheet)
Alejandro + Should I Stay or Should I Go (The Clash) + 🧳
Sam + Wish you were here (Pink Floyd)
Matt + BBC Grandstand Theme + :horse_racing:
Conversation Starters¶
Advertise and promote your event or anything exciting you’re working on. ✨
EGU22 session was accepted, Bridging the spatial scales, from surface sensors to satellite sensors: Innovative approaches towards the construction of Earth’s digital twin. Deadlines:
Abstract submission deadline: 12 January 2022, 13:00 CET
Travel Support application deadline: 1 December 2021
Breakout rooms: Topic proposals¶
Topics for breakout / Names
Matt, making a reproducible GitHub code for his MRes dissertation
Alejandro, preparing contribution guidelines for the Environmental AI book
Sam J, exploration of resources for reproducibility and feedback on Matt and Alejandro’s topics
Notes and questions¶
Sam J:
The Turing Way, a great resource to guide environmental scientists in reproducible research.
Cornell Dataset Description: a good starting template for dataset documentation!
Standards in data catalogues, e.g. STAC (but it isn’t mature)
Alejandro:
Zenodo:
It is great to keep your sample data (up to 50 GB).
notebooksharing.space
A nice resource to share notebooks with interactive plotting (up to 10Mb). However, it doesn’t support tracking changes as ReviewNB does.
Contributors guidelines for the EnvAI book
Sam suggests example environmental python packages with links to notebooks (e.g. hvplot, geopandas etc.)
Minimal publishable version guidelines e.g. Binder
Use external links for general versioning principles, e.g. how to open a pull request on GitHub
Provide examples of how to create locked environments
Section of tools for sharing notebooks e.g. ReviewNB, notebooksharing.space
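As an illustration of the "locked environments" point above, a minimal sketch that lists every package installed in the current Python environment with its exact version, similar in spirit to `pip freeze`. The function name `pinned_requirements` is hypothetical, and this only records what is installed now; it does not resolve dependencies or pin non-Python libraries.

```python
from importlib.metadata import distributions


def pinned_requirements():
    """Return sorted 'name==version' lines for the installed distributions."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        version = dist.version
        if name and version:  # skip distributions with incomplete metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print the pins; redirect the output to a file to keep it with a notebook.
    print("\n".join(pinned_requirements()))
```

Saving the printed lines next to a notebook gives contributors a record of the versions the notebook was last run with.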
Matt
Publishing reproducible code for environmental science
It can be more important that the process can be reproduced rather than accuracies to the nearest 0.01%
Use a subset of data to demonstrate the tool where the owners aren’t happy to share the whole thing - training & inference
In env science a visual demonstration of the results can be more useful than a command-line readout of accuracy
Suggest sensible ranges for hyperparameters in the documentation
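One way to follow the "use a subset of data" suggestion above is to draw the demonstration subset with a fixed random seed, so that everyone who runs the code gets the same records. This is a sketch only: the function name `reproducible_subset`, the default seed and the tile identifiers are all hypothetical.

```python
import random


def reproducible_subset(records, fraction, seed=42):
    """Return a fixed, order-preserving random subset of `records`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # local generator, global random state untouched
    size = max(1, round(len(records) * fraction))
    chosen = sorted(rng.sample(range(len(records)), size))
    return [records[i] for i in chosen]


# Hypothetical tile identifiers standing in for the full dataset.
tiles = [f"tile_{i:03d}" for i in range(100)]
demo_tiles = reproducible_subset(tiles, fraction=0.1)
```

Publishing the seed and fraction alongside the code lets others regenerate the same subset for both training and inference.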
Request for reviews!¶
Sam J: reviewers needed for the SEVIRI wildfire data notebook of the EnvAI book, see PR#12
Feedback at the end of the call¶
None
Archive: 28 September - Data preprocessing¶
Name + What’s the hardest part about working virtually for you? and the easiest? + an emoji to represent it (emoji cheatsheet)
Alejandro + social interaction, more sleep time + :busts_in_silhouette: :sleeping:
Sam A. + I still have just as many meetings if not more and it is soooo tiring! :sleeping: :pleading_face:
Evangeline + Feeling self-conscious on camera, flexibility + :movie_camera: :clock1:
Conversation Starters¶
Advertise and promote your event or anything exciting you’re working on. ✨
Met Office / Joint centre for excellence in environmental intelligence conference 16/17 Dec 2021!
We have a fresh interactive notebook in the Environmental Data Science Book :earth_asia::books: The notebook focuses on detecting tree crowns using the DeepForest model :deciduous_tree:. Have a look at the rendered version here. Other recent community contributions are the exploration of sensor data, Met Office UKV high-resolution atmosphere model data for urban settings and MODIS satellite imagery and wildfire data.
Breakout rooms: Topic proposals¶
Topics for breakout / Names
Sam A. Manufacture Urban Data in GIS format
Evie. Preprocessing satellite data for crop yield prediction
Alejandro. Preprocess FluxNet data and related gridded products
Notes and questions¶
We showcased the SEPAL platform for Vegetation Satellite Image analysis.
Discussed challenges around scoping and extracting satellite data for machine learning models of vegetation (agricultural crops):
Appropriate satellite platform (Sentinel/LANDSAT?)
Preprocessing of radar and optical data (i.e. dealing with cloud cover)
Appropriate time series/critical dates for plant growth
Sam A. used ArcGIS Pro to extract site-specific temperature information from a gridded netCDF dataset using the Spatial Analyst ‘Sample’ tool. It is very useful because it works across the time dimension, so one year of data can be extracted in one go. It is also possible to set a desired output coordinate system. The data can be saved as a CSV file and then processed further with standard Python tools like pandas and numpy.
Sam A. suggests using Iris package for reprojecting gridded netCDF files. The project is
Data preprocessing is still too time-consuming, and there is a lack of communication about the tools available.
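The CSV post-processing step mentioned in the notes above could look roughly like the sketch below. The column names (`site`, `time`, `temperature_k`) and the sample values are hypothetical, not the actual output of the ArcGIS ‘Sample’ tool. The notes mention pandas and numpy; this sketch uses only the Python standard library so that it runs without extra packages.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical export: one row per site and time step, temperature in kelvin.
CSV_TEXT = """site,time,temperature_k
A,2021-01-01,275.15
A,2021-01-02,277.15
B,2021-01-01,280.15
B,2021-01-02,282.15
"""


def mean_celsius_per_site(csv_file):
    """Return {site: mean temperature in degrees Celsius}."""
    values = defaultdict(list)
    for row in csv.DictReader(csv_file):
        values[row["site"]].append(float(row["temperature_k"]) - 273.15)
    return {site: mean(temps) for site, temps in values.items()}


site_means = mean_celsius_per_site(io.StringIO(CSV_TEXT))
```

For a real export, pass an open file handle instead of the in-memory text and adjust the column names to match the file.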
Request for reviews!¶
None
Feedback at the end of the call¶
None
Archive: 29 June - Data Visualization¶
Name + Something you watched recently (video, movie, documentary, etc.) that inspired you + an emoji to represent it (emoji cheatsheet)
Alejandro + Black Holes: The Edge of All We Know + :milky_way:
Scott + Coded Bias + 🧠
Tom Andersson + The Dig (Netflix film on Sutton Hoo dig site) + :spades:
Emily + actual paint drying on my bedroom wall + :lower_left_paintbrush:
Sam Jackson + Calibre + :smile:
Conversation Starters¶
Advertise and promote your event or anything exciting you’re working on. ✨
Alejandro: EGU Public call-for-session-proposals all other sessions: Deadline: 6 September 2021
Scott: Pangeo European Community is growing and there are plans for coffee chats and regular showcase meetings (see here)
Breakout rooms: Topic proposals¶
Topics for breakout / Names
Sam J: Regridding MODIS data for wildfire detection
Tom: Produce script to reproduce IceNet paper figures for Nature Communications
Emily: Visualization of LiDAR data
Scott: Organizing and Admin EnvSensors WPs project timetable
Alejandro: Deploying a FluxNet use case visualization outputs for the Environmental Data Science book
Notes and questions¶
Emily showed a cool visualization of a laser scan image (100 GB) using the proprietary software of the scanner device. After data preprocessing, she will use libraries for visualizing individual trees.
Emily says there are also some radar sensors that collect soil data.
Tools for regridding MODIS data. Sam is using satpy. Suggestions of other existing tools are welcome.
Tom is making his code nicer (i.e. modular) and more efficient (i.e. using dask).
Alejandro shows FluxNet demo
Emily suggests adding woodlands and shrubs to subset FluxNet data.
Feedback at the end of the call¶
Add a disclaimer that Collaboration Cafe HackMDs are public.
Names for breakout rooms.
We should aim to keep to time, once we are used to the format etc.