-
Notifications
You must be signed in to change notification settings - Fork 33
Add Google Satellite Embeddings download support #46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Claudio9701
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @nicoschuldt! Amazing work! super clean and this is a super useful addition to UrbanPy. The notebook is perfect and is a great starting point for beginners that already puts them ahead of multiple hours tutorials on qgis and similar. Just made a few comments that could improve the readability and consistency of the codebase. Let me know what you think and feel free to ask me to implement this changes myself if needed :)
urbanpy/download/download.py
Outdated
|
|
||
| if filtered.size().getInfo() == 0: | ||
| warn(f"No satellite data found for year {year}. Returning GeoDataFrame with NaN values.") | ||
| return _add_embedding_nan_columns(gdf, bands) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also return the original GDF without new columns. Returning columns with nans might be tricky to catch for beginners
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, will remove that part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for downloading Google's AlphaEarth Foundations satellite embedding dataset through a new google_satellite_embeddings() function. The function extracts 64-dimensional learned representations that encode temporal trajectories of surface conditions from satellite data.
- Adds new
google_satellite_embeddings()function with comprehensive parameter validation and error handling - Implements Earth Engine integration with optional dependency handling and authentication requirements
- Provides utility functions for converting between GeoDataFrames and Earth Engine FeatureCollections
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
urbanpy/download/download.py
Outdated
| ) | ||
|
|
||
| if len(gdf) > 5000: | ||
| raise ValueError(f"Too many geometries ({len(gdf)}). Maximum 5,000 supported. Try batch processing.") |
Copilot
AI
Aug 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The magic number 5000 should be defined as a named constant to improve maintainability and make it configurable if needed.
| raise ValueError(f"Too many geometries ({len(gdf)}). Maximum 5,000 supported. Try batch processing.") | |
| if len(gdf) > MAX_GEOMETRIES: | |
| raise ValueError(f"Too many geometries ({len(gdf)}). Maximum {MAX_GEOMETRIES:,} supported. Try batch processing.") |
Co-authored-by: Copilot <[email protected]>
urbanpy/download/download.py
Outdated
|
|
||
| filtered = collection.filterDate(start_date, end_date).filterBounds(roi) | ||
|
|
||
| if filtered.size().getInfo() == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs seems misleading, .size() does not return a python Int, but rather a server-side EE.Number, which we can only access using getInfo(). If you look at the python example they use size().getInfo().
Fetching values from server-side objects requires a call to getInfo(). In the JavaScript Code Editor, getInfo() is automatically called when printing a server-side object. In the Python client library, you must explicitly call getInfo() when printing server-side objects
You can see more here
However, we probably can handle this better by checking if it's empty downstream in join_ee_embedding_results, avoid the unnecessary call and still handle the edge case. What do you think?
Co-authored-by: Copilot <[email protected]>
|
Thanks for the clear feedback @Claudio9701 ! Taking a look now :) |
- Return original GDF unchanged when no data found (instead of NaN columns) - Use pandas.merge() for cleaner result joining vs manual index mapping - COmpressed syntax for GeoDataFrame to FeatureCollection conversion - Improved error handling - Improve documentation with clearer GEE limitation explanations - Add constants for better maintainability
|
@Claudio9701 just commited updates |
…atellite_embeddings
…st should be improved in the future)
Summary
Adds
google_satellite_embeddings()function to enable downloading Google's AlphaEarth Foundations satellite embedding dataset with UrbanPy's workflow.What's Added
up.download.google_satellite_embeddings()notebooks/AlphaEarth Embeddings.ipynb, Demonstrate how to use new method and work with embeddings for similarity search and clustering.Reviewer: @Claudio9701