Skip to content

Conversation

@nicoschuldt
Copy link

Summary

Adds google_satellite_embeddings() function to enable downloading Google's AlphaEarth Foundations satellite embedding dataset with UrbanPy's workflow.

What's Added

  • New function: up.download.google_satellite_embeddings()
  • Notebook: notebooks/AlphaEarth Embeddings.ipynb, Demonstrate how to use new method and work with embeddings for similarity search and clustering.

Reviewer: @Claudio9701

Copy link
Collaborator

@Claudio9701 Claudio9701 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @nicoschuldt! Amazing work! super clean and this is a super useful addition to UrbanPy. The notebook is perfect and is a great starting point for beginners that already puts them ahead of multiple hours tutorials on qgis and similar. Just made a few comments that could improve the readability and consistency of the codebase. Let me know what you think and feel free to ask me to implement this changes myself if needed :)


if filtered.size().getInfo() == 0:
warn(f"No satellite data found for year {year}. Returning GeoDataFrame with NaN values.")
return _add_embedding_nan_columns(gdf, bands)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also return the original GDF without new columns. Returning columns with nans might be tricky to catch for beginners

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, will remove that part

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for downloading Google's AlphaEarth Foundations satellite embedding dataset through a new google_satellite_embeddings() function. The function extracts 64-dimensional learned representations that encode temporal trajectories of surface conditions from satellite data.

  • Adds new google_satellite_embeddings() function with comprehensive parameter validation and error handling
  • Implements Earth Engine integration with optional dependency handling and authentication requirements
  • Provides utility functions for converting between GeoDataFrames and Earth Engine FeatureCollections

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

)

if len(gdf) > 5000:
raise ValueError(f"Too many geometries ({len(gdf)}). Maximum 5,000 supported. Try batch processing.")
Copy link

Copilot AI Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The magic number 5000 should be defined as a named constant to improve maintainability and make it configurable if needed.

Suggested change
raise ValueError(f"Too many geometries ({len(gdf)}). Maximum 5,000 supported. Try batch processing.")
if len(gdf) > MAX_GEOMETRIES:
raise ValueError(f"Too many geometries ({len(gdf)}). Maximum {MAX_GEOMETRIES:,} supported. Try batch processing.")

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <[email protected]>

filtered = collection.filterDate(start_date, end_date).filterBounds(roi)

if filtered.size().getInfo() == 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the docs, .size() should return an integer with the number of features. Could we remove .getInfo()? Should we do .getInfo() after we know the size is bigger than 0?

if filtered.size() == 0

Docs:

Copy link
Author

@nicoschuldt nicoschuldt Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs seems misleading, .size() does not return a python Int, but rather a server-side EE.Number, which we can only access using getInfo(). If you look at the python example they use size().getInfo().

Fetching values from server-side objects requires a call to getInfo(). In the JavaScript Code Editor, getInfo() is automatically called when printing a server-side object. In the Python client library, you must explicitly call getInfo() when printing server-side objects

You can see more here

However, we probably can handle this better by checking if it's empty downstream in join_ee_embedding_results, avoid the unnecessary call and still handle the edge case. What do you think?

@nicoschuldt
Copy link
Author

Thanks for the clear feedback @Claudio9701 ! Taking a look now :)

- Return original GDF unchanged when no data found (instead of NaN columns)
- Use pandas.merge() for cleaner result joining vs manual index mapping
- COmpressed syntax for GeoDataFrame to FeatureCollection conversion
- Improved error handling
- Improve documentation with clearer GEE limitation explanations
- Add constants for better maintainability
@nicoschuldt
Copy link
Author

@Claudio9701 just commited updates

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants