Replies: 5 comments 1 reply
-
Hey there! Thanks for a great tool; I'm excited to watch and listen. I've got one question.
By "roll my own" I mean more than adding configurable fields to something that extends ConfigurableIOManager. This came up when I wanted to write an IOManager to work with DuckLake. I can add more color here if needed, but I can keep that in the Slack chat to reduce noise.
-
Hello Dagster team, I'm excited about the office hour tomorrow. Our team is deciding on the tech stack for our data infrastructure, and Iceberg + Dagster is one of our top candidates. The current Dagster + Iceberg integration is at the preview stage according to the docs, and the potential for future breaking changes concerns me a bit. What's the timeline for moving it to a more stable stage? And could you also cover access/permission management? Thanks a lot!
-
Hi there, I'm trying to get a deeper understanding of how Dagster approaches idempotency and fault tolerance in practice. I know partitions are a big part of the story: they provide referential timestamps and allow deterministic reruns and backfills. But beyond partitions, what are the main built-in patterns Dagster encourages (or even enforces) to make jobs rerunnable and safe (code_version? version control?)? Thanks for any insights; I'm trying to build pipelines that are easy to backfill and re-execute without surprises.
-
Hello there,
-
One question that came up during the chat was having a partition that differs from your schedule cadence. In these cases you can't use the built-in partitioned schedule directly; instead, you can define a custom schedule that computes the partition key itself:

import dagster as dg
from datetime import timedelta

hourly_partition = dg.HourlyPartitionsDefinition(start_date="2025-09-01-00:00")


@dg.asset(
    partitions_def=hourly_partition,
)
def partition_asset(context: dg.AssetExecutionContext) -> dg.MaterializeResult:
    partition_key = context.partition_key
    return dg.MaterializeResult(metadata={"partition": partition_key})


@dg.schedule(
    job=dg.define_asset_job("hourly_partition_job", selection=[partition_asset]),
    cron_schedule="*/1 * * * *",
)
def every_minute_partition_schedule(context: dg.ScheduleEvaluationContext):
    """Schedule that runs the partition asset every minute with the appropriate hourly partition."""
    scheduled_date = context.scheduled_execution_time
    # Use the previous hour for the partition key, since that hour's window is complete
    previous_hour = scheduled_date - timedelta(hours=1)
    # Convert the datetime to the string format expected by the hourly partition (YYYY-MM-DD-HH:00)
    partition_key = previous_hour.strftime("%Y-%m-%d-%H:00")
    return dg.RunRequest(
        partition_key=partition_key,
    )
-
Upcoming office hours
Hey everyone!
We are going to be hosting an event next Friday, September 5th at 2 PM EST / 11 AM PT, to give you all the opportunity to ask questions that we will answer live. Need some help troubleshooting some code? Want to ask about an upcoming feature? Or do you just want to talk about the data engineering landscape? We'll be happy to chat about all of that.
https://dagster.io/events/september-community-office-hours
You can use this thread to submit questions beforehand, or you're welcome to ask them live during the event.
The event will be recorded and uploaded to our YouTube channel if you're unable to make it.