Replies: 5 comments 1 reply
-
Hey there! Thanks for a great tool; I'm excited to watch and listen. I've got one question.
By "roll my own" I mean more than adding configurable fields to something that extends ConfigurableIOManager. This came up when I wanted to write an IOManager to work with DuckLake. I can add more color here if needed, but I can keep that in the Slack chat to reduce noise.
-
Hello Dagster team, I'm excited about the office hour tomorrow. Our team is deciding on the tech stack for our data infrastructure, and Iceberg + Dagster is one of our top candidates. The current Dagster + Iceberg integration is at the preview stage according to the docs, and the potential for future breaking changes concerns me a bit. What's the timeline for moving it to a more stable stage? And could you also cover access/permission management? Thanks a lot!
-
Hi there, I'm trying to get a deeper understanding of how Dagster approaches idempotency and fault tolerance in practice. I know partitions are a big part of the story: they provide referential timestamps and allow deterministic reruns and backfills. But beyond partitions, what are the main built-in patterns Dagster encourages (or even enforces) to make jobs rerunnable and safe (code_version? version control?)? Thanks for any insights; I'm trying to build pipelines that are easy to backfill and re-execute without surprises.
-
Hello there,
-
One question that came up during the chat was having a partition that differs from your schedule cadence. In these cases you can't use the built-in partitioned schedule directly; instead, you can define a custom schedule that computes the partition key itself:

import dagster as dg
from datetime import timedelta

hourly_partition = dg.HourlyPartitionsDefinition(start_date="2025-09-01-00:00")


@dg.asset(
    partitions_def=hourly_partition,
)
def partition_asset(context: dg.AssetExecutionContext) -> dg.MaterializeResult:
    partition_key = context.partition_key
    return dg.MaterializeResult(metadata={"partition": partition_key})


@dg.schedule(
    job=dg.define_asset_job("hourly_partition_job", selection=[partition_asset]),
    cron_schedule="*/1 * * * *",
)
def every_minute_partition_schedule(context: dg.ScheduleEvaluationContext):
    """Schedule that runs the partition asset every minute with the appropriate hourly partition."""
    scheduled_date = context.scheduled_execution_time
    # Use the previous hour for the partition key, since that hour's window is complete
    previous_hour = scheduled_date - timedelta(hours=1)
    # Convert the datetime to the string format expected by the hourly partition (YYYY-MM-DD-HH:00)
    partition_key = previous_hour.strftime("%Y-%m-%d-%H:00")
    return dg.RunRequest(
        partition_key=partition_key,
    )
-
Upcoming office hours
Hey everyone!
We are going to be hosting an event next Friday, September 5th at 2 PM EST / 11 AM PT, to give you all the opportunity to ask questions that we will answer live. Need some help troubleshooting some code? Want to ask about an upcoming feature? Or do you just want to talk about the data engineering landscape? We'll be happy to chat about all of that.
https://dagster.io/events/september-community-office-hours
You can use this thread to submit questions beforehand, or you're welcome to ask them live during the event.
The event will be recorded and uploaded to our YouTube channel if you're unable to make it.