How to avoid importing modules that a job doesn't make use of #28631
Replies: 4 comments
-
Any help would be greatly appreciated.
-
I've been looking into this lately as well. My first thought was to leverage lazy imports somehow, but I'm still trying to understand how to make that work with Dagster. One key to this problem is knowing which command Dagster actually uses when executing the run, which I think depends on the deployment setup used.
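For what it's worth, the lazy-import pattern itself is plain Python and independent of how Dagster launches the run: move the heavy import from module top level into the function (or resource method) that uses it, so merely loading the code stays cheap. A minimal dependency-free sketch, with stdlib csv/io standing in for a heavy dependency like pandas:

```python
def parse_rows(text):
    """Parse CSV text. The imports happen on first call, not when this
    module is loaded -- the deferred-import pattern in miniature."""
    import csv  # stands in for a heavy dependency such as pandas
    import io
    return list(csv.reader(io.StringIO(text)))
```

A process that imports this module but never calls `parse_rows` never pays for the heavy dependency.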
-
I'm interested in understanding this more as well. In particular, I have a repo with a large number of assets generated by an asset factory, which can take a long time to load.
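For readers unfamiliar with the pattern: an asset factory is just a function that builds and returns asset definitions in a loop, so one config list fans out into many assets. A dependency-free sketch of the shape (in real code the inner function would be decorated with Dagster's @asset; the names here are hypothetical):

```python
def make_asset(name):
    """Hypothetical factory: returns one 'asset' per name. In Dagster the
    inner function would be wrapped with @asset(name=name)."""
    def _compute():
        return f"data for {name}"
    _compute.__name__ = name
    return _compute

# One call site fans out into many definitions -- which is also why
# load time grows with the length of this list.
assets = [make_asset(n) for n in ["users", "orders", "events"]]
```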
-
@nsteins After discovering the
-
I commonly construct my Dagster codespaces with a file structure as follows:
The main __init__.py contains my Definitions object.
Imagine I have created multiple custom resource classes performing various functions. They each inherit from ConfigurableResource. Some resources make use of pandas, others make use of scikit-learn, and so on.
At the top of each resource file I import modules I will be making use of in that file.
I then create instances of those resources in the resources/__init__.py file, making use of env vars. To do this I obviously import from each [custom_resource].py file. I then import these instances in the main __init__.py file and add them to my Definitions object. The import looks something like this:
from .resources import resource_1, resource_2, resource_3
A very common scenario for me is that a run only makes use of one or two of the resources. But when executing these runs, all of the resources and their accompanying modules are imported. All of the accompanying imports might take up 300MB of memory, while the ones I actually need for a run might only take 150MB. With the multiprocess executor, if I am running 8 processes in parallel, the total memory usage is 300MB * 8 = ~2.4GB; if I only imported the modules I need, it would be half that. In a codespace containing many expensive modules and some degree of potential parallelisation, this leads to far too much memory being consumed.
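Spelling out the arithmetic (the per-process figures are the illustrative numbers from above, not measurements):

```python
procs = 8          # parallel processes under the multiprocess executor
all_mb = 300       # resident memory when every module is imported
needed_mb = 150    # resident memory with only the imports a run needs

total_all = all_mb * procs        # 2400 MB, i.e. ~2.4 GB
total_needed = needed_mb * procs  # 1200 MB, half of that
print(total_all, total_needed)
```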
One way I can get around this is to create a separate code repository that hosts the assets using the more obscure, expensive modules, preventing them from being imported in the more common case. However, this doesn't always feel natural. One natural separation would be one codespace for extract and load, another for transformation, and perhaps another for machine learning. This helps, but for a smaller project it would be nice to maintain a single codespace and prevent the unnecessary imports from happening.
Is there a way to set up my import chain, file structure, etc., to prevent these unnecessary imports of modules in runs that do not use them?
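One approach that should work regardless of file layout, assuming the cost really is module-level imports: keep the resource classes cheap to import and defer the heavy imports into the methods that use them. A sketch with a stdlib module as a placeholder for pandas/scikit-learn (the class below is a plain stand-in, not a real ConfigurableResource, and the names are hypothetical):

```python
class StatsResource:
    """Stand-in for a ConfigurableResource subclass. The 'heavy' dependency
    (statistics here, as a placeholder for pandas or scikit-learn) is only
    imported when a run actually calls the method."""

    def mean(self, values):
        import statistics  # deferred: importing this class costs nothing
        return statistics.mean(values)
```

With this shape, the main __init__.py can still import and register every resource instance, but each of the 8 worker processes only pays the import cost for the resources its run actually touches.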