
Temporary Memory Leak with Async Iterators in For Await X of Y #30298

Closed
@ulfgebhardt

Description

  • Version: v12.13.0
  • Platform: Linux 5.3.7-arch1-2-ARCH x86_64 GNU/Linux and also Linux 4.4.0-109-generic #132-Ubuntu x86_64 GNU/Linux
  • Subsystem:

In Short

When using an AsyncIterator, memory usage rises drastically. It drops once the iteration is done.

The `x` in `for await (x of y)` is not freed until the iteration is done. Every Promise awaited inside the for-loop is not freed either.

I came to the conclusion that the garbage collector cannot collect the contents of the iteration, since the Promises generated by the AsyncIterator only fully resolve once the iteration is done.
I think this might be a bug.
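
For context, my understanding is that `for await (x of y)` behaves roughly like the following manual loop, which would explain why every IteratorResult stays reachable until the loop ends (a rough sketch, not the exact spec semantics):

const iterate = async (iterable: AsyncIterable<string>): Promise<void> => {
    const iterator = iterable[Symbol.asyncIterator]();
    while (true) {
        // Each step allocates a fresh Promise<IteratorResult<string>>.
        const result = await iterator.next();
        if (result.done) break;
        // `result.value` is what the loop body sees as `x`.
        console.log(result.value);
    }
};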

Repro & Background

When using an AsyncIterator in a for-await-of loop, I see a substantial memory leak.

I need this when scraping an HTML page which includes the information about the next HTML page to be scraped:

  1. Scrape data
  2. Evaluate data
  3. Scrape next data

The async part is needed since axios is used to obtain the HTML.
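
For illustration, the real scraper is shaped roughly like the following async generator (a sketch; `extractNextUrl` is a hypothetical stand-in for the actual parsing logic):

import axios from 'axios';

// Hypothetical helper: pull the next page's URL out of the HTML, if any.
declare function extractNextUrl(html: string): string | null;

// Sketch: follow "next page" links until there are none left.
async function* pages(startUrl: string): AsyncGenerator<string> {
    let url: string | null = startUrl;
    while (url) {
        const response = await axios.get<string>(url);
        yield response.data;                 // raw HTML of the current page
        url = extractNextUrl(response.data); // where to scrape next
    }
}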

Here is a repro which shows the memory rising from ~4 MB to ~25 MB by the end of the script. The memory is not freed until the program terminates.

const scraper = async (): Promise<void> => {
    let browser = new BrowserTest();
    let parser = new ParserTest();

    for await (const data of browser) {
        console.log(await parser.parse(data));
    }
};

class BrowserTest {
    private i: number = 0;

    public async next(): Promise<IteratorResult<string>> {
        this.i += 1;
        return {
            done: this.i > 1000,
            value: 'peter '.repeat(this.i)
        }
    }

    public [Symbol.asyncIterator](): AsyncIterator<string> {
        return this;
    }
}

class ParserTest {
    public async parse(data: string): Promise<string[]> {
        return data.split(' ');
    }
}

scraper()
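
A minimal way to observe the growth (a sketch using Node's built-in process.memoryUsage(); run with node --expose-gc so global.gc is available and a collection can be forced before each sample):

const heapMB = (): string =>
    (process.memoryUsage().heapUsed / 1024 / 1024).toFixed(1);

const observe = async (): Promise<void> => {
    let i = 0;
    for await (const data of new BrowserTest()) {
        if (++i % 100 === 0) {
            if (global.gc) global.gc(); // force a collection before sampling
            console.log(`iteration ${i}: ${heapMB()} MB (value length: ${data.length})`);
        }
    }
};

observe();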

It looks like the data of the for-await-of loop is dangling in memory. The call stack gets huge as well.

In the repro the problem is still manageable, but in my actual code a whole HTML page stays in memory, which is ~250 kB per call.

A screenshot would show the heap memory on the first iteration compared to the heap memory after the last iteration, but I cannot post inline screenshots yet.

The expected workflow would be the following:

  • Obtain data
  • Process data
  • Extract info for the next "Obtain data" step
  • Free all memory from the last "Obtain data" step
  • Use the extracted information to restart the loop with the newly obtained data

I am unsure whether an AsyncIterator is the right choice here to achieve what is needed.
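
One alternative would be to drive the iterator by hand, so each value can go out of scope before the next one is fetched (a sketch, reusing BrowserTest and ParserTest from the repro above):

const scrapeManually = async (): Promise<void> => {
    const browser = new BrowserTest();
    let result = await browser.next();
    while (!result.done) {
        console.log(await new ParserTest().parse(result.value));
        // Overwriting `result` drops the previous IteratorResult,
        // leaving the GC free to reclaim the old value.
        result = await browser.next();
    }
};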

Workaround Repro

As a workaround to free the contents of the parser, we encapsulate the result in a lightweight container and then free the contents, so only the container itself remains in memory.
The `data` object cannot be freed even if you use the same technique to encapsulate it; at least that is how it looks when debugging.

const scraper = async (): Promise<void> => {
    let browser = new BrowserTest();

    for await (const data of browser) {
        let parser = new ParserTest();
        let result = await parser.parse(data);
        console.log(result);
        
        /**
         * This avoids memory leaks due to a garbage-collector bug
         * with async iterators in JS.
         */
        result.free();
    }
}

class BrowserTest {
    private i: number = 0;
    private value: string = "";

    public async next(): Promise<IteratorResult<string>> {
        this.i += 1;
        this.value = 'peter '.repeat(this.i);
        return {
            done: this.i > 1000,
            value: this.value
        }
    }

    public [Symbol.asyncIterator](): AsyncIterator<string> {
        return this;
    }
}

/**
 * Result class for wrapping the result of the parser.
 */
class Result {
    // Optional so that `delete` compiles under strict TypeScript.
    private result?: string[];

    constructor(result: string[]) {
        this.setResult(result);
    }

    public setResult(result: string[]): void {
        this.result = result;
    }

    public getResult(): string[] | undefined {
        return this.result;
    }

    public free(): void {
        // Drop the only reference so the wrapped string[] can be collected.
        delete this.result;
    }
}

class ParserTest {
    public async parse(data: string): Promise<Result> {
        let result = data.split(' ');
        return new Result(result);
    }
}

scraper();
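
Side note on free(): using delete changes the object's shape (hidden class) in V8, which can deoptimize later property access; assigning this.result = undefined would drop the reference just as well. That is an assumption about the engine, not something verified in this repro.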

Workaround in actual context

What is not shown in the repro solution is that we also try to free the result of the iteration itself. This seems to have no effect, though.

public static async scrape<D, M>(scraper: IScraper<D, M>, callback: (data: DataPackage<Object, Object> | null) => Promise<void>): Promise<void> {
    let browser = scraper.getBrowser();
    let parser = scraper.getParser();

    for await (const parserFragment of browser) {
        const fragment = await parserFragment;
        const json = await parser.parse(fragment);
        await callback(json);
        json.free();
        fragment.free();
    }
}

See: https://github.com/demokratie-live/scapacra/blob/master/src/Scraper.ts
To test with an actual Application: https://github.com/demokratie-live/scapacra-bt (yarn dev ConferenceWeekDetail)

Conclusion

We are not sure whether this behavior is a bug or intended. Any clarifying comment would be appreciated.
