Memory leak and crash, now and 2 years ago, tfjs-node #8326

@borodadada

Description

The problem has not gone away: it behaves the same now as it did 2 years ago. Back then the memory leak was visible in Task Manager, with the process's memory growing constantly. Now memory looks normal for most of the run, but the result is the same: after a while everything crashes. I managed to take screenshots at the moment it started. The screenshots show my working environment, NOT the TEST posted below; the test is a simplified version kept as close to it as possible, and I will post results from the test later.
I was not doing anything on the machine at the time, just taking screenshots.

There are 4 identical copies of the program running on the computer, and one of them begins to fail. This happens when the number of epochs reaches the millions. For the test you can run just one copy. On a modern processor it usually takes 4-6 hours to get there; on an old one, more than a day.

The memory leak starts and grows quickly, as the screenshot shows
Snipaste_2024-07-06_07-28-22

The failing process. Note the 3 other processes; a process is usually 100 to 200 megabytes in size
Snipaste_2024-07-06_07-28-54

Memory is full
Snipaste_2024-07-06_07-30-42

After
Snipaste_2024-07-06_07-30-58

All Node.js processes have closed
Snipaste_2024-07-06_07-31-53

The logs are empty; they are open in the editor
Snipaste_2024-07-06_07-37-51

For testing, here is simple code; just copy and paste it

TEST CODE

const tf = require('@tensorflow/tfjs-node');

const size = 50
const units = 100

const letsgo = async function(){

    const model = tf.sequential();
    model.add( tf.layers.dense({ inputShape: [units], units, activation: 'linear', useBias: true }));
    model.add( tf.layers.dense({ units, activation: 'linear', useBias: true }));
    model.add( tf.layers.dense({ units, activation: 'linear', useBias: true }));
    model.compile({ optimizer: tf.train.adam(0.005, 0.9, 0.999), loss: tf.losses.absoluteDifference });

    let a = []
    let b = []
    for (let i = 0; i < size; i++) {
        let aa = []
        let bb = []
        for (let ii = 0; ii < units; ii++) {
            aa.push( Math.random() )
            bb.push( Math.random() )
        }
        a.push(aa)
        b.push(bb)
    }

    let xs = tf.tensor2d( a );
    let ys = tf.tensor2d( b );

    await model.fit(xs, ys, {
        epochs: 50000000,
        shuffle: false,
        verbose: 0,
        callbacks:{
            onTrainBegin: ()=>{
                console.log('start')
            },
            onTrainEnd: ()=>{
                console.log('done')
            },
            onEpochEnd: async (epoch, logs)=>{
                if( epoch % 100000 === 0 )
                    console.log(epoch, logs.loss)
            }
        }
    })
}

const loop = async function(){
    for (let i = 0; i < 1; i++) {
        await letsgo()
    }
}

loop()
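
To help tell whether the growth comes from tensors or from the Node.js process itself, the test's `onEpochEnd` callback could also sample `tf.memory()` and `process.memoryUsage()`. The following is only a minimal sketch: `makeMemoryLogger` and its parameters are hypothetical names, not part of tfjs, and the stats source is passed in so the helper does not depend on how `tf` was loaded.

```javascript
// Hypothetical helper (not part of tfjs): builds an onEpochEnd callback that
// samples memory counters every `every` epochs.
// `getTensorStats` is assumed to return an object shaped like tf.memory(),
// i.e. with numTensors and numBytes fields.
function makeMemoryLogger(getTensorStats, every, sink = console.log) {
    const toMB = (bytes) => Math.round(bytes / (1024 * 1024));
    return function onEpochEnd(epoch, logs) {
        if (epoch % every !== 0) return;
        const tensors = getTensorStats();
        const proc = process.memoryUsage();
        sink({
            epoch,
            loss: logs ? logs.loss : undefined,
            numTensors: tensors.numTensors,   // live tensors tracked by tfjs
            tensorBytes: tensors.numBytes,    // bytes held by those tensors
            rssMB: toMB(proc.rss),            // total resident memory of the process
            heapUsedMB: toMB(proc.heapUsed),  // JavaScript heap in use
            externalMB: toMB(proc.external),  // memory held by native bindings
        });
    };
}

// Possible use inside the test above (assumes `tf` is already required):
//   callbacks: { onEpochEnd: makeMemoryLogger(() => tf.memory(), 100000) }
```

If `numTensors` stays flat while `rssMB` climbs, the growth would be outside tfjs's tensor bookkeeping. If `numTensors` climbs, tensors are not being disposed.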

System information

  • Windows 11 x64
  • node-v19.9.0-x64
  • node-v20.15.0-x64
  • "@tensorflow/tfjs": "^4.20.0",
  • "@tensorflow/tfjs-node": "^4.20.0",

OK, these are the results from the test code:

Modern PC, Intel 13700: crash after 4.4 million epochs

Snipaste_2024-07-06_12-38-43

Old PC, Intel 3770: crash after 4.4 million epochs (Windows 10 x64, Node.js 20.10.0)

Snipaste_2024-07-06_16-49-30

I can't do my calculations because the program always crashes, and I need many more epochs than this! I really hope you fix this; it's a disaster that this bug has gone unfixed for years!
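
One possible experiment, not verified to avoid the crash, is to split the run into several shorter `model.fit()` calls using the `initialEpoch` option, so that whatever state a single `fit()` call accumulates can be released between chunks. That the growth is tied to a single long `fit()` call is an assumption, not a confirmed cause. `planChunks` and `fitInChunks` are hypothetical helper names; the sketch relies only on `fit(xs, ys, args)` returning a promise and on `epochs` being treated as the final epoch index when `initialEpoch` is set.

```javascript
// Hypothetical helper: splits a long run into [initialEpoch, epochs) windows.
function planChunks(totalEpochs, chunkSize) {
    const chunks = [];
    for (let start = 0; start < totalEpochs; start += chunkSize) {
        chunks.push({
            initialEpoch: start,
            epochs: Math.min(start + chunkSize, totalEpochs), // index of the final epoch
        });
    }
    return chunks;
}

// Hypothetical helper: runs the chunks one after another on the same model,
// so the model weights and optimizer are carried over between calls.
async function fitInChunks(model, xs, ys, totalEpochs, chunkSize, extraArgs = {}) {
    for (const chunk of planChunks(totalEpochs, chunkSize)) {
        await model.fit(xs, ys, { ...extraArgs, ...chunk });
    }
}

// Possible use with the test above (values are illustrative only):
//   await fitInChunks(model, xs, ys, 50000000, 1000000, { shuffle: false, verbose: 0 });
```

Whether the crash still occurs at around the same total epoch count with chunking would itself be useful information for narrowing down the cause.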
