Description
The problem has not gone away: it was there 2 years ago and it is the same now. In the past, Task Manager showed a visible memory leak, with the process's memory growing constantly. That is no longer the case and memory looks fine, but the result is the same: after a while everything crashes. I managed to take screenshots at the moment it all started. The screenshots show my working environment, NOT the TEST that I posted. The test itself is as close to it and as simplified as possible; I will post results for this test later.
I'm not doing anything, just taking screenshots.
There are 4 identical programs running on the computer (4 copies), and one of them begins to fail. This happens when the number of epochs is measured in millions. For the test you can run only one copy. On a modern processor the procedure usually takes 4-6 hours; on an old one, more than a day.
The memory leak starts, and it grows quickly, as can be seen in the screenshot:

The process. Note the 3 other processes, which are usually 100 to 200 megabytes in size:

The logs. There is nothing in them; they are empty (the editor is open):

For the test, here is simple code; just copy and paste:
TEST CODE
const tf = require('@tensorflow/tfjs-node');
const size = 50
const units = 100
const letsgo = async function(){
  const model = tf.sequential();
  model.add( tf.layers.dense({ inputShape: [units], units, activation: 'linear', useBias: true }));
  model.add( tf.layers.dense({ units, activation: 'linear', useBias: true }));
  model.add( tf.layers.dense({ units, activation: 'linear', useBias: true }));
  model.compile({ optimizer: tf.train.adam(0.005, 0.9, 0.999), loss: tf.losses.absoluteDifference });
  let a = []
  let b = []
  for (let i = 0; i < size; i++) {
    let aa = []
    let bb = []
    for (let ii = 0; ii < units; ii++) {
      aa.push( Math.random() )
      bb.push( Math.random() )
    }
    a.push(aa)
    b.push(bb)
  }
  let xs = tf.tensor2d( a );
  let ys = tf.tensor2d( b );
  await model.fit(xs, ys, {
    epochs: 50000000,
    shuffle: false,
    verbose: 0,
    callbacks:{
      onTrainBegin: ()=>{
        console.log('start')
      },
      onTrainEnd: ()=>{
        console.log('done')
      },
      onEpochEnd: async (epoch, logs)=>{
        if( epoch % 100000 === 0 )
          console.log(epoch, logs.loss)
      }
    }
  })
}
const loop = async function(){
  for (let i = 0; i < 1; i++) {
    await letsgo()
  }
}
loop()
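Since the logs are empty when the crash happens, it may help to record memory figures from inside the process instead of relying on Task Manager screenshots. The following is only a sketch, not part of the original test: `formatMiB`, `memoryLine`, and `makeMemoryLogger` are hypothetical helper names, and the only API used is Node's built-in `process.memoryUsage()`.

```javascript
// Hypothetical helpers (not in the original test code).
// They use only Node's built-in process.memoryUsage().

// Convert a byte count to a string in MiB with one decimal place.
function formatMiB(bytes) {
  return (bytes / (1024 * 1024)).toFixed(1) + ' MiB';
}

// Build one log line with the epoch, the loss, and the process memory figures.
function memoryLine(epoch, loss) {
  const m = process.memoryUsage();
  return [
    `epoch=${epoch}`,
    `loss=${loss}`,
    `rss=${formatMiB(m.rss)}`,           // total resident memory of the process
    `heapUsed=${formatMiB(m.heapUsed)}`, // JavaScript heap in use
    `external=${formatMiB(m.external)}`, // memory held by native objects bound to JS
  ].join(' ');
}

// Return a callback with the (epoch, logs) shape used by onEpochEnd.
// It prints one line every `every` epochs and does nothing otherwise.
function makeMemoryLogger(every, print = console.log) {
  return (epoch, logs) => {
    if (epoch % every === 0) {
      print(memoryLine(epoch, logs.loss));
    }
  };
}
```

Assuming the test above is otherwise unchanged, the callback could be plugged in as `onEpochEnd: makeMemoryLogger(100000)`. If `rss` climbs while `heapUsed` stays flat, that would suggest the growth is outside the JavaScript heap, though this sketch alone cannot confirm where it comes from.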
System information
- Windows 11 x64
- node-v19.9.0-x64
- node-v20.15.0-x64
- "@tensorflow/tfjs": "^4.20.0",
- "@tensorflow/tfjs-node": "^4.20.0",
OK, these are the results from the test code:
Modern PC (Intel 13700): crash after 4.4 million epochs
Old PC (Intel 3770): crash after 4.4 million epochs - Windows 10 x64 + Node.js 20.10.0
I can't do my calculations because the program always crashes, and I need many more epochs than this!!! I really hope you fix this; it's a disaster that this bug hasn't been fixed for years!




