require empty db keyspace to run hash #184
}

def isDBEmpty(session: Session, mode: String): Boolean = {
  var row = session.execute(s"select count(*) from $keyspace.${tables.docFreq} where id='$mode'").one()
Is `select count(*)` performance OK to run before every command? Would it improve with `select count(*) ... limit 1`?
Currently it's okay, because this table can contain at most 2 rows.
But it's a good point, and it's better to update it in case we change that. 👍
ok, I missed that about the table :D
applied
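A minimal sketch of the kind of bounded emptiness probe discussed in this thread, assuming a plain row lookup with `limit 1` is an acceptable stand-in for `count(*)`. `EmptinessCheck`, `probeQuery`, and the `runQuery` function (a stand-in for `session.execute`) are hypothetical names for illustration, not code from this PR:

```scala
object EmptinessCheck {
  // Builds a probe that fetches at most one row for the given mode.
  // The mode is interpolated into the query as in the original snippet;
  // a bound parameter would be the safer choice in real code.
  def probeQuery(keyspace: String, table: String, mode: String): String =
    s"select id from $keyspace.$table where id='$mode' limit 1"

  // `runQuery` is a hypothetical stand-in for executing the query and
  // returning the fetched rows; the table is empty for this mode when
  // the probe returns nothing.
  def isEmpty(runQuery: String => Seq[String],
              keyspace: String,
              table: String,
              mode: String): Boolean =
    runQuery(probeQuery(keyspace, table, mode)).isEmpty
}
```

Passing the query runner as a function keeps the sketch independent of any particular driver version and makes the check easy to exercise without a running database.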
carlosms left a comment
👍, left a small suggestion
if (!gemini.isDBEmpty(cassandra, config.mode)) {
  println("Database keyspace is not empty! Hashing may produce wrong results. " +
    "Please choose another keyspace or pass --replace argument")
Suggested change:
- "Please choose another keyspace or pass --replace argument")
+ "Please choose another keyspace or pass the --replace option")
fixed the first commit with new message
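The guard under review can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `HashGuard`, `Config`, `check`, and the `cleanDB` callback are hypothetical names, and the sketch assumes `--replace` cleans the database for the current mode and then lets hashing proceed.

```scala
object HashGuard {
  // Hypothetical stand-in for the parsed CLI options; only the fields used here.
  final case class Config(mode: String, replace: Boolean)

  val NotEmptyMessage: String =
    "Database keyspace is not empty! Hashing may produce wrong results. " +
      "Please choose another keyspace or pass the --replace option"

  // Returns Right(()) when hashing may start, Left(message) when it must abort.
  // `isDBEmpty` and `cleanDB` are passed in as functions so the sketch does not
  // depend on a live Cassandra session.
  def check(config: Config,
            isDBEmpty: String => Boolean,
            cleanDB: String => Unit): Either[String, Unit] =
    if (config.replace) {
      cleanDB(config.mode) // assumed behaviour of --replace: wipe data for this mode
      Right(())
    } else if (isDBEmpty(config.mode)) Right(())
    else Left(NotEmptyMessage)
}
```

A caller would print the `Left` message and exit with a non-zero status, and start hashing on `Right`.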
Hashing can't be executed incrementally due to calculation of document frequencies which require full input.
this commit checks if hashtables and docfreq tables are empty and gemini exits with error if they are not.
it also introduces new flag --replace which would clean up db for current hashing mode.
Signed-off-by: Maxim Sukharev <[email protected]>

It allows to pass just `--cassandra` instead of `--cassandra=true`
Signed-off-by: Maxim Sukharev <[email protected]>
Hashing can't be executed incrementally because calculating document
frequencies requires the full input.
This commit checks whether the hashtables and docfreq tables are empty, and gemini
exits with an error if they are not.
It also introduces a new `--replace` flag, which cleans up the db for the
current hashing mode.
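To illustrate why stale document frequencies are a problem, here is a small sketch. It assumes the frequencies feed an IDF-style weight, which is an assumption about how they are used and is not stated in this PR; `DocFreqDemo`, `docFreq`, and `idf` are hypothetical names.

```scala
object DocFreqDemo {
  // Document frequency: in how many documents each token appears.
  def docFreq(docs: Seq[Set[String]]): Map[String, Int] =
    docs.flatten.groupBy(identity).map { case (token, occurrences) =>
      token -> occurrences.size
    }

  // IDF-style weight log(N / df); it depends on the whole corpus, so adding
  // documents later changes the weight of tokens that were already processed.
  def idf(docs: Seq[Set[String]], token: String): Double =
    math.log(docs.size.toDouble / docFreq(docs)(token))
}
```

With two documents `Set("a", "b")` and `Set("a")`, the weight of `b` is `log(2 / 1)`. After adding a third document `Set("b", "c")`, it becomes `log(3 / 2)`, so anything derived from the earlier weight no longer matches the corpus.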
There is also a separate commit that changes the type of the cassandra flag to unit.
It allows passing just `--cassandra` instead of `--cassandra=true` (for consistency).
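The difference between a boolean-valued flag and a unit flag can be sketched with a hand-rolled parser. This is purely illustrative: it is not the option-parsing library gemini uses, and `FlagDemo`, `booleanFlag`, and `unitFlag` are hypothetical names.

```scala
object FlagDemo {
  // Boolean-valued flag: the value must be spelled out, e.g. --cassandra=true.
  // Returns None when the flag is absent or carries no recognised value.
  def booleanFlag(args: Seq[String], name: String): Option[Boolean] = {
    val prefix = s"--$name="
    args.collectFirst {
      case a if a == prefix + "true"  => true
      case a if a == prefix + "false" => false
    }
  }

  // Unit flag: the mere presence of --cassandra switches the option on.
  def unitFlag(args: Seq[String], name: String): Boolean =
    args.contains(s"--$name")
}
```

With the unit style, `--cassandra` reads the same way as other presence-only switches, which is the consistency the commit is after.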
Output when db isn't empty: