I profiled the --show option and found a way to make --show at least two times faster, and in some cases a hundred times faster.
Let's look at the potfile_remove_parse function in src/potfile.c. What is going on here is as follows:
Hashcat tries to parse a line via module_hash_decode or a similar function.
Hashcat repeats the algorithm.
I've found that more than half of hashcat's running time can be spent on memsetting buffers!
How much time depends on the algorithm. Let's look at module 13400, for example:
    typedef struct keepass
    {
      u32 version;
      u32 algorithm;

      /* key-file handling */
      u32 keyfile_len;
      u32 keyfile[8];

      u32 final_random_seed[8];
      u32 transf_random_seed[8];
      u32 enc_iv[4];
      u32 contents_hash[8];

      /* specific to version 1 */
      u32 contents_len;
      u32 contents[0x200000];

      /* specific to version 2 */
      u32 expected_bytes[8];

    } keepass_t;
This structure is well over 2 megabytes in size (the contents array alone holds 0x200000 32-bit words, which is 8 MiB), and hashcat memsets it over and over again, even if the line in the potfile doesn't match the hash at all. On my machine, this can go on for an hour.
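To make the cost concrete, here is a minimal sketch of the arithmetic. It assumes that u32 is a 32-bit unsigned integer and that the whole structure is zeroed once per potfile line; the helper names keepass_size and bytes_zeroed are made up for illustration and are not part of hashcat.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: u32 is a 32-bit unsigned integer. */
typedef uint32_t u32;

/* Same layout as the keepass_t structure quoted above. */
typedef struct keepass
{
  u32 version;
  u32 algorithm;
  u32 keyfile_len;
  u32 keyfile[8];
  u32 final_random_seed[8];
  u32 transf_random_seed[8];
  u32 enc_iv[4];
  u32 contents_hash[8];
  u32 contents_len;
  u32 contents[0x200000];
  u32 expected_bytes[8];

} keepass_t;

/* Hypothetical helper: size in bytes of one keepass_t. */
size_t keepass_size (void)
{
  return sizeof (keepass_t);
}

/* Hypothetical helper: total bytes written if the whole structure
   is zeroed once for every line read from the potfile. */
unsigned long long bytes_zeroed (unsigned long long potfile_lines)
{
  return potfile_lines * (unsigned long long) sizeof (keepass_t);
}
```

Every member is a u32, so there is no padding and the size is 192 bytes of small fields plus 8,388,608 bytes for contents. For a hypothetical potfile of one million lines, bytes_zeroed returns about 8.4 * 10^12, that is, several terabytes of writes before a single line has been matched.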
I did a quick fix, but it's a hack and not a proper solution:
I suggest that the module_hash_decode functions take raw buffers and do any memsets themselves where necessary. Most of the time, these memsets are not even needed.
Also, some of the module_hash_decode functions do memsets before checking the signature. Let's look at module 13100, for example:
If we move all memset calls into the module_hash_decode functions, it will be easier to identify and fix slow module_hash_decode functions, such as the one above, because they will show up as the bottleneck. That said, it's possible that the compiler already optimizes these functions.
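The idea could look roughly like the sketch below. This is not hashcat code: demo_esalt_t, DEMO_SIGNATURE and demo_hash_decode are hypothetical stand-ins, and the real module_hash_decode functions have a different signature. It only illustrates a decode function that receives a raw (possibly dirty) buffer, rejects non-matching lines cheaply, and zeroes the buffer only once it knows the line belongs to it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;

/* Hypothetical, deliberately small stand-in for a module's esalt struct. */
typedef struct demo_esalt
{
  u32 version;
  u32 contents_len;
  u32 contents[1024];

} demo_esalt_t;

/* Hypothetical signature; real modules define their own. */
static const char DEMO_SIGNATURE[] = "$demo$";

/* Hypothetical decode function. The caller passes a raw buffer and does
   not memset it. Returns true if the line was parsed; on a mismatch the
   buffer is left untouched. */
bool demo_hash_decode (demo_esalt_t *esalt, const char *line, size_t line_len)
{
  const size_t sig_len = sizeof (DEMO_SIGNATURE) - 1;

  /* Cheap rejection first: no buffer is touched for foreign lines. */
  if (line_len < sig_len) return false;
  if (memcmp (line, DEMO_SIGNATURE, sig_len) != 0) return false;

  const size_t payload_len = line_len - sig_len;

  if (payload_len > sizeof (esalt->contents)) return false;

  /* Only now pay for zeroing, since the line is known to be ours. */
  memset (esalt, 0, sizeof (*esalt));

  esalt->version      = 1;
  esalt->contents_len = (u32) payload_len;

  memcpy (esalt->contents, line + sig_len, payload_len);

  return true;
}
```

With this shape, a potfile line that belongs to another hash mode costs one length check and one memcmp instead of a full memset of the structure, and any remaining zeroing cost is attributed to the module that actually needs it.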
I can do some PRs, but I just would like to highlight the issue to develop a solution that will be good for everyone.
I haven't looked at these specific lines in context yet, but I'm worried that these memsets are intentional and address a relatively niche problem that may not be clear from the code alone. I believe many of these zeroing memsets in hashcat are related to an issue with GPU memory not being cleared after or between kernel executions, which can lead to buffers containing dangerous and unexpected data when they are created over previously filled buffers. To get around this issue, many buffers are forcibly zeroed before they can be safely used. Now, these buffers do not appear to be used directly on the GPU, as they seem to live within the potfile code, but with so many buffers and structs being passed back and forth between host and GPU, that doesn't immediately alleviate my concern that these memsets are related to the above or a similar safety issue.
Edit: After looking a bit more closely at some of these, I think I misunderstood the original issue as being related to another set of very slow memsets that cause a similar problem but that are probably only loosely related to these. Guess that's what I get for not reading through the specific indicated code first before starting on a comment. I think there is certainly some time to be saved in moving the memsets around, but for some of these it may be somewhat annoying to do as each module will need to be changed. If it turns out that these are related to the issue I mentioned due to downstream or other consumption of the buffers, it could also further complicate things.