Embedded PowerShell commands and scripts are among the most popular malware payloads. For malware that prioritizes stealth, such as fileless malware, PowerShell's access to Windows API functions without additional libraries makes it particularly useful for evading detection. Detecting malicious PowerShell scripts and commands remains an open challenge for proactive endpoint protection because of three major issues: 1) the malicious commands are usually hidden in a very long script that exceeds the processing limit of typical machine learning models; 2) they are usually mixed with bulky benign scripts; and 3) script obfuscation can easily conceal the signatures they would otherwise match.

We introduce a novel model that addresses these challenges. The model combines similarity learning, a sentence transformer, a sliding-window method, and a stochastic gradient descent (SGD) classifier. We use Siamese similarity learning to enhance a pre-trained natural language model's ability to capture deviations in malicious scripts, which improves robustness against out-of-vocabulary tokens introduced by unseen code obfuscation methods. We employ sliding windows to segment lengthy scripts into short contextual pieces; coupled with an SGD classifier, the model efficiently identifies potentially malicious fragments. Our proposed model achieves accuracies of 99.01%, 97.59%, 98.70%, and 99.73% on malicious scripts, mixed malicious scripts, and mixed scripts under two obfuscation techniques, respectively, outperforming the existing state of the art by over 30 percent in all aspects. Because it does not require the commonly assumed de-obfuscation step that reverses the effect of obfuscation before detection, our proposed method is also shown to be more efficient.
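
As an illustration of the sliding-window idea only, the following minimal sketch segments a script into overlapping token windows and flags the windows whose score passes a threshold. It is not the model described above: the whitespace tokenizer, the window size and stride, the threshold, and the names `sliding_windows`, `flag_windows`, and `toy_score` are all assumptions made for this example, and `toy_score` is a keyword placeholder standing in for the sentence-transformer embedding and SGD classifier.

```python
from typing import Callable, List, Tuple


def sliding_windows(tokens: List[str], size: int, stride: int) -> List[Tuple[int, List[str]]]:
    """Split a token list into overlapping windows as (start_index, window) pairs."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if len(tokens) <= size:
        return [(0, tokens)]
    starts = list(range(0, len(tokens) - size + 1, stride))
    # Add a final window if the regular stride leaves the tail uncovered.
    if starts[-1] + size < len(tokens):
        starts.append(len(tokens) - size)
    return [(s, tokens[s:s + size]) for s in starts]


def toy_score(window: List[str]) -> float:
    """Hypothetical placeholder scorer (NOT a real detector).

    In the approach described in the text, this step would be an embedding
    of the window followed by a trained classifier.
    """
    suspicious = {"invoke-expression", "iex", "frombase64string"}
    return 1.0 if any(tok.lower() in suspicious for tok in window) else 0.0


def flag_windows(
    script: str,
    score_window: Callable[[List[str]], float],
    size: int = 8,
    stride: int = 4,
    threshold: float = 0.5,
) -> List[Tuple[int, float]]:
    """Return (start_index, score) for each window scoring at or above threshold."""
    tokens = script.split()  # assumption: naive whitespace tokenization
    flagged = []
    for start, window in sliding_windows(tokens, size, stride):
        score = score_window(window)
        if score >= threshold:
            flagged.append((start, score))
    return flagged


if __name__ == "__main__":
    # A mostly benign script with one suspicious token buried in the middle.
    script = " ".join(["Get-Date"] * 10 + ["Invoke-Expression"] + ["Get-Date"] * 5)
    print(flag_windows(script, toy_score))
```

Because every window has a bounded length, each fragment fits within a fixed model input size regardless of how long the full script is, and a flagged window's start index localizes the suspicious fragment inside the surrounding benign code.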