Nothing but solving a real world problem can help boosting a piece of software to evolve.
In those days I’m engaged on a big Java written source code review. I submitted the code onto the commercial tool we use to scan very wide codebases but, since this tool output doesn’t impress me much, I start searching the web for a Java parser I can integrate quickly in my Owasp Orizon tool.
Unsatisfied I started working on the jsp engine for sake rubygem.
When I’m reviewing the code I use this workflow. Using the commercial tool and seeing its internal parser output. Then I can focus with manual reviewing pieces of code that parser thinks they are more prone of being vulnerable.
Manual review involves using ad hoc written script, tools from my toolbox and eventually something from the opensource world. In this phase I double check code review findings submitting to a running copy of the code, some well written pattern attack in order to check if the suspected vulnerable is either a false positive or not.
Starting from the view
My tools collection lacks about a good JSP interpreter/parser. Since I’m writing codesake.com that it will be an application security portal, I firedup vim starting a quick and dirty JSP scanning engine for sake.
In a first raw implementation I used regular expression to check: * the presence of passwords or other suspected sensitive information in the code; * the packages imported by the JSP page. This can be useful to detect third party libraries used by the page; * the values read from the HTTP request * the variables reflected as output to the user
The approach I used? A lightweight TDD session with irb used instead of rspec. Remember I want to cook a rough implementation of Sake::Jsp engine. I’ll add rspec when I’ll merge it in sake gem hopefully with a better implementation.
Deep into the scanner
First of all, after the shebang, I turned on UTF-8 encoding since the source code can have Italian chars inside, I required trimmy a gem I wrote to add a trim method to ruby String just like PHP and rainbow gem to add a touch of color to the output.
1 2 3 4 5 6 7 8 9 | |
The above comment is to force myself in finding a clever implementation since all the small security checks loop in the array full of source code lines, that it brings the complexity to O(n) where n is the number of lines of code.
Another option is to loop once making all the regular expression and have some helper methods giving back a particular result.
I added a list of false positive methods that they can be found as output routine but the value they returned can’t be tampered by the user. The engine behaviour is to use another color to show them, maybe in the future they can be silently ignored.
The suspected secrets list is very prone to false positives since developers around the world can save passwords or other secrets in variables called foo, goofy, aaa, fooled or whatever. This is the main limit of source code analysis: a tool cannot understand a source code, it can only give a probabilistic evaluation. Of course, I don’t expect all people agree with this. Which is your opinion?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | |
I implemented only 2 public methods for Sake::Jsp.
sake_it will run all the tests and it is the business logic of the whole scanning engine and a Sake::Jsp.is_false_positive? that check a variable in the list of false positive values.
1 2 3 4 5 6 7 8 9 10 11 | |
The suspected secrets list is very prone to false positives since developers
around the world can save passwords or other secrets in variables called _foo_,
_goofy_, _aaa_, _fooled_ or whatever. This is the main limit of source code
analysis: a tool cannot understand a source code, it can only give a
probabilistic evaluation.
Here it starts the private implementation of the engine. read_file is not that magic. I used the readlines routine to store all the lines of code in an Array.
1 2 3 4 5 | |
_find_secrets_ method takes a line of code, it splits it in tokens and then it loops for every returned word looking in the SUSPECTED_SECRETS Array. I wont win the Turing Price for this one but eventually it works fair for a first implementation.
1 2 3 4 5 6 7 8 9 10 | |
A typical activity I do every single time I work on a source code review, is drawing a mindmap of the interconnection between classes and source code files. That map helps me in taint propagation and in detecting pieces of code that are not referenced by anyone so to be marked as unmaintained in the final report.
Finding import declaration from a Jsp page it helps me understand the packages the page uses and eventually custom Java packages developed by internal team.
1 2 3 4 5 6 7 8 9 10 | |
The _find_imports_ code can be refined in many ways. The regular expression here doesn’t match a lot of different ways the package import clause can be written. For sure a single space can lead the import not to be detected.
The find_attack_entrypoints it is more interesting. It checks all the possible read from the request. Remember that all the stuff coming from the user can be tampered ando so parameters from the HTTP request that they must be considered as possible attack entrypoints.
A clever implementation it will consider stuff read from a POST and eventually other APIs that it can be used to read data in input (File, Database, …).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | |
The _find_reflected_vars_ is the output scanning part of the engine. It checks all the writing the Jsp page makes as output looking for variables. Of course it doesn’t check for custom written validations, so you have to pay attention in understanding what this routine gives you as output.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | |
Then the runtime bit it will disappear when I’ll merge the engine into sake and codesake.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | |
Off by one
Working on a source code scanning engine is really fun. You will learn a lot about the target programming language and you learn a lot in terms of hacking and source code assessment.
From the failing experience about Owasp Orizon I understood that writing a full featured source code parser is hard and eventually useless.
There are so many variables that can influence the way a developer write a piece of code that trying to understand them is a topic I leave for a sci-fi fiction like minority report.
Source code review is a particular kind of security assessment that involves creatitivy and that it requires a tool to help security specialist to understand where he/she should look instead of telling “I’m pretty sure, you will find a cross site scripting here”.
What do you think about this? Have you got different experience about source code reviews?
Don’t be shy, tell me your story