How to crawl web.xml with Ruby to discover servlet URLs for a pentest

One of the most boring tasks in a web application penetration test is reaching URLs that are not referenced by any other page.

Think about APIs or old legacy code that is online by mistake… It's not uncommon to miss something in the scanning perimeter because your favourite crawler can't reach it.

A clever solution is to ask the developers who wrote the code. We will ask them for their web.xml file.

(Of course, this applies to J2EE web applications.)

Don't bother me, please

Let’s make some important assumptions:

Developers hate you. No matter how proactive you are or how gently you approach their code vulnerabilities, you're attacking their baby… their website. You're ugly, they are victims. They are the good ones, you are the villain.
Developers have no idea which servlets are actually published. Consider a very large website with a team of 30 developers. Teams can be very dynamic nowadays, so a developer may move to another customer (if your company outsourced the service) or be allocated to another project. The result is that a lot of people publish code and no one writes documentation, so after some months no one can say how many URLs are published… truth be told, this information is not that important to a developer.
Developers don't have time for you, my dear application security specialist. And truth be told, that's fair, since deadlines get closer every day. Think of a developer as you would of your boss… you have a 2-minute slot, so fine-tune your question and make sure you hit the problem. You'll have another chance next week… maybe.
Developers are forced to maintain a web.xml file for each J2EE project, listing the published servlets. The file has a hard-to-read format and XML is annoying, but developers must keep it updated.

**Bingo!**

The information you need is stored in that file. Just ask developers (or even sysadmins) to give you a fresh copy.

To apply a full application security process, you should have read-only access to their source code repository, so you can grab the information you need on your own.

The web.xml file

I don't want to go deep into web.xml file details. IMHO its structure could be improved a lot. What matters to us is the binding between the source code class com.armoredcode.MyServlet and the URL http://yourtarget/my_servlet.do.

Thanks to the J2EE naming convention, you also know that the class source is stored in a MyServlet.java file inside a com/armoredcode directory tree placed somewhere.

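``` xml a typical web.xml file
<?xml version="1.0" encoding="ISO-8859-1"?>
<web-app>
  <display-name>a project name</display-name>
  <description>the description</description>

  <context-param>
    <param-name>parameter_name</param-name>
    <param-value>parameter_value</param-value>
  </context-param>

  ...

  <servlet>
    <servlet-name>MyServlet</servlet-name>
    <servlet-class>com.armoredcode.MyServlet</servlet-class>
  </servlet>

  ...

  <servlet-mapping>
    <servlet-name>MyServlet</servlet-name>
    <url-pattern>/my_servlet.do</url-pattern>
  </servlet-mapping>
</web-app>
```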

Even if no page links to ```http://yourtarget/my_servlet.do```, you
are now aware of that page. You can read its source code to check the
parameters and methods it allows, and then start hacking it.
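The package-to-directory convention mentioned above can be sketched in a couple of lines of Ruby. The helper name `source_path_for` is made up for this example, and the result is only a relative path: where the source root lives inside the repository is something you still have to find out.

``` ruby
# Hypothetical helper: turn a fully qualified Java class name into the
# relative path of its source file (package dots become directories).
def source_path_for(class_name)
  "#{class_name.tr('.', '/')}.java"
end

puts source_path_for('com.armoredcode.MyServlet')
# prints com/armoredcode/MyServlet.java
```

Searching the repository for that relative path is usually enough to locate the file.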

## Our web.xml parser

We will use the [xml-simple](http://xml-simple.rubyforge.org) XML parsing library
to do all the dirty work.
When I do pentests and the like, I use a ```hacking``` gemset. If you don't have
one, just create it and install at least the
[bundler](http://rubygems.org/gems/bundler) gem.

    $ rvm gemset create hacking
    $ rvm use 2.0.0@hacking
    $ gem install bundler

After this, install the xml-simple library. Please note that the gem is
named ```xml-simple```, but you have to require ```xmlsimple``` when you use it.
A little bit inconsistent IMHO.

Here is our script:
``` ruby parsing web.xml to obtain urls associated to servlets
#!/usr/bin/env ruby

require 'xmlsimple'

raise 'missing filename' if ARGV[0].nil?

# by default xml_in wraps every child element in an array, hence the [0] lookups
data = XmlSimple.xml_in(ARGV[0])

# print each servlet name with its URL, using display-name as the context path
data['servlet-mapping'].each do |servlet|
  puts "#{servlet['servlet-name'][0]} - /#{data['display-name'][0]}#{servlet['url-pattern'][0]}"
end

puts "#{data['servlet-mapping'].count} servlets found"
```

Pretty simple, isn't it!?! I won't win the Turing award again for this code… however, now I can discover links that may not be linked from any other page and so would never be discovered by a web crawler.

Having old legacy code deployed is far from uncommon. It happens for a lot of reasons, and one of them is simply human error. An attacker can exploit this kind of mistake, which is why appsec specialists should also address this kind of scenario.

Now a bonus feature. Some months ago I wrote about hybrid testing. Let me remind you what hybrid testing, or hybrid analysis, is.

By hybrid testing I mean (and a lot of venerable security guys around the world do too) that both static analysis and penetration testing have to be performed on a target in order to achieve a good security level.

You can use the information stored in web.xml to find the Java source class implementing a servlet, so you can combine static and dynamic analysis.

Just modify the script this way:
``` ruby parsing web.xml to obtain source code classes implementing servlets
#!/usr/bin/env ruby

require 'xmlsimple'

raise 'missing filename' if ARGV[0].nil?
data = XmlSimple.xml_in(ARGV[0])

# print each servlet name with the Java class implementing it
data['servlet'].each do |servlet|
  puts "#{servlet['servlet-name'][0]} - #{servlet['servlet-class'][0]}"
end

# count the servlet declarations we just listed, not the mappings
puts "#{data['servlet'].count} servlets found"
```
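Both scripts walk the same parsed structure, so the servlet name can work as a join key between classes and URLs. What follows is only a sketch: `servlet_table` is a made-up helper, and the sample hash is hand-written to mimic the shape the scripts above index into (every child element wrapped in an array), so it runs without a real web.xml.

``` ruby
# Hand-written sample mimicking the parsed web.xml structure; values are made up.
sample = {
  'display-name' => ['myapp'],
  'servlet' => [
    { 'servlet-name' => ['MyServlet'], 'servlet-class' => ['com.armoredcode.MyServlet'] },
    { 'servlet-name' => ['Forgotten'], 'servlet-class' => ['com.armoredcode.Forgotten'] }
  ],
  'servlet-mapping' => [
    { 'servlet-name' => ['MyServlet'], 'url-pattern' => ['/my_servlet.do'] }
  ]
}

# Hypothetical helper: one row per declared servlet, with its class and
# every url-pattern mapped to it (an empty list when nothing maps to it).
def servlet_table(data)
  patterns = Hash.new { |hash, key| hash[key] = [] }
  (data['servlet-mapping'] || []).each do |mapping|
    patterns[mapping['servlet-name'][0]].concat(mapping['url-pattern'])
  end

  (data['servlet'] || []).map do |servlet|
    name = servlet['servlet-name'][0]
    # servlet-class can be absent (for example JSP-backed servlets)
    { name: name, klass: Array(servlet['servlet-class']).first, urls: patterns.fetch(name, []) }
  end
end

servlet_table(sample).each do |row|
  urls = row[:urls].empty? ? '(no mapping)' : row[:urls].join(', ')
  puts "#{row[:name]} - #{row[:klass]} - #{urls}"
end
```

A servlet that is declared but has no mapping is worth a second look too: it may be reachable in some other way, or it may be dead code.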

Off by one

Starting from these two scripts, it's easy to write a little tool that gives you all the information you need. Of course you may want to add error checking and automatic GET or POST requests to the discovered URLs to see how the target application reacts.

You may also want to know more about the source code, parsing the Java class to figure out which parameters it accepts. This will be very useful for understanding the attack surface.

Of course all of this functionality will be part of both codesake-dusk (the dynamic analysis tool) and codesake-dawn (the static analysis one) in the very near future.
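As a rough idea of the "automatic getting" part, here is a minimal sketch built on Ruby's standard Net::HTTP. The helper names `target_uri` and `probe` are made up, the base URL is whatever your engagement targets, and wildcard patterns such as `*.do` are skipped because they don't name a single URL.

``` ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build the full URL for a url-pattern taken from web.xml.
# Returns nil for wildcard patterns, which don't point at one specific page.
def target_uri(base, url_pattern)
  return nil if url_pattern.include?('*')
  URI.parse(base.chomp('/') + url_pattern)
end

# Hypothetical helper: send a GET and return the status code as a string,
# or a short error marker when the request cannot be completed.
def probe(uri)
  Net::HTTP.get_response(uri).code
rescue StandardError => e
  "error: #{e.class}"
end

# Example (not run here, it needs a reachable target):
#   uri = target_uri('http://yourtarget', '/my_servlet.do')
#   puts "#{uri} -> #{probe(uri)}"
```

Anything that doesn't answer with a 404 deserves a closer look, starting from the servlet source code.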
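A very naive way to start is to scan the servlet source for `getParameter` calls that use a literal name. The sketch below is just that, a regular expression and not a Java parser: `request_parameters` is a made-up helper, and it will miss names built at runtime or read through `getParameterValues` and `getParameterMap`.

``` ruby
# Hypothetical helper: list the parameter names read through
# getParameter("...") calls with a literal string argument.
def request_parameters(java_source)
  java_source.scan(/\.getParameter\(\s*"([^"]+)"\s*\)/).flatten.uniq
end

# Made-up servlet fragment used only to exercise the helper.
java = <<-JAVA
  protected void doGet(HttpServletRequest request, HttpServletResponse response) {
    String user = request.getParameter("user");
    String page = request.getParameter( "page" );
    String again = request.getParameter("user");
  }
JAVA

p request_parameters(java)
# prints ["user", "page"]
```

The resulting names are good candidates for the first round of manual or automated fuzzing against the matching URL.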

Enjoy it!!!

Image courtesy of Alex Eylar