Overview

XPathFind is a utility that looks through directories and/or ZIP files searching for XML files that match a particular XPath expression. Regular expressions can also be used.

Getting Started

The INSTALL.txt file, provided in the distribution, describes how to get up and running with XPathFind.

Here are some examples of how it can be used.

  • xfind -e xml,zip -r --xpath "/a/b/c" --enc UTF8 resources
    Recursively search through the "resources" directory and all its subdirectories for "*.xml" or "*.zip" files that have a root element "a" that has a child "b" that has a child "c". Open all files with the UTF8 encoding.
  • xfind -x "//b/c[@attr='value']" -n iso8859-1 resources/tests/no-transforms/*.xml
    Look at all XML files in the "resources/tests/no-transforms" directory for one with an element "b" (not necessarily the root) with a child element "c" that has an attribute called "attr" with value "value". Don't recursively search directories (no -r)[1]. Open all files with the "ISO 8859-1" encoding.
  • xfind -t resources/config/transform-config.xml -e edi -r resources/*
          -x "/paores/body/header/travel[@sourceAirport='JFK']"
    Look at all "edi" files in the "resources" directory (recursively) and perform the required transforms from the transform-config.xml file to convert those "edi" files into a set of JDOM documents. Then look for a root element "paores" that has a child "body" that has a child "header" with a child "travel" that has an attribute "sourceAirport" with a value of "JFK". Then print out the matches found.
  • xfind --regex "<c.*Content" resources/*
    Recursively search[1] the resources directory for files with the default extensions[2]. Find all files that match the regular expression "<c.*Content"[4].
  • xfind -g "(?i)<c.*content" resources/*
    Recursively search[1] the resources directory for files with the default extensions[2]. Find all files that match the JDK 1.4 regular expression "(?i)<c.*content"[4]. The first 4 characters in the regular expression are called an "embedded flag expression"; placing it at the beginning of the pattern makes it apply to the whole expression. These characters indicate that a case-insensitive match is required.
  • xfind -n utf8 build.xml -g "(?im)project[^$]*$[^$]*cevans[^$]*$[^$]*desc"
    Look at only the file build.xml, loading it with the UTF8 encoding. Look for a multiline, case-insensitive regular expression. Find "project", followed on a later line by "cevans", followed on a still later line by "desc". Because "[^$]" also matches line breaks, the lines do not have to be adjacent, but no "$" character may appear between the three words.
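
The two regular expression examples above can be tried out directly against the JDK's java.util.regex classes. The following is only a minimal sketch: the class name, the helper method, and the sample text are hypothetical, and it assumes that a file counts as a match when the pattern is found anywhere in its content, which may not be exactly how XPathFind applies the pattern.

```java
import java.util.regex.Pattern;

class RegexFlagSketch {
    // Assumption: "match" means the pattern occurs somewhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Hypothetical file content, not taken from the distribution.
        String upper = "<C id=\"1\">CONTENT</C>";
        String build = "<project name=\"demo\">\n"
                     + "  <property name=\"author\" value=\"cevans\"/>\n"
                     + "  <description>desc</description>\n"
                     + "</project>";

        // Without (?i) the lowercase "<c" does not match "<C".
        System.out.println(found("<c.*Content", upper));
        // (?i) switches on case-insensitive matching for the whole pattern.
        System.out.println(found("(?i)<c.*content", upper));
        // (?im) adds multiline mode, so "$" also matches at each line end.
        System.out.println(found(
            "(?im)project[^$]*$[^$]*cevans[^$]*$[^$]*desc", build));
    }
}
```

Running a pattern through a sketch like this before passing it to -g or --regex is a quick way to check quoting and flag placement.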

Important notes:
  1. Using path/* by itself as a filename will do a recursive search, but *.xml or path/*.xml will not.
  2. The configuration file has a list of extensions. File globs which refer to specific filetypes (*.xml) will override these extensions. The feature is provided so that you can just provide a directory name and ask for recursion, and have XPathFind figure it out.
  3. On Windows, either slash or backslash can be used for file paths. On UNIX, only slash is allowed.
  4. XPathFind uses JDK 1.4 regular expressions (java.util.regex), so consult a guide dedicated to them, such as Sun's guide or an independent guide.
  5. Regular expressions can be used alongside XPath expressions in the same command line execution; however, the output does not indicate which of them matched. If you need to know which one matched, run them separately.

Two tutorials and examples on XPath are at W3Schools and zvon.org. More advanced XPath is discussed at xml.com. Any XPath 1.0 expression is supported, as long as it returns zero or more elements. XPath expressions returning other types (for example a number, string, or boolean) will cause XPathFind to fail.
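
To check in advance whether an expression returns elements, it can be evaluated against a small sample document. The sketch below uses the JDK's javax.xml.xpath API as a stand-in; XPathFind itself works on JDOM documents, so this is not its implementation. The class name, the method, and the sample XML are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathElementSketch {
    // Returns the number of matched elements, or -1 if the expression
    // does not evaluate to a node-set consisting only of elements.
    static int countElements(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                if (nodes.item(i).getNodeType() != Node.ELEMENT_NODE) {
                    return -1; // e.g. attribute or text nodes
                }
            }
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            return -1; // evaluation failed, e.g. not convertible to a node-set
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        String xml = "<a><b><c attr=\"value\"/><c attr=\"other\"/></b></a>";
        System.out.println(countElements(xml, "//b/c[@attr='value']"));
        System.out.println(countElements(xml, "/a/b/c"));
        System.out.println(countElements(xml, "//b/c/@attr"));
        System.out.println(countElements(xml, "count(//c)"));
    }
}
```

An expression for which this sketch reports -1 selects something other than elements and is a likely candidate for the failure described above.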