It turns out that we were applying the "universal" filter to the content we were indexing. This meant that Verity would examine the content (or the first few K of it) to try to identify the document type. That takes time, and in our case it was also doing the wrong thing: it would see free-form text containing an XML declaration and pass it off to the XML parser, which would choke, and the record would not get indexed. You can see this result by using the new STATUS attribute to assign the indexing status to a variable and dumping that variable after indexing.
Anyway, here is the fix. Find the file verity/Data/stylesets/ColdFusionK2/style.dft and comment out the /filter = universal line as follows:
# VDK will attempt to autorecognize BODY content if we use this
You can either recreate your collections after making this change, or you can shut down the CFMX Search Service and edit the style.dft file directly in each existing collection that you use to index custom data.
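If you have more than a couple of collections to touch, the edit can be scripted. What follows is only a rough sketch in Python, not anything that ships with ColdFusion or Verity. It assumes the line looks like /filter = universal (with or without quotes around the value) and that # starts a comment in style.dft. The function names are made up for illustration. Back up each file first, and run it only while the Search Service is stopped.

```python
import re
from pathlib import Path

# Matches an active (not yet commented) filter line that names "universal".
# The exact spelling of this line in style.dft is an assumption.
FILTER_LINE = re.compile(r'^\s*/filter\s*=\s*"?universal"?', re.IGNORECASE)


def comment_out_universal_filter(text: str) -> str:
    """Return text with any active universal-filter line prefixed by '# '."""
    out = []
    for line in text.splitlines(keepends=True):
        if FILTER_LINE.match(line):
            line = "# " + line
        out.append(line)
    return "".join(out)


def patch_style_file(path: Path) -> bool:
    """Rewrite one style.dft in place; return True if the file changed."""
    original = path.read_text()
    patched = comment_out_universal_filter(original)
    if patched != original:
        path.write_text(patched)
        return True
    return False
```

You would call patch_style_file(Path(...)) once for each style.dft you want to change. Lines that are already commented out are left alone, so running it twice does no harm.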