For the most part, our Identify Control files are based simply on number of pages; we don't scrape any data from the PDF. I would think the Identify step should be able to create the necessary dpf/jdf files without even reading the PDF, which in some cases could save a lot of time in the workflow on large PDFs. For instance, we have a 350,000-page PDF that was going to take 10+ hours to "Identify", and that is with the newer PDF processing library. On the same note, you could theoretically have a job property that it uses instead of a control file altogether: Job.PagesPerDocument or something, then just create the dpf using math.
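
To show what I mean by "using math", here is a rough Python sketch of the page-range calculation. It is only an illustration, not how the Identify step actually works: `split_by_page_count` and `DocumentRange` are names I made up, it assumes the total page count is already known (say, from a job property) so the PDF never has to be opened, and it does not try to reproduce the real dpf format.

```python
from typing import Iterator, NamedTuple


class DocumentRange(NamedTuple):
    """One document's position in the job (hypothetical structure)."""
    sequence: int    # 1-based document number within the job
    first_page: int  # 1-based, inclusive
    last_page: int   # 1-based, inclusive


def split_by_page_count(total_pages: int, pages_per_document: int) -> Iterator[DocumentRange]:
    """Yield page ranges for fixed-size documents without reading the PDF.

    Assumes total_pages is already known. If the total is not an exact
    multiple of pages_per_document, the last document is shorter.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")

    # Step through the first page of each document: 1, 1 + n, 1 + 2n, ...
    first_pages = range(1, total_pages + 1, pages_per_document)
    for sequence, first_page in enumerate(first_pages, start=1):
        last_page = min(first_page + pages_per_document - 1, total_pages)
        yield DocumentRange(sequence, first_page, last_page)
```

For a 10-page job with 4 pages per document, this yields ranges 1-4, 5-8, and 9-10. Since it is a generator that only does arithmetic, a 350,000-page job costs one loop iteration per document rather than a pass over the PDF itself.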