This week we met with a department on campus that needed a way to manipulate and store data from PDF documents. The scenario was that many forms were downloaded from a secure website as one large file. These documents needed to be split, read, and sorted based on criteria in the documents themselves. This was my challenge beginning Thursday afternoon.
I was soon reading up on the officially supported solution – the Adobe JavaScript Object, or JSO. This is a PDF API that lets you write JavaScript that makes calls to this object. Within a few hours I was able to step through a 144-page document, read sections of each form, store this information in an array, and move the forms to an appropriate directory based on what information they did or did not contain.
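The page-by-page pass described above might look roughly like the sketch below. This is not the script I ran, just a minimal illustration. It assumes the `doc` argument behaves like an Acrobat document object (`numPages`, `getPageNumWords`, and `getPageNthWord` are the calls it relies on), and the `rules` array, the keywords, and the folder names are all hypothetical.

```javascript
// Join the words Acrobat reports for one page into a single string.
// getPageNumWords(page) and getPageNthWord(page, index) are methods of the
// Acrobat document object; pages and words are zero-indexed.
function pageText(doc, page) {
  var words = [];
  var count = doc.getPageNumWords(page);
  for (var i = 0; i < count; i++) {
    words.push(doc.getPageNthWord(page, i));
  }
  return words.join(" ");
}

// Decide which folder each page belongs in.
// rules is a hypothetical list such as [{ keyword: "transfer", folder: "transfer" }].
// Rules are checked in order and the first matching keyword wins; pages that
// match no rule go to fallbackFolder.
function classifyPages(doc, rules, fallbackFolder) {
  var results = [];
  for (var p = 0; p < doc.numPages; p++) {
    var text = pageText(doc, p).toLowerCase();
    var folder = fallbackFolder;
    for (var r = 0; r < rules.length; r++) {
      if (text.indexOf(rules[r].keyword.toLowerCase()) !== -1) {
        folder = rules[r].folder;
        break;
      }
    }
    results.push({ page: p, folder: folder });
  }
  return results;
}

// Inside Acrobat, the results could then drive the split, for example:
//   var plan = classifyPages(this, rules, "unsorted");
//   for (var i = 0; i < plan.length; i++) {
//     this.extractPages({ nStart: plan[i].page, nEnd: plan[i].page,
//                         cPath: plan[i].folder + "/page" + plan[i].page + ".pdf" });
//   }
// The cPath values here are made up, and writing files this way may need to
// run from a trusted context such as a batch sequence.
```

Keeping the classification separate from the file writing means the sorting rules can be tried against a stand-in object outside Acrobat before anything touches disk.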
I was very impressed with the JSO. If you can write JavaScript (whose syntax is close to PHP's), you can read from a PDF. It is very well documented, and all readers come with the console – just press “Control + J” to bring it up.
Future plans for this script are to connect to Banner, call a PL/SQL procedure, and move the data it returns, along with the info from the PDF, into an import table in Recruitment Plus. Once this is complete, we will effectively save this department a couple of thousand dollars a month by reducing their overhead.
The funny thing is you hated programming when we were roommates. 😛