
extracting text


mikey2k


Hello guys. I am in need of some help. I have a PDF file that contains 100 tests (they are divided into 3 parts: A, B and C). How can I create another PDF that contains only the A part from each test? I found some tutorials with Python, but the thing is I don't really know how to use it. 
Thank you in advance :) I'm sorry for the grammatical mistakes :) #stillLearning


A PDF can contain both text and images. Start by making sure your text is really text and not an image: some people scan paper documents, and the scanner output (images) is put as-is inside the PDF document. Handling images requires OCR, which is a completely different procedure.
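A quick way to check this could look like the sketch below. It assumes the third-party pypdf package (pip install pypdf); the helper names and the 20-character threshold are made up for illustration. Pages for which almost no text comes back are probably scanned images and would need OCR.

```python
def pages_without_text(page_texts, min_chars=20):
    """Return indexes of pages whose extracted text is (almost) empty.

    min_chars is an arbitrary, assumed threshold; tune it for your file.
    """
    return [
        index
        for index, text in enumerate(page_texts)
        if len((text or "").strip()) < min_chars
    ]


def find_image_only_pages(path):
    """Hypothetical helper: list pages of a PDF that have no usable text."""
    from pypdf import PdfReader  # third-party, imported only when needed

    reader = PdfReader(path)
    texts = [page.extract_text() for page in reader.pages]
    return pages_without_text(texts)
```

Calling find_image_only_pages("tests.pdf") (the file name is just a placeholder) returns a list of zero-based page numbers. An empty list suggests every page has a real text layer.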

Also, the text can be laid out in several columns.

An attempt to OCR such a page can result in a couple of words from each column being mixed together in every row!
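To illustrate the column problem, here is a small sketch in plain Python. It assumes you already have word positions as (x, y, text) tuples with y growing down the page, and that you know the x position where the second column starts. Both are assumptions: the tuple format and the function names are invented for this example, and real tools report positions in their own formats.

```python
def naive_order(words):
    """Read strictly row by row, which mixes the columns together."""
    ordered = sorted(words, key=lambda w: (w[1], w[0]))
    return " ".join(text for _, _, text in ordered)


def column_order(words, column_split_x):
    """Read the whole left column first, then the whole right column."""
    left = [w for w in words if w[0] < column_split_x]
    right = [w for w in words if w[0] >= column_split_x]
    by_row = lambda w: (w[1], w[0])
    ordered = sorted(left, key=by_row) + sorted(right, key=by_row)
    return " ".join(text for _, _, text in ordered)


# Made-up sample: two columns, two rows each.
words = [(10, 0, "Left"), (300, 0, "Right"), (10, 20, "one"), (300, 20, "two")]
print(naive_order(words))        # Left Right one two
print(column_order(words, 200))  # Left one Right two
```

This only handles two clean columns with a known split point. Real pages are messier.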

Find an example here and copy and paste it as a starting point:

https://www.google.com/search?q=python+extract+text+from+pdf
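For the original question (keeping only part A of each test), a starting point might look like the sketch below. It relies on assumptions that need checking against the real file: every part starts on a new page, the page begins that part with a heading such as "Part A", "Part B" or "Part C", and the third-party pypdf package is installed (pip install pypdf). The function names and the heading pattern are invented for this example.

```python
import re

# Assumption: parts are introduced by a line starting with "Part A/B/C".
# Change this pattern to whatever your PDF really uses.
PART_HEADING = re.compile(r"^\s*Part\s+([ABC])\b", re.IGNORECASE | re.MULTILINE)


def pages_in_part(page_texts, wanted="A"):
    """Return indexes of the pages that belong to the wanted part.

    A page without a heading counts as a continuation of the part
    that was open on the previous page.
    """
    wanted = wanted.upper()
    selected = []
    current = None
    for index, text in enumerate(page_texts):
        match = PART_HEADING.search(text or "")
        if match:
            current = match.group(1).upper()
        if current == wanted:
            selected.append(index)
    return selected


def extract_part(src_path, dst_path, wanted="A"):
    """Copy the pages of one part from src_path into a new PDF."""
    from pypdf import PdfReader, PdfWriter  # third-party, imported only when needed

    reader = PdfReader(src_path)
    texts = [page.extract_text() or "" for page in reader.pages]
    writer = PdfWriter()
    for index in pages_in_part(texts, wanted):
        writer.add_page(reader.pages[index])
    writer.write(dst_path)
```

Usage would be something like extract_part("tests.pdf", "part_a_only.pdf"), where both file names are placeholders. This works on whole pages only, so if part A and part B share a page, that page needs to be handled differently.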

4 hours ago, iNow said:

Can you copy/paste the A parts from each PDF into a single Word document, then Save As and change file type to PDF from Word?

This is what an ordinary layman would do. Programmers write scripts that automatically extract the needed data. Manual extraction of data from thousands of files would take months or years of work. In some less computerised countries and companies, people still work that way in offices. That's bizarre. It results in wasted human resources and an ineffective, unproductive company, office or government that is unable to compete with the real world, where such jobs are done by programmers.

Programmers wanting to extract data from documents have different problems than the amount of information: damaged character encodings (this doesn't bother programmers in the UK, US, Australia and Canada much, but it does bother the rest of the world), text in scanned images, text in columns, letters incorrectly recognised by OCR, etc.

Edited by Sensei

