This project is a Python-based solution for extracting text from PDF files, preprocessing the text, vectorizing it using Cohere embeddings, and storing the vectors in Pinecone for further use. PyMuPDF ...
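The pipeline described above (extract, preprocess, vectorize, store) might be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `extract_text`, `preprocess`, `chunk`, and `build_records`, the 200-word chunk size, and the record layout are all assumptions. The embedding step is passed in as a callable so the sketch does not depend on a particular Cohere client version.

```python
import re
from typing import Callable, Dict, List, Sequence


def extract_text(pdf_path: str) -> str:
    """Return the plain text of every page, joined by newlines."""
    # PyMuPDF is imported lazily so the pure-Python helpers below
    # can be used (and tested) without it installed.
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def preprocess(text: str) -> str:
    """Rejoin words hyphenated across line breaks, then collapse whitespace."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    return re.sub(r"\s+", " ", text).strip()


def chunk(text: str, max_words: int = 200) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def build_records(
    chunks: Sequence[str],
    embed: Callable[[Sequence[str]], Sequence[Sequence[float]]],
    doc_id: str,
) -> List[Dict]:
    """Pair each chunk with its embedding in an id/values/metadata dict.

    `embed` is a hypothetical callable: it takes a list of strings and
    returns one vector per string, in the same order.
    """
    vectors = embed(chunks)
    return [
        {"id": f"{doc_id}-{i}", "values": list(vec), "metadata": {"text": text}}
        for i, (text, vec) in enumerate(zip(chunks, vectors))
    ]
```

In practice, `embed` would presumably wrap the Cohere client's embed call and return its embeddings, and the resulting records would be passed to a Pinecone index's `upsert`. The exact model name, index name, and API keys depend on your setup and are not specified here.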
Also, this method has trouble converting certain types of text in PDFs to DXF. It works mainly for polys and other vector geometry, e.g. drawings that were originally created as CAD or SVG and then saved as PDF.