How to Extract Data from Tender Documents in Minutes with AI
The Problem: Time-Consuming Tender Review
Procurement teams and businesses face a major challenge when dealing with large volumes of tender documents—examining hundreds of pages to find crucial information like turnover criteria, bidder qualifications, or specific contract terms. This process is not only time-consuming but also prone to human error, leading to missed opportunities or incorrect bidding decisions. With tight deadlines and increasing competition, you need a faster and more accurate way to extract and analyze essential details from tender documents. This will free you to focus on making strategic decisions rather than getting lost in paperwork.
One of our customers faced exactly this challenge: searching for management consulting tenders with specific turnover requirements hidden deep in lengthy Notice Inviting Tender (NIT) documents. The documents issued by Indian authorities often extend beyond 300 pages and come with other hefty documents like Bills of Quantities (BOQ), specifications, and make lists. The customer was spending countless hours identifying and reading through these documents just to find tenders that fit the bill.
This is a common problem faced by procurement teams across various industries. So, how can you efficiently extract important information from tender documents? The answer lies in AI-powered tender data extraction.
Our Solution: AI for Tender Data Extraction
At Nexizo, we built a system to tackle this problem head-on. With our extensive database of over 5 crore (50 million) tenders and the addition of 20,000 tenders daily, we could quickly identify the relevant management consulting tenders.
To take it further, we developed an AI solution that combines advanced search filters with the Retrieval-Augmented Generation (RAG) technique. This system can sift through vast tender documents and automatically extract key criteria like turnover requirements.
Fig 1: Pictorial representation of RAG applied to the Turnover and QCBS extraction task. Hat tip - gradient.ai
How We Do It: Breaking Down the Process
Choosing the Right Embedding Model
To create the embeddings and ensure the highest accuracy, we experimented with various models, including OpenAI and Instructor.
Both embedding models were useful, with one subtle difference. The OpenAI embeddings led to “hallucination” in the generated answers (possibly because the model is very general-purpose), but they could capture values even from incorrect or ambiguous English. The Instructor model, on the other hand, let us specify the domain and was strict about English, so it avoided hallucination. The trade-off was that it could not handle poor English.
Given that our target was to have zero false positives, we chose the Instructor model. Then, using the Instructor model, we performed a vector search to retrieve all references to turnover or revenue criteria.
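To make the vector search step concrete, here is a minimal sketch of top-K retrieval by cosine similarity. It is an illustration, not our production pipeline: the `embed` function is a hypothetical bag-of-words stand-in for a real embedding model such as Instructor, and the sample chunks are invented.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical stand-in: bag-of-words counts, NOT the Instructor model.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    # Score every chunk against the query and keep the k best.
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]

# Invented sample chunks, for illustration only.
chunks = [
    "The bidder shall have a minimum average annual turnover of Rs. 5 crore.",
    "Bids must be submitted online before the due date.",
    "Annual turnover certificates must be signed by a chartered accountant.",
]
results = top_k("minimum annual turnover of the bidder", chunks, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

With a real embedding model, only `embed` would change; the ranking logic stays the same.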
This AI-driven method means businesses no longer have to manually search through pages to find crucial information like revenue criteria or Quality cum Cost Basis score (QCBS).
Optimized Document Chunking
Tenders can come in a variety of formats and from different regions. To capture the correct information, we break these large documents into chunks for more accurate data extraction. We applied two chunking strategies:
- Longer chunks with shorter overlaps to reduce redundant retrieval.
- Shorter chunks with higher overlaps to ensure contextual information wasn’t missed, especially for data near sentence boundaries or in tables.
The vector search enabled by this chunking strategy helps capture every relevant detail, from turnover requirements to detailed terms of the tender.
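The two strategies above can be sketched as a single word-window chunker with different settings. This is a simplified illustration: the function name `chunk_words`, the word-based windows, and the specific sizes are assumptions, not the values we used.

```python
def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    # Slide a window of `size` words, stepping by `size - overlap` each time.
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# 20 placeholder words stand in for a tender document.
text = " ".join(f"w{i}" for i in range(1, 21))

# Strategy 1: longer chunks, shorter overlap (less redundant retrieval).
long_chunks = chunk_words(text, size=10, overlap=2)

# Strategy 2: shorter chunks, higher overlap (more context around boundaries).
short_chunks = chunk_words(text, size=5, overlap=3)

print(len(long_chunks), len(short_chunks))
```

The higher-overlap setting produces many more chunks for the same text, which is the cost paid for not losing values that sit on a chunk boundary.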
Prompting Strategy
When searching for specific data like turnover values, we customized our retrieval and prompting strategy to avoid missing key details. We used top-K vector search to find the chunks of text most relevant to the query. However, because some chunks were repeated or overlapped, we had to find the right value of K. If it was too high, GPT-4 could get overloaded with too much information.
Further, when searching for turnover data, we noticed turnovers appeared in different parts of the document. So, we changed our search prompt to ask for turnovers specifically in lakhs or crores (1 lakh = 100,000; 1 crore = 10 million). After testing, we settled on a top-K value that gave us accurate results without errors or extra details.
This approach allowed us to extract only the most relevant information from the document.
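As a rough sketch of how repeated chunks and the value of K interact, the snippet below drops near-duplicate chunks before building a prompt from the top K. The Jaccard threshold, the function names, and the prompt wording are all assumptions made for illustration; they are not our actual prompt.

```python
def jaccard(a: str, b: str) -> float:
    # Word-set overlap between two chunks, from 0.0 to 1.0.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def dedupe(chunks: list[str], threshold: float = 0.8) -> list[str]:
    # Keep a chunk only if it is not a near-duplicate of one already kept.
    kept = []
    for chunk in chunks:
        if all(jaccard(chunk, k) < threshold for k in kept):
            kept.append(chunk)
    return kept

def build_prompt(ranked_chunks: list[str], k: int) -> str:
    # Hypothetical prompt wording; it asks for amounts in lakhs or crores.
    context = "\n---\n".join(dedupe(ranked_chunks)[:k])
    return (
        "Using only the excerpts below, state the minimum turnover required "
        "of the bidder, expressed in lakhs or crores. "
        "If no turnover is stated, answer 'not found'.\n\n" + context
    )

# Invented chunks, already ranked by similarity to the query.
ranked = [
    "Average annual turnover of the bidder shall be at least Rs. 50 lakhs.",
    "average annual turnover of the bidder shall be at least Rs. 50 lakhs.",
    "The bid security amount is Rs. 2 lakhs.",
    "Bids must be submitted online.",
]
prompt = build_prompt(ranked, k=2)
print(prompt)
```

Deduplicating first means a small K is not wasted on copies of the same passage.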
Reducing Review Time
One of the major benefits of using our AI solution for tender document analysis is the reduction in manual review time.
Some RAG-generated answers lacked references, potentially leading to misleading information. To address this, we modified the pipeline to extract reference texts directly from the documents, avoiding any ambiguity.
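One simple way to attach a reference to each extracted value is to return the exact sentence it came from. The sketch below uses a regular expression rather than an LLM, so it illustrates the idea of quoting the source and is not the pipeline itself; the pattern, the function name `find_references`, and the sample text are assumptions.

```python
import re

# Matches amounts such as "5 crore", "2.5 lakhs", "50 Lakh".
AMOUNT = re.compile(r"(\d+(?:\.\d+)?)\s*(lakhs?|crores?)", re.IGNORECASE)
UNIT_IN_RUPEES = {"lakh": 100_000, "crore": 10_000_000}

def find_references(chunk: str) -> list[dict]:
    refs = []
    # Split after '.' or ';' only when a capital letter follows,
    # so that "Rs. 5 crore" is not cut in the middle.
    for sentence in re.split(r"(?<=[.;])\s+(?=[A-Z])", chunk):
        if "turnover" not in sentence.lower():
            continue
        for number, unit in AMOUNT.findall(sentence):
            unit_key = unit.lower().rstrip("s")
            refs.append({
                "rupees": float(number) * UNIT_IN_RUPEES[unit_key],
                "quote": sentence.strip(),  # verbatim text from the document
            })
    return refs

# Invented sample chunk.
sample = (
    "The bidder must submit an EMD of Rs. 2 lakh. "
    "Minimum average annual turnover shall be Rs. 5 crore in the last three years. "
    "Experience of similar work is required."
)
refs = find_references(sample)
print(refs)
```

Because the quote is copied verbatim from the chunk, a reviewer can check the value against the document without rereading the whole tender.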
Given that QCBS and turnover values had distinct characteristics in documents, with QCBS often found in lengthy sentences and turnovers in concise phrases, we optimized the review process for turnovers.
We extended the RAG pipeline by adding two retrieval layers. First, we retrieved chunks similar to those retrieved for QCBS. Then, we divided these retrieved chunks into smaller ones and ran a second retrieval (using the same prompt) to obtain shorter chunks.
Finally, we used GPT-4 to generate the desired turnover values, streamlining the review process.
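The two retrieval layers can be sketched as follows. This is an illustration under stated assumptions: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the second-stage split is done by sentence (the split method is an assumption), and the final GPT-4 generation step is left out.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical stand-in: bag-of-words counts, not a real embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, chunks: list[str], k: int) -> list[str]:
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def split_sentences(chunk: str) -> list[str]:
    # Split after '.' or ';' only when a capital letter follows ("Rs. 5" stays intact).
    return re.split(r"(?<=[.;])\s+(?=[A-Z])", chunk)

def two_stage_retrieve(query: str, long_chunks: list[str], k1: int, k2: int) -> list[str]:
    # Layer 1: retrieve the best long chunks.
    first_pass = top_k(query, long_chunks, k1)
    # Layer 2: split them into shorter pieces and retrieve again with the same query.
    short_chunks = [s for chunk in first_pass for s in split_sentences(chunk)]
    return top_k(query, short_chunks, k2)

# Invented long chunks.
long_chunks = [
    "Selection will follow the quality and cost based method with technical "
    "weightage of seventy percent. The bidder shall have minimum average annual "
    "turnover of Rs. 5 crore during the last three financial years.",
    "Bids must be submitted online through the portal before the due date and "
    "late bids will be rejected.",
]
short = two_stage_retrieve("minimum annual turnover of the bidder", long_chunks, k1=1, k2=1)
print(short)
```

The second pass hands the reviewer (or the model) a single short sentence instead of a full paragraph, which is where the saving in review time comes from.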
The Outcome: Accurate, Fast and Efficient Tender Review
Our approach was a hit!
Using our AI-powered tender data extraction solution, we managed to accurately identify the turnover criteria in 89% of the documents we processed, without any error.
The remaining 11%? They were quickly handled with a manual review, thanks to the AI pointing us right to the necessary parts of the documents.
This not only saved a lot of time for our customer but also ensured that they could easily find tenders that matched their requirements.
Further, we applied the same strategy to dig out other important details like the Quality cum Cost Basis score (QCBS), which is crucial for deciding whether to bid on a tender.
What You Can Do with Nexizo’s AI-Powered Tender Data Extraction Solution
You start by telling us your specific requirements—whether it’s finding consulting opportunities in a specific sector, filtering tenders by project type, or identifying certain financial criteria like turnover.
Once we understand your needs, we set up custom filters to target the exact types of tenders or profiles you are looking for.
Next, we scan through our vast database of over 5 crore tenders from 200+ trusted sources, pulling only the documents that match your criteria.
We also help streamline the process by filtering out irrelevant opportunities. For example, if you need consulting tenders related to Detailed Project Reports (DPR) but want to exclude anything related to construction, we will set up negative keywords to automatically remove those entries.
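A negative-keyword filter of this kind can be sketched in a few lines. The function name `matches`, the keyword lists, and the tender titles below are made up for illustration and do not reflect our actual filter configuration.

```python
def matches(title: str, include: list[str], exclude: list[str]) -> bool:
    # Keep a tender only if it has an include term and no exclude term.
    text = title.lower()
    has_include = any(term.lower() in text for term in include)
    has_exclude = any(term.lower() in text for term in exclude)
    return has_include and not has_exclude

# Invented tender titles.
tenders = [
    "Consultancy for preparation of Detailed Project Report for water supply",
    "Construction of road including Detailed Project Report",
    "Supply of office stationery",
]
feed = [
    t for t in tenders
    if matches(t, include=["detailed project report", "dpr"], exclude=["construction"])
]
print(feed)
```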
But we don’t stop there. Our AI enriches the extracted data with key details like deadlines, application links, bidder qualifications, BOQ items, and stage-wise results. This way, you have all the information at your fingertips.
Finally, we deliver a custom feed of relevant, high-priority opportunities directly to you, ensuring you never miss a valuable tender.
Next Steps: Expanding Our AI Capabilities
We are continuously improving our solution. Our next steps include:
- Training the AI on a wider variety of documents and procurement languages to capture regional variations in tender formats.
- Expanding the scope of data extraction, allowing the AI to pull out more types of information from tender documents, such as contract terms and bidder qualifications.
This way, we can offer a more comprehensive analysis of each tender to our users, simplifying the decision-making process even further. Our approach ensures that businesses and procurement teams can focus on bidding decisions rather than document review.
One thing is for sure: AI is changing the game in tender data extraction by making it easier and faster to find exactly what you are looking for. The best thing we all can do is learn how to leverage it effectively to grow our businesses.