In today’s digital age, information is abundant, and the ability to efficiently extract
                    knowledge from vast amounts of text has become increasingly valuable. Traditional methods of manual
                    extraction
                    and analysis can be time-consuming and labor-intensive. However, with the advancements in Generative
                    AI, we
                    now have powerful tools that can automate the process of reading and understanding text, opening up
                    new
                    possibilities for knowledge extraction and retrieval.
                
                    In this blog, we will explore how Generative AI can be leveraged to read a PDF and create a custom
                    knowledge
                    base that allows us to query its contents. We will delve into the fascinating world of natural
                    language
                    processing and machine learning to demonstrate how these technologies can be applied to one of
                        our
                        favourite books, “The Book Thief” by Markus Zusak. 
                    “The Book Thief” is a captivating novel set during World War II, narrated by Death itself
                        (ikr). It
                    tells the story of Liesel Meminger, a young girl living in Germany, and her extraordinary journey
                    through the
                    power of words and storytelling. This rich and compelling narrative will serve as the foundation
                        for our
                        exploration of Generative AI and its application in analyzing textual content.
                
                    So, let’s embark on this journey into the realm of Generative AI and discover how it can
                    revolutionize the way
                    we read, analyze, and interact with PDFs, ultimately unraveling the profound beauty of “The Book
                    Thief” and
                    other literary works.
                
Prerequisite: Setting Up the Working Environment
Before we embark on our journey, let’s ensure we have the necessary tools to build a
                    queryable knowledge base with Generative AI.
                    1. Create a Virtual Environment: We start by isolating our project dependencies in a virtual
                    environment using the command:
                
 
 python -m venv venv
     
2. Activate the Virtual Environment: Depending on the operating system, use the
                    appropriate command to activate the virtual environment.
 
 For Windows: .\venv\Scripts\activate
 For macOS/Linux: source venv/bin/activate
     
3. Install Required Packages: With the virtual environment active, install the
                    necessary packages using the command:
 
 pip install openai langchain tiktoken pinecone-client pypdf
     
                    With these steps completed, our working environment is set up and ready to explore the power of
                    Generative
                    AI for querying PDFs. Finally, define the API keys used in the rest of the walkthrough:
                
 
 OPENAI_API_KEY = '<ADD YOUR OPENAI API KEY HERE😛>'
 PINECONE_API_KEY = '<ADD YOUR PINECONE API KEY HERE😛>'
 PINECONE_API_ENV  = '<ADD YOUR API ENV HERE😛>'  # for example 'us-west4-gcp-free'
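Hardcoding keys in a script makes them easy to leak, so one common alternative is to read them from environment variables. The sketch below is only an illustration: the helper name `read_key` is hypothetical, and it assumes the three variables have already been exported in your shell.

```python
import os


def read_key(name: str) -> str:
    """Return the value of the environment variable `name`, or fail loudly."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable first")
    return value


# Usage, once the variables are exported in your shell:
# OPENAI_API_KEY = read_key("OPENAI_API_KEY")
# PINECONE_API_KEY = read_key("PINECONE_API_KEY")
# PINECONE_API_ENV = read_key("PINECONE_API_ENV")
```

Failing early with a clear message is usually friendlier than an authentication error several steps later.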
           
1. Loading the PDF:
To begin, we need to load the PDF into our system. We can use the PyPDFLoader module from the
                    LangChain library to accomplish this. Here’s an example of how to load the PDF:
 
 from langchain.document_loaders import PyPDFLoader
 # Load the book
 loader = PyPDFLoader("The-Book-Thief.pdf")
 pages = loader.load()
         
2. Splitting the Text:
                    PyPDFLoader returns the book as a list of documents, one per page. To control how much text
                    goes into each embedding and query, we split these pages into chunks with a fixed maximum size.
                    The LangChain library provides a RecursiveCharacterTextSplitter that can handle this
                    task. We can define the chunk size and the overlap between neighbouring chunks based on our
                    requirements; the overlap keeps a sentence that straddles a boundary visible in both chunks.
                    Here’s an example of splitting the book into smaller texts:
                
 
 from langchain.text_splitter import RecursiveCharacterTextSplitter
 text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n",
 "\t"], chunk_size=5000, chunk_overlap=200)
 texts = text_splitter.split_documents(pages)
 num_documents = len(texts)
 print(f"Now our book is split up into {num_documents} documents")
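To see what chunk size and overlap mean in practice, here is a deliberately simplified sketch. It is not how RecursiveCharacterTextSplitter works internally (that class also tries to break on the separators); the function `split_with_overlap` is a hypothetical helper that cuts at fixed character offsets only.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that share
    `chunk_overlap` characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, which is the role the 200-character overlap plays for the book.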
         
3. Generating Embeddings:
To perform efficient queries on the text, we need to generate embeddings for each chunk.
                    Embeddings capture the semantic meaning of the text, enabling us to find similar or relevant chunks
                    efficiently. We can use the OpenAIEmbeddings module from LangChain to generate embeddings. Here’s an
                    example:
                
 
 from langchain.embeddings import OpenAIEmbeddings
 embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
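To make "semantic similarity" concrete, the sketch below compares vectors with cosine similarity, one common way of scoring how close two embeddings are. The three-dimensional vectors are invented for illustration; real embeddings have far more dimensions and come from the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy "embeddings", made up for this example only
death_narrates = [0.9, 0.1, 0.0]
narrator_is_death = [0.8, 0.2, 0.1]
bread_recipe = [0.0, 0.1, 0.9]

related = cosine_similarity(death_narrates, narrator_is_death)
unrelated = cosine_similarity(death_narrates, bread_recipe)
print(related > unrelated)  # True
```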
         
4. Creating a Knowledge Base Index:
To enable fast and accurate querying, we will create an index of our knowledge base using
                    Pinecone, a vector search engine. Pinecone allows us to store and retrieve embeddings efficiently.
                    Here’s an
                    example of how to create the index:
 
 from langchain.vectorstores import Pinecone
 import pinecone
 # Initialize Pinecone
 pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_API_ENV)
 index_name = "the-book-thief"
 # Embed each chunk and upload it to the index (the index must already exist in Pinecone)
 docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)
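Conceptually, a vector index stores each chunk next to its embedding and returns the chunks whose embeddings are closest to the query embedding. The class below is a toy in-memory stand-in written for this post, not Pinecone's API or implementation; `ToyVectorIndex` and the two-dimensional vectors are hypothetical.

```python
import math


class ToyVectorIndex:
    """Keeps (text, vector) pairs in a list and ranks them by cosine similarity."""

    def __init__(self):
        self._items = []

    def add(self, text, vector):
        self._items.append((text, vector))

    def query(self, vector, k=3):
        """Return the k stored texts most similar to `vector`."""
        scored = [(self._cosine(vector, v), t) for t, v in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norms if norms else 0.0


toy_index = ToyVectorIndex()
toy_index.add("Death narrates the story", [0.9, 0.1])
toy_index.add("Liesel steals a book", [0.1, 0.9])
toy_index.add("Death sees the book thief", [0.7, 0.3])

top_two = toy_index.query([1.0, 0.0], k=2)
print(top_two)  # ['Death narrates the story', 'Death sees the book thief']
```

A real vector database does the same job at scale, using approximate search so it does not have to compare the query against every stored vector.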
           
 
 5. Querying the Knowledge Base:
 
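 from langchain.chains import RetrievalQA
 from langchain.chat_models import ChatOpenAI
 
 # Connect to the existing index; "text" is the metadata field that holds each chunk
 index_name = "the-book-thief"
 text_field = "text"
 index = pinecone.Index(index_name)
 vectorstore = Pinecone(index, embeddings.embed_query, text_field)
 
 # Chat model that writes the answer from the retrieved chunks
 llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY)
 
 query = "Provide a rating and a review for this book"
 
 # Optional: inspect the three chunks most similar to the query
 docs = vectorstore.similarity_search(query, k=3)
 
 # chain_type="stuff" places all retrieved chunks into a single prompt
 qa = RetrievalQA.from_chain_type(
       llm=llm,
       chain_type="stuff",
       retriever=vectorstore.as_retriever()
 )
 
 output = qa.run(query)
 print(output)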
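The "stuff" chain type simply stuffs every retrieved chunk into one prompt and sends it to the model. The sketch below shows that idea with a made-up template; the wording is an assumption for illustration and is not LangChain's actual prompt, and `build_stuff_prompt` is a hypothetical helper.

```python
def build_stuff_prompt(question, chunks):
    """Join the retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


retrieved = [
    "I saw the book thief three times.",
    "Liesel Meminger arrived on Himmel Street.",
]
prompt = build_stuff_prompt("Who is the narrator of this book?", retrieved)
print(prompt)
```

Because the model only sees the chunks that were retrieved, an answer can miss facts that live in chunks the search did not return.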
           
 We threw a bunch of questions at the AI, some straightforward and others philosophical. And guess
                        what?
                        Here are the cool answers it came up with! 
Q1: Who is the author of The Book Thief?
The author of The Book Thief is Markus Zusak.
Q2: Who is the main character of this book?
The main character of this book is Liesel, a young German girl whose book-stealing and story-telling talents help sustain her family and the Jewish man they are hiding, as well as their neighbors.
Q3: Who is the narrator of this book?
The narrator of the book is Death.
Q4: How many times did Death see the book thief?
Death saw the book thief many times.
Not right though: page 8 of the book states, “I saw the book thief three times.”
Q5: Describe the scenarios when Death met the book thief
There are two scenarios mentioned in the given context when Death met the book thief.
                1. The first scenario is when Death came to take away Liesel’s brother and saw her for the first time.
                This happened at the beginning of the book when Liesel’s brother died on a train journey, and Death came to
                collect his soul. Death noticed Liesel and was struck by her reaction to her brother’s death. This was the first
                time Death saw Liesel, and he would see her again in the future.
                
                2. The second scenario is when Death came to take Liesel away at the end of the book. This happened when
                Liesel was an old woman, and Death came to collect her soul. Before taking her away, Death gave Liesel a
                book that he had been carrying with him for a long time. They sat down together, and Liesel read the book
                while Death watched her. After Liesel finished reading, Death took her away.
                
Not complete though: the plane crash incident is not mentioned.
Q6: What is the first book Liesel steals?
The context does not provide information about the first book Liesel steals. It only mentions the second stolen book, which is called “The Shoulder Shrug”.
Not right either: Liesel takes The Grave Digger’s Handbook from the snow after her brother’s burial.
Q7: Discuss Liesel’s friendship with Rudy. Does she love him in the way he loves her, or is it a child’s love?
                    Liesel’s friendship with Rudy is a significant part of the novel. They become friends due to their
                    shared experiences of poverty and hunger, and their friendship is cemented by their stealing. Liesel and Rudy’s
                    friendship is a deep and meaningful one, and they care for each other deeply. However, it is unclear
                    whether Liesel loves Rudy in the same way that he loves her. It is possible that Liesel has romantic feelings
                    for Rudy, but it is also possible that their relationship is more of a child’s love, based on their shared
                    experiences and mutual affection. Regardless of the nature of their love, their friendship is a crucial
                    part of the novel, and their bond is a testament to the power of human connection in the face of adversity.
                
Q8: Provide a brief summary of this book
                    The Book Thief is a novel by Markus Zusak that tells the story of Liesel Meminger, a young girl living in Nazi
                Germany during World War II. The book is narrated by Death, who tells the story of Liesel’s life as she grows up in a foster home and learns to read and steal books. Liesel’s love of books helps her to cope with the horrors of the war, and she eventually begins to share her books with others, including a Jewish man her
                foster family is hiding. The novel explores themes of love, loss, and the power of words.
                
Q9: What are the moral lessons of The Book Thief?
                    The Book Thief explores several moral lessons, including the power of words, the importance of human connection and empathy, the destructive nature of hate and prejudice, and the resilience of the human
                spirit in the face of adversity. It also highlights the consequences of blindly following authority and the importance of standing up for what is right, even in the face of danger. Ultimately, the novel
                emphasizes the importance of love, compassion, and hope in the face of tragedy and loss.
                
Q10: Provide a rating and a review for this book
                    As an AI language model, I cannot provide personal opinions or reviews on books or any other products. However, “The Book Thief” by Markus Zusak has received critical acclaim and has been widely praised for
                its unique narrative style and emotional depth. It has won several awards and has been a bestseller in many countries.
                
                Conclusion: The integration of Generative AI into the realm of PDF analysis and querying
                opens
                up exciting possibilities for automating knowledge extraction from textual content. By leveraging tools
                such
                as LangChain, Pinecone, and the power of models like ChatOpenAI, we have demonstrated how “The Book
                Thief” by
                Markus Zusak can be transformed into a custom knowledge base that enables efficient and insightful
                querying.
                The combination of natural language processing, machine learning, and advanced search technologies
                empowers us
                to unlock the hidden gems within PDFs and gain a deeper understanding of the written word.
            
Albert Einstein once said, “The human spirit must prevail over technology”.
Pick up a book, embrace the crinkle of paper, and let your imagination run wild. Because sometimes, the
best AI is your own literary adventure!
