BluetickEbooks

Akash

Engineering Lead


eBooks have become a convenient way to access a vast library of literature, but reading is no longer the only thing we can do with them. With advances in natural language processing and generative AI, we can analyze an eBook's contents and query them using tools like LangChain. In this blog post, we walk through loading an eBook, extracting its text, and querying it with generative AI and LangChain.

We will use generative AI to read an eBook and build a custom knowledge base from one short story: The Monkey's Paw by W. W. Jacobs.

The Monkey's Paw is a captivating short story written by W. W. Jacobs. The story revolves around a mystical talisman, the monkey's paw, which grants its owner three wishes. However, as the characters soon discover, every wish comes at a price, leading them down a path of unforeseen and chilling events. With its gripping narrative and thought-provoking moral dilemmas, "The Monkey's Paw" continues to captivate readers with its timeless appeal.

Prerequisite

  • Introduction to BluetickPDF
  • In this blog post, we further expand upon our previous exploration of using generative AI to analyze PDFs and extract knowledge. If you haven't read our previous blog, "PDF Analysis and Querying with Generative AI," we encourage you to check it out for a comprehensive understanding of the topic, where we introduce a new open-source tool called BluetickPDF, which offers advanced capabilities in reading and analyzing PDF documents using generative AI.

    Blog: PDF Analysis and Querying with Generative AI

    BluetickPDF code on GitHub

  • Comparing BluetickPDF with Other Popular Tools
  • In our ongoing quest to enhance PDF analysis with generative AI, we compared BluetickPDF with two other prominent tools: Humata and ChatPDF.

    To learn more about the results and insights from our comparison, we invite you to read our blog post titled
    "The Ultimate PDF Analyzer Showdown: Humata vs. ChatPDF vs. BluetickPDF."

    Let's start analyzing BluetickEBOOKS!

  • Analyzing eBook Contents:
  • To begin our exploration, let's consider an example eBook called "The Monkey's Paw." We start by importing the necessary libraries and loading the eBook file:

                            
     file_name = "The Monkey's Paw.epub"
    
     import ebooklib
     from ebooklib import epub
    
     book = epub.read_epub(file_name)
     items = list(book.get_items_of_type(ebooklib.ITEM_DOCUMENT))
                            
                        
  • Extracting chapters
  • By accessing the book's items, we can extract individual chapters or sections for further analysis. We collect all the chapters into a list:

                            
     chapters = []
     for item in book.get_items():
         if item.get_type() == ebooklib.ITEM_DOCUMENT:
             chapters.append(item.get_content())
                            
                        
  • HTML to Text conversion
  • Next, we convert each chapter's HTML content into plain text for easier processing:

                            
     from bs4 import BeautifulSoup
    
     def chapter_to_str(chapter):
         soup = BeautifulSoup(chapter, 'html.parser')
         text = [para.get_text() for para in soup.find_all('p')]
         return ' '.join(text)
    
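     texts = ""
     for c in chapters:
         raw_text = chapter_to_str(c)
         # Replace newlines with spaces so words on adjacent lines stay separated,
         # and add a space between chapters so they do not run together
         texts += raw_text.replace("\n", " ") + " "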
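
To make the paragraph extraction step concrete, here is a minimal sketch that does the same kind of work with only Python's standard-library `html.parser`. The names `ParagraphExtractor` and `html_to_text` are illustrative and not part of the code above. The sketch assumes the chapter is already a `str` (bytes would need decoding first) and collapses whitespace to single spaces.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text inside <p> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._depth = 0       # > 0 while inside a <p> element
        self._current = []    # text pieces of the paragraph being read
        self.paragraphs = []  # finished paragraphs

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.paragraphs.append("".join(self._current))
                self._current = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> still arrives here
        if self._depth > 0:
            self._current.append(data)


def html_to_text(html):
    """Return paragraph text joined by spaces, with whitespace collapsed."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    # split() with no arguments drops newlines, tabs, and repeated spaces
    return " ".join(" ".join(p.split()) for p in parser.paragraphs)


sample = (
    "<html><body><h1>I</h1>"
    "<p>Without, the night\nwas cold.</p>"
    "<p>Father and <b>son</b> were at chess.</p>"
    "</body></html>"
)
print(html_to_text(sample))
```

Text outside `<p>` elements, such as the heading in the sample, is skipped, which mirrors the `find_all('p')` behavior in the BeautifulSoup version.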
                            
                        
  • Splitting the Text:
  • Now we have all the text from the eBook concatenated into a single string, ready for analysis.

    Querying eBook Contents: To query the eBook's contents, we use LangChain, a framework for building applications around large language models. First, we split the eBook text into smaller documents using LangChain's text splitter:

                            
     from langchain.text_splitter import CharacterTextSplitter
    
     text_splitter = CharacterTextSplitter(separator=".", chunk_size=2000, chunk_overlap=200, length_function=len)
     pages = text_splitter.create_documents([texts])
    
     num_documents = len(pages)
     print(f"Now our book is split up into {num_documents} documents")
     print(pages[0])
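
To illustrate what separator-based splitting with overlap does, here is a simplified sketch in plain Python. It is not LangChain's actual implementation; `split_with_overlap` is a hypothetical helper that greedily packs separator-delimited pieces into chunks and carries a short tail of each chunk into the next one.

```python
def split_with_overlap(text, separator=".", chunk_size=2000, chunk_overlap=200):
    """Greedy separator-based chunking (simplified, hypothetical helper).

    A single piece longer than chunk_size is kept whole, so a chunk can
    exceed chunk_size in that case.
    """
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, window, window_len = [], [], 0

    for piece in pieces:
        # The separator is re-inserted between pieces when a chunk is joined
        added = len(piece) + (len(separator) if window else 0)
        if window and window_len + added > chunk_size:
            chunks.append(separator.join(window))
            # Drop pieces from the front until the carried-over tail fits within
            # chunk_overlap and still leaves room for the incoming piece
            while window and (
                window_len > chunk_overlap
                or window_len + len(piece) + len(separator) > chunk_size
            ):
                removed = window.pop(0)
                window_len -= len(removed) + (len(separator) if window else 0)
            added = len(piece) + (len(separator) if window else 0)
        window.append(piece)
        window_len += added

    if window:
        chunks.append(separator.join(window))
    return chunks


# Tiny sizes so the overlap is easy to see
chunks = split_with_overlap("One. Two. Three. Four. Five.", chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

In the demo, the piece "Two" appears at the end of the first chunk and at the start of the second. That repetition is the overlap, and it helps a sentence that straddles a chunk boundary remain retrievable.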
                            
                        
  • Generating Embeddings and Creating a Knowledge Base Index:
  • Splitting the eBook into smaller documents lets each query retrieve only the passages relevant to it, and keeps each passage small enough to fit in the model's context window.

    Next, we use LangChain's embeddings and vector stores to enable similarity search and question answering. We use OpenAI's embeddings and Pinecone as the vector store:

                            
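     import os

     import pinecone
     from langchain.embeddings import OpenAIEmbeddings
     from langchain.vectorstores import Pinecone

     embeddings = OpenAIEmbeddings(openai_api_key=os.environ.get("OPENAI_API_KEY"))

     # Initialize Pinecone
     pinecone.init(api_key=os.environ.get("PINECONE_API_KEY"), environment=os.environ.get("PINECONE_API_ENV"))
     index_name = "the-monkeys-paw"

     # Pinecone.from_texts writes into an existing index, so create it first if needed.
     # Assumption: 1536 matches the dimension of OpenAI's text-embedding-ada-002 embeddings.
     if index_name not in pinecone.list_indexes():
         pinecone.create_index(index_name, dimension=1536, metric="cosine")

     # Embed each chunk and upsert it into the index
     docsearch = Pinecone.from_texts([t.page_content for t in pages], embeddings, index_name=index_name)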
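
Conceptually, a vector store ranks stored chunks by how close their embedding is to the query's embedding. The sketch below shows that idea with cosine similarity over made-up 3-dimensional vectors. Real embeddings have far more dimensions and come from the embedding model, and `cosine_similarity` and `top_k` are illustrative names rather than Pinecone or LangChain functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=3):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for real embeddings (values invented for illustration)
toy_index = [
    ("The paw grants three wishes.", [0.9, 0.1, 0.0]),
    ("Mr. White plays chess with his son.", [0.1, 0.9, 0.1]),
    ("A fakir put a spell on the paw.", [0.8, 0.0, 0.3]),
]
query_vec = [1.0, 0.0, 0.1]  # pretend embedding of a question about the paw

results = top_k(query_vec, toy_index, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The two sentences about the paw rank above the chess sentence because their vectors point in nearly the same direction as the query vector.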
                            
                        
  • Querying the Knowledge Base:
  • We have now set up a vector store index, allowing us to perform similarity searches and retrieve relevant documents based on queries.

    Finally, we can use a chat model through LangChain to answer questions about the eBook's contents. We use the ChatOpenAI model, wrapped in a RetrievalQA chain that retrieves relevant chunks from the vector store before answering:

                            
     from langchain.chat_models import ChatOpenAI
     llm = ChatOpenAI(temperature=0, max_tokens=1000, model_name='gpt-3.5-turbo', openai_api_key=os.environ.get("OPENAI_API_KEY"))
    
     from langchain.chains import RetrievalQA
    
     index_name = "the-monkeys-paw"
     text_field = "text"
     index = pinecone.Index(index_name)
     vectorstore = Pinecone(
         index, embeddings.embed_query, text_field
     )
    
     query = "Who is the author of The Monkey's Paw"
    
     docs = vectorstore.similarity_search(query, k=3)
    
     qa = RetrievalQA.from_chain_type(
         llm=llm,
         chain_type="stuff",
         retriever=vectorstore.as_retriever()
     )
    
     output = qa.run(query)
     print(output)
                            
                        

The author of The Monkey's Paw is W. W. Jacobs.
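
The `chain_type="stuff"` option used above places every retrieved chunk into a single prompt and makes one model call. The sketch below shows the idea in plain Python. The prompt wording is invented for illustration and is not LangChain's actual template, and `fake_llm` is a stand-in for a real model call.

```python
def build_stuff_prompt(question, docs):
    """Concatenate every retrieved chunk into one prompt (illustrative wording)."""
    context = "\n\n".join(docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_with_stuff(question, docs, llm):
    """Make one call over all chunks at once; llm is any callable str -> str."""
    return llm(build_stuff_prompt(question, docs))


def fake_llm(prompt):
    # Stand-in for a real model call: reports how much text it was given
    return f"(model saw {len(prompt)} characters)"


docs = ["THE MONKEY'S PAW by W. W. Jacobs", "Without, the night was cold and wet."]
prompt = build_stuff_prompt("Who is the author?", docs)
print(prompt)
print(answer_with_stuff("Who is the author?", docs, fake_llm))
```

Because everything goes into one prompt, this approach is cheap and fast, but the retrieved chunks together must fit within the model's context window.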

Here are some of the questions we asked BluetickEBOOK about The Monkey's Paw:

                    
 query = "What is the monkey's paw?"

 docs = vectorstore.similarity_search(query, k=3)

 qa = RetrievalQA.from_chain_type(
     llm=llm,
     chain_type="refine",
     retriever=vectorstore.as_retriever()
 )

 output = qa.run(query)
 print(output)
                    
                

The monkey's paw is a magical object mentioned in the short story "The Monkey's Paw" by W.W. Jacobs. It is introduced by a visitor, Sergeant-Major Morris, who describes it as a dried-up paw of a monkey that has the power to grant three wishes to the person who possesses it. The paw is initially dismissed as nonsense by the family of the protagonist, Mr. White, but they become increasingly curious about it. After Mr. White acquires the paw from Sergeant-Major Morris, he makes a wish for two hundred pounds, and the wish is granted in a way that brings about tragic consequences. The paw is a central element of the story and serves as a cautionary tale about the dangers of greed and the unintended consequences of making wishes.
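
The query above uses `chain_type="refine"`, which visits the retrieved chunks one at a time and asks the model to update its answer after each. Here is a minimal sketch of that loop. The prompt wording is invented and is not LangChain's actual template, `counting_llm` is a stand-in for a real model, and the function assumes at least one chunk.

```python
def answer_with_refine(question, docs, llm):
    """Visit chunks in order, asking llm to improve the running answer."""
    answer = llm(f"Context:\n{docs[0]}\n\nQuestion: {question}\nAnswer:")
    for doc in docs[1:]:
        answer = llm(
            f"Question: {question}\n"
            f"Existing answer: {answer}\n"
            f"New context:\n{doc}\n\n"
            "Refine the existing answer if the new context helps; "
            "otherwise repeat it unchanged."
        )
    return answer


calls = []


def counting_llm(prompt):
    # Stand-in for a real model: records each prompt and returns a dummy answer
    calls.append(prompt)
    return f"draft {len(calls)}"


docs = ["chunk one", "chunk two", "chunk three"]
final_answer = answer_with_refine("What is the monkey's paw?", docs, counting_llm)
print(final_answer)
print(f"LLM calls made: {len(calls)}")
```

The trade-off is one model call per chunk instead of a single call, in exchange for never needing all chunks to fit in one prompt.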

                    
 query = "Where did Sergeant-Major Morris find the Monkey's paw"

 docs = vectorstore.similarity_search(query, k=3)

 qa = RetrievalQA.from_chain_type(
     llm=llm,
     chain_type="refine",
     retriever=vectorstore.as_retriever()
 )

 output = qa.run(query)
 print(output)
                    
                

According to the given context, Sergeant-Major Morris obtained the Monkey's paw from an old fakir who put a spell on it, allowing three separate men to each have three wishes from it. The first man had his three wishes, and the third wish was for death. The sergeant-major obtained the paw after that wish was granted. He had considered selling it but decided against it due to the mischief it had already caused. He ultimately threw it into the fire. However, there is no information about where he found the Monkey's paw.

                    
 query = "What is the moral of the story?"

 docs = vectorstore.similarity_search(query, k=3)

 qa = RetrievalQA.from_chain_type(
     llm=llm,
     chain_type="stuff",
     retriever=vectorstore.as_retriever()
 )
 
 output = qa.run(query)
 print(output)
                    
                

The moral of the story is that one should be careful what they wish for, as the consequences of their wishes may not be what they expect and can lead to unforeseen and tragic outcomes.

                    
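 query = "Provide the summary of this story"

 # Retrieved here for inspection only; the chain below runs its own retrieval
 docs = vectorstore.similarity_search(query, k=20)

 # llm_new is not defined earlier in this post; as an assumption, reuse llm
 llm_new = llm

 qa = RetrievalQA.from_chain_type(
     llm=llm_new,
     chain_type="refine",
     retriever=vectorstore.as_retriever()
 )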

 output = qa.run(query)
 print(output)
                    
                

The story is about an old couple who come into possession of a monkey's paw that grants three wishes. The first wish brings tragedy upon them, and they bury their son. Later, a visitor comes to their home to inform them that their son was caught in machinery and has died. The old woman becomes obsessed with using the paw to bring him back to life and eventually convinces her husband to use the second wish. However, their decision leads to a terrifying consequence, and they realize that they should have left their son to rest in peace. In the end, the old man finds the monkey's paw and frantically makes his third and final wish just as their son, who has become a terrifying corpse, knocks on the door. The knocking stops, and the old couple hears their son's long, loud wail of disappointment and misery. The story highlights the consequences of greed and the dangers of meddling with fate.

Source Code

"Generative AI and eBooks: Where fiction meets friction and imagination gets algorithmically adventurous!"

Written by a human and Assisted by ChatGPT
