code package
Submodules
code.browser_output module
browser_output.py
- code.browser_output.content_formatter(lines)
Returns the browser output and opens it in the system's default browser.
- Parameters
lines – The result file contents line by line
- Returns
The browser output in HTML form
- code.browser_output.output_formatter()
Returns the browser output and opens it in the system's default browser.
- Returns
The browser output in HTML form
- code.browser_output.result_display(content, wordcloud_image_name)
Returns the browser output and opens it in the system's default browser.
- Parameters
content – The result file contents in HTML form
wordcloud_image_name – The name of the word cloud image
- Returns
The browser output in HTML form
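As a rough illustration of the "format lines as HTML, then open in the default browser" flow described above, here is a minimal sketch using only the standard library. The names `lines_to_html` and `open_in_browser` and the `results.html` path are hypothetical and are not part of `code.browser_output`; the actual module may build its page differently.

```python
import html
import pathlib
import webbrowser


def lines_to_html(lines, title="Results"):
    """Hypothetical helper: wrap non-empty text lines in a minimal HTML page."""
    paragraphs = "\n".join(
        f"<p>{html.escape(line.strip())}</p>" for line in lines if line.strip()
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><meta charset='utf-8'><title>{html.escape(title)}</title></head>\n"
        f"<body>\n{paragraphs}\n</body></html>"
    )


def open_in_browser(page, path="results.html"):
    """Hypothetical helper: write the page to disk and open it in the default browser."""
    target = pathlib.Path(path).resolve()
    target.write_text(page, encoding="utf-8")
    webbrowser.open(target.as_uri())
    return page
```

Keeping the string-building step separate from the browser call makes the formatting testable without launching a browser.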
code.extract_sizes module
Completes step 2: given a PDF document, returns a dictionary of headers and paragraphs.
- code.extract_sizes.extract_words(file: str) → dict
Given a filename, opens the PDF and extracts words and metadata from each slide.
- Parameters
file (str) – String representing the file path
- Return type
dict
- Returns
A dictionary representing the document metadata and the words extracted from each slide
- code.extract_sizes.get_sizes(doc: dict) → list
Helper function to get the unique font sizes within a PDF.
- Parameters
doc (list) – The list of blocks within a PDF
- Return type
list
- Returns
a list of unique font sizes
- code.extract_sizes.tag_text(unique_fonts: list, doc: dict) → list
Categorizes each piece of text as either a heading or a paragraph. Headings use the two largest font sizes (title or main heading); paragraphs use all other sizes.
- Parameters
unique_fonts (list) – a list of unique font sizes in the PowerPoint
doc (dict) – a list of blocks for each document page
- Return type
list
- Returns
a list of dictionaries categorizing each piece of text into its respective category
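The size-based tagging rule described above (the two largest font sizes count as headings, everything else as paragraph text) can be sketched as follows. This is only an illustration under assumptions: the flat span format with `text`, `size`, and `page` keys, the `"Header"`/`"Paragraph"` tag strings, and the function names `unique_sizes` and `tag_spans` are all hypothetical and may not match what `get_sizes` and `tag_text` actually consume or return.

```python
def unique_sizes(spans):
    """Return the distinct font sizes, largest first (assumed span format)."""
    return sorted({span["size"] for span in spans}, reverse=True)


def tag_spans(sizes, spans, heading_levels=2):
    """Tag each span as "Header" if it uses one of the largest sizes, else "Paragraph"."""
    heading_sizes = set(sizes[:heading_levels])
    return [
        {
            "text": span["text"],
            "page": span["page"],
            "tag": "Header" if span["size"] in heading_sizes else "Paragraph",
        }
        for span in spans
    ]


# Hypothetical input: one dict per text span on a slide.
spans = [
    {"text": "Intro", "size": 32.0, "page": 1},
    {"text": "Overview", "size": 24.0, "page": 1},
    {"text": "Body text", "size": 12.0, "page": 1},
]
tagged = tag_spans(unique_sizes(spans), spans)
```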
- code.extract_sizes.text_to_groupings(doc: dict) → list
Given a PDF document, returns a list of dictionaries of headers, paragraphs, and page numbers.
- Parameters
doc (dict) – a PDF document containing only words
- Return type
list
- Returns
a list of dictionaries categorizing each piece of text into its respective category
code.google_search module
google_search.py
- code.google_search.get_people_also_ask_links(search_term: str) → list
Given a search term, returns the Google "People Also Ask" links.
- Parameters
search_term (str) – The query to send to Google
- Return type
list
- Returns
A list of the links returned by People Also Ask
code.user_cli module
user_cli.py
- code.user_cli.generate_wordcloud(data: list, file_name: str) → None
Given the keywords of a document, displays a word cloud.
- Parameters
data (list) – List of cleaned keywords in a document
file_name (str) – The name of the lecture document
- Return type
None
- Returns
None
Runner class. Prompts the user for input and returns a txt file of results
code.wordprocessing module
wordprocessing.py
- code.wordprocessing.construct_search_query(data: list) → list
Constructs a search query from PDF data.
- Parameters
data (list) – The list of data
- Returns
List of words to search
- Return type
list
- code.wordprocessing.duplicate_word_removal(data: list) → list
Removes duplicate words.
- Parameters
data – The list of dictionaries of the form [{"Header": "", "Header_keywords": [], "Paragraph_keywords": [], slides: [int]}]
- Returns
The list of dictionaries with duplicate keywords removed, of the form [{"Header": "", "Header_keywords": [], "Paragraph_keywords": [], slides: [int]}]
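One plausible reading of duplicate removal on this data shape is to drop repeated keywords within each keyword list while preserving their first-seen order. The sketch below assumes exactly that; the function name `remove_duplicate_keywords` is hypothetical, and the real `duplicate_word_removal` may deduplicate across entries or use different rules.

```python
def remove_duplicate_keywords(data):
    """Return copies of the entries with repeated keywords dropped (order preserved)."""
    cleaned = []
    for entry in data:
        entry = dict(entry)  # shallow copy so the caller's dictionaries are not mutated
        for key in ("Header_keywords", "Paragraph_keywords"):
            # dict.fromkeys keeps the first occurrence of each keyword, in order
            entry[key] = list(dict.fromkeys(entry.get(key, [])))
        cleaned.append(entry)
    return cleaned


# Hypothetical entry in the documented shape.
sample = [
    {
        "Header": "Sorting",
        "Header_keywords": ["sorting", "sorting"],
        "Paragraph_keywords": ["merge", "quick", "merge"],
        "slides": [3, 4],
    }
]
deduplicated = remove_duplicate_keywords(sample)
```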
- code.wordprocessing.extract_noun_chunks(data: list) → list
Extracts nouns using spaCy.
- Parameters
data (list) – list of PDF data
- Returns
list of data with nouns extracted
- Return type
list
- code.wordprocessing.keyword_extractor(data: list) → list
Extracts keywords from the headers and paragraphs of slides.
- Parameters
data – The list of dictionaries of the form [{"Header": "", "Paragraph": "", slide: int}]
- Returns
The list of dictionaries with keywords extracted, of the form [{"Header": "", "Paragraph": "", "Header_keywords": [], "Paragraph_keywords": [], slide: int}]
- code.wordprocessing.merge_slide_with_same_headers(data: list) → list
Merges slides that have the same header.
- Parameters
data – The list of dictionaries of the form [{"Header": "", "Paragraph": "", "Header_keywords": [], "Paragraph_keywords": [], slide: int}]
- Returns
The list of dictionaries where slides containing the same header are merged, of the form [{"Header": "", "Header_keywords": [], "Paragraph_keywords": [], slides: [int]}]
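Merging by header, as described above, amounts to grouping entries on the `Header` value, concatenating their keyword lists, and collecting the slide numbers into a `slides` list. The sketch below shows that grouping under assumptions: the function name `merge_by_header` is hypothetical, the `slide` and `slides` keys are assumed to be plain strings, and the real function may order or combine keywords differently.

```python
def merge_by_header(data):
    """Group entries that share a header, combining keywords and slide numbers."""
    merged = {}  # dicts preserve insertion order, so first-seen header order is kept
    for entry in data:
        header = entry["Header"]
        target = merged.setdefault(
            header,
            {"Header": header, "Header_keywords": [], "Paragraph_keywords": [], "slides": []},
        )
        target["Header_keywords"].extend(entry["Header_keywords"])
        target["Paragraph_keywords"].extend(entry["Paragraph_keywords"])
        target["slides"].append(entry["slide"])
    return list(merged.values())


# Hypothetical input: slides 1 and 3 share the header "Graphs".
slides = [
    {"Header": "Graphs", "Paragraph": "", "Header_keywords": ["graphs"], "Paragraph_keywords": ["node"], "slide": 1},
    {"Header": "Trees", "Paragraph": "", "Header_keywords": ["trees"], "Paragraph_keywords": ["root"], "slide": 2},
    {"Header": "Graphs", "Paragraph": "", "Header_keywords": ["graphs"], "Paragraph_keywords": ["edge"], "slide": 3},
]
by_header = merge_by_header(slides)
```

The merged lists can still contain repeats (here, `"graphs"` twice), which is presumably why a separate duplicate-removal step exists for the merged form.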
- code.wordprocessing.merge_slide_with_same_slide_number(data: list) → list
Merges entries that have the same slide number into a single entry.
- Parameters
data – The list of dictionaries of the form [{"Header": "", "Paragraph": "", "Header_keywords": [], "Paragraph_keywords": [], slide: int}]
- Returns
The list of dictionaries where slides containing the same slide number are merged, of the form [{"Header": "", "Header_keywords": [], "Paragraph_keywords": [], slide: int}]
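For comparison, merging on the slide number keeps a single `slide` value per entry rather than a list. The sketch below assumes the first non-empty header on a slide wins and that keyword lists are concatenated; both choices, like the name `merge_by_slide` and the string key `slide`, are assumptions made for illustration and are not confirmed by the documentation above.

```python
def merge_by_slide(data):
    """Combine entries that share a slide number into one entry per slide."""
    merged = {}
    for entry in data:
        number = entry["slide"]
        target = merged.setdefault(
            number,
            {"Header": "", "Header_keywords": [], "Paragraph_keywords": [], "slide": number},
        )
        if not target["Header"]:
            # Assumption: keep the first non-empty header seen for this slide
            target["Header"] = entry["Header"]
        target["Header_keywords"].extend(entry["Header_keywords"])
        target["Paragraph_keywords"].extend(entry["Paragraph_keywords"])
    return list(merged.values())


# Hypothetical input: two text blocks extracted from slide 5.
blocks = [
    {"Header": "Hashing", "Paragraph": "", "Header_keywords": ["hashing"], "Paragraph_keywords": [], "slide": 5},
    {"Header": "", "Paragraph": "", "Header_keywords": [], "Paragraph_keywords": ["bucket", "collision"], "slide": 5},
]
by_slide = merge_by_slide(blocks)
```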