code package

Submodules

code.browser_output module

browser_output.py

code.browser_output.content_formatter(lines)

Returns the browser output and opens it in the system's default browser.

Parameters: lines – The result file contents, line by line
Returns: The browser output in HTML form
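As a rough illustration of turning result lines into HTML, the sketch below escapes each line and wraps it in a paragraph element. The helper name `lines_to_html` and the page layout are assumptions made for illustration; they are not taken from the package's implementation.

```python
import html


def lines_to_html(lines):
    """Hypothetical helper: wrap each non-empty line in an escaped <p> element."""
    paragraphs = [
        f"<p>{html.escape(line.strip())}</p>"
        for line in lines
        if line.strip()  # skip blank lines
    ]
    body = "\n".join(paragraphs)
    return f"<html><body>\n{body}\n</body></html>"


# Sample input is invented for the example.
page = lines_to_html(["Sorting <algorithms>\n", "\n", "Big-O notation\n"])
```

A string like `page` could then be written to a file and opened with the standard library's `webbrowser.open`.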

code.browser_output.output_formatter()

Returns the browser output and opens it in the system's default browser.

Returns: The browser output in HTML form

code.browser_output.result_display(content, wordcloud_image_name)

Returns the browser output and opens it in the system's default browser.

Parameters

content – The result file contents in HTML form
wordcloud_image_name – The name of the word cloud image

Returns

The browser output in HTML form

code.extract_sizes module

File completing step 2: given a PDF document, return a dictionary of headers and paragraphs

code.extract_sizes.extract_words(file: str) → dict

Given a filename, opens the PDF and extracts words and metadata from each slide.

Parameters: file (str) – String representing the file path
Return type: dict
Returns: dictionary representing document metadata and words extracted from each slide

code.extract_sizes.get_sizes(doc: dict) → list

Helper function to get unique sizes within a PDF

Parameters: doc – The list of blocks within a PDF
Type: list
Return type: list
Returns: a list of unique font sizes
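A minimal sketch of collecting unique font sizes follows. It assumes each page is a dictionary with nested `blocks`, `lines`, and `spans`, where every span carries a `size`, similar to the layout PyMuPDF's dictionary text extraction produces. The source does not state which PDF library or structure is used, so both the structure and the helper name `unique_sizes` are assumptions.

```python
def unique_sizes(pages):
    """Hypothetical helper: collect distinct span font sizes, largest first."""
    sizes = set()
    for page in pages:
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for span in line.get("spans", []):
                    sizes.add(span["size"])
    return sorted(sizes, reverse=True)


# Invented sample data in the assumed blocks/lines/spans layout.
sample_pages = [
    {"blocks": [{"lines": [{"spans": [
        {"size": 32.0, "text": "Graph Search"},
        {"size": 18.0, "text": "Breadth-first search visits neighbours first."},
    ]}]}]},
    {"blocks": [{"lines": [{"spans": [
        {"size": 24.0, "text": "Complexity"},
        {"size": 18.0, "text": "Runs in O(V + E)."},
    ]}]}]},
]

sizes = unique_sizes(sample_pages)
```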

code.extract_sizes.tag_text(unique_fonts: list, doc: dict) → list

Categorizes each piece of text as either Heading or Paragraph. Heading covers the two largest font sizes (the title and the main headings). Paragraph covers all other sizes.

Parameters

unique_fonts (list) – a list of unique font sizes in the PowerPoint
doc (dict) – a list of blocks for each document page

Return type

list

Returns

a list of dictionaries categorizing each piece of text into its respective category
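The "top 2 sizes" rule can be sketched as follows. The helper name `tag_spans`, the flat list of spans, and the output keys `type` and `text` are assumptions chosen for the example; the package's real data layout may differ.

```python
def tag_spans(unique_sizes, spans):
    """Hypothetical helper: label spans in the two largest sizes as headings."""
    heading_sizes = set(sorted(unique_sizes, reverse=True)[:2])
    tagged = []
    for span in spans:
        kind = "Heading" if span["size"] in heading_sizes else "Paragraph"
        tagged.append({"type": kind, "text": span["text"]})
    return tagged


# Invented sample spans: two heading sizes and one body size.
spans = [
    {"size": 32.0, "text": "Graph Search"},
    {"size": 24.0, "text": "Breadth-first search"},
    {"size": 18.0, "text": "Visits neighbours level by level."},
]
tagged = tag_spans([18.0, 32.0, 24.0], spans)
```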

code.extract_sizes.text_to_groupings(doc: dict) → list

Given a PDF document, returns a list of dictionaries, each holding a Header, a Paragraph, and a page number.

Parameters: doc (dict) – a PDF document containing only words
Return type: list
Returns: a list of dictionaries categorizing each piece of text into its respective category

code.google_search module

google_search.py

code.google_search.get_people_also_ask_links(search_term: str) → list

Given a search term, returns the Google People Also Ask links.

Parameters: search_term (str) – The query to send to Google
Return type: list
Returns: list of links returned by People Also Ask

code.user_cli module

user_cli.py

code.user_cli.generate_wordcloud(data: list, file_name: str) → None

Given the keywords of a document, displays a word cloud.

Parameters

data (list) – List of cleaned keywords in a document
file_name (str) – The name of the lecture document

Return type

None

Returns

None

code.user_cli.user_menu()

Runner function. Prompts the user for input and returns a txt file of results.

code.wordprocessing module

wordprocessing.py

code.wordprocessing.construct_search_query(data: list) → list

Constructs a search query from PDF data.

Parameters: data (list) – The list of data
Returns: List of words to search
Return type: list

code.wordprocessing.duplicate_word_removal(data: list) → list

Function to remove duplicate words.

Parameters: data (list) – The list of dictionaries, of the form [{"Header": "", "Header_keywords": [], "Paragraph_keywords": [], "slides": [int]}]
Returns: The list of dictionaries with duplicate keywords removed, of the form [{"Header": "", "Header_keywords": [], "Paragraph_keywords": [], "slides": [int]}]
Return type: list
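One way to remove duplicates while keeping the first occurrence of each keyword is sketched below. It assumes duplicates are removed within each keyword list separately; the source does not say whether duplicates are also removed across lists or across entries. The helper name `remove_duplicate_keywords` is hypothetical.

```python
def remove_duplicate_keywords(data):
    """Hypothetical helper: drop repeated keywords, keeping first occurrences."""
    cleaned = []
    for entry in data:
        deduped = dict(entry)  # shallow copy so the input entry is not modified
        for key in ("Header_keywords", "Paragraph_keywords"):
            # dict.fromkeys keeps insertion order, so keyword order is preserved
            deduped[key] = list(dict.fromkeys(entry.get(key, [])))
        cleaned.append(deduped)
    return cleaned


# Invented sample entry in the documented input form.
data = [{
    "Header": "Sorting",
    "Header_keywords": ["sorting", "sorting"],
    "Paragraph_keywords": ["merge", "quick", "merge"],
    "slides": [1, 2],
}]
cleaned = remove_duplicate_keywords(data)
```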

code.wordprocessing.extract_noun_chunks(data: list) → list

Extracts noun chunks using spaCy.

Parameters: data (list) – list of PDF data
Returns: list of data with nouns extracted
Return type: list

code.wordprocessing.keyword_extractor(data: list) → list

Function to extract keywords from the headers and paragraphs of slides.

Parameters: data (list) – The list of dictionaries, of the form [{"Header": "", "Paragraph": "", "slide": int}]
Returns: The list of dictionaries with keywords extracted, of the form [{"Header": "", "Paragraph": "", "Header_keywords": [], "Paragraph_keywords": [], "slide": int}]
Return type: list

code.wordprocessing.merge_slide_with_same_headers(data: list) → list

Function to merge slides with the same header.

Parameters: data (list) – The list of dictionaries, of the form [{"Header": "", "Paragraph": "", "Header_keywords": [], "Paragraph_keywords": [], "slide": int}]
Returns: The list of dictionaries where slides containing the same header are merged, of the form [{"Header": "", "Header_keywords": [], "Paragraph_keywords": [], "slides": [int]}]
Return type: list
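A minimal sketch of merging by header, using the documented input and output shapes, might look like this. The helper name `merge_by_header` and the choice to concatenate keyword lists in slide order are assumptions, not the package's confirmed behaviour.

```python
def merge_by_header(data):
    """Hypothetical helper: combine entries that share the same "Header"."""
    merged = {}  # header text -> merged entry, in first-seen order
    for entry in data:
        header = entry["Header"]
        if header not in merged:
            merged[header] = {
                "Header": header,
                "Header_keywords": [],
                "Paragraph_keywords": [],
                "slides": [],
            }
        target = merged[header]
        target["Header_keywords"].extend(entry["Header_keywords"])
        target["Paragraph_keywords"].extend(entry["Paragraph_keywords"])
        target["slides"].append(entry["slide"])
    return list(merged.values())


# Invented sample slides; two of them share the header "Sorting".
slides = [
    {"Header": "Sorting", "Paragraph": "Merge sort.", "Header_keywords": ["sorting"],
     "Paragraph_keywords": ["merge"], "slide": 1},
    {"Header": "Sorting", "Paragraph": "Quick sort.", "Header_keywords": ["sorting"],
     "Paragraph_keywords": ["quick"], "slide": 2},
    {"Header": "Graphs", "Paragraph": "Nodes and edges.", "Header_keywords": ["graphs"],
     "Paragraph_keywords": ["nodes", "edges"], "slide": 3},
]
merged_slides = merge_by_header(slides)
```

In this sketch, concatenation leaves repeated header keywords in the merged entry, which a later deduplication pass could clean up.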

code.wordprocessing.merge_slide_with_same_slide_number(data: list) → list

Function to merge slides with the same slide number into a single one.

Parameters: data (list) – The list of dictionaries, of the form [{"Header": "", "Paragraph": "", "Header_keywords": [], "Paragraph_keywords": [], "slide": int}]
Returns: The list of dictionaries where slides containing the same slide number are merged, of the form [{"Header": "", "Header_keywords": [], "Paragraph_keywords": [], "slide": int}]
Return type: list
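Merging by slide number can be sketched in the same style. How the real function chooses a header when several fragments share a slide is not documented, so this sketch assumes the first non-empty header wins. The helper name `merge_by_slide` is hypothetical.

```python
def merge_by_slide(data):
    """Hypothetical helper: combine entries that share the same slide number."""
    merged = {}  # slide number -> merged entry
    for entry in data:
        number = entry["slide"]
        if number not in merged:
            merged[number] = {
                "Header": entry["Header"],
                "Header_keywords": [],
                "Paragraph_keywords": [],
                "slide": number,
            }
        target = merged[number]
        if not target["Header"]:
            # Assumption: keep the first non-empty header seen for the slide
            target["Header"] = entry["Header"]
        target["Header_keywords"].extend(entry["Header_keywords"])
        target["Paragraph_keywords"].extend(entry["Paragraph_keywords"])
    return [merged[number] for number in sorted(merged)]


# Invented sample fragments; the first two belong to slide 1.
fragments = [
    {"Header": "Sorting", "Paragraph": "", "Header_keywords": ["sorting"],
     "Paragraph_keywords": [], "slide": 1},
    {"Header": "", "Paragraph": "Merge sort splits the list.", "Header_keywords": [],
     "Paragraph_keywords": ["merge sort"], "slide": 1},
    {"Header": "Graphs", "Paragraph": "Nodes and edges.", "Header_keywords": ["graphs"],
     "Paragraph_keywords": ["nodes", "edges"], "slide": 2},
]
by_slide = merge_by_slide(fragments)
```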
