The answer is about to arrive! Today, we're going to build a simple Streamlit application using Python and NLTK that pulls text from the job description and your resume. When you click the Match button, magic happens, revealing the percentage of your resume that matches the job description. The app will also tell you how good the match is and, if necessary, suggest some keywords to add. Isn't this awesome?
If you're excited and think it sounds cool, let's get started developing our project with Python, Streamlit, NLTK, and Sklearn. Here you can see how the completed application works.
Before jumping directly into the coding, let me give you a high-level overview of our project. You can break the entire project into three parts:
Match Percentage calculation
Keyword Extraction from JD
Streamlit for UI
Let's tackle them one by one. First, we will import all the necessary libraries to make our job easier:
import streamlit as st
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import os
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
streamlit: Streamlit is a Python library used to create web applications for data science and machine learning projects. It simplifies web app development, enabling data professionals to present their work interactively without extensive front-end knowledge.
sklearn: Scikit-learn is a comprehensive machine learning library in Python. It offers various tools for data analysis, model building, and evaluation. Here we use sklearn for the following:
CountVectorizer (from sklearn.feature_extraction.text): Converts text data into numerical vectors by tallying word occurrences, enabling analysis in machine learning models.
cosine_similarity (from sklearn.metrics.pairwise): Measures the similarity between two numerical vectors, often used to compare the resemblance of text documents.
nltk: NLTK, or Natural Language Toolkit, is a powerful Python package for natural language processing (NLP) tasks. It offers features for part-of-speech tagging, tokenization, stemming, and more. It is essential for text processing and analysis, facilitating activities like text mining, sentiment analysis, and language interpretation.
NLTK Stopwords: During preprocessing, frequently used terms are eliminated from text by using the stopwords module of NLTK. Eliminating these stopwords improves analysis of textual data by helping to concentrate on the important words.
Word tokenization (NLTK Tokenize): The word tokenization function in NLTK divides text into discrete words. It's essential for breaking up text into manageable chunks, making text analysis easier by enabling the alteration and scrutiny of specific words or phrases.
That's it for the introduction of the modules we use in our project. Now let's set up the environment, fix common environment issues with these libraries, and download the NLTK tokenizer and stopword packages:
nltk.download('punkt')
nltk.download('stopwords')
os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'python'
If you are confused by terms like tokenizing and stopwords, I'll explain them briefly in the next paragraphs. If you're already comfortable with them, feel free to skip ahead.
Tokenizing: Tokenizing refers to the process of breaking down a piece of text into smaller units, usually words or sentences, called tokens. It helps in organizing and analyzing textual data at a more granular level.
Example: Consider the sentence: “The quick brown fox jumps over the lazy dog.” Tokenizing this sentence would result in individual words being extracted as tokens:[“The”, “quick”, “brown”, “fox”, “jumps”, “over”, “the”, “lazy”, “dog”, “.”]
Each word in the sentence becomes a separate token, making it easier to analyze or process.
Stopwords: Stopwords are common words in a language that often don’t carry significant meaning in a specific context and are typically filtered out during text analysis to focus on more meaningful words.
Example: In English, stopwords may include words like “the,” “is,” “at,” “and,” etc. Consider the sentence: “The art of simplicity is a puzzle of complexity.” When stopwords are removed, the sentence focuses on essential words:
Original: “The art of simplicity is a puzzle of complexity.”
After removing stopwords: “art simplicity puzzle complexity.”
Let's go back to our main workflow. Here we are going to code the function that calculates the match percentage between two texts:
def calculate_match_percentage(text1, text2):
vectorizer = CountVectorizer().fit_transform([text1, text2])
vectors = vectorizer.toarray()
cosine_sim = cosine_similarity(vectors)
match_percentage = cosine_sim[0, 1] * 100 # Convert to percentage
return match_percentage
1. Inputs: The function takes two strings, text1 and text2, for comparison, where text1 represents the resume text and text2 represents the JD text.
2. Vectorization: CountVectorizer() transforms the texts into numerical vectors of word counts. For example, if text1 contains "apple" three times and text2 contains it once, they are represented as vectors in a multi-dimensional space: [3, 0, ...] and [1, 0, ...].
3. Cosine Similarity: The similarity between the two vectors is computed with cosine_similarity. For two vectors A and B it is:
cosine_sim = (A dot B) / (||A|| * ||B||)
where A dot B is the dot product of vectors A and B, and ||A|| and ||B|| are their magnitudes (lengths).
4. Match Percentage: cosine_similarity returns a matrix of pairwise similarities; the entry cosine_sim[0, 1] is the similarity between text1 and text2.
5. Conversion to Percentage: Multiplying the similarity score by 100 turns it into a percentage, which the function returns.
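To sanity-check the function, here is a small standalone usage example (the function is repeated so the snippet runs on its own; the sample strings are made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def calculate_match_percentage(text1, text2):
    # Count word occurrences in both texts over a shared vocabulary
    vectors = CountVectorizer().fit_transform([text1, text2]).toarray()
    # cosine_similarity returns a 2x2 matrix; [0, 1] compares text1 with text2
    return cosine_similarity(vectors)[0, 1] * 100

resume = "python developer with streamlit and sklearn experience"
jd = "looking for a python developer with streamlit experience"
print(f"Match Percentage: {calculate_match_percentage(resume, jd):.2f}%")
```

Identical texts score 100%, and texts with no words in common score 0%, so the percentage behaves as you would expect.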
Using this match function we can calculate the match percentage between the resume text and the job description text. Our next step is to get the keywords from both texts.
Here our aim is to filter the stopwords and distill the keywords from the text:
def extract_key_terms(text):
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(text.lower())
filtered_words = [w for w in word_tokens if not w in stop_words and w.isalpha()]
return set(filtered_words)
1. Stopword Set: stopwords.words('english') provides the set of common English words to filter out.
2. Tokenization and Lowercasing: The text is lowercased and split into individual words with word_tokenize.
3. Filtering Non-Stopwords: Words that are stopwords or non-alphabetic are discarded, and the remaining keywords are returned as a set.
So far we have finished the logic part of the code. Now let's create a simple and neat UI using Streamlit.
Streamlit builds the frontend of our project, which is much easier than writing HTML or CSS by hand.
# Streamlit UI
st.title('Resume_JD Scorer')
# Text areas for user input
resume_text = st.text_area("Paste Your Resume Here")
jd_text = st.text_area("Paste Job Description Here")
if st.button('Match'):
if resume_text and jd_text:
# Calculate the match percentage
match_percentage = calculate_match_percentage(resume_text, jd_text)
st.write(f"Match Percentage: {match_percentage:.2f}%")
# Extracting key terms from JD and checking against the resume
jd_terms = extract_key_terms(jd_text)
resume_terms = extract_key_terms(resume_text)
missing_terms = jd_terms - resume_terms
if match_percentage >= 70:
st.success("Good Chances of getting your Resume Shortlisted.")
elif 40 <= match_percentage < 70:
st.warning("Good match but can be improved.")
if missing_terms:
st.info(f"Consider adding these key terms from the job description to your resume:\n {', '.join(missing_terms)}")
elif match_percentage < 40:
st.error("Poor match.")
if missing_terms:
st.info(f"Your resume is missing these key terms from the job description:\n {', '.join(missing_terms)}")
else:
st.warning("Please enter both Resume and Job Description.")
The code above is essentially self-explanatory; to learn more about Streamlit's widgets, see the Streamlit documentation.
Using the resume_text and jd_text variables, it labels input areas where users paste their resume and job description, via Streamlit's text_area widget.
A "Match" button instantly displays match percentages and initiates comparison calculations. The interface provides users with customized feedback based on this percentage, advising them on how well their resume matches the job description. It's a smooth, user-friendly application that provides instantaneous insights and practical recommendations for optimizing resume content.
To run this on your local machine, type the following command into your terminal:
streamlit run file_name.py
As soon as you hit Enter you will see something like picture 1.0, and the app is up and running. You can push this file to GitHub and then deploy it with Streamlit Cloud to make it publicly available; make sure to add a requirements.txt file when deploying.
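For reference, a minimal requirements.txt for this app could look like the following (package names only; you may want to pin specific versions):

```
streamlit
scikit-learn
nltk
```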
Here is all the code at once, in summary:
import streamlit as st
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import os
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('punkt')
nltk.download('stopwords')
os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'python'
# Rest of your Streamlit code
# Function to calculate the match percentage between two texts
def calculate_match_percentage(text1, text2):
vectorizer = CountVectorizer().fit_transform([text1, text2])
vectors = vectorizer.toarray()
cosine_sim = cosine_similarity(vectors)
match_percentage = cosine_sim[0, 1] * 100 # Convert to percentage
return match_percentage
def extract_key_terms(text):
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(text.lower())
filtered_words = [w for w in word_tokens if not w in stop_words and w.isalpha()]
return set(filtered_words)
# Streamlit UI
st.title('Resume_JD Scorer')
# Text areas for user input
resume_text = st.text_area("Paste Your Resume Here")
jd_text = st.text_area("Paste Job Description Here")
if st.button('Match'):
if resume_text and jd_text:
# Calculate the match percentage
match_percentage = calculate_match_percentage(resume_text, jd_text)
st.write(f"Match Percentage: {match_percentage:.2f}%")
# Extracting key terms from JD and checking against the resume
jd_terms = extract_key_terms(jd_text)
resume_terms = extract_key_terms(resume_text)
missing_terms = jd_terms - resume_terms
if match_percentage >= 70:
st.success("Good Chances of getting your Resume Shortlisted.")
elif 40 <= match_percentage < 70:
st.warning("Good match but can be improved.")
if missing_terms:
st.info(f"Consider adding these key terms from the job description to your resume:\n {', '.join(missing_terms)}")
elif match_percentage < 40:
st.error("Poor match.")
if missing_terms:
st.info(f"Your resume is missing these key terms from the job description:\n {', '.join(missing_terms)}")
else:
st.warning("Please enter both Resume and Job Description.")
I hope you enjoyed building this along with me. If you found it useful, consider following me on Medium and giving the blog a clap.
Do follow me for more content on working with Python and its libraries.
Thank you!! Happy learning.