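import pandas as pd
from sklearn.model_selection import train_test_split

# Load the SMS spam dataset; DATA_DIR is assumed to be defined earlier.
sms_df = pd.read_csv(DATA_DIR + "spam.csv", encoding="latin-1")
# Drop the unused extra columns and give the remaining ones descriptive names.
sms_df = sms_df.drop(columns=["Unnamed: 2", "Unnamed: 3", "Unnamed: 4"])
sms_df = sms_df.rename(columns={"v1": "target", "v2": "sms"})
# Hold out 10% of the messages as a test set (fixed seed for reproducibility).
train_df, test_df = train_test_split(sms_df, test_size=0.10, random_state=42)
X_train, y_train = train_df["sms"], train_df["target"]
X_test, y_test = test_df["sms"], test_df["target"]
train_df.head(4)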
| | target | sms |
|---|---|---|
| 3130 | spam | LookAtMe!: Thanks for your purchase of a video... |
| 106 | ham | Aight, I'll hit you up when I get some cash |
| 4697 | ham | Don no da:)whats you plan? |
| 856 | ham | Going to take your babe out ? |
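
The row labels in the output above (3130, 106, ...) are out of order because `train_test_split` shuffles the rows before splitting, while each row keeps its original index label. A minimal sketch of that behaviour, using a small made-up DataFrame (`toy_df` and its contents are hypothetical stand-ins, not the SMS data):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the SMS dataset.
toy_df = pd.DataFrame({
    "target": ["ham", "spam"] * 10,
    "sms": [f"message {i}" for i in range(20)],
})

# Same split settings as in the cell above.
toy_train, toy_test = train_test_split(toy_df, test_size=0.10, random_state=42)

# With test_size=0.10, roughly 10% of the rows end up in the test set.
print(len(toy_train), len(toy_test))

# Rows are shuffled, but every original index label appears exactly once
# across the two splits.
all_labels = sorted(toy_train.index.tolist() + toy_test.index.tolist())
print(all_labels == list(range(20)))
```

If a clean 0-based index is preferred after splitting, calling `reset_index(drop=True)` on each split is one option.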