PROJECT: Using Trifacta to Streamline Data Pre-Processing
In this project, I took the raw text of a list of internships I had applied to and used Python to convert it into an Excel file. I used libraries like pandas to clean the data and structure it into columns such as company name, position, and application date. Once the basic dataset was in place, I imported it into Trifacta for further cleaning and organizing. Using Trifacta's features, I refined the layout, applied filters, and put every field into a consistent format. The end result was a more organized and usable Excel file that helped me track my applications more efficiently.
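The Python step can be sketched roughly as follows. This is a minimal illustration and not the original script: the sample text, the " | " separator, and the name parse_applications are assumptions, since the actual layout of the raw list is not shown here. Only the column names follow the ones described above.

```python
import pandas as pd

# Hypothetical raw text: one application per line, fields separated by "|".
# The real layout of the original list is not shown, so this is an assumption.
RAW_TEXT = """\
Acme Corp | Data Science Intern | 01/15/2024
  Globex  | Analytics Intern    | 02/03/2024

Initech | Machine Learning Intern | 02/20/2024
"""

COLUMNS = ["Company Name", "Position", "Date Applied"]


def parse_applications(raw_text: str) -> pd.DataFrame:
    """Split each non-empty line into trimmed fields and build a DataFrame."""
    rows = []
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != len(COLUMNS):
            continue  # skip lines that do not match the assumed layout
        rows.append(fields)
    return pd.DataFrame(rows, columns=COLUMNS)


applications = parse_applications(RAW_TEXT)
print(applications)

# Writing the Excel file needs an engine such as openpyxl installed:
# applications.to_excel("internships.xlsx", index=False)
```

The resulting file can then be imported into Trifacta; the to_excel() call is left commented out because it depends on an Excel engine being installed.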

Initial spreadsheet format
The project could have been streamlined even further by starting directly in Trifacta, bypassing the need for Python altogether. Instead of using Python to convert the raw text to an Excel file, I could have imported the text directly into Trifacta and used its built-in tools to clean and structure the data right from the start. This would have saved time and simplified the workflow.
Additionally, I could have taken the process a step further by connecting Trifacta to another cloud-based platform like data.world. There, I could use SQL (or Python) to make more complex changes to the dataset. For example, I could have converted the string data in the 'Date Applied' column into proper date values, using pandas' to_datetime() in Python or a date-parsing function in SQL (the exact function depends on the SQL dialect), and then used strftime() to write the dates back out in one standard format. This approach would have provided more flexibility and power for handling and transforming the data within the cloud-based ecosystem.
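As a rough sketch of the date conversion in pandas: the sample values and the MM/DD/YYYY input format are assumptions, since the real contents of the 'Date Applied' column are not shown. pd.to_datetime() parses the strings, and .dt.strftime() formats the parsed dates back into a single consistent layout.

```python
import pandas as pd

# Hypothetical sample values; the real 'Date Applied' strings are assumed
# to follow MM/DD/YYYY.
df = pd.DataFrame({"Date Applied": ["01/15/2024", "02/03/2024", "not a date"]})

# Parse strings into datetime values; entries that do not match the format
# become NaT instead of raising an error.
df["Date Applied"] = pd.to_datetime(
    df["Date Applied"], format="%m/%d/%Y", errors="coerce"
)

# Format the parsed dates as ISO 8601 strings (YYYY-MM-DD).
df["Date Applied (ISO)"] = df["Date Applied"].dt.strftime("%Y-%m-%d")

print(df)
```

Using errors="coerce" is a design choice: it keeps malformed rows visible as missing values so they can be reviewed, where the default behavior would stop at the first bad entry.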

The goal of this project was to familiarize myself with the various AI tools available while demonstrating my skills in Python. I wanted to explore how these tools could streamline tasks and enhance my data science workflow. I first discovered Trifacta after a quick 11-minute conversation with ChatGPT, in which I asked for advice on integrating AI tools into my learning journey. ChatGPT recommended Trifacta and Paxata as useful platforms for data wrangling. It also mentioned that Trifacta was popular for its user-friendly interface, which led me to explore it further and formalize my internship tracking. This hands-on experience helped me better understand how AI-powered tools can simplify data processing and improve efficiency.

New spreadsheet format