Joinly — a job board aggregator post-mortem

Albert Pałka
6 min read · Mar 10, 2020


Imagine having the offers from all of your country’s popular job boards aggregated and displayed in one place. That’s what I did with Joinly: a small project that collected job offers from IT job boards in Poland and allowed people to search through them by city and title (e.g., JavaScript Developer).

When looking for a job, you browse various job boards to find and apply to a company you want to work for. I did that too. I was browsing the most popular job boards in Poland to find a job as a Project Manager. Unfortunately, it took a lot of time. It was also super hard for me to filter job offers by specific skills like Scrum, Kanban, or a given tech stack.

I decided to build such a tool myself. But because I am a newbie developer and not a seasoned professional, I didn’t know what I was getting myself into.

tl;dr version (for more info, continue reading past this section)

  1. Foreign keys and not null constraints are super important when handling data
  2. Ruby’s module is an extremely useful concept when handling multiple services within your app
  3. jsonb is a fast, convenient data type for storing parsed data in your tables
  4. Blindly following SOLID and DRY may not give you the results you expected (especially if you think you understand them, but you really don’t :D )

Useful Links:

  1. https://github.com/Casecommons/pg_search
  2. https://github.com/mislav/will_paginate
  3. https://github.com/ddnexus/pagy
  4. https://thoughtbot.com/blog/referential-integrity-with-foreign-keys
  5. https://thoughtbot.com/blog/avoid-the-threestate-boolean-problem
  6. https://nandovieira.com/using-postgresql-and-jsonb-with-ruby-on-rails
  7. My project on GitHub: https://github.com/albertpalka/joinit-project

Problem 1: Data Aggregation, Parsing, and Normalization

Even though I selected only three main job boards to scrape, I didn’t think much about storing and parsing the data, which became my number one issue.

At first, I thought creating one simple table to display specific jobs’ information would be enough. But what about archiving my data for future analysis? Or what if something went wrong during the parsing phase and my whole process died?

Also, all three services had different data structures: two of them exposed JSON API endpoints, and one I had to scrape manually using Nokogiri. On top of that, they used inconsistent naming, data types, and displayed values. This sample app became a nightmare.
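For the board without an API, the scraping part boils down to fetching the page and picking elements with CSS selectors. Here’s a minimal Nokogiri sketch; the URL and selectors are placeholders I made up for illustration, not the ones from my codebase:

require 'nokogiri'
require 'open-uri'

# Hypothetical scrape of an HTML-only job board; the URL and the
# CSS selectors below are illustrative placeholders.
doc = Nokogiri::HTML(URI.open('https://example-job-board.pl/jobs'))

doc.css('.job-offer').each do |node|
  title = node.css('.job-title').text.strip
  city  = node.css('.job-city').text.strip
  puts "#{title} (#{city})"
end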

Problem solution

After discussing the issue with my friend, an extremely talented Senior Engineer, I decided to go with the following approach:

  • I split my code into modules I called JustJoinIt, NoFluffJobs, and BulldogJobs
  • My services AND tables were put inside the corresponding modules like this:
module JustJoinIt
  class RawDatum < ApplicationRecord
    ...
  end
end
  • To make my life easier, I defined a table name prefix for each module, which lets Rails map the prefixed just_join_it_ tables to the short, namespaced JustJoinIt::RawDatum model. Sample prefix file:
module JustJoinIt
  def self.table_name_prefix
    'just_join_it_'
  end
end
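With the prefix in place, Rails derives the full table name on its own. A quick console check (assuming Rails’ standard inflection of datum to data) would show:

JustJoinIt::RawDatum.table_name
# => "just_join_it_raw_data"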

Problem 2: Writing SOLID and DRY Code

At first, I thought that because all good developers do it, I should adhere to the SOLID and DRY principles, creating the cleanest classes I could with reusable code. That led to a few days spent on code that didn’t work as expected, looked terrible, and gave me a headache every time I opened my code editor.

As a beginner dev, I tried applying things I didn’t fully understand in practice. Not a good move.

So I bothered my developer friends again. Here’s what we came up with.

Problem solution

  • We decided that it’s OK to have classes that are not DRY or SOLID. We wanted to finish the prototype as quickly as possible.
  • I still split everything into multiple classes, of course, but because I had a clear vision of what I wanted to achieve, and the platform itself wasn’t that complicated, it was OK to break some of the rules for now. We reused some methods and general ideas to keep the codebase consistent and not too complex.
  • However, if I ever decided to add more job boards, that would be a good time to refactor everything I wrote; sometimes finishing the project is much more important than pre-optimizing it for no reason.
  • I ended up with three services per module: fetching raw data, parsing job offers, and normalizing jobs (sketched below). However, after I made my code repository public, another friend suggested that I could keep it all in one class, since it wouldn’t be a big one and would still be readable enough. Lesson learned.
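For context, here is roughly what the per-module split looked like. The class names match the three responsibilities above, but the endpoint URL and the body column are assumptions for illustration; the real code is in the GitHub repo linked at the top:

require 'net/http'
require 'uri'

module JustJoinIt
  # Step 1 of 3: download the board's payload and persist it untouched,
  # so a failed parse can be retried without re-fetching.
  class FetchRawData
    ENDPOINT = 'https://example.com/api/offers'.freeze # placeholder URL

    def call
      response = Net::HTTP.get(URI(ENDPOINT))
      JustJoinIt::RawDatum.create!(body: response) # assumes a text column for the payload
    end
  end

  # ParseJobOffers and NormalizeJobs follow the same one-public-method
  # shape and are omitted here for brevity.
end

# A scheduled rake task can then chain the steps for each job board:
# JustJoinIt::FetchRawData.new.call
# JustJoinIt::ParseJobOffers.new.call
# JustJoinIt::NormalizeJobs.new.call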

Problem 3: Displaying Normalized Data

I knew from the get-go that I wanted to keep the application as simple as possible when it comes to front-end development. One page and that’s it. Creating a table with data fetched from the database wasn’t a problem.

But I wanted to add something cool to make it better — a working search by skills, not by job titles. That also didn’t go as planned.

At first, based on another suggestion, I tried to build a very simple Postgres full-text search by hand. Unfortunately, my Postgres knowledge wasn’t good enough to implement such a feature directly.

Also, I realized that the data from each job board differs, so I couldn’t easily normalize skills across all three sites.

Problem solution

  1. Since I’m working in Rails, there’s a gem for everything. pg_search saved my life and was easy to implement (see the sketch after this list). It’s not a perfect solution, but gems are not bad if used responsibly.
  2. Regarding the data, I decided to skip the skills search. It just wouldn’t work for such a small application and was too big a feature to do right.
  3. Instead, I accepted that I would let users search by job title and city (the two easy searches I wanted to avoid at first). To be completely honest, though, it worked!
  4. Because there were around 2,500 offers displayed at once, I added the will_paginate gem. Why did I choose it over the more popular pagy gem? Because it was easier for me to understand how to pass params on each search page:
<%= will_paginate @offers, params: {title: params[:title], city: params[:city]} %>
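To show how little code pg_search needs, here is a minimal sketch of the title search. The Offer model name and the scope options are my assumptions for illustration; the pg_search README linked above covers the full option list:

class Offer < ApplicationRecord
  include PgSearch::Model

  # tsearch prefix matching lets a query like "Java" also match
  # "JavaScript Developer" titles.
  pg_search_scope :search_by_title,
                  against: :title,
                  using: { tsearch: { prefix: true } }
end

# In the controller, combined with the city filter:
# @offers = Offer.search_by_title(params[:title]).where(city: params[:city])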

Problem 4: Data Integrity Issues

At this point, I was storing, parsing, normalizing, and displaying my data. But on the second day of running my test app, I noticed that some of the newly created data was missing, and errors were showing up in my Heroku logs.

This meant I had inconsistent data in my database. And since Joinly is a data aggregator, I couldn’t allow that to happen. So I started asking questions again.

Problem solution

  1. At first, I thought that catching exceptions would solve the problem (rescue being my biggest friend here). Even though this stopped my rake task from dying midway, I still had to tackle the data inconsistencies.
  2. This is where I learned A LOT about foreign keys, not null constraints, and transactions.
  3. Foreign keys are a concept thoroughly reviewed by the team at Thoughtbot: https://thoughtbot.com/blog/referential-integrity-with-foreign-keys
  4. Not null constraints are also covered in one of Thoughtbot’s blog posts: https://thoughtbot.com/blog/avoid-the-threestate-boolean-problem (a sample migration using both follows the code below)
  5. Transactions are fairly easy for me to explain: they are atomic operations on the database. If something fails during a process wrapped in a transaction, the table is rolled back to its state before the operation began. Here’s a sample directly from my codebase:
RAW_DATA_MODEL.parsed_offers.transaction do
  job_offers.each do |offer|
    parsed_offer = JSON.parse(offer)
    # create! raises on invalid data, rolling back the whole batch
    RAW_DATA_MODEL.parsed_offers.create!(body: parsed_offer)
  end
end
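And here is what a migration enforcing both constraints could look like. This is a sketch only: the migration class, Rails version, and column names are assumed from the prefix convention above, and the real migrations live in the GitHub repo:

class CreateJustJoinItParsedOffers < ActiveRecord::Migration[6.0]
  def change
    create_table :just_join_it_parsed_offers do |t|
      # Foreign key: an orphaned parsed offer can never be inserted.
      t.references :just_join_it_raw_datum, null: false, foreign_key: true
      # Not null + jsonb: every row must carry a parsed payload.
      t.jsonb :body, null: false, default: {}
      t.timestamps
    end
  end
end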
