[VIDEO] GZIP compress web page’s content and save in MySQL using GORM (Golang)

Kanan Rahimov
2 min read · Jan 3, 2021

In the attached video I discuss the following topics:

  1. Save links and images from the webpage.
  2. Mark a URL as complete in the pipeline once it is fully parsed (see the sketch after this list).
  3. Refactor: extract the text compressor into a separate function (similar to the decompressor).
  4. To-do: define a task for the “webpage data parser” worker.
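For item 2, here is a minimal sketch of what marking a URL complete could look like, assuming GORM v1 (github.com/jinzhu/gorm). The post does not show the pipeline's URL model, so the type and the Status column below are assumptions:

// URL is an assumed pipeline model; only the Status column matters here.
type URL struct {
	CreatedAt time.Time
	UpdatedAt time.Time
	URL       string `gorm:"index:idx_url;not null"`
	Status    string `gorm:"not null"`
}

// markURLComplete flips the pipeline status once a page is fully parsed.
func markURLComplete(db *gorm.DB, url string) error {
	return db.Model(&URL{}).
		Where("url = ?", url).
		Update("status", "complete").
		Error
}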

Compress using GZIP

We retrieve the web page content as a text body. Since we expect to save many URLs locally, it is worth compressing this data so it takes less storage. In my benchmarks, gzip saves 60–85% on average. See the video for examples.

I use gzip (compress/gzip) to compress the text. In this video, I refactored text compression into a separate function, then ran the whole pipeline to check that the webpage's data was fetched and saved in compressed form (a manual test).

Here is the main gzip function:

// gzipWrite compresses respBody with gzip and writes the result to w.
func gzipWrite(w io.Writer, respBody []byte) error {
	var err error
	gz := gzip.NewWriter(w)
	if _, err = gz.Write(respBody); err != nil {
		return err
	}
	// Close flushes buffered data and writes the gzip footer; skipping
	// it would leave a truncated stream.
	if err = gz.Close(); err != nil {
		return err
	}
	return nil
}
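The video also mentions a matching decompressor. Here is a minimal sketch of what that counterpart could look like (gunzipWrite is my name for it, not necessarily the author's; it needs compress/gzip and io):

// gunzipWrite reads gzip-compressed data from r and writes the
// decompressed text to w (the inverse of gzipWrite above).
func gunzipWrite(w io.Writer, r io.Reader) error {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return err
	}
	defer gz.Close()
	_, err = io.Copy(w, gz)
	return err
}

To check the 60–85% figure on your own pages, a small helper can report the savings (compressionRatio is hypothetical; it reuses gzipWrite and needs bytes):

// compressionRatio returns the fraction of bytes saved by gzip,
// e.g. 0.7 means the compressed body is 70% smaller than the original.
func compressionRatio(respBody []byte) (float64, error) {
	var buf bytes.Buffer
	if err := gzipWrite(&buf, respBody); err != nil {
		return 0, err
	}
	return 1 - float64(buf.Len())/float64(len(respBody)), nil
}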

GORM Models for URLLink and URLImage

In this section, we introduce two new models: URLLink and URLImage. These tables store the links and images referenced by a given web page.

// URLLink stores a single outgoing link found on the page at URL.
type URLLink struct {
	CreatedAt time.Time
	UpdatedAt time.Time
	DeletedAt *time.Time `sql:"index"`
	URL       string     `gorm:"index:idx_url;not null"`
	LinkURL   string     `gorm:"not null"`
	LinkTitle string
}

func (URLLink) TableName() string {
	return TablePrefix + "url_links"
}

// URLImage stores a single image found on the page at URL.
type URLImage struct {
	CreatedAt  time.Time
	UpdatedAt  time.Time
	DeletedAt  *time.Time `sql:"index"`
	URL        string     `gorm:"index:idx_url;not null"`
	ImageURL   string     `gorm:"not null"`
	ImageTitle string
}

func (URLImage) TableName() string {
	return TablePrefix + "url_images"
}
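With the models in place, here is a sketch of how they could be wired up, again assuming GORM v1 (github.com/jinzhu/gorm); savePageRefs is a hypothetical helper, not from the post:

// Run once at startup so the tables exist.
db.AutoMigrate(&URLLink{}, &URLImage{})

// savePageRefs persists the links and images parsed from pageURL.
func savePageRefs(db *gorm.DB, pageURL string, links []URLLink, images []URLImage) error {
	for _, l := range links {
		l.URL = pageURL // tie each row back to the source page
		if err := db.Create(&l).Error; err != nil {
			return err
		}
	}
	for _, img := range images {
		img.URL = pageURL
		if err := db.Create(&img).Error; err != nil {
			return err
		}
	}
	return nil
}

Both tables index the source URL (idx_url), so fetching everything parsed from one page stays a single indexed lookup.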

Originally published at https://kananrahimov.com on January 3, 2021.
