Question

The Case for a Native Athena AWS-Pigment Connector

  • 20 May 2024
  • 3 replies
  • 82 views

In our organization, we store data in S3 as Parquet files and work with Athena AWS. Because Pigment's import mechanism currently relies solely on integrated CSV files, we are compelled to maintain a process that converts our Parquet files into CSV. This is not a method our BD&D team favors, to say the least.

 

I can also elaborate on the pros of using the native connector to Athena AWS:

  1. Improved Efficiency: A native connector eliminates the need to convert parquet files to CSV for import into Pigment. This can significantly reduce the time and resources spent on file conversion, leading to improved efficiency in data operations.
  2. Enhanced Data Quality: Using a native connector can reduce the potential for data errors or loss that might occur during the file conversion process.
  3. Streamlined Process: Integration with Athena AWS via a native connector simplifies the data workflow, making it easier and quicker to get data from S3 storage into Pigment.
  4. Increased Flexibility: A native connector allows for more flexibility in handling data, as you can directly leverage the querying benefits of Athena AWS and the diverse data types it supports.
  5. Cost Savings: Direct integration with Athena AWS via a native connector bypasses the need to store and manage extra CSV files, potentially resulting in cost savings.
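
To illustrate point 2, here is a minimal sketch using only Python's standard library. The row and its field names are made up for the example. It shows how a CSV round trip flattens every value to text and turns a missing value into an empty string, whereas a Parquet file carries the column types in its schema.

```python
import csv
import io
from datetime import date

# Hypothetical typed row, as it might be read from a Parquet file.
row = {
    "account_id": 1042,
    "booked_on": date(2024, 5, 20),
    "amount": 1250.50,
    "note": None,  # missing value
}

# Write the row to an in-memory CSV file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)

# Read it back, as an importer that only sees the CSV would.
buffer.seek(0)
restored = next(csv.DictReader(buffer))

# Report the value and type that survive the round trip.
for name, value in restored.items():
    print(f"{name}: {value!r} ({type(value).__name__})")
```

Every value comes back as a string and the missing note becomes an empty string, so types and nulls have to be re-declared on the importing side.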

I am interested in understanding if there are other organizations, similar to ours, which use S3 for data storage as parquet files and work with Athena AWS, and feel the need for a native Athena AWS connector to Pigment.

As it stands, Pigment's import methodology is entirely reliant on integrated CSV files.

Do other companies share our belief that introducing a direct Athena AWS connector in Pigment would enhance its functionality and relevance?

Any shared experiences would be appreciated.


3 replies

Hi @Tomer.A ,

I suggest you post this in the Ideas section (https://community.gopigment.com/ideas) so that the Pigment team can evaluate the need for the connector and others can upvote it.
On my side, I’ve never used AWS with Pigment, but I would say the more connectors we have, the better.

Hi, 

Thank you Clément!

Indeed, the Ideas section looks like the right way both to collect upvotes and other members’ input and to raise the idea with our Product Team.

 

Just to understand a bit better: you’re saying that the current connector using CSVs isn’t a solution for you, but it still works, right?

 

Thanks,

Hi,

We've set up a temporary fix that requires us to set up each integration in GitLab manually. Any time we update a query, it has to be approved by our busy BD&D team, and we have to do some extra work ourselves. For example, a recent change took a whole week to get checked off. Despite these delays, the BD&D team has been very understanding and helpful.
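
For readers unfamiliar with this kind of workaround, one possible shape of the conversion step is sketched below. It is an assumption for illustration only, not necessarily the pipeline described in this thread: it wraps a query in an Athena `UNLOAD` statement that writes comma-delimited text files to S3. The helper name, table, and bucket are hypothetical, and the sketch only builds the SQL string; it does not call AWS.

```python
def build_unload_statement(select_sql: str, s3_prefix: str) -> str:
    """Wrap a SELECT in an Athena UNLOAD that writes comma-delimited text files."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must be an s3:// URI")
    if "'" in s3_prefix:
        raise ValueError("s3_prefix must not contain single quotes")
    # Drop surrounding whitespace and a trailing semicolon so the query nests cleanly.
    query = select_sql.strip().rstrip(";")
    return (
        f"UNLOAD ({query})\n"
        f"TO '{s3_prefix}'\n"
        "WITH (format = 'TEXTFILE', field_delimiter = ',')"
    )


# Hypothetical table and bucket names, for illustration only.
statement = build_unload_statement(
    "SELECT account_id, booked_on, amount FROM finance.bookings;",
    "s3://example-bucket/pigment-exports/bookings/",
)
print(statement)
```

Each query maintained this way is one more artifact to review and approve, which is the overhead described above.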

 

As I've said, our BD&D team prefers not to continue with this workaround. Because of it, they had to come up with new ways of doing things in a very short time. We only thought of this as a quick fix, expecting that the direct connection to Athena would be ready sooner (we understood it would be part of the current quarter's scope, but they decided not to prioritize it because of the workaround). We want to avoid being stuck using CSV files in S3 for long, and we don't want to invest more resources in this workaround.

 

There are several disadvantages to using CSV files compared to Parquet files, especially in terms of storage, efficiency, and compatibility.

 

An alternative would be for Pigment to be able to transform Parquet files into CSV format within their platform. This way, we could continue to use the S3 native connector and drop the manual process on our side.

 

Thanks,
