Hi, I'm Sebastian Granda (sgg10) and I'm a Data Engineer & Architect.
I'm a Systems Engineer from EAFIT University (Medellín, Colombia) with a passion for building efficient and scalable data architectures. I'm also deeply driven by the opportunity to create and contribute to impactful projects, turning data into actionable insights and fostering organizational growth.
With a career spanning multiple roles, I started as a Fullstack Developer, mastering both frontend and backend technologies, as well as DevOps. Over the years, I've specialized in Data Engineering, eventually transitioning to my current role as a Data Architect. My work often involves leveraging cloud solutions, particularly AWS services, to design and implement robust, scalable systems.
In addition to my professional work, I have a strong interest in trading. I develop trading bots (Expert Advisors) and custom indicators to automate strategies, manage risk, and enhance profitability.
In short, I like to build things, and I like having the skills needed to create, solve, and make an impact from whatever role or perspective is required.
My Skills
Programming Languages
Databases
Cloud Providers
Some technologies, frameworks and tools
AWS Services
Some of my Public Libraries
In this section I'd like to share projects I built to solve real problems I've faced, generalized so they can fit other scenarios. I hope some of them are not only interesting but also useful to you.
DynaFlow: A Dynamic Workflow Execution Tool for Python
The Problem:
In one of my projects, I faced a challenging scenario: processing data through workflows that varied dynamically depending on legal regulations. These workflows involved:
- Different validation orders.
- Adding or removing steps.
- Specific requirements based on the governmental entities involved.
- New ways to perform the same tasks depending on updated regulations.
Moreover, depending on whether the data was historical or current, a different workflow needed to run. Implementing this was a cumbersome and hard-to-maintain process.
The Solution:
Inspired by AWS Step Functions, I created DynaFlow, a Python library that allows you to:
- Define workflows in JSON (based on the Amazon States Language, ASL).
- Provide a custom function catalog.
- Use a search function to dynamically locate the required functions.
This enables flexible and adaptive execution of workflows, simplifying even the most complex processes.
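To make the idea concrete, here is a minimal, self-contained sketch of the pattern, written in plain Python rather than DynaFlow's actual API: an ASL-inspired workflow described as data, a function catalog, and a tiny interpreter that resolves each task's function at runtime. The state names, catalog entries, and `run` helper are all hypothetical.

```python
# Minimal sketch of the pattern behind DynaFlow (NOT its real API):
# an ASL-style workflow defined as data, plus a catalog mapping task
# names to callables, so the process can change without redeploying code.

# Hypothetical function catalog: task name -> callable
CATALOG = {
    "validate_schema": lambda record: {**record, "schema_ok": True},
    "check_regulation": lambda record: {**record, "regulation_ok": True},
}

# ASL-inspired workflow definition; in practice this would be JSON loaded from a database
WORKFLOW = {
    "StartAt": "ValidateSchema",
    "States": {
        "ValidateSchema": {"Type": "Task", "Resource": "validate_schema", "Next": "CheckRegulation"},
        "CheckRegulation": {"Type": "Task", "Resource": "check_regulation", "End": True},
    },
}

def run(workflow, record, catalog):
    """Walk the states in order, resolving each task's function from the catalog."""
    state_name = workflow["StartAt"]
    while True:
        state = workflow["States"][state_name]
        record = catalog[state["Resource"]](record)  # dynamic lookup, the "search function" idea
        if state.get("End"):
            return record
        state_name = state["Next"]

print(run(WORKFLOW, {"record_id": 1}, CATALOG))
```

With this shape, reordering validations or adding a step is an edit to the JSON definition, not a code change.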
Key Benefits:
- Reduced operational complexity: A single Docker image can handle multiple workflows.
- Flexibility: Workflows are stored as JSON in databases, eliminating the need for constant redeployments.
- Scalability: Adding new capabilities is as simple as updating the function catalog.
The Outcome:
With DynaFlow, I optimized process execution in AWS Batch, significantly reducing deployment times and simplifying the management of workflow changes. This tool is perfect for environments with ever-evolving requirements, such as legal regulations or customized business processes.
P.S.
I've frequently mentioned a function catalog: essentially a Python dictionary storing functions and their versions. While you could build one manually, it can be a bit tedious.
But what if there were another library that streamlined this process, allowing you to focus solely on programming?
Well, check out my next project, "Function Registry," and see how it can make your life easier!
Function Registry: Manage Your Functions Like a Pro!
Function Registry was born out of a real need during a challenging data processing project. I faced the issue of executing workflows with different validations depending on the legal framework in effect. Each framework required specific versions of functions, which inevitably complicated name management and created unnecessary confusion.
To solve this, I developed Function Registry, a simple yet powerful library designed to manage multiple versions of functions in a clear and centralized way.
This tool was created as a complement to DynaFlow, my solution for dynamic workflow execution. While both tools work completely independently, they form a perfect duo: DynaFlow executes workflows, and Function Registry organizes and versions the functions used in those workflows.
What Does Function Registry Do?
- Versioning: Register multiple versions of a function using:
  - Sequential versioning (1, 2, 3...).
  - Semantic versioning (1.0.0, 1.1.0, 2.0.0...).
- Metadata: Each version can include additional information such as author, date, or any relevant data.
- Advanced Searches: Find functions by name, specific version, or even using custom metadata searches. For example: "Find the version where the author is 'John Doe'."
- Easy to Use: Simply decorate your function with `@fr.save_version("name", version)` and you're good to go! No more headaches.
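To show the idea in a few lines, here is a minimal, self-contained sketch. The `save_version` decorator mirrors the usage above, but the registry class, the `get` lookup, and the metadata `find` are illustrative assumptions, not the library's documented API.

```python
# Minimal sketch of the concept behind Function Registry; the class, get(), and
# find() shown here are illustrative assumptions, not the library's real API.

class MiniRegistry:
    def __init__(self):
        self._functions = {}  # {name: {version: (func, metadata)}}

    def save_version(self, name, version, **metadata):
        """Decorator that registers a function under a name and version."""
        def decorator(func):
            self._functions.setdefault(name, {})[version] = (func, metadata)
            return func
        return decorator

    def get(self, name, version):
        """Return the callable registered for (name, version)."""
        return self._functions[name][version][0]

    def find(self, name, **criteria):
        """Return the versions of `name` whose metadata matches all criteria."""
        return [version for version, (_, meta) in self._functions.get(name, {}).items()
                if all(meta.get(key) == value for key, value in criteria.items())]

fr = MiniRegistry()

@fr.save_version("validate_record", 1, author="sgg10")        # sequential versioning
def validate_v1(record):
    return "id" in record

@fr.save_version("validate_record", "2.0.0", author="sgg10")  # semantic versioning
def validate_v2(record):
    return "id" in record and "legal_framework" in record

# Pick the right implementation at runtime instead of memorizing function names
print(fr.get("validate_record", "2.0.0")({"id": 1, "legal_framework": "2024"}))
print(fr.find("validate_record", author="sgg10"))              # metadata-based search
```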
How Did Function Registry Begin?
It all started with the development of DynaFlow. The issue was managing multiple ways to perform the same validation, depending on the legal framework or specific requirements. For example:
- A function for historical validations.
- Another for current data.
- A new, optimized version for future changes.
Manually remembering which function applied to each context was unnecessary chaos. Then came the idea: "Why not create a centralized system to manage functions and their versions, with metadata and custom searches?" And so, Function Registry was born: a library that solved this problem once and for all.
Why Use Function Registry?
If your team manages multiple function versions for different workflows or contexts, this library will be your best ally. It saves time and reduces errors by centralizing version management in one place. Plus, you can easily integrate it with tools like DynaFlow to maximize productivity.
P.S.
If you're interested in DynaFlow, don't worry about manually creating the function catalog. Use Function Registry and simplify your workflow even further. It's all about making things easier and faster!
DocScribe: Bringing Your Documentation to Life
Documenting is tedious, but it's essential, especially for roles like architects or team leads who need precise, up-to-date information. That's where DocScribe comes in: a CLI-powered Python library designed to make documentation alive and automated!
How Did DocScribe Start?
While working on complex data architectures, I faced a challenge: keeping documentation up-to-date without adding extra manual work. Static documentation always falls behind reality.
What if documentation could update itself dynamically based on scripts that connect to APIs, scan repositories, or analyze code? This idea led to DocScribe, a tool designed to turn documentation into a living, automated process.
What is DocScribe?
DocScribe is a CLI tool that allows you to:
- Initialize a local or external repository for storing documentation templates.
- Create templates in Markdown (`.md`), Word (`.docx`), or plain text (`.txt`).
- Collaborate via external S3 repositories, enabling team members to share templates or final documents.
- Customize and automate document generation using Python scripts:
  - Each template has its template file (Markdown, Word, or Text), a `script.py` (user-defined logic), and a `config.json` (metadata, input parameters, and schema validation).
How Does DocScribe Work?
1. Initialize DocScribe:
Start by initializing DocScribe with a local repository or connecting to external S3 repositories.
2. Create or Import Templates:
Use the CLI to create new templates or import existing ones from external sources. Each document structure consists of:
- `template.(md|docx|txt)`: the file defining the structure.
- `script.py`: a Python script that generates the JSON data to populate the template.
- `config.json`: defines inputs, schema validation, and dependencies.
3. Write Your Script:
Scripts can do anything: call APIs, scan repositories, or dynamically generate data (a small, hypothetical example follows these steps).
DocScribe ensures scripts meet the schema defined in `config.json`.
4. Render the Document:
Run the CLI to generate the document:
- Input parameters can be defaulted or customized on the fly.
- Dependencies (like `faker`) are auto-installed via your preferred package manager (`pip` or `pipenv`).
- Choose to export locally or to an external S3 repository.
5. Enjoy a Live Document!
DocScribe dynamically populates your template with the output of `script.py`.
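To ground step 3, here is a hypothetical `script.py` for the changelog use case mentioned below. The README only requires that the script produce JSON data matching the schema declared in `config.json`; the function name, the git command, and the output shape are assumptions for illustration.

```python
# Hypothetical script.py for a changelog template. How DocScribe invokes the script
# and names its entry point is assumed here; the grounded requirement is only that
# the output be JSON data matching the schema declared in config.json.

import json
import subprocess

def generate_changelog_data(since_tag="v1.0.0"):
    """Collect commit subjects since a tag to populate a changelog template."""
    log = subprocess.run(
        ["git", "log", f"{since_tag}..HEAD", "--pretty=format:%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    commits = [line for line in log.splitlines() if line.strip()]
    # This structure would have to match the schema declared in config.json.
    return {"since": since_tag, "commits": commits}

if __name__ == "__main__":
    print(json.dumps(generate_changelog_data(), indent=2))
```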
Key Features
- Dynamic Document Generation: Automate tedious documentation tasks with custom scripts.
- Multi-Format Support: Work with Markdown, Word, or plain text templates.
- Schema Validation: Ensure consistent outputs with JSON schema validation.
- Dependency Management: Auto-install dependencies (`faker`, `boto3`, etc.) for your scripts.
- Repository Integration: Store documents locally or in S3 (with more integrations planned).
- Pipeline-Ready: Easily integrate DocScribe into CI/CD pipelines.
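As a hedged illustration of the schema-validation feature, here is how a script's output could be checked against a schema like the one `config.json` might declare, using the `jsonschema` package. The schema and sample output are invented; DocScribe's internal mechanism may differ.

```python
# Illustration of the schema-validation idea with the jsonschema package.
# The schema and sample output are made up; DocScribe's internals may differ.
from jsonschema import ValidationError, validate

schema = {
    "type": "object",
    "properties": {
        "since": {"type": "string"},
        "commits": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["since", "commits"],
}

output = {"since": "v1.0.0", "commits": ["fix: handle empty input", "feat: add S3 export"]}

try:
    validate(instance=output, schema=schema)  # raises ValidationError if the output drifts
    print("script output matches the declared schema")
except ValidationError as err:
    print(f"schema violation: {err.message}")
```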
Example Use Cases
- Changelog Generation: Automatically generate changelogs for commits in a repository.
- Cloud Inventory: Scan AWS Lambda functions and dynamically document runtime, configuration, and names.
- Security Reports: Use scripts to analyze vulnerabilities in project files and output a report.
- Data Insights: Generate data summaries or insights and format them into professional reports.
Why DocScribe?
DocScribe bridges the gap between automation and documentation, turning what was once a chore into an effortless process. By empowering users with Python scripting and dynamic templates, DocScribe ensures your documentation is always relevant and alive.
Whether you're generating security audits, cloud inventories, or changelogs, DocScribe adapts to your needs, making it an indispensable tool for architects, developers, and engineers.