Workflow management with Apache Airflow

Some useful resources about Airflow: ETL best practices with Airflow; a series of articles about Airflow in production: Part 1 - about use cases and alternatives, Part 2 - about alternatives (Luigi and Pinball), Part 3 - key concepts, Part 4 - deployment, issues; more notes about production. About start_date: Why isn’t my task getting scheduled? One cannot specify datetime.now() as start_date. Instead, one has to provide a fixed datetime object without a timezone. Probably UTC time is required....
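To illustrate the start_date note, here is a minimal sketch in plain Python. It assumes the commonly documented rule that the scheduler triggers the first run only after start_date plus one schedule interval has passed. The helper `first_run_due` and the constants are hypothetical names used for illustration. They model the rule and are not Airflow code.

```python
from datetime import datetime, timedelta

# A fixed, naive datetime (assumed here to be interpreted as UTC).
START_DATE = datetime(2018, 2, 1)
SCHEDULE_INTERVAL = timedelta(days=1)

# Arguments in the shape usually passed to a DAG as default_args.
default_args = {"owner": "airflow", "start_date": START_DATE}


def first_run_due(start_date, interval, now):
    """Simplified model: the first run is due once one full interval has elapsed."""
    return now >= start_date + interval


# With a fixed start_date the first run becomes due one interval later.
print(first_run_due(START_DATE, SCHEDULE_INTERVAL, datetime(2018, 2, 2)))

# With start_date = "now", re-evaluated on every check, the run is never due.
now = datetime(2018, 2, 6, 12, 0)
print(first_run_due(now, SCHEDULE_INTERVAL, now))
```

If start_date is re-evaluated as the current time whenever the DAG file is parsed, the moment "start_date + interval" keeps moving into the future, which would explain why the task never gets scheduled.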

February 6, 2018 · SergeM

Datetime in Python

Conversion from calendar week to date: Sometimes one has to convert a date written as year, calendar week (CW), and day of week to an actual date with month and day. The behaviour at the beginning/end of a year may not be straightforward. For example, according to ISO 8601, the Monday of CW 1 of 2019 is 31 December 2018. As far as I can see there is no standard function for this conversion in Python....
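A minimal sketch of the conversion, assuming Python 3.8 or newer, where `date.fromisocalendar` is available in the standard library (it did not exist when this post was written). The wrapper name `date_from_calendar_week` is hypothetical.

```python
from datetime import date


def date_from_calendar_week(year, week, weekday):
    """Convert ISO 8601 (year, calendar week, weekday) to a date.

    weekday follows ISO numbering: 1 = Monday ... 7 = Sunday.
    """
    return date.fromisocalendar(year, week, weekday)


# Monday of CW 1 of 2019 falls in the previous calendar year.
monday = date_from_calendar_week(2019, 1, 1)
print(monday.isoformat())

# The reverse direction is date.isocalendar().
print(tuple(monday.isocalendar()))
```

On older interpreters (3.6+), `datetime.strptime` with the `%G %V %u` directives is an alternative way to parse ISO year, week, and weekday.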

January 17, 2018 · SergeM

Flask and SQLAlchemy explained

Awesome explanation of SQLAlchemy with examples and comparison to Django by Armin Ronacher: SQLAlchemy and You. Flask-SQLAlchemy module: Flask-SQLAlchemy is an extension for Flask that adds support for SQLAlchemy to your application. How to add SQLAlchemy to a Flask application:

    from flask import Flask
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    # configuration of the DB is read from flask configuration storage
    app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/test.db'
    # here we define db object that keeps track of sql interactions
    db = SQLAlchemy(app)

Now we are ready to define tables and objects using predefined db....

January 11, 2018 · SergeM

Ubuntu/linux settings

Some settings I find useful for a workstation. CPU monitoring on the main panel: The default Ubuntu desktop finally seems to have become convenient enough for me, starting from Ubuntu 18.04. Only a few tweaks are missing. A constantly available CPU/Mem/HDD/Network monitor is one of them. Here is how to install a small widget for the top panel in the default GNOME desktop environment: sudo apt-get install gir1.2-gtop-2.0 gir1.2-networkmanager-1.0 gir1.2-clutter-1.0 Then go to Ubuntu Software and search for system monitor extension....

January 11, 2018 · SergeM

Pytest cheatsheet

Pytest is a powerful tool for testing in Python. Here are some notes from hands-on experience. Running tests in pytest with/without a specified mark: Execute pytest -m "integration" if you want to run only tests that have the “@pytest.mark.integration” annotation. Similarly, you can run only tests that are not marked with a given mark: pytest -m "not your_mark" That command will run every test that is not marked as “your_mark”. How to verify exception message using pytest: One can use context manager pytest....
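A minimal sketch of checking an exception message, assuming the context manager meant here is `pytest.raises`. The function `parse_age` and the test name are hypothetical and exist only for this example.

```python
import pytest


def parse_age(value):
    """Hypothetical function under test: rejects negative ages."""
    age = int(value)
    if age < 0:
        raise ValueError(f"age must be non-negative, got {age}")
    return age


def test_parse_age_rejects_negative():
    # match= is applied as a regular expression search on the message.
    with pytest.raises(ValueError, match="must be non-negative") as excinfo:
        parse_age("-5")
    # The captured exception is available for further checks.
    assert "got -5" in str(excinfo.value)
```

Adding the decorator `@pytest.mark.integration` to such a test would make it selectable with the `-m "integration"` option described above.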

January 2, 2018 · SergeM

how to break a programmer

How to make an engineer accept your crazy idea: Invite him/her to a 2-6 hour meeting. It may also be several meetings, but within one day. The important meeting has to be the last in the row. Don’t send an agenda beforehand. They might prepare themselves. It is better to be spontaneous. Speak loudly. The louder you are, the smarter you seem. Don’t bring any objective proofs or measurements that support your idea....

December 21, 2017 · SergeM

about management

Why your programmers just want to code, by @justzeros. An interesting list of managers’ readmes: 12 “Manager READMEs” from Silicon Valley’s Top Tech Companies. Michael Lopp, How to Rands: feels like a personality really similar to mine: I am an introvert and that means that prolonged exposure to humans is exhausting for me. Weird, huh? Meetings with three of us are perfect, three to eight are ok, and more than eight you will find that I am strangely quiet....

December 2, 2017 · SergeM

why does your data science project fail again

Your data scientists aren’t real scientists: For example, science differs from non-science by writing down the results. Maybe your data scientist just talks about the awesomeness of his new model and doesn’t provide you with written reports and measurements? Then you will probably go in circles or to a dead end. No progress proven == no need for changes at all. You don’t perform reproducible experiments...

October 29, 2017 · SergeM

Sample project with Ember and Flask

I want to use EmberJS with a Flask application. Flask will provide an API. The Ember frontend will consume and display data from the Flask backend. Let’s say we want our frontend to display a list of users of the system. We will have a main page and a users page in our frontend. On the users page the client will see a list of users that we get from the backend. Source code is here: ember_flask_example...
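A minimal sketch of what the Flask side of such a setup could look like. The route `/api/users`, the in-memory `USERS` list, and the payload shape are assumptions for illustration and are not taken from the linked repository. The exact JSON format that an Ember adapter expects is not covered here.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory data; a real backend would query a database.
USERS = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]


@app.route("/api/users")
def list_users():
    # Wrap the list in an object so the payload has a named top-level key.
    return jsonify({"users": USERS})


if __name__ == "__main__":
    # Development server only; the frontend would request /api/users from it.
    app.run(debug=True)
```

The endpoint can be exercised without a running server through `app.test_client().get("/api/users")`.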

October 25, 2017 · SergeM

Speed up make/cmake build with multiprocessing

Build time can be reduced by using multiple cores of your processor. Parallelization for make: make -j8 Parallelization for cmake: cmake --build <bindir> -- -j 8 Sometimes you cannot pass parameters directly to make. For example, you are installing a Python module using python setup.py install and the setup script doesn’t accept such parameters. Then you can pass them through an environment variable: export MAKEFLAGS="-j 8" See also: intro to CMake video (in Russian).
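When the build is launched from a Python script rather than a shell, the same environment variable can be set programmatically. This is a minimal sketch. The helper name `parallel_build_env` is hypothetical, and defaulting to `os.cpu_count()` is an assumption rather than something the post prescribes.

```python
import os


def parallel_build_env(jobs=None, base_env=None):
    """Return a copy of the environment with MAKEFLAGS set for a parallel build."""
    env = dict(os.environ if base_env is None else base_env)
    # Fall back to the number of CPUs, or 1 if it cannot be determined.
    jobs = jobs or os.cpu_count() or 1
    env["MAKEFLAGS"] = f"-j {jobs}"
    return env


env = parallel_build_env(jobs=8)
print(env["MAKEFLAGS"])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run`, for example when invoking `python setup.py install` from a script.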

July 29, 2017 · SergeM