
CI: Move CircleCI job to Azure pipelines #23821


Closed
datapythonista opened this issue Nov 20, 2018 · 16 comments
Labels
CI Continuous Integration good first issue


@datapythonista
Member

We've been moving our CI from other systems to Travis and Azure Pipelines, which makes things simpler: we don't need to maintain several CI configuration files, or check for CI errors and logs in different environments.

We have almost everything in those two systems now, except one last job that is in CircleCI. This job requires PostgreSQL and MySQL databases, which haven't been configured in Azure Pipelines yet, so some research on how to set them up will be required.

In this issue we should create a new job in Azure Pipelines (ci/azure/linux.yml) equivalent to the one in CircleCI, and remove all the CircleCI files (.circleci).

CC: @TomAugspurger

@alexander-ponomaroff
Contributor

I would like to work on this.

@datapythonista
Member Author

That would be great, thanks!

@alexander-ponomaroff
Contributor

Hey @datapythonista, I’ve been doing some research for this issue and would like to ask for some tips. From what I understand of .circleci, the PostgreSQL and MySQL databases are configured through Docker. When I looked into doing this in Azure Pipelines, I found documentation on how to set up MySQL and PostgreSQL natively on a Linux/Windows/Mac Azure virtual machine. I also found a way to build images with Docker, as is done in the .circleci folder; however, I need to log in with an ID and password by running the “docker login” command. Source: https://docs.microsoft.com/en-us/azure/devops/pipelines/languages/docker?view=vsts&tabs=yaml.

Sorry if I’m misunderstanding the task; this is my first time working with CI and Docker, but I would like to learn and explore this. I would be grateful for any tips you have.

@datapythonista
Member Author

Unfortunately I'm not an expert in Azure Pipelines or Docker myself. Setting up the db servers natively sounds good, unless it's slow. If you want to explore that option, that would be great. I'd only check the Docker option if the native one turns out to be too complex or too slow. But if you want to check it, we can look into the login you need.

@alexander-ponomaroff
Contributor

@datapythonista I will try to explore both options and update you on which one I decide to use. If you know somebody who may be able to give some tips on this, please tag them if possible.

@datapythonista
Member Author

I haven't worked with db tests, but to provide some context to the best of my knowledge, in pandas/tests/io/test_sql.py there are some tests that connect to the different database flavors, and they are skipped if the driver modules (e.g. pymysql, psycopg2) are not installed.
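As a rough illustration of the driver-based skipping described above, here is a minimal sketch that skips a test when a driver module cannot be found. `skip_if_no_driver` is a hypothetical helper, not the actual code in pandas/tests/io/test_sql.py (which may well use pytest's own `pytest.importorskip` instead); it relies on `importlib.util.find_spec` and on the fact that pytest reports a raised `unittest.SkipTest` as a skip.

```python
import importlib
import importlib.util
import unittest


def skip_if_no_driver(module_name):
    """Import and return `module_name`, or skip the calling test if it is missing.

    Hypothetical helper; intended for top-level module names such as
    "pymysql" or "psycopg2".
    """
    if importlib.util.find_spec(module_name) is None:
        # pytest reports unittest.SkipTest as "skipped", not as a failure.
        raise unittest.SkipTest(f"{module_name} is not installed")
    return importlib.import_module(module_name)
```

A test would call, for example, `pymysql = skip_if_no_driver("pymysql")` as its first line, so builds whose ci/deps file does not install the driver simply report the test as skipped.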

So they will only run in some of our builds (one of them being the one in CircleCI). I was just checking for pymysql, and I see that there are actually some builds in Azure that should be running the sql tests:

$ grep "pymysql" ci/deps/*
azure-27-compat.yaml:    - pymysql==0.6.0
azure-36-locale_slow.yaml:  - pymysql
azure-37-locale.yaml:  - pymysql
circle-36-locale.yaml:  - pymysql
travis-27.yaml:  - pymysql=0.6.3
travis-36-slow.yaml:  - pymysql
travis-36.yaml:  - pymysql

I'm not quite sure if there is some other reason for which the tests that do hit the db are being skipped, or whether we already have db servers working in Azure (maybe they are provided by default).

@TomAugspurger could you let us know why MySQL and PostgreSQL servers are set up in CircleCI, but not in Azure, if both seem to be running the pandas sql tests?

@alexander-ponomaroff, unrelated to that, note that there are some small changes to the CircleCI config in #23866. They shouldn't affect you much, but I'm letting you know so you're aware of them.

@TomAugspurger
Contributor

> I'm not quite sure if there is some other reason for which the tests that do hit the db are being skipped.

The tests are skipped when we can't connect to the pandas_nosetest database. I'm not sure exactly where in the class hierarchy that's done, though. So I don't think all the sql tests are running on Azure.
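The connection-based skip can be sketched the same way: try to open a connection and skip the test if that fails. This is a hypothetical helper (`connect_or_skip`), not the actual pandas test code. `connect` stands in for whatever zero-argument callable opens the connection (for example the `connect` method of a SQLAlchemy engine), and the message only mimics the "can't connect to mysql server" skips reported on Azure.

```python
import unittest


def connect_or_skip(connect, flavor):
    """Return an open connection, or skip the test if the server is unreachable.

    `connect` is any zero-argument callable that opens a connection;
    `flavor` (e.g. "mysql", "postgresql") is only used in the skip message.
    """
    try:
        return connect()
    except Exception as exc:  # driver errors differ by flavor, so catch broadly
        raise unittest.SkipTest(f"can't connect to {flavor} server") from exc
```

Catching a broad `Exception` keeps the sketch flavor-agnostic; a real suite would probably narrow it to the driver's or SQLAlchemy's operational error so that genuine bugs are not hidden as skips.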

@TomAugspurger
Contributor

If you go to https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=4052&view=ms.vss-test-web.test-result-details and filter the files for e.g. TestPostgreSQLAlchemy, you'll see skips like

<pandas.tests.io.test_sql.TestMySQLAlchemy object at 0x7f2eb6d6fc50> - can't connect to mysql server

@datapythonista
Member Author

That's good to know, I didn't see it when checking the code.

What I'd do for now is move the CircleCI build to Travis, where the db is already set up. Then, in a separate PR, maybe move something that doesn't use the db from Travis to Azure, so Travis builds finish faster.

How does this sound?

@TomAugspurger
Contributor

TomAugspurger commented Nov 26, 2018 via email

@datapythonista
Member Author

@alexander-ponomaroff, if what I said makes sense to you, can you give it a try? I think having 2 CI systems instead of 3 will make things simpler, both for the setup and when checking the results in a PR.

Related to this, I created this issue: #23928

@alexander-ponomaroff
Contributor

@datapythonista I will be looking into this today and tomorrow. Thanks.

@alexander-ponomaroff
Contributor

alexander-ponomaroff commented Nov 30, 2018

@datapythonista Do I need to move the whole CircleCI build that happens in install_circle.sh into Travis? Or only the specific tests that run in CircleCI but not already in Travis? The latter would mean going through all the TravisCI files and comparing them to what's going on in CircleCI.

  - run:
      name: build
      command: |
        ./ci/circle/install_circle.sh
        export PATH="$MINICONDA_DIR/bin:$PATH"
        source activate pandas-dev
        python -c "import pandas; pandas.show_versions();"

This is the CircleCI build step that runs install_circle.sh. Should I move all of this over to Travis for now?
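For the second option, finding which dependencies the CircleCI build installs that no Travis build already does, a rough comparison can be scripted. This sketch is hypothetical: it assumes the ci/deps/*.yaml files list one dependency per `- name` line, as in the grep output earlier in this thread, and it strips version pins with a naive regex instead of a real YAML or conda spec parser.

```python
import re
from pathlib import Path

# Matches list entries such as "  - pymysql" or "  - pymysql==0.6.0" and
# captures only the package name (it stops at "=", "<", ">", ":" etc.).
_DEP_LINE = re.compile(r"^\s*-\s+([A-Za-z0-9_.\-]+)")


def dependency_names(path):
    """Return the set of package names listed in a conda env file (naively)."""
    names = set()
    for line in Path(path).read_text().splitlines():
        match = _DEP_LINE.match(line)
        if match:
            names.add(match.group(1).lower())
    return names


def only_in(circle_file, travis_files):
    """Return, sorted, the dependencies in `circle_file` that no Travis file lists."""
    covered = set()
    for path in travis_files:
        covered |= dependency_names(path)
    return sorted(dependency_names(circle_file) - covered)
```

For example, `only_in("ci/deps/circle-36-locale.yaml", sorted(Path("ci/deps").glob("travis-*.yaml")))` would list the packages that only the CircleCI build installs, under the file-layout assumption above.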

@datapythonista
Member Author

I don't think there would be significant differences. I'm working on other PRs to standardize the scripts used by the CI, so we can always use the same ones, just with different parameters. I didn't check the CircleCI ones in detail, but install_travis is not very different from the setup_conda_environment used in Azure, and the installation for Circle looks very similar to install_travis.

So, in my opinion, the end result (not of this PR alone, but after all the cleanup) should be a single installation script used from any CI system. If some builds need to install the postgres client and others don't, I think the script should take a parameter, or use an environment variable, so that each call can set whether we want to install it or not.
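A minimal sketch of the environment-variable approach, written in Python for illustration even though the CI scripts themselves are shell: one function decides what to install, so every CI system can call the same script and only its configuration changes. The variable name `INSTALL_DB_CLIENTS` and the package lists are hypothetical, not settings pandas actually uses.

```python
import os

# Packages every build needs (illustrative names only).
BASE_PACKAGES = ["python", "numpy", "pytest"]
# Extra packages for builds that run the SQL tests against real servers.
DB_PACKAGES = ["postgresql", "psycopg2", "pymysql"]


def packages_to_install(environ=None):
    """Return the packages to install for this CI call.

    Reads the hypothetical INSTALL_DB_CLIENTS variable; "1", "true" or
    "yes" (case-insensitive) add the database clients.
    """
    environ = os.environ if environ is None else environ
    flag = environ.get("INSTALL_DB_CLIENTS", "").strip().lower()
    packages = list(BASE_PACKAGES)
    if flag in {"1", "true", "yes"}:
        packages += DB_PACKAGES
    return packages


if __name__ == "__main__":
    # Each CI config sets the variable; the script itself stays the same.
    print(" ".join(packages_to_install()))
```

Passing `environ` explicitly keeps the function easy to test, while the default of `os.environ` matches how a CI job would set the variable before calling the script.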

In this PR, if you're able to use the Travis scripts with the required modifications, that would be great.

@datapythonista
Member Author

Just saw that #23924 has been merged, which may slightly affect you, as it standardizes the two testing scripts used in the CI into one.

@datapythonista
Member Author

Finally moved to Travis in #24449, and the number of jobs was rebalanced later in #24460.

3 participants