Orient Me and mongoDB connection failures

I have been banging against a mongoDB wall for a good few days as explained in another post but I’m slowly getting there. The problem I was facing was that the migration application in the people-migrate container wasn’t working.

# npm run start migrate
npm info it worked if it ends with ok
npm info using npm@3.10.8
npm info using node@v6.9.1
npm info lifecycle people-datamigration-service@0.0.1~prestart: people-datamigration-service@0.0.1
npm info lifecycle people-datamigration-service@0.0.1~start: people-datamigration-service@0.0.1

> people-datamigration-service@0.0.1 start /usr/src/app
> cross-env NODE_ENV=production node lib/server.js “migrate”

2017-04-20T13:19:56.761Z – info: [migrator] Mongo DB URL: mongodb://mongo-0.mongo:27017,mongo-1.mongo:27017,mongo-2.mongo:27017/relationshipdb?replicaSet=rs0&readPreference=primaryPreferred&wtimeoutMS=2000
2017-04-20T13:19:56.766Z – info: [migrator] Mongo DB URL: mongodb://mongo-0.mongo:27017,mongo-1.mongo:27017,mongo-2.mongo:27017/datamigrationdb?replicaSet=rs0&readPreference=primaryPreferred&wtimeoutMS=2000
2017-04-20T13:19:56.767Z – info: [migrator] Mongo DB URL: mongodb://mongo-0.mongo:27017,mongo-1.mongo:27017,mongo-2.mongo:27017/profiledb?replicaSet=rs0&readPreference=primaryPreferred&wtimeoutMS=2000
Connection fails: MongoError: failed to connect to server [mongo-0:27017] on first connect [MongoError: getaddrinfo ENOTFOUND mongo-0 mongo-0:27017]
It will be retried for the next request.

/usr/src/app/node_modules/mongodb/lib/mongo_client.js:338
          throw err
          ^
MongoError: failed to connect to server [mongo-0:27017] on first connect [MongoError: getaddrinfo ENOTFOUND mongo-0 mongo-0:27017]
    at Pool.<anonymous> (/usr/src/app/node_modules/mongodb-core/lib/topologies/server.js:327:35)
    at emitOne (events.js:96:13)
    at Pool.emit (events.js:188:7)
    at Connection.<anonymous> (/usr/src/app/node_modules/mongodb-core/lib/connection/pool.js:274:12)
    at Connection.g (events.js:291:16)
    at emitTwo (events.js:106:13)
    at Connection.emit (events.js:191:7)
    at Socket.<anonymous> (/usr/src/app/node_modules/mongodb-core/lib/connection/connection.js:177:49)
    at Socket.g (events.js:291:16)
    at emitOne (events.js:96:13)
    at Socket.emit (events.js:188:7)
    at connectErrorNT (net.js:1020:8)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9)

If I specify the location of migrationConfig I get the same result.

# npm run start migrate config:/usr/src/app/migrationConfig

Oddly enough, if I run the above command outside of /usr/src/app/ directory it fails. It doesn’t actually read the file you specify, it always looks for migrationConfig in relation to the working directory where you are when you issue it. Of course I may have the syntax wrong but if I do not then it’s a bit sloppy.

On to the problem which seems to be name resolution. The error I was getting was

Connection fails: MongoError: failed to connect to server [mongo-0:27017] on first connect [MongoError: getaddrinfo ENOTFOUND mongo-0 mongo-0:27017]

It seems to be trying to connect to mongo-0 over 27017.

# kubectl exec -it $(kubectl get pods | grep people-migrate | awk ‘{print $1}’) bash

# ping mongo-0
ping: mongo-0: Name or service not known

# ping mongo
PING mongo.default.svc.cluster.local (10.1.67.163) 56(84) bytes of data.
64 bytes from 10.1.67.163 (10.1.67.163): icmp_seq=1 ttl=63 time=0.063 ms

# ping mongo-0.mongo
PING mongo-0.mongo.default.svc.cluster.local (10.1.67.163) 56(84) bytes of data.
64 bytes from 10.1.67.163 (10.1.67.163): icmp_seq=1 ttl=63 time=0.087 ms

This was the cause, “mongo-0” was not resolving for me and this is confirmed by another that there container works the same. To work around this I added an entry to the container’s host file.

# cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.1.67.176     people-migrate-4029352936-n8fzl
10.1.67.163     mongo-0 mongo-0.mongo

Now the migration app works but I also have mongo-sidecar errors which I’m not clear on as to whether they are supposed to be there.

Update – 27/04/17

This only gets me so far. This allows me to get the data migrated from Connections Profiles in to MongoDB but when the container is torn down and replaced with another the host file entry is gone. Also, there are the following errors in the logs for itm-services containers that I cannot exec to to update the hosts file.

Connection fails: MongoError: failed to connect to server [mongo-0:27017] on first connect [MongoError: getaddrinfo ENOTFOUND mongo-0 mongo-0:27017]
It will be retried for the next request.

/usr/src/app/node_modules/mongodb/lib/mongo_client.js:338
          throw err
          ^
MongoError: failed to connect to server [mongo-0:27017] on first connect [MongoError: getaddrinfo ENOTFOUND mongo-0 mongo-0:27017]
    at Pool.<anonymous> (/usr/src/app/node_modules/mongodb-core/lib/topologies/server.js:327:35)
    at emitOne (events.js:96:13)
    at Pool.emit (events.js:188:7)
    at Connection.<anonymous> (/usr/src/app/node_modules/mongodb-core/lib/connection/pool.js:274:12)
    at Connection.g (events.js:291:16)
    at emitTwo (events.js:106:13)
    at Connection.emit (events.js:191:7)
    at Socket.<anonymous> (/usr/src/app/node_modules/mongodb-core/lib/connection/connection.js:177:49)
    at Socket.g (events.js:291:16)
    at emitOne (events.js:96:13)
    at Socket.emit (events.js:188:7)
    at connectErrorNT (net.js:1020:8)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9)

Update 28/04/17

During the (excellent) Connections Pink Developer Workshop hosted by IBM we were given access to a SoftLayer server running CentOS 7.3 where we installed CfC and Orient Me. The installer worked just fine with no signs of the mongoDB errors above. I have come across two others who have the same errors I have documented above.

I sparked up a CentOS 7.3 server on Bluemix for a few hours and the install with the same binaries worked just fine. I compared what yum has installed and installed all on my local CentOS 7.3 server and the same problem occurred. I changed my NIC device name swapping it from ens192 to match Bluemix and eth0 but the result is the same.

Update 05/05/17

This week I was lucky to visit the Dublin labs with a customer discussing Watson Workspace, Watson Work Services, XPages and Pink. I used a couple of hours of those two days to have a chat with David McDonagh and a colleague of his Bruno to look into the problems I was having with Mongo.

The crux of it was that the node I was using as the master, boot, worker and proxy was under a great deal of strain, mainly CPU strain, which seemed to be causing the problem. This would make sense since the differences between my ESXi server and Bluemix are the resources available to it.

I bumped up the resources available to the single node but although the install went OK the problems persisted. It wasn’t until today that I got it working but not with a single node but rather two nodes. Node 1 ran boot, master and proxy roles whilst node 2 was the worker node. I gave a generous helping of resources to both and the thankfully the installation went smoothly and more importantly the errors above are no more.

I have some further work to see how much I can scale the resources back because it does have an impact on my ESXi host and the other guests on it.

Advertisements