HOMEPAGE.SR_RESUME_TOKENS duplicate data in IBM Connections

I was checking things after migrating IBM Connections from version 4.0 to 5.5 and found the following error in the application server hosting Search. It didn’t stop the search index and returning results.

[11/18/16 18:46:00:604 GMT] 000001ba XmlBeanDefini I org.springframework.beans.factory.xml.XmlBeanDefinitionReader loadBeanDefinitions Loading XML bean definitions from class path resource [org/springframework/jdbc/support/sql-error-codes.xml]
[11/18/16 18:46:00:627 GMT] 000001ba SQLErrorCodes I org.springframework.jdbc.support.SQLErrorCodesFactory <init> SQLErrorCodes loaded: [DB2, Derby, H2, HSQL, Informix, MS-SQL, MySQL, Oracle, PostgreSQL, Sybase]
[11/18/16 18:46:00:645 GMT] 000001ba IndexingTaskB W com.ibm.connections.search.ejbs.indexing.IndexingTaskBean processTask CLFRW0395E: An error occurred while running the scheduled indexing task named 15min-search-indexing-task.
                                 com.ibm.connections.search.admin.index.exception.IndexingTaskException: org.springframework.jdbc.UncategorizedSQLException: SqlMapClient operation; uncategorized SQLException for SQL []; SQL state [null]; error code [0]; Error: executeQueryForObject returned too many results.; nested exception is java.sql.SQLException: Error: executeQueryForObject returned too many results.

I Googled “returned too many results” and it hinted at duplicate data in databases for different IBM products. Hmmm.

I enabled the following trace and ran a one of indexing task, SearchService.indexNow(“all_configured”)

com.ibm.connections.search.index.indexing.*=all: com.ibm.connections.search.seedlist.*=all: com.ibm.connections.httpClient.*=all

In trace.log I saw more information and just prior to the database exception I saw resume token messages

[11/18/16 18:46:00:580 GMT] 000001ba ResumeTokenIn > com.ibm.connections.search.seedlist.crawler.util.ResumeTokenInterpreter getInitialResumeToken ENTRY wikis
[11/18/16 18:46:00:580 GMT] 000001ba ResumeTokenIn > com.ibm.connections.search.seedlist.crawler.util.ResumeTokenInterpreter resumeTokenFromDate ENTRY Thu Jan 01 01:00:00 GMT 1970 wikis
[11/18/16 18:46:00:580 GMT] 000001ba ResumeTokenIn < com.ibm.connections.search.seedlist.crawler.util.ResumeTokenInterpreter resumeTokenFromDate RETURN AAAAAAAAAAA=
[11/18/16 18:46:00:580 GMT] 000001ba ResumeTokenIn < com.ibm.connections.search.seedlist.crawler.util.ResumeTokenInterpreter getInitialResumeToken RETURN AAAAAAAAAAA=

Resume tokens and references to duplicate data in the database, hmmm. Well HOMEPAGE has the SR_RESUME_TOKENS table. I opened it in dbVisualizer and saw this.

resumetoken2

It didn’t look right and compared it with other deployments and found that others only have the one row per application. The knowledge center details how to manipulate them but not clear them.

I shut down all application servers and backed up HOMEPAGE database. I then cleared the table

# su – db2inst1
$ cd /opt2/db2backups/55_homepage_resumetokens/homepage/
$ db2 backup db homepage to ‘/opt2/db2backups/55_homepage_resumetokens/homepage/’
$ db2 connect to homepage
$ db2 “DELETE FROM HOMEPAGE.SR_RESUME_TOKENS WHERE NODE_ID = ‘*****Node01:InfraCluster_server1′”
$ db2 connect reset

On startup the errors have gone and there is only one row per application.

Active users showing as inactive in All Connections search

A customer was seeing some users marked as inactive when using the All Connections search but when clicking through to the user’s profile they were active and active in communities and all over areas of Connections.

Looking into the database tables I found that the “state” of these users were correct, for example, in the EMPINST.GIVEN_NAME a particular user had a PROF_USRSTATE equalling 0 which means he’s active. In the EMPINST.EMPLOYEE table affected users had their email addresses which are normally removed when they are made inactive.

After some investigation I found that by simply activating them would mark them as active without any changes to the various tables in PEOPLEDB.

This got me thinking that the problem was an index issue and without knowing how many people were affected I suggested that the customer recreate the index. I provided them with steps of how to back it up, delete it from the file system and create a new one but even after the index created users were still showing as inactive.

Thankfully I had access to the Control Center and decided to look at all the PEOPLEDB tables, none were useful. I then started looking at the next logical database, HOMEPAGE. Interestingly, in the HOMEPAGE.PERSON table there is a column called STATE and the affected users had a value of 1 in that column. Running the following command changed the STATE to 0 and then searching for the user using the All Connections search showed him as active.

wsadmin.bat -lang jython -port 8879
execfile(“D:\IBM\WebSphere\AppServer\profiles\AppSrv01\config\bin_lc_admin\profilesAdmin.py”)
ProfilesService.activateUserByUserId(“E4BB9E9D-43D3-B5A4-8025-7433003EFACB”,email=”ben.williams@acme.com”, displayName=”Ben Williams”)

Going further I had to identify how many users were affected and the below query gave me the column values I needed to activate users who were marked inactive.

SELECT PERSON.DISPLAYNAME, PERSON.EXID, PERSON.USER_MAIL_LOWER FROM HOMEPAGE.PERSON AS PERSON WHERE PERSON.USER_MAIL_LOWER IS  NOT  NULL  AND PERSON.STATE = 1

The above query helped but there were still a number of users that were not in HOMEPAGE.PERSON and are in PEOPLEDB. These people were showing as inactive in the All Connections search BUT had never logged into Connections and hence their email addresses had not populated the HOMEPAGE database. These I had cross referenced manually as I don’t have the know how to build a query over different databases 😦

There is a bit of history here. The customer is importing users manually via populate_from dn_file because they want to control who is being added until their Connections 4 environment has been signed off for production and a custom TDI assembly line has been created. A few months ago sync_all_dns was run accidentally which meant that a 1000 or so users had to be identified and then removed from Connections. I believe that this (in some) way caused these problems.

Missing status updates in Connections

A customer had a particularly difficult and awkward problem. A number of user reported that they could not see the status updates of others users who were part of the same network.

This seemed to affect a handful of users in the following pattern. User A cannot see User B’s status updates whilst User C can see User B’s updates. The problem is associated with User B. To c0mplicate things further User A shares a network with User B but does not follow him whilst User C also shares a network but does follow User B. So if you follow User B you get to see their status updates in your Status Updates. You should not need to follow the user to see their status updates the fact that you’re sharing a network should be sufficient.

I logged a PMR which thankfully found it’s way into the ever helpful hands of David McCarthy. An ISSC colleague mentioned that this had been seen in IBM’s deployment of Connections (w3) by Luiz Benietz and others so getting to the bottom of this would kill two birds with one stone.

IBM had me extract data from PEOPLEDB and HOMEPAGE databases using the DB2 commands below to provide the PERSON_ID of the users within HOMEPAGE to make sure it matches with the PROF_GUID within PEOPLEDB which for User B is E833339D-AED2-425F-8600-64CEFD85A3A5.

Comparison of NR_NETWORK showed that the data in there was correct.

PEOPLEDB.EMPLOYEE
PROF_GUID = E833339D-AED2-425F-8600-64CEFD85A3A5

HOMEPAGE.PERSON
PERSON_ID = 0bdd7880-8d9a-4fe1-ab13-77c54bb429bc
EXID = E833339D-AED2-425F-8600-64CEFD85A3A5

Using the PERSON_ID I was then able to retrieve data from the HOMEPAGE.NR_NEWS_STATUS_NETWORK table which details all the status updates posted by that PERSON_ID which in the HOMEPAGE.NR_NEWS_STATUS_NETWORK table is identified by ACTOR_UUID. This table also shows the READER_ID (detailed below) as well as a BRIEF_DESC column which shows the first 200 (or so) characters of the status update.

db2 “select * from HOMEPAGE.NR_NEWS_STATUS_NETWORK where ACTOR_UUID = ‘0bdd7880-8d9a-4fe1-ab13-77c54bb429bc‘”

HOMEPAGE.NR_NEWS_STATUS_NETWORK
ACTOR_UUID = 0bdd7880-8d9a-4fe1-ab13-77c54bb429bc
READER_ID = PERSON_ID of the person that is consuming the status updates ie those listed in the READER_ID column should be able to see the status updates.

To find out the names of the people who can read the updates you will need to use the READER_ID from HOMEPAGE.NR_NEWS_STATUS_NETWORK and find out the name of the user from the HOMEPAGE.PERSON by using the READER_ID converted to PERSON_ID.

db2 “select DISPLAYNAME from HOMEPAGE.PERSON where PERSON_ID = ‘714fbcd9-1740-477b-972d-a6456a35e4ea'”

I contacted the user and they could indeed see User B’s status updates.

To see how many people User B has in their network the following command was run which produced a value of about 140.

db2 “select count(*) from HOMEPAGE.NR_NETWORK where PERSON_ID = ‘0bdd7880-8d9a-4fe1-ab13-77c54bb429bc‘”

IBM asked for the following trace to be enabled whilst User B posted a status update and in the trace.log an exception was seen during the processing of one of the threads.

*=info: java.sql.*=all: com.ibatis.sqlmap.*=all: com.ibm.lconn.homepage.*=all: com.ibm.lconn.news.*=all

It was suggested that 3.0.1.1 be installed but that would not be possible. The following fixes were applied which aimed to improve that code path to handle the case where the use records in the Homepage database are out of sync.

3.0.1.0     LO63965     VBUT8LNH89     Enhancement to News aggregation service for better handling of exceptional cases where user records become inconsistent with Profiles

3.0.1.0     LO66468     SCRD8QCDDK     Profiles Sync issue. Too many requests made on Event consumption to Profiles colleague feed

Unfortunately the problem was still apparent after the iFixes so more debugging was enabled, this time com.ibm.lconn.news.*=all: com.ibm.lconn.events.*=all

Thankfully this time the trace.log provided much more valuable information.

[30/05/12 16:14:59:301 BST] 00000067 StatusUpdateS 3 com.ibm.lconn.news.service.impl.StatusUpdateService insertStatusUpdate Going to insert into db with a batch process this number of records: 144 and BUCKET_SIZE_BATCH_INSERT: 100
[ … snip …]
[30/05/12 16:14:59:308 BST] 00000067 NewsStatusNet 3 com.ibm.lconn.news.data.dao.impl.ibatis.NewsStatusNetworkDao insertBatch – doInSqlMapClient rowsaffected 100
[30/05/12 16:14:59:308 BST] 00000067 NewsStatusNet < com.ibm.lconn.news.data.dao.impl.ibatis.NewsStatusNetworkDao insertBatch RETURN 100
[30/05/12 16:14:59:312 BST] 00000067 StatusUpdateS < com.ibm.lconn.news.service.impl.StatusUpdateService insertBatch RETURN 100
[ Here – it’s missing the insert for the remaining 44 records ]
[30/05/12 16:14:59:312 BST] 00000067 StatusUpdateS 3 com.ibm.lconn.news.service.impl.StatusUpdateService insertStatusUpdate Going to insert into db with a batch process this number of records: 3 and BUCKET_SIZE_BATCH_INSERT: 100″

The response from L3 was:

The root cause is a coding error in a routine performing batch insert. The issue should be reproducible for people in the following conditions:

  • The number of people in the intersection of the sets (colleagues, followers)  for the author of the status update is > 100
  • The number of colleagues – number of people in the intersection is > 100 (this is what happen for this very specific PMR)
  • The number of followers – number of people in the intersection is > 100.

So the problem we are seeing is because the user had more than 100 people in their network and/or followers and the code would not add the 44 others to the database. pressing further I asked for an iFix for this which is too complicated and the only option is to go to 3.0.1.1 CR1.

“Upon further review – the issue appears to be indirectly fixed in 3.0.1.1….We performed a code review in 3.0.1.1 – this code path was entirely re-written. The final section of the batch is inserted in the new code in 3.0.1.1.”

Up until this point the problem still hasn’t been resolved as updating Connections has not been planned yet but IBM support are fairly confident that 3.0.1.1 CR1 will iron out this. With regards to w3, David McCarthy was able to track down one user who reported the problem initially who now (after the upgrade) can see all status updates.