Blank Sametime.ini

This morning it was brought to my attention that Sametime was not working for a customer. Last night the server was stopped and Windows patches applied and then started back up. In the Log.nsf I saw the following exceptions:

26/06/2012 23:35:39   Servlet engine initialization was successful
26/06/2012 23:35:40   HTTP JVM: com.lotus.sametime.configuration.DominoBootstrapServlet: init
26/06/2012 23:35:40   HTTP JVM: Exception in thread “Thread-2”
26/06/2012 23:35:40   HTTP JVM: java.lang.NullPointerException
26/06/2012 23:35:40   HTTP JVM:     at java.util.Hashtable.put(Hashtable.java:632)
26/06/2012 23:35:40   HTTP JVM:     at com.lotus.sametime.admin.util.INIStream.putIniPropertiesValue(Unknown Source)
26/06/2012 23:35:40   HTTP JVM:     at com.lotus.sametime.admin.util.INIStream.setSectionValue(Unknown Source)
26/06/2012 23:35:40   HTTP JVM:     at com.lotus.sametime.bootstrap.impl.DominoBootstrap.setProperty(Unknown Source)
26/06/2012 23:35:40   HTTP JVM:     at com.lotus.sametime.configuration.DominoBootstrapThread.setClusterName(Unknown Source)
26/06/2012 23:35:40   HTTP JVM:     at com.lotus.sametime.configuration.DominoBootstrapThread.runNotes(Unknown Source)
26/06/2012 23:35:40   HTTP JVM:     at lotus.domino.NotesThread.run(Unknown Source)
26/06/2012 23:35:40   HTTP JVM: com.lotus.sametime.configuration.DominoConfigurationServlet: init
26/06/2012 23:35:44   HTTP JVM: com.lotus.sametime.admin.authentication.AccessControlServlet: init
26/06/2012 23:35:44   HTTP JVM: com.lotus.sametime.admin.DominoAdminXPathRequestServletJAXP: init
26/06/2012 23:35:44   HTTP JVM: Error occurred while loading Servlet (admin)
26/06/2012 23:35:44   HTTP JVM:
—–Servlet Information—–
Servlet name:  admin
Servlet class: com.lotus.sametime.admin.DominoAdminXPathRequestServletJAXP
Servlet state: <Error>
Configuration parameters: Statistics.Servlet.Name=stats AccessControl.Roles=[SametimeAdmin];[SametimeMonitor];[DatabaseAdmin] ServletURL=admin Language.Resource.Relative.Path=strings/ Static.Resource.Relative.URL=/sametime/stadmin Default.Language.Suffix=en Static.Resource.Relative.Path=/domino/html/sametime/stadmin XML.File.Pa
26/06/2012 23:35:44   HTTP JVM: javax.servlet.ServletException: Exception while initializing admin servlet:null: Exception while initializing admin servlet:null
26/06/2012 23:35:44   HTTP JVM: com.lotus.sametime.meetingmanagement.remote.servlet.MMAPIServlet: init
26/06/2012 23:35:44   HTTP JVM: com.lotus.sametime.calendar.NotesCalendarServlet: init
26/06/2012 23:35:44   HTTP JVM: com.lotus.sametime.materials.servlets.FileUploadServlet: init
26/06/2012 23:35:44   HTTP JVM: com.lotus.sametime.materials.servlets.RAPFileServlet: init
26/06/2012 23:35:44   HTTP JVM: com.lotus.sametime.statistics.StatisticsServlet: init
26/06/2012 23:35:44   HTTP JVM: com.lotus.sametime.admin.policy.PolicyServlet: init
26/06/2012 23:35:45   HTTP JVM: com.lotus.sametime.admin.namechange.NameChangeServlet: init
26/06/2012 23:35:45   HTTP JVM: com.lotus.sametime.mtk.meeting.MeetingServlet: init
26/06/2012 23:35:45   HTTP JVM: com.lotus.sametime.telephonyservlet.TelephonyServlet: init
26/06/2012 23:35:55   HTTP JVM: com.ibm.sametime.userinfo.servlets.UserInfoServlet: init
26/06/2012 23:35:55   HTTP JVM: com.lotus.sametime.startup.SametimeStartupServlet: init
26/06/2012 23:35:58   HTTP Server: Started

I went down the route of checking that the various jars were still on the OS, I have seen this in the past here with other posts in the forums but all jars were present. I finished examining the Sametime.log in D:\Lotus\Domino and on closing it my eyes glanced to the Sametime.ini which was showing as 0KB……… Looking at the time date stamp I can see that the file was modified when the OS was starting up by cross-referencing the Windows event viewer.

Restoring the file from a back up and stopping and starting Domino did the trick but I’m no nearer knowing why….. Good old Windows…..

NFS file locking and it’s affect on Connections

A customer moved the shared storage (NAS) that Connections uses to new hardware. Check out by the testers showed that Connections was functioning correctly but by Monday morning it was apparent there were problems.

Looking at the various SystemOut.logs on monday morning I saw errors such as the following on both the Connections nodes.

[16/06/12 18:25:33:585 BST] 0000004c LocalExceptio E   CNTR0020E: EJB threw an unexpected (non-dered) exception during invocation of method “publishEvent” on bean “BeanId(Files#lc.events.publisher#EventPublisher, null)”. Exception data: com.ibm.lconn.events.internal.invoke.async.AsyncEventInerException: CLFWY0148E: Could not obtain an event connection
at com.ibm.lconn.events.internal.publish.impl.JMSPublisher.publishEvent(JMSPublisher.java9)

Caused by: javax.jms.JMSException: CWSIA0241E: An exception was received during the call to the method JmsManagedConnectionFactoryImpl.createConnection: com.ibm.websphere.sib.exception.SIResourceException: CWSIT0088E: There are currently no messaging engines in bus ConnectionsBus running. Additional failure information: CWSIT0103E: No messaging engine was found that matched the following parameters: bus=ConnectionsBus, targetGroup=null, targetType=BusMember, targetSignificance=Preferred, transpo
rtChain=InboundSecureMessaging, proximity=Bus..
at com.ibm.ws.sib.api.jms.impl.JmsManagedConnectionFactoryImpl.createConnection(JmsManagedConnectionFactoryImpl.java:195)
Caused by: com.ibm.websphere.sib.exception.SIResourceException: CWSIT0088E: There are currently no messaging engines in bus ConnectionsBus running. Additional failure information: CWSIT0103E: No messaging engine was found that matched the following parameters: bus=ConnectionsBus, targetGroup=null, targetType=BusMember, targetSignificance=Preferred, transportChain=InboundSecureMessaging, proximity=Bus.
at com.ibm.ws.sib.trm.client.TrmSICoreConnectionFactoryImpl.localBootstrap(TrmSICoreConnectionFactoryImpl.java:363)

Looking at the SystemOut.log after start up I found that the SibMessages stopped whilst trying to initialise the Log file. The desired output is further down.

Current:
[16/06/12 17:06:21:001 BST] 00000015 SibMessage    I   [ConnectionsBus:dogearcluster.000-ConnectionsBus] CWSIS1581I: The file store is attempting to initalise its log file: /opt/IBM/LotusConnections/data/shared/messageStores/dogearcluster/log/Log

Desired output:
[16/06/12 08:07:48:299 BST] 000000a1 SibMessage    I   [ConnectionsBus:filescluster.000-ConnectionsBus] CWSID0016I: Messaging engine filescluster.000-ConnectionsBus is in state Starting.
[16/06/12 08:07:48:435 BST] 000000a1 SibMessage    I   [ConnectionsBus:filescluster.000-ConnectionsBus] CWSIS1581I: The file store is attempting to initalise its log file: /opt/IBM/LotusConnections/data/shared/messageStores/filescluster/log/Log
[16/06/12 08:07:48:552 BST] 000000a1 ObjectManager I   CWSOM0002I: ObjectManager using logFile /opt/IBM/LotusConnections/data/shared/messageStores/filescluster/log/Log was warm started logFileType=FILE.
[16/06/12 08:07:48:676 BST] 000000a1 SibMessage    I   [ConnectionsBus:filescluster.000-ConnectionsBus] CWSIS1584I: The file store has initialised successfully.
[16/06/12 08:07:48:676 BST] 000000a1 SibMessage    I   [ConnectionsBus:filescluster.000-ConnectionsBus] CWSIS1585I: The file store is attempting to start its permanent store (PermanentStore) and temporary store (TemporaryStore).
[16/06/12 08:07:48:677 BST] 000000a1 SibMessage    I   [ConnectionsBus:filescluster.000-ConnectionsBus] CWSIS1586I: The file stores permanent store has been started successfully.
[16/06/12 08:07:48:677 BST] 000000a1 SibMessage    I   [ConnectionsBus:filescluster.000-ConnectionsBus] CWSIS1587I: The file stores temporary store has been started successfully.
[16/06/12 08:07:48:678 BST] 000000a1 SibMessage    I   [ConnectionsBus:filescluster.000-ConnectionsBus] CWSIS1559I: The data in the permanent store file occupies 1,434,580 bytes.
[16/06/12 08:07:50:362 BST] 000000a1 SibMessage    I   [ConnectionsBus:filescluster.000-ConnectionsBus] CWSID0016I: Messaging engine filescluster.000-ConnectionsBus is in state Started.

That evening the contents of the /messageStores/* were deleted as detailed in High number of publication points observed for Connections Files – no notifications. Profiles was started first to check that the new Log file was created successfully but the following errors were seen.

[18/06/12 23:15:02:094 BST] 00000012 SystemOut O at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:56)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:39)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at java.lang.reflect.Constructor.newInstance(Constructor.java:527)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at com.ibm.ws.objectManager.utils.Utils.getImpl(Utils.java:44)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at com.ibm.ws.objectManager.utils.FileLock.getFileLock(FileLock.java:41)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at com.ibm.ws.objectManager.ObjectManagerState.<init>(ObjectManagerState.java:408)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at com.ibm.ws.objectManager.ObjectManager.createObjectManagerState(ObjectManager.java:293)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at com.ibm.ws.objectManager.ObjectManager.initialise(ObjectManager.java:237)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at com.ibm.ws.objectManager.ObjectManager.<init>(ObjectManager.java:197)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at com.ibm.ws.sib.msgstore.persistence.objectManager.PersistentMessageStoreImpl.start(PersistentMessageStoreImpl.java:354)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at com.ibm.ws.sib.msgstore.impl.MessageStoreImpl.start(MessageStoreImpl.java:1518)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at com.ibm.ws.sib.admin.impl.JsMessagingEngineImpl.start(JsMessagingEngineImpl.java:609)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at com.ibm.ws.sib.admin.impl.HAManagerMessagingEngineImpl.activate(HAManagerMessagingEngineImpl.java:995)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at com.ibm.ws.sib.admin.impl.JsActivationThread.run(JsActivationThread.java:92)
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O Caused by: java.io.IOException: No locks available
[18/06/12 23:15:02:095 BST] 00000012 SystemOut O at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:914)
[18/06/12 23:15:02:096 BST] 00000012 SystemOut O at java.nio.channels.FileChannel.tryLock(FileChannel.java:973)
[18/06/12 23:15:02:096 BST] 00000012 SystemOut O at com.ibm.ws.objectManager.utils.FileLockImpl.<init>(FileLockImpl.java:58)

We then tried starting the other node which did start correctly, the Log files were created and the SystemOut.logs looked good. We then started the affected node and everything checked out.

The following day myself and a colleague asked for some data from the Linux servers and noticed in dmesg the following errors.

statd: server localhost not responding, timed out
lockd: cannot monitor 111.125.65.7
lockd: failed to monitor 111.125.65.7
statd: server localhost not responding, timed out
lockd: cannot monitor 111.125.65.7
lockd: failed to monitor 111.125.65.7
statd: server localhost not responding, timed out
lockd: cannot monitor 111.125.65.7
lockd: failed to monitor 111.125.65.7

The services mentioned are related to NFS and file locking which fits with what we are seeing and the change the customer made over the weekend.

The customer’s Unix team found a match on Redhat’s website which describes the condition and they wanted to remount and restart the nfslock service. Unix needed all applications stopped on the affected node that access the share and they ran the following commands.

/etc/init.d $ umount /net/connections
/etc/init.d $ ./nfslock stop
/etc/init.d $ ./nfslock start
/etc/init.d $ mount /net/connections

I started all the application servers on the affected node and then to test whether file locking was working I stopped two of the applications on the other node so that the affected node took over and thankfully the affected node showed healthy messages in the SystemOut.logs.

[19/06/12 15:48:01:243 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIS1581I: The file store is attempting to initalise its log file: /opt/IBM/LotusConnections/data/shared/messageStores/forumcluster/log/Log
[19/06/12 15:48:01:406 BST] 00000047 ObjectManager I CWSOM0002I: ObjectManager using logFile /opt/IBM/LotusConnections/data/shared/messageStores/forumcluster/log/Log was warm started logFileType=FILE.
[19/06/12 15:48:01:524 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIS1584I: The file store has initialised successfully.
[19/06/12 15:48:01:524 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIS1585I: The file store is attempting to start its permanent store (PermanentStore) and temporary store (TemporaryStore).
[19/06/12 15:48:01:525 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIS1586I: The file stores permanent store has been started successfully.
[19/06/12 15:48:01:525 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIS1587I: The file stores temporary store has been started successfully.
[19/06/12 15:48:01:526 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIS1559I: The data in the permanent store file occupies 1,255,938 bytes.
[19/06/12 15:48:01:526 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIS1551I: The current minimum reserved size of the permanent store file is 209,715,200 bytes. The current maximum size is 524,288,000 bytes.
[19/06/12 15:48:01:527 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIS1560I: The data in the temporary store file occupies 8,197 bytes.
[19/06/12 15:48:01:527 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIS1555I: The current minimum reserved size of the temporary store file is 209,715,200 bytes. The current maximum size is 524,288,000 bytes.
[19/06/12 15:48:01:528 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIS1550I: The current size of the log file is 104,857,600 bytes. The size in the configuration information of the log file is 104,857,600 bytes.
[19/06/12 15:48:01:532 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIS1564I: The messaging engine, ME_UUID=392DF5A35DD2FFC2, INC_UUID=71C871C80536E9D9, is attempting to obtain an exclusive lock on the file store.
[19/06/12 15:48:01:533 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIS1566I: A single previous owner was found in the messaging engine’s file store, ME_UUID=392DF5A35DD2FFC2, INC_UUID=1FE31FE301D82BD2
[19/06/12 15:48:01:542 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIS1563I: The messaging engine, ME_UUID=392DF5A35DD2FFC2, INC_UUID=71C871C80536E9D9, has acquired an exclusive lock on the file store.
[19/06/12 15:48:01:573 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIS1588I: The file store has started successfully.
[19/06/12 15:48:02:356 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIP0212I: Messaging engine forumcluster.000-ConnectionsBus on bus ConnectionsBus is starting to reconcile the WCCM destination and link configuration.
[19/06/12 15:48:02:375 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIP0213I: Messaging engine forumcluster.000-ConnectionsBus on bus ConnectionsBus has finished reconciling the WCCM destination and link configuration.
[19/06/12 15:48:02:488 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSJZ0021I: WS-ReliableMessaging is enabled for messaging engine forumcluster.000-ConnectionsBus on bus ConnectionsBus.
[19/06/12 15:48:02:542 BST] 00000047 SibMessage W [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIP0028W: Support for multiple subscribers to subscription commands##forums, on destination connections.platformCommands, has been disabled.
[19/06/12 15:48:02:549 BST] 00000047 SibMessage W [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIV1101W: The MDB maximum concurrency configuration value has been changed from the value 1 to 1.
[19/06/12 15:48:02:554 BST] 00000047 SibMessage W [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIP0181W: The maximum batch size for the consumer on destination connections.platformCommands, has been set to 1.
[19/06/12 15:48:02:556 BST] 00000047 SibTr$Suppres I CWSIU0006I: Future occurences of “CWSIP0181W: The maximum batch size for the consumer on destination connections.platformCommands, has been set to 1.” will be suppressed.
[19/06/12 15:48:02:557 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSIV0777I: A connection to messaging engine forumcluster.000-ConnectionsBus for destination connections.platformCommands on bus ConnectionsBus has been successfully created.
[19/06/12 15:48:02:558 BST] 00000047 SibMessage I [ConnectionsBus:forumcluster.000-ConnectionsBus] CWSID0016I: Messaging engine forumcluster.000-ConnectionsBus is in state Started.

I found the following statement in the Connections wiki, the customer is using NFS v3.

“In a multi-node cluster, you need to configure network share directories for content stores and search indexes. When using NFS, use NFS v4 because NFS v3 lacks advanced locking capability.”

Dave Hay came up against a similar problem in his post “Lotus Connections 2.5.0.1 – Problems with Service Integration Bus after configuring Shared Message Store” which is worth a read too.