Let SMF Deal With That: An Introduction to the Service Management Framework
Ellard Roush
Sun Microsystems

Agenda
• Introduction to Service Management Framework (SMF)
• Commands Demo
• Service Manifest and Development
• Q&A

Service Management pre-SMF
• Daemons started by scripts delivered into /etc/rc*.d, or by inetd (through /etc/inetd.conf)
  > Dependencies expressed through script numbering (fragile, imprecise)
• Common operations like stopping a service now & forever required two different steps
  > Easy to forget one
  > Often undone by patching or upgrade anyway
• Daemon death ignored after start
• OS didn't know consequences of memory errors in daemons – had to panic

What is SMF?
• It is a solution to all the problems on the last slide
• It is half of Predictive Self Healing
  > It works with the Solaris Fault Manager to gracefully recover from uncorrectable hardware errors
• It provides public, documented interfaces that ISVs and customers can use
• It is used automatically
  > No need to turn it on
  > No way to turn it off

SMF basics: svc.startd
• A new system daemon, svc.startd, has taken over most of init's responsibilities in starting system services
• init still uses inittab, and /etc/rc*.d scripts are still run
• svc.startd can automatically restart services
  > If sshd is “enabled”, then it is
    – started at boot
    – restarted if it dies (even if killed)
  > sshd may be disabled by a single command
    – stopped
    – not started at boot
    – not started after patch or upgrade
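
The "restarted if it dies" behaviour can be pictured as a small supervision loop. The following is a minimal sketch, not how svc.startd is implemented: `supervise`, `run_once`, `is_enabled`, and the `max_restarts` cap are all hypothetical names introduced for illustration.

```python
from typing import Callable, List


def supervise(run_once: Callable[[], int],
              is_enabled: Callable[[], bool],
              max_restarts: int = 3) -> List[int]:
    """Run a service and restart it while it stays enabled.

    run_once   -- starts the service and blocks until it exits,
                  returning its exit code (hypothetical hook)
    is_enabled -- reports whether the admin still wants it running
    Returns the exit codes observed, one per run.
    """
    exit_codes: List[int] = []
    restarts = 0
    while is_enabled():
        exit_codes.append(run_once())
        if restarts >= max_restarts:
            # Assumed safety cap so a crash-looping service is not
            # restarted forever in this sketch.
            break
        restarts += 1
    return exit_codes
```

A disabled service is never started (`supervise(run, lambda: False)` returns an empty list), which mirrors the single-switch enable/disable model on the slide.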

Service states
• SMF lets the admin set whether each service is enabled or disabled
• SMF keeps a state for each service
  > uninitialized   has not been evaluated yet
  > disabled        service is disabled, not running
  > offline         enabled, waiting for dependencies
  > online          enabled and running
  > degraded        running below full performance
  > maintenance     service problem occurred
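
As a rough illustration of how the enabled flag and the state relate, here is a deliberately simplified sketch. The state descriptions come from the slide; the `resolve_state` function and its three inputs are assumptions made for this example, and the real transitions (including uninitialized and degraded) are richer than this.

```python
from enum import Enum


class State(Enum):
    UNINITIALIZED = "has not been evaluated yet"
    DISABLED = "service is disabled, not running"
    OFFLINE = "enabled, waiting for dependencies"
    ONLINE = "enabled and running"
    DEGRADED = "running below full performance"
    MAINTENANCE = "service problem occurred"


def resolve_state(enabled: bool, dependencies_met: bool,
                  failed: bool = False) -> State:
    """Pick a state from three simplified inputs (hypothetical model)."""
    if not enabled:
        return State.DISABLED
    if failed:
        return State.MAINTENANCE
    if not dependencies_met:
        return State.OFFLINE
    return State.ONLINE
```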

Service dependencies
• Services may declare dependencies on each other
• svc.startd starts services in dependency order
  > Independent services started in parallel → faster boot
• Uncorrectable hardware errors handled better
  > Daemon is restarted
  > Services which depend on it can be restarted
• Enabled services hang out in the offline state until their dependencies are met
  > A new command answers “What services is service X waiting for?”
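
Dependency-ordered, parallel startup is essentially a topological sort taken in batches. The sketch below uses Python's standard `graphlib` (3.9+) to show the idea; it is not svc.startd's algorithm. The sendmail dependencies are the two shown by `svcs -d sendmail` elsewhere in this deck, while the edges to `network/physical` and the function names are assumptions for illustration.

```python
from graphlib import TopologicalSorter

# service -> set of services it depends on
DEPS = {
    "network/smtp:sendmail": {"system/identity:domain", "network/service"},
    "system/identity:domain": {"network/physical"},   # assumed edge
    "network/service": {"network/physical"},          # assumed edge
    "network/physical": set(),
}


def start_batches(dependencies):
    """Group services into batches; each batch can start in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose deps are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


def waiting_for(service, dependencies, online):
    """Answer 'what is service X waiting for?' given the online set."""
    return sorted(dependencies[service] - set(online))
```

With the graph above, `start_batches(DEPS)` yields three batches, and the middle batch holds the two services that can start side by side.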

SMF configuration
• Service meta-configuration (enabled status, state, dependencies, methods, etc.) is kept in the Service Configuration Facility (SCF), also known as the SMF repository
• The repository is controlled by svc.configd, another new daemon
• The repository is (currently) stored in /etc/svc/repository.db
[Diagram: svc.startd, SMF tools, svc.configd, repository.db]

Service names: FMRIs
• Services are named by Fault Management Resource Identifiers, or FMRIs
  > URI syntax: svc:/system/cron:default
    – service name: system/cron
    – instance name: default
• Note that while the service name usually contains slashes, there are no service directories! The namespace is flat.
• Commands accept abbreviations (system/cron, cron) and glob patterns
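
To make the FMRI anatomy concrete, here is a small sketch that splits an FMRI into service and instance names and checks whether an abbreviation refers to it. The matching rule (trailing path components of the service name) is an assumption chosen to reproduce the `system/cron` and `cron` examples; the real tools' rules, including glob handling, may differ.

```python
PREFIX = "svc:/"


def parse_fmri(fmri):
    """Split 'svc:/system/cron:default' into ('system/cron', 'default')."""
    if not fmri.startswith(PREFIX):
        raise ValueError(f"not an svc: FMRI: {fmri!r}")
    service, sep, instance = fmri[len(PREFIX):].partition(":")
    return service, (instance if sep else None)


def matches(abbrev, fmri):
    """True if abbrev names fmri by trailing components (assumed rule)."""
    service, instance = parse_fmri(fmri)
    if abbrev.startswith(PREFIX):
        abbrev = abbrev[len(PREFIX):]
    want_service, sep, want_instance = abbrev.partition(":")
    if sep and want_instance != instance:
        return False
    have = service.split("/")
    want = want_service.split("/")
    # Compare whole components, so 'ron' does not match 'cron'.
    return have[-len(want):] == want
```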

Service instances
• To allow configuration sharing, services are represented as instance nodes which are children of service nodes
• Both service nodes and instance nodes can have properties
• If an instance doesn't have property X, the service's property X is used
• Dependencies on a service are satisfied if any of its instances are online
  > Frees dependents from knowing implementation
[Diagram: the repository holds service nodes; each service has properties and child instances, and each instance has its own properties]
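
The instance-then-service lookup behaves like a layered dictionary. A minimal sketch using `collections.ChainMap`, with made-up property values (the 120-second override and the instance states are hypothetical):

```python
from collections import ChainMap


def effective_properties(instance_props, service_props):
    """Instance properties shadow service properties of the same name."""
    return ChainMap(instance_props, service_props)


def dependency_satisfied(instance_states):
    """A dependency on a service is met if any instance is online."""
    return any(state == "online" for state in instance_states.values())


service_props = {
    "start/exec": "/lib/svc/method/http-apache2 start",
    "start/timeout_seconds": 60,
}
instance_props = {"start/timeout_seconds": 120}  # hypothetical override

props = effective_properties(instance_props, service_props)
```

Here `props["start/timeout_seconds"]` comes from the instance, while `props["start/exec"]` falls through to the service.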

Commands: svcs(1)
• Without arguments, lists state, state-time, and FMRI of services that are enabled; with -a, lists all services
• Show dependencies (-d) and dependents (-D)
• Show member processes (-p), additional details (-v/-l)
    $ svcs
    STATE          STIME    FMRI
    ....
    online         18:18:30 svc:/network/http:apache2
    online         18:18:29 svc:/network/smtp:sendmail
    ....
    $ svcs -p sendmail
    STATE          STIME    FMRI
    online         18:18:29 svc:/network/smtp:sendmail
                   18:18:29   100180 sendmail
                   18:18:29   100181 sendmail
    $ svcs -d sendmail
    STATE          STIME    FMRI
    online         18:17:44 svc:/system/identity:domain
    online         18:17:52 svc:/network/service:default
    ....
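
Scripts often need this listing in structured form. The sketch below parses text shaped like the `svcs -p` output into records; it assumes the three-column layout shown on this slide, ignores anything that is not an `svc:/` line or a process line, and `parse_svcs` is a name invented for this example.

```python
SAMPLE = """\
STATE          STIME    FMRI
online         18:18:29 svc:/network/smtp:sendmail
               18:18:29   100180 sendmail
               18:18:29   100181 sendmail
"""


def parse_svcs(output):
    """Turn svcs-style text into a list of dicts, attaching -p PIDs."""
    records = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # blank lines, '....' elisions, and the like
        if fields[2].startswith("svc:/"):
            state, stime, fmri = fields
            records.append(
                {"state": state, "stime": stime, "fmri": fmri, "pids": []}
            )
        elif fields[1].isdigit() and records:
            # Process line from -p: start time, PID, command name.
            records[-1]["pids"].append(int(fields[1]))
    return records
```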

Commands: svcs -x
• Answers the question: What's wrong with my system?
• Explains why services are offline, impact of non-running services
• Gives pointers to knowledge documents, log files to help you determine the cause and find a remedy
    $ svcs telnet
    STATE          STIME    FMRI
    offline         7:38:17 svc:/network/telnet:default
    $ svcs -x
    svc:/network/inetd:default (inetd)
     State: disabled since Wed Jan 25 07:38:17 2006
    Reason: Disabled by an administrator.
       See: http://sun.com/msg/SMF-8000-05
       See: inetd(1M)
       See: /var/svc/log/network-inetd:default.log
    Impact: 17 dependent services are not running. (Use -v for list.)

Commands: svcadm(1M)
• svcadm manipulates services
  > svcadm enable    enables services; services start when dependencies are ready
  > svcadm disable   disables services
  > svcadm restart   stops and starts services
  > svcadm refresh   commits the current properties (to the running snapshot) and instructs the service to re-read its configuration
  > svcadm clear     signals that a service in maintenance has been fixed
• These commands are asynchronous: they issue commands to svc.startd and return immediately
• With -s, enable & disable wait until completion (synchronous)
• With -t, enable & disable are temporary (until next boot)
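
When driving svcadm from a script, it helps to build the argument list in one place. A minimal sketch, restricted to the subcommands and the -s/-t flags named on this slide; `svcadm_argv` is a hypothetical helper, and it only builds the command rather than running it.

```python
ACTIONS = ("enable", "disable", "restart", "refresh", "clear")


def svcadm_argv(action, fmri, synchronous=False, temporary=False):
    """Build an argv list for svcadm (does not execute anything)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown svcadm action: {action!r}")
    if (synchronous or temporary) and action not in ("enable", "disable"):
        # Per the slide, -s and -t apply to enable and disable.
        raise ValueError("-s/-t are only used with enable and disable")
    argv = ["svcadm", action]
    if synchronous:
        argv.append("-s")  # wait until the operation completes
    if temporary:
        argv.append("-t")  # lasts only until next boot
    argv.append(fmri)
    return argv
```

The resulting list can be handed to `subprocess.run(argv, check=True)` on a system that has svcadm.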

Commands: svccfg(1M)
• Interactive access to properties and snapshots
    # svccfg
    svc:> select network/http:apache2
    svc:/network/http:apache2> listprop
    ...
    general                  framework
    general/enabled          boolean   false
    ...
    start                    method
    start/exec               astring   "/lib/svc/method/http-apache2 start"
    start/timeout_seconds    count     60
    start/type               astring   method
    svc:/network/http:apache2> editprop
    [ $EDITOR launches on a temporary file containing property settings ]
    svc:/network/http:apache2> exit
    # svcadm refresh apache2    # read latest configuration
    # svcadm restart apache2    # restart with latest configuration

Commands Demo

Troubleshooting
• Service failures printed to console, syslog
• Start with svcs -x output
  > Often gives concise reason
  > Provides link to knowledge document at sun.com
  > Gives path to log file
• Use svcadm clear to clear maintenance state from repaired services
• Use svccfg to tweak debugging variables:
  > svccfg -s system/foo setenv LD_PRELOAD libumem.so
  > svccfg -s system/foo setenv UMEM_DEBUG default

Recovery
• If a single service is broken, make sure you've got the latest service config: svcadm refresh
• Follow instructions from svcs -x pointer
• Revert to a previous snapshot:
    $ svccfg -s system/cron:default
    svc:/system/cron:default> listsnap
    initial
    last-import
    previous
    running
    start
    svc:/system/cron:default> revert start
    svc:/system/cron:default> exit
    $ svcadm refresh cron
    $ svcadm restart cron

Delegated Restarters
• svc.startd's model isn't right for all services (inetd, clustering)
• SMF allows a service to be a delegated restarter for other services
  > Start, stop, and refresh services however they want
  > Responsible for managing instance states
  > svc.startd still handles enabledness & dependencies, though
• inetd was reimplemented as a delegated restarter
  > Methods are called inetd_start, inetd_stop, etc.
  > Services come online when inetd starts listening for them
  > The repository is used for configuration instead of inetd.conf
• A public delegated restarter API is planned

/etc/inetd.conf & inetadm(1M)
• inetd.conf is no longer the primary configuration
• Most Solaris inet services have been converted
• Entries in inetd.conf are automatically converted during install & upgrade by inetconv(1M)
• If something adds an entry to /etc/inetd.conf, inetd(1M) will detect it and issue a warning message
  > Run inetconv again to convert the new entry
• inetadm(1M) can be used to modify inetd-specific properties
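
To show what a converter has to read, here is a sketch that parses legacy inetd.conf entries into named fields. The seven-field layout (service name, endpoint type, protocol, wait status, uid, server program, server arguments) is the classic format as I recall it; treat the field names and the sample telnet line as assumptions, and note that this is not what inetconv(1M) itself does internally.

```python
from typing import List, NamedTuple


class InetdEntry(NamedTuple):
    service_name: str
    endpoint_type: str
    protocol: str
    wait_status: str
    uid: str
    server_program: str
    server_arguments: List[str]


def parse_inetd_conf(text: str) -> List[InetdEntry]:
    """Parse inetd.conf-style text, skipping comments and blank lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"malformed inetd.conf entry: {raw!r}")
        entries.append(InetdEntry(*fields[:6], fields[6:]))
    return entries


SAMPLE = """\
# hypothetical legacy entry
telnet stream tcp6 nowait root /usr/sbin/in.telnetd in.telnetd
"""
```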

Service Development: Benefits
• Services appear with SMF FMRIs
  > Visible using standard Solaris tools; your service appears in administrative heads-up displays
  > Manageable using standard Solaris tools; admin can leverage existing knowledge to use your service
  > New generic tools developed will automatically see your service
• Built-in restart due to administrative error, software, or hardware fault
• Participation in future software diagnosis capabilities

Service Development: Tasks
• An existing Solaris service may be converted incrementally, and to different levels
  > Get it working: write a manifest using existing init script as start/stop method
  > Handle error cases: refine methods
  > Full restartability: if service has multiple components, split them into individual services
  > Customized error/restart handling: avoid service restart if fault can be handled internally

Service manifests
• A service is delivered by an XML file called a manifest
  > Describes dependencies, methods, and properties
• Manifests are delivered into /var/svc/manifest
• During startup, new manifests in /var/svc/manifest and existing manifests that have changed are loaded into the repository with the svccfg(1M) command
• Do not edit manifests in /var/svc/manifest; make customizations with svccfg(1M), etc.
  > Repository customizations will be preserved across patch & upgrade

Manifest Creation
• Name your service
• Identify whether your service may have multiple instances
• Identify how your service is started/stopped
• Determine faults to be ignored, if any
• Identify dependencies
• Identify dependents
• Create at least one instance
• Create template information to describe your service

Example Manifest: utmpd(1M)
    <service name='system/utmp' type='service' version='1'>
      <single_instance />
      <dependency name='milestone' grouping='require_all'
                  restart_on='none' type='service'>
        <service_fmri value='svc:/milestone/sysconfig' />
      </dependency>
      <dependent name='utmpd_multi-user' grouping='optional_all'
                 restart_on='none'>
        <service_fmri value='svc:/milestone/multi-user' />
      </dependent>
      <exec_method type='method' name='start'
                   exec='/lib/svc/method/svc-utmpd' timeout_seconds='60' />
      <exec_method type='method' name='stop'
                   exec=':kill' timeout_seconds='60' />
      <stability value='Unstable' />
      <template>
        <common_name>
          <loctext xml:lang='C'>utmpx monitoring</loctext>
        </common_name>
        <documentation>
          <manpage title='utmpd' section='1M' manpath='/usr/share/man' />
        </documentation>
      </template>
    </service>
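
Because a manifest is plain XML, its dependencies and methods are easy to inspect with a generic parser. The sketch below reads a trimmed copy of the utmp service element (only the elements the code touches, with timeouts omitted) using Python's standard `xml.etree.ElementTree`; `summarize` is a name invented for this example and performs no DTD validation.

```python
import xml.etree.ElementTree as ET

MANIFEST = """\
<service name='system/utmp' type='service' version='1'>
  <single_instance />
  <dependency name='milestone' grouping='require_all'
              restart_on='none' type='service'>
    <service_fmri value='svc:/milestone/sysconfig' />
  </dependency>
  <dependent name='utmpd_multi-user' grouping='optional_all'
             restart_on='none'>
    <service_fmri value='svc:/milestone/multi-user' />
  </dependent>
  <exec_method type='method' name='start' exec='/lib/svc/method/svc-utmpd' />
  <exec_method type='method' name='stop' exec=':kill' />
</service>
"""


def fmris(parent, tag):
    """Collect service_fmri values under every <tag> child of parent."""
    return [
        fmri.get("value")
        for node in parent.findall(tag)
        for fmri in node.findall("service_fmri")
    ]


def summarize(manifest_xml):
    """Extract name, dependencies, dependents, and methods."""
    service = ET.fromstring(manifest_xml)
    return {
        "name": service.get("name"),
        "single_instance": service.find("single_instance") is not None,
        "dependencies": fmris(service, "dependency"),
        "dependents": fmris(service, "dependent"),
        "methods": {
            m.get("name"): m.get("exec")
            for m in service.findall("exec_method")
        },
    }
```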

Method refinement
• On failure, explain the problem to stdout or stderr (goes to a log) and exit with a non-zero code
  > If the failure is not transient, return $SMF_EXIT_ERR_FATAL or $SMF_EXIT_ERR_CONFIG from /lib/svc/share/smf_include.sh
• On success, don't return until service is ready to serve clients
  > Dependent services may be started immediately
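
The two rules above (report and classify failures, and do not return success before the service is ready) can be sketched as a start method. Real methods are usually shell scripts that source smf_include.sh; this Python version only illustrates the control flow. The numeric exit values are as I recall them from smf_include.sh and should be checked against that file, and `start_method`, `launch`, and `is_ready` are hypothetical hooks.

```python
import os
import time

# Assumed values; /lib/svc/share/smf_include.sh is authoritative.
SMF_EXIT_OK = 0
SMF_EXIT_ERR_FATAL = 95
SMF_EXIT_ERR_CONFIG = 96


def start_method(config_path, launch, is_ready,
                 timeout=60.0, poll=0.1, log=print):
    """Start a service and return an SMF-style exit code."""
    if not os.path.exists(config_path):
        # Non-transient: retrying will not help until the admin fixes it.
        log(f"cannot start: configuration {config_path} is missing")
        return SMF_EXIT_ERR_CONFIG
    try:
        launch()
    except OSError as err:
        log(f"cannot start: {err}")
        return SMF_EXIT_ERR_FATAL
    deadline = time.monotonic() + timeout
    while not is_ready():
        if time.monotonic() >= deadline:
            log("service did not become ready in time")
            return 1  # generic non-zero code for a possibly transient failure
        time.sleep(poll)
    # Only now may dependents safely be started.
    return SMF_EXIT_OK
```

Anything passed to `log` is meant to be written to stdout or stderr, which SMF captures in the service's log file.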

Commands: svcprop(1)
• List properties of services and instances
• Fetch individual properties for use in scripts
    $ svcprop network/http:apache2
    ...
    physical/entities fmri svc:/network/physical:default
    physical/grouping astring optional_all
    physical/restart_on astring error
    physical/type astring service
    start/exec astring /lib/svc/method/http-apache2\ start
    start/timeout_seconds count 60
    start/type astring method
    stop/exec astring /lib/svc/method/http-apache2\ stop
    stop/timeout_seconds count 60
    stop/type astring method
    restarter/auxiliary_state astring none
    restarter/next_state astring none
    restarter/state astring disabled
    restarter/state_timestamp time 1102030556.737590000
    $ svcprop -p general/enabled network/http:apache2
    false
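
Since svcprop is meant for scripting, here is a sketch that parses its "name type value" lines, including the backslash-escaped space seen in start/exec. It assumes the output shape shown on this slide; `parse_svcprop` is a hypothetical helper and other escape sequences are not handled.

```python
import re

SAMPLE = r"""
...
start/exec astring /lib/svc/method/http-apache2\ start
start/timeout_seconds count 60
restarter/state astring disabled
"""

# Whitespace that is NOT preceded by a backslash separates values.
UNESCAPED_SPACE = re.compile(r"(?<!\\)\s+")


def parse_svcprop(output):
    """Map property name -> (type, list of values)."""
    props = {}
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # blank lines and '...' elisions
        name, ptype = parts[0], parts[1]
        raw = parts[2].strip() if len(parts) == 3 else ""
        values = [
            value.replace("\\ ", " ")  # undo the escaped spaces
            for value in UNESCAPED_SPACE.split(raw)
            if value
        ]
        props[name] = (ptype, values)
    return props
```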

Development: Other Examples
• Manifest DTD is documented; read it at /usr/share/lib/xml/dtd/service_bundle.dtd.1
• Explore /var/svc/manifest for similar services
  > system/utmp is a simple standalone daemon
  > system/coreadm is a simple configuration service
  > network/telnet is an inetd-managed daemon
• Explore /lib/svc/method for similar methods

Service Packaging
• Use i.manifest and r.manifest from /usr/sadm/install/scripts
  > (from S10U2 or OpenSolaris)
• Manifests delivered into /var/svc/manifest with type “f” and class “manifest”
  > Use /var/svc/manifest/site if the service is specific to your site
  > Use another directory if you're an ISV, but remember a uniquifier (e.g. stock ticker)
• Methods delivered with your application binaries (/opt strongly recommended)

Developer References
• Manifest development
  > /usr/share/lib/xml/dtd/service_bundle.dtd.1
  > Look in /var/svc/manifest for examples
  > inetconv -i file to create an empty inetd manifest
  > smf_method(5) – information for writing methods
  > inetd(1M) – inetd-specific method information

Additional Resources
• Discussion and further information at http://opensolaris.org/os/community/smf
• Additional quickstart and developer documentation available at http://www.sun.com/bigadmin/content/selfheal/
• Solaris System Administration Guide has SMF information: http://docs.sun.com/app/docs/doc/817-1985
• smf(5) manpage introduces the facility
• Blogs:
  > http://blogs.sun.com/sch
  > http://blogs.sun.com/lianep